The code works, but it only slightly alleviates the imbalance; GPU usage is still unbalanced. #13
Comments
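Then you could check whether the problem is that your model itself is fairly large. This is also related to the PyTorch version; very recent PyTorch versions have not been tested yet. |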
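My model is indeed large, about 30M parameters, but is this really related to the PyTorch version? Which version did you test with?
------------------ Original message ------------------
Sent: Wednesday, July 7, 2021, 4:00 PM
Subject: Re: [Link-Li/Balanced-DataParallel] The code works, but it only slightly alleviates the imbalance; GPU usage is still unbalanced. (#13)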
|
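30M parameters should not cause this, should it? That model is much smaller than bert-base. I believe I tested with a version around 1.3, but I no longer remember exactly which one. |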
My model is a transformer-base with a hidden size of 256, the PyTorch version is 1.6.0, and the batch size is 120. Previously the batch size was around 150. |
GPU memory usage is balanced and I can run a larger batch size, but the accuracy of the trained model has decreased, and it is unclear where the problem is. |
bsz: 158
num_dev: 6
gpu0_bsz: 1
bsz_unit: 31
chunk_sizes: [1, 32, 32, 31, 31, 31]
len(inputs): 6
self.device_ids[:len(inputs)] [0, 1, 2, 3, 4, 5]
replicas: 6
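The `chunk_sizes` in the log above are consistent with a simple split rule: give GPU 0 its fixed share, divide the rest evenly, and hand the remainder out one sample at a time starting from GPU 1. Below is a minimal pure-Python sketch of that arithmetic. The function name `balanced_chunk_sizes` and the fallback branch are assumptions made for illustration; this is not necessarily the repository's exact code.

```python
def balanced_chunk_sizes(bsz, num_dev, gpu0_bsz):
    """Split a batch of size bsz across num_dev devices, giving device 0 only gpu0_bsz samples.

    Hypothetical helper that mirrors the values printed in the log above.
    """
    if num_dev < 2:
        raise ValueError("need at least two devices to rebalance")

    # Samples per remaining device, rounded down.
    bsz_unit = (bsz - gpu0_bsz) // (num_dev - 1)

    if gpu0_bsz >= bsz_unit:
        # Assumption: nothing to rebalance, so fall back to a plain even split.
        # The real DataParallel fallback may chunk slightly differently.
        base, extra = divmod(bsz, num_dev)
        return [base + 1 if i < extra else base for i in range(num_dev)]

    chunk_sizes = [gpu0_bsz] + [bsz_unit] * (num_dev - 1)

    # Hand out the leftover samples one by one, starting from device 1.
    delta = bsz - sum(chunk_sizes)
    for i in range(delta):
        chunk_sizes[i + 1] += 1

    return chunk_sizes


print(balanced_chunk_sizes(bsz=158, num_dev=6, gpu0_bsz=1))
# -> [1, 32, 32, 31, 31, 31]
```

With `bsz=158`, `num_dev=6`, and `gpu0_bsz=1`, `bsz_unit` is `157 // 5 = 31`, which leaves two samples over; those go to GPUs 1 and 2, matching the logged split.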
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     13347      C   python                                     10505MiB |
|    1     13347      C   python                                      4991MiB |
|    4     13347      C   python                                      4991MiB |
|    5     13347      C   python                                      4925MiB |
|    6     13347      C   python                                      4925MiB |
|    7     13347      C   python                                      4925MiB |
+-----------------------------------------------------------------------------+