
Multi-GPU training cannot match single-GPU training #17

Open · hxu105 opened this issue Sep 10, 2024 · 5 comments

@hxu105 commented Sep 10, 2024

Howdy,

I have tried to reproduce the experiments and encountered some issues with the training setup. I chose "lmsys/vicuna-7b-v1.5-16k" as my base model, integrated it with the ND or HO template (a 2-layer MLP as the adapter), and trained the model on the Cora dataset for 1 epoch. However, single-GPU training is substantially better than multi-GPU training. The results are in the table below.

| Template | Single GPU | Multi-GPU |
| --- | --- | --- |
| ND | 0.7265 | 0.6368 |
| HO | 0.7618 | 0.6985 |
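For concreteness, here is a minimal sketch of what I mean by the 2-layer MLP adapter (the dimensions are placeholders I picked for illustration, not the repo's actual values):

```python
import torch
import torch.nn as nn

# Minimal sketch of a 2-layer MLP adapter that projects graph-token
# embeddings into the LLM's input embedding space.
# graph_dim, hidden_dim, and llm_dim are placeholder assumptions.
class MLPAdapter(nn.Module):
    def __init__(self, graph_dim=1024, hidden_dim=4096, llm_dim=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(graph_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, llm_dim),
        )

    def forward(self, graph_tokens: torch.Tensor) -> torch.Tensor:
        # graph_tokens: (batch, num_graph_tokens, graph_dim)
        return self.proj(graph_tokens)
```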

All other training configurations are identical between the single- and multi-GPU settings: lr = 2e-3, per-device batch size = 12 on a single GPU, and per-device batch size = 4 on 3 GPUs.
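To make the arithmetic explicit (assuming gradient_accumulation_steps is 1 in both runs, which I have not verified from the configs):

```python
# Effective batch size per optimizer step = per_device_batch * num_gpus * grad_accum.
# grad_accum = 1 is an assumption here, not a value confirmed from the configs.
single_gpu = 12 * 1 * 1
multi_gpu = 4 * 3 * 1
assert single_gpu == multi_gpu == 12  # both setups should see 12 samples per step
```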

Could you help take a look at it?

Many thanks,

HX

@ChenRunjin (Collaborator)

In my previous experiments, I usually obtained very similar results when training on single or multiple GPUs. However, when training on the Cora dataset, I typically trained for 3-5 epochs (you can set the dataset name to cora.3). I suspect the issue may arise because the model hasn't fully converged when you only train for 1 epoch on smaller datasets, though I'm not entirely certain about this.

@hxu105 (Author) commented Sep 16, 2024

I see. However, the PubMed dataset suffers from the same problem: the model can reach more than 90% accuracy when trained on a single GPU, but only around 85% when trained on multiple GPUs. What number of epochs would you suggest for the multi-GPU setting? Many thanks!

@ChenRunjin (Collaborator)

Hi, it's a bit strange—I haven't encountered this issue before. On my end, the PubMed dataset can achieve 95% accuracy on node classification with just 1 epoch, but it takes about 5 epochs for link prediction to reach top performance. In my case, the performance between multi-GPU and single-GPU setups is nearly the same. Have you experienced similar issues in other DeepSpeed experiments?
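One thing that might be worth double-checking is whether the DeepSpeed batch settings are really equivalent between the two runs, since DeepSpeed requires train_batch_size = train_micro_batch_size_per_gpu × gradient_accumulation_steps × world_size. A quick sanity check along these lines (the config filename and world size are placeholders for your actual setup):

```python
import json

# Sanity-check a DeepSpeed config against the intended effective batch size.
# "ds_config.json" and world_size = 3 are placeholders for the actual setup.
with open("ds_config.json") as f:
    cfg = json.load(f)

world_size = 3
micro = cfg["train_micro_batch_size_per_gpu"]
accum = cfg.get("gradient_accumulation_steps", 1)
print("effective batch size:", micro * accum * world_size)
print("declared train_batch_size:", cfg.get("train_batch_size"))
```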

@hxu105 (Author) commented Sep 17, 2024

Got it! I was referring to the link prediction task. So for link prediction, does the model usually need more epochs to converge to a good optimum? Does it also need more epochs for larger datasets like Products and Arxiv? Many thanks!

@ChenRunjin (Collaborator)

For the link prediction task, if you are training on a single small dataset, I recommend 8 epochs for Cora and 5 epochs for PubMed. For larger datasets like Arxiv and Products, 1 epoch is sufficient. If you are training on multiple datasets, the model can leverage information across datasets, so I would suggest using the combination arxiv-products-pubmed-cora.3.
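In case the naming convention is unclear, here is a sketch of how I understand the dataset spec (illustrative only, not the repo's actual parser): "-" joins datasets, and a ".N" suffix repeats that dataset N times in the training mix, which is how small graphs like Cora get extra passes:

```python
# Sketch of the dataset-spec convention (illustrative, not the repo's parser):
# "-" joins datasets; a ".N" suffix repeats that dataset N times.
def parse_dataset_spec(spec: str) -> list[str]:
    mix = []
    for part in spec.split("-"):
        name, _, rep = part.partition(".")
        mix.extend([name] * (int(rep) if rep else 1))
    return mix

print(parse_dataset_spec("arxiv-products-pubmed-cora.3"))
# -> ['arxiv', 'products', 'pubmed', 'cora', 'cora', 'cora']
```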
