
Enable Sequence Parallelism #429

Merged (16 commits) on Sep 4, 2024

Conversation

@polisettyvarma

No description provided.

@tjruwase requested review from samadejacobs and removed review requests for arashb, duli2012, tjruwase, awan-10, GuanhuaWang and eltonzheng on July 26, 2024 14:01
@polisettyvarma
Author

@samadejacobs @tjruwase can you please review this so we can proceed further?

@polisettyvarma
Author

@samadejacobs @tjruwase please review this.

@polisettyvarma
Author

@tjruwase @loadams can someone review this?

megatron/model/utils.py (review thread, outdated and resolved)
@polisettyvarma
Author

@tjruwase Thanks for the review; please check my replies to your comments.

@polisettyvarma
Author

@tjruwase I missed your reply, sorry for the late response. Please check my comment.

@polisettyvarma
Author

@tjruwase please review now

@polisettyvarma
Author

@tjruwase it's approved but not merged yet; any reason?

@tjruwase merged commit 0d6e379 into microsoft:main on Sep 4, 2024
5 checks passed
@polisettyvarma deleted the sequence_parallelism branch on September 4, 2024 10:56
@polisettyvarma
Author

@tjruwase Thanks for merging. I have a query regarding HPU-specific changes, such as creating custom bash run scripts for HPU under the examples_deepspeed/hpu folder. Is that okay?

@tjruwase

tjruwase commented Sep 4, 2024

@polisettyvarma, yes that seems reasonable.

@ys950902

ys950902 commented Sep 24, 2024

Hi @polisettyvarma, this PR causes an init error for the RMSNorm torch implementation, as shown below:
[rank0]: self.input_layernorm = RMSNorm(config.hidden_size, config.layernorm_epsilon,
[rank0]: TypeError: RMSNorm.__init__() got an unexpected keyword argument 'sequence_parallel'

I have raised #448 to fix this; is that okay with you?
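For readers hitting the same error, here is a minimal sketch of one way the torch RMSNorm implementation could accept the new keyword, following the Megatron convention of tagging the norm weight with a `sequence_parallel` attribute. This is an illustrative assumption, not the actual patch in #448; the class layout and default epsilon are likewise assumed.

```python
# Illustrative sketch only, NOT the actual fix in #448.
# Assumes tagging the weight with a `sequence_parallel` attribute (as Megatron
# does for LayerNorm weights) is the intended behavior for the torch-path RMSNorm.
import torch
from torch import nn


class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-5, sequence_parallel: bool = False):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))
        # Mark the parameter so its gradient is all-reduced across the
        # sequence-parallel group instead of being treated as replicated.
        setattr(self.weight, "sequence_parallel", sequence_parallel)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # RMSNorm: scale by the reciprocal root-mean-square of the last dim.
        norm = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return norm * self.weight
```

With a signature like this, the call site shown in the traceback would construct the layer without error whether or not sequence parallelism is enabled.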
