
[Feature] performance problem #193

Open
Xiang-cd opened this issue Apr 15, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@Xiang-cd

Is your feature request related to a problem? Please describe.

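I really appreciate your work! I have a small question: I noticed the throughput and GPU memory usage table in the README, where BMTrain significantly outperforms DeepSpeed-Megatron. I'm curious where these gains mainly come from. Along the same lines, why can BMTrain support larger batch sizes and reach higher throughput? Does the GPU configuration also play a role (for example, would SXM machines, with their higher interconnect bandwidth, erase this gap)?
I think achieving this level of optimization is definitely work worthy of a top-tier systems conference. Would you be interested in analyzing the optimizations behind it and writing them up as a paper?
I would really like to use the BMTrain framework, but right now I only see that it performs well without understanding why, and that makes me hesitant.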

Describe the solution you'd like

Same as above.

Describe alternatives you've considered

No response

Additional context

No response

@Xiang-cd Xiang-cd added the enhancement New feature or request label Apr 15, 2024
@Pegessi

Pegessi commented May 24, 2024

I believe this work is remarkable for combining memory optimization with parallelism, and it delivers higher throughput. However, one weakness is that the Megatron-DeepSpeed baseline in these experiments does not use Megatron-DeepSpeed with PTD-P or the memory optimizations in Megatron-LM. Furthermore, the default ZeRO-3 configuration is inefficient because CPU offloading is enabled even on a small number of GPUs, which has been discussed in several blog posts.
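As an illustration of what a fairer ZeRO-3 baseline might look like, here is a minimal sketch of a DeepSpeed config with CPU offloading explicitly disabled for both parameters and optimizer states. The batch size, bf16 setting, and `overlap_comm` choice are placeholder assumptions, not values taken from the BMTrain README or paper.

```python
import json

# Hypothetical ZeRO-3 baseline config: everything stays on the GPU.
# Batch size and precision are placeholders, not values from the README.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        # Disable CPU offloading of parameters and optimizer states.
        "offload_param": {"device": "none"},
        "offload_optimizer": {"device": "none"},
        # Overlap gradient reduction with backward computation.
        "overlap_comm": True,
    },
}

print(json.dumps(ds_config, indent=2))
```

Such a dict can be passed to `deepspeed.initialize(..., config=ds_config)`; comparing throughput with and without offloading on the same hardware would show how much of the gap is due to the baseline configuration.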
