
Add support to run MPT-1b training on Habana device (HPU) using DeepSpeed. #527

Merged 4 commits into mosaicml:habana_alpha on Aug 22, 2023

Conversation

vivekgoe

This PR adds support for running MPT-1b training on a Habana device using the Habana fork of the DeepSpeed library. The main changes are:

  • Add new yaml configuration files for MPT-1b training on HPU devices (Gaudi, Gaudi2).
  • Update train.py to parse a DeepSpeed configuration provided in the yaml and, if one is present, invoke the trainer with it (see the sketch below).
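For context, here is a minimal sketch of the train.py change, assuming the run yaml is loaded with OmegaConf as elsewhere in llm-foundry and that Composer's Trainer receives the DeepSpeed settings as a plain dict. The `deepspeed_config` key name and the helper function are illustrative, not the PR's exact diff:

```python
# Sketch: read an optional `deepspeed_config` block from the run yaml
# and hand it to Composer's Trainer. Key name and helper are illustrative.
from typing import Any, Dict, Optional

from omegaconf import DictConfig
from omegaconf import OmegaConf as om


def extract_deepspeed_config(cfg: DictConfig) -> Optional[Dict[str, Any]]:
    """Return the DeepSpeed section of the yaml as a plain dict, or None."""
    ds_cfg = cfg.get('deepspeed_config', None)
    if ds_cfg is None:
        return None
    # The trainer expects a resolved Python dict, not an OmegaConf node.
    return om.to_container(ds_cfg, resolve=True)


# Usage inside train.py (trainer construction elided):
#   trainer = Trainer(..., deepspeed_config=extract_deepspeed_config(cfg))
#   A value of None leaves DeepSpeed disabled, so runs without the yaml
#   block behave exactly as before.
```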

@abhi-mosaic @bandish-shah please help review.

@bandish-shah
Contributor

@vivekgoe thanks so much for this PR! I've tagged @dakinggg to review.

@vivekgoe vivekgoe marked this pull request as ready for review August 20, 2023 03:35
@abhi-mosaic abhi-mosaic changed the base branch from main to habana_alpha August 22, 2023 00:36
@abhi-mosaic
Copy link
Contributor

Hi @vivekgoe! Thank you for this PR showcasing MPT training on HPU. For the moment, we would like to keep this in a separate branch, habana_alpha, and I will merge this PR there immediately. That way, we on the MosaicML side can continue testing and adding features without conflicting with the main branch.

@abhi-mosaic abhi-mosaic merged commit b0c097f into mosaicml:habana_alpha Aug 22, 2023
5 of 7 checks passed
@vivekgoe
Author

@abhi-mosaic Sounds good. Thank you.

hlahkar pushed a commit to hlahkar/llm-foundry that referenced this pull request May 16, 2024
hlahkar pushed a commit to hlahkar/llm-foundry that referenced this pull request Jul 5, 2024