
Save checkpoint to disk for API with new save layout #3399

Merged: 70 commits merged on Jun 21, 2024


@eracah (Contributor) commented Jun 13, 2024

What does this PR do?

  1. This PR introduces an API for saving the common objects in a checkpoint:

    • save_model_to_disk
    • save_optim_to_disk
    • save_composer_metadata_to_disk
    • save_resumption_state_to_disk

     Each of these functions saves its object into its own directory.
  2. Introduces a dataclass, CheckpointSaveOptions, for configuring checkpoint saving.

  3. Introduces a workhorse save-to-disk function, save_checkpoint_to_disk, that handles saving all components of a checkpoint. It uses the API in (1) to save each component into its own separate file.
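The "one component per directory" layout from points 1 to 3 can be illustrated with a minimal, standard-library-only sketch. Everything in it is hypothetical and is not Composer's API: `SaveOptionsSketch` stands in loosely for an options dataclass like CheckpointSaveOptions, `save_component` plays the role of the per-object save functions, and `save_checkpoint_sketch` plays the role of the workhorse function. The component names and flags are assumptions, and JSON replaces whatever tensor serialization the real code uses.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict


@dataclass
class SaveOptionsSketch:
    """Hypothetical options dataclass (not Composer's CheckpointSaveOptions)."""
    destination_dir: str
    save_model: bool = True
    save_optimizer: bool = True
    save_metadata: bool = True
    save_resumption_state: bool = True


def save_component(obj: Dict[str, Any], root: Path, name: str) -> Path:
    """Write one component into its own subdirectory, e.g. <root>/model/model.json."""
    component_dir = root / name
    component_dir.mkdir(parents=True, exist_ok=True)
    path = component_dir / f"{name}.json"
    # JSON stands in for real tensor serialization in this sketch.
    path.write_text(json.dumps(obj))
    return path


def save_checkpoint_sketch(
    components: Dict[str, Dict[str, Any]], options: SaveOptionsSketch
) -> Dict[str, Path]:
    """Hypothetical workhorse: save every enabled component separately."""
    # Map each hypothetical component name to the option that enables it.
    enabled = {
        "model": options.save_model,
        "optim": options.save_optimizer,
        "composer_metadata": options.save_metadata,
        "resumption": options.save_resumption_state,
    }
    root = Path(options.destination_dir)
    written: Dict[str, Path] = {}
    for name, obj in components.items():
        # Unknown or disabled components are skipped rather than saved.
        if enabled.get(name, False):
            written[name] = save_component(obj, root, name)
    return written
```

Keeping each component in its own subdirectory means a loader can read, say, only the model weights without touching optimizer or resumption state, and each per-component function stays small and independently testable.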

Extras:

  • Adds an init_state helper function that initializes a State object for tests.

eracah and others added 30 commits June 4, 2024 16:17
@eracah marked this pull request as ready for review June 18, 2024 23:41
@mvpatel2000 (Contributor) left a comment

LGTM. It would be helpful to have someone else check, though; there are many moving parts here.

composer/checkpoint/save.py (outdated, resolved)
@eracah enabled auto-merge (squash) June 21, 2024 00:26
@eracah merged commit 632eb15 into mosaicml:dev Jun 21, 2024
17 checks passed
mvpatel2000 pushed a commit to mvpatel2000/composer that referenced this pull request Jul 21, 2024

3 participants