have TSS subclass BaseCheckpointer (#631)
Summary:
Pull Request resolved: #631

# Context
We want to add another checkpointer using [DCP](https://pytorch.org/docs/stable/distributed.checkpoint.html). However, we don't want to duplicate the logic that already exists in `TorchSnapshotSaver` for checkpoint frequency, keeping the k latest checkpoints, etc.

# This Diff
Has `TorchSnapshotSaver` subclass `BaseCheckpointer` and implement `_checkpoint_impl` and `restore` accordingly. Removes the utility functions that were moved to `BaseCheckpointer` in the previous diff.

* Cleans up a few tests in `test_torchsnapshot_saver` that are no longer applicable. These were moved to `test_base_checkpointer` in the previous diff, since the functionality they cover (checkpoint frequency, keeping the n latest checkpoints, etc.) now lives in `BaseCheckpointer`.
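
For illustration only, here is a minimal sketch of the pattern this diff describes: a base class owns the save policy (frequency, keeping the latest n checkpoints) and a subclass supplies only the storage-specific `_checkpoint_impl` and `restore`. Everything except those two method names is an assumption made for this sketch (`SketchBaseCheckpointer`, `InMemorySaver`, `save_every_n_steps`, `keep_last_n`, `on_step_end`, `_delete_impl`, and all signatures); it is not the actual `BaseCheckpointer` or `TorchSnapshotSaver` code.

```python
import abc
from typing import Any, Dict, List


class SketchBaseCheckpointer(abc.ABC):
    """Hypothetical base class: owns the save policy, not the storage format."""

    def __init__(self, save_every_n_steps: int, keep_last_n: int) -> None:
        self.save_every_n_steps = save_every_n_steps
        self.keep_last_n = keep_last_n
        self.saved: List[str] = []  # checkpoint ids, oldest first

    def on_step_end(self, step: int, state: Dict[str, Any]) -> None:
        # Shared policy 1: only save on every n-th step.
        if step % self.save_every_n_steps != 0:
            return
        checkpoint_id = f"step_{step}"
        self._checkpoint_impl(checkpoint_id, state)
        self.saved.append(checkpoint_id)
        # Shared policy 2: drop the oldest checkpoints beyond keep_last_n.
        while len(self.saved) > self.keep_last_n:
            self._delete_impl(self.saved.pop(0))

    @abc.abstractmethod
    def _checkpoint_impl(self, checkpoint_id: str, state: Dict[str, Any]) -> None:
        """Write one checkpoint; storage-specific."""

    @abc.abstractmethod
    def _delete_impl(self, checkpoint_id: str) -> None:
        """Remove one checkpoint; storage-specific (assumed hook)."""

    @abc.abstractmethod
    def restore(self, checkpoint_id: str) -> Dict[str, Any]:
        """Load one checkpoint; storage-specific."""


class InMemorySaver(SketchBaseCheckpointer):
    """Hypothetical subclass: stores checkpoints in a dict instead of on disk."""

    def __init__(self, save_every_n_steps: int, keep_last_n: int) -> None:
        super().__init__(save_every_n_steps, keep_last_n)
        self.store: Dict[str, Dict[str, Any]] = {}

    def _checkpoint_impl(self, checkpoint_id: str, state: Dict[str, Any]) -> None:
        self.store[checkpoint_id] = dict(state)  # shallow copy as the "snapshot"

    def _delete_impl(self, checkpoint_id: str) -> None:
        del self.store[checkpoint_id]

    def restore(self, checkpoint_id: str) -> Dict[str, Any]:
        return dict(self.store[checkpoint_id])


saver = InMemorySaver(save_every_n_steps=2, keep_last_n=2)
for step in range(1, 11):
    saver.on_step_end(step, {"step": step})
print(sorted(saver.store))
```

With this split, a second checkpointer (for example one backed by DCP) would only need its own subclass; the frequency and retention logic, and the tests for it, stay with the base class.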

Reviewed By: galrotem

Differential Revision: D51482322

fbshipit-source-id: f12d900c90e53f684e5371fe0b187482d4d84137
JKSenthil authored and facebook-github-bot committed Dec 6, 2023
1 parent e11cfb2 commit 93199e4
Showing 2 changed files with 44 additions and 616 deletions.