[EASY] Always log 1st batch when resuming training #3009

Merged: 2 commits merged on Feb 14, 2024

bigning commented Feb 13, 2024

What does this PR do?

Composer prints console logs for the 1st batch and then every self.log_interval batches. When resuming training, however, we probably want to print the 1st batch after resumption regardless of whether cur_batch % self.log_interval == 0. This PR forces console logging for the 1st batch after resumption.
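As a rough sketch of the condition described above, the old and new checks could be compared as follows. This is not Composer's actual code: the function names and the `has_logged` flag are hypothetical and only illustrate the idea.

```python
def should_log_before(cur_batch: int, log_interval: int) -> bool:
    """Old rule: log batch 1 and every multiple of the interval."""
    return cur_batch == 1 or cur_batch % log_interval == 0


def should_log_after(cur_batch: int, log_interval: int, has_logged: bool) -> bool:
    """New rule: log the first batch seen in this run and every multiple of the interval.

    `has_logged` is a hypothetical flag that stays False until this
    process has printed its first console log line.
    """
    return not has_logged or cur_batch % log_interval == 0
```

With an interval of 7 and a run resumed at batch 4, `should_log_before(4, 7)` is False, while `should_log_after(4, 7, has_logged=False)` is True.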

Test

  1. Save a checkpoint after training for 3 batches.
  2. Load the checkpoint and resume with console_log_interval=7ba:

     composer train/train.py train/yamls/pretrain/mpt-125m.yaml train_loader.dataset.split=train_small max_duration=10ba eval_interval=0 console_log_interval=7ba load_path=./125m/checkpoints/ep0-ba3-rank0.pt

Before the change:

The first logged batch is 7.

After the change:

The first logged batch is 4, then 7.
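The test scenario can be reproduced without a training run by simulating which batch numbers would be printed. This is a minimal sketch under stated assumptions: `simulate_logged_batches` and its parameters are hypothetical names, and the loop only models the logging decision, not Composer's logger.

```python
from typing import List


def simulate_logged_batches(resume_from: int, max_batches: int,
                            log_interval: int, force_first: bool) -> List[int]:
    """Return the batch numbers that would be printed to the console."""
    logged: List[int] = []
    # Training continues with the batch after the checkpointed one.
    for cur_batch in range(resume_from + 1, max_batches + 1):
        on_interval = cur_batch % log_interval == 0
        # Fresh runs logged batch 1 even before the change.
        is_first_ever = cur_batch == 1
        # Hypothetical stand-in for the fix: nothing logged yet in this run.
        is_first_in_run = force_first and not logged
        if on_interval or is_first_ever or is_first_in_run:
            logged.append(cur_batch)
    return logged


# Checkpoint at batch 3, max_duration=10ba, console_log_interval=7ba.
before = simulate_logged_batches(3, 10, 7, force_first=False)
after = simulate_logged_batches(3, 10, 7, force_first=True)
print(before)  # [7]
print(after)   # [4, 7]
```

A fresh run (`resume_from=0`) yields the same batches with or without the flag, so in this sketch the change only affects resumed runs.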

bigning requested a review from a team as a code owner February 13, 2024 22:55

mvpatel2000 left a comment

Should we remove cur_batch==1 in this case?

Can we please add the same for def epoch_end?

bigning commented Feb 14, 2024

> Should we remove cur_batch==1 in this case?
>
> Can we please add the same for def epoch_end?

good catch! updated.

mvpatel2000 left a comment

LGTM! Thanks for the fix.

bigning merged commit 9e60fa3 into dev on Feb 14, 2024
14 checks passed
bigning deleted the fix-logging branch February 14, 2024 17:33