Added saving of best checkpoint during early stopping #1859

Open
wants to merge 4 commits into base: master
Conversation

sugeeth14

With reference to issue #1856 and the pull request discussion in #1858: the best checkpoint will now not be removed when early stopping happens.

@francoishernandez
Member

Not sure why you apply this logic only when early stopping is triggered.
What happens if you have -keep_checkpoint smaller than the -early_stopping tolerance? You could still remove your best checkpoint.
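
To make the concern concrete, here is a minimal hypothetical illustration, assuming checkpoints are rotated through a fixed-size queue of keep_checkpoint entries (the numbers are made up for the example):

```python
from collections import deque

# Hypothetical failure mode: keep_checkpoint=2, early_stopping tolerance=4.
# The best validation happened at step 10000, but by the time early stopping
# fires at step 14000, rotation has already dropped that checkpoint.
keep_checkpoint = 2
saved = deque(maxlen=keep_checkpoint)
for step in (10000, 11000, 12000, 13000, 14000):  # ppl keeps worsening after 10000
    saved.append(step)
print(saved)  # deque([13000, 14000], maxlen=2) -- step 10000 (the best) is gone
```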

2. Updated logic to add config containing info about best checkpoint
3. Fixed case when keep_checkpoint < early_stopping.
@sugeeth14
Author

sugeeth14 commented Sep 3, 2020

This is the best approach I could think of:

  1. In trainer.py, a validation is now forced at every save_checkpoint_steps.
  2. While saving the model, if the save is triggered by early stopping, the best step is also passed along; otherwise only the current step's validation perplexity is sent.
  3. In model_saver.py the following conditions are checked:
    a. it is the first checkpoint save, or
    b. the current step is better than the previous best, or
    c. the incoming best_step is not None (i.e. the save was triggered by early stopping) and it is still the best. (Since step != best_step in this case, the current step cannot be taken as the best: early stopping's last step need not contain the best checkpoint unless it was triggered by stalling.)
    Otherwise the already existing best checkpoint from the config is kept.
  4. The best checkpoint is never removed during checkpoint removal.
  5. best_ckpt_config.json always contains the best step and perplexity, even when early_stopping is not set when training starts.

Sorry if this is getting long; it is the best I could come up with to cover all cases. A minimal sketch of the intended model_saver.py logic is shown below. Suggestions welcome. Thanks
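
To make the flow easier to follow, here is a minimal stand-alone sketch of the logic described above. This is not the actual OpenNMT-py ModelSaver code; the class name, method signatures, and the exact contents of best_ckpt_config.json are assumptions made for illustration only.

```python
import json
import os
from collections import deque


class BestAwareSaver:
    """Toy sketch: keep at most `keep_checkpoint` checkpoints, but never
    delete the checkpoint recorded as best in best_ckpt_config.json."""

    def __init__(self, base_path, keep_checkpoint):
        self.base_path = base_path
        self.keep_checkpoint = keep_checkpoint
        self.queue = deque()
        self.config_path = os.path.join(base_path, "best_ckpt_config.json")

    def _read_best(self):
        if os.path.exists(self.config_path):
            with open(self.config_path) as f:
                return json.load(f)  # e.g. {"best_step": 10000, "best_ppl": 4.2}
        return {"best_step": None, "best_ppl": float("inf")}

    def _write_best(self, best):
        with open(self.config_path, "w") as f:
            json.dump(best, f)

    def save(self, step, valid_ppl, best_step=None):
        ckpt_path = os.path.join(self.base_path, "model_step_%d.pt" % step)
        # ... the actual model state would be written to ckpt_path here ...

        best = self._read_best()
        # a. first save, or b. current validation beats the previous best:
        #    the current step becomes the recorded best.
        if best["best_step"] is None or valid_ppl < best["best_ppl"]:
            self._write_best({"best_step": step, "best_ppl": valid_ppl})
        # c. early stopping passed a best_step that is still the recorded best:
        #    leave the config as it is (the current step is not the best).

        # Rotate old checkpoints, skipping the one recorded as best (point 4).
        self.queue.append((step, ckpt_path))
        best = self._read_best()
        while len(self.queue) > self.keep_checkpoint:
            old_step, old_path = self.queue.popleft()
            if old_step == best["best_step"]:
                continue  # never delete the best checkpoint
            if os.path.exists(old_path):
                os.remove(old_path)
        return ckpt_path
```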

@sugeeth14
Author

@francoishernandez any update on this?
