New Features
Added an option is_caching_logits to DistillationConfig. If is_caching_logits is True, the distiller caches the batches and the teacher model's output logits, so the teacher logits are calculated only once, which speeds up distillation. This feature is only available for BasicDistiller and MultiTeacherDistiller. Be cautious about setting it to True on large datasets, since the batches and logits are stored in memory.
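A minimal sketch of enabling the logit cache, assuming a standard TextBrewer setup; the toy models, adaptor, and the temperature value here are illustrative placeholders, not part of this release:

```python
import torch.nn as nn
from textbrewer import BasicDistiller, DistillationConfig, TrainingConfig

# Toy teacher/student models and a minimal adaptor, used only to make the
# sketch self-contained; replace them with real models and adaptors.
teacher_model = nn.Linear(10, 2)
student_model = nn.Linear(10, 2)

def simple_adaptor(batch, model_outputs):
    # Assumes the model returns raw logits.
    return {"logits": model_outputs}

# is_caching_logits=True caches the batches and the teacher's output logits
# in memory, so the teacher forward pass runs only once per batch.
distill_config = DistillationConfig(
    temperature=4,
    is_caching_logits=True,
)
train_config = TrainingConfig(device="cpu")

distiller = BasicDistiller(
    train_config=train_config,
    distill_config=distill_config,
    model_T=teacher_model,
    model_S=student_model,
    adaptor_T=simple_adaptor,
    adaptor_S=simple_adaptor,
)
```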
Improvements
Added a new argument max_grad_norm to the distillers' train method. It sets the maximum norm for gradient clipping. The default is -1, i.e., no gradient clipping (see the sketch after the next item).
Added new arguments scheduler_class and scheduler_args to the distillers' train method. The old scheduler argument may cause convergence problems and is deprecated in favor of scheduler_class and scheduler_args. See the documentation for details.
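A sketch of passing both new train arguments together, assuming the distiller, optimizer, and dataloader from a standard setup already exist and a transformers-style scheduler class is used; the warmup ratio and clipping norm are illustrative:

```python
from transformers import get_linear_schedule_with_warmup

num_epochs = 3
num_training_steps = len(dataloader) * num_epochs

distiller.train(
    optimizer=optimizer,
    dataloader=dataloader,
    num_epochs=num_epochs,
    # Clip gradients to an L2 norm of 1.0; the default -1 means no clipping.
    max_grad_norm=1.0,
    # Pass the scheduler class plus its constructor arguments (the optimizer
    # itself is supplied by the distiller) instead of a pre-built scheduler.
    scheduler_class=get_linear_schedule_with_warmup,
    scheduler_args={
        "num_warmup_steps": int(0.1 * num_training_steps),
        "num_training_steps": num_training_steps,
    },
)
```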
Removed the print in display_parameters. It no longer prints the statistics directly to the screen.