Add anomaly detection support to TensorboardLogger #854
Closed
This pull request was exported from Phabricator. Differential Revision: D58593222
Summary:

### This Stack
Based on [this RFC](https://docs.google.com/document/d/1K1KQ886dynMRejR0ySH1fctOjS7gxaCS8AB1L_PHxU4/edit?usp=sharing), we are adding a new logger that warns about anomalous values in metrics, and optionally executes a callback function with potential side effects. This could help users realize sooner that something has gone wrong during training.

### This Diff
To provide flexibility when detecting anomalous metric values, instead of hardcoding a predefined check (like a threshold), let's create an interface that can be overridden to implement custom checks.

Differential Revision: D58564201
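As a rough illustration of the "interface that can be overridden" idea, here is a minimal sketch. The names `AnomalyEvaluatorSketch`, `is_anomaly`, and `NegativeValue` are hypothetical and chosen for this example only; the actual interface added in the diff may differ.

```python
from abc import ABC, abstractmethod


class AnomalyEvaluatorSketch(ABC):
    """Hypothetical interface: each subclass decides what counts as anomalous."""

    @abstractmethod
    def is_anomaly(self, value: float) -> bool:
        """Return True if the given metric value should be flagged."""


class NegativeValue(AnomalyEvaluatorSketch):
    """Example custom check (an assumption, not part of the diff): flag negatives."""

    def is_anomaly(self, value: float) -> bool:
        return value < 0.0
```

Because the check lives behind a single method, a caller can hold any evaluator without knowing which rule it applies.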
Summary:

### This Stack
Based on [this RFC](https://docs.google.com/document/d/1K1KQ886dynMRejR0ySH1fctOjS7gxaCS8AB1L_PHxU4/edit?usp=sharing), we are adding a new logger that warns about anomalous values in metrics, and optionally executes a callback function with potential side effects. This could help users realize sooner that something has gone wrong during training.

### This Diff
To get started with anomaly detection, let's first define two evaluators:
- Threshold is the most intuitive one, and checks that a metric value is within a predefined range.
- IsNaN would be useful to quickly catch cases where the loss is NaN because of bad inputs.

Later on we can implement more interesting evaluators like outliers, changepoint detection, etc. if needed.

Differential Revision: D58564199
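The two evaluators described above could look roughly like the following sketch. The class names come from the summary, but the constructor arguments (`lower`, `upper`), the `is_anomaly` method, and the inclusive bounds are assumptions made for illustration.

```python
import math
from typing import Optional


class Threshold:
    """Sketch: flag values outside [lower, upper]; either bound may be omitted."""

    def __init__(self, lower: Optional[float] = None, upper: Optional[float] = None) -> None:
        if lower is None and upper is None:
            raise ValueError("Provide at least one of lower or upper.")
        self.lower = lower
        self.upper = upper

    def is_anomaly(self, value: float) -> bool:
        below = self.lower is not None and value < self.lower
        above = self.upper is not None and value > self.upper
        return below or above


class IsNaN:
    """Sketch: flag NaN values, which a range check alone would miss."""

    def is_anomaly(self, value: float) -> bool:
        return math.isnan(value)
```

Note that every ordering comparison against NaN is false, so in this sketch a NaN passes the range check unflagged. That is one reason a separate NaN check is useful.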
Summary:

### This Stack
Based on [this RFC](https://docs.google.com/document/d/1K1KQ886dynMRejR0ySH1fctOjS7gxaCS8AB1L_PHxU4/edit?usp=sharing), we are adding a new logger that warns about anomalous values in metrics, and optionally executes a callback function with potential side effects. This could help users realize sooner that something has gone wrong during training.

### This Diff
After implementing the evaluators, let's add the `AnomalyLogger` class that receives a configuration of metrics to check. If an anomaly is detected, it will call an optional `on_anomaly_detected` method that can be overridden by the user.

The next diffs will add this to our `AIXLogger` and `TensorboardLogger` as a base class.

Differential Revision: D58564200
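The flow described above (check tracked metrics, warn, then invoke an overridable hook) can be sketched as follows. Only `on_anomaly_detected` and `tracked_metrics` are names taken from the summaries; the class names, the `log` signature, and the use of plain callables as checks are assumptions for this example.

```python
import logging
import math
from typing import Callable, Dict, List, Optional, Tuple

_logger = logging.getLogger(__name__)


class AnomalyLoggerSketch:
    """Hypothetical stand-in: checks tracked metrics and fires a hook on anomalies."""

    def __init__(self, tracked_metrics: Optional[Dict[str, Callable[[float], bool]]] = None) -> None:
        # Maps a metric name to a check that returns True for anomalous values.
        self._tracked = dict(tracked_metrics or {})

    def log(self, name: str, value: float, step: int) -> None:
        check = self._tracked.get(name)
        if check is not None and check(value):
            _logger.warning("Anomalous value %s for metric %s at step %d", value, name, step)
            self.on_anomaly_detected(name, value, step)

    def on_anomaly_detected(self, name: str, value: float, step: int) -> None:
        """No-op by default; subclasses override this to add side effects."""


class RecordingLogger(AnomalyLoggerSketch):
    """Example override: remember every anomaly instead of ignoring it."""

    def __init__(self, tracked_metrics: Optional[Dict[str, Callable[[float], bool]]] = None) -> None:
        super().__init__(tracked_metrics)
        self.events: List[Tuple[str, float, int]] = []

    def on_anomaly_detected(self, name: str, value: float, step: int) -> None:
        self.events.append((name, value, step))
```

Keeping the hook a no-op by default means users who only want the warning do not need to subclass anything.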
diego-urgell force-pushed the export-D58593222 branch from b416344 to de3aed7 on June 25, 2024 23:21
diego-urgell added a commit to diego-urgell/tnt that referenced this pull request on Jun 25, 2024
Summary:

Pull Request resolved: pytorch#854

### This Stack
Based on [this RFC](https://docs.google.com/document/d/1K1KQ886dynMRejR0ySH1fctOjS7gxaCS8AB1L_PHxU4/edit?usp=sharing), we are adding a new logger that warns about anomalous values in metrics, and optionally executes a callback function with potential side effects. This could help users realize sooner that something has gone wrong during training.

### This Diff
To start leveraging the AnomalyLogger as easily as possible, let's make it the base class for the Tensorboard logger instead of MetricLogger. This will have no effect unless users specify the `tracked_metrics` attribute, which is optional. However, if they do want to use it, they have to make very few changes.

The next diff will do the same for the AIXLogger.

Reviewed By: JKSenthil

Differential Revision: D58593222
diego-urgell force-pushed the export-D58593222 branch from de3aed7 to 30c7b56 on June 26, 2024 00:26
Summary:
This Stack
Based on this RFC, we are adding a new logger that warns about anomalous values in metrics, and optionally executes a callback function with potential side effects. This could be useful for users to realize sooner that something has gone wrong during training.
This Diff
To start leveraging the AnomalyLogger as easily as possible, let's make it the base class for the Tensorboard logger instead of MetricLogger. This will have no effect unless users specify the `tracked_metrics` attribute, which is optional. However, if they do want to use it, they have to make very few changes.
The next diff will do the same for the AIXLogger.
Reviewed By: JKSenthil
Differential Revision: D58593222
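To make the base-class swap concrete, here is a dependency-free sketch. The classes `AnomalyLoggerSketch` and `TensorboardLoggerSketch` are hypothetical stand-ins, the `log` signature is assumed, and an in-memory list replaces the real Tensorboard writer so the example runs without extra packages. Only `tracked_metrics` is a name taken from the summary.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Check = Callable[[float], bool]


class AnomalyLoggerSketch:
    """Hypothetical base class: flags tracked metrics, does nothing otherwise."""

    def __init__(self, tracked_metrics: Optional[Dict[str, Check]] = None) -> None:
        self._tracked = dict(tracked_metrics or {})
        self.anomalies: List[Tuple[str, float, int]] = []

    def log(self, name: str, value: float, step: int) -> None:
        check = self._tracked.get(name)
        if check is not None and check(value):
            self.anomalies.append((name, value, step))


class TensorboardLoggerSketch(AnomalyLoggerSketch):
    """Stand-in for the Tensorboard logger; records scalars in a list."""

    def __init__(self, path: str, tracked_metrics: Optional[Dict[str, Check]] = None) -> None:
        super().__init__(tracked_metrics)
        self.path = path
        self.written: List[Tuple[str, float, int]] = []  # assumption: replaces the real writer

    def log(self, name: str, value: float, step: int) -> None:
        super().log(name, value, step)  # anomaly check first
        self.written.append((name, value, step))  # then the usual scalar write


# Without tracked_metrics, behavior matches a plain logger.
plain = TensorboardLoggerSketch("/tmp/run")
plain.log("loss", float("nan"), 0)

# Opting in only requires passing the optional argument.
tracked = TensorboardLoggerSketch("/tmp/run", tracked_metrics={"loss": math.isnan})
tracked.log("loss", float("nan"), 0)
```

In this sketch, existing callers that never pass `tracked_metrics` see no change, which mirrors the "no effect unless specified" behavior described above.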