
Add logger for anomaly detection #852

Closed
wants to merge 3 commits

Commits on Jun 25, 2024

  1. Add base AnomalyEvaluator class

    Summary:
    ### This Stack
    
    Based on [this RFC](https://docs.google.com/document/d/1K1KQ886dynMRejR0ySH1fctOjS7gxaCS8AB1L_PHxU4/edit?usp=sharing), we are adding a new logger that warns about anomalous metric values and optionally executes a callback function with potential side effects. This helps users realize sooner that something has gone wrong during training.
    
    ### This Diff
    
    To provide flexibility when detecting anomalous metric values, rather than assuming and hardcoding a predefined check (such as a threshold), let's create an interface that can be overridden to implement custom checks.
    
    Differential Revision: D58564201
    Diego Urgell authored and facebook-github-bot committed Jun 25, 2024
    SHA: 2ef31b8
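The evaluator interface described in this commit could be sketched as follows. This is a minimal illustration: the class and method names are assumptions based on the commit summary, not necessarily the actual torchtnt API.

```python
from abc import ABC, abstractmethod
from typing import Union

class AnomalyEvaluator(ABC):
    """Interface for checks that flag anomalous metric values.

    Subclasses override `is_anomaly` to implement a custom check,
    e.g. a threshold test or a NaN test.
    """

    @abstractmethod
    def is_anomaly(self, value: Union[int, float]) -> bool:
        """Return True if the metric value should be treated as anomalous."""
        ...

# Hypothetical custom check, showing how a user would extend the interface:
class GreaterThanOne(AnomalyEvaluator):
    def is_anomaly(self, value: Union[int, float]) -> bool:
        return value > 1.0
```

Because the base class only fixes the `is_anomaly` contract, any check that maps a metric value to a boolean can be plugged in without changes to the caller.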
  2. Implement starter anomaly evaluators

    Summary:
    ### This Stack
    
    Based on [this RFC](https://docs.google.com/document/d/1K1KQ886dynMRejR0ySH1fctOjS7gxaCS8AB1L_PHxU4/edit?usp=sharing), we are adding a new logger that warns about anomalous metric values and optionally executes a callback function with potential side effects. This helps users realize sooner that something has gone wrong during training.
    
    ### This Diff
    
    To get started with anomaly detection, let's first define two evaluators:
    - Threshold is the most intuitive: it checks that a metric value falls within a predefined range.
    - IsNaN is useful to quickly catch cases where the loss becomes NaN because of bad inputs.
    
    Later on we can implement more interesting evaluators like outliers, changepoint detection, etc. if needed.
    
    Differential Revision: D58564199
    Diego Urgell authored and facebook-github-bot committed Jun 25, 2024
    SHA: c382602
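The two starter evaluators described above might look like this. The sketch is self-contained, so it restates a minimal version of the interface; class and parameter names are illustrative, not the exact ones in the diff.

```python
import math
from abc import ABC, abstractmethod
from typing import Optional, Union

class AnomalyEvaluator(ABC):
    @abstractmethod
    def is_anomaly(self, value: Union[int, float]) -> bool: ...

class ThresholdEvaluator(AnomalyEvaluator):
    """Flags metric values outside a predefined [min_val, max_val] range."""

    def __init__(
        self,
        min_val: Optional[float] = None,
        max_val: Optional[float] = None,
    ) -> None:
        self.min_val = min_val
        self.max_val = max_val

    def is_anomaly(self, value: Union[int, float]) -> bool:
        if self.min_val is not None and value < self.min_val:
            return True
        if self.max_val is not None and value > self.max_val:
            return True
        return False

class IsNaNEvaluator(AnomalyEvaluator):
    """Flags NaN values, e.g. a loss that turned NaN due to bad inputs."""

    def is_anomaly(self, value: Union[int, float]) -> bool:
        return math.isnan(value)
```

Both bounds on the threshold are optional, so the same class covers one-sided checks such as "loss must stay below 10" without a separate evaluator.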

Commits on Jun 26, 2024

  1. Add logger for anomaly detection (pytorch#852)

    Summary:
    Pull Request resolved: pytorch#852
    
    ### This Stack
    
    Based on [this RFC](https://docs.google.com/document/d/1K1KQ886dynMRejR0ySH1fctOjS7gxaCS8AB1L_PHxU4/edit?usp=sharing), we are adding a new logger that warns about anomalous metric values and optionally executes a callback function with potential side effects. This helps users realize sooner that something has gone wrong during training.
    
    ### This Diff
    
    After implementing the evaluators, let's add the `AnomalyLogger` class, which receives a configuration of metrics to check. If an anomaly is detected, it calls an optional `on_anomaly_detected` method that can be overridden by the user.
    
    Next diffs will add this as a base class of our `AIXLogger` and `TensorboardLogger`.
    
    Reviewed By: JKSenthil
    
    Differential Revision: D58564200
    diego-urgell authored and facebook-github-bot committed Jun 26, 2024
    SHA: 4643f94
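A minimal sketch of the `AnomalyLogger` flow described above: the `on_anomaly_detected` hook name comes from the commit message, while the constructor signature, the `log` method, and the warning format are assumptions for illustration.

```python
import warnings
from abc import ABC, abstractmethod
from typing import Dict, Union

class AnomalyEvaluator(ABC):
    @abstractmethod
    def is_anomaly(self, value: Union[int, float]) -> bool: ...

class IsNaNEvaluator(AnomalyEvaluator):
    def is_anomaly(self, value: Union[int, float]) -> bool:
        return value != value  # NaN is the only value not equal to itself

class AnomalyLogger:
    """Warns when a configured metric evaluates as anomalous and invokes
    an optional user hook with potential side effects."""

    def __init__(self, evaluators: Dict[str, AnomalyEvaluator]) -> None:
        # Maps metric name -> evaluator to run against its values.
        self._evaluators = evaluators

    def log(self, name: str, value: float, step: int) -> None:
        evaluator = self._evaluators.get(name)
        if evaluator is not None and evaluator.is_anomaly(value):
            warnings.warn(
                f"Anomaly detected in metric '{name}' at step {step}: {value}"
            )
            self.on_anomaly_detected(name, value, step)

    def on_anomaly_detected(self, name: str, value: float, step: int) -> None:
        """Optional hook; override for side effects such as alerting
        or stopping training. Default is a no-op."""
        pass
```

A user who only wants the warning can use the class as-is; overriding `on_anomaly_detected` is the extension point for side effects, which is why the default implementation does nothing.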