
Numenta Anomaly Benchmark FAQ

How can I test my own anomaly detection algorithm?

Several options with step-by-step instructions and examples are detailed here in the wiki.

Why do the NAB scores not correlate with precision, recall, and F1 score?

Precision, recall, and their harmonic mean, the F1 score, are helpful evaluation functions for most machine learning algorithms. They do not, however, incorporate time into the calculation, and are thus unsuitable for evaluating how well an algorithm performs on real-time, streaming data. A main motivation for NAB was to design a scoring system that incorporates time along with the TP, TN, FP, and FN counts.

See also: Precision and Recall, F1 Score
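
For reference, here is a minimal sketch (illustrative only, not NAB code) showing that precision, recall, and F1 are computed purely from counts: two detectors with identical TP/FP/FN counts receive identical F1 scores, no matter how early or late each detection occurred within an anomaly window.

```python
# Illustrative sketch, not part of NAB: the classic metrics from raw counts.
# Nothing here accounts for *when* a detection happened relative to the true
# anomaly, which is what NAB's window-based scoring adds.

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# An early detector and a very late detector with the same counts score the same.
print(precision_recall_f1(tp=8, fp=2, fn=4))
```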

What's the difference between hand-labeled data and data with known-cause anomalies?

NAB contains real data where the anomalies are known; when these data files are contributed to NAB we are told the exact data points that were anomalies. An example is data/realKnownCause/ambient_temperature_system_failure.csv, where the contributor knows the temperature system encountered anomalies at "2013-12-22 20:00:00" and "2014-04-13 09:00:00". The other real data files in NAB are labeled by hand, following a [defined set of procedures and rules](Anomaly Labeling Instructions) to ensure the anomalous data is labeled accurately.

The TP, TN, FP, and FN have different values between different profiles. Why?

It is expected that these counts vary across application profiles. For a given detector under test (DUT), the optimization step calculates the best threshold -- i.e. the anomaly likelihood value above which a data point is flagged as anomalous -- for each application profile, where the best threshold is the one that maximizes the score. Consider, for example, the application profile "Rewards Low FP Rate". The optimal threshold for this profile will likely be higher than for the other profiles, because a higher threshold means the DUT outputs fewer detections, which typically results in fewer FPs.
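
To illustrate (a simplified sketch under assumed names, not NAB's actual optimizer), threshold selection amounts to sweeping candidate thresholds and keeping, for each profile, the one that maximizes that profile's score:

```python
# Simplified sketch of per-profile threshold selection; `score_fn` is an
# assumed stand-in for scoring a detector's output under one application
# profile's cost weights, not part of NAB's API.

def best_threshold(anomaly_likelihoods, score_fn, candidates):
    """Return the candidate threshold maximizing this profile's score."""
    best_t, best_score = None, float("-inf")
    for t in candidates:
        detections = [likelihood >= t for likelihood in anomaly_likelihoods]
        score = score_fn(detections)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Each profile weights FPs and FNs differently, so each generally ends up
# with a different optimal threshold -- and therefore different TP/TN/FP/FN counts.
```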

Do I need to optimize the threshold(s) before scoring a detector under test?

No, but there needs to be a corresponding entry in /config/thresholds.json for the detector, where the values are null.
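
For example, an entry for a hypothetical detector could look like the sketch below; copy the exact profile and field names from an existing detector's entry in the checked-in thresholds.json.

```json
{
  "myDetector": {
    "standard": {"threshold": null, "score": null},
    "reward_low_FP_rate": {"threshold": null, "score": null},
    "reward_low_FN_rate": {"threshold": null, "score": null}
  }
}
```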

What is this error I keep getting with path and/or file names?

The NAB steps recurse through the data and results directories, so any extra files kept there will cause errors. Also be sure to follow the file and directory naming conventions exactly!
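
As an illustration (the detector name here is hypothetical; check the checked-in results for the exact convention), results files mirror the data directory layout and prefix the data file name with the detector name:

```
data/realKnownCause/ambient_temperature_system_failure.csv
results/myDetector/realKnownCause/myDetector_ambient_temperature_system_failure.csv
```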

I ran the detection step and my results files are different than those checked in. Did I do something wrong?

Try running the scoring and normalization steps, which write additional columns to these files with the scores for each record. This is likely the difference.
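
One quick way to confirm this (an illustrative sketch; file paths are placeholders) is to compare your output with the checked-in file on their shared columns only:

```python
# Illustrative check, not part of NAB: compare two results files on the
# columns they share, ignoring the score columns appended by the
# scoring/normalization steps. File paths are placeholders.
import pandas as pd

mine = pd.read_csv("my_detection_results.csv")
checked_in = pd.read_csv("checked_in_results.csv")

shared = [c for c in mine.columns if c in checked_in.columns]
pd.testing.assert_frame_equal(mine[shared], checked_in[shared], check_exact=False)
```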

Are you accepting pull requests?

Yes, but nothing that changes the resulting NAB scores. We want v1.0 to be static; a changing benchmark isn't very valuable because scores aren't comparable. However, in preparation for v2.0, we are accepting data and algorithms for future inclusion in NAB.

If you add data to NAB, won't the scores change?

Yes, which is why we only add data (or make code changes that affect the scores) with new NAB versions. This way, scores obtained with the same NAB version are comparable. There is no timetable yet for v2.0.

How do you prevent overfitting? Can't I specifically tune my algorithm to perform well on NAB?

We combat overfitting by including diverse data and anomaly types in the dataset. Because the NAB dataset is designed to evaluate generalized anomaly detectors, it would be difficult to tune an algorithm's parameters specifically to the dataset. That said, this is one reason why we would love more data from the community!

What types of data are included in NAB?

Take a look at the README in the data directory.