Several options with step-by-step instructions and examples are detailed here in the wiki.
Precision, recall, and their harmonic mean, the F1 score, are helpful evaluation functions for most machine learning algorithms. They do not, however, incorporate time in the calculations, and are thus unsuitable for evaluating how well an algorithm performs on real-time, streaming data. A main motivation for NAB was to design a scoring system that incorporates both the timing of detections and the TP, TN, FP, and FN counts.
See also: Precision and Recall, F1 Score
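As a rough illustration of the idea (this is not NAB's actual scoring code, and the weights below are made up), a time-aware score can reward detections that occur early within an anomaly window and penalize ones that arrive after it:

```python
import math

def scaled_sigmoid(y, tp_weight=1.0, fp_weight=0.11):
    """Toy time-aware weight for a single detection.

    y is the detection's position relative to the end of the anomaly
    window: negative means inside the window (earlier is more negative),
    positive means after the window has closed.
    """
    return (tp_weight + fp_weight) / (1.0 + math.exp(5.0 * y)) - fp_weight

# A detection at the start of a window (y = -1) earns nearly full credit,
# one right at the window's end (y = 0) earns partial credit, and a late
# detection well after the window (y = 3) converges to the FP penalty.
print(scaled_sigmoid(-1.0), scaled_sigmoid(0.0), scaled_sigmoid(3.0))
```

Plain precision and recall would treat every one of these detections identically, which is exactly the limitation described above.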
NAB contains real data where the anomalies are known; when these data files are contributed to NAB, we are told the exact data points that were anomalous. An example is data/realKnownCause/ambient_temperature_system_failure.csv, where the contributor knows the temperature system encountered anomalies at "2013-12-22 20:00:00" and "2014-04-13 09:00:00". The other real data files in NAB are labeled by hand, following a [defined set of procedures and rules](Anomaly Labeling Instructions), so that anomalous data are labeled accurately and consistently.
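For example, the contributed timestamps above end up keyed by the data file's relative path in the combined ground-truth labels file (the path and exact formatting here are assumptions; check the labels directory in the repo):

```python
import json

# Assumed location and layout of the combined ground-truth labels file.
with open("labels/combined_labels.json") as f:
    labels = json.load(f)

print(labels["realKnownCause/ambient_temperature_system_failure.csv"])
# e.g. ["2013-12-22 20:00:00", "2014-04-13 09:00:00"]
```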
It is okay for the metrics' counts to vary across application profiles. For a given detector under test (DUT), the optimization step calculates the best threshold (i.e. the anomaly-likelihood value above which a data point is flagged as anomalous) for each application profile, where the best threshold is the one that maximizes the profile's score. Consider the application profile "Rewards Low FP Rate": its optimal threshold will likely be higher than that of the other profiles, because a higher threshold makes the DUT output fewer detections, which typically results in fewer FPs.
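Conceptually, the optimization step behaves like the sweep sketched below (a simplified stand-in, not NAB's optimizer; `score_fn` and `candidates` are hypothetical):

```python
def best_threshold(anomaly_likelihoods, score_fn, candidates):
    """Return the candidate threshold that maximizes the profile's score.

    score_fn applies the application profile's TP/FP/FN weighting to the
    boolean detections produced by a given threshold.
    """
    best, best_score = None, float("-inf")
    for t in candidates:
        detections = [likelihood >= t for likelihood in anomaly_likelihoods]
        s = score_fn(detections)
        if s > best_score:
            best, best_score = t, s
    return best, best_score
```

Under a profile that penalizes FPs heavily, this sweep naturally settles on a higher threshold, as described above.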
No, but there needs to be a corresponding entry in /config/thresholds.json for the detector, where the values are null.
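A sketch of what such a placeholder entry might look like, assuming a detector named `myDetector` and key names mirroring the existing entries in the file (verify the exact structure against /config/thresholds.json before relying on it):

```python
import json

with open("config/thresholds.json") as f:
    thresholds = json.load(f)

# Placeholder entry; Python's None serializes to JSON null, and the
# optimization step fills in the real values later.
thresholds["myDetector"] = {
    "standard": {"threshold": None, "score": None},
    "reward_low_FP_rate": {"threshold": None, "score": None},
    "reward_low_FN_rate": {"threshold": None, "score": None},
}

with open("config/thresholds.json", "w") as f:
    json.dump(thresholds, f, indent=2, sort_keys=True)
```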
The NAB steps recurse through the data and results directories, so keeping any extra files there will cause errors. Also be sure to follow the naming conventions exactly!
I ran the detection step and my results files are different than those checked in. Did I do something wrong?
Try running the scoring and normalization steps, which write additional columns to these files containing the scores for each record. That is most likely the source of the difference.
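For instance, assuming the run.py entry point and flags described in the repo README, the scoring and normalization phases can be re-run for a single detector (here `numenta`, as an example):

```python
import subprocess

# Re-run only the scoring and normalization phases; these append the
# per-profile score columns to the results files.
subprocess.run(
    ["python", "run.py", "-d", "numenta", "--score", "--normalize"],
    check=True,
)
```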
Yes, but nothing that changes the resulting NAB scores. We want v1.0 to be static; a changing benchmark isn't very valuable because scores aren't comparable. However, in preparation for v2.0, we are accepting data and algorithms for future inclusion in NAB.
Yes, which is why we only add data (or make code changes affecting the scores) with new NAB versions. This way, scores from a common NAB version are comparable. There is no timetable yet for v2.0.
We combat overfitting by including diverse data and anomaly types in the dataset. And because the NAB dataset is designed to evaluate generalized anomaly detectors, it would be difficult to tune parameters specifically to the dataset. That said, this is one reason we would love more data from the community!
Take a look at the README in the data directory.