diff --git a/docs/paper.md b/docs/paper.md
index 7be0ebca..0d479ea1 100644
--- a/docs/paper.md
+++ b/docs/paper.md
@@ -94,16 +94,18 @@ In order to meet the needs of researchers, `scores` provides the following key b
 
 **Compatibility**
 
-- Highly modular and avoids extensive dependencies by providing its own implementations where relevant.
-- Easy to integrate and use in a wide variety of environments. It has been tested and used on workstations, servers and in high performance computing (supercomputing) environments.
+- Highly modular: provides its own implementations, avoids extensive dependencies and offers a consistent API.
+- Easy to integrate and use in a wide variety of environments. It has been used on workstations, servers and in high performance computing (supercomputing) environments.
+- Maintains 100% automated test coverage.
 - Uses Dask [@Dask:2016] for scaling and performance.
 - Expanding support for `pandas` [@pandas:2024; @McKinney:2010].
 
+
 ## Metrics, Statistical Techniques and Data Processing Tools Included in `scores`
 
 At the time of writing, `scores` includes over 50 metrics, statistical techniques and data processing tools. For an up to date list, please see the `scores` documentation.
 
-We anticipate more metrics, tools and statistical techniques will be added over time.
+The ongoing development roadmap includes the addition of more metrics, tools, and statistical tests.
 
 Table: A **Curated Selection** of the Metrics, Tools and Statistical Tests Currently Included in `scores`
 
@@ -113,7 +115,7 @@ Table: A **Curated Selection** of the Metrics, Tools and Statistical Tests Curre
 | | **Probability** |Scores for evaluating forecasts that are expressed as predictive distributions, ensembles, and probabilities of binary events. |Brier Score [@BRIER_1950], Continuous Ranked Probability Score (CRPS) for Cumulative Distribution Functions (CDFs) (including threshold-weighting, see @Gneiting:2011), CRPS for ensembles [@Gneiting_2007; @Ferro_2013], Receiver Operating Characteristic (ROC), Isotonic Regression (reliability diagrams) [@dimitriadis2021stable]. |
-| **Categorical** |Scores for evaluating forecasts based on categories. |Probability of Detection (POD), Probability of False Detection (POFD), False Alarm Ratio (FAR), Success Ratio, Accuracy, Peirce's Skill Score [@Peirce:1884], Critical Success Index (CSI), Gilbert Skill Score [@gilbert:1884], Heidke Skill Score, Odds Ratio, Odds Ratio Skill Score, F1 Score, FIxed Risk Multicategorical (FIRM) Score [@Taggart:2022a].
+| **Categorical** |Scores for evaluating forecasts based on categories. |Probability of Detection (POD), Probability of False Detection (POFD), False Alarm Ratio (FAR), Success Ratio, Accuracy, Peirce's Skill Score [@Peirce:1884], Critical Success Index (CSI), Gilbert Skill Score [@gilbert:1884], Heidke Skill Score, Odds Ratio, Odds Ratio Skill Score, F1 Score, FIxed Risk Multicategorical (FIRM) Score [@Taggart:2022a]. |
 | **Statistical Tests** |Tools to conduct statistical tests and generate confidence intervals. | Diebold-Mariano [@Diebold:1995] with both the @Harvey:1997 and @Hering:2011 modifications. |
 
@@ -121,23 +123,23 @@ Table: A **Curated Selection** of the Metrics, Tools and Statistical Tests Curre
 
 ## Use in Academic Work
 
-In 2015, the Australian Bureau of Meteorology began developing a new verification system called Jive. For a description of Jive see @loveday2024jive. The Jive verification metrics have been used to support several publications [@Griffiths:2017; @Foley:2020; @Taggart:2022d; @Taggart:2022b; @Taggart:2022c]. `scores` has arisen from the Jive verification system and was created to modularise the Jive verification functions and make them available as an open source package.
+In 2015, the Australian Bureau of Meteorology began developing a new verification system called Jive, which became operational in 2022. For a description of Jive see @loveday2024jive. The Jive verification metrics have been used to support several publications [@Griffiths:2017; @Foley:2020; @Taggart:2022d; @Taggart:2022b; @Taggart:2022c]. `scores` has arisen from the Jive verification system and was created to modularise the Jive verification functions and make them available as an open source package. `scores` has been used to explore user-focused approaches to evaluating probabilistic and categorical forecasts [@Loveday2024ts].
 
 ## Related Software Packages
 
-There are multiple open source verification packages in a range of languages. Below is a comparison of `scores` to other open source Python verification packages. None of these include all of the functions implemented within `scores`, and vice versa.
-
-`xskillscore` [@xskillscore] provides many of the same functions as `scores`. The Jupyter Notebook tutorials in `scores` cover a wider array of metrics.
+There are multiple open source verification packages in a range of languages. Below is a comparison of `scores` to other open source Python verification packages. None of these include all of the metrics implemented in `scores` (and vice versa).
+
+`xskillscore` [@xskillscore] provides many but not all of the same functions as `scores` and does not have direct support for `pandas`. The Jupyter Notebook tutorials in `scores` cover a wider array of metrics.
 
 `climpred` [@Brady:2021] utilises `xskillscore` combined with data handling functionality, and is focused on ensemble forecasts for climate and weather. `climpred` makes some design choices related to data structure (specifically associated with climate modelling) which may not generalise effectively to broader use cases. Releasing `scores` separately allows the differing design philosophies to be considered by the community.
-`METplus` [@Brown:2021] is widely used by weather and climate model developers. `METplus` includes a database and a visualisation system, with Python and shell script wrappers to utilise the complex `MET` package. Verification scores in `MET` are implemented in C++ rather than Python.
+`METplus` [@Brown:2021] is a verification system used by weather and climate model developers. `METplus` includes a database and a visualisation system, with Python and shell script wrappers to utilise the `MET` package for the calculation of scores. `MET` is implemented in C++ rather than Python. `METplus` is used as a system rather than providing a modular Python API.
 
-`Verif` [@nipen2023verif] is a command line tool for generating verification plots and is utilised very differently to `scores`.
+`Verif` [@nipen2023verif] is a command line tool for generating verification plots, whereas `scores` provides a Python API for generating numerical scores.
 
-`Pysteps` [@gmd-12-4185-2019; @Imhoff:2023] is a package for short-term ensemble prediction systems, and includes a significant verification submodule with many useful verification scores. As `Pysteps` includes functionality well beyond verification, it is not as modular.
+`Pysteps` [@gmd-12-4185-2019; @Imhoff:2023] is a package for short-term ensemble prediction systems, and includes a significant verification submodule with many useful verification scores. `Pysteps` does not provide a standalone verification API.
 
 `PyForecastTools` [@Morley:2020] is a Python package for model and forecast validation which supports `dmarray` rather than `xarray` data structures and does not include Jupyter Notebook tutorials.
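As context for the metrics table in this patch, the simplest metric it names, the Brier score [@BRIER_1950], can be sketched in a few lines of plain Python. This is an illustration of the metric's definition only; it deliberately does not use the `scores` API, whose function names and signatures are not reproduced here.

```python
def brier_score(forecast_probs, observed):
    """Mean squared difference between forecast probabilities (values in
    [0, 1]) and binary observed outcomes (0 = did not occur, 1 = occurred).
    Lower is better; a perfect forecast scores 0."""
    if len(forecast_probs) != len(observed):
        raise ValueError("forecast and observation sequences must be the same length")
    return sum((p - o) ** 2 for p, o in zip(forecast_probs, observed)) / len(observed)

# Four probability-of-rain forecasts and whether rain occurred (1) or not (0).
print(brier_score([0.9, 0.1, 0.7, 0.2], [1, 0, 1, 0]))  # ≈ 0.0375
```

In practice, packages such as `scores` compute this kind of metric over `xarray` data structures (optionally Dask-backed) rather than Python lists, which is what enables the scaling described in the key-benefits list above.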