
Paper alternative for consideration following discussions (#480)
* Respond to discussions / review
* Respond to grammar on API consistency
tennlee committed Jun 6, 2024
1 parent 054c56e commit 34c6ce6
Showing 1 changed file with 13 additions and 11 deletions.
24 changes: 13 additions & 11 deletions docs/paper.md
@@ -94,16 +94,18 @@ In order to meet the needs of researchers, `scores` provides the following key b

**Compatibility**

- Highly modular and avoids extensive dependencies by providing its own implementations where relevant.
- Easy to integrate and use in a wide variety of environments. It has been tested and used on workstations, servers and in high performance computing (supercomputing) environments.
- Highly modular - provides its own implementations, avoids extensive dependencies and offers a consistent API.
- Easy to integrate and use in a wide variety of environments. It has been used on workstations, servers and in high performance computing (supercomputing) environments.
- Maintains 100% automated test coverage.
- Uses Dask [@Dask:2016] for scaling and performance.
- Expanding support for `pandas` [@pandas:2024; @McKinney:2010].
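The modular, array-oriented call pattern described above can be illustrated with a minimal, self-contained sketch. The function below computes a mean squared error over paired forecast and observation values; it mirrors the style of metric functions in `scores` but is an illustration only and does not assume the package's actual API or signatures.

```python
# Illustrative sketch of a modular error metric; not the `scores`
# package's implementation.

def mse(forecast, observed):
    """Mean squared error over paired forecast/observation values."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must be the same length")
    return sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(forecast)

# Squared errors are 0.25, 0.0, 1.0, so the mean is 1.25/3
print(mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
```

In the package itself, metric functions of this kind operate on labelled `xarray` data structures, which is what allows them to be composed with Dask for scaling.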


## Metrics, Statistical Techniques and Data Processing Tools Included in `scores`

At the time of writing, `scores` includes over 50 metrics, statistical techniques and data processing tools. For an up-to-date list, please see the `scores` documentation.

We anticipate more metrics, tools and statistical techniques will be added over time.
The ongoing development roadmap includes the addition of more metrics, tools, and statistical tests.

Table: A **Curated Selection** of the Metrics, Tools and Statistical Tests Currently Included in `scores`

@@ -113,31 +115,31 @@ Table: A **Curated Selection** of the Metrics, Tools and Statistical Tests Currently Included in `scores`
|
| **Probability** |Scores for evaluating forecasts that are expressed as predictive distributions, ensembles, and probabilities of binary events. |Brier Score [@BRIER_1950], Continuous Ranked Probability Score (CRPS) for Cumulative Distribution Functions (CDFs) (including threshold-weighting, see @Gneiting:2011), CRPS for ensembles [@Gneiting_2007; @Ferro_2013], Receiver Operating Characteristic (ROC), Isotonic Regression (reliability diagrams) [@dimitriadis2021stable].
|
| **Categorical** |Scores for evaluating forecasts based on categories. |Probability of Detection (POD), Probability of False Detection (POFD), False Alarm Ratio (FAR), Success Ratio, Accuracy, Peirce's Skill Score [@Peirce:1884], Critical Success Index (CSI), Gilbert Skill Score [@gilbert:1884], Heidke Skill Score, Odds Ratio, Odds Ratio Skill Score, F1 Score, FIxed Risk Multicategorical (FIRM) Score [@Taggart:2022a].
|
| **Statistical Tests** |Tools to conduct statistical tests and generate confidence intervals. | Diebold-Mariano [@Diebold:1995] with both the @Harvey:1997 and @Hering:2011 modifications.
|
| **Processing Tools** |Tools to pre-process data. |Data matching, discretisation, cumulative distribution function manipulation. |
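Several of the categorical scores in the table are defined from the standard 2x2 contingency table of hits, misses and false alarms. The following sketch shows those textbook definitions directly; it is illustrative only and is not the `scores` package's implementation.

```python
# Illustrative calculations of three categorical scores from the
# standard 2x2 contingency-table definitions (hits, misses, false
# alarms). Not the `scores` package's implementation.

def contingency_scores(hits, misses, false_alarms):
    pod = hits / (hits + misses)                # Probability of Detection
    far = false_alarms / (hits + false_alarms)  # False Alarm Ratio
    csi = hits / (hits + misses + false_alarms) # Critical Success Index
    return pod, far, csi

pod, far, csi = contingency_scores(hits=40, misses=10, false_alarms=20)
print(pod, far, csi)  # POD 0.8, FAR ~ 0.333, CSI ~ 0.571
```

The same counts underpin many of the other categorical scores listed (e.g. CSI is also known as the threat score), which is why `scores` can offer them through a consistent API.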

## Use in Academic Work

In 2015, the Australian Bureau of Meteorology began developing a new verification system called Jive. For a description of Jive see @loveday2024jive. The Jive verification metrics have been used to support several publications [@Griffiths:2017; @Foley:2020; @Taggart:2022d; @Taggart:2022b; @Taggart:2022c]. `scores` has arisen from the Jive verification system and was created to modularise the Jive verification functions and make them available as an open source package.
In 2015, the Australian Bureau of Meteorology began developing a new verification system called Jive, which became operational in 2022. For a description of Jive see @loveday2024jive. The Jive verification metrics have been used to support several publications [@Griffiths:2017; @Foley:2020; @Taggart:2022d; @Taggart:2022b; @Taggart:2022c]. `scores` has arisen from the Jive verification system and was created to modularise the Jive verification functions and make them available as an open source package.

`scores` has been used to explore user-focused approaches to evaluating probabilistic and categorical forecasts [@Loveday2024ts].

## Related Software Packages

There are multiple open source verification packages in a range of languages. Below is a comparison of `scores` to other open source Python verification packages. None of these include all of the functions implemented within `scores`, and vice versa.

`xskillscore` [@xskillscore] provides many of the same functions as `scores`. The Jupyter Notebook tutorials in `scores` cover a wider array of metrics.
There are multiple open source verification packages in a range of languages. Below is a comparison of `scores` to other open source Python verification packages. None of these include all of the metrics implemented in `scores` (and vice versa).
`xskillscore` [@xskillscore] provides many but not all of the same functions as `scores` and does not have direct support for pandas. The Jupyter Notebook tutorials in `scores` cover a wider array of metrics.

`climpred` [@Brady:2021] utilises `xskillscore` combined with data handling functionality, and is focused on ensemble forecasts for climate and weather. `climpred` makes some design choices related to data structure (specifically associated with climate modelling) which may not generalise effectively to broader use cases. Releasing `scores` separately allows the differing design philosophies to be considered by the community.

`METplus` [@Brown:2021] is widely used by weather and climate model developers. `METplus` includes a database and a visualisation system, with Python and shell script wrappers to utilise the complex `MET` package. Verification scores in `MET` are implemented in C++ rather than Python.
`METplus` [@Brown:2021] is a verification system used by weather and climate model developers. `METplus` includes a database and a visualisation system, with Python and shell script wrappers to utilise the `MET` package for the calculation of scores. `MET` is implemented in C++ rather than Python. `METplus` is used as a system rather than providing a modular Python API.

`Verif` [@nipen2023verif] is a command line tool for generating verification plots and is utilised very differently to `scores`.
`Verif` [@nipen2023verif] is a command line tool for generating verification plots whereas `scores` provides a Python API for generating numerical scores.

`Pysteps` [@gmd-12-4185-2019; @Imhoff:2023] is a package for short-term ensemble prediction systems, and includes a significant verification submodule with many useful verification scores. As `Pysteps` includes functionality well beyond verification, it is not as modular.
`Pysteps` [@gmd-12-4185-2019; @Imhoff:2023] is a package for short-term ensemble prediction systems, and includes a significant verification submodule with many useful verification scores. `Pysteps` does not provide a standalone verification API.

`PyForecastTools` [@Morley:2020] is a Python package for model and forecast validation which supports `dmarray` rather than `xarray` data structures and does not include Jupyter Notebook tutorials.

