Nick updates to paper #398

Merged
merged 9 commits into from
May 19, 2024
17 changes: 9 additions & 8 deletions docs/paper.bib
@@ -180,13 +180,13 @@ @misc{Taggart:2022d
year = {2022},
note = {Accessed on September 9, 2023}
}
@misc{loveday2023userfocused,
title={A User-Focused Approach to Evaluating Probabilistic and Categorical Forecasts},
author={Nicholas Loveday and Robert Taggart and Mohammadreza Khanarmuei},
year={2023},
eprint={2311.18258},
archivePrefix={arXiv},
primaryClass={stat.AP}
@article{loveday2024user,
title={A User-Focused Approach to Evaluating Probabilistic and Categorical Forecasts},
author={Loveday, Nicholas and Taggart, Robert and Khanarmuei, Mohammadreza},
journal={Weather and Forecasting},
year={2024},
publisher={American Meteorological Society},
doi={10.1175/WAF-D-23-0201.1}
}
@article{nipen2023verif,
title={Verif: A weather-prediction verification tool for effective product development},
@@ -227,7 +227,8 @@ @misc{loveday2024jive
year={2024},
eprint={2404.18429},
archivePrefix={arXiv},
primaryClass={physics.ao-ph}
primaryClass={physics.ao-ph},
doi={10.48550/arXiv.2404.18429}
}
@article{Ferro_2013,
title={Fair scores for ensemble forecasts: Fair Scores for Ensemble Forecasts},
35 changes: 16 additions & 19 deletions docs/paper.md
@@ -1,12 +1,13 @@
---
title: 'scores: A Python package for verifying accuracy using xarray'
title: 'scores: A Python package for evaluating and verifying forecasts using xarray'
tags:
- Python
- geoscience
- verification
- science
- earth system science
- statistics
- modelling
- geoscience
- earth system science

authors:
- name: Tennessee Leeuwenburg
orcid: 0009-0008-2024-1967
@@ -17,18 +18,18 @@ authors:
affiliations:
- name: Bureau of Meteorology, Australia
index: 1
date: 1 December 2023
date: 17 May 2024
bibliography: paper.bib

---

# Summary

`scores` is a Python package containing mathematical functions for the verification, evaluation and optimisation of forecasts, predictions or models. It primarily supports the meteorological, climatological and geoscience communities. In addition to supporting the Earth system science communities, it also has wide potential application in machine learning and other domains.
`scores` is a Python package containing mathematical functions for the verification, evaluation and optimisation of forecasts, predictions or models. It primarily supports the geoscience communities; in particular, the meteorological, climatological and oceanographic communities. In addition to supporting the Earth system science communities, it also has wide potential application in machine learning and other domains such as economics.

`scores` not only includes common scores (e.g. MAE, RMSE), it also includes novel scores not commonly found elsewhere (e.g. FIRM, Flip-Flop Index), complex scores (e.g. threshold weighted CRPS), and statistical tests (such as the Diebold Mariano test). It also contains isotonic regression which is becoming an increasingly important tool in forecast verification and can be used to generate stable reliability diagrams. Additionally, it provides pre-processing tools for preparing data for scores in a variety of formats including cumulative distribution functions (CDF). At the time of writing, `scores` includes over 50 metrics, statistical techniques and data processing tools.
`scores` not only includes common scores (e.g. Mean Absolute Error), it also includes novel scores not commonly found elsewhere (e.g. FIxed Risk Multicategorical (FIRM) score, Flip-Flop Index), complex scores (e.g. threshold-weighted continuous ranked probability score), and statistical tests (such as the Diebold-Mariano test). It also contains isotonic regression, which is becoming an increasingly important tool in forecast verification and can be used to generate stable reliability diagrams. Additionally, it provides pre-processing tools for preparing data for scores in a variety of formats including cumulative distribution functions (CDF). At the time of writing, `scores` includes over 50 metrics, statistical techniques and data processing tools.

All of the scores and statistical techniques in this package have undergone a thorough scientific review. Every score has a companion Jupyter Notebook tutorial that demonstrates its use in practice.
All of the scores and statistical techniques in this package have undergone a thorough scientific and software review. Every score has a companion Jupyter Notebook tutorial that demonstrates its use in practice.

`scores` is focused on supporting xarray [@Hoyer:2017] datatypes for earth system data. It also aims to be compatible with pandas and geopandas, and to work with NetCDF4, hdf5, Zarr and GRIB data sources among others. Scores is designed to utilise Dask for scaling and performance.

@@ -44,13 +45,13 @@ In order to meet the needs of researchers, `scores`:

- is designed to work with n-dimensional data (e.g., geospatial, vertical and temporal dimensions) for both point-based and gridded data. `scores` can effectively handle the dimensionality, data size and data structures commonly utilised for:
- gridded earth system data (e.g. Numerical Weather Prediction models)
- tabular, point, lat/lon or site-based data (e.g. forecasts for specific locations).
- tabular, point, latitude/longitude or site-based data (e.g. forecasts for specific locations).
- is designed to handle missing data, masking of data and weighting of results.
- includes a companion Jupyter Notebook for each score, metric and statistical test to demonstrate its use in practice.
- includes a companion Jupyter Notebook tutorial for each metric and statistical test that demonstrates its use in practice.
- includes novel scores not commonly found elsewhere (e.g. FIRM [@Taggart:2022a], Flip-Flop Index [@Griffiths:2019; @griffiths2021circular]).
- is highly modular and avoids extensive dependencies by providing its own implementations where relevant.
- is intended to be easy to integrate and utilise in a wide variety of environments. It has been tested and used on workstations, servers and in high performance computing (supercomputing) environments.
- utilises Dask for scaling and performance.
- is intended to be easy to integrate and use in a wide variety of environments. It has been tested and used on workstations, servers and in high performance computing (supercomputing) environments.
- uses Dask for scaling and performance.
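
To make the points about missing data and weighting concrete, here is a minimal sketch in plain Python. It is not the `scores` API: the function name `weighted_mae` and its arguments are hypothetical, and it works on flat lists rather than xarray objects. It only illustrates the idea that missing forecast/observation pairs are left out of the calculation and that the remaining errors can be weighted.

```python
import math


def weighted_mae(forecasts, observations, weights=None):
    """Mean absolute error that skips missing pairs and applies optional weights.

    Hypothetical helper for illustration only; not part of the scores package.
    """
    if weights is None:
        weights = [1.0] * len(forecasts)

    total_error = 0.0
    total_weight = 0.0
    for fcst, obs, weight in zip(forecasts, observations, weights):
        # Treat NaN in either input as missing and leave the pair out.
        if math.isnan(fcst) or math.isnan(obs):
            continue
        total_error += weight * abs(fcst - obs)
        total_weight += weight

    # If every pair was missing (or all weights were zero), the score is undefined.
    if total_weight == 0:
        return math.nan
    return total_error / total_weight


forecasts = [1.0, 2.0, math.nan]
observations = [2.0, 4.0, 1.0]

unweighted = weighted_mae(forecasts, observations)
weighted = weighted_mae(forecasts, observations, weights=[3.0, 1.0, 1.0])
```

In this sketch the third pair is dropped because the forecast is missing, so the unweighted result averages the two remaining absolute errors, and the weighted result gives the first pair three times the influence of the second.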

## Metrics, Statistical Techniques and Data Processing Tools Included in Scores

@@ -66,19 +67,15 @@ Here is a **curated selection** of the metrics, tools and statistical tests curr
| **[Probability](https://scores.readthedocs.io/en/latest/included.html#probability)** |Scores for evaluating forecasts that are expressed as predictive distributions, ensembles, and probabilities of binary events. |Brier Score, Continuous Ranked Probability Score (CRPS) for Cumulative Distribution Functions (CDFs) (including threshold-weighting, see [@Gneiting:2011]), CRPS for ensembles [@Gneiting_2007; @Ferro_2013], Receiver Operating Characteristic (ROC), Isotonic Regression (reliability diagrams) [@dimitriadis2021stable]. |
| **[Categorical](https://scores.readthedocs.io/en/latest/included.html#categorical)** |Scores for evaluating forecasts based on categories. |Probability of Detection (POD), False Alarm Rate (FAR), Probability of False Detection (POFD), Success Ratio, Accuracy, Peirce's Skill Score, Critical Success Index (CSI), Gilbert Skill Score, Heidke Skill Score, Odds Ratio, Odds Ratio Skill Score, F1 score, FIxed Risk Multicategorical (FIRM) Score [@Taggart:2022a]. |
| **[Statistical Tests](https://scores.readthedocs.io/en/latest/included.html#statistical-tests)** |Tools to conduct statistical tests and generate confidence intervals. | Diebold-Mariano [@Diebold:1995] with both the [@Harvey:1997] and [@Hering:2011] modifications. |
| **[Processing tools](https://scores.readthedocs.io/en/latest/included.html#processing-tools-for-preparing-data)** |Tools to pre-process data. |Data matching, Discretization, Cumulative Density Function Manipulation. |
| **[Processing Tools](https://scores.readthedocs.io/en/latest/included.html#processing-tools-for-preparing-data)** |Tools to pre-process data. |Data matching, Discretisation, Cumulative Distribution Function Manipulation. |

Additionally, `scores` has an area specifically to hold emerging scores which are still undergoing research and development. This provides a clear mechanism for people to share, access and collaborate on new scores, and be able to easily re-use versioned implementations of those scores.

## Use in Academic Work

In 2015, the Australian Bureau of Meteorology began developing a new verification system called Jive. For a description of Jive see [@loveday2024jive].

The Jive verification metrics have been used to support several publications [@Griffiths:2017; @Foley:2020; @Taggart:2022b; @Taggart:2022c; @Taggart:2022d].

`scores` has arisen from the Jive verification system. `scores` includes mathematical functions from Jive and is intended to modularise these functions and make them available as an open source package.
In 2015, the Australian Bureau of Meteorology began developing a new verification system called Jive. For a description of Jive see [@loveday2024jive]. The Jive verification metrics have been used to support several publications [@Griffiths:2017; @Foley:2020; @Taggart:2022b; @Taggart:2022c; @Taggart:2022d]. `scores` has arisen from the Jive verification system and was created to modularise the Jive verification functions and make them available as an open source package.

`scores` has been used to explore user-focused approaches to evaluating probabilistic and categorical forecasts [@loveday2023userfocused].
`scores` has been used to explore user-focused approaches to evaluating probabilistic and categorical forecasts [@loveday2024user].

## Related Software Packages
