clean up docstring formatting and type hints (#16)
* docs: align docstrings with napoleon (google) standards
* refactor: replace relative imports with pkg level imports
* docs: updating contributing notes to include napoleon docstring style
* docs: review amendments to docstrings/types
* docs: update coords_increasing check docstring
* docs: gather dims union types and mse types

---------

Signed-off-by: Aidan Griffiths <[email protected]>
Co-authored-by: agriffit <[email protected]>
aidanjgriffiths and agriffit authored Aug 17, 2023
1 parent 5f62616 commit 68875f1
Showing 8 changed files with 257 additions and 228 deletions.
6 changes: 3 additions & 3 deletions docs/contributing.md
@@ -30,14 +30,14 @@ A new score or metric should be developed on a separate feature branch, rebased
- The implementation of the new metric or score in xarray, ideally with support for pandas and dask
- 100% unit test coverage
- A tutorial notebook showcasing the use of that metric or score, ideally based on the standard sample data
- API documentation (docstrings) which clearly explain the use of the metrics
- API documentation (docstrings) using [Napoleon (google)](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) style, making sure to clearly explain the use of the metrics
- A reference to the paper which described the metrics, added to the API documentation
- For metrics which do not have a paper reference, an online source or reference should be provided
- For metrics which are still under development or which have not yet had an academic publication, they will be placed in a holding area within the API until the method has been properly published and peer reviewed (i.e. `scores.emerging`). The 'emerging' area of the API is subject to rapid change, still of sufficient community interest to include, similar to a 'preprint' of a score or metric.
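For reference, a minimal docstring in the Napoleon (google) style that the guideline above asks for might look like the following; the function itself is a hypothetical example, not part of the `scores` API:

```python
def multiply(a, b):
    """Multiplies two numbers together.

    Args:
        a (float): The first factor.
        b (float): The second factor.

    Returns:
        float: The product of ``a`` and ``b``.
    """
    return a * b
```

The key conventions are a one-line summary, a blank line before each section, and `Args:`/`Returns:` headings with indented, typed entries.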

All merge requests should comply with the coding standards outlined in this document. Merge requests will undergo both a code review and a science review. The code review will focus on coding style, performance and test coverage. The science review will focus on the mathematical correctness of the implementation and the suitability of the method for inclusion within 'scores'.

A GitHub ticket should be created explaining the metric which is being implemented and why it is useful.

### Development Process for a Correction or Improvement

107 changes: 58 additions & 49 deletions src/scores/continuous.py
@@ -6,31 +6,38 @@


def mse(fcst, obs, reduce_dims=None, preserve_dims=None, weights=None):
"""
"""Calculates the mean squared error from forecast and observed data.
Returns:
- By default an xarray containing a single floating point number representing the mean absolute
error for the supplied data. All dimensions will be reduced.
- Otherwise: Returns an xarray representing the mean squared error, reduced along
the relevant dimensions and weighted appropriately.
Dimensional reduction is not supported for pandas and the user should
convert their data to xarray to formulate the call to the metric. At
most one of reduce_dims and preserve_dims may be specified.
Specifying both will result in an exception.
Args:
- fcst: Forecast or predicted variables in xarray or pandas
- obs: Observed variables in xarray or pandas
- reduce_dims: Optionally specify which dimensions to reduce when calculating MSE.
All other dimensions will be preserved.
- preserve_dims: Optionally specify which dimensions to preserve when calculating MSE. All other
dimensions will be reduced. As a special case, 'all' will allow all dimensions to
be preserved. In this case, the result will be in the same shape/dimensionality as
the forecast, and the errors will be the squared error at each point (i.e. single-value
comparison against observed), and the forecast and observed dimensions must match
precisely.
- weights: Not yet implemented. Allow weighted averaging (e.g. by area, by latitude, by population, custom)
Notes:
- Dimensional reduction is not supported for pandas and the user should convert their data to xarray
to formulate the call to the metric.
- At most one of reduce_dims and preserve_dims may be specified. Specifying both will result in an exception.
fcst (Union[xr.Dataset, xr.DataArray, pd.Dataframe, pd.Series]):
Forecast or predicted variables in xarray or pandas.
obs (Union[xr.Dataset, xr.DataArray, pd.Dataframe, pd.Series]):
Observed variables in xarray or pandas.
reduce_dims (Union[str, Iterable[str]): Optionally specify which
dimensions to reduce when calculating MSE. All other dimensions
will be preserved.
preserve_dims (Union[str, Iterable[str]): Optionally specify which
dimensions to preserve when calculating MSE. All other dimensions
will be reduced. As a special case, 'all' will allow all dimensions
to be preserved. In this case, the result will be in the same
shape/dimensionality as the forecast, and the errors will be
the squared error at each point (i.e. single-value comparison
against observed), and the forecast and observed dimensions
must match precisely.
weights: Not yet implemented. Allow weighted averaging (e.g. by
area, by latitude, by population, custom)
Returns:
Union[xr.Dataset, xr.DataArray, pd.Dataframe, pd.Series]: An object containing
a single floating point number representing the mean absolute
error for the supplied data. All dimensions will be reduced.
Otherwise: Returns an object representing the mean squared error,
reduced along the relevant dimensions and weighted appropriately.
"""

error = fcst - obs
@@ -53,38 +60,40 @@ def mse(fcst, obs, reduce_dims=None, preserve_dims=None, weights=None):
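The reduce/preserve semantics described in the docstring can be sketched with plain numpy; `mse_sketch` and its axis-based argument are illustrative stand-ins for the xarray dimension handling, not the library's API:

```python
import numpy as np

def mse_sketch(fcst, obs, reduce_axes=None):
    """Mean squared error, averaged over the reduced axes only."""
    squared_error = (np.asarray(fcst) - np.asarray(obs)) ** 2
    # reduce_axes=None mimics the default: reduce every dimension to a scalar.
    return squared_error.mean(axis=reduce_axes)

fcst = np.array([[1.0, 2.0], [3.0, 4.0]])
obs = np.array([[1.0, 1.0], [1.0, 1.0]])
mse_sketch(fcst, obs)                 # 3.5 -- all dimensions reduced
mse_sketch(fcst, obs, reduce_axes=0)  # array([2., 5.]) -- second axis preserved
```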


def mae(fcst, obs, reduce_dims=None, preserve_dims=None, weights=None):
"""**Needs a 1 liner function description**
"""Calculates the mean absolute error from forecast and observed data.
A detailed explanation is on [Wikipedia](https://en.wikipedia.org/wiki/Mean_absolute_error)
Dimensional reduction is not supported for pandas and the user should
convert their data to xarray to formulate the call to the metric.
At most one of reduce_dims and preserve_dims may be specified.
Specifying both will result in an exception.
Args:
- fcst: Forecast or predicted variables in xarray or pandas.
- obs: Observed variables in xarray or pandas.
- reduce_dims: Optionally specify which dimensions to reduce when
calculating MAE. All other dimensions will be preserved.
- preserve_dims: Optionally specify which dimensions to preserve
when calculating MAE. All other dimensions will be reduced.
As a special case, 'all' will allow all dimensions to be
preserved. In this case, the result will be in the same
shape/dimensionality as the forecast, and the errors will be
the absolute error at each point (i.e. single-value comparison
against observed), and the forecast and observed dimensions
must match precisely.
- weights: Not yet implemented. Allow weighted averaging (e.g. by
area, by latitude, by population, custom).
fcst (Union[xr.Dataset, xr.DataArray, pd.Dataframe, pd.Series]): Forecast
or predicted variables in xarray or pandas.
obs (Union[xr.Dataset, xr.DataArray, pd.Dataframe, pd.Series]): Observed
variables in xarray or pandas.
reduce_dims (Union[str, Iterable[str]]): Optionally specify which dimensions
to reduce when calculating MAE. All other dimensions will be preserved.
preserve_dims (Union[str, Iterable[str]]): Optionally specify which
dimensions to preserve when calculating MAE. All other dimensions
will be reduced. As a special case, 'all' will allow all dimensions
to be preserved. In this case, the result will be in the same
shape/dimensionality as the forecast, and the errors will be
the absolute error at each point (i.e. single-value comparison
against observed), and the forecast and observed dimensions
must match precisely.
weights: Not yet implemented. Allow weighted averaging (e.g. by
area, by latitude, by population, custom).
Returns:
- By default an xarray DataArray containing a single floating
point number representing the mean absolute error for the
Union[xr.Dataset, xr.DataArray, pd.Dataframe, pd.Series]: By default an xarray DataArray containing
a single floating point number representing the mean absolute error for the
supplied data. All dimensions will be reduced.
Alternatively, an xarray structure with dimensions preserved as
appropriate containing the score along reduced dimensions
Notes:
- Dimensional reduction is not supported for pandas and the user
should convert their data to xarray to formulate the call to the metric.
- At most one of reduce_dims and preserve_dims may be specified.
Specifying both will result in an exception.
A detailed explanation is on [Wikipedia](https://en.wikipedia.org/wiki/Mean_absolute_error)
Alternatively, an xarray structure with dimensions preserved as appropriate
containing the score along reduced dimensions
"""

error = fcst - obs
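As with MSE above, the behaviour can be sketched with plain numpy; `mae_sketch` is an illustrative stand-in for the library's xarray-based implementation:

```python
import numpy as np

def mae_sketch(fcst, obs, reduce_axes=None):
    """Mean absolute error, averaged over the reduced axes only."""
    abs_error = np.abs(np.asarray(fcst) - np.asarray(obs))
    return abs_error.mean(axis=reduce_axes)

fcst = np.array([[1.0, 2.0], [3.0, 4.0]])
obs = np.array([[1.0, 1.0], [1.0, 1.0]])
mae_sketch(fcst, obs)  # 1.5 -- mean of the absolute errors 0, 1, 2, 3
```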
6 changes: 5 additions & 1 deletion src/scores/probability/__init__.py
@@ -2,4 +2,8 @@
Import the functions from the implementations into the public API
"""

from scores.probability.crps_impl import (
adjust_fcst_for_crps,
crps_cdf,
crps_cdf_brier_decomposition,
)
24 changes: 15 additions & 9 deletions src/scores/probability/checks.py
@@ -1,5 +1,5 @@
"""
This module contains methods which make assertions at runtime about the state of various data
structures and values
"""

@@ -8,24 +8,30 @@


def coords_increasing(da: xr.DataArray, dim: str):
"""
Returns True if coordinates along `dim` dimension of `da` are increasing,
False otherwise. No in-built raise if `dim` is not a dimension of `da`.
"""Checks if coordinates in a given DataArray are increasing.
Note: No in-built raise if `dim` is not a dimension of `da`.
Args:
da (xr.DataArray): Input data
dim (str): Dimension to check if increasing
Returns:
(bool): Returns True if coordinates along `dim` dimension of
`da` are increasing, False otherwise.
"""
result = (da[dim].diff(dim) > 0).all()
return result
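The check above reduces to "are all consecutive coordinate differences positive", which can be sketched in plain numpy; `coords_increasing_sketch` is an illustrative stand-in for the xarray version:

```python
import numpy as np

def coords_increasing_sketch(coords):
    """True when consecutive differences along the coordinate are all positive."""
    return bool(np.all(np.diff(coords) > 0))

coords_increasing_sketch([0.0, 1.0, 2.5])  # True
coords_increasing_sketch([0.0, 2.0, 2.0])  # False -- not strictly increasing
```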


def cdf_values_within_bounds(cdf: xr.DataArray) -> bool:
"""
Checks that 0 <= cdf <= 1. Ignores NaNs.
"""Checks that 0 <= cdf <= 1. Ignores NaNs.
Args:
cdf: array of CDF values
cdf (xr.DataArray): array of CDF values
Returns:
`True` if `cdf` values are all between 0 and 1 whenever values are not NaN,
or if all values are NaN; and `False` otherwise.
(bool): `True` if `cdf` values are all between 0 and 1 whenever values are not NaN,
or if all values are NaN; and `False` otherwise.
"""
return cdf.count() == 0 or ((cdf.min() >= 0) & (cdf.max() <= 1))
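The same logic, including the all-NaN special case handled by `cdf.count() == 0`, can be sketched with plain numpy; `cdf_values_within_bounds_sketch` is an illustrative stand-in for the xarray version:

```python
import numpy as np

def cdf_values_within_bounds_sketch(cdf):
    """True if all non-NaN values lie in [0, 1], or if every value is NaN."""
    cdf = np.asarray(cdf, dtype=float)
    valid = cdf[~np.isnan(cdf)]
    # An empty valid set (all NaN) counts as within bounds, as in the original.
    return valid.size == 0 or bool((valid.min() >= 0) & (valid.max() <= 1))

cdf_values_within_bounds_sketch([0.0, 0.5, np.nan, 1.0])  # True
cdf_values_within_bounds_sketch([-0.1, 0.5])              # False
```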

