Releases: mmaelicke/scikit-gstat
Version 1.0
Here we present SciKit-GStat, an open source Python package for variogram estimation that fits well into established frameworks for scientific computing like SciPy, numpy, gstools or pandas. SciKit-GStat is written in a mutable, object-oriented way that mimics the typical geostatistical analysis workflow. Its main strengths are ease of use and interactivity, so it is usable with little or even no knowledge of Python.
SciKit-GStat ships with a large number of predefined procedures, algorithms, and models, such as variogram estimators, theoretical spatial models, or binning algorithms. Common approaches to estimate variograms are covered and can be used out of the box. At the same time, the base class is very flexible and can be adjusted to less common problems, as well.
SciKit-GStat can easily interface to GSTools.
- Documentation: https://mmaelicke.github.io/scikit-gstat
- Tutorials: https://mmaelicke.github.io/scikit-gstat/auto_examples/index.html
- DockerHub: https://hub.docker.com/r/mmaelicke/scikit-gstat
If you use SciKit-GStat, please cite this publication:
Mälicke, M.: SciKit-GStat 1.0: A SciPy flavoured geostatistical variogram estimation toolbox written in Python, Geosci. Model Dev. Discuss. [preprint], https://doi.org/10.5194/gmd-2021-174, in review, 2021.
The code itself can also be cited:
Mirko Mälicke, Egil Möller, Helge David Schneider, & Sebastian Müller. (2021, May 28). mmaelicke/scikit-gstat: A scipy flavoured geostatistical variogram analysis toolbox (Version v0.6.0). Zenodo. http://doi.org/10.5281/zenodo.4835779
What's Changed
- Implement fit_method as a property by @mmaelicke in #106
- Add Meuse data to skgstat by @mmaelicke in #107
- fixing indexing of sigma matrix by @mmaelicke in #110
- Add custom bins (callable, iterable) and count property by @rhugonnet in #112
- Argument for optional model fitting by @rhugonnet in #113
- Add RasterMetricSpace for improved pairwise sampling of large 2D grids by @rhugonnet in #114
- Fix for RasterMetricSpace by @rhugonnet in #115
- Fix subsample by @mmaelicke in #116
- Add py39 to ci by @mmaelicke in #117
- raise Error instead of warning by @mmaelicke in #119
- update tutorials dev branch to master by @mmaelicke in #120
- Tutorials by @mmaelicke in #121
- Maximum Likelihood utility by @mmaelicke in #122
New Contributors
- @rhugonnet made their first contribution in #112
Full Changelog: v0.6.0...v1.0.0
Version 0.6
Description
SciKit-GStat is a scipy-styled geostatistical toolbox for variogram estimation. It includes two base classes Variogram and OrdinaryKriging. Additionally, various variogram classes inheriting from Variogram are available for solving directional or space-time related tasks. The module makes use of a rich selection of semi-variance estimators and variogram model functions while being extensible at the same time.
This may be the last minor version before the first stable release, 1.0!
Version 0.6 brings several smaller adjustments. A new interface was introduced to export a Variogram directly into a gstools.Krige instance. This makes kriging even more seamless between scikit-gstat and gstools.
The Variogram has a new method called cross_validate to validate variograms by a leave-one-out Kriging interpolation. This is accompanied by some internals to estimate observation uncertainty and plot error bars in the default plot. Proper uncertainty estimation still has a long way to go and is possibly a good objective for version 1.1.
Finally, SciKit-GStat has a skgstat.data submodule that can return sample data.
Documentation
- Full Documentation https://mmaelicke.github.io/scikit-gstat
- User Guide https://mmaelicke.github.io/scikit-gstat/userguide/userguide.html
- Tutorials https://mmaelicke.github.io/scikit-gstat/tutorials/tutorials.html
Changes since 0.5
- The util and data submodules are now always loaded at top-level
- fixed a potential circular import
- added uncertainty tools to util. This is not yet finished and may change the signature before it gets stable with Version 1.0.0
Version 0.5.6
- [Variogram] the internal MetricSpace instance used to calculate the distance matrix is now available as the Variogram.metric_space property.
- [Variogram] Variogram.metric_space is now read-only.
- [unittest] two unittests were changed (linting, not functionality)
Version 0.5.5
- [data] new submodule skgstat.data contains sample random fields and methods for sampling these fields in a reproducible way at random locations and different sample sizes.
Version 0.5.4
- [util] added a new cross_validation utility module to cross-validate variograms with leave-one-out Kriging cross validations.
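The leave-one-out idea behind this utility can be sketched in a few lines. This is only an illustration, not the package's implementation: a simple inverse-distance-weighting predictor is swapped in for Kriging, and the names idw_predict and leave_one_out_rmse are hypothetical.

```python
import math


def idw_predict(coords, values, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    Assumption: this simple predictor stands in for Kriging.
    """
    num = 0.0
    den = 0.0
    for c, v in zip(coords, values):
        d = math.dist(c, target)
        if d == 0.0:
            return v  # target coincides with an observation
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def leave_one_out_rmse(coords, values, predict=idw_predict):
    """Predict each point from all other points and return the RMSE."""
    sq_errors = []
    for i in range(len(coords)):
        # drop observation i, then estimate it from the remaining ones
        rest_c = coords[:i] + coords[i + 1:]
        rest_v = values[:i] + values[i + 1:]
        estimate = predict(rest_c, rest_v, coords[i])
        sq_errors.append((estimate - values[i]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
values = [1.0, 2.0, 2.0, 3.0, 2.0]
rmse = leave_one_out_rmse(coords, values)
```

A lower score means the model reproduces the left-out observations better, which is what makes the procedure useful for comparing variogram settings.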
Version 0.5.3
- [MetricSpace] new class skgstat.MetricSpace.ProbabilisticMetricSpace that extends the metric space by a stochastic element to draw samples from the input data, instead of using the full dataset.
Version 0.5.2
- [interface] new interface function added: skgstat.Variogram.to_gs_krige. This interface will return a gstools.Krige instance from the fitted variogram.
- some typos were corrected
- some code refactored (mainly linting errors)
Version 0.5.1
- [plotting] the spatio-temporal 2D and 3D plots now label the axis correctly.
- [plotting] fixed swapped plotting axes for spatio-temporal plots.
Version 0.5
Description
SciKit-GStat is a scipy-styled analysis module for geostatistics. It includes two base classes Variogram and OrdinaryKriging. Additionally, various variogram classes inheriting from Variogram are available for solving directional or space-time related tasks. The module makes use of a rich selection of semi-variance estimators and variogram model functions while being extensible at the same time.
Version 0.5 brings two major improvements. First, instead of passing a numpy.ndarray, you can now use the new class skgstat.MetricSpace, which can pre-calculate distances in case they are reused in many places. Second, the new interface functions Variogram.to_gstools and Variogram.to_empirical can be used to export a Variogram to gstools and use its field generation, kriging and all the other fancy stuff there.
Documentation
- Full Documentation https://mmaelicke.github.io/scikit-gstat
- User Guide https://mmaelicke.github.io/scikit-gstat/userguide/userguide.html
- Tutorials https://mmaelicke.github.io/scikit-gstat/tutorials/tutorials.html
Changes since 0.4
- [MetricSpace] A new class, skgstat.MetricSpace, was introduced. It can be passed to any class that accepted coordinates so far. This wrapper can be used to pre-calculate a large distance matrix once and pass it to many Variograms.
- [MetricSpacePair] A new class, skgstat.MetricSpacePair, was introduced. It is a pair of two skgstat.MetricSpace instances and pre-calculates all distances between the two spaces. This is used, e.g., in Kriging to pre-calculate all distances between the input coordinates and the interpolation grid only once.
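The benefit of pre-calculating distances can be illustrated with a small caching wrapper. This is a conceptual sketch only; the class PrecomputedDistances and the function count_pairs_within are hypothetical names and do not mirror the package's internals.

```python
import math


class PrecomputedDistances:
    """Hypothetical wrapper: computes the distance matrix once, then reuses it."""

    def __init__(self, coords):
        self.coords = list(coords)
        self._dists = None
        self.n_computations = 0  # counts how often the matrix was built

    @property
    def dists(self):
        # build the full pairwise matrix lazily on first access only
        if self._dists is None:
            self.n_computations += 1
            self._dists = [
                [math.dist(a, b) for b in self.coords] for a in self.coords
            ]
        return self._dists


def count_pairs_within(space, max_lag):
    """Count point pairs separated by at most `max_lag`."""
    n = len(space.coords)
    d = space.dists
    return sum(1 for i in range(n) for j in range(i + 1, n) if d[i][j] <= max_lag)


space = PrecomputedDistances([(0, 0), (3, 4), (6, 8)])
short_pairs = count_pairs_within(space, 5.0)   # two consumers share
all_pairs = count_pairs_within(space, 10.0)    # the same cached matrix
```

Both calls read the same cached matrix, so the expensive step runs once no matter how many consumers use the wrapper.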
Version 0.4.4
- [models] the changes to skgstat.models.matern introduced in 0.3.2 are reversed. The Matérn model does not adapt the smoothness scaling to the effective range anymore, as the behavior was too inconsistent.
- [interface] minor bugfix of a circular import in the variogram_estimator interface
- [models] matern(0, ...) now returns the nugget instead of numpy.NaN
- [models] stable(0, ...) now returns the nugget instead of returning numpy.NaN or raising a ZeroDivisionError.
Version 0.4.3
- [Variogram] Variogram.dim now returns the spatial dimensionality of the input data.
- [Variogram] fixed a numpy deprecation warning in _calc_distances
Version 0.4.2
- [Variogram] Variogram.bins now automatically casts manually set bin edges to a numpy.array.
- [Variogram] Variogram.get_empirical returns the empirical variogram. That is a tuple of the current Variogram.bins and Variogram.experimental arrays, with the option to move the bins to the lag class centers.
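Moving bins to the lag class centers can be sketched as below. The sketch assumes that the bins hold the upper edge of each lag class and that the first class starts at zero; the helper name bin_centers is hypothetical.

```python
def bin_centers(upper_edges):
    """Shift upper lag-class edges to class centers.

    Assumption: the first lag class starts at a distance of 0.
    """
    centers = []
    lower = 0.0
    for upper in upper_edges:
        centers.append((lower + upper) / 2.0)
        lower = upper
    return centers


edges = [10.0, 20.0, 40.0]        # upper edges of three lag classes
experimental = [0.5, 0.9, 1.2]    # made-up semi-variance values
empirical = (bin_centers(edges), experimental)
```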
Version 0.4.1
- [Variogram] moved the bin function setting into a wrapper instance method, which was an anonymous lambda before. This makes the Variogram serializable again.
- [Variogram] a number of pylint errors were resolved. Plenty are still left.
- [binning] added a 'stable_entropy' option that will optimize the lag class edges to be of comparable Shannon Entropy.
Version 0.4
Description
SciKit-GStat is a scipy-styled analysis module for geostatistics. It includes two base classes Variogram and OrdinaryKriging. Additionally, various variogram classes inheriting from Variogram are available for solving directional or space-time related tasks. The module makes use of a rich selection of semi-variance estimators and variogram model functions while being extensible at the same time.
Documentation
- Full Documentation https://mmaelicke.github.io/scikit-gstat
- User Guide https://mmaelicke.github.io/scikit-gstat/userguide/userguide.html
- Tutorials https://mmaelicke.github.io/scikit-gstat/tutorials/tutorials.html
Breaking change
There is one potential breaking change compared to version 0.3.0: The lag_classes generator now yields empty arrays for unoccupied lag classes. This will result in NaN values for the semi-variance.
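The effect of this change can be sketched with a toy example. The functions below are hypothetical stand-ins rather than the package's code: lag_classes yields one list of pairwise differences per lag class, including empty ones, and matheron applies the Matheron estimator, returning NaN when a class has no pairs.

```python
import math


def matheron(diffs):
    """Matheron semi-variance of pairwise differences; NaN for an empty class."""
    if len(diffs) == 0:
        return math.nan
    return sum(d * d for d in diffs) / (2.0 * len(diffs))


def lag_classes(distances, diffs, upper_edges):
    """Yield the pairwise differences of every lag class, even if it is empty."""
    lower = 0.0
    for upper in upper_edges:
        yield [v for h, v in zip(distances, diffs) if lower < h <= upper]
        lower = upper


distances = [1.0, 2.0, 8.0]   # separating distances of three point pairs
diffs = [0.5, 1.0, 2.0]       # absolute value differences of those pairs
edges = [3.0, 6.0, 9.0]       # upper edges; no pair falls into (3, 6]
semivariance = [matheron(g) for g in lag_classes(distances, diffs, edges)]
```

The middle class is unoccupied, so its entry is NaN instead of being dropped, which keeps the result aligned with the bin edges.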
Changes since 0.3
Version 0.3.11
- [binning] added a stable_entropy option that will optimize the lag class edges to be of comparable Shannon Entropy.
- [Variogram] A new method is introduced to calculate fitting weights. It works for all but the manual fit method. By setting fit_sigma='entropy', the fitting weights will be adjusted according to the lag classes' Shannon entropy. That will ignore lag classes of high uncertainty and emphasize lags of low uncertainty.
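As a rough illustration of the quantity involved, the Shannon entropy of one lag class can be computed from a histogram of its values. This is a minimal sketch with a hypothetical helper; how the package bins the values and maps entropy to fitting weights is not shown here.

```python
import math


def shannon_entropy(values, bin_edges):
    """Shannon entropy (in bits) of `values` histogrammed into `bin_edges`."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for k in range(len(counts)):
            is_last = k == len(counts) - 1
            # bins are half-open, except the last one, which includes its edge
            if bin_edges[k] <= v < bin_edges[k + 1] or (is_last and v == bin_edges[-1]):
                counts[k] += 1
                break
    total = sum(counts)
    if total == 0:
        return math.nan  # empty lag class: entropy is undefined
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


edges = [0.0, 1.0, 2.0, 3.0, 4.0]
concentrated = shannon_entropy([0.1, 0.2, 0.3, 0.4], edges)  # all in one bin
scattered = shannon_entropy([0.5, 1.5, 2.5, 3.5], edges)     # one per bin
```

A lag class whose differences fall into a single histogram bin has zero entropy, while one spread evenly over all bins reaches the maximum, which is the sense in which entropy serves as an uncertainty measure.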
Version 0.3.10
- [binning] added a median aggregation option to ward. This can be enabled by setting binning_agg_func='median'. The cluster centroids will be derived from the members' median value instead of the mean value.
- [Variogram] added fit_method='ml', a maximum likelihood fitting procedure to fit the theoretical variogram to the experimental one
- [Variogram] added fit_method='manual'. This is a manual fitting method that takes the variogram parameters either at instantiation, prefixed by fit_, or as keyword arguments to fit.
- [Variogram] the manual fitting method will preserve the previous parameters if the Variogram was fitted before and the fitting parameters are not manually overwritten.
Version 0.3.9
- [binning] added kmeans and ward for forming non-equidistant lag classes based on a distance matrix clustering
- [Kriging] Kriging now stores the last interpolated field as z. This is the first of a few changes in future releases, which will ultimately add some plotting methods to Kriging.
Version 0.3.8
- [plotting] minor bugfixes in plotting routines (wrong arguments, plotting issues)
- [docs] added a tutorial about plotting
- [binning] added auto_derived_lags for a variety of different methods that find a good estimate for either the number of lag classes or the lag class width. These can be used by passing the method name as the bin_func parameter: Freedman-Diaconis ('fd'), Sturges' rule ('sturges'), Scott's rule ('scott') and Doane's extension to Sturges' rule ('doane'). Uses numpy.histogram_bin_edges internally.
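The four rules can be tried directly with numpy.histogram_bin_edges on a set of pairwise distances. The sketch below uses made-up random coordinates and only shows the numpy call; how the package turns the result into lag classes may differ.

```python
import numpy as np

# made-up sample: 50 random points in a 100 x 100 square
rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(50, 2))

# pairwise distances of the upper triangle (no self-pairs, no duplicates)
i, j = np.triu_indices(len(coords), k=1)
distances = np.linalg.norm(coords[i] - coords[j], axis=1)

# let each rule suggest bin edges for the distance distribution
edges = {
    rule: np.histogram_bin_edges(distances, bins=rule)
    for rule in ("fd", "sturges", "scott", "doane")
}
n_lags = {rule: len(e) - 1 for rule, e in edges.items()}
```

Each rule yields a different number of equally wide classes for the same distances, so comparing n_lags is a quick way to see how sensitive the binning is to the chosen rule.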
Version 0.3.7
- [Variogram] now accepts arbitrary kwargs. These can be used to further specify functional behavior of the class. As of Version 0.3.7 this is used to pass arguments down to the entropy and percentile estimators.
- [Variogram] the describe method now adds the init arguments to the output by default. The method can output the init params as a nested dict inside the output or flatten the output dict.
Version 0.3.6
- [Variogram] some internal code cleanup. Removed some unnecessary loops
- [Variogram] setting the n_lags property now correctly forces a recalculation of the lag groupings. So far they were kept untouched, which might result in old experimental variogram values for the changed instance. This is a potential breaking change.
- [Variogram] The lag_classes generator now yields empty arrays for unoccupied lag classes. This will result in NaN values for the semi-variance. This is actually a bug fix, but it is also a potential breaking change.
Version 0.3.5
- [plotting] The location_trend can now add trend model lines to the scatter plot for the ‘plotly’ backend and calculate the R² for the trend model.
- [Variogram] the internal attribute holding the name of the current distance function was renamed from _dict_func to _dist_func_name
Version 0.3.4
- [plotting] The scattergram functions color the plotted points with respect to the lag bin they originate from. For matplotlib, this coloring is suppressed, but can be activated by passing the argument scattergram(single_color=False).
Version 0.3.3
- [plotting] a new submodule is introduced: skgstat.plotting. This contains all plotting functions. The plotting behavior is not changed, but using skgstat.plotting.backend(), the used plotting library can be switched from matplotlib to plotly
- [stmodels] some code cleanup
- [SpaceTimeVariogram] finally can fit the product-sum model to the experimental variogram
Version 0.3.2
- [models] Matérn model now adapts effective range to smoothness parameter
- [models] Matérn model documentation updated
- [models] some minor updates to references in the docs
Version 0.3.1
- [Variogram] - internal distance calculations were refactored, to speed things up
- [Kriging] - internal distance calculations were refactored, to speed things up
Version 0.3
SciKit-Gstat is a scipy-styled analysis module for geostatistics. It includes two base classes Variogram and OrdinaryKriging. Additionally, various variogram classes inheriting from Variogram are available for solving directional or space-time related tasks. The module makes use of a rich selection of semi-variance estimators and variogram model functions while being extensible at the same time.
This version changed the DirectionalVariogram class quite substantially. The circular search area was removed, therefore shapely is not a dependency anymore and the variogram estimation for directional variograms got a performance gain of several orders of magnitude.
Version 0.2.8
SciKit-Gstat is a scipy-styled analysis module for geostatistics. It includes two base classes Variogram and OrdinaryKriging. Additionally, various variogram classes inheriting from Variogram are available for solving directional or space-time related tasks. The module makes use of a rich selection of semi-variance estimators and variogram model functions while being extensible at the same time.
This version changed some of the internal parameter settings and removed old, non-working code. An interface to gstools CovModel was added, which is still experimental and untested.
Version 0.2.7
SciKit-Gstat is a scipy-styled analysis module for geostatistics. It includes two base classes Variogram and OrdinaryKriging. Additionally, various variogram classes inheriting from Variogram are available for solving directional or space-time related tasks. The module makes use of a rich selection of semi-variance estimators and variogram model functions while being extensible at the same time.
This version increases the test coverage a bit and the documentation made progress. Besides some minor bug fixes, the main new feature of this version is the module skgstat.interfaces that collects interfaces to other packages. PyKrige and scikit-learn are available. GSTools will follow with the next release.
Version 0.2.6
SciKit-GStat is a scipy-styled analysis module for geostatistics. It includes two base classes Variogram and OrdinaryKriging. Additionally, various variogram classes inheriting from Variogram are available for solving directional or space-time related tasks. The module makes use of a rich selection of semi-variance estimators and variogram model functions while being extensible at the same time.
Note that there are no unit tests for Kriging so far, and they are not documented. Kriging got some new keywords in this version and there are some strategies to increase performance or gain better results. The main bottleneck for performance is not handled yet (on purpose).
The Variogram.compiled_model function is deprecated and was replaced by the much faster Variogram.fitted_model.
Version 0.2.5
SciKit-GStat is a scipy-styled analysis module for geostatistics. It includes two base classes Variogram and OrdinaryKriging. Additionally, various variogram classes inheriting from Variogram are available for solving directional or space-time related tasks. The module makes use of a rich selection of semi-variance estimators and variogram model functions, while being extensible at the same time.
Note that there are no unit tests for Kriging so far and they are not documented. At the current stage, the Kriging is also not optimized for performance. It may change significantly in a future version.
Version 0.2.3
[severe bug] A severe bug in Variogram.__vdiff_indexer was found and fixed. The iterator was indexing the Variogram._diff array differently from Variogram.distance. This led to wrong semivariance values for all versions > 0.1.8. It is fixed now.
Besides this major bug fix, unit tests for parameter setting were added and the fit_sigma setting of 'exp' was fixed.
The formula was changed from e^(1 / x) to 1. - e^(1 / x) so that it increases with distance and, thus, gives less weight to distant lag classes during fitting.