Code from publication @ACM SIGSPATIAL 2018: Modular Software Framework for Compression of Structured Environmental Data.
This framework should help with the design and development of a prediction-based compression method for climate data.
This code has been tested on following machine:
Python: 3.6.1
OS: Debian 4.11.6-1 (2017-06-19) testing (buster)
CPU: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
MEM: 16 GiB 2400MHz DDR4
To recreate the software environment you can use the provided
spec-file.txt
and requirements.txt
files. Currently only GNU/Linux is supported.
conda create -n ENVNAME --file spec-file.txt
conda activate ENVNAME
pip install -r requirements.txt
macOS & Windows: Conda does not support cross-platform export of package names including versions. As soon as this feature is added I'll generate the appropriate macOS and Windows environmental files.
A workflow defines the whole compression process from start to finish. The following figure will give a short overview of the steps and modifiers as well as objects involved in the compression process.
A workflow is defined by its five modifiers: mapper, sequencer, predictor, subtractor and encoder. These modifiers define the whole compression process (map
,flatten
, predict
, subtract
, encode
) depicted in the figure above. The read
and write
processes are provided by the Python standard library.
A modifier defines how to transform one object to the other. It provides one function (with appropriate inverse function for the decompression process) and operate on only on one object. The following table provides an overview of the interface for each of the modifiers.
modifier | function | inverse function | input object | output object |
---|---|---|---|---|
mapper | map() |
rev_map() |
floatarray | integerarray |
sequencer | flatten() |
- | integerarray | sequence object |
predictor | predict() |
- | integerarray, startnode, sequence object | predictionarray |
subtractor | subtract() |
- | integerarray, predictionarray | residualarray |
encoder | encode() |
decode() |
residualarray | coded object |
The input and output objects are switched if the appropriate inverse function is used.
An object defines the state of the data during the (de)compression process. The following UML-diagram provides information about each object and the attributes provided by it.
The keen observer might have discovered that the integer, residual and predictionarrays provide the same attributes and have the same properties. But the separation is necessary due to the fact that the modifiers are only allowed to operate on one kind of object.
The cframe
folder is structured as follows:
cframe/
├── backend
├── data
├── format
├── modifier
│ ├── encoder
│ ├── mapper
│ ├── predictor
│ ├── sequencer
│ └── subtractor
├── objects
│ └── arrays
└── toolbox
backend
Files which define the interfaces of the objects and modifiers. The modifier definitions use amod
suffix and the object definitions use theobj
suffix.data
Example netcdf file [source]format
Preliminary format definition including white papermodifier
Actual instances for modifier objects (following explanation is ordered by execution steps, not alphabetically)mapper
Mapping of floating point values to integerssequencer
Traversal of the dataset withstartnode
and optionalorder
parameter (further sequencer are provided in following repository informationspaces)predictor
Prediction of the next value based on past experiencesubtractor
Modifier for calculation of the difference between prediction and true valueencoder
Methods for writing of the data on disk (according to the definition informat
)
objects
Implementation of the objects. The arrays are put in a separate subfolder for better organisationtoolbox
Additional helper functions to provide workflows, parallel execution, subsetting, and quality assessmentfeeder
Pre-allocation of arrays to help improve calculation speed during the prediction process. This prevents copying data over and over in memoryparallel
Parallel execution of severalworkflows
qualityassessment
Statistics about the compressed datasubsetting
Random subsets of a dataset of sizeN
workflow
Wrapper class for the definition of a complete (de)compression process