Update the maintenance and development guide to include info from the SMG - which is now redundant. Split out the analysis plugin section to its own page.
duncanwp committed Sep 16, 2015
1 parent 1080d9c commit c11f53c
Showing 3 changed files with 212 additions and 123 deletions.
132 changes: 132 additions & 0 deletions doc/analysis_plugin_development.rst
@@ -0,0 +1,132 @@
===========================
Analysis plugin development
===========================

Users can write their own plugins for performing the collocation of two data sets.
There are three different types of plugin available for collocation. First we describe the overall design and how
these components interact, then each is described in more detail.

Basic collocation design
========================

The diagram below demonstrates the basic design of the collocation system, and the roles of each of the components.
In the simple case of the default collocator (which returns only one value) the :ref:`Collocator <collocator_description>`
loops over each of the sample points, calls the relevant :ref:`Constraint <constraint_description>` to reduce the
number of data points, and then the :ref:`Kernel <kernel_description>` which returns a single value, which the
collocator stores.

.. image:: img/CollocationDiagram.png
   :width: 600px
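
The loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: ``ToyPoint``, ``RadiusConstraint``, ``MeanKernel`` and the free function ``collocate`` are hypothetical stand-ins invented for this sketch, not the CIS classes, and the two-dimensional distance check is chosen only to keep the example short.

```python
from math import hypot


# Toy stand-ins for illustration only; these are NOT the CIS classes.
class ToyPoint:
    def __init__(self, x, y, val=None):
        self.x, self.y, self.val = x, y, val


class RadiusConstraint:
    """Keep only the data points within `radius` of the sample point."""

    def __init__(self, radius):
        self.radius = radius

    def constrain_points(self, sample_point, data):
        return [p for p in data
                if hypot(p.x - sample_point.x, p.y - sample_point.y) <= self.radius]


class MeanKernel:
    """Reduce the constrained points to a single value."""

    def get_value(self, sample_point, constrained):
        if not constrained:
            raise ValueError("no data points near the sample point")
        return sum(p.val for p in constrained) / len(constrained)


def collocate(sample_points, data, constraint, kernel, fill_value):
    """Loop over sample points: constrain, apply the kernel, store the result."""
    values = []
    for sample_point in sample_points:
        constrained = constraint.constrain_points(sample_point, data)
        try:
            values.append(kernel.get_value(sample_point, constrained))
        except ValueError:
            values.append(fill_value)
    return values


samples = [ToyPoint(0.0, 0.0), ToyPoint(10.0, 10.0)]
data = [ToyPoint(0.0, 1.0, 2.0), ToyPoint(1.0, 0.0, 4.0), ToyPoint(5.0, 5.0, 9.0)]
result = collocate(samples, data, RadiusConstraint(1.5), MeanKernel(), fill_value=-999.0)
```

The first sample point has two data points within range and receives their mean; the second has none, so the fill value is stored instead.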

.. _kernel_description:

Kernel
======

A kernel is used to convert the constrained points into values in the output. There are two sorts of kernel: general kernels,
which act on the final point location and a set of data points (these derive from :class:`.Kernel`), and the more specific data-only kernels,
which act upon just an array of data (these derive from :class:`.AbstractDataOnlyKernel`, which in turn derives from :class:`.Kernel`).
The data-only kernels are less flexible but should execute faster. To create a new kernel, inherit from :class:`.Kernel` and
implement the abstract method :meth:`.Kernel.get_value`. To make a data-only kernel, inherit from :class:`.AbstractDataOnlyKernel`
and implement :meth:`.AbstractDataOnlyKernel.get_value_for_data_only`, optionally overloading :meth:`.AbstractDataOnlyKernel.get_value`.
These methods are outlined below.

.. automethod:: cis.collocation.col_framework.Kernel.get_value
   :noindex:

.. automethod:: cis.collocation.col_framework.AbstractDataOnlyKernel.get_value_for_data_only
   :noindex:
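
The difference between the two sorts of kernel can be illustrated with a small self-contained sketch. The base classes here are hypothetical stand-ins that only mirror the inheritance described above (the real ones live in ``cis.collocation.col_framework`` and have richer signatures); the dictionary-based points, ``MaxKernel`` and ``NearestInXKernel`` are assumptions made for the example.

```python
from abc import ABC, abstractmethod


# Hypothetical stand-ins mirroring the relationship described above;
# they are not the real CIS base classes.
class Kernel(ABC):
    @abstractmethod
    def get_value(self, point, data):
        """Return one value for `point` from the constrained `data` points."""


class AbstractDataOnlyKernel(Kernel):
    def get_value(self, point, data):
        # Assumed default: ignore the location and pass on only the values.
        return self.get_value_for_data_only([p["val"] for p in data])

    @abstractmethod
    def get_value_for_data_only(self, values):
        """Return one value from a plain sequence of data values."""


class MaxKernel(AbstractDataOnlyKernel):
    """A data-only kernel: the result depends on the values alone."""

    def get_value_for_data_only(self, values):
        if len(values) == 0:
            raise ValueError("no data values to reduce")
        return max(values)


class NearestInXKernel(Kernel):
    """A general kernel: it needs the sample location as well as the data."""

    def get_value(self, point, data):
        if not data:
            raise ValueError("no data points to choose from")
        return min(data, key=lambda p: abs(p["x"] - point["x"]))["val"]


sample = {"x": 2.5}
constrained = [{"x": 0.0, "val": 7.0}, {"x": 3.0, "val": 5.0}]
max_value = MaxKernel().get_value(sample, constrained)
nearest_value = NearestInXKernel().get_value(sample, constrained)
```

``MaxKernel`` never looks at the sample location, which is what allows a collocator to hand it a bare array of values; ``NearestInXKernel`` cannot be written that way.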

.. _constraint_description:

Constraint
==========

The constraint limits the data points for a given sample point.
The user can add a new constraint mechanism by subclassing :class:`.Constraint` and providing an implementation for
:meth:`.Constraint.constrain_points`. If more control is needed over the iteration sequence then the
:meth:`.Constraint.get_iterator` method can also be
overloaded. Note, however, that this may not be respected by all collocators, which may still iterate over all
sample data points. It is possible to write your own collocator (or extend an existing one) to ensure the correct
iterator is used - see the next section. Both these methods, and their signatures, are outlined below.

.. automethod:: cis.collocation.col_framework.Constraint.constrain_points
   :noindex:

.. automethod:: cis.collocation.col_framework.Constraint.get_iterator
   :noindex:

To enable a constraint to use an :class:`.AbstractDataOnlyKernel`, the method
:meth:`get_iterator_for_data_only` should be implemented (again though, this may be ignored by a collocator). An
example of this is the :meth:`.BinnedCubeCellOnlyConstraint.get_iterator_for_data_only` implementation.
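
As a rough sketch of why a constraint might control the iteration itself, the toy example below lets the constraint skip sample points that have no nearby data. The ``get_iterator`` signature used here is a deliberately simplified assumption and does not match the real CIS method documented above; ``WindowConstraint`` and ``collocate_with_iterator`` are hypothetical names.

```python
class WindowConstraint:
    """Toy constraint on 1-D positions; not the CIS Constraint class."""

    def __init__(self, half_width):
        self.half_width = half_width

    def constrain_points(self, sample_x, data):
        # Keep the (position, value) pairs within half_width of the sample.
        return [(x, v) for x, v in data if abs(x - sample_x) <= self.half_width]

    def get_iterator(self, sample_xs, data):
        # Hypothetical simplified signature: yield only the sample points
        # that actually have constrained data, together with that data.
        for index, sample_x in enumerate(sample_xs):
            constrained = self.constrain_points(sample_x, data)
            if constrained:
                yield index, sample_x, constrained


def collocate_with_iterator(sample_xs, data, constraint, fill_value):
    """A collocator that respects the constraint's own iterator."""
    values = [fill_value] * len(sample_xs)
    for index, _sample_x, constrained in constraint.get_iterator(sample_xs, data):
        values[index] = sum(v for _, v in constrained) / len(constrained)
    return values


sample_xs = [0.0, 5.0, 10.0]
data = [(0.5, 2.0), (9.0, 4.0), (10.5, 6.0)]
values = collocate_with_iterator(sample_xs, data, WindowConstraint(1.0), fill_value=-1.0)
```

A collocator that ignored ``get_iterator`` and looped over every sample point would still work with this constraint, but it would call the kernel on empty sets of points.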

.. _collocator_description:

Collocator
==========

Another plugin which is available is the collocation method itself. A new one can be created by subclassing :class:`.Collocator` and
providing an implementation for :meth:`.Collocator.collocate`. This method takes a number of sample
points and applies the given constraint and kernel methods on the data for each of those points. It is responsible for
returning the new data object to be written to the output file. As such, the user could create a collocation routine
capable of handling multiple return values from the kernel, and hence creating multiple data objects, by creating a
new collocation method.

.. note::

   The collocator is also responsible for dealing with any missing values in sample points. (Some sets of sample points may
   include values which may or may not be masked.) Sometimes the user may wish to mask the output for such points; the
   :attr:`missing_data_for_missing_sample` attribute is used to determine the expected behaviour.

The interface is detailed here:

.. automethod:: cis.collocation.col_framework.Collocator.collocate
   :noindex:
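
The effect of ``missing_data_for_missing_sample`` can be sketched as follows. This is an illustration only: the function ``apply_missing_sample_policy`` is hypothetical, and using ``None`` to mark a missing sample value is an assumption made to avoid depending on masked arrays.

```python
def apply_missing_sample_policy(sample_values, computed, fill_value,
                                missing_data_for_missing_sample):
    """Mask the output wherever the sample value itself is missing.

    sample_values: one entry per sample point, None marking a missing value.
    computed: the kernel result for each sample point.
    """
    output = []
    for sample_value, value in zip(sample_values, computed):
        if missing_data_for_missing_sample and sample_value is None:
            output.append(fill_value)
        else:
            output.append(value)
    return output


sample_values = [1.2, None, 3.4]
computed = [10.0, 20.0, 30.0]
masked = apply_missing_sample_policy(sample_values, computed, -999.0, True)
unmasked = apply_missing_sample_policy(sample_values, computed, -999.0, False)
```

With the flag set, the second output is replaced by the fill value because its sample value is missing; with the flag unset, the collocated value is kept regardless.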

Implementation
==============

For all of these plugins any new variables, such as limits, constraint values or averaging parameters,
are automatically set as attributes in the relevant object. For example, if the user wanted to write a new
constraint method (``AreaConstraint``, say) which needed a variable called ``area``, this can be accessed with ``self.area``
within the constraint object. This will be set to whatever the user specifies at the command line for that variable, e.g.::

    $ ./cis.py col my_sample_file rain:"model_data_?.nc"::AreaConstraint,area=6000,fill_value=0.0:nn_gridded
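
The attribute-setting pattern can be sketched as below. This is not how CIS parses its command line; ``parse_plugin_spec`` and this ``AreaConstraint`` are hypothetical, and the values are left as strings because this section does not state whether any type conversion takes place.

```python
class AreaConstraint:
    """Hypothetical plugin; only the attribute-setting pattern matters here."""

    def __init__(self, **options):
        # Every user-supplied option becomes an attribute of the object.
        for name, value in options.items():
            setattr(self, name, value)

    def within_area(self, point_area):
        # Options are strings in this sketch, so convert before comparing.
        return point_area <= float(self.area)


def parse_plugin_spec(spec):
    """Split 'Name,key=value,...' into the plugin name and an options dict."""
    name, *pairs = spec.split(",")
    options = dict(pair.split("=", 1) for pair in pairs)
    return name, options


name, options = parse_plugin_spec("AreaConstraint,area=6000,fill_value=0.0")
constraint = AreaConstraint(**options)
```

After this, ``constraint.area`` and ``constraint.fill_value`` are available inside the plugin's methods in the same way that ``self.area`` is described above.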

Example implementations of new collocation plugins are demonstrated below for each of the plugin types::


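    from cis.collocation.col_framework import Collocator, Constraint, Kernel
    # The module path for LazyData is an assumption; adjust it to match your CIS version.
    from cis.data_io.ungridded_data import LazyData


    class MyCollocator(Collocator):

        def collocate(self, points, data, constraint, kernel):
            values = []
            for point in points:
                # Reduce the data to the points relevant to this sample point
                con_points = constraint.constrain_points(point, data)
                try:
                    values.append(kernel.get_value(point, con_points))
                except ValueError:
                    # The kernel could not produce a value, so store the fill value
                    values.append(constraint.fill_value)
            new_data = LazyData(values, data.metadata)
            new_data.missing_value = constraint.fill_value
            return new_data


    class MyConstraint(Constraint):

        def constrain_points(self, ref_point, data):
            # val_check is set as an attribute from the user's command line options
            con_points = []
            for point in data:
                if point.val > self.val_check:
                    con_points.append(point)
            return con_points


    class MyKernel(Kernel):

        def get_value(self, point, data):
            # Start from the furthest possible point and keep any closer data point
            nearest_point = point.furthest_point_from()
            for data_point in data:
                if point.compdist(nearest_point, data_point):
                    nearest_point = data_point
            return nearest_point.val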

1 change: 1 addition & 0 deletions doc/index.rst
@@ -26,6 +26,7 @@ Contents:
statistics
overlay_examples
plugin_development
analysis_plugin_development
maintenance_and_development
CIS as a Python library (API) <api/cis>

202 changes: 79 additions & 123 deletions doc/maintenance_and_development.rst
@@ -2,12 +2,34 @@
Maintenance and Developer Guide
===============================

Source files
============

The CIS source code is hosted at https://github.com/cedadev/jasmin_cis.git, while the conda recipes and other files are
hosted here: https://github.com/cistools.

Test suites
===========

The unit test suite can be run readily using Nose. Go to the root of the repository (i.e. cis) and type
``nosetests cis/test/unit`` to run the full suite of tests.

A comprehensive set of integration tests is also provided. The test data is kept in the folder
``/group_workspaces/jasmin/cis/cis_repo_test_files``, which has also been compressed and is available as a tar file
inside that folder.

To add files to the folder, simply copy them in, then delete the old tar file and create a new one with::

    tar --dereference -zcvf cis_repo_test_files.tar.gz .

Ignore the warning about a file changing; it appears because the tar file is in the directory being archived. Having
the tar file in the directory, however, means the archive can be easily unpacked without creating an intermediate folder.
To run the integration tests, copy the archive to the local machine and decompress it. Then set the
environment variable ``CIS_DATA_HOME`` to the location of the data sets, and run ``nosetests cis/test/integration``.
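
For anyone who prefers to rebuild the archive from Python, the following sketch uses the standard library ``tarfile`` module. It is an alternative to the ``tar`` command above rather than part of the CIS tooling; the function name ``rebuild_archive`` is hypothetical, and unlike the shell command it explicitly leaves the archive out of itself.

```python
import os
import tarfile

ARCHIVE_NAME = "cis_repo_test_files.tar.gz"


def rebuild_archive(folder):
    """Delete any old archive in `folder` and create a fresh one in place."""
    archive_path = os.path.join(folder, ARCHIVE_NAME)
    if os.path.exists(archive_path):
        os.remove(archive_path)

    def skip_archive(tarinfo):
        # The archive sits inside the folder being archived, so leave it out.
        if os.path.basename(tarinfo.name) == ARCHIVE_NAME:
            return None
        return tarinfo

    # dereference=True stores the targets of symbolic links, like --dereference.
    with tarfile.open(archive_path, "w:gz", dereference=True) as archive:
        # arcname="." keeps member paths relative, so unpacking does not
        # create an intermediate folder.
        archive.add(folder, arcname=".", filter=skip_archive)
    return archive_path
```

Calling ``rebuild_archive("/group_workspaces/jasmin/cis/cis_repo_test_files")`` would then replace the tar file in that folder.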

There are also a number of plot tests available under the ``test/plot_tests`` directory which can be run using
the ``run_all.sh`` script. These perform a diff of some standard plots against reference plots; however, small changes
in the platform libraries and fonts can break these tests, so they shouldn't be relied on.


Dependencies
@@ -19,8 +41,30 @@ A graph representing the dependency tree can be found at ``doc/cis_dependency.do
:width: 900px


Creating a Release
==================

To carry out intermediate releases follow this procedure:

1. Check that the version number and status are updated in the CIS source code (``cis/__init__.py``).

2. Tag the new version on GitHub with the new version number and release notes.

3. Create a tarball: run ``python setup.py egg_info sdist`` in the cis root directory.

4. Install this onto the release virtual environment, which is at ``/group_workspaces/jasmin/cis/cis_dev_venv``: activate
   the venv, upload the tarball somewhere on the GWS and then run ``pip install <LOCATION_OF_TARBALL>``.

5. Create an Anaconda build - see below.

6. Request that Phil Kershaw upload the tarball to PyPI. (Optional)

For a release onto JASMIN, complete the steps above and then ask Alan Iwi to produce an RPM, deploy it on a
test VM, confirm functionality and then roll it out across the full JAP and LOTUS nodes.
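
The first step of the procedure could be checked with a short script along these lines. This is a sketch under stated assumptions: the variable names ``__version__`` and ``__status__`` are assumed rather than taken from this guide, and ``read_version_and_status`` is a hypothetical helper that works on the text of the file rather than importing the package.

```python
import re


def read_version_and_status(init_source):
    """Extract string assignments to __version__ and __status__ from source text.

    The two variable names are assumptions about cis/__init__.py.
    """
    found = {}
    for name in ("__version__", "__status__"):
        pattern = r'^%s\s*=\s*["\']([^"\']+)["\']' % re.escape(name)
        match = re.search(pattern, init_source, re.MULTILINE)
        found[name] = match.group(1) if match else None
    return found


example_source = '__author__ = "someone"\n__version__ = "1.2.0"\n__status__ = "Dev"\n'
info = read_version_and_status(example_source)
```

In practice the source text would come from reading ``cis/__init__.py``, and a ``None`` entry would indicate that the assumed variable name was not found.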


Anaconda Build
--------------

The Anaconda build recipes for CIS, and for the dependencies which can't be found either in the core channel or in SciTools, are stored in their own GitHub repository `here <https://github.com/cistools/conda-recipes>`_.
To build a new CIS package, clone the conda-recipes repository and then run the following command::
@@ -47,134 +91,46 @@ This will output the documentation in html under the directory ``doc/_build/html

.. _analysis_plugin_development:

Continuous Integration Server
=============================
JASMIN provide a Jenkins CI Server on which the CIS unit and integration tests are run whenever origin/master is updated.
The integration tests take approximately 7 hours to run whilst the unit tests take about 5s. The Jenkins server is
hosted on jasmin-sci1-dev at ``/var/lib/jenkins`` and is accessed at http://jasmin-sci1-dev.ceda.ac.uk:8080/

We also have a Travis cloud instance (https://travis-ci.org/cedadev/cis) which in principle allows us to build and test
on both Linux and OS X. There are unit test builds currently working but because of a hard time limit on builds (120
minutes) the integration tests don't currently run.

Copying files to the CI server
------------------------------

The contents of the test folder will not be automatically copied across to the Jenkins directory, so if you add any
files to the folder you'll need to manually copy them to the Jenkins directory or the integration tests will fail. The
directory is ``/var/lib/jenkins/workspace/CIS Integration Tests/cis/test/test_files/``. This is not entirely simple
because:

* We don't have write permissions on the test folder
* Jenkins doesn't have read permissions for the CIS group_workspace

In order to copy files across we have done the following:

1. Copy the files we want to /tmp

2. Open up the CIS Integration Tests webpage and click 'Configure'

3. Scroll down to 'Build' where the shell script to be executed is found and insert a line to copy the file to the
   directory, e.g. ``cp /tmp/file.nc "/var/lib/jenkins/workspace/CIS Integration Tests/cis/test/test_files"``

4. Run the CIS Integration Tests

5. Remove the line from the build script

6. Remove the files from /tmp

Problems with Jenkins
---------------------

Sometimes the Jenkins server experiences problems which make it unusable. One particular issue we've encountered more
than once is that Jenkins occasionally loses all its stylesheets and then becomes impossible to use. Asking CEDA support
(or Phil Kershaw) to restart Jenkins should solve this.
