DOC: User Guide Split sample test

Add user guide split sample test
DassHydro · Jan 25, 2024 · b90b7d7 · b90b7d7
1 parent 56178d8
commit b90b7d7
Show file tree

Hide file tree

Showing 4 changed files with 350 additions and 1 deletion.
diff --git a/doc/source/_static/lez.png b/doc/source/_static/lez.png
diff --git a/doc/source/user_guide/index.rst b/doc/source/user_guide/index.rst
@@ -11,3 +11,4 @@ Quickstart
 
     quickstart/cance_first_simulation
     quickstart/france_large_domain_simulation
+    quickstart/lez_split_sample_test
diff --git a/doc/source/user_guide/quickstart/france_large_domain_simulation.rst b/doc/source/user_guide/quickstart/france_large_domain_simulation.rst
@@ -96,7 +96,7 @@ simulation period is different.
 Model mesh creation
 *******************
 
-For the meshing, we only need the flow direction file and the mainland France bouding box ``bbox`` to pass to the `smash.factory.generate_mesh`
+For the ``mesh``, we only need the flow direction file and the mainland France bouding box ``bbox`` to pass to the `smash.factory.generate_mesh`
 function. A bouding box in `smash` is a list of 4 values (``xmin``, ``xmax``, ``ymin``, ``ymax``), each of which corresponds respectively to 
 the x minimum value, the x maximum value, the y mimimum value and the y maximum value. The values must be in the same unit and projection as the 
 flow direction.

diff --git a/doc/source/user_guide/quickstart/lez_split_sample_test.rst b/doc/source/user_guide/quickstart/lez_split_sample_test.rst
@@ -0,0 +1,348 @@
+.. _user_guide.quickstart.lez_split_sample_test:
+
+=======================
+Lez - Split Sample Test
+=======================
+
+This third and last quickstart tutorial on `smash` will be carried out on another French catchment, **the Lez at Lattes**. The objective is to 
+set up a split sample test :cite:p:`klemevs1986operational` over two distinct periods ``p1`` and ``p2``.
+
+.. image:: ../../_static/lez.png
+    :width: 400
+    :align: center
+
+You need first to download all the required data.
+
+.. button-link:: https://smash.recover.inrae.fr/dataset/Lez-dataset.tar
+    :color: primary
+    :shadow:
+    :align: center
+
+    **Download**
+
+If the download was successful, a file named ``Lez-dataset.tar`` should be available. We can switch to the directory where this file has been 
+downloaded and extract it using the following command:
+
+.. code-block:: shell
+
+    tar xf Lez-dataset.tar
+
+Now a folder called ``Lez-dataset`` should be accessible and contain the following files and folders:
+
+- ``France_flwdir.tif``
+    A GeoTiff file containing the flow direction data,
+- ``gauge_attributes.csv``
+    A csv file containing the gauge attributes (gauge coordinates, drained area and code),
+- ``prcp``
+    A directory containing precipitation data in GeoTiff format with the following directory structure: ``%Y/%m`` 
+    (``2012/08``),
+- ``pet``
+    A directory containing daily interannual potential evapotranspiration data in GeoTiff format,
+- ``qobs``
+    A directory containing the observed discharge data in csv format,
+- ``descriptor``
+    A directory containing physiographic descriptors in GeoTiff format.
+
+We can open a Python interface in the **conda environment**. The current working directory will be assumed to be the directory where 
+the ``Lez-dataset`` is located.
+
+.. code-block:: shell
+
+    (smash) python3
+
+.. ipython:: python
+    :suppress:
+
+    import os
+    os.system("python3 gen_dataset.py -d Lez")
+
+Imports
+-------
+
+We will first import everything we need in this tutorial.
+
+.. ipython:: python
+
+    import smash
+    import numpy as np
+    import pandas as pd
+    import matplotlib.pyplot as plt
+
+Model creation
+--------------
+
+Model setup creation
+********************
+
+Since we are going to work on two different periods, each of 6 months, we need to create two ``setups`` dictionaries where the only difference 
+will be in the simulation period arguments ``start_time`` and ``end_time``. The first period ``p1`` will run from ``2012-08-01`` to
+``2013-01-31`` and the second, from ``2013-02-01`` to ``2013-07-31``.
+
+.. ipython:: python
+
+    setup_p1 = {
+        "start_time": "2012-08-01",
+        "end_time": "2013-01-31",
+        "dt": 86_400, # daily time step
+        "hydrological_module": "gr4", 
+        "routing_module": "lr",
+        "read_prcp": True, 
+        "prcp_directory": "./Lez-dataset/prcp", 
+        "read_pet": True,  
+        "pet_directory": "./Lez-dataset/pet",
+        "read_qobs": True,
+        "qobs_directory": "./Lez-dataset/qobs"
+    }
+
+    setup_p2 = {
+        "start_time": "2013-02-01",
+        "end_time": "2013-07-31",
+        "dt": 86_400, # daily time step
+        "hydrological_module": "gr4", 
+        "routing_module": "lr",
+        "read_prcp": True, 
+        "prcp_directory": "./Lez-dataset/prcp", 
+        "read_pet": True,  
+        "pet_directory": "./Lez-dataset/pet",
+        "read_qobs": True,
+        "qobs_directory": "./Lez-dataset/qobs" 
+    }
+
+Model mesh creation
+*******************
+
+For the ``mesh``, there is no need to generate two different ``meshes`` dictionaries, the same one can be used for both periods. Similar to the 
+**Cance** tutorial, we are going to create a ``mesh`` from the attributes of the catchment gauges.
+
+.. ipython:: python
+
+    gauge_attributes = pd.read_csv("./Lez-dataset/gauge_attributes.csv")
+
+    mesh = smash.factory.generate_mesh(
+        flwdir_path="./Lez-dataset/France_flwdir.tif",
+        x=list(gauge_attributes["x"]),
+        y=list(gauge_attributes["y"]),
+        area=list(gauge_attributes["area"] * 1e6), # Convert km² to m²
+        code=list(gauge_attributes["code"]),
+    )
+
+And quickly see that the generated ``mesh`` is correct
+
+.. ipython:: python
+
+    plt.imshow(mesh["flwdst"]);
+    plt.colorbar(label="Flow distance (m)");
+    @savefig user_guide.quickstart.lez_split_sample_test.flwdst.png
+    plt.title("Lez - Flow distance");
+
+.. ipython:: python
+
+    plt.imshow(mesh["flwacc"]);
+    plt.colorbar(label="Flow accumulation (m²)");
+    @savefig user_guide.quickstart.lez_split_sample_test.flwacc.png
+    plt.title("Lez - Flow accumulation");
+
+.. ipython:: python
+
+    base = np.zeros(shape=(mesh["nrow"], mesh["ncol"]))
+    base = np.where(mesh["active_cell"] == 0, np.nan, base)
+    for pos in mesh["gauge_pos"]:
+        base[pos[0], pos[1]] = 1
+    plt.imshow(base, cmap="Set1_r");
+    @savefig user_guide.quickstart.lez_split_sample_test.gauge_position.png
+    plt.title("Lez - Gauge position");
+
+Then, we can initialize the two `smash.Model` objects
+
+.. ipython:: python
+
+    model_p1 = smash.Model(setup_p1, mesh)
+    model_p2 = smash.Model(setup_p2, mesh)
+
+Model simulation
+----------------
+
+Optimization
+************
+
+First, we will optimize both models for each period to generate two sets of optimized rainfall-runoff parameters.
+So far, to optimize, we have called the method associated with the `smash.Model` object `Model.optimize <smash.Model.optimize>`. This method
+will modify the associated object in place (i.e. the values of the rainfall-runoff parameters after calling this function are modified). Here, we
+want to optimize the model but still keep this model to run the validation afterwards. To do this, instead of calling the method
+`Model.optimize <smash.Model.optimize>` method, we can call the `smash.optimize` function, which is identical but takes a
+`smash.Model` object as input and returns a copy of it. This method allows you to optimize a `smash.Model` object and store the results in 
+another object without modifying the initial one.
+
+Similar to the **Cance** tutorial, we will perform a simple spatially uniform optimization of the rainfall-runoff parameters by minimizing the
+Nash-Sutcliffe efficiency on the most downstream gauge.
+
+.. To speed up documentation generation
+.. ipython:: python
+    :suppress:
+
+    ncpu = min(5, max(1, os.cpu_count() - 1))
+    model_p1_opt = smash.optimize(model_p1, common_options={"ncpu": ncpu})
+    model_p2_opt = smash.optimize(model_p2, common_options={"ncpu": ncpu})
+
+.. ipython:: python
+    :verbatim:
+
+    model_p1_opt = smash.optimize(model_p1)
+    model_p2_opt = smash.optimize(model_p2)
+
+We can take a look at the hydrographs and optimized rainfall-runoff parameters.
+
+.. ipython:: python
+
+    code = model_p1.mesh.code[0]
+
+    f, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4));
+
+    qobs = model_p1_opt.response_data.q[0,:].copy()
+    qobs = np.where(qobs < 0, np.nan, qobs) # To deal with missing values
+    qsim = model_p1_opt.response.q[0,:]
+    ax1.plot(qobs);
+    ax1.plot(qsim);
+    ax1.grid(ls="--", alpha=.7);
+    ax1.set_xlabel("Time step");
+    ax1.set_ylabel("Discharge ($m^3/s$)");
+
+    qobs = model_p2_opt.response_data.q[0,:].copy()
+    qobs = np.where(qobs < 0, np.nan, qobs) # To deal with missing values
+    qsim = model_p2_opt.response.q[0,:]
+    ax2.plot(qobs, label="Observed discharge");
+    ax2.plot(qsim, label="Simulated discharge");
+    ax2.grid(ls="--", alpha=.7);
+    ax2.set_xlabel("Time step");
+    ax2.legend();
+
+    @savefig user_guide.quickstart.lez_split_sample_test.optimize_q.png
+    f.suptitle(
+        f"Observed and simulated discharge at gauge {code}"
+        " for period p1 (left) and p2 (right)\nCalibration"
+    );
+
+.. ipython:: python
+
+    ind = tuple(model_p1.mesh.gauge_pos[0, :])
+
+    opt_parameters_p1 = {
+        k: model_p1_opt.get_rr_parameters(k)[ind] for k in ["cp", "ct", "kexc", "llr"]
+    } # A dictionary comprehension
+
+    opt_parameters_p2 = {
+        k: model_p2_opt.get_rr_parameters(k)[ind] for k in ["cp", "ct", "kexc", "llr"]
+    } # A dictionary comprehension
+
+    opt_parameters_p1
+    opt_parameters_p2
+
+Temporal validation
+*******************
+
+Rainfall-runoff parameters transfer
+'''''''''''''''''''''''''''''''''''
+
+We can now transfer the optimized rainfall-runoff parameters from each period to their respective validation period. 
+We will transfer the rainfall-runoff parameters from ``model_p1_opt`` to ``model_p2`` and from ``model_p2_opt`` to ``model_p1``. 
+There are several ways to do this:
+
+- Transfer all rainfall-runoff parameters at once
+    All rainfall-runoff parameters are stored in the variable ``values`` of the object `Model.rr_parameters <smash.Model.rr_parameters>`. 
+    We can therefore pass the whole array of rainfall-runoff parameters from one object to the other.
+
+    .. ipython:: python
+
+        model_p1.rr_parameters.values = model_p2_opt.rr_parameters.values.copy()
+        model_p2.rr_parameters.values = model_p1_opt.rr_parameters.values.copy()
+
+    .. note::
+        A deep copy is recommended to avoid that the rainfall-runoff parameters between each object become shallow copies and
+        so that the modification of one of the arrays leads to the modification of another.
+
+- Transfer each rainfall-runoff parameter one by one
+    It is also possible to loop on each rainfall-runoff parameter and assign new rainfall-runoff parameter by passing
+    by getters and setters
+
+    .. ipython:: python
+
+        for key in model_p1.rr_parameters.keys:
+            model_p1.set_rr_parameters(key, model_p2_opt.get_rr_parameters(key))
+            model_p2.set_rr_parameters(key, model_p1_opt.get_rr_parameters(key))
+
+    .. note::
+        This method allows, instead of looping on all rainfall-runoff parameters, to loop only on some. We can replace
+        ``model_p1.rr_parameters.keys`` by ``["cp", "ct"]`` for example
+
+Forward run
+'''''''''''
+
+Once the rainfall-runoff parameters have been transferred, we can proceed with the validation forward runs by calling the 
+`Model.forward_run <smash.Model.forward_run>` method.
+
+.. ipython:: python
+
+    model_p1.forward_run()
+    model_p2.forward_run()
+
+and visualize hydrographs
+
+.. ipython:: python
+
+    code = model_p1.mesh.code[0]
+
+    f, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4));
+
+    qobs = model_p1.response_data.q[0,:].copy()
+    qobs = np.where(qobs < 0, np.nan, qobs) # To deal with missing values
+    qsim = model_p1.response.q[0,:]
+    ax1.plot(qobs);
+    ax1.plot(qsim);
+    ax1.grid(ls="--", alpha=.7);
+    ax1.set_xlabel("Time step");
+    ax1.set_ylabel("Discharge ($m^3/s$)");
+
+    qobs = model_p2.response_data.q[0,:].copy()
+    qobs = np.where(qobs < 0, np.nan, qobs) # To deal with missing values
+    qsim = model_p2.response.q[0,:]
+    ax2.plot(qobs, label="Observed discharge");
+    ax2.plot(qsim, label="Simulated discharge");
+    ax2.grid(ls="--", alpha=.7);
+    ax2.set_xlabel("Time step");
+    ax2.legend();
+
+    @savefig user_guide.quickstart.lez_split_sample_test.forward_run_q.png
+    f.suptitle(
+        f"Observed and simulated discharge at gauge {code}"
+        " for period p1 (left) and p2 (right)\nValidation"
+    );
+
+Model performances
+------------------
+
+Finally, we can look at calibration and validation performances using certain metrics. Using the function `smash.metrics`,
+you can compute one metric of your choice (among those available) for all the gauges that make up the ``mesh``. Here, we are interested 
+in the ``nse`` (the calibration metric) and the ``kge`` for the downstream gauge only. We will create two `pandas.DataFrame`, one for the 
+calibration performances and the other for the validation performances.
+
+.. ipython:: python
+
+    metrics = ["nse", "kge"]
+    perf_cal = pd.DataFrame(index=["p1", "p2"], columns=metrics)
+    perf_val = perf_cal.copy()
+
+    for m in metrics:
+        perf_cal.loc["p1", m] = np.round(smash.metrics(model_p1_opt, metric=m)[0], 2)
+    
+    for m in metrics:
+        perf_cal.loc["p2", m] = np.round(smash.metrics(model_p2_opt, metric=m)[0], 2)
+    
+    for m in metrics:
+        perf_val.loc["p1", m] = np.round(smash.metrics(model_p1, metric=m)[0], 2)
+    
+    for m in metrics:
+        perf_val.loc["p2", m] = np.round(smash.metrics(model_p2, metric=m)[0], 2)
+
+    perf_cal # Calibration performances
+
+    perf_val # Validation performances