Merge pull request #43 from DassHydro-dev/maintenance/0.5.x
Merge maintenance/0.5.x into main - version 0.5.0
nghi-truyen authored Jul 12, 2023
2 parents 05cc1b7 + bd29487 commit 9a7ef92
Showing 33 changed files with 2,124 additions and 265 deletions.
5 changes: 5 additions & 0 deletions doc/source/api_reference/hdf5_handler.rst
@@ -0,0 +1,5 @@
.. _api_reference.hdf5_handler:

============
Hdf5 handler
============
5 changes: 5 additions & 0 deletions doc/source/api_reference/hdf5_io.rst
@@ -0,0 +1,5 @@
.. _api_reference.hdf5_io:

=======
Hdf5 io
=======
6 changes: 5 additions & 1 deletion doc/source/api_reference/index.rst
@@ -20,6 +20,10 @@ Core Python
io
generate_samples
sparse_storage
hdf5_io
raster_handler
hdf5_handler
object_handler


Wrapped Fortran
@@ -28,4 +32,4 @@ Wrapped Fortran
:maxdepth: 1

derived_type
optimize_routines
optimize_routines
5 changes: 5 additions & 0 deletions doc/source/api_reference/object_handler.rst
@@ -0,0 +1,5 @@
.. _api_reference.object_handler:

==============
Object handler
==============
26 changes: 26 additions & 0 deletions doc/source/api_reference/raster_handler.rst
@@ -0,0 +1,26 @@
.. _api_reference.raster_handler:

==============
Raster handler
==============

.. currentmodule:: smash.tools.raster_handler

Some functions to manipulate raster files
*****************************************
.. autosummary::
:toctree: smash/

gdal_raster_open
gdal_read_windowed_raster
gdal_reproject_raster
gdal_crop_dataset_to_array
gdal_crop_dataset_to_ndarray
gdal_write_dataset
gdal_get_geotransform
gdal_smash_window_from_geotransform
union_bbox
get_bbox
get_bbox_from_window
get_window_from_bbox
crop_array
@@ -59,6 +59,5 @@ For :math:`t_{j}\in E`:
.. note::

If there exists :math:`m+1` :math:`(m>0)` consecutive events :math:`(sd_{u},ed_{u}),...,(sd_{u+m},ed_{u+m})`
occurring "nearly simultaneously", that means all of these events
occur in no more than ``max_duration`` hours: :math:`ed_{u+m}<sd_{u}+` ``max_duration``, then we
merge these :math:`m+1` events into a single event :math:`(sd_{u},ed_{u+m})`.
occurring "nearly simultaneously", then we merge these :math:`m+1` events into a single event :math:`(sd_{u},ed_{u+m})`.
Note that the duration of the merged event may exceed the specified maximum duration of ``max_duration`` hours.
122 changes: 122 additions & 0 deletions doc/source/release/0.5.0-notes.rst
@@ -0,0 +1,122 @@
.. _release.0.5.0-notes:

.. currentmodule:: smash

=========================
smash 0.5.0 Release Notes
=========================

The smash 0.5.0 release continues the ongoing work to improve the package's usability, fix bugs and clarify the documentation. The highlights are:

- Reading atmospheric data with YYYY/MM/DD access
- New regularization function
- Spatial disaggregation/aggregation of the input raster
- Fix bugs in mesh creation, signature calculation and regularization l-curve

------------
Contributors
------------

This release was made possible thanks to the contributions of:

- Maxime Jay-Allemand
- Ngo Nghi Truyen Huynh
- François Colleoni

------------
Deprecations
------------

BayesResult object
******************

The ``density`` attribute of the :class:`smash.BayesResult` object has been deprecated in preparation for the upcoming release 1.0.0.
The other two attributes, ``data`` and ``lcurve``, are still available and can be used for further analysis.

------------
Improvements
------------

Reading atmospheric data with YYYY/MM/DD access
***********************************************

This mode is triggered by enabling the ``prcp_yyyymmdd_access`` flag in the model setup file. The atmospheric data files are expected to be stored in a ``YYYY/MM/DD`` directory tree. This option is useful when the model is run time step by time step (many incremental runs): in that case, searching for the atmospheric data files can be relatively slow (about 1 second multiplied by the number of runs). With this mode enabled, the file lookup is optimized and faster.
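
As an illustration, a minimal setup sketch is shown below (as a Python dictionary); only ``prcp_yyyymmdd_access`` is introduced by this release, the other keys and values are assumptions used to make the example self-contained.

.. code-block:: python

    # Hypothetical setup excerpt; key names other than prcp_yyyymmdd_access are illustrative.
    setup = {
        "start_time": "2014-09-15 00:00",
        "end_time": "2014-09-16 00:00",
        "read_prcp": True,
        "prcp_directory": "./prcp",     # files expected under ./prcp/YYYY/MM/DD/
        "prcp_yyyymmdd_access": True,   # enable the YYYY/MM/DD directory lookup
    }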

------------
New Features
------------

New regularization function
***************************

``hard_smoothing``: the smoothing regularization function is applied directly to the parameters or states. This behavior differs from the ``smoothing`` mode, where the regularization is applied to the difference between the background and the control (parameters or states).
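
A schematic way to contrast the two modes is given below; the notation (:math:`\rho` the control, :math:`\rho_b` the background, :math:`D` a spatial smoothing operator) is an illustrative assumption and not necessarily the exact formulation used in smash.

.. math::

    J_{reg}^{\mathrm{smoothing}} = \left\| D \left( \rho - \rho_b \right) \right\|^2
    , \qquad
    J_{reg}^{\mathrm{hard\_smoothing}} = \left\| D \, \rho \right\|^2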

New functions for reading and writing HDF5 files
*************************************************

The new functions are generic: you can save a dictionary to an HDF5 file, save an object (not only a smash model) to an HDF5 file, read an object back as a dictionary, read an HDF5 file as a dictionary, or read an HDF5 file as a smash Model object. These functions are provided by ``smash.io.hdf5_io``. HDF5 files can be opened in read-only mode, which allows several simultaneous accesses. During export and reading, the structure of the dictionary or object is preserved. When saving an object or a dictionary to an HDF5 file, the location can be specified so that several dictionaries or objects can be stored side by side at different places.
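
For instance, a minimal sketch of a save/reload round trip with the two functions exported at the package level (argument order, keyword names and the demo dataset name are assumptions; see the API Reference for the actual signatures):

.. code-block:: python

    import smash

    # Build a model from a demo dataset (dataset name is illustrative).
    setup, mesh = smash.load_dataset("cance")
    model = smash.Model(setup, mesh)

    # Save the smash model to an HDF5 file, then read the file back
    # (by default as a dictionary-like structure; assumption).
    smash.save_smash_model_to_hdf5("model.hdf5", model)
    reloaded = smash.load_hdf5_file("model.hdf5")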

Spatial disaggregation/aggregation of the input raster
******************************************************

If the resolution of the input rasters is different from the resolution of the model mesh, the input rasters are automatically reprojected with GDAL. In that case, reading the input data can be slower. For best performance, it can be useful to preprocess the input files (e.g. the precipitation rasters) beforehand.

See API Reference section :ref:`api_reference.raster_handler`.
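
As an example, the precipitation rasters could be resampled to the mesh resolution once, ahead of the runs, with plain GDAL (file names and the target resolution below are illustrative):

.. code-block:: python

    from osgeo import gdal

    # One-off preprocessing sketch: resample a parent precipitation raster to the
    # model mesh resolution so smash does not have to reproject it at every read.
    gdal.Warp(
        "prcp_1000m.tif",       # output raster aligned with the mesh resolution
        "prcp_raw.tif",         # parent precipitation raster
        xRes=1000.0,
        yRes=1000.0,            # illustrative 1 km mesh resolution
        resampleAlg="average",  # aggregation; a disaggregation would rather use "bilinear"
    )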

-----
Fixes
-----

Boundary conditions checking
****************************

The boundary condition checking previously used a tolerance of 1e-6, which caused issues in certain cases due to machine precision loss when values are passed from Python to Fortran via the f90 wrapper.
To address this problem, the tolerance has been relaxed to 1e-3.

See issue `#23 <https://github.com/DassHydro-dev/smash/issues/23>`__.
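
Assuming the wrapped Fortran code stores these values in single precision, the precision loss can be reproduced with a short sketch (the value is illustrative):

.. code-block:: python

    import numpy as np

    x = 750.0001                # a parameter value close to a bound (illustrative)
    x32 = np.float32(x)         # single-precision copy (assumption about the Fortran side)
    print(abs(float(x32) - x))  # ~2e-05: above the old 1e-6 tolerance, below the new 1e-3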

Bug fixes when generating the l-curve
**************************************

Issues related to the selection of the optimal weight of the regularization term have been solved.

Event signatures computation
****************************

A bug related to the computation of flood event signatures has been fixed for the specific cases where the flow peak is observed during the last time steps of the study time window.

See issue `#28 <https://github.com/DassHydro-dev/smash/issues/28>`__.

Segmentation algorithm
**********************

If multiple events are merged into a single one, the duration of the merged event is no longer constrained by the ``max_duration`` parameter and may exceed this value.

Catchment delineation segmentation fault
****************************************

An error occurred when two neighboring cells had antagonistic flow directions (``(1, 5)``, ``(2, 6)``, ``(3, 7)`` or ``(4, 8)``). Such inconsistencies should be corrected directly in the flow direction file, but to avoid
segmentation faults when the maximum recursion depth is reached, a check has been added to the code to exit the recursion in that case.

See issue `#31 <https://github.com/DassHydro-dev/smash/issues/31>`__.
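
A minimal sketch of such a consistency check is given below, assuming the D8 convention in which directions ``d`` and ``d + 4`` point in opposite ways (hence the pairs listed above); the helper name is hypothetical.

.. code-block:: python

    # Two D8 flow directions are antagonistic when they are opposite, i.e. the
    # pairs (1, 5), (2, 6), (3, 7) and (4, 8). Two neighboring cells draining into
    # each other would make the upstream-cell recursion loop forever, hence the
    # early exit added to the delineation code.
    def is_antagonistic(d_cell: int, d_neighbor: int) -> bool:
        return abs(d_cell - d_neighbor) == 4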

Catchment flow distances on adjacent non-nested catchments
**********************************************************

There was a bug in the calculation of flow distances when two adjacent but non-nested catchments were included in the mesh. During the calculation, a flag was set on the 8 cells adjacent to each upstream cell rather than on the upstream cell itself. As a result, a gauge located right next to a cell of another catchment was not considered a non-nested gauge and its flow distances were filled with -99. The bug has been fixed by flagging only the upstream cell and not its 8 adjacent cells.

See issue `#38 <https://github.com/DassHydro-dev/smash/issues/38>`__.

Correctly handle Nodata value during the spatial disaggregation of the rainfall
*******************************************************************************

A crash occurred during the spatial disaggregation of the rainfall: the creation of a GDAL virtual destination failed when the parent GeoTIFF file had its Nodata value unset (``None``). When this is the case, the Nodata value of the disaggregated rainfall is now automatically set to -99.

See issue `#40 <https://github.com/DassHydro-dev/smash/issues/40>`__.
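
A quick way to check whether a parent GeoTIFF has its Nodata value unset is sketched below with plain GDAL calls (the file name is illustrative):

.. code-block:: python

    from osgeo import gdal

    ds = gdal.Open("prcp_parent.tif")
    band = ds.GetRasterBand(1)

    if band.GetNoDataValue() is None:
        # Unset Nodata on the parent file: smash now falls back to -99 for the
        # disaggregated rainfall instead of crashing.
        print("Nodata is unset; -99 will be used")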

Stop the execution of smash when ``start_time`` is equal to ``end_time``
************************************************************************

When ``start_time`` is equal to ``end_time``, the code used to crash during data reading for no obvious reason. The execution is now stopped and an error is returned when this case occurs.

See issue `#41 <https://github.com/DassHydro-dev/smash/issues/41>`__.
1 change: 1 addition & 0 deletions doc/source/release/index.rst
@@ -7,6 +7,7 @@ Release notes
.. toctree::
:maxdepth: 3

0.5.0 <0.5.0-notes>
0.4.2 <0.4.2-notes>
0.4.1 <0.4.1-notes>
0.4.0 <0.4.0-notes>
4 changes: 2 additions & 2 deletions doc/source/user_guide/in_depth/optimize/bayes_optimize.rst
@@ -130,8 +130,6 @@ It can be implemented using the :class:`smash.Model.bayes_optimize` method as fo
return_br=True
)
model_bo.output.cost # cost value with HDBC
.. ipython:: python
:verbatim:
@@ -144,6 +142,8 @@ It can be implemented using the :class:`smash.Model.bayes_optimize` method as fo
return_br=True
)
.. ipython:: python
model_bo.output.cost # cost value with HDBC
.. note::
1 change: 1 addition & 0 deletions setup.py
@@ -15,6 +15,7 @@
"smash.solver",
"smash.mesh",
"smash.io",
"smash.tools",
"smash.dataset",
"smash.tests",
"smash.tests.core",
3 changes: 3 additions & 0 deletions smash/__init__.py
@@ -13,6 +13,7 @@
from smash.io.mesh_io import save_mesh, read_mesh
from smash.io.model_io import save_model, read_model
from smash.io.model_ddt_io import save_model_ddt, read_model_ddt
from smash.io.hdf5_io import save_smash_model_to_hdf5, load_hdf5_file

from smash.dataset.load import load_dataset

@@ -44,6 +45,8 @@ def __getattr__(name):
"read_model",
"save_model_ddt",
"read_model_ddt",
"save_smash_model_to_hdf5",
"load_hdf5_file",
"load_dataset",
]

4 changes: 2 additions & 2 deletions smash/core/_build_model.py
@@ -83,9 +83,9 @@ def _standardize_setup(setup: SetupDT):
except:
raise ValueError("argument end_time is not a valid date")

if (et - st).total_seconds() < 0:
if (et - st).total_seconds() <= 0:
raise ValueError(
"argument end_time corresponds to an earlier date than start_time"
"argument end_time is a date earlier to or equal to argument start_time"
)

if setup.read_qobs and setup.qobs_directory == "...":
1 change: 1 addition & 0 deletions smash/core/_constant.py
@@ -119,6 +119,7 @@
JREG_FUN = [
"prior",
"smoothing",
"hard_smoothing",
]

AUTO_WJREG = ["fast", "lcurve"]
39 changes: 25 additions & 14 deletions smash/core/_event_segmentation.py
@@ -213,8 +213,8 @@ def _events_grad(
ind = _detect_peaks(q, mph=np.quantile(q[q > 0], peak_quant))
list_events = []

for i in ind:
p_search = p[range(max(i - start_seg, 0), i)]
for i_peak in ind:
p_search = p[range(max(i_peak - start_seg, 0), i_peak)]
p_search_grad = np.gradient(p_search)

ind_start = _detect_peaks(
@@ -236,41 +236,52 @@

ind_start_minq = ind_start[0]

start = ind_start_minq + max(i - start_seg, 0)
start = ind_start_minq + max(i_peak - start_seg, 0)

peakp = _detect_peaks(p[start:i], mpd=len(p))
peakp = _detect_peaks(p[start:i_peak], mpd=len(p))

if peakp.size == 0:
peakp = np.argmax(p[start:i]) + start
peakp = np.argmax(p[start:i_peak]) + start

else:
peakp = peakp[0] + start

qbf = _baseflow_separation(q[i - 1 : start + max_duration + end_search - 1])[0]
fwindow = min(
start + max_duration + end_search, q.size
) # index for determining the end of dflow windows

dflow = q[i - 1 : start + max_duration + end_search - 1] - qbf
dflow = np.array([sum(i) for i in zip(*(dflow[i:] for i in range(end_search)))])
if fwindow <= i_peak: # reject peak at the last time step
continue

end = i + np.argmin(dflow)
qbf = _baseflow_separation(q[i_peak - 1 : fwindow - 1])[0]

dflow = q[i_peak - 1 : fwindow - 1] - qbf
dflow_windows = (dflow[i:] for i in range(min(end_search, dflow.size)))
dflow = np.array([sum(i) for i in zip(*dflow_windows)])

end = i_peak + np.argmin(dflow)

if len(list_events) > 0:
prev_start = list_events[-1]["start"]
prev_end = list_events[-1]["end"]
prev_peakq = list_events[-1]["peakQ"]
prev_peakp = list_events[-1]["peakP"]

# % merge two events respecting to max duration:
if max(end, prev_end) <= prev_start + max_duration:
# % detect double events:
if prev_end >= start:
list_events[-1]["end"] = max(end, prev_end)
list_events[-1]["start"] = min(start, prev_start)

if q[i] > q[prev_peakq]:
list_events[-1]["peakQ"] = i
if q[i_peak] > q[prev_peakq]:
list_events[-1]["peakQ"] = i_peak

if p[peakp] > p[prev_peakp]:
list_events[-1]["peakP"] = peakp
continue

list_events.append({"start": start, "end": end, "peakP": peakp, "peakQ": i})
list_events.append(
{"start": start, "end": end, "peakP": peakp, "peakQ": i_peak}
)

return list_events
