ENH: add adaptive optimizers for all mappings (#315)
* initialize branch combine optimizers

* ENH: add set, get and forward_pass methods for Net

* ENH: add test net forward pass + fix doc net

* Fix indent docstring net

* add weight/bias_shape into trainable layer

* add weight/bias_shape into trainable layer

* Remove redundant code in Python function sbs_optimize

* adaptive optimizers for all mappings

* finalize optimizer combination

* Fix parameter update when using early stopping + dic api doc optimize

* Correct comment typo

* Fix comment

* Change x_train name to x in net

* retrieve key x in control_info + fix api doc for control prior name

* Fix dtype for control_info to which a finalization Python function is applied

* improve unbounded check for finalize_get_control_info function

* Generate baseline

* Fix errors that occurred when merging branch

* Fix callback argument in _gradient_based_optimize_problem

* Fix doc net

* Fix raise message net.set_weight_bias

* Fix error in previous merge

* Add choices to raise error when using sbs optimizer for hybrid structures

* Generic check optimizer in case of hybrid models

* ENH: add random_state to set_weight and set_bias methods

* Fix minor typos

* DOC: fix optimize_options documentation

* MAINT: remove lbfgsb fortran external file + fix verbose

* MAINT: change to g format to display cost values in optimize verbose

* MAINT: change display format of verbose in smash.optimize

* FIX: update verbose api documentation

* FIX: fix float format verbose ann optimize

* FIX: remove duplicated function due to merging error

* MAINT: handle return options + fix verbose optimize:
- Remove iter_projg and iter_cost from return_options
- Fix verbose at last iteration sbs optimize

* Fix api doc cance

* FIX: make check

* FIX/ENH: fix doc default_optimize + generic doc for mapping and optimizer

* MAINT: remove see also default_optimize doc

* FIX: returns control with random values depending on random_state instead of nan for optimize control info with net

* MAINT: merge branch main into enh-adaptive-opt-for-all-mappings

* Re-generate baseline and fix unittest

* Apply suggested changes from the first review of FC

* Apply suggested changes from PAG and FC review

* Apply suggested changes from FC second review
nghi-truyen authored Sep 12, 2024
1 parent d5c92c6 commit 61ad424
Showing 28 changed files with 1,828 additions and 6,058 deletions.
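Before the per-file diffs, here is a minimal usage sketch of what this change enables: running an adaptive optimizer on a non-ann mapping. This is an editor's illustration, not code from the commit; it assumes the demo-dataset loader `smash.factory.load_dataset` used in the smash tutorials and the `Model.optimize` keyword names (`mapping`, `optimizer`, `optimize_options`) shown in the documentation diffs below, with illustrative option values.

import smash

# Load the demo Cance dataset and build a model (assumed tutorial helper)
setup, mesh = smash.factory.load_dataset("Cance")
model = smash.Model(setup, mesh)

# Adam (an adaptive optimizer) on a multi-linear mapping, now permitted by this change.
# Note the renamed termination key: "maxiter" replaces the former ann-only "epochs".
model.optimize(
    mapping="multi-linear",
    optimizer="adam",
    optimize_options={
        "learning_rate": 0.004,
        "termination_crit": {"maxiter": 100, "early_stopping": 20},
    },
)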
2 changes: 1 addition & 1 deletion doc/source/math_num_documentation/forward_structure.rst
@@ -9,7 +9,7 @@ In `smash` a forward/direct spatially distributed model is obtained by chaining
- (optional) a descriptors-to-parameters mapping :math:`\phi`, either for imposing spatial constraints on parameters and/or for regional mapping between physical descriptors and model conceptual parameters, see :ref:`mapping section <math_num_documentation.mapping>`.
- (optional) a ``snow`` operator :math:`\mathcal{M}_{snw}` generating a melt flux :math:`m_{lt}` which is then summed with the precipitation flux to feed the ``hydrological`` operator :math:`\mathcal{M}_{rr}`.
- A ``hydrological`` production operator :math:`\mathcal{M}_{rr}` generating an elementary discharge :math:`q_t` which feeds the routing operator.
- A ``routing`` operator :math:`\mathcal{M}_{hy}` simulating propagation of discharge :math:`Q)`.
- A ``routing`` operator :math:`\mathcal{M}_{hy}` simulating propagation of discharge :math:`Q`.

The operator chaining principle is presented in section :ref:`forward and inverse problems statement <math_num_documentation.forward_inverse_problem.chaining>` (cf. :ref:`Eq. 2 <math_num_documentation.forward_inverse_problem.forward_problem_Mhy_circ_Mrr>`) and the chained fluxes are made explicit in the diagram below. The resulting forward model reads :math:`\mathcal{M}=\mathcal{M}_{hy}\left(\,.\,,\mathcal{M}_{rr}\left(\,.\,,\mathcal{M}_{snw}\left(.\right)\right)\right)` .

8 changes: 4 additions & 4 deletions doc/source/user_guide/classical_uses/lez_regionalization.rst
@@ -224,7 +224,7 @@ We also pass other options specific to the use of a NN:
- ``optimize_options``
- ``random_state``: a random seed used to initialize neural network weights.
- ``learning_rate``: the learning rate used for weights updates during training.
- ``termination_crit``: the number of training ``epochs`` for the neural network and a positive number to stop training when the loss function does not decrease below the current optimal value for ``early_stopping`` consecutive ``epochs``
- ``termination_crit``: the maximum number of training iterations ``maxiter`` for the neural network and a positive number to stop training when the loss function does not decrease below the current optimal value for ``early_stopping`` consecutive iterations.

- ``return_options``
- ``net``: return the optimized neural network
@@ -240,7 +240,7 @@ We also pass other options specific to the use of a NN:
optimize_options={
"random_state": 23,
"learning_rate": 0.004,
"termination_crit": dict(epochs=100, early_stopping=20),
"termination_crit": dict(maxiter=100, early_stopping=20),
},
return_options={"net": True},
common_options={"ncpu": ncpu},
@@ -255,7 +255,7 @@ We also pass other options specific to the use of a NN:
optimize_options={
"random_state": 23,
"learning_rate": 0.004,
"termination_crit": dict(epochs=100, early_stopping=20),
"termination_crit": dict(maxiter=100, early_stopping=20),
},
return_options={"net": True},
)
@@ -276,7 +276,7 @@ Other information is available in the `smash.factory.Net` object, including the
.. ipython:: python
plt.plot(opt_ann.net.history["loss_train"]);
plt.xlabel("Epoch");
plt.xlabel("Iteration");
plt.ylabel("$1-NSE$");
plt.grid(alpha=.7, ls="--");
@savefig user_guide.classical_uses.lez_regionalization.ann_J.png
22 changes: 12 additions & 10 deletions doc/source/user_guide/quickstart/cance_first_simulation.rst
@@ -681,19 +681,21 @@ First, some information was displayed on the screen during optimization

.. code-block:: text
At iterate 0 nfg = 1 J = 0.695010 ddx = 0.64
At iterate 1 nfg = 30 J = 0.098411 ddx = 0.64
At iterate 2 nfg = 59 J = 0.045409 ddx = 0.32
At iterate 3 nfg = 88 J = 0.038182 ddx = 0.16
At iterate 4 nfg = 117 J = 0.037362 ddx = 0.08
At iterate 5 nfg = 150 J = 0.037087 ddx = 0.02
At iterate 6 nfg = 183 J = 0.036800 ddx = 0.02
At iterate 7 nfg = 216 J = 0.036763 ddx = 0.01
CONVERGENCE: DDX < 0.01
</> Optimize
At iterate 0 nfg = 1 J = 6.95010e-01 ddx = 0.64
At iterate 1 nfg = 30 J = 9.84107e-02 ddx = 0.64
At iterate 2 nfg = 59 J = 4.54087e-02 ddx = 0.32
At iterate 3 nfg = 88 J = 3.81818e-02 ddx = 0.16
At iterate 4 nfg = 117 J = 3.73617e-02 ddx = 0.08
At iterate 5 nfg = 150 J = 3.70873e-02 ddx = 0.02
At iterate 6 nfg = 183 J = 3.68004e-02 ddx = 0.02
At iterate 7 nfg = 216 J = 3.67635e-02 ddx = 0.01
At iterate 8 nfg = 240 J = 3.67277e-02 ddx = 0.01
CONVERGENCE: DDX < 0.01
These lines show the different iterations of the optimization with information on the number of iterations, the number of cumulative evaluations ``nfg``
(number of forward runs performed within each iteration of the optimization algorithm), the value of the cost function to minimize ``J`` and the value of the adaptive descent step ``ddx`` of this heuristic search algorithm.
So, to summarize, the optimization algorithm has converged after 7 iterations by reaching the descent step tolerance criterion of 0.01. This optimization required to perform 216 forward run evaluations and leads to a final cost function value on the order of 0.04.
So, to summarize, the optimization algorithm has converged after 8 iterations by reaching the descent step tolerance criterion of 0.01. This optimization required 240 forward run evaluations and led to a final cost function value of about 0.0367.
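For completeness, a hedged sketch of retrieving the final cost programmatically rather than reading it from the verbose log; it assumes that ``optimize`` accepts a ``return_options`` dictionary with a ``cost`` key (as listed in ``smash/_constant.py`` below) and returns an object exposing the requested fields.

ret = model.optimize(return_options={"cost": True})
print(f"final cost J = {ret.cost:.4f}")  # the value also printed at the last verbose iterate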

Then, we can ask which cost function ``J`` has been minimized and which parameters have been optimized. So, by default, the cost function to be minimized is one minus the Nash-Sutcliffe efficiency ``nse`` (:math:`1 - \text{NSE}`)
and the optimized parameters are the set of rainfall-runoff parameters (``cp``, ``ct``, ``kexc`` and ``llr``). In the current configuration spatially
88 changes: 59 additions & 29 deletions smash/_constant.py
@@ -757,9 +757,7 @@ def get_neurons_from_hydrological_module(hydrological_module: str, hidden_neuron
"zeros",
]

PY_OPTIMIZER_CLASS = ["Adam", "SGD", "Adagrad", "RMSprop"]

PY_OPTIMIZER = [opt.lower() for opt in PY_OPTIMIZER_CLASS]
OPTIMIZER_CLASS = ["Adam", "SGD", "Adagrad", "RMSprop"]

ACTIVATION_FUNCTION_CLASS = [
"Sigmoid",
@@ -799,31 +797,23 @@ def get_neurons_from_hydrological_module(hydrological_module: str, hidden_neuron

MAPPING = ["uniform", "distributed"] + REGIONAL_MAPPING

F90_OPTIMIZER = ["sbs", "lbfgsb"]
ADAPTIVE_OPTIMIZER = [opt.lower() for opt in OPTIMIZER_CLASS]
GRADIENT_BASED_OPTIMIZER = ["lbfgsb"] + ADAPTIVE_OPTIMIZER
HEURISTIC_OPTIMIZER = ["sbs"]

OPTIMIZER = F90_OPTIMIZER + PY_OPTIMIZER
OPTIMIZER = HEURISTIC_OPTIMIZER + GRADIENT_BASED_OPTIMIZER

# % Following MAPPING order
# % The first optimizer for each mapping is used as default optimizer
MAPPING_OPTIMIZER = dict(
zip(
MAPPING,
[
F90_OPTIMIZER,
["lbfgsb"],
["lbfgsb"],
["lbfgsb"],
PY_OPTIMIZER,
],
)
)

F90_OPTIMIZER_CONTROL_TFM = dict(
zip(
F90_OPTIMIZER,
[
["sbs", "normalize", "keep"],
["normalize", "keep"],
OPTIMIZER, # for uniform mapping (all optimizers are possible, default is sbs)
*(
[GRADIENT_BASED_OPTIMIZER] * 3
), # for distributed, multi-linear, multi-polynomial mappings (default is lbfgsb)
ADAPTIVE_OPTIMIZER, # for ann mapping (default is adam)
],
)
)
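# Literal form of the table built above (editor's sketch, assuming MAPPING is
# ["uniform", "distributed", "multi-linear", "multi-polynomial", "ann"]; the first
# optimizer listed for each mapping is its default):
#   "uniform"          -> ["sbs", "lbfgsb", "adam", "sgd", "adagrad", "rmsprop"]
#   "distributed"      -> ["lbfgsb", "adam", "sgd", "adagrad", "rmsprop"]
#   "multi-linear"     -> ["lbfgsb", "adam", "sgd", "adagrad", "rmsprop"]
#   "multi-polynomial" -> ["lbfgsb", "adam", "sgd", "adagrad", "rmsprop"]
#   "ann"              -> ["adam", "sgd", "adagrad", "rmsprop"]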
@@ -843,11 +833,11 @@ def get_neurons_from_hydrological_module(hydrological_module: str, hidden_neuron
DEFAULT_TERMINATION_CRIT = dict(
**dict(
zip(
F90_OPTIMIZER,
["sbs", "lbfgsb"],
[{"maxiter": 50}, {"maxiter": 100, "factr": 1e6, "pgtol": 1e-12}],
)
),
**dict(zip(PY_OPTIMIZER, len(PY_OPTIMIZER) * [{"epochs": 200, "early_stopping": 0}])),
**dict(zip(ADAPTIVE_OPTIMIZER, len(ADAPTIVE_OPTIMIZER) * [{"maxiter": 200, "early_stopping": 0}])),
)
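# Illustrative defaults produced above (editor's sketch, derived from the constants as defined):
#   DEFAULT_TERMINATION_CRIT["sbs"]    -> {"maxiter": 50}
#   DEFAULT_TERMINATION_CRIT["lbfgsb"] -> {"maxiter": 100, "factr": 1e6, "pgtol": 1e-12}
#   DEFAULT_TERMINATION_CRIT["adam"]   -> {"maxiter": 200, "early_stopping": 0}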

CONTROL_PRIOR_DISTRIBUTION = [
@@ -885,6 +875,22 @@ def get_neurons_from_hydrological_module(hydrological_module: str, hidden_neuron
"control_tfm",
"termination_crit",
],
**dict(
zip(
itertools.product(["uniform", "distributed"], ADAPTIVE_OPTIMIZER),
2
* len(ADAPTIVE_OPTIMIZER)
* [
[
"parameters",
"bounds",
"control_tfm",
"learning_rate",
"termination_crit",
]
],
)
), # product between 2 mappings (uniform, distributed) and all adaptive optimizers
("multi-linear", "lbfgsb"): [
"parameters",
"bounds",
@@ -901,8 +907,25 @@ def get_neurons_from_hydrological_module(hydrological_module: str, hidden_neuron
],
**dict(
zip(
[("ann", optimizer) for optimizer in PY_OPTIMIZER],
len(PY_OPTIMIZER)
itertools.product(["multi-linear", "multi-polynomial"], ADAPTIVE_OPTIMIZER),
2
* len(ADAPTIVE_OPTIMIZER)
* [
[
"parameters",
"bounds",
"control_tfm",
"descriptor",
"learning_rate",
"termination_crit",
]
],
)
), # product between 2 mappings (multi-linear, multi-polynomial) and all adaptive optimizers
**dict(
zip(
[("ann", optimizer) for optimizer in ADAPTIVE_OPTIMIZER],
len(ADAPTIVE_OPTIMIZER)
* [
[
"parameters",
Expand All @@ -914,7 +937,16 @@ def get_neurons_from_hydrological_module(hydrological_module: str, hidden_neuron
]
],
)
),
), # ann mapping and all adaptive optimizers
}

OPTIMIZER_CONTROL_TFM = {
(mapping, optimizer): ["sbs", "normalize", "keep"] # in case of sbs optimizer
if optimizer == "sbs"
else ["normalize", "keep"]  # in case of non-ann mapping
if mapping != "ann"
else ["keep"]  # in case of ann mapping
for mapping, optimizer in SIMULATION_OPTIMIZE_OPTIONS_KEYS.keys()
}
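# A few entries this comprehension yields (editor's sketch, assuming the
# (mapping, optimizer) key pairs of SIMULATION_OPTIMIZE_OPTIONS_KEYS above):
#   ("uniform", "sbs")       -> ["sbs", "normalize", "keep"]
#   ("multi-linear", "adam") -> ["normalize", "keep"]
#   ("ann", "adam")          -> ["keep"]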

DEFAULT_SIMULATION_COST_OPTIONS = {
@@ -962,11 +994,10 @@ def get_neurons_from_hydrological_module(hydrological_module: str, hidden_neuron
"rr_states": False,
"q_domain": False,
"internal_fluxes": False,
"iter_cost": False,
"iter_projg": False,
"control_vector": False,
"net": False,
"cost": False,
"projg": False,
"jobs": False,
"jreg": False,
"lcurve_wjreg": False,
@@ -976,10 +1007,9 @@ def get_neurons_from_hydrological_module(hydrological_module: str, hidden_neuron
"rr_states": False,
"q_domain": False,
"internal_fluxes": False,
"iter_cost": False,
"iter_projg": False,
"control_vector": False,
"cost": False,
"projg": False,
"log_lkh": False,
"log_prior": False,
"log_h": False,
4 changes: 2 additions & 2 deletions smash/core/model/_standardize.py
@@ -448,8 +448,8 @@ def _standardize_model_setup_hidden_neuron(hidden_neuron: Numeric | ListLike, **
def _standardize_model_setup(setup: dict) -> dict:
if isinstance(setup, dict):
pop_keys = []
for key in setup.keys():
if key not in DEFAULT_MODEL_SETUP.keys():
for key in setup:
if key not in DEFAULT_MODEL_SETUP:
pop_keys.append(key)
warnings.warn(
f"Unknown model setup key '{key}'. Choices: {list(DEFAULT_MODEL_SETUP.keys())}",
33 changes: 17 additions & 16 deletions smash/core/signal_analysis/evaluation/evaluation.py
@@ -79,22 +79,23 @@ def evaluation(
>>> model.optimize()
</> Optimize
At iterate 0 nfg = 1 J = 0.695010 ddx = 0.64
At iterate 1 nfg = 30 J = 0.098411 ddx = 0.64
At iterate 2 nfg = 59 J = 0.045409 ddx = 0.32
At iterate 3 nfg = 88 J = 0.038182 ddx = 0.16
At iterate 4 nfg = 117 J = 0.037362 ddx = 0.08
At iterate 5 nfg = 150 J = 0.037087 ddx = 0.02
At iterate 6 nfg = 183 J = 0.036800 ddx = 0.02
At iterate 7 nfg = 216 J = 0.036763 ddx = 0.01
At iterate 0 nfg = 1 J = 6.95010e-01 ddx = 0.64
At iterate 1 nfg = 30 J = 9.84107e-02 ddx = 0.64
At iterate 2 nfg = 59 J = 4.54087e-02 ddx = 0.32
At iterate 3 nfg = 88 J = 3.81818e-02 ddx = 0.16
At iterate 4 nfg = 117 J = 3.73617e-02 ddx = 0.08
At iterate 5 nfg = 150 J = 3.70873e-02 ddx = 0.02
At iterate 6 nfg = 183 J = 3.68004e-02 ddx = 0.02
At iterate 7 nfg = 216 J = 3.67635e-02 ddx = 0.01
At iterate 8 nfg = 240 J = 3.67277e-02 ddx = 0.01
CONVERGENCE: DDX < 0.01
Compute multiple evaluation metrics for all catchments
>>> smash.evaluation(model, metric=["mae", "mse", "nse", "kge"])
array([[ 3.16965151, 44.78328323, 0.96327233, 0.94752783],
[ 1.07771611, 4.38410997, 0.90453297, 0.84582865],
[ 0.33045691, 0.50611502, 0.84956211, 0.8045246 ]])
array([[ 3.16766095, 44.77915192, 0.96327233, 0.94864655],
[ 1.07711864, 4.36171055, 0.90502125, 0.84566253],
[ 0.33053449, 0.50542408, 0.84976768, 0.8039571 ]])
Add start and end evaluation dates
@@ -106,12 +107,12 @@ def evaluation(
>>> smash.evaluation(model, metric="nse")
array([[0.96327233],
[0.90453297],
[0.84956211]])
[0.90502125],
[0.84976768]])
>>> smash.evaluation(model, metric="nse", start_eval=start_eval, end_eval=end_eval)
array([[0.9404493 ],
[0.86493075],
[0.76471144]])
array([[0.94048417],
[0.8667959 ],
[0.76593578]])
"""
metric, start_eval, end_eval = _standardize_evaluation_args(metric, start_eval, end_eval, model.setup)

