ENH: add adaptive optimizers for all mappings (#315)
* initialize branch combine optimizers

* ENH: add set, get and forward_pass methods for Net

* ENH: add test net forward pass + fix doc net

* Fix indent docstring net

* add weight/bias_shape into trainable layer

* add weight/bias_shape into trainable layer

* Remove redundant code in Python function sbs_optimize

* adaptive optimizers for all mappings

* finalize optimizer combination

* Fix parameter update when using early stopping + dic api doc optimize

* Correct comment typo

* Fix comment

* Change x_train name to x in net

* retrieve key x in control_info + fix api doc for control prior name

* Fix dtype for control_info to which a finalization Python function is applied

* improve unbounded check for finalize_get_control_info function

* Generate baseline

* Fix errors that occurred when merging branch

* Fix callback argument in _gradient_based_optimize_problem

* Fix doc net

* Fix raise message net.set_weight_bias

* Fix error in previous merge

* Add choices to raise error when using sbs optimizer for hybrid structures

* Generic check optimizer in case of hybrid models

* ENH: add random_state to set_weight and set_bias methods

* Fix minor typos

* DOC: fix optimize_options documentation

* MAINT: remove lbfgsb fortran external file + fix verbose

* MAINT: change to g format to display cost values in optimize verbose

* MAINT: change display format of verbose in smash.optimize

* FIX: update verbose api documentation

* FIX: fix float format verbose ann optimize

* FIX: remove duplicated function due to merging error

* MAINT: handle return options + fix verbose optimize:
- Remove iter_projg and iter_cost from return_options
- Fix verbose at last iteration sbs optimize

* Fix api doc cance

* FIX: make check

* FIX/ENH: fix doc default_optimize + generic doc for mapping and optimizer

* MAINT: remove see also default_optimize doc

* FIX: returns control with random values depending on random_state instead of nan for optimize control info with net

* MAINT: merge branch main into enh-adaptive-opt-for-all-mappings

* Re-generate baseline and fix unittest

* Apply suggested changes from the first review of FC

* Apply suggested changes from PAG and FC review

* Apply suggested changes from FC second review
nghi-truyen authored Sep 12, 2024
1 parent d5c92c6 commit 61ad424
Showing 28 changed files with 1,828 additions and 6,058 deletions.
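Before the per-file diffs, here is a minimal usage sketch of what this change enables: running an adaptive optimizer on a non-ann mapping. This is an editor's illustration, not code from the commit; it assumes the demo-dataset loader `smash.factory.load_dataset` used in the smash tutorials and the `Model.optimize` keyword names (`mapping`, `optimizer`, `optimize_options`) shown in the documentation diffs below, with illustrative option values.

import smash

# Load the demo Cance dataset and build a model (assumed tutorial helper)
setup, mesh = smash.factory.load_dataset("Cance")
model = smash.Model(setup, mesh)

# Adam (an adaptive optimizer) on a multi-linear mapping, now permitted by this change.
# Note the renamed termination key: "maxiter" replaces the former ann-only "epochs".
model.optimize(
    mapping="multi-linear",
    optimizer="adam",
    optimize_options={
        "learning_rate": 0.004,
        "termination_crit": {"maxiter": 100, "early_stopping": 20},
    },
)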
2 changes: 1 addition & 1 deletion doc/source/math_num_documentation/forward_structure.rst
@@ -9,7 +9,7 @@ In `smash` a forward/direct spatially distributed model is obtained by chaining
- (optional) a descriptors-to-parameters mapping :math:`\phi`, either for imposing spatial constraints on parameters and/or for regional mapping between physical descriptors and model conceptual parameters, see :ref:`mapping section <math_num_documentation.mapping>`.
- (optional) a ``snow`` operator :math:`\mathcal{M}_{snw}` generating a melt flux :math:`m_{lt}` which is then summed with the precipitation flux to feed the ``hydrological`` operator :math:`\mathcal{M}_{rr}`.
- A ``hydrological`` production operator :math:`\mathcal{M}_{rr}` generating an elementary discharge :math:`q_t` which feeds the routing operator.
- A ``routing`` operator :math:`\mathcal{M}_{hy}` simulating propagation of discharge :math:`Q)`.
- A ``routing`` operator :math:`\mathcal{M}_{hy}` simulating propagation of discharge :math:`Q`.

The operator chaining principle is presented in section :ref:`forward and inverse problems statement <math_num_documentation.forward_inverse_problem.chaining>` (cf. :ref:`Eq. 2 <math_num_documentation.forward_inverse_problem.forward_problem_Mhy_circ_Mrr>`) and the chained fluxes are made explicit in the diagram below. The resulting forward model reads :math:`\mathcal{M}=\mathcal{M}_{hy}\left(\,.\,,\mathcal{M}_{rr}\left(\,.\,,\mathcal{M}_{snw}\left(.\right)\right)\right)` .

8 changes: 4 additions & 4 deletions doc/source/user_guide/classical_uses/lez_regionalization.rst
@@ -224,7 +224,7 @@ We also pass other options specific to the use of a NN:
- ``optimize_options``
- ``random_state``: a random seed used to initialize neural network weights.
- ``learning_rate``: the learning rate used for weights updates during training.
- ``termination_crit``: the number of training ``epochs`` for the neural network and a positive number to stop training when the loss function does not decrease below the current optimal value for ``early_stopping`` consecutive ``epochs``
- ``termination_crit``: the maximum number of training iterations ``maxiter`` for the neural network and a positive number to stop training when the loss function does not decrease below the current optimal value for ``early_stopping`` consecutive iterations.

- ``return_options``
- ``net``: return the optimized neural network
@@ -240,7 +240,7 @@ We also pass other options specific to the use of a NN:
optimize_options={
"random_state": 23,
"learning_rate": 0.004,
"termination_crit": dict(epochs=100, early_stopping=20),
"termination_crit": dict(maxiter=100, early_stopping=20),
},
return_options={"net": True},
common_options={"ncpu": ncpu},
@@ -255,7 +255,7 @@ We also pass other options specific to the use of a NN:
optimize_options={
"random_state": 23,
"learning_rate": 0.004,
"termination_crit": dict(epochs=100, early_stopping=20),
"termination_crit": dict(maxiter=100, early_stopping=20),
},
return_options={"net": True},
)
@@ -276,7 +276,7 @@ Other information is available in the `smash.factory.Net` object, including the
.. ipython:: python
plt.plot(opt_ann.net.history["loss_train"]);
plt.xlabel("Epoch");
plt.xlabel("Iteration");
plt.ylabel("$1-NSE$");
plt.grid(alpha=.7, ls="--");
@savefig user_guide.classical_uses.lez_regionalization.ann_J.png
22 changes: 12 additions & 10 deletions doc/source/user_guide/quickstart/cance_first_simulation.rst
@@ -681,19 +681,21 @@ First, some information was displayed on the screen during optimization

.. code-block:: text
At iterate 0 nfg = 1 J = 0.695010 ddx = 0.64
At iterate 1 nfg = 30 J = 0.098411 ddx = 0.64
At iterate 2 nfg = 59 J = 0.045409 ddx = 0.32
At iterate 3 nfg = 88 J = 0.038182 ddx = 0.16
At iterate 4 nfg = 117 J = 0.037362 ddx = 0.08
At iterate 5 nfg = 150 J = 0.037087 ddx = 0.02
At iterate 6 nfg = 183 J = 0.036800 ddx = 0.02
At iterate 7 nfg = 216 J = 0.036763 ddx = 0.01
CONVERGENCE: DDX < 0.01
</> Optimize
At iterate 0 nfg = 1 J = 6.95010e-01 ddx = 0.64
At iterate 1 nfg = 30 J = 9.84107e-02 ddx = 0.64
At iterate 2 nfg = 59 J = 4.54087e-02 ddx = 0.32
At iterate 3 nfg = 88 J = 3.81818e-02 ddx = 0.16
At iterate 4 nfg = 117 J = 3.73617e-02 ddx = 0.08
At iterate 5 nfg = 150 J = 3.70873e-02 ddx = 0.02
At iterate 6 nfg = 183 J = 3.68004e-02 ddx = 0.02
At iterate 7 nfg = 216 J = 3.67635e-02 ddx = 0.01
At iterate 8 nfg = 240 J = 3.67277e-02 ddx = 0.01
CONVERGENCE: DDX < 0.01
These lines show the different iterations of the optimization with information on the number of iterations, the number of cumulative evaluations ``nfg``
(number of forward runs performed within each iteration of the optimization algorithm), the value of the cost function to minimize ``J`` and the value of the adaptive descent step ``ddx`` of this heuristic search algorithm.
So, to summarize, the optimization algorithm has converged after 7 iterations by reaching the descent step tolerance criterion of 0.01. This optimization required to perform 216 forward run evaluations and leads to a final cost function value on the order of 0.04.
So, to summarize, the optimization algorithm has converged after 8 iterations by reaching the descent step tolerance criterion of 0.01. This optimization required 240 forward run evaluations and led to a final cost function value of about 0.0367.
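For completeness, a hedged sketch of retrieving the final cost programmatically rather than reading it from the verbose log; it assumes that ``optimize`` accepts a ``return_options`` dictionary with a ``cost`` key (as listed in ``smash/_constant.py`` below) and returns an object exposing the requested fields.

ret = model.optimize(return_options={"cost": True})
print(f"final cost J = {ret.cost:.4f}")  # the value also printed at the last verbose iterate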

Then, we can ask which cost function ``J`` has been minimized and which parameters have been optimized. So, by default, the cost function to be minimized is one minus the Nash-Sutcliffe efficiency ``nse`` (:math:`1 - \text{NSE}`)
and the optimized parameters are the set of rainfall-runoff parameters (``cp``, ``ct``, ``kexc`` and ``llr``). In the current configuration spatially
88 changes: 59 additions & 29 deletions smash/_constant.py
@@ -757,9 +757,7 @@ def get_neurons_from_hydrological_module(hydrological_module: str, hidden_neuron
"zeros",
]

PY_OPTIMIZER_CLASS = ["Adam", "SGD", "Adagrad", "RMSprop"]

PY_OPTIMIZER = [opt.lower() for opt in PY_OPTIMIZER_CLASS]
OPTIMIZER_CLASS = ["Adam", "SGD", "Adagrad", "RMSprop"]

ACTIVATION_FUNCTION_CLASS = [
"Sigmoid",
@@ -799,31 +797,23 @@ def get_neurons_from_hydrological_module(hydrological_module: str, hidden_neuron

MAPPING = ["uniform", "distributed"] + REGIONAL_MAPPING

F90_OPTIMIZER = ["sbs", "lbfgsb"]
ADAPTIVE_OPTIMIZER = [opt.lower() for opt in OPTIMIZER_CLASS]
GRADIENT_BASED_OPTIMIZER = ["lbfgsb"] + ADAPTIVE_OPTIMIZER
HEURISTIC_OPTIMIZER = ["sbs"]

OPTIMIZER = F90_OPTIMIZER + PY_OPTIMIZER
OPTIMIZER = HEURISTIC_OPTIMIZER + GRADIENT_BASED_OPTIMIZER

# % Following MAPPING order
# % The first optimizer for each mapping is used as default optimizer
MAPPING_OPTIMIZER = dict(
zip(
MAPPING,
[
F90_OPTIMIZER,
["lbfgsb"],
["lbfgsb"],
["lbfgsb"],
PY_OPTIMIZER,
],
)
)

F90_OPTIMIZER_CONTROL_TFM = dict(
zip(
F90_OPTIMIZER,
[
["sbs", "normalize", "keep"],
["normalize", "keep"],
OPTIMIZER, # for uniform mapping (all optimizers are possible, default is sbs)
*(
[GRADIENT_BASED_OPTIMIZER] * 3
), # for distributed, multi-linear, multi-polynomial mappings (default is lbfgsb)
ADAPTIVE_OPTIMIZER, # for ann mapping (default is adam)
],
)
)
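# Literal form of the table built above (editor's sketch, assuming MAPPING is
# ["uniform", "distributed", "multi-linear", "multi-polynomial", "ann"]; the first
# optimizer listed for each mapping is its default):
#   "uniform"          -> ["sbs", "lbfgsb", "adam", "sgd", "adagrad", "rmsprop"]
#   "distributed"      -> ["lbfgsb", "adam", "sgd", "adagrad", "rmsprop"]
#   "multi-linear"     -> ["lbfgsb", "adam", "sgd", "adagrad", "rmsprop"]
#   "multi-polynomial" -> ["lbfgsb", "adam", "sgd", "adagrad", "rmsprop"]
#   "ann"              -> ["adam", "sgd", "adagrad", "rmsprop"]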
@@ -843,11 +833,11 @@ def get_neurons_from_hydrological_module(hydrological_module: str, hidden_neuron
DEFAULT_TERMINATION_CRIT = dict(
**dict(
zip(
F90_OPTIMIZER,
["sbs", "lbfgsb"],
[{"maxiter": 50}, {"maxiter": 100, "factr": 1e6, "pgtol": 1e-12}],
)
),
**dict(zip(PY_OPTIMIZER, len(PY_OPTIMIZER) * [{"epochs": 200, "early_stopping": 0}])),
**dict(zip(ADAPTIVE_OPTIMIZER, len(ADAPTIVE_OPTIMIZER) * [{"maxiter": 200, "early_stopping": 0}])),
)
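# Illustrative defaults produced above (editor's sketch, derived from the constants as defined):
#   DEFAULT_TERMINATION_CRIT["sbs"]    -> {"maxiter": 50}
#   DEFAULT_TERMINATION_CRIT["lbfgsb"] -> {"maxiter": 100, "factr": 1e6, "pgtol": 1e-12}
#   DEFAULT_TERMINATION_CRIT["adam"]   -> {"maxiter": 200, "early_stopping": 0}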

CONTROL_PRIOR_DISTRIBUTION = [
@@ -885,6 +875,22 @@ def get_neurons_from_hydrological_module(hydrological_module: str, hidden_neuron
"control_tfm",
"termination_crit",
],
**dict(
zip(
itertools.product(["uniform", "distributed"], ADAPTIVE_OPTIMIZER),
2
* len(ADAPTIVE_OPTIMIZER)
* [
[
"parameters",
"bounds",
"control_tfm",
"learning_rate",
"termination_crit",
]
],
)
), # product between 2 mappings (uniform, distributed) and all adaptive optimizers
("multi-linear", "lbfgsb"): [
"parameters",
"bounds",
@@ -901,8 +907,25 @@ def get_neurons_from_hydrological_module(hydrological_module: str, hidden_neuron
],
**dict(
zip(
[("ann", optimizer) for optimizer in PY_OPTIMIZER],
len(PY_OPTIMIZER)
itertools.product(["multi-linear", "multi-polynomial"], ADAPTIVE_OPTIMIZER),
2
* len(ADAPTIVE_OPTIMIZER)
* [
[
"parameters",
"bounds",
"control_tfm",
"descriptor",
"learning_rate",
"termination_crit",
]
],
)
), # product between 2 mappings (multi-linear, multi-polynomial) and all adaptive optimizers
**dict(
zip(
[("ann", optimizer) for optimizer in ADAPTIVE_OPTIMIZER],
len(ADAPTIVE_OPTIMIZER)
* [
[
"parameters",
Expand All @@ -914,7 +937,16 @@ def get_neurons_from_hydrological_module(hydrological_module: str, hidden_neuron
]
],
)
),
), # ann mapping and all adaptive optimizers
}

OPTIMIZER_CONTROL_TFM = {
(mapping, optimizer): ["sbs", "normalize", "keep"] # in case of sbs optimizer
if optimizer == "sbs"
else ["normalize", "keep"]  # in case of non-ann mapping
if mapping != "ann"
else ["keep"]  # in case of ann mapping
for mapping, optimizer in SIMULATION_OPTIMIZE_OPTIONS_KEYS.keys()
}
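# A few entries this comprehension yields (editor's sketch, assuming the
# (mapping, optimizer) key pairs of SIMULATION_OPTIMIZE_OPTIONS_KEYS above):
#   ("uniform", "sbs")       -> ["sbs", "normalize", "keep"]
#   ("multi-linear", "adam") -> ["normalize", "keep"]
#   ("ann", "adam")          -> ["keep"]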

DEFAULT_SIMULATION_COST_OPTIONS = {
@@ -962,11 +994,10 @@ def get_neurons_from_hydrological_module(hydrological_module: str, hidden_neuron
"rr_states": False,
"q_domain": False,
"internal_fluxes": False,
"iter_cost": False,
"iter_projg": False,
"control_vector": False,
"net": False,
"cost": False,
"projg": False,
"jobs": False,
"jreg": False,
"lcurve_wjreg": False,
@@ -976,10 +1007,9 @@ def get_neurons_from_hydrological_module(hydrological_module: str, hidden_neuron
"rr_states": False,
"q_domain": False,
"internal_fluxes": False,
"iter_cost": False,
"iter_projg": False,
"control_vector": False,
"cost": False,
"projg": False,
"log_lkh": False,
"log_prior": False,
"log_h": False,
4 changes: 2 additions & 2 deletions smash/core/model/_standardize.py
@@ -448,8 +448,8 @@ def _standardize_model_setup_hidden_neuron(hidden_neuron: Numeric | ListLike, **
def _standardize_model_setup(setup: dict) -> dict:
if isinstance(setup, dict):
pop_keys = []
for key in setup.keys():
if key not in DEFAULT_MODEL_SETUP.keys():
for key in setup:
if key not in DEFAULT_MODEL_SETUP:
pop_keys.append(key)
warnings.warn(
f"Unknown model setup key '{key}'. Choices: {list(DEFAULT_MODEL_SETUP.keys())}",
33 changes: 17 additions & 16 deletions smash/core/signal_analysis/evaluation/evaluation.py
@@ -79,22 +79,23 @@ def evaluation(
>>> model.optimize()
</> Optimize
At iterate 0 nfg = 1 J = 0.695010 ddx = 0.64
At iterate 1 nfg = 30 J = 0.098411 ddx = 0.64
At iterate 2 nfg = 59 J = 0.045409 ddx = 0.32
At iterate 3 nfg = 88 J = 0.038182 ddx = 0.16
At iterate 4 nfg = 117 J = 0.037362 ddx = 0.08
At iterate 5 nfg = 150 J = 0.037087 ddx = 0.02
At iterate 6 nfg = 183 J = 0.036800 ddx = 0.02
At iterate 7 nfg = 216 J = 0.036763 ddx = 0.01
At iterate 0 nfg = 1 J = 6.95010e-01 ddx = 0.64
At iterate 1 nfg = 30 J = 9.84107e-02 ddx = 0.64
At iterate 2 nfg = 59 J = 4.54087e-02 ddx = 0.32
At iterate 3 nfg = 88 J = 3.81818e-02 ddx = 0.16
At iterate 4 nfg = 117 J = 3.73617e-02 ddx = 0.08
At iterate 5 nfg = 150 J = 3.70873e-02 ddx = 0.02
At iterate 6 nfg = 183 J = 3.68004e-02 ddx = 0.02
At iterate 7 nfg = 216 J = 3.67635e-02 ddx = 0.01
At iterate 8 nfg = 240 J = 3.67277e-02 ddx = 0.01
CONVERGENCE: DDX < 0.01
Compute multiple evaluation metrics for all catchments
>>> smash.evaluation(model, metric=["mae", "mse", "nse", "kge"])
array([[ 3.16965151, 44.78328323, 0.96327233, 0.94752783],
[ 1.07771611, 4.38410997, 0.90453297, 0.84582865],
[ 0.33045691, 0.50611502, 0.84956211, 0.8045246 ]])
array([[ 3.16766095, 44.77915192, 0.96327233, 0.94864655],
[ 1.07711864, 4.36171055, 0.90502125, 0.84566253],
[ 0.33053449, 0.50542408, 0.84976768, 0.8039571 ]])
Add start and end evaluation dates
@@ -106,12 +107,12 @@ def evaluation(
>>> smash.evaluation(model, metric="nse")
array([[0.96327233],
[0.90453297],
[0.84956211]])
[0.90502125],
[0.84976768]])
>>> smash.evaluation(model, metric="nse", start_eval=start_eval, end_eval=end_eval)
array([[0.9404493 ],
[0.86493075],
[0.76471144]])
array([[0.94048417],
[0.8667959 ],
[0.76593578]])
"""
metric, start_eval, end_eval = _standardize_evaluation_args(metric, start_eval, end_eval, model.setup)

