Merge pull request #21 from rte-france/bd-dev
first modification before version 0.5.0
Tezirg authored Aug 11, 2020
2 parents 1c04ed3 + a75ae50 commit 5ce95a1
Showing 29 changed files with 2,908 additions and 133 deletions.
9 changes: 8 additions & 1 deletion .gitignore
@@ -157,4 +157,11 @@ l2rpn_baselines/DeepQSimple/saved_baseline/
l2rpn_baselines/DuelQLeapNet/logs-eval/
l2rpn_baselines/DuelQSimple/saved_baseline/
l2rpn_baselines/SAC/saved_baseline/

l2rpn_baselines/TestLeapNet/model_saved/
l2rpn_baselines/TestLeapNet/tf_logs/
l2rpn_baselines/TestLeapNet/logs-eval/
l2rpn_baselines/LeapNetEncoded/logs-eval/
l2rpn_baselines/LeapNetEncoded/model_saved/
l2rpn_baselines/LeapNetEncoded/tf_logs/
l2rpn_baselines/LeapNetEncoded/tf_logs_test/
l2rpn_baselines/LeapNetEncoded/model_test/
12 changes: 12 additions & 0 deletions CHANGELOG.rst
@@ -4,6 +4,18 @@ Change Log
--------
- stack multiple states in `utils/DeepQAgent`

[0.5.0] - 2020-08-??
--------------------
- [FIXED] the counting of the action types frequency in tensorboard (for some baselines)
- [FIXED] a broken Replay buffer `utils.ReplayBuffer` (used in some baselines)
- [FIXED] a bug in using multiple environments for some baselines
- [FIXED] wrong q value update for some baselines
- [IMPROVED] descriptions and computation of the tensorboard information (for some baselines)
- [IMPROVED] performance optimization for training and usage of some baselines
- [ADDED] better serialization as json of the `utils.NNParam` class
- [ADDED] the LeapNetEncoded baseline, which uses a leap neural network (leap net) to create an
  embedding of the state of the powergrid.

[0.4.4] - 2020-07-07
--------------------
- [FIXED] now the baselines can fully support the grid2op MultiMix environment.
50 changes: 50 additions & 0 deletions docs/LeapNetEncoded.rst
@@ -0,0 +1,50 @@
LeapNetEncoded: D3QN on a state encoded by a leap net
======================================================

TODO reference the original papers `ESANN Paper <https://hal.archives-ouvertes.fr/hal-02268886>`_
`Leap Net <https://www.sciencedirect.com/science/article/abs/pii/S0925231220305051>`_

It has now been implemented in a github repository: `Leap Net Github <https://github.com/BDonnot/leap_net>`_

Description
-----------
The leap net is a type of neural network that has shown really good performance at predicting the flows on
powerlines based on the injections and the topology.

In this baseline, we use this very same architecture to encode the powergrid state (at a given
step).

This embedding of the powergrid is then used by a neural network (which can be a regular network or
a leap net) that parametrizes the Q function.
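
The sketch below illustrates this idea with plain keras layers. All dimensions, layer types and
names are purely illustrative (in particular, the actual baseline relies on leap net layers for the
encoding, not on the dense encoder used here):

.. code-block:: python

    from tensorflow.keras.layers import Input, Dense, Concatenate
    from tensorflow.keras.models import Model

    # purely illustrative dimensions
    dim_inj = 100      # continuous part of the state (loads, productions, ...)
    dim_tau = 60       # topology part of the state
    dim_emb = 50       # size of the grid state embedding
    nb_action = 200    # number of discrete actions

    # 1) "encoder": build an embedding of the powergrid state
    #    (the real baseline uses leap net layers here, not a plain Dense layer)
    inj = Input(shape=(dim_inj,), name="injections")
    tau = Input(shape=(dim_tau,), name="topology")
    embedding = Dense(dim_emb, activation="relu")(Concatenate()([inj, tau]))

    # 2) "Q head": the embedding parametrizes the Q function
    hidden = Dense(dim_emb, activation="relu")(embedding)
    q_values = Dense(nb_action, name="q_values")(hidden)

    model = Model(inputs=[inj, tau], outputs=q_values)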

An example to train this model is available in the train function :ref:`Example-leapnetenc`.

Exported class
--------------
You can use this class with:

.. code-block:: python

    from l2rpn_baselines.LeapNetEncoded import train, evaluate, LeapNetEncoded

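For instance, a minimal (and hypothetical) training script could look like the snippet below; the
exact arguments accepted by ``train`` are assumptions and may differ from the real signature (it may
in particular require a description of the neural network architecture), see :ref:`Example-leapnetenc`
for the actual example:

.. code-block:: python

    import grid2op
    from l2rpn_baselines.LeapNetEncoded import train

    env = grid2op.make("l2rpn_case14_sandbox")  # any grid2op environment

    # hypothetical arguments, for illustration only
    agent = train(env,
                  name="LeapNetEncoded",
                  iterations=10000,
                  save_path="saved_model")
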
.. automodule:: l2rpn_baselines.LeapNetEncoded
    :members:
    :autosummary:

Other non exported class
------------------------
These classes are not exported by default. If you want to use them, you can import them with (non exhaustive list):

.. code-block:: python

    from l2rpn_baselines.LeapNetEncoded.LeapNetEncoded_NN import LeapNetEncoded_NN
    from l2rpn_baselines.LeapNetEncoded.LeapNetEncoded_NNParam import LeapNetEncoded_NNParam

.. autoclass:: l2rpn_baselines.LeapNetEncoded.LeapNetEncoded_NN.LeapNetEncoded_NN
    :members:
    :autosummary:

.. autoclass:: l2rpn_baselines.LeapNetEncoded.LeapNetEncoded_NNParam.LeapNetEncoded_NNParam
    :members:
    :autosummary:
3 changes: 3 additions & 0 deletions docs/SAC.rst
@@ -4,6 +4,9 @@ SAC: Soft Actor Critic
This baseline comes from the paper:
`Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor <https://arxiv.org/abs/1801.01290>`_

**NB** This version is a new implementation of the SAC baseline. We recommend using it in new
projects. The old version had some issues; for backward compatibility, it is still available under
the name "SACOld".
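
Switching between the two is, in principle, only a matter of changing the import, as sketched below
(assuming the public names of the new implementation mirror those of the other baselines):

.. code-block:: python

    # legacy implementation, kept for backward compatibility
    from l2rpn_baselines.SACOld import train, evaluate, SACOld

    # new implementation, recommended for new projects
    from l2rpn_baselines.SAC import train, evaluate, SAC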

Description
-----------
44 changes: 44 additions & 0 deletions docs/SACOld.rst
@@ -0,0 +1,44 @@
SACOld: Soft Actor Critic (deprecated)
=======================================

This baseline comes from the paper:
`Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor <https://arxiv.org/abs/1801.01290>`_


Description
-----------
This module proposes an implementation of the SAC algorithm.

**This is an old implementation that is probably not correct; it is included for backward
compatibility with earlier versions (< 0.5.0) of this package.**

An example to train this model is available in the train function :ref:`Example-sacold`.

Exported class
--------------
You can use this class with:

.. code-block:: python

    from l2rpn_baselines.SACOld import train, evaluate, SACOld

.. automodule:: l2rpn_baselines.SACOld
    :members:
    :autosummary:

Other non exported class
------------------------
These classes are not exported by default. If you want to use them, you can import them with (non exhaustive list):

.. code-block:: python

    from l2rpn_baselines.SACOld.SACOld_NN import SACOld_NN
    from l2rpn_baselines.SACOld.SACOld_NNParam import SACOld_NNParam

.. autoclass:: l2rpn_baselines.SACOld.SACOld_NN.SACOld_NN
    :members:
    :autosummary:

.. autoclass:: l2rpn_baselines.SACOld.SACOld_NNParam.SACOld_NNParam
    :members:
    :autosummary:
10 changes: 10 additions & 0 deletions docs/index.rst
@@ -39,6 +39,16 @@ More advanced baselines

DuelQLeapNet
DoubleDuelingRDQN
LeapNetEncoded


Deprecated baselines
---------------------------

.. toctree::
   :maxdepth: 2

   SACOld


Contributions
4 changes: 2 additions & 2 deletions l2rpn_baselines/DuelQLeapNet/DuelQLeapNet_NN.py
@@ -165,13 +165,13 @@ def _make_x_tau(self, data):
        res = [data_x, *data_tau]
        return res

    def predict_movement(self, data, epsilon, batch_size=None):
    def predict_movement(self, data, epsilon, batch_size=None, training=False):
        """Predict the movement of the game controller, taking a random move
        with probability epsilon."""
        if batch_size is None:
            batch_size = data.shape[0]
        data_split = self._make_x_tau(data)
        res = super().predict_movement(data_split, epsilon=epsilon, batch_size=batch_size)
        res = super().predict_movement(data_split, epsilon=epsilon, batch_size=batch_size, training=training)
        return res

    def train(self, s_batch, a_batch, r_batch, d_batch, s2_batch, tf_writer=None, batch_size=None):
22 changes: 22 additions & 0 deletions l2rpn_baselines/LeapNetEncoded/LeapNetEncoded.py
@@ -0,0 +1,22 @@
# Copyright (c) 2020, RTE (https://www.rte-france.com)
# See AUTHORS.txt
# This Source Code Form is subject to the terms of the Mozilla Public License, version 2.0.
# If a copy of the Mozilla Public License, version 2.0 was not distributed with this file,
# you can obtain one at http://mozilla.org/MPL/2.0/.
# SPDX-License-Identifier: MPL-2.0
# This file is part of L2RPN Baselines, L2RPN Baselines a repository to host baselines for l2rpn competitions.


from l2rpn_baselines.utils import DeepQAgent

DEFAULT_NAME = "LeapNetEncoded"


class LeapNetEncoded(DeepQAgent):
    """
    Inheriting from :class:`l2rpn_baselines.utils.DeepQAgent`, this class implements the agent used for the
    double duelling deep Q network baseline, with the particularity that the state of the powergrid is first
    encoded by a leap net before being fed to the Q network.
    It does nothing in particular.
    """
    pass