Merge pull request #21 from rte-france/bd-dev
first modification before version 0.5.0
Tezirg authored Aug 11, 2020
2 parents 1c04ed3 + a75ae50 commit 5ce95a1
Showing 29 changed files with 2,908 additions and 133 deletions.
9 changes: 8 additions & 1 deletion .gitignore
@@ -157,4 +157,11 @@ l2rpn_baselines/DeepQSimple/saved_baseline/
l2rpn_baselines/DuelQLeapNet/logs-eval/
l2rpn_baselines/DuelQSimple/saved_baseline/
l2rpn_baselines/SAC/saved_baseline/

l2rpn_baselines/TestLeapNet/model_saved/
l2rpn_baselines/TestLeapNet/tf_logs/
l2rpn_baselines/TestLeapNet/logs-eval/
l2rpn_baselines/LeapNetEncoded/logs-eval/
l2rpn_baselines/LeapNetEncoded/model_saved/
l2rpn_baselines/LeapNetEncoded/tf_logs/
l2rpn_baselines/LeapNetEncoded/tf_logs_test/
l2rpn_baselines/LeapNetEncoded/model_test/
12 changes: 12 additions & 0 deletions CHANGELOG.rst
@@ -4,6 +4,18 @@ Change Log
--------
- stack multiple states in `utils/DeepQAgent`

[0.5.0] - 2020-08-??
--------------------
- [FIXED] the counting of the action types frequency in tensorboard (for some baselines)
- [FIXED] a broken Replay buffer `utils.ReplayBuffer` (used in some baselines)
- [FIXED] a bug in using multiple environments for some baselines
- [FIXED] wrong q value update for some baselines
- [IMPROVED] descriptions and computation of the tensorboard information (for some baselines)
- [IMPROVED] performance optimization for training and usage of some baselines
- [ADDED] better serialization as json of the `utils.NNParam` class
- [ADDED] the LeapNetEncoded baseline, which uses a leap neural network (leap net) to create an
  embedding of the state of the powergrid.

[0.4.4] - 2020-07-07
--------------------
- [FIXED] now the baselines can fully support the grid2op MultiMix environment.
50 changes: 50 additions & 0 deletions docs/LeapNetEncoded.rst
@@ -0,0 +1,50 @@
LeapNetEncoded: D3QN on a state encoded by a leap net
======================================================

TODO reference the original papers `ESANN Paper <https://hal.archives-ouvertes.fr/hal-02268886>`_
`Leap Net <https://www.sciencedirect.com/science/article/abs/pii/S0925231220305051>`_

It has now been implemented in a github repository: `Leap Net Github <https://github.com/BDonnot/leap_net>`_

Description
-----------
The leap net is a type of neural network that has shown really good performance at predicting the flows on
powerlines based on the injections and the topology.

In this baseline, we use this very same architecture to encode the powergrid state (at a given
step).

This embedding of the powergrid is then used by a neural network (which can be a regular network or
a leap net) that parametrizes the Q function.
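
The sketch below illustrates this idea with plain keras layers. All dimensions, layer types and
names are purely illustrative (in particular, the actual baseline relies on leap net layers for the
encoding, not on the dense encoder used here):

.. code-block:: python

    from tensorflow.keras.layers import Input, Dense, Concatenate
    from tensorflow.keras.models import Model

    # purely illustrative dimensions
    dim_inj = 100      # continuous part of the state (loads, productions, ...)
    dim_tau = 60       # topology part of the state
    dim_emb = 50       # size of the grid state embedding
    nb_action = 200    # number of discrete actions

    # 1) "encoder": build an embedding of the powergrid state
    #    (the real baseline uses leap net layers here, not a plain Dense layer)
    inj = Input(shape=(dim_inj,), name="injections")
    tau = Input(shape=(dim_tau,), name="topology")
    embedding = Dense(dim_emb, activation="relu")(Concatenate()([inj, tau]))

    # 2) "Q head": the embedding parametrizes the Q function
    hidden = Dense(dim_emb, activation="relu")(embedding)
    q_values = Dense(nb_action, name="q_values")(hidden)

    model = Model(inputs=[inj, tau], outputs=q_values)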

An example to train this model is available in the train function :ref:`Example-leapnetenc`.

Exported class
--------------
You can use this class with:

.. code-block:: python

    from l2rpn_baselines.LeapNetEncoded import train, evaluate, LeapNetEncoded

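For instance, a minimal (and hypothetical) training script could look like the snippet below; the
exact arguments accepted by ``train`` are assumptions and may differ from the real signature (it may
in particular require a description of the neural network architecture), see :ref:`Example-leapnetenc`
for the actual example:

.. code-block:: python

    import grid2op
    from l2rpn_baselines.LeapNetEncoded import train

    env = grid2op.make("l2rpn_case14_sandbox")  # any grid2op environment

    # hypothetical arguments, for illustration only
    agent = train(env,
                  name="LeapNetEncoded",
                  iterations=10000,
                  save_path="saved_model")
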
.. automodule:: l2rpn_baselines.LeapNetEncoded
    :members:
    :autosummary:

Other non exported class
------------------------
These classes are not exported by default. If you want to use them, you can import them with (non exhaustive list):

.. code-block:: python

    from l2rpn_baselines.LeapNetEncoded.LeapNetEncoded_NN import LeapNetEncoded_NN
    from l2rpn_baselines.LeapNetEncoded.LeapNetEncoded_NNParam import LeapNetEncoded_NNParam

.. autoclass:: l2rpn_baselines.LeapNetEncoded.LeapNetEncoded_NN.LeapNetEncoded_NN
    :members:
    :autosummary:

.. autoclass:: l2rpn_baselines.LeapNetEncoded.LeapNetEncoded_NNParam.LeapNetEncoded_NNParam
    :members:
    :autosummary:
3 changes: 3 additions & 0 deletions docs/SAC.rst
@@ -4,6 +4,9 @@ SAC: Soft Actor Critic
This baseline comes from the paper:
`Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor <https://arxiv.org/abs/1801.01290>`_

**NB** This version is a new implementation of the SAC baseline. We recommend using it in new
projects. The old version had some issues; for backward compatibility, it is still available under
the name "SACOld".
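
Switching between the two is, in principle, only a matter of changing the import, as sketched below
(assuming the public names of the new implementation mirror those of the other baselines):

.. code-block:: python

    # legacy implementation, kept for backward compatibility
    from l2rpn_baselines.SACOld import train, evaluate, SACOld

    # new implementation, recommended for new projects
    from l2rpn_baselines.SAC import train, evaluate, SAC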

Description
-----------
44 changes: 44 additions & 0 deletions docs/SACOld.rst
@@ -0,0 +1,44 @@
SACOld: Soft Actor Critic (deprecated)
=======================================

This baseline comes from the paper:
`Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor <https://arxiv.org/abs/1801.01290>`_


Description
-----------
This module proposes an implementation of the SAC algorithm.

**This is an old implementation that is probably not correct; it is included for backward
compatibility with earlier versions (< 0.5.0) of this package.**

An example to train this model is available in the train function :ref:`Example-sacold`.

Exported class
--------------
You can use this class with:

.. code-block:: python

    from l2rpn_baselines.SACOld import train, evaluate, SACOld

.. automodule:: l2rpn_baselines.SACOld
    :members:
    :autosummary:

Other non exported class
------------------------
These classes are not exported by default. If you want to use them, you can import them with (non exhaustive list):

.. code-block:: python

    from l2rpn_baselines.SACOld.SACOld_NN import SACOld_NN
    from l2rpn_baselines.SACOld.SACOld_NNParam import SACOld_NNParam

.. autoclass:: l2rpn_baselines.SACOld.SACOld_NN.SACOld_NN
    :members:
    :autosummary:

.. autoclass:: l2rpn_baselines.SACOld.SACOld_NNParam.SACOld_NNParam
    :members:
    :autosummary:
10 changes: 10 additions & 0 deletions docs/index.rst
@@ -39,6 +39,16 @@ More advanced baselines

DuelQLeapNet
DoubleDuelingRDQN
LeapNetEncoded


Deprecated baselines
---------------------------

.. toctree::
   :maxdepth: 2

   SACOld


Contributions
4 changes: 2 additions & 2 deletions l2rpn_baselines/DuelQLeapNet/DuelQLeapNet_NN.py
@@ -165,13 +165,13 @@ def _make_x_tau(self, data):
        res = [data_x, *data_tau]
        return res

    def predict_movement(self, data, epsilon, batch_size=None):
    def predict_movement(self, data, epsilon, batch_size=None, training=False):
        """Predict the movement of the game controller, taking a random move
        with probability epsilon."""
        if batch_size is None:
            batch_size = data.shape[0]
        data_split = self._make_x_tau(data)
        res = super().predict_movement(data_split, epsilon=epsilon, batch_size=batch_size)
        res = super().predict_movement(data_split, epsilon=epsilon, batch_size=batch_size, training=training)
        return res

    def train(self, s_batch, a_batch, r_batch, d_batch, s2_batch, tf_writer=None, batch_size=None):
22 changes: 22 additions & 0 deletions l2rpn_baselines/LeapNetEncoded/LeapNetEncoded.py
@@ -0,0 +1,22 @@
# Copyright (c) 2020, RTE (https://www.rte-france.com)
# See AUTHORS.txt
# This Source Code Form is subject to the terms of the Mozilla Public License, version 2.0.
# If a copy of the Mozilla Public License, version 2.0 was not distributed with this file,
# you can obtain one at http://mozilla.org/MPL/2.0/.
# SPDX-License-Identifier: MPL-2.0
# This file is part of L2RPN Baselines, L2RPN Baselines a repository to host baselines for l2rpn competitions.


from l2rpn_baselines.utils import DeepQAgent

DEFAULT_NAME = "LeapNetEncoded"


class LeapNetEncoded(DeepQAgent):
    """
    Inheriting from :class:`l2rpn_baselines.utils.DeepQAgent`, this class implements the agent used for the
    double duelling deep Q network baseline, with the particularity that the state of the powergrid is first
    encoded by a leap net before being fed to the Q network.
    It does nothing in particular.
    """
    pass