feat: support cbf methods #323

Gaiejj · 2024-04-17T04:23:38Z

Description

The implementation in this PR is functionally complete and accurate. We will merge it once the code style and documentation have been improved.

Related Papers

This pull request adds support for control barrier function (CBF)-based SafeRL algorithms from the following papers:

End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks
Safe Reinforcement Learning Using Robust Control Barrier Functions
Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems

Example Demo

DDPG_CBF.mp4

Experiment and Performance

Note: OmniSafe uses steps as the x-axis when displaying benchmark curves, so the x-axis scale of its curves does not exactly match that of the original implementation. The total number of interaction steps is the same, however. For example, in Pendulum-v1, 400 episodes * 200 steps per episode = 80,000 total steps.
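The conversion in the note above can be sketched as follows. The helper names are hypothetical and are not part of OmniSafe; the sketch assumes fixed-length episodes, as in Pendulum-v1.

```python
def total_steps(episodes: int, steps_per_episode: int) -> int:
    """Total environment interactions for fixed-length episodes."""
    return episodes * steps_per_episode


def step_to_episode(step: int, steps_per_episode: int) -> int:
    """Index (from 0) of the episode that contains the given environment step."""
    return step // steps_per_episode


# Pendulum-v1 setting quoted in the note: 400 episodes of 200 steps each.
pendulum_total = total_steps(episodes=400, steps_per_episode=200)
print(pendulum_total)  # 80000
```

With such a mapping, a point at step 40,000 on a step-based curve corresponds to episode 200 on an episode-based one.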

Pendulum-v1 Benchmark
Original Implementations

OmniSafe Implementations

Note: The IPO algorithm is based on a logarithmic barrier function and is conceptually aligned with the CBF method, so it is classified here as a variant of the CBF approach.
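For readers unfamiliar with the term, a textbook logarithmic barrier on a cost constraint can be sketched as below. This is an illustration only, not the IPO code in this PR; the function name and the sharpness parameter `t` are assumptions.

```python
import math


def log_barrier_penalty(cost: float, cost_limit: float, t: float = 20.0) -> float:
    """Textbook logarithmic barrier -log(cost_limit - cost) / t.

    The penalty is finite while the constraint holds strictly and becomes
    +inf once the cost reaches or exceeds the limit.
    """
    slack = cost_limit - cost
    if slack <= 0.0:
        return math.inf
    return -math.log(slack) / t


# The penalty grows without bound as the cost approaches the limit.
for cost in (0.0, 0.5, 0.9, 0.99):
    print(cost, log_barrier_penalty(cost, cost_limit=1.0))
```

Like a control barrier function, the penalty keeps the quantity of interest away from the boundary of the safe set, which is the conceptual link the note refers to.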

Unicycle Task

Because the CBF method is tightly coupled to the environment, we currently validate only the RCBF method, and only in the Unicycle environment used in the original paper. Future work will focus on decoupling the method from the environment.
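To illustrate why a CBF layer depends on the environment, here is a minimal sketch of a CBF safety filter for a toy system. It assumes 1-D integrator dynamics x_dot = u and the barrier h(x) = x_max - x; neither comes from this PR, and the function name is hypothetical. The filter needs both the dynamics and the barrier, which is exactly the coupling described above.

```python
def cbf_filter(u_rl: float, x: float, x_max: float, alpha: float = 1.0) -> float:
    """Minimally modify u_rl so that h(x) = x_max - x stays non-negative.

    Assumed dynamics: x_dot = u. The CBF condition h_dot + alpha * h >= 0
    becomes -u + alpha * (x_max - x) >= 0, i.e. u <= alpha * (x_max - x).
    """
    u_bound = alpha * (x_max - x)
    return min(u_rl, u_bound)


# Euler rollout: the unfiltered policy always pushes towards the boundary.
x, dt, x_max = 0.0, 0.05, 1.0
trajectory = [x]
for _ in range(200):
    u = cbf_filter(u_rl=1.0, x=x, x_max=x_max)
    x = x + dt * u
    trajectory.append(x)

print(max(trajectory) <= x_max)
```

For a different environment, both the expression for h_dot and the barrier itself change, so the filter has to be rewritten.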

Original Implementations

OmniSafe Implementations

Note: The original implementation incurs zero cost, so no cost curves are shown for it.

Motivation and Context

The CBF method is an important branch of SafeRL research. Supporting the CBF method will further expand OmniSafe's contribution to the community.

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

I have read the CONTRIBUTION guide. (required)
My change requires a change to the documentation.
I have updated the tests accordingly. (required for a bug fix or a new feature)
I have updated the documentation accordingly.
I have reformatted the code using make format. (required)
I have checked the code using make lint. (required)
I have ensured make test pass. (required)

codecov · 2024-04-30T08:39:40Z

Codecov Report

Attention: Patch coverage is 95.30201% with 56 lines in your changes missing coverage. Please review.

Project coverage is 96.79%. Comparing base (080e6b8) to head (34104cd).

Files	Patch %	Lines
omnisafe/adapter/barrier_function_adapter.py	92.45%	8 Missing ⚠️
omnisafe/adapter/beta_barrier_function_adapter.py	92.71%	7 Missing ⚠️
omnisafe/utils/tools.py	33.33%	6 Missing ⚠️
omnisafe/envs/cbf_env.py	92.54%	5 Missing ⚠️
omnisafe/common/barrier_solver.py	90.91%	4 Missing ⚠️
...mnisafe/adapter/robust_barrier_function_adapter.py	96.30%	3 Missing ⚠️
omnisafe/envs/rcbf_env.py	93.02%	3 Missing ⚠️
omnisafe/models/actor/beta_learning_actor.py	92.31%	3 Missing ⚠️
...safe/adapter/offpolicy_barrier_function_adapter.py	97.85%	2 Missing ⚠️
omnisafe/algorithms/off_policy/ddpg_cbf.py	93.33%	2 Missing ⚠️
... and 8 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #323      +/-   ##
==========================================
- Coverage   97.03%   96.79%   -0.25%     
==========================================
  Files         137      153      +16     
  Lines        6906     7933    +1027     
==========================================
+ Hits         6701     7678     +977     
- Misses        205      255      +50


omnisafe/adapter/beta_barrier_function_adapter.py

omnisafe/adapter/robust_barrier_function_adapter.py

omnisafe/algorithms/on_policy/base/ppo.py

omnisafe/common/barrier_comp.py

omnisafe/common/barrier_solver.py

omnisafe/common/buffer/vector_onpolicy_buffer.py

omnisafe/common/robust_barrier_solver.py

omnisafe/common/robust_gp_model.py

omnisafe/configs/off-policy/DDPGCBF.yaml

omnisafe/configs/off-policy/SACRCBF.yaml

omnisafe/configs/on-policy/IPO.yaml

omnisafe/configs/on-policy/TRPOCBF.yaml

omnisafe/envs/classic_control/envs_from_rcbf.py

omnisafe/envs/rcbf_env.py

omnisafe/utils/tools.py

umfundii · 2024-06-04T17:17:45Z

@Gaiejj Can you please provide instructions on how to reproduce this result? I have been trying to reproduce it with OmniSafe, but I have been running into issues.

Gaiejj · 2024-06-05T04:37:19Z

@umfundii Hello, here is the complete reproduction process:

Create a new conda environment.

conda create -n cbf python==3.8

Install OmniSafe from source; this is needed because the CBF-based methods require new Python dependencies.

pip install -e .

Execute the following commands.

python train_policy.py --algo SACRCBF --env-id Unicycle
python train_policy.py --algo DDPGCBF --env-id Pendulum-v1
python train_policy.py --algo TRPOCBF --env-id Pendulum-v1

My Python environment configuration is as follows.

Package                  Version              Editable project location
------------------------ -------------------- -------------------------
absl-py                  2.1.0
aiohttp                  3.9.5
aiosignal                1.3.1
async-timeout            4.0.3
attrs                    23.2.0
beautifulsoup4           4.12.3
cachetools               5.3.3
certifi                  2024.6.2
charset-normalizer       3.3.2
clarabel                 0.9.0
click                    8.1.7
cloudpickle              3.0.0
contourpy                1.1.1
cvxopt                   1.3.2
cvxpy                    1.5.1
cycler                   0.12.1
decorator                4.4.2
docker-pycreds           0.4.0
ecos                     2.0.13
Farama-Notifications     0.0.4
filelock                 3.14.0
fonttools                4.53.0
frozenlist               1.4.1
fsspec                   2024.6.0
gdown                    5.2.0
gitdb                    4.0.11
GitPython                3.1.43
glfw                     2.7.0
google-auth              2.29.0
google-auth-oauthlib     1.0.0
gpytorch                 1.11
grpcio                   1.64.1
gymnasium                0.28.1
gymnasium-robotics       1.2.2
idna                     3.7
imageio                  2.34.1
imageio-ffmpeg           0.5.1
importlib_metadata       7.1.0
importlib_resources      6.4.0
jax-jumpy                1.0.0
jaxtyping                0.2.19
Jinja2                   3.1.4
joblib                   1.3.2
kiwisolver               1.4.5
lightning-utilities      0.11.2
linear-operator          0.5.2
Markdown                 3.6
markdown-it-py           3.0.0
MarkupSafe               2.1.5
matplotlib               3.7.5
mdurl                    0.1.2
moviepy                  1.0.3
mpmath                   1.3.0
mujoco                   2.3.3
multidict                6.0.5
networkx                 3.1
numpy                    1.23.5
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.5.40
nvidia-nvtx-cu12         12.1.105
oauthlib                 3.2.2
omnisafe                 0.5.1.dev37+gdd9068f /home/jiayi/omnisafe_zjy
osqp                     0.6.7
packaging                24.0
pandas                   2.0.3
pettingzoo               1.24.3
pillow                   10.3.0
pip                      24.0
platformdirs             4.2.2
proglog                  0.1.10
protobuf                 4.25.3
psutil                   5.9.8
pyasn1                   0.6.0
pyasn1_modules           0.4.0
pygame                   2.1.0
Pygments                 2.18.0
PyOpenGL                 3.1.7
pyparsing                3.1.2
PySocks                  1.7.1
python-dateutil          2.9.0.post0
pytorch-lightning        2.2.5
pytz                     2024.1
PyYAML                   6.0.1
qdldl                    0.1.7.post2
qpth                     0.0.16
requests                 2.32.3
requests-oauthlib        2.0.0
rich                     13.7.1
rsa                      4.9
safety-gymnasium         1.0.0
scikit-learn             1.3.2
scipy                    1.10.1
scs                      3.2.4.post2
seaborn                  0.13.2
sentry-sdk               2.4.0
setproctitle             1.3.3
setuptools               70.0.0
shellingham              1.5.4
six                      1.16.0
smmap                    5.0.1
soupsieve                2.5
sympy                    1.12.1
tensorboard              2.14.0
tensorboard-data-server  0.7.2
threadpoolctl            3.5.0
torch                    2.3.0
torchmetrics             1.4.0.post0
tqdm                     4.66.4
triton                   2.3.0
typeguard                2.13.3
typer                    0.12.3
typing_extensions        4.12.1
tzdata                   2024.1
urllib3                  2.2.1
wandb                    0.17.0
Werkzeug                 3.0.3
wheel                    0.43.0
xmltodict                0.13.0
yarl                     1.9.4
zipp                     3.19.1

If you encounter further issues, feel free to contact us!

umfundii · 2024-06-18T10:20:50Z

@Gaiejj I have a few additional questions.

How can I reproduce the graphs above? When I use agent.plot(), the graphs don't look like the ones provided above.
Also, in the config files (train_cfgs), why is the device set to CPU instead of GPU? Is there a reason for this? Selecting GPU throws an error about tensors being on separate devices.
Also, the method agent.evaluate doesn't work. It throws the following error: KeyError: 'compensator'

Gaiejj · 2024-06-19T12:21:50Z

@umfundii Thank you very much for pointing that out! It greatly helps us improve the robustness and usability of the code in this PR.

We have added the functionality to specify reward metrics and cost metrics in plot.py. By running:

python plot.py --logdir LOGDIR --cost-metrics Metrics/Max_angle_violation --reward-metrics Metrics/EpRet

you can reproduce the corresponding results using plot.py.

To make this more convenient, we have also adjusted the relevant parts of experiment_grid. The setup can now be reproduced in one step with the Python script examples/benchmarks/run_experiment_grid.py, as shown below:

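import torch

from omnisafe.common.experiment_grid import ExperimentGrid

# `train` is assumed to be the training function defined earlier in
# run_experiment_grid.py.

if __name__ == '__main__':
    eg = ExperimentGrid(exp_name='Benchmark_CBF_Test')

    # Algorithms and environment to benchmark.
    cbf_policy = ['TRPOCBF', 'DDPGCBF', 'IPO', 'PPOBetaCBF']
    cbf_env = ['Pendulum-v1']
    eg.add('env_id', cbf_env)

    # Detected GPUs; not used while gpu_id is None.
    available_gpus = list(range(torch.cuda.device_count()))
    gpu_id = None

    eg.add('algo', cbf_policy)
    eg.add('logger_cfgs:use_wandb', [False])
    eg.add('train_cfgs:vector_env_nums', [1])
    eg.add('train_cfgs:torch_threads', [1])
    eg.add('train_cfgs:total_steps', [80_000])
    eg.add('seed', [0, 5, 10])
    # 4 algorithms x 1 environment x 3 seeds = 12 runs.
    eg.run(train, num_pool=12, gpu_id=gpu_id)

    # Plot episode return against the maximum angle violation.
    reward_metrics = 'Metrics/EpRet'
    cost_metrics = 'Metrics/Max_angle_violation'
    eg.analyze(
        parameter='algo',
        values=None,
        compare_num=4,
        cost_limit=1.0,
        reward_metrics=reward_metrics,
        cost_metrics=cost_metrics,
    )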
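As a rough illustration of what the grid above expands to, the sketch below enumerates the parameter combinations and turns colon-separated keys into nested config dictionaries. It assumes, as the key names suggest, that `:` separates nesting levels; `nest` and `expand` are hypothetical helpers and do not reproduce ExperimentGrid's internals.

```python
from itertools import product
from typing import Any, Dict, Iterator, List, Tuple


def nest(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {'a:b': 1} into {'a': {'b': 1}}."""
    nested: Dict[str, Any] = {}
    for key, value in flat.items():
        *parents, leaf = key.split(':')
        node = nested
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return nested


def expand(grid: List[Tuple[str, List[Any]]]) -> Iterator[Dict[str, Any]]:
    """Yield one nested config per combination of the listed values."""
    keys = [key for key, _ in grid]
    for combo in product(*(values for _, values in grid)):
        yield nest(dict(zip(keys, combo)))


grid = [
    ('env_id', ['Pendulum-v1']),
    ('algo', ['TRPOCBF', 'DDPGCBF', 'IPO', 'PPOBetaCBF']),
    ('logger_cfgs:use_wandb', [False]),
    ('train_cfgs:vector_env_nums', [1]),
    ('train_cfgs:torch_threads', [1]),
    ('train_cfgs:total_steps', [80_000]),
    ('seed', [0, 5, 10]),
]
variants = list(expand(grid))
print(len(variants))  # 12
print(variants[0])
```

The 12 variants (4 algorithms, 1 environment, 3 seeds) match the `num_pool=12` passed to `eg.run`, so every run can get its own worker process.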

We fixed CUDA support for TRPOCBF and PPOBetaCBF in commits 7423bc1 and 3934192.
We fixed the compensator saving issue in commit 3934192. Thank you for pointing it out!
Note: Because CBF components such as the compensator and solver depend heavily on the dynamics of the training environment, the CBF methods perform worse in the evaluation environment.

umfundii · 2024-06-26T12:43:38Z

@Gaiejj
Thanks for the fix. I have some additional questions.

Can you please post plots like the ones above, generated with the fix? The DDPG-CBF performance is now not that good.
Also, if I want to extend this method to all environments within OmniSafe/safety-gymnasium, can you please provide guidance on how to do that?

Gaiejj · 2024-06-29T15:02:55Z

@umfundii Thank you again for your detailed observations and feedback!

Based on the latest code, we ran repeated experiments with 5 random seeds and found that the DDPGCBF results do differ slightly from the performance initially published in this PR. The difference may be due to randomness arising from hardware differences. The latest results are as follows:

Regarding extending the method to more environments: we think this is a valuable suggestion, since it would help OmniSafe better support researchers studying CBF methods. We are currently refactoring the code and writing documentation so that researchers can extend CBF methods to more environments. We will push the updates to this PR within 3-4 days, and we welcome your feedback then.

Thanks again for your help and suggestions!

Gaiejj · 2024-07-06T03:19:42Z

@umfundii Hello! We have refactored the CBF code. Specifically, we decoupled the solver and dynamics_model to facilitate user customization for specific environments. We have also completed the documentation for this interface. We have built a pre-release version on the branch corresponding to this PR: https://omnisafe-zjy.readthedocs.io/en/latest/saferl/cbf.html
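As a rough picture of what such a decoupling can look like, the sketch below separates a dynamics model, a barrier function, and a solver behind small interfaces. All names here (`Dynamics`, `Barrier`, `safe_action`, and the example classes) are hypothetical and do not reflect the actual OmniSafe interface; see the linked documentation for that. The sketch assumes dynamics of the form x_dot = f(x) + u in 2-D and a single barrier constraint, for which the CBF quadratic program has a closed-form solution.

```python
from typing import Protocol, Tuple

Vec = Tuple[float, float]


class Dynamics(Protocol):
    """Hypothetical interface for x_dot = f(x) + u; drift returns f(x)."""

    def drift(self, x: Vec) -> Vec: ...


class Barrier(Protocol):
    """Hypothetical interface for a barrier h(x) >= 0 and its gradient."""

    def value(self, x: Vec) -> float: ...

    def grad(self, x: Vec) -> Vec: ...


class SingleIntegrator:
    def drift(self, x: Vec) -> Vec:
        return (0.0, 0.0)


class CircleObstacle:
    """h(x) = ||x - center||^2 - radius^2, positive outside the obstacle."""

    def __init__(self, center: Vec, radius: float) -> None:
        self.center, self.radius = center, radius

    def value(self, x: Vec) -> float:
        dx, dy = x[0] - self.center[0], x[1] - self.center[1]
        return dx * dx + dy * dy - self.radius ** 2

    def grad(self, x: Vec) -> Vec:
        return (2.0 * (x[0] - self.center[0]), 2.0 * (x[1] - self.center[1]))


def safe_action(u_rl: Vec, x: Vec, dynamics: Dynamics, barrier: Barrier,
                alpha: float = 1.0) -> Vec:
    """Project u_rl onto {u : grad_h . (f + u) + alpha * h >= 0}."""
    a, f = barrier.grad(x), dynamics.drift(x)
    b = -(a[0] * f[0] + a[1] * f[1]) - alpha * barrier.value(x)
    violation = b - (a[0] * u_rl[0] + a[1] * u_rl[1])
    norm_sq = a[0] ** 2 + a[1] ** 2
    if violation <= 0.0 or norm_sq == 0.0:
        # Already safe, or the gradient vanishes and no correction exists.
        return u_rl
    scale = violation / norm_sq
    return (u_rl[0] + scale * a[0], u_rl[1] + scale * a[1])


obstacle = CircleObstacle(center=(0.0, 0.0), radius=1.0)
print(safe_action((1.0, 0.0), (-2.0, 0.0), SingleIntegrator(), obstacle))
```

Because `safe_action` only touches the two interfaces, supporting a new environment in this sketch means supplying a new `Dynamics` and `Barrier`, while the solver stays unchanged.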

Gaiejj force-pushed the dev-cbf branch from a10f4b8 to cef09dc · May 3, 2024 17:12

Gaiejj added 4 commits May 8, 2024 22:52

feat: support cbf methods

9a21e81

fix: fix test

025eea9

wip

71ffe78

chore: update pytest

23f66d1

Gaiejj force-pushed the dev-cbf branch from f371cc3 to 23f66d1 · May 8, 2024 14:54

Gaiejj added 3 commits May 8, 2024 23:13

chore: update pytest

259975a

chore: update pytest

08e926c

chore: update pytest

6a18071

Gaiejj marked this pull request as ready for review May 9, 2024 14:05