feat: support cbf methods #323

Open
wants to merge 18 commits into
base: main

Gaiejj
Member

@Gaiejj Gaiejj commented Apr 17, 2024

Description

The implementation in this PR is functionally complete. We will merge it shortly after improving the code style and documentation.

Related Papers

This Pull Request supports control barrier function-based SafeRL algorithms, including:

Example Demo

DDPG_CBF.mp4

Experiment and Performance

Note: Since OmniSafe uses steps as the x-axis scale when displaying benchmark curves, the x-axis of the curves does not exactly match that of the original implementation. However, the total number of environment interaction steps is the same. For example, in Pendulum-v1, 400 episodes * 200 steps per episode = 80,000 total steps.
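
The episode-to-step conversion in the note above can be sketched in a few lines of Python. The helper names `episodes_to_steps` and `step_axis` are illustrative only and are not part of OmniSafe; the sketch assumes fixed-length episodes, as in Pendulum-v1.

```python
from typing import List


def episodes_to_steps(num_episodes: int, steps_per_episode: int) -> int:
    """Total environment interactions for fixed-length episodes."""
    return num_episodes * steps_per_episode


def step_axis(num_episodes: int, steps_per_episode: int) -> List[int]:
    """Cumulative step count at the end of each episode (an x-axis in steps)."""
    return [(i + 1) * steps_per_episode for i in range(num_episodes)]


# Pendulum-v1 figures from the note: 400 episodes of 200 steps each.
total = episodes_to_steps(400, 200)
axis = step_axis(400, 200)
print(total)     # 80000
print(axis[:3])  # [200, 400, 600]
```

Re-indexing an episode-based curve with `step_axis` puts it on the same scale as a step-based curve, which is how the two sets of plots can be compared.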

  • Pendulum-v1 Benchmark

  • Original Implementations

image
  • OmniSafe Implementations
image

Note: The IPO algorithm, based on the logarithmic barrier function, conceptually aligns with the CBF method. Therefore, it is classified here as a variant of the CBF approach.
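
As a rough illustration of the logarithmic barrier mentioned in the note, IPO augments the policy objective with a term of the form log(d - J_C) / t, where J_C is the expected cost, d is the cost limit, and t is a sharpness hyperparameter. The sketch below only evaluates that term; the function name and the default `t` are assumptions made for illustration, and this is not OmniSafe's IPO code.

```python
import math


def log_barrier(cost: float, cost_limit: float, t: float = 20.0) -> float:
    """Barrier term log(cost_limit - cost) / t; -inf once the limit is reached."""
    margin = cost_limit - cost
    if margin <= 0.0:
        # Constraint violated (or exactly on the boundary): the barrier is unbounded.
        return float("-inf")
    return math.log(margin) / t


print(log_barrier(0.0, 1.0))  # 0.0
print(log_barrier(0.5, 1.0) > log_barrier(0.9, 1.0))  # True: penalty grows near the limit
print(log_barrier(1.5, 1.0))  # -inf
```

The term decreases without bound as the cost approaches the limit, which is the sense in which it resembles a barrier function that keeps the system inside a safe set.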

  • Unicycle Task

Because the CBF method is tightly coupled to the environment, we currently validate the RCBF method only in the Unicycle environment, the same environment used in the original paper. Future work will focus on decoupling the method from the environment.

  • Original Implementations
image
  • OmniSafe Implementations
image

Note: The original implementation incurs zero cost, so no cost curves are shown for it.
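
To make the coupling between CBF methods and the environment concrete, here is a minimal sketch of a discrete-time CBF safety filter for a scalar system. It assumes control-affine dynamics x_next = f(x) + g(x) * u and the barrier h(x) = x_max - x; the function name, the toy dynamics, and all numeric values are illustrative assumptions. It is not the RCBF implementation in this PR, which involves more machinery.

```python
from typing import Callable


def cbf_filter(
    x: float,
    u_rl: float,
    f: Callable[[float], float],
    g: Callable[[float], float],
    x_max: float,
    gamma: float = 0.5,
) -> float:
    """Closest action to u_rl satisfying h(x_next) >= (1 - gamma) * h(x).

    Here h(x) = x_max - x and x_next = f(x) + g(x) * u.
    """
    h = x_max - x
    # The CBF condition rearranges to the linear constraint a * u <= b.
    a = g(x)
    b = x_max - f(x) - (1.0 - gamma) * h
    if a * u_rl <= b:
        return u_rl  # the nominal action already satisfies the condition
    if a == 0.0:
        raise ValueError("the control input cannot restore the CBF condition")
    return b / a  # project onto the constraint boundary


dt = 0.25
u_safe = cbf_filter(x=0.5, u_rl=2.0, f=lambda x: x, g=lambda x: dt, x_max=1.0)
print(u_safe)  # 1.0
```

Both `f` and `g` come from the environment's dynamics, so the same policy action is filtered differently in a different environment. That dependence is the coupling described above.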

Motivation and Context

The CBF method is an important branch of SafeRL research. Supporting the CBF method will further expand OmniSafe's contribution to the community.

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide. (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly. (required for a bug fix or a new feature)
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format. (required)
  • I have checked the code using make lint. (required)
  • I have ensured make test pass. (required)


codecov bot commented Apr 30, 2024

Codecov Report

Attention: Patch coverage is 95.30201% with 56 lines in your changes missing coverage. Please review.

Project coverage is 96.79%. Comparing base (080e6b8) to head (34104cd).

Files                                                  Patch %   Lines
------------------------------------------------------ --------- ------------
omnisafe/adapter/barrier_function_adapter.py           92.45%    8 Missing ⚠️
omnisafe/adapter/beta_barrier_function_adapter.py      92.71%    7 Missing ⚠️
omnisafe/utils/tools.py                                33.33%    6 Missing ⚠️
omnisafe/envs/cbf_env.py                               92.54%    5 Missing ⚠️
omnisafe/common/barrier_solver.py                      90.91%    4 Missing ⚠️
...mnisafe/adapter/robust_barrier_function_adapter.py  96.30%    3 Missing ⚠️
omnisafe/envs/rcbf_env.py                              93.02%    3 Missing ⚠️
omnisafe/models/actor/beta_learning_actor.py           92.31%    3 Missing ⚠️
...safe/adapter/offpolicy_barrier_function_adapter.py  97.85%    2 Missing ⚠️
omnisafe/algorithms/off_policy/ddpg_cbf.py             93.33%    2 Missing ⚠️
... and 8 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #323      +/-   ##
==========================================
- Coverage   97.03%   96.79%   -0.25%     
==========================================
  Files         137      153      +16     
  Lines        6906     7933    +1027     
==========================================
+ Hits         6701     7678     +977     
- Misses        205      255      +50     


@Gaiejj Gaiejj marked this pull request as ready for review May 9, 2024 14:05

umfundii commented Jun 4, 2024

@Gaiejj Can you please provide instructions on how to reproduce this result? I have been trying to use Omnisafe to reproduce the result, but I have been experiencing issues.

Member Author

Gaiejj commented Jun 5, 2024

@umfundii Hello, here is the complete reproduction process:

  1. Create a new conda environment.
conda create -n cbf python==3.8
  2. Install OmniSafe from source (the CBF-based methods require new Python dependencies).
pip install -e .
  3. Execute the following commands.
python train_policy.py --algo SACRCBF --env-id Unicycle
python train_policy.py --algo DDPGCBF --env-id Pendulum-v1
python train_policy.py --algo TRPOCBF --env-id Pendulum-v1
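
For anyone who prefers launching the same runs from Python, a minimal sketch using the `omnisafe.Agent` interface might look like the following. It assumes the CBF algorithms on this PR's branch are registered under the same names that `train_policy.py` accepts; the `RUNS` list and the `launch` helper are illustrative names, not part of OmniSafe.

```python
from typing import List, Tuple

# (algorithm, environment) pairs taken from the commands above.
RUNS: List[Tuple[str, str]] = [
    ("SACRCBF", "Unicycle"),
    ("DDPGCBF", "Pendulum-v1"),
    ("TRPOCBF", "Pendulum-v1"),
]


def launch(algo: str, env_id: str) -> None:
    """Train one algorithm; needs OmniSafe installed from this PR's branch."""
    # Imported lazily so the run list can be inspected without OmniSafe installed.
    import omnisafe

    agent = omnisafe.Agent(algo, env_id)
    agent.learn()
```

Calling `launch(algo, env_id)` for each pair in `RUNS` would then train the three configurations one after another.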

My Python environment configuration is as follows.

Package                  Version              Editable project location
------------------------ -------------------- -------------------------
absl-py                  2.1.0
aiohttp                  3.9.5
aiosignal                1.3.1
async-timeout            4.0.3
attrs                    23.2.0
beautifulsoup4           4.12.3
cachetools               5.3.3
certifi                  2024.6.2
charset-normalizer       3.3.2
clarabel                 0.9.0
click                    8.1.7
cloudpickle              3.0.0
contourpy                1.1.1
cvxopt                   1.3.2
cvxpy                    1.5.1
cycler                   0.12.1
decorator                4.4.2
docker-pycreds           0.4.0
ecos                     2.0.13
Farama-Notifications     0.0.4
filelock                 3.14.0
fonttools                4.53.0
frozenlist               1.4.1
fsspec                   2024.6.0
gdown                    5.2.0
gitdb                    4.0.11
GitPython                3.1.43
glfw                     2.7.0
google-auth              2.29.0
google-auth-oauthlib     1.0.0
gpytorch                 1.11
grpcio                   1.64.1
gymnasium                0.28.1
gymnasium-robotics       1.2.2
idna                     3.7
imageio                  2.34.1
imageio-ffmpeg           0.5.1
importlib_metadata       7.1.0
importlib_resources      6.4.0
jax-jumpy                1.0.0
jaxtyping                0.2.19
Jinja2                   3.1.4
joblib                   1.3.2
kiwisolver               1.4.5
lightning-utilities      0.11.2
linear-operator          0.5.2
Markdown                 3.6
markdown-it-py           3.0.0
MarkupSafe               2.1.5
matplotlib               3.7.5
mdurl                    0.1.2
moviepy                  1.0.3
mpmath                   1.3.0
mujoco                   2.3.3
multidict                6.0.5
networkx                 3.1
numpy                    1.23.5
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.5.40
nvidia-nvtx-cu12         12.1.105
oauthlib                 3.2.2
omnisafe                 0.5.1.dev37+gdd9068f /home/jiayi/omnisafe_zjy
osqp                     0.6.7
packaging                24.0
pandas                   2.0.3
pettingzoo               1.24.3
pillow                   10.3.0
pip                      24.0
platformdirs             4.2.2
proglog                  0.1.10
protobuf                 4.25.3
psutil                   5.9.8
pyasn1                   0.6.0
pyasn1_modules           0.4.0
pygame                   2.1.0
Pygments                 2.18.0
PyOpenGL                 3.1.7
pyparsing                3.1.2
PySocks                  1.7.1
python-dateutil          2.9.0.post0
pytorch-lightning        2.2.5
pytz                     2024.1
PyYAML                   6.0.1
qdldl                    0.1.7.post2
qpth                     0.0.16
requests                 2.32.3
requests-oauthlib        2.0.0
rich                     13.7.1
rsa                      4.9
safety-gymnasium         1.0.0
scikit-learn             1.3.2
scipy                    1.10.1
scs                      3.2.4.post2
seaborn                  0.13.2
sentry-sdk               2.4.0
setproctitle             1.3.3
setuptools               70.0.0
shellingham              1.5.4
six                      1.16.0
smmap                    5.0.1
soupsieve                2.5
sympy                    1.12.1
tensorboard              2.14.0
tensorboard-data-server  0.7.2
threadpoolctl            3.5.0
torch                    2.3.0
torchmetrics             1.4.0.post0
tqdm                     4.66.4
triton                   2.3.0
typeguard                2.13.3
typer                    0.12.3
typing_extensions        4.12.1
tzdata                   2024.1
urllib3                  2.2.1
wandb                    0.17.0
Werkzeug                 3.0.3
wheel                    0.43.0
xmltodict                0.13.0
yarl                     1.9.4
zipp                     3.19.1

If you encounter further issues, feel free to contact us!


umfundii commented Jun 18, 2024

@Gaiejj I have a few additional questions.

  1. How can I reproduce the graphs above? When I use agent.plot(), the graphs don't look like the ones provided above.
  2. In the config files (train_cfgs), why is the device set to CPU instead of GPU? Is this significant? Selecting GPU throws an error about tensors being on separate devices.
  3. The method agent.evaluate doesn't work. It throws the following error: KeyError: 'compensator'

Member Author

Gaiejj commented Jun 19, 2024

@umfundii Thank you very much for pointing that out! It greatly helps us improve the robustness and usability of the code in this PR.

  • We have added the functionality to specify reward metrics and cost metrics in plot.py. By running:
python plot.py --logdir LOGDIR --cost-metrics Metrics/Max_angle_violation --reward-metrics Metrics/EpRet

you can reproduce the corresponding results using plot.py.

To make this more convenient, we have also adjusted the relevant parts of experiment_grid. Users can now reproduce the setup by running the Python script examples/benchmarks/run_experiment_grid.py, as shown below:

import torch

from omnisafe.common.experiment_grid import ExperimentGrid

# `train` is the per-run training function defined earlier in run_experiment_grid.py.

if __name__ == '__main__':
    eg = ExperimentGrid(exp_name='Benchmark_CBF_Test')

    # Algorithms and environment to benchmark.
    cbf_policy = ['TRPOCBF', 'DDPGCBF', 'IPO', 'PPOBetaCBF']
    cbf_env = ['Pendulum-v1']
    eg.add('env_id', cbf_env)

    # GPUs detected on this machine; gpu_id = None selects the CPU.
    available_gpus = list(range(torch.cuda.device_count()))
    gpu_id = None

    eg.add('algo', cbf_policy)
    eg.add('logger_cfgs:use_wandb', [False])
    eg.add('train_cfgs:vector_env_nums', [1])
    eg.add('train_cfgs:torch_threads', [1])
    eg.add('train_cfgs:total_steps', [80_000])
    eg.add('seed', [0, 5, 10])
    eg.run(train, num_pool=12, gpu_id=gpu_id)

    # Metrics used for the reward and cost curves.
    reward_metrics = 'Metrics/EpRet'
    cost_metrics = 'Metrics/Max_angle_violation'
    eg.analyze(
        parameter='algo',
        values=None,
        compare_num=4,
        cost_limit=1.0,
        reward_metrics=reward_metrics,
        cost_metrics=cost_metrics,
    )
  • We fixed CUDA support for TRPOCBF and PPOBetaCBF in commits 7423bc1 and 3934192.
  • We fixed the compensator saving issue in commit 3934192, thanks for your meticulous reminder!
  • Note: Because CBF components such as the compensator and solver depend heavily on the dynamics of the training environment, the CBF methods perform worse in the evaluation environment than during training.
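
For the experiment grid script earlier in this comment, it may help to see how many runs the settings produce. The sketch below enumerates the Cartesian product of the multi-valued settings with `itertools.product`. It is an illustration of the idea only, under the assumption that the grid expands every combination; it is not how `ExperimentGrid` is implemented, and the `expand` helper is a made-up name.

```python
from itertools import product
from typing import Any, Dict, List

# Settings copied from the script; single-valued keys do not multiply the run count.
grid: Dict[str, List[Any]] = {
    "env_id": ["Pendulum-v1"],
    "algo": ["TRPOCBF", "DDPGCBF", "IPO", "PPOBetaCBF"],
    "seed": [0, 5, 10],
}


def expand(settings: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one configuration dict per combination of setting values."""
    keys = list(settings)
    return [dict(zip(keys, combo)) for combo in product(*(settings[k] for k in keys))]


runs = expand(grid)
print(len(runs))  # 12
print(runs[0])    # {'env_id': 'Pendulum-v1', 'algo': 'TRPOCBF', 'seed': 0}
```

With 1 environment, 4 algorithms, and 3 seeds there are 12 combinations, which matches `num_pool=12` in the script.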

@umfundii

@Gaiejj
Thanks for the fix. I have some additional questions.

  1. Could you please post updated plots like the ones above, generated with the fix? DDPG-CBF's performance is currently not that good.
  2. If I want to extend this method to all environments in OmniSafe / Safety-Gymnasium, could you provide guidance on how to do that?

Member Author

Gaiejj commented Jun 29, 2024

@umfundii Thank you again for your detailed observations and feedback!

  1. Using the latest code, we ran repeated experiments with 5 random seeds and found that the DDPGCBF results do differ slightly from the performance initially published in this PR. The difference may be due to randomness arising from hardware differences. The latest results are shown below:

Benchmark_CBF_Test_New

  2. We think this is a very forward-looking suggestion, because it would make OmniSafe more useful to researchers studying CBF methods. We are currently refactoring the code and writing documentation so that researchers can extend CBF methods to more environments. We will push the updates to this PR within 3-4 days, and we welcome your feedback then.

Thanks again for your help and suggestions!

Member Author

Gaiejj commented Jul 6, 2024

@umfundii Hello! We have refactored the CBF code. Specifically, we decoupled the solver and dynamics_model so that users can customize them for specific environments. We have also completed the documentation for this interface. A pre-release version of the documentation, built from the branch for this PR, is available at: https://omnisafe-zjy.readthedocs.io/en/latest/saferl/cbf.html
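
As a rough picture of what decoupling the solver from the dynamics model can look like, here is a hypothetical sketch. None of these class or method names come from OmniSafe; see the linked documentation for the actual interface. The solver here uses a simple sampling search over candidate actions rather than a QP, purely to keep the example self-contained.

```python
from typing import Callable, Protocol


class DynamicsModel(Protocol):
    """Hypothetical interface: anything that predicts the next state."""

    def predict(self, x: float, u: float) -> float: ...


class Integrator:
    """Toy dynamics x_next = x + gain * u (illustrative only)."""

    def __init__(self, gain: float) -> None:
        self.gain = gain

    def predict(self, x: float, u: float) -> float:
        return x + self.gain * u


class SamplingBarrierSolver:
    """Picks the candidate action nearest u_rl that keeps h(x_next) >= (1 - gamma) * h(x)."""

    def __init__(self, dynamics: DynamicsModel, barrier: Callable[[float], float],
                 gamma: float = 0.5, u_min: float = -2.0, u_max: float = 2.0,
                 num_candidates: int = 17) -> None:
        self.dynamics = dynamics
        self.barrier = barrier
        self.gamma = gamma
        step = (u_max - u_min) / (num_candidates - 1)
        self.candidates = [u_min + i * step for i in range(num_candidates)]

    def safe_action(self, x: float, u_rl: float) -> float:
        threshold = (1.0 - self.gamma) * self.barrier(x)
        feasible = [u for u in self.candidates
                    if self.barrier(self.dynamics.predict(x, u)) >= threshold]
        if not feasible:
            raise ValueError("no candidate action satisfies the CBF condition")
        return min(feasible, key=lambda u: abs(u - u_rl))


def barrier(x: float) -> float:
    return 1.0 - x  # safe set: x <= 1


# The same solver class works with different dynamics models.
print(SamplingBarrierSolver(Integrator(0.25), barrier).safe_action(0.5, 2.0))  # 1.0
print(SamplingBarrierSolver(Integrator(0.5), barrier).safe_action(0.5, 2.0))   # 0.5
```

The solver only calls `predict`, so adapting it to a new environment means supplying a new dynamics model and barrier rather than rewriting the solver.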
