Release v0.1.1 · pytorch/rl

What's Changed

[Feature] Stacking specs by @vmoens in #892
[Feature] Multicollector interruptor by @albertbou92 in #963
[BugFix] VMAS api fix by @matteobettini in #978
[CI] Fix D4RL tests in CI by @vmoens in #976
[CI] Fix CI by @vmoens in #982
[Refactor] Binary spec inherits from discrete spec by @matteobettini in #984
[Feature] _DataCollector -> DataCollectorBase by @vmoens in #985
[Feature] Discrete SAC by @BY571 in #882
[Refactor, Doc] Refactor refs to SafeModule to TensorDictModule unless necessary by @vmoens in #986
[BugFix] Quickfix by @vmoens in #991
[Feature] Add Dropout to MLP module by @BY571 in #988
[Feature] Warn when collectors collect more frames than requested by @matteobettini in #989
[BugFix] make "_reset", "step_count", and other done_based keys follow done_spec by @matteobettini in #981
[Feature] Bandit datasets by @vmoens in #912
[BugFix] Fix sampling in PPO tutorial by @vmoens in #996
[Refactor] Refactor losses (value function, doc, input batch size) by @vmoens in #987
[BugFix,Feature,Doc] Fix replay buffers sampling info, docstrings and iteration by @vmoens in #1003
[Feature] Replace ValueError by warning in collectors when total_frames is not an exact multiple of frames_per_batch by @albertbou92 in #999
[BugFix] Only call replay buffer transforms when there are by @vmoens in #1008
[BugFix] Patch tests in 1008 by @vmoens in #1009
[Feature] Multidim value functions by @vmoens in #1007
[BugFix] Fix exploration (OU and Gaussian) by @vmoens in #1006
[CI] Fix python version in habitat by @vmoens in #1010
Advantages pass time_dim and docfix by @matteobettini in #1014
[Refactor] Faster transformed distributions by @vmoens in #1017
[WIP, CI] Upgrade cuda channel by @vmoens in #1019
[BugFix] Fix collector reset with truncation by @vmoens in #1021
[Refactor] Improve collector performance by @matteobettini in #1020
[BugFix] Fix params and buffer casting for policies by @vmoens in #1022
[Feature] PPO allow entropy logging when entropy_coeff is 0 by @matteobettini in #1025
[Feature] Distributed data collector (ray) by @albertbou92 in #930
[Refactor] Minor changes in tensordict construction by @vmoens in #1029
[CI] Fix Brax 0.9.0 by @vmoens in #1011
[Feature] Multiagent API in vmas by @matteobettini in #983
[Feature] Benchmarking workflow by @vmoens in #1028
[Benchmark] Fix adv benchmark by @vmoens in #1030
[Doc] Refactor DDPG and DQN tutos to narrow the scope by @vmoens in #979
Revert "[Doc] Refactor DDPG and DQN tutos to narrow the scope" by @vmoens in #1032
[BugFix] Advantage normalisation in ClipPPOLoss is done after computing gain1 by @albertbou92 in #1033
[BugFix] Codecov SHA error by @vmoens in #1035
[Doc] DDPG and DQN refactoring -- Doc cleaning by @vmoens in #1036
[BugFix,CI] Fix macos codecov install by @vmoens in #1039
[BugFix] kwargs update in distributed collectors by @vmoens in #1040
[Feature] make_composite_from_td by @vmoens in #1042
[Refactor] Import envpool locally to avoid importing gym at root level by @vmoens in #1041
[Minor] Fix a typo by @FrankTianTT in #1046
[BugFix] Fix param tying in loss modules by @vmoens in #1037
[Refactor] less ad-hoc disable_env_checker check by @vmoens in #1047
[Refactor] Improve distributed collectors by @vmoens in #1044
[Doc] Document tensordict modules by @vmoens in #1053
[Doc] Minor changes to contributing.md by @vmoens in #1054
[Doc] A bit more doc on modules by @vmoens in #1056
[Refactor] Import enum and interaction_type utils by @Goldspear in #1055
[Feature] Deduplicate calls to common layers in PPO by @vmoens in #1057
[BugFix] CompositeSpec nested key deletion by @btx0424 in #1059
[Feature] Add MaskedCategorical distribution by @xiaomengy in #1012
[Refactor] resetting envs in collectors always passes the _reset entry by @vmoens in #1061
[Refactor] Better integration of QValue tools by @vmoens in #1063
MUJOCO_INSTALLATION.md: Fix typo by @traversaro in #1064
[Refactor] Removes "reward" from root tensordicts by @vmoens in #1065
[Test] Fix tests for older pytorch versions by @vmoens in #1066
[Feature] Reward2go Transform by @BY571 in #1038
[CI] Reduce tests by @vmoens in #1071
[Feature] Skip existing for advantage modules by @vmoens in #1070
[BugFix] Fix parallel env data passing on cuda by @vmoens in #1024
[Refactor] Deprecate interaction_mode by @vmoens in #1067
[Doc] Update KB: cannot find -lGL by @vmoens in #1073
[Doc] fix figures display issues in documentation of actors.py by @DamienAllonsius in #1074
[Example] PPO simplified example by @albertbou92 in #1004
[Feature] Update td in step (not overwrite) by @vmoens in #1075
[CI] Remove migrated CircleCI macOS jobs by @seemethere in #1069
[Feature] Target Return Transform by @BY571 in #1045
[Test] Fix tensorboard tests with ImageIO 2.26 by @vmoens in #1083
[Feature] LSTMModule by @vmoens in #1084
[BugFix] Change default of skip_existing to None by @tcbegley in #1082
[Example] A2C simplified example by @albertbou92 in #1076
[BugFix] Fix output_spec transform calls by @vmoens in #1091
[Feature] Indexing Discrete and OneHot specs by @remidomingues in #1081
[Refactor] Refactor DQN by @vmoens in #1085
[Feature] Auto-init updaters and raise a warning if not present by @vmoens in #1092
[BugFix] Remove false warnings in losses by @vmoens in #1096
[CI, BugFix] Fix CI warnings and errors by @vmoens in #1100
[Refactor] Update vmap imports to torch by @vmoens in #1102
[Refactor] Make advantages non-differentiable by default (except in losses) by @vmoens in #1104
[Feature] Indexing specs by @remidomingues in #1105
[BugFix] Fix EnvPool by @vmoens in #1106
[Feature,Doc] QValue refactoring and QNet + RNN tuto by @vmoens in #1060
[BugFix] Fix Gym imports by @vmoens in #1023
[CI] pytest should not skip tests for dependencies by @rohitnig in #1048
[BugFix, Doc] Fix tutos by @vmoens in #1107
[CI] Fix tutos (2) by @vmoens in #1109
[Doc] Fix doc rendering by @vmoens in #1112
Added the entry for skip-tests in the environment.yml by @rohitnig in #1113
[CI] Upgrade ubuntu version in GHA by @vmoens in #1116
Fix in windows unit test by @mischab in #1099
Revert "Fix in windows unit test" by @mischab in #1117
[Nova] Lint job on GHA by @osalpekar in #1114
[Nova] Remove CircleCI Wheels Builds by @osalpekar in #1121
[BugFix] Set exploration mode to MODE in all losses by default by @vmoens in #1123
[BugFix] Instruct the value key to PPOLoss by @vmoens in #1124
[Feature] CatFrames for offline data by @vmoens in #1122
[CI] Fix windows CI by @vmoens in #1128
[Refactor] Buffers tensorclass compat and tutorial by @vmoens in #1101
[Feature] Marking the time dimension by @vmoens in #1095
[Doc] Add tuto and time dim info in docs by @vmoens in #1130
[Doc] Fix locked samples from RBs and ccl of tuto by @vmoens in #1132
[BugFix] Fix unlock in RB by @vmoens in #1135
[BugFix] extract the info dict from a list by @xmaples in #1131
[Feature] Added support for vector-based rewards from environments in MO-Gymnasium by @dennismalmgren in #992
[Versioning] v0.1.1 by @vmoens in #1137

New Contributors

@FrankTianTT made their first contribution in #1046
@Goldspear made their first contribution in #1055
@btx0424 made their first contribution in #1059
@traversaro made their first contribution in #1064
@DamienAllonsius made their first contribution in #1074
@seemethere made their first contribution in #1069
@remidomingues made their first contribution in #1081
@rohitnig made their first contribution in #1048
@mischab made their first contribution in #1099
@osalpekar made their first contribution in #1114
@xmaples made their first contribution in #1131
@dennismalmgren made their first contribution in #992

Full Changelog: v0.1.0...v0.1.1
