The recent release of Gymnasium 1.0 introduces an auto-reset feature that silently but irrevocably changes the behavior of the step method in some but not all environments. While this feature may be useful for certain use cases, it breaks the modularity and data integrity assumptions in TorchRL.
Because of this, there is currently no plan to support gymnasium v1.0 within the library.
Specifically, the auto-reset feature causes the following issues:
1. Unpredictable step counting: the number of steps executed by the environment becomes unpredictable, making it difficult to count steps accurately and to manage training loops.
2. Data corruption: the environment may produce junk data during resets (rewards, done states) or consume garbage data (purposeless actions), which can pollute replay buffers and compromise the integrity of the training data if not filtered out.
3. Increased computational overhead: the added complexity of auto-resets requires manual filtering and boilerplate code to mitigate these issues, compromising the efficiency and ease of use of TorchRL.
Regarding 1. and 2.: this is true for vectorized environments as well as for regular ones where auto-resetting has been toggled on. From a TorchRL perspective, it means that the same script will behave differently with the same backend (gymnasium) depending on whether a `ParallelEnv(GymWrapper(Env))` or a `GymWrapper(VectorEnv)` is used. The only fix is for you, the user, to account for these changes, which will otherwise silently corrupt your data. This is not a responsibility we think TorchRL should place on its users.
One may argue that resets are infrequent, but in some frameworks (e.g., RoboHive) they can occur as often as every 50 steps, which would put roughly 2% of corrupted data in the buffer.
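As a concrete illustration of that boilerplate, here is a hedged sketch of the kind of post-hoc filtering an auto-resetting stream forces on users. The transition format (a plain dict) and the function name are illustrative, not TorchRL API; real code would operate on whatever structure the buffer stores:

```python
def drop_autoreset_junk(transitions):
    """Remove the junk transition that an auto-resetting env emits on the
    step() call immediately following a terminated/truncated step."""
    clean, skip_next = [], False
    for tr in transitions:
        if skip_next:
            skip_next = False  # this entry came from the internal reset
        else:
            clean.append(tr)
        if tr["terminated"] or tr["truncated"]:
            skip_next = True
    return clean


stream = [
    {"reward": 1.0, "terminated": False, "truncated": False},
    {"reward": 1.0, "terminated": True,  "truncated": False},
    {"reward": 0.0, "terminated": False, "truncated": False},  # junk reset step
    {"reward": 1.0, "terminated": False, "truncated": False},
]
clean = drop_autoreset_junk(stream)  # keeps 3 of the 4 entries
```

Note that every consumer of the stream has to carry this extra state machine around, which is exactly the overhead being objected to.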
There is a more fundamental issue from the TorchRL perspective. This is a typical rollout loop in gymnasium:
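Since the original snippet did not survive extraction, here is a minimal sketch of such a loop. The environment is a toy stand-in (not a real Gymnasium env), but the `reset`/`step` signatures follow the Gymnasium 0.26+ API:

```python
class ToyEnv:
    """Toy stand-in for a Gymnasium env; episodes terminate every 3 steps."""

    def __init__(self):
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return self.t, {}  # (observation, info), as in Gymnasium >= 0.26

    def step(self, action):
        self.t += 1
        terminated = self.t % 3 == 0
        # (observation, reward, terminated, truncated, info)
        return self.t, 1.0, terminated, False, {}


env = ToyEnv()
obs, info = env.reset()
transitions = []
for _ in range(8):
    action = 0  # placeholder policy
    next_obs, reward, terminated, truncated, info = env.step(action)
    transitions.append((obs, action, reward, next_obs, terminated))
    if terminated or truncated:
        # Classic (pre-1.0) semantics: the *caller* resets explicitly,
        # so every step() call yields exactly one valid transition.
        next_obs, info = env.reset()
    obs = next_obs
```

Under these semantics, step counting is exact and no transition ever straddles two trajectories.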
Now take the TorchRL rollout (without explicit resets, since those are accounted for internally). Before reset time, the `("next", "done")` entry will be `True`. Then we carry that observation to the root in `step_mdp` and put the reset `"observation"` key in the `"next"` tensordict during the next call to `step` (which is in essence a call to `reset`). This silently causes the last observation of trajectory `t` to be treated as the first of trajectory `t+1`.
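The failure mode can be reproduced with a toy stand-in that mimics Gymnasium-1.0-style auto-reset (the class below is illustrative, not the actual Gymnasium implementation): the `step` call that follows a terminal step performs the reset internally, so a naive (root, next) pairing, like the one `step_mdp` produces, stitches the terminal observation of one trajectory onto the start of the next:

```python
class AutoResetEnv:
    """Toy env mimicking Gymnasium-1.0-style auto-reset: the step() call
    *after* a terminal step resets internally and ignores the action."""

    def __init__(self):
        self.t = 0
        self.needs_reset = False

    def reset(self, seed=None):
        self.t = 0
        self.needs_reset = False
        return self.t, {}

    def step(self, action):
        if self.needs_reset:
            # This call is really a reset: the action is discarded,
            # and the reward/done values are junk.
            obs, info = self.reset()
            return obs, 0.0, False, False, info
        self.t += 1
        terminated = self.t % 3 == 0
        self.needs_reset = terminated
        return self.t, 1.0, terminated, False, {}


env = AutoResetEnv()
obs, _ = env.reset()
pairs = []
for _ in range(5):
    next_obs, reward, terminated, truncated, _ = env.step(0)
    pairs.append((obs, next_obs))  # naive (root, "next") pairing
    obs = next_obs

# pairs[3] is (3, 0): the terminal observation of trajectory t has been
# silently paired with the first observation of trajectory t + 1.
```

Nothing in the returned values flags the fourth transition as invalid, which is why the corruption is silent.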
To maintain the integrity and efficiency of our library, we cannot support Gymnasium 1.0 or later versions at this time. We believe that the auto-reset feature as implemented is incompatible with the design principles of TorchRL, which prioritize modularity, data integrity, and ease of use.
Proposed Solutions:
We propose the following solutions to address this compatibility issue:
1. Partial resets: introduce a partial-reset mechanism that allows environments to reset without producing invalid data or affecting step counting.
2. Optional auto-reset: make the auto-reset feature optional, allowing users to choose whether to enable it or not.
3. Alternative API: provide an alternative API that maintains the original behavior of the step method, allowing TorchRL to continue supporting Gymnasium environments without breaking modularity and data integrity.
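As a sketch of what option 2 could look like from the consumer side, here is a hypothetical wrapper (the class names and the toy env are illustrative, not a proposed Gymnasium or TorchRL API) that restores classic step semantics on top of an auto-resetting env by consuming the internal reset step inside an explicit `reset()`:

```python
class AutoResetEnv:
    """Toy env mimicking Gymnasium-1.0-style auto-reset (illustrative)."""

    def __init__(self):
        self.t = 0
        self.needs_reset = False

    def reset(self, seed=None):
        self.t = 0
        self.needs_reset = False
        return self.t, {}

    def step(self, action):
        if self.needs_reset:  # this call is really a reset
            obs, info = self.reset()
            return obs, 0.0, False, False, info
        self.t += 1
        terminated = self.t % 3 == 0
        self.needs_reset = terminated
        return self.t, 1.0, terminated, False, {}


class DisableAutoReset:
    """Hypothetical wrapper: re-exposes explicit reset() semantics."""

    def __init__(self, env):
        self.env = env
        self._pending = False

    def reset(self, **kwargs):
        if self._pending:
            self._pending = False
            obs, *_ = self.env.step(None)  # consume the internal reset step
            return obs, {}
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if terminated or truncated:
            self._pending = True
        return obs, reward, terminated, truncated, info


env = DisableAutoReset(AutoResetEnv())
obs, _ = env.reset()
pairs = []
for _ in range(4):
    next_obs, reward, terminated, truncated, _ = env.step(0)
    pairs.append((obs, next_obs))
    if terminated or truncated:
        next_obs, _ = env.reset()  # explicit reset, as in classic Gym
    obs = next_obs
# No trajectory-crossing pair such as (3, 0) is ever recorded.
```

With such a toggle, existing training scripts would keep working unchanged while users who want auto-reset could still opt in.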
Unless some version of these is implemented, TorchRL will not be able to support gymnasium v1.0 and later releases.
TorchRL is willing to make internal changes to its `GymWrapper` classes to make them compatible with gymnasium 1.0. As of now, any such work would still require us to change all of our training scripts and to ask users to do the same.
Discussion:
We would like to discuss this issue with the Gymnasium community and explore possible solutions that balance the needs of both libraries. We believe that finding a compatible solution will benefit both communities and promote the development of more robust and efficient reinforcement learning pipelines.
We strongly believe that Gym was a cornerstone in the development of RL thanks to its simplicity and the low probability for users to get things wrong. Let's work together to keep this standard alive!
Related content:
#2473