
[BUG] Incompatibility between TorchRL and Gymnasium 1.0: Auto-Reset Feature Breaks Modularity and Data Integrity #2477

Closed
vmoens opened this issue Oct 9, 2024 · 3 comments

Comments

@vmoens
Contributor

vmoens commented Oct 9, 2024

Description

The recent release of Gymnasium 1.0 introduces an auto-reset feature that silently but irrevocably changes the behavior of the step method in some but not all environments. While this feature may be useful for certain use cases, it breaks the modularity and data integrity assumptions in TorchRL.

Because of this, as of today there is no plan to support gymnasium v1.0 within the library.

Specifically, the auto-reset feature causes the following issues:

  1. Unpredictable step counting: The number of steps executed by the environment becomes unpredictable, making it difficult to accurately count steps and manage training loops.
  2. Data corruption: The environment may produce junk data during resets (reward, done states) or ask for garbage data (purposeless actions), which can pollute buffers and compromise the integrity of the training data if not filtered out.
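As a toy illustration of point 2, here is a minimal single-environment simulation of auto-reset semantics (the class, its episode length, and the `autoreset` info flag are hypothetical stand-ins, not the real gymnasium API). The step immediately after a termination ignores the action and returns placeholder reward/done values that would pollute a buffer if stored:

```python
# Toy simulation of auto-reset semantics; names are illustrative, not gymnasium's API.
class AutoResetToyEnv:
    """Episodes last 3 steps; the step *after* termination is an implicit reset."""

    def __init__(self):
        self.t = 0
        self.needs_reset = True

    def reset(self):
        self.t = 0
        self.needs_reset = False
        return self.t  # observation

    def step(self, action):
        if self.needs_reset:
            # Auto-reset: the action is ignored, and the reward/termination
            # returned are placeholders, not data from a real transition.
            obs = self.reset()
            return obs, 0.0, False, {"autoreset": True}
        self.t += 1
        terminated = self.t == 3
        self.needs_reset = terminated
        return self.t, 1.0, terminated, {"autoreset": False}

env = AutoResetToyEnv()
obs = env.reset()
log = []
for _ in range(5):
    obs, reward, terminated, info = env.step(action=0)
    log.append((reward, terminated, info["autoreset"]))

# Steps 1-3 are real transitions; step 4 is junk produced by the auto-reset.
```

Unless the user filters on the `autoreset` flag, the fourth tuple (zero reward, no termination, discarded action) lands in the buffer alongside real data.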

Regarding 1. and 2.: this is true for vectorized environments as well as for regular ones where auto-resetting has been toggled on. From a TorchRL perspective, it means that the same script will behave differently with the same backend (gymnasium) depending on whether it is used through a ParallelEnv(GymWrapper(Env)) or a GymWrapper(VectorEnv). The only fix would be for you, the user, to account for these changes, which would otherwise silently corrupt your data. This is not a responsibility we think TorchRL should endorse.

One may argue that resets are infrequent, but in some frameworks (e.g., RoboHive) they can occur as often as every 50 steps, which amounts to roughly 2% of corrupted data in the buffer.

  3. Increased computational overhead: The additional complexity of auto-resets requires manual filtering and boilerplate code to mitigate these issues, compromising the efficiency and ease of use of TorchRL.

  4. There is a more fundamental issue from the TorchRL perspective. This is a typical rollout loop in gymnasium:

# `envs` is assumed to be a gymnasium vector env,
# e.g. envs = gym.make_vec("CartPole-v1", num_envs=4)
import numpy as np

replay_buffer = []
obs, _ = envs.reset()
autoreset = np.zeros(envs.num_envs, dtype=bool)
for _ in range(total_timesteps):
    next_obs, rewards, terminations, truncations, _ = envs.step(envs.action_space.sample())

    for j in range(envs.num_envs):
        # Skip transitions produced while env j is auto-resetting:
        # they carry placeholder rewards/dones and a discarded action.
        if not autoreset[j]:
            replay_buffer.append((
                obs[j], rewards[j], terminations[j], truncations[j], next_obs[j]
            ))

    obs = next_obs
    autoreset = np.logical_or(terminations, truncations)
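To make the cost of this bookkeeping concrete, here is a self-contained toy simulation of the mask logic in the loop above (fixed 3-step episodes and hypothetical numbers, standing in for a real vector env). It counts how many step() results survive the auto-reset filter:

```python
import numpy as np

# Toy stand-in for a 2-env vectorized environment with fixed 3-step episodes.
num_envs = 2
episode_len = 3
total_timesteps = 12

t = np.zeros(num_envs, dtype=int)        # per-env step counter
autoreset = np.zeros(num_envs, dtype=bool)
stored = 0

for _ in range(total_timesteps):
    # Simulate step(): envs flagged for auto-reset restart instead of
    # producing a real transition.
    t = np.where(autoreset, 0, t + 1)
    terminations = (t == episode_len) & ~autoreset

    stored += int(np.sum(~autoreset))    # only real transitions are kept
    autoreset = terminations

# 12 steps x 2 envs = 24 step() results, but 1 in 4 is an auto-reset artefact
# that the user must detect and discard by hand.
```

With 3-step episodes, one of every four step() calls is wasted, and every stored transition costs a mask lookup; the shorter the episodes, the worse the ratio.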

Now take the TorchRL rollout (resets do not appear in the loop, since the environment is expected to account for them):

td = env.reset()
for _ in range(total_timesteps):
    td = env.step(policy(td))
    buffer.add(td)
    td = step_mdp(td)

Before reset time, the ("next", "done") entry will be True. Then, step_mdp will carry that observation to the root, and the next call to step (which is in essence a call to reset) will put the reset "observation" key in the "next" tensordict. This silently causes the last observation of trajectory t to be treated as the first observation of trajectory t+1.
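The extra filtering an auto-reset backend would force on this loop can be sketched as follows. Plain dicts stand in for TensorDicts, and `fake_autoreset_step` is a hypothetical environment with 3-step episodes; only `step_mdp`'s role (moving "next" values to the root) mirrors the real torchrl utility:

```python
# Sketch of the manual filtering auto-reset would impose on the TorchRL loop.
# Plain dicts stand in for TensorDicts; the env is a hypothetical toy.

def step_mdp(td):
    # Move the "next" values to the root, as torchrl's step_mdp does.
    return {"obs": td["next_obs"], "done": td["next_done"]}

def fake_autoreset_step(td):
    # Toy dynamics: episodes terminate every 3rd step; the step after a
    # done is an implicit reset whose reward/done are junk placeholders.
    if td["done"]:
        return {**td, "next_obs": 0, "reward": 0.0, "next_done": False}
    nxt = td["obs"] + 1
    return {**td, "next_obs": nxt, "reward": 1.0, "next_done": nxt % 3 == 0}

buffer = []
td = {"obs": 0, "done": False}
for _ in range(8):
    td = fake_autoreset_step(td)
    # Extra bookkeeping auto-reset forces on every user: a transition
    # whose root state came from a reset must never reach the buffer.
    if not td["done"]:
        buffer.append((td["obs"], td["reward"], td["next_obs"]))
    td = step_mdp(td)

# Without the `if not td["done"]` guard, the junk reset transitions would
# stitch the end of trajectory t onto the start of trajectory t+1.
```

This is exactly the kind of boilerplate the issue argues should not leak into every training script.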

To maintain the integrity and efficiency of our library, we cannot support Gymnasium 1.0 or later versions at this time. We believe that the auto-reset feature as implemented is incompatible with the design principles of TorchRL, which prioritize modularity, data integrity, and ease of use.

Proposed Solutions:

We propose the following solutions to address this compatibility issue:

  • Partial resets: Introduce a partial reset mechanism that allows environments to reset without producing invalid data or affecting step counting.
  • Optional auto-reset: Make the auto-reset feature optional, allowing users to choose whether to enable it or not.
  • Alternative API: Provide an alternative API that maintains the original behavior of the step method, allowing TorchRL to continue supporting Gymnasium environments without breaking modularity and data integrity.

Unless some version of these is implemented, TorchRL will not be able to support gymnasium v1.0 and later releases.

TorchRL is willing to make internal changes to its GymWrapper classes to make them compatible with gymnasium 1.0. As of now, any such work would still require us to change all of our training scripts and to ask users to do the same.

Discussion:

We would like to discuss this issue with the Gymnasium community and explore possible solutions that balance the needs of both libraries. We believe that finding a compatible solution will benefit both communities and promote the development of more robust and efficient reinforcement learning pipelines.

We strongly believe that Gym was a cornerstone in the development of RL thanks to its simplicity and the low probability for users to get things wrong. Let's work together to keep this standard alive!

Related content:
#2473

@vmoens vmoens added the bug Something isn't working label Oct 9, 2024
@vmoens vmoens self-assigned this Oct 9, 2024
@antoinebrl
Contributor

Thanks for producing a concise report of the new features and the incompatibilities between gymnasium v1 and torchrl.

As a more practical consideration, should torchrl specify lower and upper bounds for its dependencies? https://github.com/pytorch/rl/blob/main/setup.py#L197

@vmoens
Contributor Author

vmoens commented Oct 10, 2024

@antoinebrl Yes we should probably do that. I'll also make a PR to error when GymWrapper is created with a gymnasium 1.0 env.

@vmoens vmoens removed their assignment Oct 10, 2024
@vmoens vmoens removed the bug Something isn't working label Oct 10, 2024
@vmoens
Contributor Author

vmoens commented Oct 10, 2024

Moved to #2483

@vmoens vmoens closed this as completed Oct 10, 2024