As per the original implementation, the final rewards are supposed to replace the reward at the end of each episode in the replay buffer.
https://github.com/google-research/google-research/blob/901524f4d4ab15ef9d2f5165148347d0f26b32c2/social_rl/adversarial_env/agent_train_package.py#L260-L264
In this PyTorch implementation, however, the final reward replaces only the reward at the very last step of the rollout:
paired/algos/storage.py, lines 201–202 (commit c836e86)
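To make the difference concrete, here is a minimal sketch of the two behaviours as I understand them (tensor names and shapes are hypothetical, not taken from `storage.py` or `agent_train_package.py`):

```python
import torch

# Hypothetical rollout tensors, shaped [num_steps, num_envs, 1].
# masks[t] == 0 means step t ended an episode in that env.
num_steps, num_envs = 8, 2
rewards = torch.zeros(num_steps, num_envs, 1)
masks = torch.ones(num_steps, num_envs, 1)
masks[3, 0] = 0.0  # env 0 finishes an episode mid-rollout
final_rewards = torch.tensor([[1.5], [-0.5]])  # e.g. the adversary's regret, per env

# Behaviour I understand from the original TF implementation:
# overwrite the reward at the end of *every* episode in the buffer.
rewards_per_episode = rewards.clone()
episode_end = (masks == 0).squeeze(-1)  # [num_steps, num_envs], bool
for t in range(num_steps):
    for e in range(num_envs):
        if episode_end[t, e]:
            rewards_per_episode[t, e, 0] = final_rewards[e, 0]

# Behaviour I understand from this repo's storage.py:
# overwrite only the reward at the final step of the rollout.
rewards_last_step_only = rewards.clone()
rewards_last_step_only[-1] = final_rewards
```

In the sketch above, the episode that env 0 finished at step 3 keeps its original reward under the second behaviour, which is the discrepancy I am asking about.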
Did I misunderstand anything in the code?
I have a question related to this, but also regarding the original implementation.
In particular, I am not able to make sense of the last three lines of Algorithm 1 from the original paper.
$R(\tau)$ is not defined anywhere else in the paper. What is it supposed to mean?
If anyone can provide some clarification on these last three lines, I would be immensely grateful.
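For what it is worth, my working assumption is that $R(\tau)$ is the standard notation for the return of a trajectory $\tau = (s_0, a_0, r_0, s_1, \dots)$,

$$
R(\tau) = \sum_{t=0}^{T} \gamma^{t} r_t,
$$

but the paper never states this explicitly, which is why I am asking.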