
Is the implementation of final rewards correct? #5

Open
nikhilrayaprolu opened this issue Jan 29, 2022 · 1 comment

Comments


nikhilrayaprolu commented Jan 29, 2022

In the original implementation, the final rewards are supposed to replace the reward at the end of each episode in the replay buffer.

https://github.com/google-research/google-research/blob/901524f4d4ab15ef9d2f5165148347d0f26b32c2/social_rl/adversarial_env/agent_train_package.py#L260-L264

In this PyTorch implementation, however, only the reward at the very last step of the rollout is replaced with the final return.

paired/algos/storage.py

Lines 201 to 202 in c836e86

```python
def replace_final_return(self, returns):
    self.rewards[-1] = returns
```

Did I misunderstand anything in the code?
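To make the difference concrete, here is a minimal sketch of the two behaviours being compared. The function names are mine, and this is my reading of the two codebases, not code from either repository: the TF version (as described above) overwrites the terminal-step reward of every episode stored in the buffer, whereas `replace_final_return` in the PyTorch version overwrites only the last reward slot of the whole rollout.

```python
def replace_episode_end_rewards(rewards, dones, final_returns):
    """Sketch of the TF behaviour as I understand it: write each
    episode's final reward into the reward slot at that episode's
    terminal step, for every episode in the buffer."""
    out = list(rewards)
    episode_ends = [i for i, d in enumerate(dones) if d]
    for i, ret in zip(episode_ends, final_returns):
        out[i] = ret
    return out


def replace_final_return_only(rewards, final_return):
    """Sketch of what the PyTorch replace_final_return appears to do:
    overwrite only the very last reward slot of the rollout."""
    out = list(rewards)
    out[-1] = final_return
    return out
```

With two episodes packed into one buffer (e.g. `dones = [0, 0, 1, 0, 0, 1]`), the first function would replace the rewards at indices 2 and 5, while the second touches only index 5; any earlier episode boundaries keep their original rewards.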

@matteobettini

I have a question related to this, but also regarding the original implementation.

In particular, in Algorithm 1 from the original paper

(screenshot of Algorithm 1 from the paper omitted)

I am not able to make sense of the last three lines.

$R(\tau)$ is not defined anywhere else in the paper. What is it supposed to mean?

  • Is it setting the final reward of all three agents to the scalar regret? If so, that only makes sense to me for the environment designer.
  • Is the original $r_t$ used for training the protagonist and antagonist?
  • Is it overwriting the rewards at every timestep of the trajectories with a scalar?

If anyone can provide some clarification on these last three lines, I would be immensely grateful.
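Not an authoritative answer, but for context: the PAIRED paper defines the regret estimate that the adversary is trained to maximize as the maximum antagonist episode return minus the protagonist's mean episode return on the generated environment. A minimal sketch of that computation (the function name and argument names are mine):

```python
def estimate_regret(antagonist_returns, protagonist_returns):
    """Sketch of PAIRED's flexible regret estimate: the best antagonist
    episode return minus the protagonist's average episode return on
    the same adversary-generated environment."""
    best_antagonist = max(antagonist_returns)
    mean_protagonist = sum(protagonist_returns) / len(protagonist_returns)
    return best_antagonist - mean_protagonist
```

How that scalar is then fed back as $R(\tau)$ for each of the three agents is exactly the ambiguity raised above, so I will not speculate on it in code.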
