This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

"Done" default of 1 results in 0 reward episodes #23

Open
SamNPowers opened this issue Mar 17, 2021 · 1 comment
@SamNPowers

In core/environment.py, done defaults to torch.ones instead of torch.zeros. As a result, in monobeast's act(), the first replay entry each actor creates has a done value of 1. When episode returns are reported, that entry is counted as a finished episode with a reward of 0, even though no such episode ever took place.
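To make the effect concrete, here is a minimal sketch that models one actor's rollout with plain Python lists instead of tensors. The helpers `rollout` and `reported_returns` are hypothetical and are not TorchBeast code; the sketch assumes that returns are reported wherever the done flag is set, which is how the report above describes the symptom.

```python
def rollout(initial_done, rewards, terminal_steps):
    """Toy stand-in for one actor's rollout (hypothetical, not TorchBeast code).

    Returns parallel lists of `done` flags and running `episode_return`
    values, one entry per timestep, starting with the initial entry at t=0.
    """
    done = [initial_done]
    episode_return = [0.0]
    running = 0.0
    for t, r in enumerate(rewards, start=1):
        running += r
        finished = t in terminal_steps
        done.append(1 if finished else 0)
        episode_return.append(running)
        if finished:
            running = 0.0  # the next episode starts from a return of zero
    return done, episode_return


def reported_returns(done, episode_return):
    # Assumption: a return is reported wherever the done flag is set.
    return [ret for d, ret in zip(done, episode_return) if d]


rewards = [1.0, 1.0, 1.0, 2.0, 2.0]
terminal_steps = {3, 5}  # two real episodes, ending at t=3 and t=5

with_ones = reported_returns(*rollout(1, rewards, terminal_steps))
with_zeros = reported_returns(*rollout(0, rewards, terminal_steps))
print(with_ones)   # [0.0, 3.0, 4.0]
print(with_zeros)  # [3.0, 4.0]
```

With the initial flag set to 1, the t=0 entry shows up as an extra episode with a return of 0.0; with 0, only the two real episodes are reported.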

(By the way, excellent repo! Very useful.)

@heiner
Contributor

heiner commented Mar 17, 2021

Hey Sam,

Thanks for your interest in TorchBeast and for your kind words.

You are correct. done is True at t=0 because done == True iff "episode just started", which is the case for the first episode, too.

If this is a problem in your case, I'm happy to accept a patch that turns the torch.ones into torch.zeros, as I don't believe this matters currently. (I suppose the LSTM/agent state needs to be reset in the same way it is initialized for it to not matter at all, but all of this affects the first episode only.)
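The point about the agent state can be illustrated with a small sketch. It assumes a recurrent agent that zeroes its state wherever done is set before taking a step; `run_agent` and its `decay` parameter are hypothetical stand-ins for an LSTM update, not code from the repository.

```python
def run_agent(initial_state, done_flags, inputs, decay=0.5):
    """Toy recurrent agent: state is zeroed wherever done is set, then updated."""
    state = initial_state
    outputs = []
    for d, x in zip(done_flags, inputs):
        state = (1 - d) * state      # reset to zero on "episode just started"
        state = decay * state + x    # stand-in for an LSTM step
        outputs.append(state)
    return outputs


inputs = [1.0, 1.0, 1.0]

# Initial state equals the reset value (zero): the t=0 flag makes no difference.
same = run_agent(0.0, [1, 0, 0], inputs) == run_agent(0.0, [0, 0, 0], inputs)

# Initial state differs from the reset value: the t=0 flag changes the outputs.
differs = run_agent(2.0, [1, 0, 0], inputs) != run_agent(2.0, [0, 0, 0], inputs)
print(same, differs)  # True True
```

Under these assumptions, switching the initial done flag only changes the agent's behaviour when the initial state is not the same as the reset state, and any difference originates at the start of the first episode.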

heiner self-assigned this on Mar 17, 2021