"Done" default of 1 results in 0 reward episodes #23

SamNPowers · 2021-03-17T00:27:35Z

In core/environment.py, done defaults to torch.ones, instead of torch.zeros. This means that in monobeast's act(), the first replay entry each actor creates has a done value of 1. Then when episode returns are reported, those episodes have rewards of 0, though the episodes never really happened at all.

(By the way, excellent repo! Very useful.)
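A minimal sketch of the effect described above, in plain Python rather than torch tensors so that it has no dependencies. The names `initial_output`, `rollout`, and `reported_returns` are hypothetical stand-ins, not TorchBeast functions; the sketch only assumes that returns are reported for every entry whose `done` flag is set.

```python
def initial_output(done_default):
    # Hypothetical stand-in for the entry an actor writes before taking any step.
    return {"done": done_default, "episode_return": 0.0}


def rollout(rewards_per_episode, done_default):
    """Build one actor's stream of entries from a list of per-episode rewards."""
    entries = [initial_output(done_default)]
    for rewards in rewards_per_episode:
        total = 0.0
        for i, reward in enumerate(rewards):
            total += reward
            is_last = i == len(rewards) - 1
            entries.append({"done": is_last, "episode_return": total})
    return entries


def reported_returns(entries):
    # Assumption: a return is reported wherever done is set.
    return [e["episode_return"] for e in entries if e["done"]]


episodes = [[1.0, 2.0], [0.5, 0.5, 1.0]]
with_ones = reported_returns(rollout(episodes, done_default=True))
with_zeros = reported_returns(rollout(episodes, done_default=False))

print(with_ones)   # includes a leading 0.0 for an episode that never ran
print(with_zeros)  # only the two real episodes
```

With the default set to 1, each actor contributes one extra zero-return entry, which pulls down any mean computed over the reported returns.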

heiner · 2021-03-17T10:12:26Z

Hey Sam,

Thanks for your interest in TorchBeast and for your kind words.

You are correct. done is True at t=0 because done == True iff "episode just started", which is the case for the first episode, too.

If this is a problem in your case, I'm happy to accept a patch that turns the torch.ones into torch.zeros, as I don't believe this matters currently. (I suppose the LSTM/agent state needs to be reset in the same way it is initialized for it to not matter at all, but all of this affects the first episode only.)
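One way to read the parenthetical, as a plain-Python sketch: if the agent state is reset on done by zeroing it, a done flag at t=0 is a no-op only when the state is also initialized to zeros. The helpers `zero_on_done` and `reset_on_done`, and the non-zero initial state, are hypothetical and are not taken from the TorchBeast code.

```python
def zero_on_done(state, done):
    # Reset by masking: multiply the state by 0 when done, by 1 otherwise.
    notdone = 0.0 if done else 1.0
    return [notdone * s for s in state]


def reset_on_done(state, done, initial_state):
    # Reset by restoring whatever the state was initialized to.
    return list(initial_state) if done else list(state)


zero_init = [0.0, 0.0, 0.0]
nonzero_init = [0.1, -0.2, 0.3]  # hypothetical non-zero initial state

# Zero initialization: done at t=0 leaves the state unchanged.
same_when_zero = zero_on_done(zero_init, done=True) == zero_init

# Non-zero initialization: masking no longer matches the initial state,
# while an explicit reset to the initial state still does.
same_when_nonzero = zero_on_done(nonzero_init, done=True) == nonzero_init
same_with_reset = reset_on_done(nonzero_init, True, nonzero_init) == nonzero_init

print(same_when_zero, same_when_nonzero, same_with_reset)
```

Under that reading, switching the default from ones to zeros changes agent behavior only if the reset and the initialization disagree, and then only for the first episode.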

heiner self-assigned this Mar 17, 2021
