
initial bias towards action=1? #7

Open
no-zzz-un opened this issue Nov 10, 2016 · 6 comments

Comments

@no-zzz-un

Why does the network start with such a strong bias towards trying action=1 every timestep?

I only occasionally see action=0.

It looks like it would be difficult to break out of this pattern, since it receives reward = 0.1 for it before encountering the first pipe gap.

@yanpanlau
Owner

During the initial stage of training, the agent simply performs random exploration... The network should be able to learn "don't flap too much" after training for a while.

@no-zzz-un
Author

@yanpanlau If the initial exploration were random, I would expect both actions to be equally likely at the start, but that isn't the case.
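
A minimal sketch (not the repository's code), assuming plain ε-greedy selection over a randomly initialised Q-network: only the ε fraction of steps is uniformly random, while the greedy steps follow whatever action ranking the random initialisation happens to produce, so one action can dominate early on.

```python
# Sketch: epsilon-greedy over an untrained Q-function can still be heavily
# biased towards one action, because only the epsilon fraction of steps is
# uniformly random; the rest follow argmax(Q) of the random initialisation.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an untrained Q-network: a fixed random linear map from an
# 80x80x4 frame stack to two action values (0 = do nothing, 1 = flap).
W = rng.normal(scale=0.01, size=(80 * 80 * 4, 2))
b = rng.normal(scale=0.01, size=2)

def q_values(state):
    return state.reshape(-1) @ W + b

epsilon = 0.1                      # illustrative early-stage exploration rate
counts = np.zeros(2, dtype=int)

for _ in range(10_000):
    state = rng.random((80, 80, 4))                # fake preprocessed frames
    if rng.random() < epsilon:
        action = rng.integers(2)                   # exploratory step: uniform
    else:
        action = int(np.argmax(q_values(state)))   # greedy step: biased by init
    counts[action] += 1

print("action counts:", counts)    # one action usually dominates heavily
```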

@wobeert

wobeert commented Jan 26, 2018

How long is "a while"?
I trained on a 1080 Ti overnight and it didn't improve at all, or if it did, the improvement wasn't noticeable. It seemed to be performing almost the same actions as when it started training, and the model always crashes at the very top of the first pipe. I tried training from scratch, but that didn't help, and adjusting the epsilon value didn't make much of a difference either.
Anyone else have this issue?
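
For reference, a minimal sketch of linear ε annealing as used in many DQN training loops (the constant names and values here are illustrative assumptions, not copied from the repository): if the annealing window is very long, ε barely moves over an overnight run, so tweaking only its starting value may not visibly change behaviour.

```python
# Linear epsilon annealing; all constants below are illustrative assumptions.
INITIAL_EPSILON = 0.1        # exploration rate at the start of training
FINAL_EPSILON = 0.0001       # exploration rate once annealing finishes
EXPLORE = 3_000_000          # number of steps over which epsilon is annealed

def epsilon_at(step: int) -> float:
    """Epsilon after `step` training steps, annealed linearly."""
    frac = min(step / EXPLORE, 1.0)
    return INITIAL_EPSILON + frac * (FINAL_EPSILON - INITIAL_EPSILON)

# Epsilon is still close to its initial value after an overnight run.
for step in (0, 100_000, 1_000_000, 3_000_000):
    print(step, round(epsilon_at(step), 5))
```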

@yanpanlau
Owner

I just re-tested it and it should converge after about 100,000 steps. Can you try with the latest code?

@AloshkaD

AloshkaD commented Jul 1, 2018

Same issue here as @wobeert described. It didn't change even after 622,000 steps. See below.

[attached image]

@AloshkaD

AloshkaD commented Jul 1, 2018

I fixed it! I had introduced a bug into the original code by mistake when I was creating a multi-GPU version for Keras. Thanks!
