Is this reward function good for competition evaluation? #201
Comments
@luckeciano Could you elaborate on v_tgt being null at the initial position? How did you get this null vector?
Hey @smsong, actually, I made a mistake: v_tgt is not null at the initial position (I saw a point on the map, but there is an arrow there as well). I'm sorry. However, I printed the components of the footstep reward, and in this situation the penalization is very low compared with the total reward from just holding a long footstep for the whole episode. In one of my tests, my agent took a single footstep and obtained a reward of 47, losing only ~10 to effort and velocity deviation. Therefore, it is possible to obtain almost all of the achievable reward without leaving the initial position. I think the reward should be modified, at least the weights; otherwise, top submissions could contain no walking motion at all.
@luckeciano Thanks for the clarification and suggestion.
Hey guys,
I would like to add a concern regarding the reward function.
After some analysis, I think it can be easily exploited by controllers that do not walk. Basically, the positive reward comes from the alive bonus and from footstep duration. An agent can just perform footsteps with no pelvis velocity (maintaining its initial position), or even perform a single long footstep from the beginning of the episode until the end without changing its position. In this way, the penalization is very low (the effort is low, and there is no penalization from deviation because at the initial position v_tgt is a null vector). A minimal sketch of this failure mode follows.
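To make the exploit concrete, here is a toy sketch of the reward structure I am describing. The function name, the weights, and the exact term shapes are my own illustrative assumptions, not the environment's actual implementation:

```python
import numpy as np

# Hypothetical simplification of the footstep reward: a duration bonus
# minus effort and velocity-deviation penalties. The weights below are
# illustrative guesses, not the environment's real values.
def footstep_reward(step_duration, effort, v_pelvis, v_tgt,
                    w_duration=10.0, w_effort=1.0, w_vel=3.0):
    v_pelvis, v_tgt = np.asarray(v_pelvis), np.asarray(v_tgt)
    return (w_duration * step_duration
            - w_effort * effort
            - w_vel * np.linalg.norm(v_pelvis - v_tgt))

# Standing still: one long "footstep", near-zero effort, and only a
# small deviation penalty because v_tgt is small near the start.
print(footstep_reward(step_duration=10.0, effort=0.5,
                      v_pelvis=[0.0, 0.0], v_tgt=[0.1, 0.0]))  # 99.2

# Walking: the same duration bonus, but real effort is spent, so the
# net reward can end up lower than standing still.
print(footstep_reward(step_duration=10.0, effort=8.0,
                      v_pelvis=[1.3, 0.0], v_tgt=[1.4, 0.0]))  # 91.7
```

Under these (assumed) weights the motionless policy scores higher than the walking one, which is exactly the incentive problem.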
As the objective of the competition is to learn to walk effectively while following the navigation field, I think the reward function should be modified. My first thought is to add another factor that explicitly rewards movement, for example along the lines of the sketch below. What do you guys think?
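As a hedged sketch building on the toy reward above (the progress term, its projection onto the target direction, and the weight w_progress are my assumptions, not an official proposal):

```python
import numpy as np

def reward_with_progress(step_duration, effort, v_pelvis, v_tgt,
                         pelvis_displacement,
                         w_duration=10.0, w_effort=1.0,
                         w_vel=3.0, w_progress=20.0):
    """Toy reward from the previous sketch, plus a progress bonus that
    pays only for displacement along the target direction."""
    v_pelvis, v_tgt = np.asarray(v_pelvis), np.asarray(v_tgt)
    base = (w_duration * step_duration
            - w_effort * effort
            - w_vel * np.linalg.norm(v_pelvis - v_tgt))
    # Project the pelvis displacement onto the target direction; a
    # motionless long footstep earns zero here, so standing still no
    # longer collects almost all of the achievable reward.
    direction = v_tgt / (np.linalg.norm(v_tgt) + 1e-8)
    progress = float(np.dot(np.asarray(pelvis_displacement), direction))
    return base + w_progress * max(progress, 0.0)

# Standing still now scores well below walking:
print(reward_with_progress(10.0, 0.5, [0.0, 0.0], [0.1, 0.0],
                           pelvis_displacement=[0.0, 0.0]))   # 99.2
print(reward_with_progress(10.0, 8.0, [1.3, 0.0], [1.4, 0.0],
                           pelvis_displacement=[13.0, 0.0]))  # 351.7
```

Clipping the progress term at zero avoids rewarding motion against the field; retuning the existing weights alone might also work, as mentioned above.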