ddpg + her #7

Open
stefanwanckel opened this issue Apr 13, 2021 · 4 comments

@stefanwanckel

I was wondering whether you have tried to train a model with DDPG + HER.
I had some success training SAC + HER, but with DDPG my arm "folds" itself by eventually driving the joint positions q to their limits.

If you have, maybe you could share some thoughts on it. Thanks in advance.
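
For reference, the kind of setup I mean is roughly the following (a minimal sketch assuming a recent stable-baselines3 where HER is used via HerReplayBuffer; the older HER wrapper API differs, and the environment id here is just a placeholder for a goal-conditioned reacher env):

```python
import gym
from stable_baselines3 import DDPG, HerReplayBuffer

# Any goal-conditioned env with a dict observation
# (observation / achieved_goal / desired_goal); placeholder id here.
env = gym.make("FetchReach-v1")

model = DDPG(
    "MultiInputPolicy",                  # needed for dict observations
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(
        n_sampled_goal=4,
        goal_selection_strategy="future",
    ),
    verbose=1,
)
model.learn(total_timesteps=100_000)
```

With HER only the replay buffer changes; the off-policy algorithm (SAC or DDPG) is swapped via the model class.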

@stefanwanckel
Author

Addendum: even with plain DDPG (no HER), the robot arm moves into said position and it is very difficult for it to move out of there again. I also implemented DDPG from scratch, validated it on the "Pendulum-v0" gym environment, and then tried it on my robot environment, but the result was similar: after around 10 optimization steps my cumulative reward begins to (slowly) drift towards more negative values.
Any insight would be much appreciated.
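
For the Pendulum-v0 cross-check, a reference run with stable-baselines3's DDPG to compare against the from-scratch implementation would look roughly like this (illustrative, untuned hyperparameters):

```python
import gym
import numpy as np
from stable_baselines3 import DDPG
from stable_baselines3.common.noise import NormalActionNoise

env = gym.make("Pendulum-v0")
n_actions = env.action_space.shape[-1]
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))

model = DDPG("MlpPolicy", env, action_noise=action_noise, verbose=1)
model.learn(total_timesteps=50_000)

# Note: Pendulum returns are negative by construction, so early returns
# drifting downwards is not by itself a failure; the trend over the full
# run is what matters.
```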

@PierreExeter
Owner

Hi Stefan,

I did a quick check and I didn't encounter any problem in training a model with DDPG.
I'm not sure what you mean by "fold"; could you attach a screenshot to illustrate?

Can you also give more details on your training environment (i.e. observation shape, reward function, action shape, fixed / random goal, fixed / moving goal, action space) and DDPG hyperparameters?

@stefanwanckel
Author

By "folding" I mean that the robot goes into a configuration where all joint angles are at their maximum or minimum, and it can't get out of that configuration.

Here is a picture:
[screenshot: robot arm in the folded configuration]

I think it is best to give you a link to my repository.
https://github.com/stefanwanckel/DRL/tree/main/Tryhard

I adapted my GymEnv structure from your repository, so it is pretty similar. I stopped tracking your repository though, so I will stick with an older version.
I am using the train.py script provided by stable-baselines3-zoo to train my models.

The init for the environment shown in the picture looks like this:

id='ur5e_reacher-v5',
entry_point='ur5e_env.envs.ur5e_env:Ur5eEnv',
max_episode_steps=2000,
kwargs={
    'random_position': False,
    'random_orientation': False,
    'moving_target': False,
    'target_type': "sphere",
    'goal_oriented': True,
    'obs_type': 1,
    'reward_type': 13,
    'action_type': 1,
    'joint_limits': "small",
    'action_min': [-1, -1, -1, -1, -1, -1],
    'action_max': [1, 1, 1, 1, 1, 1],
    'alpha_reward': 0.1,
    'reward_coeff': 1,
    'action_scale': 1,
    'eps': 0.1,
    'sim_rep': 5,
    'action_mode': "force"
}

The hyperparams look like this:

ur5e_reacher-v5:
  n_timesteps: !!float 1e7
  policy: 'MlpPolicy'
  model_class: 'ddpg'
  n_sampled_goal: 4
  goal_selection_strategy: 'future'
  buffer_size: 1000000
  batch_size: 128
  gamma: 0.95
  learning_rate: !!float 1e-3
  noise_type: 'normal'
  noise_std: 0.2
  policy_kwargs: "dict(net_arch=[512, 512, 512])"
  online_sampling: True
  # max_episode_length: 100
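
For what it's worth, this config corresponds roughly to the following direct instantiation (a hypothetical sketch; exact keyword names depend on the stable-baselines3 / zoo version, and it assumes ur5e_reacher-v5 is registered by importing the custom env package):

```python
import gym
import numpy as np
import ur5e_env  # assumed: registers ur5e_reacher-v5 on import
from stable_baselines3 import DDPG, HerReplayBuffer
from stable_baselines3.common.noise import NormalActionNoise

env = gym.make("ur5e_reacher-v5")
n_actions = env.action_space.shape[-1]

model = DDPG(
    "MultiInputPolicy",
    env,
    buffer_size=1_000_000,
    batch_size=128,
    gamma=0.95,
    learning_rate=1e-3,
    action_noise=NormalActionNoise(np.zeros(n_actions), 0.2 * np.ones(n_actions)),
    policy_kwargs=dict(net_arch=[512, 512, 512]),
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(
        n_sampled_goal=4,
        goal_selection_strategy="future",
        # online_sampling / max_episode_length are HerReplayBuffer options
        # in older SB3 releases only.
    ),
    verbose=1,
)
model.learn(total_timesteps=int(1e7))
```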

@PierreExeter
Owner

I don't have time for case-by-case troubleshooting, but I can suggest a few things:
