
Commit

Merge branch 'tune-bullets-env' of github.com:araffin/rl-baselines-zoo into tune-bullets-env
araffin committed Jan 17, 2019
2 parents 2874d24 + 5de5b28 commit 3b71b78
Showing 7 changed files with 24 additions and 20 deletions.
benchmark.md: 2 changes (1 addition & 1 deletion)
@@ -85,7 +85,7 @@
 |sac |HumanoidBulletEnv-v0               | 2048.187| 829.776| 149886|  172|
 |sac |InvertedDoublePendulumBulletEnv-v0 | 9357.406|   0.504| 150000|  150|
 |sac |InvertedPendulumSwingupBulletEnv-v0|  891.508|   0.963| 150000|  150|
-|sac |LunarLanderContinuous-v2           |  194.191| 100.631| 149699|  304|
+|sac |LunarLanderContinuous-v2           |  269.783|  57.077| 149852|  709|
 |sac |Pendulum-v0                        | -159.669|  86.665| 150000|  750|
 |sac |ReacherBulletEnv-v0                |   17.529|   9.860| 150000| 1000|
 |sac |Walker2DBulletEnv-v0               | 2052.646|  13.631| 150000|  150|
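
(Per benchmark.md's header, which lies outside this hunk, the columns are the algorithm, the environment id, the mean and standard deviation of the episode reward, and the number of evaluation timesteps and episodes.) A minimal sketch of how such summary statistics are computed from per-episode returns; the `episode_rewards` values below are made up for illustration:

import numpy as np

# Hypothetical per-episode returns from an evaluation run.
episode_rewards = [260.1, 301.4, 247.9]

# The benchmark reports the mean and standard deviation across episodes.
mean_reward, std_reward = np.mean(episode_rewards), np.std(episode_rewards)
print('{:.3f} +/- {:.3f} over {} episodes'.format(
    mean_reward, std_reward, len(episode_rewards)))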
enjoy.py: 5 changes (4 additions & 1 deletion)
@@ -71,7 +71,10 @@
                       should_render=not args.no_render,
                       hyperparams=hyperparams)

-model = ALGOS[algo].load(model_path)
+# ACER raises errors because the environment passed must have
+# the same number of environments as the model was trained on.
+load_env = None if algo == 'acer' else env
+model = ALGOS[algo].load(model_path, env=load_env)

 obs = env.reset()

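For context, the loaded model is then run in the environment. A minimal sketch of the loop that follows this hunk, assuming the standard stable-baselines API (the step count and the `deterministic` flag are illustrative, not the script's exact values):

# `model` and `env` come from the code above; the rest is a sketch.
obs = env.reset()
for _ in range(1000):
    # predict() returns the action and the recurrent policy state
    # (None for feed-forward policies).
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, infos = env.step(action)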
hyperparams/ddpg.yml: 3 changes (3 additions & 0 deletions)
@@ -19,6 +19,7 @@ Pendulum-v0:
   noise_std: 0.1
   memory_limit: 50000

+# To be tuned
 BipedalWalker-v2:
   n_timesteps: !!float 5e6
   policy: 'LnMlpPolicy'
@@ -27,6 +28,7 @@ BipedalWalker-v2:
   noise_std: 0.2
   memory_limit: 50000

+# To be tuned
 Walker2DBulletEnv-v0:
   n_timesteps: !!float 2e6
   policy: 'LnMlpPolicy'
@@ -36,6 +38,7 @@ Walker2DBulletEnv-v0:
   batch_size: 64
   normalize_observations: True

+# To be tuned
 HalfCheetahBulletEnv-v0:
   n_timesteps: !!float 2e6
   policy: 'LnMlpPolicy'
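These YAML files are read by the zoo's training script rather than passed verbatim to the algorithm constructors. A rough sketch of loading one entry; popping `n_timesteps` mirrors what a training script would plausibly do and is an assumption, as is the exact handling of the noise keys:

import yaml

# Sketch: parse one entry of hyperparams/ddpg.yml. Keys such as
# noise_type and noise_std are turned into action-noise objects by the
# zoo's train.py; they are not raw DDPG constructor arguments.
with open('hyperparams/ddpg.yml') as f:
    all_hyperparams = yaml.safe_load(f)

params = all_hyperparams['BipedalWalker-v2']
n_timesteps = int(params.pop('n_timesteps'))  # parsed from '!!float 5e6'
print(params['policy'], n_timesteps)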
hyperparams/ppo2.yml: 1 change (1 addition & 0 deletions)
@@ -220,6 +220,7 @@ MinitaurBulletDuckEnv-v0:
   learning_rate: 2.5e-4
   cliprange: 0.2

+# To be tuned
 HumanoidBulletEnv-v0:
   normalize: true
   n_envs: 8
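In the zoo, `normalize: true` corresponds to wrapping the vectorized environment with stable-baselines' VecNormalize. A minimal sketch of what this entry roughly expands to; the exact keyword arguments the zoo passes are an assumption:

import gym
import pybullet_envs  # noqa: F401 -- registers the Bullet envs with gym
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

# Sketch: 8 parallel envs (n_envs: 8) with observation/reward
# normalization, as implied by `normalize: true`.
env = DummyVecEnv([lambda: gym.make('HumanoidBulletEnv-v0') for _ in range(8)])
env = VecNormalize(env, norm_obs=True, norm_reward=True)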
hyperparams/sac.yml: 26 changes (13 additions & 13 deletions)
@@ -5,24 +5,20 @@ MountainCarContinuous-v0:
   learning_rate: lin_3e-4
   buffer_size: 1000000
   batch_size: 64
-  ent_coef: 0.01
+  ent_coef: 'auto'
   train_freq: 1
   gradient_steps: 1
   learning_starts: 10000

 Pendulum-v0:
   n_timesteps: !!float 60000
   policy: 'MlpPolicy'
-  ent_coef: 0.2
   learning_starts: 1000

 LunarLanderContinuous-v2:
-  n_timesteps: !!float 1e5
+  n_timesteps: !!float 5e5
   policy: 'MlpPolicy'
-  learning_rate: !!float 3e-3
-  buffer_size: 50000
-  batch_size: 32
-  ent_coef: 0.2
+  batch_size: 256
   learning_starts: 1000

 BipedalWalker-v2:
@@ -53,7 +49,7 @@ HalfCheetahBulletEnv-v0:
   learning_rate: lin_3e-4
   buffer_size: 1000000
   batch_size: 64
-  ent_coef: 0.01
+  ent_coef: 'auto'
   train_freq: 1
   gradient_steps: 1
   learning_starts: 10000
@@ -103,12 +99,13 @@ ReacherBulletEnv-v0:
   learning_starts: 1000

 HumanoidBulletEnv-v0:
+  normalize: "{'norm_obs': True, 'norm_reward': False}"
   n_timesteps: !!float 2e7
   policy: 'CustomSACPolicy'
   learning_rate: lin_3e-4
   buffer_size: 1000000
   batch_size: 64
-  ent_coef: 0.01
+  ent_coef: 'auto'
   train_freq: 1
   gradient_steps: 1
   learning_starts: 1000
@@ -135,26 +132,29 @@ InvertedPendulumSwingupBulletEnv-v0:
   gradient_steps: 1
   learning_starts: 1000

+# To be tuned
 MinitaurBulletEnv-v0:
   normalize: "{'norm_obs': True, 'norm_reward': False}"
   n_timesteps: !!float 1e6
   policy: 'CustomSACPolicy'
   learning_rate: lin_3e-4
   buffer_size: 1000000
-  batch_size: 256
-  ent_coef: 0.05
+  batch_size: 64
+  ent_coef: 'auto'
+  # ent_coef: 0.0003
   train_freq: 1
   gradient_steps: 1
   learning_starts: 1000

+# To be tuned
 MinitaurBulletDuckEnv-v0:
-  normalize: "{'norm_obs': True, 'norm_reward': False}"
+  # normalize: "{'norm_obs': True, 'norm_reward': False}"
   n_timesteps: !!float 1e6
   policy: 'CustomSACPolicy'
   learning_rate: lin_3e-4
   buffer_size: 1000000
   batch_size: 256
-  ent_coef: 0.05
+  ent_coef: 'auto'
   train_freq: 1
   gradient_steps: 1
   learning_starts: 1000
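
The recurring change in this file replaces fixed entropy coefficients (0.01, 0.2, 0.05) with 'auto', which makes SAC learn the entropy coefficient during training; `lin_3e-4` is the zoo's shorthand for a learning rate linearly annealed from 3e-4. A minimal sketch of the resulting call for the LunarLander entry, trained outside the zoo's scripts, so the environment setup is illustrative:

import gym
from stable_baselines import SAC

# Sketch: ent_coef='auto' lets SAC tune the entropy coefficient itself
# instead of using a fixed value such as 0.01 or 0.2.
env = gym.make('LunarLanderContinuous-v2')
model = SAC('MlpPolicy', env, ent_coef='auto', batch_size=256,
            learning_starts=1000, verbose=1)
model.learn(total_timesteps=500000)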
Binary file modified trained_agents/sac/LunarLanderContinuous-v2.pkl
trained_agents/sac/LunarLanderContinuous-v2/config.yml: 7 changes (2 additions & 5 deletions)
@@ -1,8 +1,5 @@
 !!python/object/apply:collections.OrderedDict
-- - [batch_size, 32]
-  - [buffer_size, 50000]
-  - [ent_coef, 0.2]
-  - [learning_rate, 0.003]
+- - [batch_size, 256]
   - [learning_starts, 1000]
-  - [n_timesteps, 100000.0]
+  - [n_timesteps, 500000.0]
   - [policy, MlpPolicy]
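
This config is serialized with a Python-specific tag, so `yaml.safe_load` will refuse it. A minimal sketch of reading it back; the loader choice is an assumption:

import yaml

# Sketch: config.yml stores the hyperparameters as an OrderedDict via
# the !!python/object/apply tag, which requires the unsafe Loader.
path = 'trained_agents/sac/LunarLanderContinuous-v2/config.yml'
with open(path) as f:
    config = yaml.load(f, Loader=yaml.Loader)
print(dict(config))  # e.g. {'batch_size': 256, ..., 'n_timesteps': 500000.0}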
