
Commit

Bring back old runner_evo, updates to runner docs, finish runner smoke tests
chrismatix committed Oct 24, 2023
1 parent 2ba7c36 commit bb280e6
Showing 22 changed files with 1,033 additions and 273 deletions.
29 changes: 24 additions & 5 deletions docs/getting-started/runners.md
@@ -1,9 +1,28 @@
# Runner
# Runners

## Evo Runner

The Evo Runner optimizes the first agent using evolutionary learning.

See [this experiment](https://github.com/akbir/pax/blob/9a01bae33dcb2f812977be388751393f570957e9/pax/conf/experiment/cg/mfos.yaml) for an example of how to configure it.
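
For orientation, here is a minimal sketch of the keys such an experiment sets; it is loosely based on the `cg/mfos.yaml` diff later in this commit, and the values are illustrative rather than recommendations:

```yaml
# Runner selection
runner: evo

# Evolutionary training settings
top_k: 4            # number of top population members tracked
popsize: 1000       # ES population size
num_envs: 250       # parallel environments per population member
num_opps: 1         # opponents per environment
num_outer_steps: 600
num_inner_steps: 16

# Evolution strategy parameters
es:
  algo: OpenES      # [OpenES, CMA_ES]
  sigma_init: 0.04  # initial scale of isotropic Gaussian noise
  lrate_init: 0.01  # initial learning rate
```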

## Evo Runner N-Roles

This runner extends the evo runner to `N > 2` agents by letting the first and second agents assume multiple roles, configured via `agent1_roles` and `agent2_roles` in the experiment configuration.
Both agents receive a separate set of memories for each role they assume but share their weights across roles.

- For heterogeneous games, roles can be shuffled for each rollout using the `shuffle_players` flag.
- Using the `self_play_anneal` flag, the self-play probability can be annealed from 0 to 1 over the course of the experiment.

See [this experiment](https://github.com/akbir/pax/blob/bb0e69ef71fd01ec9c85753814ffba3c5cb77935/pax/conf/experiment/rice/shaper_v_ppo.yaml) for an example of how to configure it.
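
A hedged sketch of how these flags might be combined in an experiment file; the role counts and flag values below are illustrative assumptions rather than values taken from a specific config:

```yaml
runner: evo_nroles

num_players: 5
agent1_roles: 1        # roles assumed by the first agent (assumed value)
agent2_roles: 4        # roles assumed by the second agent (assumed value)
shuffle_players: True  # shuffle role assignment on each rollout (heterogeneous games)
self_play_anneal: True # anneal the self-play probability from 0 to 1 over the experiment
```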

## Weight Sharing Runner

A simple baseline for MARL experiments is to have one agent assume multiple roles and share its weights across them (but not its memory).
For this approach to work, the observation vector needs to include an entry that indicates the agent's current role (see [Terry et al.](https://arxiv.org/abs/2005.13625v7)).

See [this experiment](https://github.com/akbir/pax/blob/9d3fa62e34279a338c07cffcbf208edc8a95e7ba/pax/conf/experiment/rice/weight_sharing.yaml) for an example of how to configure it.
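
A rough sketch, assuming `weight_sharing` is the corresponding `runner` value (the `WeightSharingRunner` import appears in the `pax/experiment.py` diff below); the remaining keys are illustrative:

```yaml
runner: weight_sharing   # assumed runner value for WeightSharingRunner (not confirmed in this diff)

agent1: 'PPO'
num_players: 5
config_folder: pax/envs/rice/5_regions
# the environment's observations must include an entry identifying the acting agent's role
```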

## Runner 1

Lorem ipsum.

## Runner 2

Lorem ipsum.
6 changes: 5 additions & 1 deletion pax/agents/ppo/ppo.py
@@ -19,6 +19,7 @@
make_ipd_network,
)
from pax.envs.iterated_matrix_game import IteratedMatrixGame
from pax.envs.iterated_tensor_game_n_player import IteratedTensorGameNPlayer
from pax.envs.rice.c_rice import ClubRice
from pax.envs.rice.rice import Rice
from pax.envs.rice.sarl_rice import SarlRice
@@ -517,7 +518,10 @@ def make_agent(
network = make_rice_sarl_network(action_spec, agent_args.hidden_size)
elif args.runner == "sarl":
network = make_sarl_network(action_spec)
elif args.env_id == IteratedMatrixGame.env_id:
elif args.env_id in [
IteratedMatrixGame.env_id,
IteratedTensorGameNPlayer.env_id,
]:
network = make_ipd_network(action_spec, True, agent_args.hidden_size)
else:
raise NotImplementedError(
2 changes: 1 addition & 1 deletion pax/conf/experiment/c_rice/debug.yaml
@@ -10,7 +10,7 @@ env_type: meta
num_players: 6
has_mediator: True
config_folder: pax/envs/rice/5_regions
runner: evo
runner: evo_nroles

# Training
top_k: 5
2 changes: 1 addition & 1 deletion pax/conf/experiment/c_rice/marl_baseline.yaml
@@ -9,7 +9,7 @@ env_type: meta
num_players: 6
has_mediator: True
config_folder: pax/envs/rice/5_regions
runner: evo
runner: evo_nroles
rice_v2_network: True

# Training
2 changes: 1 addition & 1 deletion pax/conf/experiment/c_rice/mediator_gs_ppo.yaml
@@ -12,7 +12,7 @@ env_type: meta
num_players: 6
has_mediator: True
config_folder: pax/envs/rice/5_regions
runner: evo
runner: evo_nroles
rice_v2_network: True
agent2_reset_interval: 10

2 changes: 1 addition & 1 deletion pax/conf/experiment/c_rice/shaper_v_ppo.yaml
@@ -14,7 +14,7 @@ num_players: 5
has_mediator: False
shuffle_players: False
config_folder: pax/envs/rice/5_regions
runner: evo
runner: evo_nroles
rice_v2_network: True

default_club_mitigation_rate: 0.1
63 changes: 56 additions & 7 deletions pax/conf/experiment/cg/mfos.yaml
@@ -1,6 +1,6 @@
# @package _global_

# Agents
# Agents
agent1: 'MFOS'
agent2: 'PPO_memory'

@@ -11,24 +11,27 @@ egocentric: True
env_discount: 0.96
payoff: [[1, 1, -2], [1, 1, -2]]

# Runner
# Runner
runner: evo

top_k: 4
popsize: 1000 #512
# env_batch_size = num_envs * num_opponents
num_envs: 250
num_opps: 1
num_outer_steps: 600
num_inner_steps: 16
save_interval: 100
num_inner_steps: 16
save_interval: 100
num_steps: '${num_inner_steps}'

# Evaluation
# Evaluation
run_path: ucl-dark/cg/12auc9um
model_path: exp/sanity-PPO-vs-PPO-parity/run-seed-0/2022-09-08_20.04.17.155963/iteration_500

# PPO agent parameters
ppo:
ppo1:
num_minibatches: 8
num_epochs: 2
num_epochs: 2
gamma: 0.96
gae_lambda: 0.95
ppo_clipping_epsilon: 0.2
@@ -49,6 +52,52 @@ ppo:
separate: True # only works with CNN
hidden_size: 16 #50

ppo2:
num_minibatches: 8
num_epochs: 2
gamma: 0.96
gae_lambda: 0.95
ppo_clipping_epsilon: 0.2
value_coeff: 0.5
clip_value: True
max_gradient_norm: 0.5
anneal_entropy: False
entropy_coeff_start: 0.1
entropy_coeff_horizon: 0.6e8
entropy_coeff_end: 0.005
lr_scheduling: False
learning_rate: 0.01 #0.05
adam_epsilon: 1e-5
with_memory: True
with_cnn: False
output_channels: 16
kernel_shape: [3, 3]
separate: True # only works with CNN
hidden_size: 16 #50

# ES parameters
es:
algo: OpenES # [OpenES, CMA_ES]
sigma_init: 0.04 # Initial scale of isotropic Gaussian noise
sigma_decay: 0.999 # Multiplicative decay factor
sigma_limit: 0.01 # Smallest possible scale
init_min: 0.0 # Range of parameter mean initialization - Min
init_max: 0.0 # Range of parameter mean initialization - Max
clip_min: -1e10 # Range of parameter proposals - Min
clip_max: 1e10 # Range of parameter proposals - Max
lrate_init: 0.01 # Initial learning rate
lrate_decay: 0.9999 # Multiplicative decay factor
lrate_limit: 0.001 # Smallest possible lrate
beta_1: 0.99 # Adam - beta_1
beta_2: 0.999 # Adam - beta_2
eps: 1e-8 # eps constant,
centered_rank: False # Fitness centered_rank
w_decay: 0 # Decay old elite fitness
maximise: True # Maximise fitness
z_score: False # Normalise fitness
mean_reduce: True # Remove mean


# Logging setup
wandb:
entity: "ucl-dark"
29 changes: 26 additions & 3 deletions pax/conf/experiment/cg/tabular.yaml
@@ -1,6 +1,6 @@
# @package _global_

# Agents
# Agents
agent1: 'Tabular'
agent2: 'Random'

@@ -25,9 +25,32 @@ num_iters: 10000
# train_batch_size = num_envs * num_opponents * num_steps

# PPO agent parameters
ppo:
ppo1:
num_minibatches: 8
num_epochs: 2
num_epochs: 2
gamma: 0.96
gae_lambda: 0.95
ppo_clipping_epsilon: 0.2
value_coeff: 0.5
clip_value: True
max_gradient_norm: 0.5
anneal_entropy: True
entropy_coeff_start: 0.1
entropy_coeff_horizon: 0.6e8
entropy_coeff_end: 0.005
lr_scheduling: True
learning_rate: 0.01 #0.05
adam_epsilon: 1e-5
with_memory: True
with_cnn: False
output_channels: 16
kernel_shape: [3, 3]
separate: True # only works with CNN
hidden_size: 16 #50

ppo2:
num_minibatches: 8
num_epochs: 2
gamma: 0.96
gae_lambda: 0.95
ppo_clipping_epsilon: 0.2
2 changes: 1 addition & 1 deletion pax/conf/experiment/cournot/eval_shaper_v_ppo.yaml
@@ -12,7 +12,7 @@ b: 1
marginal_cost: 10

# Runner
runner: evo
runner: evo_nroles

# Training
top_k: 5
2 changes: 1 addition & 1 deletion pax/conf/experiment/cournot/shaper_v_ppo.yaml
@@ -13,7 +13,7 @@ b: 1
marginal_cost: 10

# Runner
runner: evo
runner: evo_nroles

# Training
top_k: 5
2 changes: 1 addition & 1 deletion pax/conf/experiment/fishery/marl_baseline.yaml
@@ -15,7 +15,7 @@ s_0: 0.5
s_max: 1.0

# This means the optimum quantity is 2(a-marginal_cost)/3b = 60
runner: evo
runner: evo_nroles

# env_batch_size = num_envs * num_opponents
num_envs: 100
2 changes: 1 addition & 1 deletion pax/conf/experiment/fishery/mfos_v_ppo.yaml
@@ -15,7 +15,7 @@ s_0: 0.5
s_max: 1.0

# Runner
runner: evo
runner: evo_nroles

# Training
top_k: 5
2 changes: 1 addition & 1 deletion pax/conf/experiment/fishery/shaper_v_ppo.yaml
@@ -14,7 +14,7 @@ w: 0.9
s_0: 0.5
s_max: 1.0
# Runner
runner: evo
runner: evo_nroles

# Training
top_k: 5
2 changes: 1 addition & 1 deletion pax/conf/experiment/rice/gs_v_ppo.yaml
@@ -12,7 +12,7 @@ num_players: 5
has_mediator: False
shuffle_players: False
config_folder: pax/envs/rice/5_regions
runner: evo
runner: evo_nroles


# Training
2 changes: 1 addition & 1 deletion pax/conf/experiment/rice/mfos_v_ppo.yaml
@@ -14,7 +14,7 @@ num_players: 5
has_mediator: False
shuffle_players: False
config_folder: pax/envs/rice/5_regions
runner: evo
runner: evo_nroles

# Training
top_k: 5
4 changes: 2 additions & 2 deletions pax/conf/experiment/rice/shaper_v_ppo.yaml
@@ -14,15 +14,15 @@ num_players: 5
has_mediator: False
shuffle_players: True
config_folder: pax/envs/rice/5_regions
runner: evo
runner: evo_nroles
rice_v2_network: True

# Training
top_k: 5
popsize: 1000
num_envs: 1
num_opps: 1
num_outer_steps: 200
num_outer_steps: 200
num_inner_steps: 200
num_iters: 1500
num_devices: 1
1 change: 1 addition & 0 deletions pax/envs/iterated_tensor_game_n_player.py
@@ -18,6 +18,7 @@ class EnvParams:


class IteratedTensorGameNPlayer(environment.Environment):
env_id = "iterated_nplayer_tensor_game"
"""
JAX Compatible version of tensor game environment.
"""
17 changes: 15 additions & 2 deletions pax/experiment.py
@@ -60,6 +60,7 @@
from pax.envs.rice.c_rice import ClubRice
from pax.envs.rice.rice import Rice, EnvParams as RiceParams
from pax.envs.rice.sarl_rice import SarlRice
from pax.runners.runner_evo_nroles import EvoRunnerNRoles
from pax.runners.runner_weight_sharing import WeightSharingRunner
from pax.runners.runner_eval import EvalRunner
from pax.runners.runner_eval_multishaper import MultishaperEvalRunner
@@ -281,7 +282,7 @@ def runner_setup(args, env, agents, save_dir, logger):
logger.info("Evaluating with ipditmEvalRunner")
return IPDITMEvalRunner(agents, env, save_dir, args)

if args.runner == "evo" or args.runner == "multishaper_evo":
if args.runner in ["evo", "multishaper_evo", "evo_nroles"]:
agent1 = agents[0]
algo = args.es.algo
strategies = {"CMA_ES", "OpenES", "PGPE", "SimpleGA"}
@@ -378,6 +379,18 @@ def get_pgpe_strategy(agent):
args,
)

elif args.runner == "evo_nroles":
logger.info("Training with n_roles EVO runner")
return EvoRunnerNRoles(
agents,
env,
strategy,
es_params,
param_reshaper,
save_dir,
args,
)

elif args.runner == "multishaper_evo":
logger.info("Training with multishaper EVO runner")
return MultishaperEvoRunner(
@@ -782,7 +795,7 @@ def main(args):

print(f"Number of Training Iterations: {args.num_iters}")

if args.runner == "evo" or args.runner == "multishaper_evo":
if args.runner in ["evo", "evo_nroles", "multishaper_evo"]:
runner.run_loop(env_params, agent_pair, args.num_iters, watchers)
elif args.runner == "rl" or args.runner == "tensor_rl_nplayer":
# number of episodes