
Restructuring MADRaS into a Multi-agent MDP #39

Open
wants to merge 54 commits into base: Version-2.0
54 commits
e26307e
exception handling in step is problematic
Santara Dec 5, 2019
8f6a34f
Added Parked Agents, and reward function and observation for single a…
Santara Dec 6, 2019
8df9925
minor merge
Santara Dec 8, 2019
8550f4b
Multi-agent reset latency issue fixed
Santara Dec 12, 2019
4e6f535
Added respond func to eliminate 10s of timeout error
manish-pra Dec 12, 2019
95e6a27
Merge pull request #6 from manish-pra/devel
Santara Dec 12, 2019
686e3bf
added multiprocessing event to parallely close all process(agent) tog…
Dec 13, 2019
8fb3fdd
updated gitignore
rudrasohan Dec 15, 2019
02431ba
updated changes from main repo
rudrasohan Dec 15, 2019
b4655a5
added changes suggested by @manish-pra
rudrasohan Dec 15, 2019
6fa2e00
smooth V2
rudrasohan Dec 15, 2019
d86ee0c
reorganization mirroring main-repo
rudrasohan Dec 15, 2019
5bef0d0
Time for 0.2
rudrasohan Dec 15, 2019
d2ca923
final fixes
rudrasohan Dec 15, 2019
67745c5
Merging #8
Santara Dec 15, 2019
28674d6
Resolved merge conflicts
Santara Dec 15, 2019
f2f1de5
Merging #7
Santara Dec 15, 2019
04f5550
Cleaning up the code...
Santara Dec 15, 2019
b9fca83
Merge branch 'manish-pra-devel' into devel
Santara Dec 15, 2019
2700353
Merge branch 'devel' of https://github.com/Santara/MADRaS into devel
Santara Dec 15, 2019
9e28bf1
Merging #8
Santara Dec 15, 2019
047b5d0
created action_dict
rudrasohan Dec 20, 2019
253b292
recommended sim changes
rudrasohan Dec 20, 2019
d0550be
added multi-agent train script
rudrasohan Dec 20, 2019
b97a446
addressed @Santara's comments
rudrasohan Dec 21, 2019
11383d5
used inheritance for env wrapper
rudrasohan Dec 21, 2019
f70985c
added extra space
rudrasohan Dec 21, 2019
71b6fe0
done feature req
rudrasohan Dec 21, 2019
e9b76e2
reformatted dones
rudrasohan Dec 21, 2019
5a9bbe5
Merge pull request #9 from rudrasohan/mult_train
Santara Dec 22, 2019
40f8a93
added comm capabilities (untested)
rudrasohan Dec 22, 2019
f6c730a
removed typos
rudrasohan Dec 22, 2019
53b52e4
syntax correctness
rudrasohan Dec 22, 2019
2a58d11
dimensional correctness
rudrasohan Dec 22, 2019
cd019da
removed buffer multiprocessing errors
rudrasohan Dec 22, 2019
7b9ad3a
updated changes for multi work
rudrasohan Feb 29, 2020
0cdf29d
changed conflicting names
rudrasohan Feb 29, 2020
640ca51
updated madras types nomenclature
rudrasohan Mar 14, 2020
172025b
[UNTESTED] updated comm module with custom network architecture
rudrasohan Mar 14, 2020
e86d664
obs_dim mismatch fixed
rudrasohan Mar 17, 2020
32c56d9
corrected obs formatting
rudrasohan Mar 17, 2020
d879291
buffer fix and variable error fix
rudrasohan Mar 17, 2020
7d4e9d7
IV COMM working
rudrasohan Mar 18, 2020
41fcccc
removed deprecated examples of V1
rudrasohan Mar 23, 2020
7473967
addressed @Santara's comments
rudrasohan Apr 3, 2020
b022d8c
addressed new comments
rudrasohan Apr 5, 2020
184c660
removed stray comments
rudrasohan Apr 5, 2020
9ca2145
removed configs from init
rudrasohan Apr 19, 2020
e0e6796
updated mult train restore path
rudrasohan Apr 19, 2020
3c73f2d
added print changes
rudrasohan Apr 19, 2020
e9c6a9b
Merge pull request #10 from rudrasohan/mult_train
Santara Apr 19, 2020
45c764a
init work on merge
rudrasohan Oct 16, 2020
abfb42d
random working without traffic
rudrasohan Oct 16, 2020
3f03013
initial traffic random working
rudrasohan Oct 16, 2020
2 changes: 1 addition & 1 deletion .gitignore
@@ -2,7 +2,7 @@
__pycache__/
*.py[cod]
*$py.class

*.vscode/
# C extensions
*.so

12 changes: 6 additions & 6 deletions MADRaS/__init__.py
@@ -1,9 +1,9 @@
"""Env Registration."""
from gym.envs.registration import register

register(
    id='Madras-v0',
    entry_point='MADRaS.envs:MadrasEnv',
    max_episode_steps=10000,
    reward_threshold=25.0,
)
# register(
#     id='Madras-v0',
#     entry_point='MADRaS.envs:MadrasEnv',
#     max_episode_steps=10000,
#     reward_threshold=25.0,
# )
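With the registration commented out, importing MADRaS no longer registers Madras-v0 with Gym, so the environment must now be constructed directly. A minimal sketch of what this changes for callers (assuming the standard Gym registry API):

import gym

# Before this PR: importing MADRaS ran register(), enabling the registry path
import MADRaS  # noqa: F401
env = gym.make('Madras-v0')

# After this PR: construct the v2 env directly
from MADRaS.envs.gym_madras_v2 import MadrasEnv
env = MadrasEnv()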
46 changes: 46 additions & 0 deletions MADRaS/agents/generic/pid_test.py
@@ -0,0 +1,46 @@
import numpy as np
import gym
from MADRaS.envs.gym_madras_v2 import MadrasEnv
import os
import sys
import logging

logging.basicConfig(level=logging.DEBUG)

def test_madras_pid(vel, file_name):
    env = MadrasEnv()
    for key, val in env.agents.items():
        print("Observation Space ", val.observation_space)
        print("Obs_dim ", val.obs_dim)
    print("Testing reset...")
    obs = env.reset()
    vel = float(vel)
    a = [0.0, vel]
    b = [0.1, 0.00]
    c = [0.2, -0.2]
    print("Initial observation: {}."
          " Verify if the number of dimensions is right.".format(obs))
    for key, value in obs.items():
        print("{}: {}".format(key, len(value)))
    print("Testing step...")
    running_rew = 0
    speeds = []
    for t in range(300):
        obs, r, done, _ = env.step({"MadrasAgent_0": a})
        # print("{}".format(obs))
        # a = [0.0, 0.0]
        running_rew += r["MadrasAgent_0"]
        # print("{}: reward={}, done={}".format(t, running_rew, done))
        # logger.info("HELLO")
        speeds.append(obs["MadrasAgent_0"][21])
        if done['__all__']:
            env.reset()
    print(speeds)
    np.save(file_name, np.array(speeds))
    os.system("pkill torcs")


if __name__ == '__main__':
    # test_madras_vanilla()
    test_madras_pid(sys.argv[1], sys.argv[2])
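pid_test.py reads the PID target speed and an output file for the speed trace from the command line, so a typical invocation would be (arguments illustrative):

python MADRaS/agents/generic/pid_test.py 50.0 speeds.npy

The two-dimensional action as a [lane_pos, target_speed] setpoint pair and the speed channel at obs index 21 are assumptions read off this diff; neither is documented here.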
File renamed without changes.
69 changes: 69 additions & 0 deletions MADRaS/agents/generic/test_environment_v2.py
@@ -0,0 +1,69 @@
import numpy as np
import gym
from MADRaS.envs.gym_madras_v2 import MadrasEnv
import os
import sys
import logging

logging.basicConfig(filename='Telemetry.log', level=logging.DEBUG)
# logger = logging.getLogger(__name__)
# logger.setLevel(logging.DEBUG)
# fh = logging.FileHandler('Telemetry.log')
# fh.setLevel(logging.DEBUG)
# logger.addHandler(fh)


def test_madras_vanilla():
    env = MadrasEnv()
    print("Testing reset...")
    obs = env.reset()
    print("Initial observation: {}."
          " Verify if the number of dimensions {} is right.".format(obs, len(obs)))
    print("Testing step...")
    a = [0.0, 1.0, -1.0]
    a = [0.0, 0.2, 0.0]
    b = [0.1, 0.3, 0.0]
    c = [0.2, 0.4, 0.0]
    for t in range(4000):
        obs, r, done, _ = env.step({"MadrasAgent_0": a, "MadrasAgent_1": b, "MadrasAgent_2": c})
        if (t+1) % 150 == 0:
            a = [0.0, -1.0, 1.0]
        print("{}: reward={}, done={}".format(t, r, done))
        dones = [x for x in done.values()]
        if np.any(dones):
            env.reset()
    os.system("pkill torcs")


def test_madras_pid():
    env = MadrasEnv()
    for key, val in env.agents.items():
        print("Observation Space ", val.observation_space)
        print("Obs_dim ", val.obs_dim)
    print("Testing reset...")
    obs = env.reset()
    a = [0.0, 0.2]
    b = [0.1, 0.00]
    c = [0.2, -0.2]
    print("Initial observation: {}."
          " Verify if the number of dimensions is right.".format(obs))
    for key, value in obs.items():
        print("{}: {}".format(key, len(value)))
    print("Testing step...")
    running_rew = 0
    for t in range(4000):
        obs, r, done, _ = env.step({"MadrasAgent_0": a, "MadrasAgent_1": b, "MadrasAgent_2": c})
        # print("{}".format(obs))
        # if (t+1) % 15 == 0:
        #     a = [0.0, 0.0]
        running_rew += r["MadrasAgent_0"]
        # print("{}: reward={}, done={}".format(t, running_rew, done))
        # logger.info("HELLO")
        if done['__all__']:
            env.reset()
    os.system("pkill torcs")


if __name__ == '__main__':
    test_madras_vanilla()
    # test_madras_pid()
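Both tests exercise the multi-agent dict convention of the restructured env: step() takes a dict of per-agent actions and returns per-agent observation, reward, and done dicts, with done carrying the extra '__all__' key. A minimal interaction loop under that convention (a sketch; the three-element actions mirror the vanilla test above and are assumed to be steer/accel/brake):

from MADRaS.envs.gym_madras_v2 import MadrasEnv

env = MadrasEnv()
obs = env.reset()  # {"MadrasAgent_0": [...], "MadrasAgent_1": [...], ...}
done = {"__all__": False}
while not done["__all__"]:
    # one (possibly distinct) action per agent id present in obs
    actions = {agent_id: [0.0, 0.2, 0.0] for agent_id in obs}
    obs, rew, done, info = env.step(actions)
    # rew and done are keyed per agent; done["__all__"] flags episode end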
File renamed without changes.
File renamed without changes.
File renamed without changes.
101 changes: 101 additions & 0 deletions MADRaS/agents/rllib/train_rllib_multi_agent.py
@@ -0,0 +1,101 @@
import ray
import gym
import argparse
from ray.rllib.agents.ppo.ppo import PPOTrainer
from ray.rllib.agents.ppo.ppo_policy import PPOTFPolicy
from ray.tune.logger import pretty_print
from ray.rllib.env.multi_agent_env import MultiAgentEnv
from MADRaS.envs.gym_madras_v2 import MadrasEnv
import logging
import numpy as np

logging.basicConfig(filename='Telemetry.log', level=logging.DEBUG)


class MadrasRllib(MultiAgentEnv, MadrasEnv):
    """MADRaS rllib Env wrapper."""

    def __init__(self, *args):
        MadrasEnv.__init__(self)

    def reset(self):
        return MadrasEnv.reset(self)

    def step(self, action_dict):
        return MadrasEnv.step(self, action_dict)


def on_episode_end(info):
    episode = info["episode"]
    rewards = episode.agent_rewards
    total_episode = episode.total_reward

    episode.custom_metrics["agent0/rew_2"] = rewards[('MadrasAgent_0', 'ppo_policy_0')]**2.0
    episode.custom_metrics["agent1/rew_2"] = rewards[('MadrasAgent_1', 'ppo_policy_1')]**2.0
    episode.custom_metrics["env_rew_2"] = total_episode**2.0


def on_sample_end(info):
    print(info.keys())
    sample = info["samples"]
    print(dir(sample))
    splits = sample.policy_batches['ppo_policy_0'].split_by_episode()
    print(len(splits))
    for split in splits:
        print("EPISODE= ", np.sum(split['rewards']))


parser = argparse.ArgumentParser()
parser.add_argument("--num-iters", type=int, default=300)

if __name__ == "__main__":
    args = parser.parse_args()
    ray.init()

    env = MadrasRllib()

    obs_spaces, action_spaces = [], []
    for agent in env.agents:
        obs_spaces.append(env.agents[agent].observation_space)
        action_spaces.append(env.agents[agent].action_space)

    print(obs_spaces)
    print(action_spaces)
    policies = {"ppo_policy_{}".format(i): (PPOTFPolicy, obs_spaces[i], action_spaces[i], {})
                for i in range(env.num_agents)}

    def policy_mapping_fn(agent_id):
        id = agent_id.split("_")[-1]
        return "ppo_policy_{}".format(id)

    ppo_trainer = PPOTrainer(
        env=MadrasRllib,
        config={
            "eager": False,
            "num_workers": 1,
            "num_gpus": 0,
            "vf_clip_param": 20,
            # "sample_batch_size": 20,  # set them accordingly
            "train_batch_size": 500,
            "callbacks": {
                "on_episode_end": on_episode_end,
                # "on_sample_end": on_sample_end,
            },
            # "lr": 5e-6,
            # "sgd_minibatch_size": 24,
            "multiagent": {
                "policies": policies,
                "policy_mapping_fn": policy_mapping_fn,
            },
        })

    # ppo_trainer.restore("{restore path}")

    for i in range(args.num_iters):
        print("== Iteration", i, "==")

        # save a checkpoint every 10 iterations
        if i % 10 == 0:
            checkpoint = ppo_trainer.save()
            print("checkpoint saved at", checkpoint)

        logging.warning("-- PPO --")
        # improve the PPO policy
        print(pretty_print(ppo_trainer.train()))
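policy_mapping_fn routes each MadrasAgent_<i> to its own ppo_policy_<i> by reusing the numeric suffix of the agent id, so every agent trains an independent PPO policy. A quick sanity check of the mapping (reusing the function defined above):

assert policy_mapping_fn("MadrasAgent_0") == "ppo_policy_0"
assert policy_mapping_fn("MadrasAgent_1") == "ppo_policy_1"

Training is then launched as, e.g., python MADRaS/agents/rllib/train_rllib_multi_agent.py --num-iters 300, checkpointing every 10 iterations as in the loop above.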
2 changes: 1 addition & 1 deletion MADRaS/envs/__init__.py
@@ -1,2 +1,2 @@
"""Env Import."""
from envs.gym_madras import MadrasEnv
from MADRaS.envs.gym_madras_v2 import MadrasEnv
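After this one-line change, the package-level import resolves to the v2 environment:

# both of these now refer to gym_madras_v2.MadrasEnv
from MADRaS.envs import MadrasEnv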
22 changes: 22 additions & 0 deletions MADRaS/envs/data/communications.yml
@@ -0,0 +1,22 @@
MadrasAgent_0:
  buff_size: 2
  vars:
    - action
    - speedX
    - speedY
    - track
  comms:
    - MadrasAgent_1
    - MadrasAgent_0


MadrasAgent_1:
  buff_size: 2
  vars:
    - action
    - speedX
    - speedY
    - trackPos
    - opponents
  comms:
    - MadrasAgent_0
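Each entry gives an agent its communication buffer depth (buff_size), the observation variables it publishes (vars), and the peers it listens to (comms); note that MadrasAgent_0 lists itself among its own peers. The comm module that consumes this file is not part of this diff; a minimal sketch of loading and inspecting it (PyYAML assumed, path relative to the repository root):

import yaml

with open("MADRaS/envs/data/communications.yml") as f:
    comm_cfg = yaml.safe_load(f)

for agent, cfg in comm_cfg.items():
    # e.g. MadrasAgent_0 -> buff_size 2, vars ['action', 'speedX', ...]
    print(agent, cfg["buff_size"], cfg["vars"], cfg["comms"])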