
Add lola #11

Open · wants to merge 36 commits into main

Conversation

@newtonkwan (Contributor) commented on Jul 5, 2022

Adds LOLA (Learning with Opponent-Learning Awareness) to the set of strategies.
Goal: LOLA plays against LOLA and shapes how its opponent learns.

  • Add an agent_states argument to the PPO, PPO_gru, and DQN update() functions (a signature sketch follows this list).
  • Add an offline actor-critic naive learner with policy gradient (Foerster et al., 2017) and a simple experience replay buffer. EDIT: our NL uses an advantage estimate instead of a baseline (see the loss sketch below).
  • The naive learners learn to defect against one another, which reproduces the findings of the LOLA paper.
  • Add a LOLA-DiCE implementation (see the magic-box sketch below).
  • Pull in the new runner and implement the refactored LOLA.
  • Get LOLA to shape the naive learner (NL).
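
A minimal sketch of the kind of `update()` signature change described in the first bullet. The `TrainingState` container and the metrics dictionary are assumptions for illustration; the actual pax signatures may differ.

```python
from typing import Any, Dict, NamedTuple, Tuple

import jax.numpy as jnp


class TrainingState(NamedTuple):
    """Hypothetical container for one agent's parameters and optimiser state."""
    params: Any
    opt_state: Any


def update(
    state: TrainingState,
    trajectories: Any,                          # rollout batch from this iteration
    agent_states: Tuple[TrainingState, ...],    # NEW: the other agents' states, so a
                                                # shaper like LOLA can condition on
                                                # (and differentiate through) them
) -> Tuple[TrainingState, Dict[str, jnp.ndarray]]:
    # ... compute the PPO / PPO_gru / DQN loss and apply an optimiser step ...
    metrics = {"loss": jnp.asarray(0.0)}
    return state, metrics
```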

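A sketch of the advantage-weighted policy-gradient loss the naive-learner bullet describes, using a toy linear policy and made-up array names; the actual pax naive learner (actor-critic networks plus experience replay) is organised differently.

```python
import jax
import jax.numpy as jnp


def policy_logits(params, obs):
    # Toy linear policy over two IPD actions (cooperate / defect).
    return obs @ params["w"] + params["b"]


def pg_loss(params, obs, actions, returns, values):
    """REINFORCE-style loss weighted by an advantage (returns - values)
    supplied by the critic, rather than a plain baseline."""
    log_probs = jax.nn.log_softmax(policy_logits(params, obs))
    chosen = jnp.take_along_axis(log_probs, actions[:, None], axis=-1)[:, 0]
    advantages = jax.lax.stop_gradient(returns - values)  # no grad into the critic here
    return -jnp.mean(chosen * advantages)


# Example usage with dummy data.
key = jax.random.PRNGKey(0)
obs = jax.random.normal(key, (8, 5))          # batch of 8 observations
actions = jnp.zeros(8, dtype=jnp.int32)
returns = jnp.ones(8)
values = 0.5 * jnp.ones(8)                    # critic value estimates
params = {"w": jnp.zeros((5, 2)), "b": jnp.zeros(2)}
grads = jax.grad(pg_loss)(params, obs, actions, returns, values)
```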
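And a sketch of the DiCE "magic box" operator at the core of the LOLA-DiCE bullet (Foerster et al., 2018): it evaluates to 1 in the forward pass but re-introduces score-function terms when differentiated. The `dice_objective` layout is illustrative only, not the PR's actual implementation.

```python
import jax
import jax.numpy as jnp


def magic_box(x):
    """DiCE operator: forward value is exp(0) = 1, while its gradient w.r.t. x is
    exp(x - stop_gradient(x)) = 1, so score-function terms survive differentiation."""
    return jnp.exp(x - jax.lax.stop_gradient(x))


def dice_objective(log_probs, rewards, gamma=0.96):
    """Single-trajectory DiCE objective: each reward is weighted by the magic box
    of the summed log-probabilities of all actions up to that timestep."""
    t = jnp.arange(log_probs.shape[0])
    causal_log_probs = jnp.cumsum(log_probs)   # sum of log pi up to time t
    return jnp.sum(magic_box(causal_log_probs) * (gamma ** t) * rewards)


# Differentiating recovers the discounted policy gradient for this trajectory.
log_probs = jnp.log(jnp.array([0.5, 0.5, 0.5]))
rewards = jnp.array([1.0, 0.0, 1.0])
grads = jax.grad(dice_objective)(log_probs, rewards)
```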
@newtonkwan newtonkwan requested a review from akbir July 5, 2022 15:35
Resolved review threads (now outdated):
  • pax/watchers.py (two threads)
  • pax/ppo/ppo_gru.py
  • pax/ppo/networks.py
  • pax/ppo/ppo.py