# Available algorithms

✓: thoroughly-tested. In many cases, we verified against known values and/or reproduced results from papers.

~: implemented but lightly tested.

X: known problems; please see github issues.

| Algorithms | Category | Reference | Status |
| --- | --- | --- | --- |
| Information Set Monte Carlo Tree Search (IS-MCTS) | Search | Cowling et al. '12 | ~ |
| Minimax (and Alpha-Beta) Search | Search | Wikipedia (minimax), Wikipedia (alpha-beta), Knuth & Moore '75 | ✓ |
| Monte Carlo Tree Search | Search | Wikipedia, UCT paper, Coulom '06, Cowling et al. survey | ✓ |
| Lemke-Howson (via nashpy) | Opt. | Wikipedia, Shoham & Leyton-Brown '09 | ✓ |
| ADIDAS | Opt. | Gemp et al. '22 | ~ |
| Sequence-form linear programming | Opt. | Koller, Megiddo, & von Stengel '94, Shoham & Leyton-Brown '09 | ✓ |
| Stackelberg equilibrium solver | Opt. | Conitzer & Sandholm '06 | ~ |
| Magnetic Mirror Descent (MMD) with dilated entropy | Opt. | Sokota et al. '22 | ~ |
| Counterfactual Regret Minimization (CFR) | Tabular | Zinkevich et al. '08, Neller & Lanctot '13 | ✓ |
| CFR against a best responder (CFR-BR) | Tabular | Johanson et al. '12 | ✓ |
| Exploitability / Best response | Tabular | Shoham & Leyton-Brown '09 | ✓ |
| External sampling Monte Carlo CFR | Tabular | Lanctot et al. '09, Lanctot '13 | ✓ |
| Fixed Strategy Iteration CFR (FSICFR) | Tabular | Neller & Hnath '11 | ~ |
| Mean-field Fictitious Play for MFG | Tabular | Perrin et al. '20 | ~ |
| Online Mirror Descent for MFG | Tabular | Perolat et al. '21 | ~ |
| Munchausen Online Mirror Descent for MFG | Tabular | Lauriere et al. '22 | ~ |
| Fixed Point for MFG | Tabular | Huang et al. '06 | ~ |
| Boltzmann Policy Iteration for MFG | Tabular | Lauriere et al. '22 | ~ |
| Outcome sampling Monte Carlo CFR | Tabular | Lanctot et al. '09, Lanctot '13 | ✓ |
| Policy Iteration | Tabular | Sutton & Barto '18 | ✓ |
| Q-learning | Tabular | Sutton & Barto '18 | ✓ |
| Regret Matching | Tabular | Hart & Mas-Colell '00 | ✓ |
| Restricted Nash Response (RNR) | Tabular | Johanson et al. '08 | ~ |
| SARSA | Tabular | Sutton & Barto '18 | ✓ |
| Value Iteration | Tabular | Sutton & Barto '18 | ✓ |
| Advantage Actor-Critic (A2C) | RL | Mnih et al. '16 | ✓ |
| Deep Q-networks (DQN) | RL | Mnih et al. '15 | ✓ |
| Ephemeral Value Adjustments (EVA) | RL | Hansen et al. '18 | ~ |
| Proximal Policy Optimization (PPO) | RL | Schulman et al. '17 | ~ |
| AlphaZero (C++/LibTorch) | MARL | Silver et al. '18 | ✓ |
| AlphaZero (Python/TF) | MARL | Silver et al. '18 | ✓ |
| Correlated Q-Learning | MARL | Greenwald & Hall '03 | ~ |
| Asymmetric Q-Learning | MARL | Kononen '04 | ~ |
| Deep CFR | MARL | Brown et al. '18 | ✓ |
| Exploitability Descent (ED) | MARL | Lockhart et al. '19 | ✓ |
| (Extensive-form) Fictitious Play (XFP) | MARL | Heinrich, Lanctot, & Silver '15 | ✓ |
| Nash Q-Learning | MARL | Hu & Wellman '03 | ~ |
| Neural Fictitious Self-Play (NFSP) | MARL | Heinrich & Silver '16 | ✓ |
| Neural Replicator Dynamics (NeuRD) | MARL | Omidshafiei, Hennes, Morrill, et al. '19 | X |
| Regret Policy Gradients (RPG, RMPG) | MARL | Srinivasan, Lanctot, et al. '18 | ✓ |
| Policy-Space Response Oracles (PSRO) | MARL | Lanctot et al. '17 | ✓ |
| Q-based ("all-actions") Policy Gradient (QPG) | MARL | Srinivasan, Lanctot, et al. '18 | ✓ |
| Regularized Nash Dynamics (R-NaD) | MARL | Perolat, De Vylder, et al. '22 | ✓ |
| Regression CFR (RCFR) | MARL | Waugh et al. '15, Morrill '16 | ✓ |
| Rectified Nash Response (PSRO_rn) | MARL | Balduzzi et al. '19 | ~ |
| Win-or-Learn-Fast Policy-Hill Climbing (WoLF-PHC) | MARL | Bowling & Veloso '02 | ~ |
| α-Rank | Eval. / Viz. | Omidshafiei et al. '19, arXiv | ✓ |
| Nash Averaging | Eval. / Viz. | Balduzzi et al. '18 | ~ |
| Replicator / Evolutionary Dynamics | Eval. / Viz. | Hofbauer & Sigmund '98, Sandholm '10 | ✓ |
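Most of the tabular entries above follow a similar usage pattern. As a minimal sketch (assuming OpenSpiel's `pyspiel` and `open_spiel.python.algorithms` modules; exact module and class names may differ across versions), here is tabular CFR on Kuhn poker, with exploitability of the average policy as the convergence check:

```python
# Minimal sketch, not a definitive recipe: assumes the Python API of
# pyspiel and open_spiel.python.algorithms as found in recent releases.
import pyspiel
from open_spiel.python.algorithms import cfr, exploitability

# Load a small imperfect-information game and run tabular CFR
# (Zinkevich et al. '08) for a fixed number of iterations.
game = pyspiel.load_game("kuhn_poker")
solver = cfr.CFRSolver(game)
for _ in range(1000):
    solver.evaluate_and_update_policy()

# Exploitability measures how far the average policy is from a Nash
# equilibrium; for CFR it should decrease toward zero with more iterations.
avg_policy = solver.average_policy()
print("Exploitability:", exploitability.exploitability(game, avg_policy))
```

Comparing exploitability on small games like this against known values is one way entries such as CFR can be verified in the sense described above.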