Exercise solutions, as Jupyter notebooks and Markdown files, for the book "Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto.
Completion status:
Chapter | Done | |
---|---|---|
1: Introduction | 5/5 | ✅ |
2: Multi-armed Bandits | 7/10 | |
3: Finite Markov Decision Processes | 14/29 | |
4: Dynamic Programming | 8/10 | |
5: Monte Carlo Methods | 8/14 | |
6: Temporal-Difference Learning | 1/14 | |
7: n-step Bootstrapping | 0/10 | |
8: Planning and Learning with Tabular Methods | 4/8 | |
9: On-policy Prediction with Approximation | 1/8 | |
New solutions should be submitted through pull requests. Base file templates for Markdown and notebooks are available in the stubs folder.
Files should be placed in the proper chapter folder, following the naming scheme for Markdown files and Jupyter notebooks:

- `Ex_X.XX.md`
- `Ex_X.XX.ipynb`
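
As a rough illustration of the naming scheme, the sketch below builds and checks file names. It assumes that `X` is the chapter number and `XX` is the exercise number zero-padded to two digits. That reading is an interpretation of the pattern, not something the repository states, and the helpers `solution_filename` and `is_valid_name` are hypothetical names introduced only for this example.

```python
import re

# Assumption: "X" is the chapter number and "XX" is the exercise number,
# zero-padded to two digits. Adjust the pattern if the repository differs.
NAME_PATTERN = re.compile(r"Ex_(\d+)\.(\d{2})\.(md|ipynb)")


def solution_filename(chapter: int, exercise: int, notebook: bool = False) -> str:
    """Build a file name under the assumed scheme (hypothetical helper)."""
    extension = "ipynb" if notebook else "md"
    return f"Ex_{chapter}.{exercise:02d}.{extension}"


def is_valid_name(filename: str) -> bool:
    """Return True if the whole file name matches the assumed scheme."""
    return NAME_PATTERN.fullmatch(filename) is not None


if __name__ == "__main__":
    # Markdown solution for chapter 3, exercise 7.
    print(solution_filename(3, 7))
    # Notebook solution for chapter 4, exercise 10.
    print(solution_filename(4, 10, notebook=True))
    # A name with the wrong prefix case and extension is rejected.
    print(is_valid_name("ex_3.7.txt"))
```

A check like this could be run locally before opening a pull request to catch misnamed files early.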