This software contains:
- A Python implementation of the Fourier Basis, in
fourier_fa.py
. - An example Sarsa(λ) with linear function approximation implementation, in
sarsa_lambda_linear.py
. - Two example RLGym agents that use Sarsa(λ) with the Fourier basis, in
mc_sarsa_lambda.py
(Mountain Car) andacro_sarsa_lambda.py
(Acrobot).
The first two modules require NumPy, and the two example agents require both Matplotlib and RLGym (tested on RLGym 0.18.0).
Author: George Konidaris, Brown University
License: GPL 3
The Fourier basis is a linear value function approximation scheme for reinforcement learning tasks with continuous state spaces, where the value function V(s) is approximated as a weighted sum of Fourier basis functions φ:
V(s) = Σi wi φi(s).
A similar scheme is used to approximate the Q-function Q(s, a).
The above equation is linear in the weights wi, which are set via learning. The Fourier basis uses Fourier terms for the basis functions φi:
φi(s) = cos(π ci . s),
where each ci is a coefficient vector (of the same size as the state vector s), each element of which is an integer between 0 and some fixed upper bound, called the order. The basis functions are constructed by generating all such coefficient vectors. For state dimension d and order n, there are (n+1)d such coefficient vectors, which makes the Fourier basis (like all linear bases) suitable only for small problems (typically d <= 7).
Note that s must be scaled so that each of its elements lies in [0, 1].
This code implements the Fourier basis as originally described in:
G.D. Konidaris, S. Osentoski, and P.S. Thomas. Value Function Approximation in Reinforcement Learning using the Fourier Basis. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, pages 380-385, August 2011.
If you use the Fourier basis in a research article, please cite the original paper:
@inproceedings{FourierAAAI2011,
author = "G.D. Konidaris and S. Osentoski and P.S. Thomas",
title = "Value Function Approximation in Reinforcement Learning
using the {F}ourier Basis",
booktitle = "Proceedings of the Twenty-Fifth Conference on
Artificial Intelligence",
year = 2011,
pages = "380-385"
}
The FourierBasis class provides methods for constructing and using a Fourier basis.
To construct a Fourier basis, you can supply an order and a dimension. For example:
fb = FourierBasis(order=3, d=2)
... creates a basis of order 3, for a 2-dimensional state space. Therefore:
print(fb.get_coefficients())
outputs:
[[0 0]
[0 1]
[0 2]
[0 3]
[1 0]
[1 1]
[1 2]
[1 3]
[2 0]
[2 1]
[2 2]
[2 3]
[3 0]
[3 1]
[3 2]
[3 3]]
You can also ask for a specific coefficient:
print(fb.get_coefficient(3))
returns
[0 3]
Alternatively, for finer grained control over the basis you can supply a list of orders, and the dimension will be set to the length of the list:
fb = fourier_fa.FourierBasis(order=[3, 1])
... creates a Fourier basis up to order 3 on the first state variable, and up to order 1 on the second state variable. Now
print(fb.get_coefficients())
returns
[[0 0]
[0 1]
[1 0]
[1 1]
[2 0]
[2 1]
[3 0]
[3 1]]
Additionally, in either case you can use the independent
boolean parameter (which defaults to False
) to construct a basis with independent Fourier coefficients (i.e., where only one coefficient is non-zero at a time):
fb = FourierBasis(order=3, d=2, independent=True)
print(fb.coefficients)
... outputs:
[[0. 0.]
[1. 0.]
[2. 0.]
[3. 0.]
[0. 1.]
[0. 2.]
[0. 3.]]
An independent basis is much smaller that a dependent basis, but is a much more restricted function approximation class.
A Fourier basis instance can be evaluated by passing a state into the 'evaluate' function as follows:
fb = FourierBasis(order=1, d=2)
fb.evaluate([0.2, 0.7])
... returns a numpy array with a real value for each basis function (i.e., coefficient). In this case, the array contains 4 values:
array([ 1. , -0.58778525, 0.80901699, -0.95105652])
These outputs are intended to serve as the feature values for a linear function approximator (i.e., multiplied by weights and then summed).
When using the Fourier basis in a gradient descent framework, the original paper suggests scaling the gradients by one over the Euclidean norm of the coefficient vector (setting the all-zero coefficient factor to 1). These factors can be accessed using get_gradient_factors
and get_gradient_factor
.
The SarsaLambdaLinearFA
class is a simple linear online Sarsa(λ) implementation; example domain implementations for Acrobot (acro_sarsa_lambda.py
) and Mountain Car (mc_sarsa_lambda.py
) for reference.