
Commit

paper edits
FlyingWorkshop committed Jul 18, 2024
1 parent 2a044ae commit 032c5f8
Showing 2 changed files with 23 additions and 33 deletions.
2 changes: 1 addition & 1 deletion paper.bib
@@ -215,7 +215,7 @@ @misc{Julia
}

@INPROCEEDINGS{SARSOP,
-AUTHOR = {Hanna Kurniawati, David Hsu, Wee Sun Lee},
+AUTHOR = {Kurniawati, Hanna and Hsu, David and Lee, Wee Sun},
TITLE = {{SARSOP}: Efficient Point-Based {POMDP} Planning by Approximating Optimally Reachable Belief Spaces},
BOOKTITLE = {Robotics: {S}cience and {S}ystems},
YEAR = {2008},
54 changes: 22 additions & 32 deletions paper.md
@@ -31,25 +31,21 @@ Partially observable Markov decision processes (POMDPs) are a standard mathemati

## Research Purpose

-[CompressedBeliefMDPs.jl](https://github.com/JuliaPOMDP/CompressedBeliefMDPs.jl) is a Julia package [@Julia] for solving large POMDPs in the POMDPs.jl ecosystem [@POMDPs.jl] with belief compression. It offers a simple interface for effeciently sampling and compressing beliefs and for constructing and solving belief-state MDPs. The package can be used to benchmark techniques for sampling, compressing (dimensionality reduction), and planning. It can also solve complex POMDPs to support applications in a variety of domains.
+[CompressedBeliefMDPs.jl](https://github.com/JuliaPOMDP/CompressedBeliefMDPs.jl) is a Julia package [@Julia] for solving large POMDPs in the POMDPs.jl ecosystem [@POMDPs.jl] with belief compression (described below). It offers a simple interface for efficiently sampling and compressing beliefs and for constructing and solving belief-state MDPs. The package can be used to benchmark techniques for sampling, compressing, and planning. It can also solve complex POMDPs to support applications in a variety of domains.

## Relation to Prior Work

### Other Methods for Solving Large POMDPs

-While traditional tabular methods like policy and value iteration scale poorly on real-world POMDPs, there are many modern techniques that are effective at solving large POMDPs such as point-based methods [@PBVI; @perseus; @hsvi; @SARSOP] and online planners [@AEMS; @despot; @mcts; @pomcp; @sunberg2018online]. Belief compression is an effective but often overlooked technique that finds an effecient belief representation during planning.
+While traditional tabular methods like policy and value iteration scale poorly, there are modern methods such as point-based algorithms [@PBVI; @perseus; @hsvi; @SARSOP] and online planners [@AEMS; @despot; @mcts; @pomcp; @sunberg2018online] that perform well on real-world POMDPs in practice. Belief compression is an equally powerful but often overlooked alternative that is especially potent when belief is sparse.

-### Belief Compression
+CompressedBeliefMDPs.jl is a modular generalization of the original algorithm. It can be used independently or in conjunction with other planners. It also supports *both* continuous and discrete state, action, and observation spaces.

-CompressedBeliefMDPs.jl abstracts the belief compression algorithm of @Roy into four steps:
+### Belief Compression

-1. sample reachable beliefs,
-2. compress the samples,
-3. construct the compressed belief-state MDP, and
-4. solve using an MDP solver.
+CompressedBeliefMDPs.jl abstracts the belief compression algorithm of @Roy into four steps: sampling, compression, construction, and planning. The `Sampler` abstract type handles belief sampling; the `Compressor` abstract type handles belief compression; the `CompressedBeliefMDP` struct handles constructing the compressed belief MDP; and the `CompressedBeliefSolver` and `CompressedBeliefPolicy` structs handle planning in the compressed belief MDP.

-Each step is handled by a struct or abstract type. Sampling (1) is handled by the `Sampler` abstract type; compression (2) by the `Compressor` abstract type; construction (3) by the `CompressedBeliefMDP` struct; and solving (4) by the `CompressedBeliefSolver` and `CompressedBeliefPolicy` structs. In contrast, @Roy use a fixed sampler, compressor, and solver. They use a heuristic controller for sampling beliefs; exponential family principal component analysis with Poisson loss for compression [@EPCA]; and local approximation value iteration for the base solver.
+Our framework is a generalization of the original belief compression algorithm. @Roy use a heuristic controller for sampling beliefs; exponential family principal component analysis with Poisson loss for compression [@EPCA]; and local approximation value iteration for the base solver. CompressedBeliefMDPs.jl, on the other hand, is a modular framework, meaning that belief compression can be applied with *any* combination of sampler, compressor, and MDP solver.
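The four steps compose directly. The following sketch is assembled from the examples later in this paper; `MyMDPSolver` is the same placeholder solver used there:

```julia
using POMDPs, POMDPModels, POMDPTools
using CompressedBeliefMDPs

pomdp = BabyPOMDP()
sampler = BeliefExpansionSampler(pomdp)  # 1. sample reachable beliefs
compressor = PCACompressor(1)            # 2. compress the samples
updater = DiscreteUpdater(pomdp)
cbmdp = CompressedBeliefMDP(pomdp, sampler, updater, compressor)  # 3. construct
solver = MyMDPSolver()::POMDPs.Solver    # placeholder MDP solver
policy = solve(solver, cbmdp)            # 4. plan with any MDP solver
```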

### Related Packages

@@ -65,53 +61,47 @@ CompressedBeliefMDPs.jl also supports fast *exploratory belief expansion* on POM

The `Compressor` abstract type handles compression in CompressedBeliefMDPs.jl. CompressedBeliefMDPs.jl provides seven off-the-shelf compressors:

-1. principal component analysis (PCA) [@PCA],
-2. kernel PCA [@kernelPCA],
-3. probabilistic PCA [@PPCA],
-4. factor analysis [@factor],
+1. Principal component analysis (PCA) [@PCA],
+2. Kernel PCA [@kernelPCA],
+3. Probabilistic PCA [@PPCA],
+4. Factor analysis [@factor],
5. Isomap [@isomap],
-6. autoencoder [@autoencoder], and
-7. variational auto-encoder (VAE) [@VAE].
+6. Autoencoder [@autoencoder], and
+7. Variational auto-encoder (VAE) [@VAE].

The first four are supported through [MultivariateStats.jl](https://juliastats.org/MultivariateStats.jl/stable/); Isomap is supported through [ManifoldLearning.jl](https://wildart.github.io/ManifoldLearning.jl/stable/); and the last two are implemented in Flux.jl [@flux].
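Swapping compressors is a one-line change. A minimal sketch, assuming the `KernelPCACompressor` constructor parallels the `PCACompressor(1)` call used later in this paper (only `PCACompressor` appears verbatim here):

```julia
using POMDPs, POMDPModels, POMDPTools
using CompressedBeliefMDPs

pomdp = BabyPOMDP()
sampler = BeliefExpansionSampler(pomdp)
updater = DiscreteUpdater(pomdp)

# Assumed constructor name; everything else in the pipeline is unchanged.
compressor = KernelPCACompressor(1)
cbmdp = CompressedBeliefMDP(pomdp, sampler, updater, compressor)
```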

# Compressed Belief-State MDPs

## Definition

### Belief-State MDPs

-Recall that any POMDP can be viewed as a belief-state MDP [@belief-state-MDP], where states are beliefs and transitions are belief updates (e.g., with Bayesian or Kalman filters). Formally, a POMDP is a tuple $\langle S, A, T, R, \Omega, O, \gamma \rangle$, where $S$ is the state space, $A$ is the action space, $T: S \times A \times S \to \mathbb{R}$ is the transition model, $R: S \times A \to \mathbb{R}$ is the reward moel, $\Omega$ is the observation space, $O: \Omega \times S \times A \to \mathbb{R}$ is the observation model, and $\gamma \in [0, 1)$ is the discount factor. The POMDP is said to induce the belief-state MDP $\langle B, A, T', R', \gamma \rangle$, where $B$ is the POMDP belief space, $T': B \times A \times B \to \mathbb{R}$ is the belief update model, and $R': B \times A \to \mathbb{R}$ is the reward model. $A$ and $\gamma$ remain the same.
+First, recall that any POMDP can be viewed as a belief-state MDP [@belief-state-MDP], where states are beliefs and transitions are belief updates (e.g., with Bayesian or Kalman filters). Formally, a POMDP is a tuple $\langle S, A, T, R, \Omega, O, \gamma \rangle$, where $S$ is the state space, $A$ is the action space, $T: S \times A \times S \to \mathbb{R}$ is the transition model, $R: S \times A \to \mathbb{R}$ is the reward model, $\Omega$ is the observation space, $O: \Omega \times S \times A \to \mathbb{R}$ is the observation model, and $\gamma \in [0, 1)$ is the discount factor. The POMDP is said to induce the belief-state MDP $\langle B, A, T', R', \gamma \rangle$, where $B$ is the POMDP belief space, $T': B \times A \times B \to \mathbb{R}$ is the belief update model, and $R': B \times A \to \mathbb{R}$ is the reward model. $A$ and $\gamma$ remain the same.
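For concreteness, the belief update behind $T'$ is the standard Bayesian filter. For a discrete state space, after taking action $a$ in belief $b$ and observing $o$, the posterior belief is (writing the models $T$ and $O$ above in conditional notation)

$$b'(s') = \frac{O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a) \, b(s)}{\Pr(o \mid b, a)},$$

where $\Pr(o \mid b, a)$ is the normalizing constant.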

### Compressed Belief-State MDPs

-We define the corresponding *compressed belief-state MDP* as $\langle \tilde{B}, A, \tilde{T}, \tilde{R}, \gamma \rangle$ where $\tilde{B}$ is the compressed belief space obtained from the compression $\phi: B \to \tilde{B}$. Then $\tilde{R}(\tilde{b}, a) = R(\phi^{-1}(\tilde{b}), a)$ and $\tilde{T}(\tilde{b}, a, \tilde{b}') = T(\phi^{-1}(\tilde{b}), a, \phi^{-1}(\tilde{b}'))$. When $\phi$ is lossy, $\phi$ may not be invertible. In practice, we circumvent this issue by caching compressions on a first-come-first-serve basis (or under an arbitrary ranking over $B$ if the compression is parallel), so that if $\phi(b_1) = \phi(b_2) = \tilde{b}$ we have $\phi^{-1}(\tilde{b}) = b_1$ if $b_1$ was ranked higher than $b_2$ for $b_1, b_2 \in B$ and $\tilde{b} \in \tilde{B}$.
+We define the corresponding *compressed belief-state MDP* (CBMDP) as $\langle \tilde{B}, A, \tilde{T}, \tilde{R}, \gamma \rangle$ where $\tilde{B}$ is the compressed belief space obtained from the compression $\phi: B \to \tilde{B}$. Then $\tilde{R}(\tilde{b}, a) = R(\phi^{-1}(\tilde{b}), a)$ and $\tilde{T}(\tilde{b}, a, \tilde{b}') = T(\phi^{-1}(\tilde{b}), a, \phi^{-1}(\tilde{b}'))$. When $\phi$ is lossy, $\phi$ may not be invertible. In practice, we circumvent this issue by caching items on a first-come, first-served basis (or under an arbitrary ranking over $B$ if the compression is parallel), so that if $\phi(b_1) = \phi(b_2) = \tilde{b}$ we have $\phi^{-1}(\tilde{b}) = b_1$ if $b_1$ was ranked higher than $b_2$ for $b_1, b_2 \in B$ and $\tilde{b} \in \tilde{B}$.
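The cache can be pictured in a few lines of Julia. This is an illustrative sketch of the first-come, first-served idea only; the names below are hypothetical and not the package's internals:

```julia
# Hypothetical sketch: invert a lossy compression ϕ via a first-come,
# first-served cache. Not the package's internal implementation.
phi_cache = Dict{Vector{Float64},Vector{Float64}}()  # b̃ => first b with ϕ(b) == b̃

function compress_and_cache(ϕ, b::Vector{Float64})
    b̃ = ϕ(b)
    get!(phi_cache, b̃, b)  # only the first belief mapping to b̃ is kept
    return b̃
end

recover(b̃) = phi_cache[b̃]  # plays the role of ϕ⁻¹
```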

## Implementation

-We implement compressed belief MDPs with the `CompressedBeliefMDP` struct. `CompressedBeliefMDP` contains a [`GenerativeBeliefMDP`](https://juliapomdp.github.io/POMDPs.jl/latest/POMDPTools/model/#POMDPTools.ModelTools.GenerativeBeliefMDP), a `Compressor`, and a cache $\phi$ that recovers the original belief. The default constructor handles belief sampling, compressor fitting, belief compressing, and cache management.
+The `CompressedBeliefMDP` struct contains a [`GenerativeBeliefMDP`](https://juliapomdp.github.io/POMDPs.jl/latest/POMDPTools/model/#POMDPTools.ModelTools.GenerativeBeliefMDP), a `Compressor`, and a cache $\phi$ that recovers the original belief. The default constructor handles belief sampling, compressor fitting, belief compressing, and cache management. Any POMDPs.jl `Solver` can solve a `CompressedBeliefMDP`.

```julia
using POMDPs, POMDPModels, POMDPTools
using CompressedBeliefMDPs

# construct the CBMDP
pomdp = BabyPOMDP()
sampler = BeliefExpansionSampler(pomdp)
updater = DiscreteUpdater(pomdp)
compressor = PCACompressor(1)
cbmdp = CompressedBeliefMDP(pomdp, sampler, updater, compressor)
```

-# Solvers

-`CompressedBeliefMDP` can be solved by any POMDPs.jl MDP solver.

```julia
# solve the CBMDP
solver = MyMDPSolver()::POMDPs.Solver
policy = solve(solver, cbmdp)
```

-For convenience, we also provide `CompressedBeliefSolver` and `CompressedBeliefPolicy` which wrap the entire belief compression pipeline including sampling beliefs and fitting the compressor.
+# Solvers

+`CompressedBeliefSolver` and `CompressedBeliefPolicy` wrap the belief compression pipeline, meaning belief compression can be applied without explicitly constructing a `CompressedBeliefMDP`.

```julia
using POMDPs, POMDPModels, POMDPTools
@@ -132,7 +122,7 @@
v = value(policy, s)
a = action(policy, s)
```

-Following @Roy, we use local value approximation as our default base solver because it provides an error bound on our value estimate [@error_bound].
+Following @Roy, we use local value approximation as our default base solver, because it bounds the value estimation error [@error_bound].

```julia
using POMDPs, POMDPTools, POMDPModels
@@ -143,7 +133,7 @@
solver = CompressedBeliefSolver(pomdp)
policy = solve(solver, pomdp)
```

-The generality of the base solver in CompressedBeliefMDPs.jl offers a major improvement over the belief compression of @Roy because it supports continuous state, action, and observation spaces. More details, examples, and instructions on implementing custom components can be found in the [documentation](https://juliapomdp.github.io/CompressedBeliefMDPs.jl/dev/).
+To solve a continuous-space POMDP, simply swap the base solver. More details, examples, and instructions on implementing custom components can be found in the [documentation](https://juliapomdp.github.io/CompressedBeliefMDPs.jl/dev/).
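For example, a continuous-space model might be paired with a sampling-based MDP solver. A hedged sketch: `MyContinuousPOMDP` is a hypothetical model, and the exact way the base solver is passed to `CompressedBeliefSolver` is an assumption to check against the documentation:

```julia
using POMDPs, POMDPTools
using CompressedBeliefMDPs
using MCTS  # one possible sampling-based base solver from the POMDPs.jl ecosystem

pomdp = MyContinuousPOMDP()  # hypothetical continuous-space POMDP
base_solver = MCTSSolver()   # plans over sampled compressed-belief states
# Assumption: the base solver is supplied directly to CompressedBeliefSolver;
# consult the package documentation for the exact signature.
solver = CompressedBeliefSolver(pomdp, base_solver)
policy = solve(solver, pomdp)
```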


# Circular Maze