
Commit

paper edits
FlyingWorkshop committed Jul 18, 2024
1 parent 2a044ae commit 032c5f8
Showing 2 changed files with 23 additions and 33 deletions.
2 changes: 1 addition & 1 deletion paper.bib
@@ -215,7 +215,7 @@ @misc{Julia
}

@INPROCEEDINGS{SARSOP,
-AUTHOR = {Hanna Kurniawati, David Hsu, Wee Sun Lee},
+AUTHOR = {Kurniawati, Hanna and Hsu, David and Lee, Wee Sun},
TITLE = {{SARSOP}: Efficient Point-Based {POMDP} Planning by Approximating Optimally Reachable Belief Spaces},
BOOKTITLE = {Robotics: {S}cience and {S}ystems},
YEAR = {2008},
54 changes: 22 additions & 32 deletions paper.md
@@ -31,25 +31,21 @@ Partially observable Markov decision processes (POMDPs) are a standard mathemati

## Research Purpose

-[CompressedBeliefMDPs.jl](https://github.com/JuliaPOMDP/CompressedBeliefMDPs.jl) is a Julia package [@Julia] for solving large POMDPs in the POMDPs.jl ecosystem [@POMDPs.jl] with belief compression. It offers a simple interface for effeciently sampling and compressing beliefs and for constructing and solving belief-state MDPs. The package can be used to benchmark techniques for sampling, compressing (dimensionality reduction), and planning. It can also solve complex POMDPs to support applications in a variety of domains.
+[CompressedBeliefMDPs.jl](https://github.com/JuliaPOMDP/CompressedBeliefMDPs.jl) is a Julia package [@Julia] for solving large POMDPs in the POMDPs.jl ecosystem [@POMDPs.jl] with belief compression (described below). It offers a simple interface for efficiently sampling and compressing beliefs and for constructing and solving belief-state MDPs. The package can be used to benchmark techniques for sampling, compressing, and planning. It can also solve complex POMDPs to support applications in a variety of domains.

## Relation to Prior Work

### Other Methods for Solving Large POMDPs

-While traditional tabular methods like policy and value iteration scale poorly on real-world POMDPs, there are many modern techniques that are effective at solving large POMDPs such as point-based methods [@PBVI; @perseus; @hsvi; @SARSOP] and online planners [@AEMS; @despot; @mcts; @pomcp; @sunberg2018online]. Belief compression is an effective but often overlooked technique that finds an effecient belief representation during planning.
+While traditional tabular methods like policy and value iteration scale poorly, there are modern methods such as point-based algorithms [@PBVI; @perseus; @hsvi; @SARSOP] and online planners [@AEMS; @despot; @mcts; @pomcp; @sunberg2018online] that perform well on real-world POMDPs in practice. Belief compression is an equally powerful but often overlooked alternative that is especially potent when belief is sparse.

-### Belief Compression
+CompressedBeliefMDPs.jl is a modular generalization of the original algorithm. It can be used independently or in conjunction with other planners. It also supports *both* continuous and discrete state, action, and observation spaces.

-CompressedBeliefMDPs.jl abstracts the belief compression algorithm of @Roy into four steps:
+### Belief Compression

-1. sample reachable beliefs,
-2. compress the samples,
-3. construct the compressed belief-state MDP, and
-4. solve using an MDP solver.
+CompressedBeliefMDPs.jl abstracts the belief compression algorithm of @Roy into four steps: sampling, compression, construction, and planning. The `Sampler` abstract type handles belief sampling; the `Compressor` abstract type handles belief compression; the `CompressedBeliefMDP` struct handles constructing the compressed belief MDP; and the `CompressedBeliefSolver` and `CompressedBeliefPolicy` structs handle planning in the compressed belief MDP.

-Each step is handled by a struct or abstract type. Sampling (1) is handled by the `Sampler` abstract type; compression (2) by the `Compressor` abstract type; construction (3) by the `CompressedBeliefMDP` struct; and solving (4) by the `CompressedBeliefSolver` and `CompressedBeliefPolicy` structs. In contrast, @Roy use a fixed sampler, compressor, and solver. They use a heuristic controller for sampling beliefs; exponential family principal component analysis with Poisson loss for compression [@EPCA]; and local approximation value iteration for the base solver.
+Our framework is a generalization of the original belief compression algorithm. @Roy use a heuristic controller for sampling beliefs; exponential family principal component analysis with Poisson loss for compression [@EPCA]; and local approximation value iteration for the base solver. CompressedBeliefMDPs.jl, on the other hand, is a modular framework, meaning that belief compression can be applied with *any* combination of sampler, compressor, and MDP solver.
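The four steps compose directly. The following sketch is assembled from the examples later in this paper; `MyMDPSolver` is the same placeholder solver used there:

```julia
using POMDPs, POMDPModels, POMDPTools
using CompressedBeliefMDPs

pomdp = BabyPOMDP()
sampler = BeliefExpansionSampler(pomdp)  # 1. sample reachable beliefs
compressor = PCACompressor(1)            # 2. compress the samples
updater = DiscreteUpdater(pomdp)
cbmdp = CompressedBeliefMDP(pomdp, sampler, updater, compressor)  # 3. construct
solver = MyMDPSolver()::POMDPs.Solver    # placeholder MDP solver
policy = solve(solver, cbmdp)            # 4. plan with any MDP solver
```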

### Related Packages

@@ -65,53 +61,47 @@ CompressedBeliefMDPs.jl also supports fast *exploratory belief expansion* on POM

The `Compressor` abstract type handles compression in CompressedBeliefMDPs.jl. CompressedBeliefMDPs.jl provides seven off-the-shelf compressors:

-1. principal component analysis (PCA) [@PCA],
-2. kernel PCA [@kernelPCA],
-3. probabilistic PCA [@PPCA],
-4. factor analysis [@factor],
+1. Principal component analysis (PCA) [@PCA],
+2. Kernel PCA [@kernelPCA],
+3. Probabilistic PCA [@PPCA],
+4. Factor analysis [@factor],
5. Isomap [@isomap],
-6. autoencoder [@autoencoder], and
-7. variational auto-encoder (VAE) [@VAE].
+6. Autoencoder [@autoencoder], and
+7. Variational auto-encoder (VAE) [@VAE].

The first four are supported through [MultivariateStats.jl](https://juliastats.org/MultivariateStats.jl/stable/); Isomap is supported through [ManifoldLearning.jl](https://wildart.github.io/ManifoldLearning.jl/stable/); and the last two are implemented in Flux.jl [@flux].
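Swapping compressors is a one-line change. A minimal sketch, assuming the `KernelPCACompressor` constructor parallels the `PCACompressor(1)` call used later in this paper (only `PCACompressor` appears verbatim here):

```julia
using POMDPs, POMDPModels, POMDPTools
using CompressedBeliefMDPs

pomdp = BabyPOMDP()
sampler = BeliefExpansionSampler(pomdp)
updater = DiscreteUpdater(pomdp)

# Assumed constructor name; everything else in the pipeline is unchanged.
compressor = KernelPCACompressor(1)
cbmdp = CompressedBeliefMDP(pomdp, sampler, updater, compressor)
```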

# Compressed Belief-State MDPs

## Definition

### Belief-State MDPs

-Recall that any POMDP can be viewed as a belief-state MDP [@belief-state-MDP], where states are beliefs and transitions are belief updates (e.g., with Bayesian or Kalman filters). Formally, a POMDP is a tuple $\langle S, A, T, R, \Omega, O, \gamma \rangle$, where $S$ is the state space, $A$ is the action space, $T: S \times A \times S \to \mathbb{R}$ is the transition model, $R: S \times A \to \mathbb{R}$ is the reward moel, $\Omega$ is the observation space, $O: \Omega \times S \times A \to \mathbb{R}$ is the observation model, and $\gamma \in [0, 1)$ is the discount factor. The POMDP is said to induce the belief-state MDP $\langle B, A, T', R', \gamma \rangle$, where $B$ is the POMDP belief space, $T': B \times A \times B \to \mathbb{R}$ is the belief update model, and $R': B \times A \to \mathbb{R}$ is the reward model. $A$ and $\gamma$ remain the same.
+First, recall that any POMDP can be viewed as a belief-state MDP [@belief-state-MDP], where states are beliefs and transitions are belief updates (e.g., with Bayesian or Kalman filters). Formally, a POMDP is a tuple $\langle S, A, T, R, \Omega, O, \gamma \rangle$, where $S$ is the state space, $A$ is the action space, $T: S \times A \times S \to \mathbb{R}$ is the transition model, $R: S \times A \to \mathbb{R}$ is the reward model, $\Omega$ is the observation space, $O: \Omega \times S \times A \to \mathbb{R}$ is the observation model, and $\gamma \in [0, 1)$ is the discount factor. The POMDP is said to induce the belief-state MDP $\langle B, A, T', R', \gamma \rangle$, where $B$ is the POMDP belief space, $T': B \times A \times B \to \mathbb{R}$ is the belief update model, and $R': B \times A \to \mathbb{R}$ is the reward model. $A$ and $\gamma$ remain the same.
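For concreteness, the belief update behind $T'$ is the standard Bayesian filter. For a discrete state space, after taking action $a$ in belief $b$ and observing $o$, the posterior belief is (writing the models $T$ and $O$ above in conditional notation)

$$b'(s') = \frac{O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a) \, b(s)}{\Pr(o \mid b, a)},$$

where $\Pr(o \mid b, a)$ is the normalizing constant.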

### Compressed Belief-State MDPs

-We define the corresponding *compressed belief-state MDP* as $\langle \tilde{B}, A, \tilde{T}, \tilde{R}, \gamma \rangle$ where $\tilde{B}$ is the compressed belief space obtained from the compression $\phi: B \to \tilde{B}$. Then $\tilde{R}(\tilde{b}, a) = R(\phi^{-1}(\tilde{b}), a)$ and $\tilde{T}(\tilde{b}, a, \tilde{b}') = T(\phi^{-1}(\tilde{b}), a, \phi^{-1}(\tilde{b}'))$. When $\phi$ is lossy, $\phi$ may not be invertible. In practice, we circumvent this issue by caching compressions on a first-come-first-serve basis (or under an arbitrary ranking over $B$ if the compression is parallel), so that if $\phi(b_1) = \phi(b_2) = \tilde{b}$ we have $\phi^{-1}(\tilde{b}) = b_1$ if $b_1$ was ranked higher than $b_2$ for $b_1, b_2 \in B$ and $\tilde{b} \in \tilde{B}$.
+We define the corresponding *compressed belief-state MDP* (CBMDP) as $\langle \tilde{B}, A, \tilde{T}, \tilde{R}, \gamma \rangle$ where $\tilde{B}$ is the compressed belief space obtained from the compression $\phi: B \to \tilde{B}$. Then $\tilde{R}(\tilde{b}, a) = R(\phi^{-1}(\tilde{b}), a)$ and $\tilde{T}(\tilde{b}, a, \tilde{b}') = T(\phi^{-1}(\tilde{b}), a, \phi^{-1}(\tilde{b}'))$. When $\phi$ is lossy, $\phi$ may not be invertible. In practice, we circumvent this issue by caching items on a first-come, first-served basis (or under an arbitrary ranking over $B$ if the compression is parallel), so that if $\phi(b_1) = \phi(b_2) = \tilde{b}$ we have $\phi^{-1}(\tilde{b}) = b_1$ if $b_1$ was ranked higher than $b_2$ for $b_1, b_2 \in B$ and $\tilde{b} \in \tilde{B}$.
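The cache can be pictured in a few lines of Julia. This is an illustrative sketch of the first-come, first-served idea only; the names below are hypothetical and not the package's internals:

```julia
# Hypothetical sketch: invert a lossy compression ϕ via a first-come,
# first-served cache. Not the package's internal implementation.
phi_cache = Dict{Vector{Float64},Vector{Float64}}()  # b̃ => first b with ϕ(b) == b̃

function compress_and_cache(ϕ, b::Vector{Float64})
    b̃ = ϕ(b)
    get!(phi_cache, b̃, b)  # only the first belief mapping to b̃ is kept
    return b̃
end

recover(b̃) = phi_cache[b̃]  # plays the role of ϕ⁻¹
```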

## Implementation

-We implement compressed belief MDPs with the `CompressedBeliefMDP` struct. `CompressedBeliefMDP` contains a [`GenerativeBeliefMDP`](https://juliapomdp.github.io/POMDPs.jl/latest/POMDPTools/model/#POMDPTools.ModelTools.GenerativeBeliefMDP), a `Compressor`, and a cache $\phi$ that recovers the original belief. The default constructor handles belief sampling, compressor fitting, belief compressing, and cache management.
+The `CompressedBeliefMDP` struct contains a [`GenerativeBeliefMDP`](https://juliapomdp.github.io/POMDPs.jl/latest/POMDPTools/model/#POMDPTools.ModelTools.GenerativeBeliefMDP), a `Compressor`, and a cache $\phi$ that recovers the original belief. The default constructor handles belief sampling, compressor fitting, belief compressing, and cache management. Any POMDPs.jl `Solver` can solve a `CompressedBeliefMDP`.

```julia
using POMDPs, POMDPModels, POMDPTools
using CompressedBeliefMDPs

# construct the CBMDP
pomdp = BabyPOMDP()
sampler = BeliefExpansionSampler(pomdp)
updater = DiscreteUpdater(pomdp)
compressor = PCACompressor(1)
cbmdp = CompressedBeliefMDP(pomdp, sampler, updater, compressor)
```

-# Solvers

-`CompressedBeliefMDP` can be solved by any POMDPs.jl MDP solver.

```julia
# solve the CBMDP
solver = MyMDPSolver()::POMDPs.Solver
policy = solve(solver, cbmdp)
```

-For convenience, we also provide `CompressedBeliefSolver` and `CompressedBeliefPolicy` which wrap the entire belief compression pipeline including sampling beliefs and fitting the compressor.
+# Solvers

+`CompressedBeliefSolver` and `CompressedBeliefPolicy` wrap the belief compression pipeline, meaning belief compression can be applied without explicitly constructing a `CompressedBeliefMDP`.

```julia
using POMDPs, POMDPModels, POMDPTools
@@ -132,7 +122,7 @@
v = value(policy, s)
a = action(policy, s)
```

-Following @Roy, we use local value approximation as our default base solver because it provides an error bound on our value estimate [@error_bound].
+Following @Roy, we use local value approximation as our default base solver, because it bounds the value estimation error [@error_bound].

```julia
using POMDPs, POMDPTools, POMDPModels
@@ -143,7 +133,7 @@
solver = CompressedBeliefSolver(pomdp)
policy = solve(solver, pomdp)
```

-The generality of the base solver in CompressedBeliefMDPs.jl offers a major improvement over the belief compression of @Roy because it supports continuous state, action, and observation spaces. More details, examples, and instructions on implementing custom components can be found in the [documentation](https://juliapomdp.github.io/CompressedBeliefMDPs.jl/dev/).
+To solve a continuous-space POMDP, simply swap the base solver. More details, examples, and instructions on implementing custom components can be found in the [documentation](https://juliapomdp.github.io/CompressedBeliefMDPs.jl/dev/).
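For example, a continuous-space model might be paired with a sampling-based MDP solver. A hedged sketch: `MyContinuousPOMDP` is a hypothetical model, and the exact way the base solver is passed to `CompressedBeliefSolver` is an assumption to check against the documentation:

```julia
using POMDPs, POMDPTools
using CompressedBeliefMDPs
using MCTS  # one possible sampling-based base solver from the POMDPs.jl ecosystem

pomdp = MyContinuousPOMDP()  # hypothetical continuous-space POMDP
base_solver = MCTSSolver()   # plans over sampled compressed-belief states
# Assumption: the base solver is supplied directly to CompressedBeliefSolver;
# consult the package documentation for the exact signature.
solver = CompressedBeliefSolver(pomdp, base_solver)
policy = solve(solver, pomdp)
```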


# Circular Maze