API Documentation · CompressedBeliefMDPs

API Documentation

Contents

Index

Types/Functors

CompressedBeliefMDPs.CompressedBeliefMDPType
CompressedBeliefMDP{B, A}

The CompressedBeliefMDP struct is a generalization of the compressed belief-state MDP presented in Exponential Family PCA for Belief Compression in POMDPs.

Type Parameters

  • B: The type of compressed belief states.
  • A: The type of actions.

Fields

  • bmdp::GenerativeBeliefMDP: The generative belief-state MDP.
  • compressor::Compressor: The compressor used to compress belief states.
  • ϕ::Bijection: A bijection representing the mapping from uncompressed belief states to compressed belief states. See notes.

Constructors

CompressedBeliefMDP(pomdp::POMDP, updater::Updater, compressor::Compressor)
 CompressedBeliefMDP(pomdp::POMDP, sampler::Sampler, updater::Updater, compressor::Compressor)

Constructs a CompressedBeliefMDP using the specified POMDP, updater, and compressor.

Warning

The 4-argument constructor is a quality-of-life constructor that calls fit! on the given compressor.

Example Usage

pomdp = TigerPOMDP()
 updater = DiscreteUpdater(pomdp)
 compressor = PCACompressor(1)
mdp = CompressedBeliefMDP(pomdp, updater, compressor)

For continuous POMDPs, see ParticleFilters.jl.

Notes

  • While compressions aren't usually injective, we cache beliefs and their compressions on a first-come, first-served basis, so we can effectively use a bijection without loss of generality.
source
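The first-come, first-served caching described in the notes can be illustrated with a standalone sketch (plain Julia, not the package's internals; `compress` here is a toy stand-in for any non-injective compressor):

```julia
# Toy, non-injective "compressor": nearby beliefs map to the same point.
compress(b) = round.(b; digits=1)

ϕ = Dict{Vector{Float64}, Vector{Float64}}()  # belief → compression
claimed = Set{Vector{Float64}}()              # compressions already claimed

function cache!(b)
    b̃ = compress(b)
    if !(b̃ in claimed)      # the first belief to produce b̃ claims it,
        push!(claimed, b̃)   # so ϕ stays one-to-one on its entries
        ϕ[b] = b̃
    end
    return b̃
end

cache!([0.15, 0.85])    # cached: first belief mapping to this compression
cache!([0.149, 0.851])  # same compression as above, so not cached again
```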
CompressedBeliefMDPs.CompressedBeliefPolicyType
CompressedBeliefPolicy

Maps a base policy for the compressed belief-state MDP to a policy for the true POMDP.

Fields

  • m::CompressedBeliefMDP: The compressed belief-state MDP.
  • base_policy::Policy: The base policy used for decision-making in the compressed belief-state MDP.

Constructors

CompressedBeliefPolicy(m::CompressedBeliefMDP, base_policy::Policy)

Constructs a CompressedBeliefPolicy using the specified compressed belief-state MDP and base policy.

Example Usage

policy = solve(solver, pomdp)
 s = initialstate(pomdp)
 a = action(policy, s) # returns the approximately optimal action for state s
v = value(policy, s)  # returns the approximately optimal value for state s
source
CompressedBeliefMDPs.CompressedBeliefSolverType
CompressedBeliefSolver

The CompressedBeliefSolver struct represents a solver for compressed belief-state MDPs. It combines a compressed belief-state MDP with a base solver to approximate the value function.

Fields

  • m::CompressedBeliefMDP: The compressed belief-state MDP.
  • base_solver::Solver: The base solver used to solve the compressed belief-state MDP.

Constructors

CompressedBeliefSolver(pomdp::POMDP, base_solver::Solver; updater::Updater=DiscreteUpdater(pomdp), sampler::Sampler=BeliefExpansionSampler(pomdp), compressor::Compressor=PCACompressor(1))
 CompressedBeliefSolver(pomdp::POMDP; updater::Updater=DiscreteUpdater(pomdp), sampler::Sampler=BeliefExpansionSampler(pomdp), compressor::Compressor=PCACompressor(1), interp::Union{Nothing, LocalFunctionApproximator}=nothing, k::Int=1, verbose::Bool=false, max_iterations::Int=1000, n_generative_samples::Int=10, belres::Float64=1e-3)

Constructs a CompressedBeliefSolver using the specified POMDP, base solver, updater, sampler, and compressor. Alternatively, you can omit the base solver, in which case a LocalApproximationValueIterationSolver (https://github.com/JuliaPOMDP/LocalApproximationValueIteration.jl) will be created instead. A different base solver is needed, for example, if the POMDP state and action spaces are continuous.

Example Usage

julia> pomdp = TigerPOMDP();
julia> solver = CompressedBeliefSolver(pomdp; verbose=true, max_iterations=10);
julia> solve(solver, pomdp);
[Iteration 1   ] residual:       8.51 | iteration runtime:    635.870 ms, (     0.636 s total)
[Iteration 2   ] residual:       3.63 | iteration runtime:      0.504 ms, (     0.636 s total)
[Iteration 3   ] residual:       10.1 | iteration runtime:      0.445 ms, (     0.637 s total)
[Iteration 4   ] residual:       15.2 | iteration runtime:      0.494 ms, (     0.637 s total)
[Iteration 5   ] residual:       6.72 | iteration runtime:      0.432 ms, (     0.638 s total)
[Iteration 6   ] residual:       7.38 | iteration runtime:      0.508 ms, (     0.638 s total)
[Iteration 7   ] residual:       6.03 | iteration runtime:      0.495 ms, (     0.639 s total)
[Iteration 8   ] residual:       5.73 | iteration runtime:      0.585 ms, (     0.639 s total)
[Iteration 9   ] residual:       4.02 | iteration runtime:      0.463 ms, (      0.64 s total)
[Iteration 10  ] residual:       7.28 | iteration runtime:      0.576 ms, (      0.64 s total)
source

Functions

CompressedBeliefMDPs.make_cacheFunction
make_cache(B, B̃)

Helper function that creates a cache that maps each unique belief from the set B to its corresponding compressed representation in B̃.

Arguments

  • B::Vector{<:Any}: A vector of beliefs.
  • B̃::Matrix{Float64}: A matrix where each row corresponds to the compressed representation of the beliefs in B.

Returns

  • Dict{<:Any, Vector{Float64}}: A dictionary mapping each unique belief in B to its corresponding compressed representation in B̃.

Example Usage

B = [belief1, belief2, belief3]
 B̃ = [compressed_belief1; compressed_belief2; compressed_belief3]
ϕ = make_cache(B, B̃)
source
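Conceptually, make_cache pairs each belief with the matching row of B̃. A standalone sketch with toy data (illustrative only, not the package's implementation):

```julia
B = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # toy beliefs
B̃ = [0.9 0.1; 0.5 0.5; 0.1 0.9]           # one compressed row per belief

# Equivalent in spirit to make_cache(B, B̃): map each unique belief
# to its compressed representation.
ϕ = Dict(b => B̃[i, :] for (i, b) in enumerate(B))

ϕ[[0.5, 0.5]]  # → [0.5, 0.5]
```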
CompressedBeliefMDPs.make_numericalFunction
make_numerical(B, pomdp)

Helper function that converts a set of beliefs B into a numerical matrix representation suitable for processing by numerical algorithms/compressors.

Arguments

  • B::Vector{<:Any}: A vector of beliefs.
  • pomdp::POMDP: The POMDP model associated with the beliefs.

Returns

  • Matrix{Float64}: A matrix where each row corresponds to a numerical representation of a belief in B.

Example Usage

B = [belief1, belief2, belief3]
B_numerical = make_numerical(B, pomdp)
source
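In spirit, make_numerical stacks each belief's probability vector as a row of a Float64 matrix. A standalone sketch using toy pmf vectors in place of DiscreteBeliefs (illustrative only):

```julia
beliefs = [[1.0, 0.0], [0.25, 0.75]]  # stand-ins for belief pmfs

# Row-stack the pmfs, mirroring make_numerical's output shape:
# each row of the resulting 2×2 Matrix{Float64} is one belief.
B_numerical = reduce(vcat, permutedims.(beliefs))
```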
CompressedBeliefMDPs.compress_POMDPFunction
compress_POMDP(pomdp, sampler, updater, compressor)

Creates a compressed belief-state MDP by sampling, compressing, and caching beliefs from the given POMDP.

Arguments

  • pomdp::POMDP: The POMDP model to be compressed.
  • sampler::Sampler: A sampler to generate a set of beliefs from the POMDP.
  • updater::Updater: An updater to initialize beliefs from states.
  • compressor::Compressor: A compressor to reduce the dimensionality of the beliefs.

Returns

  • CompressedBeliefMDP: The constructed compressed belief-state MDP.
  • Matrix{Float64}: A matrix where each row corresponds to the compressed representation of the sampled beliefs.

Example Usage

pomdp = TigerPOMDP()
sampler = BeliefExpansionSampler(pomdp)
updater = DiscreteUpdater(pomdp)
compressor = PCACompressor(2)
m, B̃ = compress_POMDP(pomdp, sampler, updater, compressor)

source
Environments · CompressedBeliefMDPs

Circular Maze

Description

This environment is a generalization of the Circular Maze POMDP described in Finding Approximate POMDP solutions Through Belief Compression.[1] The world consists of n_corridors 1D circular corridors that each have corridor_length states. The robot spawns in a random corridor. It must determine which corridor it's in, navigate to the proper goal state, and finally declare that it has finished.

Figure from Finding Approximate POMDP solutions Through Belief Compression.

Action Space

Transitions left and right are noisy and non-deterministic. Transition probabilities are from a discrete von Mises distribution with unit concentration and mean at the target state.

Num  Action                Description
1    CMAZE_LEFT            Move left with von Mises noise.
2    CMAZE_RIGHT           Move right with von Mises noise.
3    CMAZE_SENSE_CORRIDOR  Observe the current corridor.
4    CMAZE_DECLARE_GOAL    Ends the episode. Receive r_findgoal if at the goal.
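The discrete von Mises transition noise can be sketched directly; this is an illustrative discretization with unit concentration, not CircularMaze's internal code:

```julia
# Discretized von Mises pmf over a circular corridor of length n,
# with concentration κ and mode at the target state μ.
function discrete_vonmises(n::Integer, μ::Integer; κ::Real=1.0)
    θ(x) = 2π * (x - μ) / n                  # angular distance to the mode
    w = [exp(κ * cos(θ(x))) for x in 1:n]    # unnormalized von Mises mass
    return w ./ sum(w)                       # normalize to a pmf
end

p = discrete_vonmises(25, 3)
# sum(p) ≈ 1; mass is highest at state 3 and wraps around the corridor
```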

State Space

The (ordered) state space is an array of all CircularMazeStates and a terminalstate: [CircularMazeState(1, 1), ..., CircularMazeState(n_corridors, corridor_length), TerminalState()].

Observation Space

The observation space is the union of the state space and 1:n_corridors. If the robot picks CMAZE_SENSE_CORRIDOR, they observe the index of the current corridor. Otherwise, they observe their current state with von Mises noise.

Rewards

The goal is to navigate to the correct goal state for the given corridor and then to declare the goal once arrived. If the robot correctly declares the goal, it receives r_findgoal. It incurs r_timestep_penalty for every timestep it has not reached the goal. By default, r_findgoal is 1 and r_timestep_penalty is 0.

Starting State

The initial state is sampled from a repeated, discrete von Mises distribution, each centered at the middle of its corridor.

Episode End

The episode terminates once the robot declares the goal (CMAZE_DECLARE_GOAL), regardless of whether the robot is correct.

Documentation

CompressedBeliefMDPs.CircularMazeType
CircularMaze(n_corridors::Integer, corridor_length::Integer, discount::Float64, r_findgoal::Float64, r_timestep_penalty::Float64)
 CircularMaze(n_corridors::Integer, corridor_length::Integer; kwargs...)
 CircularMaze()

A POMDP representing a circular maze environment.

Fields

  • n_corridors::Integer: Number of corridors in the circular maze.
  • corridor_length::Integer: Length of each corridor.
  • probabilities::AbstractArray: Probability masses for creating von Mises distributions.
  • center::Integer: The central position in the maze.
  • discount::Float64: Discount factor for future rewards.
  • r_findgoal::Float64: Reward for finding the goal.
  • r_timestep_penalty::Float64: Penalty for each timestep taken.
  • states::AbstractArray: Array of all possible states in the maze.
  • goals::AbstractArray: Array of goal states in the maze.

Example

using CompressedBeliefMDPs
 
 n_corridors = 8
 corridor_length = 25
maze = CircularMaze(n_corridors, corridor_length)
source
CompressedBeliefMDPs.CircularMazeStateType
CircularMazeState(corridor::Integer, x::Integer)

The CircularMazeState struct represents the state of an agent in a circular maze.

Fields

  • corridor::Integer: The corridor number. The value ranges from 1 to n_corridors.
  • x::Integer: The position of the state within the corridor. The value ranges from 1 to the corridor_length.
source
  • [1] Roy doesn't actually name his toy environment. For the original environment details, see the "PCA Performance" subsection on page 8.
Compressors

Defining a Belief Compressor

In this section, we outline the requirements and guidelines for defining a belief Compressor.

Interface

The Compressor interface is extremely minimal. It only supports two methods: fit! and the associated functor. For example, if you wanted to implement your own Compressor, you could write something like this:

struct MyCompressor <: Compressor
    foo
    bar
end

# functor definition
function (c::MyCompressor)(beliefs)
    # YOUR CODE HERE
    return compressed_beliefs
end

function fit!(c::MyCompressor, beliefs)
    # YOUR CODE HERE
end

Implementation Tips

  • For robustness, both the functor and fit! should be able to handle AbstractVector and AbstractMatrix inputs.
  • fit! is called only once after beliefs are sampled from the POMDP.
  • CompressedBeliefSolver will attempt to convert each belief state (often of type DiscreteBelief) into an AbstractArray{Float64} using convert_s. As a convenience, CompressedBeliefMDP implements conversions for commonly used belief types; however, if the POMDP has a custom belief state, then it is the user's responsibility to implement the appropriate conversion.

Implemented Compressors

CompressedBeliefMDPs currently provides wrappers for the following compression types:

Principal Component Analysis (PCA)

CompressedBeliefMDPs.PCACompressorFunction

Wrapper for MultivariateStats.PCA.

source

Kernel PCA

CompressedBeliefMDPs.KernelPCACompressorFunction

Wrapper for MultivariateStats.KernelPCA.

source

Probabilistic PCA

CompressedBeliefMDPs.PPCACompressorFunction

Wrapper for MultivariateStats.PPCA.

source

Factor Analysis

CompressedBeliefMDPs.FactorAnalysisCompressorFunction

Wrapper for MultivariateStats.FactorAnalysis.

source

Isomap

CompressedBeliefMDPs.IsomapCompressorFunction

Wrapper for ManifoldLearning.Isomap.

source

Autoencoder

CompressedBeliefMDPs.AutoencoderCompressorType

Implements an autoencoder in Flux.

source

Variational Auto-Encoder (VAE)

CompressedBeliefMDPs.VAECompressorType

Implements a VAE in Flux.

source
Warning

Some compression algorithms aren't optimized for large belief spaces. While they pass our unit tests, they may fail on large POMDPs or without seeding. For large POMDPs, users may want a custom Compressor.

)
policy = solve(solver, pomdp)
rs = RolloutSimulator(max_steps=50)
r = simulate(rs, pomdp, policy)

Concepts and Architecture

CompressedBeliefMDPs.jl aims to implement a generalization of the belief compression algorithm for solving large POMDPs. The algorithm has four steps:

  1. collect belief samples,
  2. compress the samples,
  3. create the compressed belief-state MDP,
  4. solve the MDP.

Each step is handled by Sampler, Compressor, CompressedBeliefMDP, and CompressedBeliefSolver respectively.

For more details, please see the rest of the documentation or the associated paper.
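The four steps map onto the package's types directly. A minimal end-to-end sketch (assuming POMDPModels' TigerPOMDP and the default components used elsewhere in these docs):

```julia
using POMDPs, POMDPModels, POMDPTools, CompressedBeliefMDPs

pomdp = TigerPOMDP()
sampler = BeliefExpansionSampler(pomdp)    # 1. collect belief samples
compressor = PCACompressor(1)              # 2. compress the samples
updater = DiscreteUpdater(pomdp)
m, B̃ = compress_POMDP(pomdp, sampler, updater, compressor)  # 3. compressed belief-state MDP

solver = CompressedBeliefSolver(pomdp; sampler=sampler, updater=updater, compressor=compressor)
policy = solve(solver, pomdp)              # 4. solve the MDP
```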

DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.15000000000000002, 0.85])
DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])
DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.85, 0.15000000000000002])
DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.9697986577181208, 0.030201342281879207])
source

Policy Sampler

CompressedBeliefMDPs.PolicySamplerType
PolicySampler

Samples belief states by rolling out a Policy.

Fields

  • policy::Policy: The policy used for decision making.
  • updater::Updater: The updater used for updating beliefs.
  • n::Integer: The maximum number of simulated steps.
  • rng::AbstractRNG: The random number generator used for sampling.
  • verbose::Bool: Whether to use a progress bar while sampling.

Constructors

PolicySampler(pomdp::POMDP; policy::Policy=RandomPolicy(pomdp), 
 updater::Updater=DiscreteUpdater(pomdp), n::Integer=10, 
 rng::AbstractRNG=Random.GLOBAL_RNG)

Methods

(s::PolicySampler)(pomdp::POMDP)

Returns a vector of unique belief states.

Example

julia> pomdp = TigerPOMDP();
julia> sampler = PolicySampler(pomdp; n=3);
julia> sampler(pomdp)
2-element Vector{Any}:
 DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])
 DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.15000000000000002, 0.85])
source

ExplorationPolicy Sampler

CompressedBeliefMDPs.ExplorationPolicySamplerType
ExplorationPolicySampler

Samples belief states by rolling out an ExplorationPolicy. Essentially identical to PolicySampler.

Fields

  • explorer::ExplorationPolicy: The ExplorationPolicy used for decision making.
  • on_policy::Policy: The fallback Policy used for decision making when not exploring.
  • updater::Updater: The updater used for updating beliefs.
  • n::Integer: The maximum number of simulated steps.
  • rng::AbstractRNG: The random number generator used for sampling.
  • verbose::Bool: Whether to use a progress bar while sampling.

Constructors

ExplorationPolicySampler(pomdp::POMDP; rng::AbstractRNG=Random.GLOBAL_RNG,
 explorer::ExplorationPolicy=EpsGreedyPolicy(pomdp, 0.1; rng=rng), on_policy=RandomPolicy(pomdp),
 updater::Updater=DiscreteUpdater(pomdp), n::Integer=10)

Methods

(s::ExplorationPolicySampler)(pomdp::POMDP)

Returns a vector of unique belief states.

Example Usage

julia> pomdp = TigerPOMDP()
 julia> sampler = ExplorationPolicySampler(pomdp; n=30)
julia> sampler(pomdp)
 3-element Vector{Any}:
  DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])
  DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.85, 0.15000000000000002])
 DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.9697986577181208, 0.030201342281879207])
source
+ DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.9697986577181208, 0.030201342281879207])source diff --git a/dev/search_index.js b/dev/search_index.js index f5aa49c..a70aaf8 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"api/#API-Documentation","page":"API Documentation","title":"API Documentation","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"CurrentModule = CompressedBeliefMDPs","category":"page"},{"location":"api/#Contents","page":"API Documentation","title":"Contents","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"Pages = [\"api.md\"]","category":"page"},{"location":"api/#Index","page":"API Documentation","title":"Index","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"Pages = [\"api.md\"]","category":"page"},{"location":"api/#Types/Functors","page":"API Documentation","title":"Types/Functors","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"Sampler\nCompressor\nCompressedBeliefMDP\nCompressedBeliefPolicy\nCompressedBeliefSolver","category":"page"},{"location":"api/#CompressedBeliefMDPs.Sampler","page":"API Documentation","title":"CompressedBeliefMDPs.Sampler","text":"Abstract type for an object that defines how the belief should be sampled.\n\n\n\n\n\n","category":"type"},{"location":"api/#CompressedBeliefMDPs.Compressor","page":"API Documentation","title":"CompressedBeliefMDPs.Compressor","text":"Abstract type for an object that defines how the belief should be compressed.\n\n\n\n\n\n","category":"type"},{"location":"api/#CompressedBeliefMDPs.CompressedBeliefMDP","page":"API Documentation","title":"CompressedBeliefMDPs.CompressedBeliefMDP","text":"CompressedBeliefMDP{B, 
A}\n\nThe CompressedBeliefMDP struct is a generalization of the compressed belief-state MDP presented in Exponential Family PCA for Belief Compression in POMDPs.\n\nType Parameters\n\nB: The type of compressed belief states.\nA: The type of actions.\n\nFields\n\nbmdp::GenerativeBeliefMDP: The generative belief-state MDP.\ncompressor::Compressor: The compressor used to compress belief states.\nϕ::Bijection: A bijection representing the mapping from uncompressed belief states to compressed belief states. See notes. \n\nConstructors\n\nCompressedBeliefMDP(pomdp::POMDP, updater::Updater, compressor::Compressor)\nCompressedBeliefMDP(pomdp::POMDP, sampler::Sampler, updater::Updater, compressor::Compressor)\n\nConstructs a CompressedBeliefMDP using the specified POMDP, updater, and compressor.\n\nwarning: Warning\nThe 4-argument constructor is a quality-of-life constructor that calls fit! on the given compressor. \n\nExample Usage\n\npomdp = TigerPOMDP()\nupdater = DiscreteUpdater(pomdp)\ncompressor = PCACompressor(1)\nmdp = CompressedBeliefMDP(pomdp, updater, compressor)\n\nFor continuous POMDPs, see ParticleFilters.jl.\n\nNotes\n\nWhile compressions aren't usually injective, we cache beliefs and their compressions on a first-come, first-served basis, so we can effectively use a bijection without loss of generality.\n\n\n\n\n\n","category":"type"},{"location":"api/#CompressedBeliefMDPs.CompressedBeliefPolicy","page":"API Documentation","title":"CompressedBeliefMDPs.CompressedBeliefPolicy","text":"CompressedBeliefPolicy\n\nMaps a base policy for the compressed belief-state MDP to a policy for the true POMDP.\n\nFields\n\nm::CompressedBeliefMDP: The compressed belief-state MDP.\nbase_policy::Policy: The base policy used for decision-making in the compressed belief-state MDP.\n\nConstructors\n\nCompressedBeliefPolicy(m::CompressedBeliefMDP, base_policy::Policy)\n\nConstructs a CompressedBeliefPolicy using the specified compressed belief-state MDP and base policy.\n\nExample 
Usage\n\npolicy = solve(solver, pomdp)\ns = initialstate(pomdp)\na = action(policy, s) # returns the approximately optimal action for state s\nv = value(policy, s) # returns the approximately optimal value for state s\n\n\n\n\n\n","category":"type"},{"location":"api/#CompressedBeliefMDPs.CompressedBeliefSolver","page":"API Documentation","title":"CompressedBeliefMDPs.CompressedBeliefSolver","text":"CompressedBeliefSolver\n\nThe CompressedBeliefSolver struct represents a solver for compressed belief-state MDPs. It combines a compressed belief-state MDP with a base solver to approximate the value function.\n\nFields\n\nm::CompressedBeliefMDP: The compressed belief-state MDP.\nbase_solver::Solver: The base solver used to solve the compressed belief-state MDP.\n\nConstructors\n\nCompressedBeliefSolver(pomdp::POMDP, base_solver::Solver; updater::Updater=DiscreteUpdater(pomdp), sampler::Sampler=BeliefExpansionSampler(pomdp), compressor::Compressor=PCACompressor(1))\nCompressedBeliefSolver(pomdp::POMDP; updater::Updater=DiscreteUpdater(pomdp), sampler::Sampler=BeliefExpansionSampler(pomdp), compressor::Compressor=PCACompressor(1), interp::Union{Nothing, LocalFunctionApproximator}=nothing, k::Int=1, verbose::Bool=false, max_iterations::Int=1000, n_generative_samples::Int=10, belres::Float64=1e-3)\n\nConstructs a CompressedBeliefSolver using the specified POMDP, base solver, updater, sampler, and compressor. Alternatively, you can omit the base solver in which case a LocalApproximationValueIterationSolver(https://github.com/JuliaPOMDP/LocalApproximationValueIteration.jl) will be created instead. 
For example, different base solvers are needed if the POMDP state and action space are continuous.\n\nExample Usage\n\njulia> pomdp = TigerPOMDP();\njulia> solver = CompressedBeliefSolver(pomdp; verbose=true, max_iterations=10);\njulia> solve(solver, pomdp);\n[Iteration 1 ] residual: 8.51 | iteration runtime: 635.870 ms, ( 0.636 s total)\n[Iteration 2 ] residual: 3.63 | iteration runtime: 0.504 ms, ( 0.636 s total)\n[Iteration 3 ] residual: 10.1 | iteration runtime: 0.445 ms, ( 0.637 s total)\n[Iteration 4 ] residual: 15.2 | iteration runtime: 0.494 ms, ( 0.637 s total)\n[Iteration 5 ] residual: 6.72 | iteration runtime: 0.432 ms, ( 0.638 s total)\n[Iteration 6 ] residual: 7.38 | iteration runtime: 0.508 ms, ( 0.638 s total)\n[Iteration 7 ] residual: 6.03 | iteration runtime: 0.495 ms, ( 0.639 s total)\n[Iteration 8 ] residual: 5.73 | iteration runtime: 0.585 ms, ( 0.639 s total)\n[Iteration 9 ] residual: 4.02 | iteration runtime: 0.463 ms, ( 0.64 s total)\n[Iteration 10 ] residual: 7.28 | iteration runtime: 0.576 ms, ( 0.64 s total)\n\n\n\n\n\n","category":"type"},{"location":"api/#Functions","page":"API Documentation","title":"Functions","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"fit!\nmake_cache\nmake_numerical\ncompress_POMDP","category":"page"},{"location":"api/#CompressedBeliefMDPs.fit!","page":"API Documentation","title":"CompressedBeliefMDPs.fit!","text":"fit!(compressor::Compressor, beliefs)\n\nFit the compressor to beliefs.\n\n\n\n\n\n","category":"function"},{"location":"api/#CompressedBeliefMDPs.make_cache","page":"API Documentation","title":"CompressedBeliefMDPs.make_cache","text":"make_cache(B, B̃)\n\nHelper function that creates a cache that maps each unique belief from the set B to its corresponding compressed representation in B̃.\n\nArguments\n\nB::Vector{<:Any}: A vector of beliefs.\nB̃::Matrix{Float64}: A matrix where each row corresponds to the compressed representation of 
the beliefs in B.\n\nReturns\n\nDict{<:Any, Vector{Float64}}: A dictionary mapping each unique belief in B to its corresponding compressed representation in B̃.\n\nExample Usage\n\nB = [belief1, belief2, belief3]\nB̃ = [compressed_belief1; compressed_belief2; compressed_belief3]\nϕ = make_cache(B, B̃)\n\n\n\n\n\n","category":"function"},{"location":"api/#CompressedBeliefMDPs.make_numerical","page":"API Documentation","title":"CompressedBeliefMDPs.make_numerical","text":"make_numerical(B, pomdp)\n\nHelper function that converts a set of beliefs B into a numerical matrix representation suitable for processing by numerical algorithms/compressors.\n\nArguments\n\nB::Vector{<:Any}: A vector of beliefs.\npomdp::POMDP: The POMDP model associated with the beliefs.\n\nReturns\n\nMatrix{Float64}: A matrix where each row corresponds to a numerical representation of a belief in B.\n\nExample Usage\n\nB = [belief1, belief2, belief3]\nB_numerical = make_numerical(B, pomdp)\n\n\n\n\n\n","category":"function"},{"location":"api/#CompressedBeliefMDPs.compress_POMDP","page":"API Documentation","title":"CompressedBeliefMDPs.compress_POMDP","text":"compress_POMDP(pomdp, sampler, updater, compressor)\n\nCreates a compressed belief-state MDP by sampling, compressing, and caching beliefs from the given POMDP.\n\nArguments\n\npomdp::POMDP: The POMDP model to be compressed.\nsampler::Sampler: A sampler to generate a set of beliefs from the POMDP.\nupdater::Updater: An updater to initialize beliefs from states.\ncompressor::Compressor: A compressor to reduce the dimensionality of the beliefs.\n\nReturns\n\nCompressedBeliefMDP: The constructed compressed belief-state MDP.\nMatrix{Float64}: A matrix where each row corresponds to the compressed representation of the sampled beliefs.\n\nExample Usage\n\npomdp = TigerPOMDP()\nsampler = BeliefExpansionSampler(pomdp)\nupdater = DiscreteUpdater(pomdp)\ncompressor = PCACompressor(2)\nm, B̃ = compress_POMDP(pomdp, sampler, updater, 
compressor)\n\n\n\n\n\n","category":"function"},{"location":"compressors/#Compressors","page":"Compressors","title":"Compressors","text":"","category":"section"},{"location":"compressors/#Defining-a-Belief-Compressor","page":"Compressors","title":"Defining a Belief Compressor","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"In this section, we outline the requirements and guidelines for defining a belief Compressor.","category":"page"},{"location":"compressors/#Interface","page":"Compressors","title":"Interface","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"The Compressor interface is extremely minimal. It only supports two methods: fit! and the associated functor. For example, if you wanted to implement your own Compressor, you could write something like this:","category":"page"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"struct MyCompressor <: Compressor\n foo\n bar\nend\n\n# functor definition\nfunction (c::MyCompressor)(beliefs)\n # YOUR CODE HERE\n return compressed_beliefs\nend\n\nfunction fit!(c::MyCompressor, beliefs)\n # YOUR CODE HERE\nend","category":"page"},{"location":"compressors/#Implementation-Tips","page":"Compressors","title":"Implementation Tips","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"For robustness, both the functor and fit! should be able to handle AbstractVector and AbstractMatrix inputs. \nfit! is called only once after beliefs are sampled from the POMDP.\nCompressedBeliefSolver will attempt to convert each belief state (often of type DiscreteBelief) into an AbstractArray{Float64} using convert_s. As a convenience, CompressedBeliefMDP implements conversions for commonly used belief types; however, if the POMDP has a custom belief state, then it is the user's responsibility to implement the appropriate conversion. 
See the source code for help. ","category":"page"},{"location":"compressors/#Implemented-Compressors","page":"Compressors","title":"Implemented Compressors","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"CompressedBeliefMDPs currently provides wrappers for the following compression types:","category":"page"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"a principal component analysis (PCA) compressor,\na kernel PCA compressor,\na probabilistic PCA compressor,\na factor analysis compressor,\nan isomap compressor,\nan autoencoder compressor,\na variational auto-encoder (VAE) compressor","category":"page"},{"location":"compressors/#Principal-Component-Analysis-(PCA)","page":"Compressors","title":"Principal Component Analysis (PCA)","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"PCACompressor","category":"page"},{"location":"compressors/#CompressedBeliefMDPs.PCACompressor","page":"Compressors","title":"CompressedBeliefMDPs.PCACompressor","text":"Wrapper for MultivariateStats.PCA.\n\n\n\n\n\n","category":"function"},{"location":"compressors/#Kernel-PCA","page":"Compressors","title":"Kernel PCA","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"KernelPCACompressor","category":"page"},{"location":"compressors/#CompressedBeliefMDPs.KernelPCACompressor","page":"Compressors","title":"CompressedBeliefMDPs.KernelPCACompressor","text":"Wrapper for MultivariateStats.KernelPCA.\n\n\n\n\n\n","category":"function"},{"location":"compressors/#Probabilistic-PCA","page":"Compressors","title":"Probabilistic 
PCA","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"PPCACompressor","category":"page"},{"location":"compressors/#CompressedBeliefMDPs.PPCACompressor","page":"Compressors","title":"CompressedBeliefMDPs.PPCACompressor","text":"Wrapper for MultivariateStats.PPCA.\n\n\n\n\n\n","category":"function"},{"location":"compressors/#Factor-Analysis","page":"Compressors","title":"Factor Analysis","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"FactorAnalysisCompressor","category":"page"},{"location":"compressors/#CompressedBeliefMDPs.FactorAnalysisCompressor","page":"Compressors","title":"CompressedBeliefMDPs.FactorAnalysisCompressor","text":"Wrapper for MultivariateStats.FactorAnalysis\n\n\n\n\n\n","category":"function"},{"location":"compressors/#Isomap","page":"Compressors","title":"Isomap","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"IsomapCompressor","category":"page"},{"location":"compressors/#CompressedBeliefMDPs.IsomapCompressor","page":"Compressors","title":"CompressedBeliefMDPs.IsomapCompressor","text":"Wrapper for ManifoldLearning.Isomap.\n\n\n\n\n\n","category":"function"},{"location":"compressors/#Autoencoder","page":"Compressors","title":"Autoencoder","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"AutoencoderCompressor","category":"page"},{"location":"compressors/#CompressedBeliefMDPs.AutoencoderCompressor","page":"Compressors","title":"CompressedBeliefMDPs.AutoencoderCompressor","text":"Implements an autoencoder in Flux.\n\n\n\n\n\n","category":"type"},{"location":"compressors/#Variational-Auto-Encoder-(VAE)","page":"Compressors","title":"Variational Auto-Encoder 
(VAE)","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"VAECompressor","category":"page"},{"location":"compressors/#CompressedBeliefMDPs.VAECompressor","page":"Compressors","title":"CompressedBeliefMDPs.VAECompressor","text":"Implements a VAE in Flux.\n\n\n\n\n\n","category":"type"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"warning: Warning\nSome compression algorithms aren't optimized for large belief spaces. While they pass our unit tests, they may fail on large POMDPs or without seeding. For large POMDPs, users may want a custom Compressor.","category":"page"},{"location":"circular/#Circular-Maze","page":"Environments","title":"Circular Maze","text":"","category":"section"},{"location":"circular/#Description","page":"Environments","title":"Description","text":"","category":"section"},{"location":"circular/","page":"Environments","title":"Environments","text":"This environment is a generalization of the Circular Maze POMDP described in Finding Approximate POMDP solutions Through Belief Compression.[1] The world consists of n_corridors 1D circular corridors that each have corridor_length states. The robot spawns in a random corridor. It must determine which corridor it's in, navigate to the proper goal state, and finally declare that it has finished.","category":"page"},{"location":"circular/","page":"Environments","title":"Environments","text":"(Image: )","category":"page"},{"location":"circular/","page":"Environments","title":"Environments","text":"Figure from Finding Approximate POMDP solutions Through Belief Compression.","category":"page"},{"location":"circular/","page":"Environments","title":"Environments","text":"[1]: Roy doesn't actually name his toy environment. 
For the original environment details, see the \"PCA Performance\" subsection on page 8.","category":"page"},{"location":"circular/#Action-Space","page":"Environments","title":"Action Space","text":"","category":"section"},{"location":"circular/","page":"Environments","title":"Environments","text":"Transitions left and right are noisy and non-deterministic. Transition probabilities are from a discrete von Mises distribution with unit concentration and mean at the target state. ","category":"page"},{"location":"circular/","page":"Environments","title":"Environments","text":"Num Action Description\n1 CMAZE_LEFT Move left with von Mises noise.\n2 CMAZE_RIGHT Move right with von Mises noise.\n3 CMAZE_SENSE_CORRIDOR Observe the current corridor.\n4 CMAZE_DECLARE_GOAL Ends the episode. Receive r_findgoal if at the goal.","category":"page"},{"location":"circular/#State-Space","page":"Environments","title":"State Space","text":"","category":"section"},{"location":"circular/","page":"Environments","title":"Environments","text":"The (ordered) state space is an array of all CircularMazeStates and a terminalstate: [CircularMazeState(1, 1), ..., CircularMazeState(n_corridors, corridor_length), TerminalState()].","category":"page"},{"location":"circular/#Observation-Space","page":"Environments","title":"Observation Space","text":"","category":"section"},{"location":"circular/","page":"Environments","title":"Environments","text":"The observation space is the union of the state space and 1:n_corridors. If the robot picks CMAZE_SENSE_CORRIDOR, it observes the index of the current corridor. Otherwise, it observes its current state with von Mises noise.","category":"page"},{"location":"circular/#Rewards","page":"Environments","title":"Rewards","text":"","category":"section"},{"location":"circular/","page":"Environments","title":"Environments","text":"The goal is to navigate to the correct goal state for the given corridor and then to declare the goal once arrived. 
If the robot correctly declares the goal, it receives r_findgoal. It incurs an r_timestep_penalty for every timestep it has not reached the goal. By default, r_findgoal is 1 and r_timestep_penalty is 0. ","category":"page"},{"location":"circular/#Starting-State","page":"Environments","title":"Starting State","text":"","category":"section"},{"location":"circular/","page":"Environments","title":"Environments","text":"The initial state is sampled from repeated, discrete von Mises distributions, each with its concentration at the center of the hallway. ","category":"page"},{"location":"circular/","page":"Environments","title":"Environments","text":"(Image: )","category":"page"},{"location":"circular/#Episode-End","page":"Environments","title":"Episode End","text":"","category":"section"},{"location":"circular/","page":"Environments","title":"Environments","text":"The episode terminates once the robot declares the goal (CMAZE_DECLARE_GOAL), regardless of whether the robot is correct.","category":"page"},{"location":"circular/#Documentation","page":"Environments","title":"Documentation","text":"","category":"section"},{"location":"circular/","page":"Environments","title":"Environments","text":"CircularMaze","category":"page"},{"location":"circular/#CompressedBeliefMDPs.CircularMaze","page":"Environments","title":"CompressedBeliefMDPs.CircularMaze","text":"CircularMaze(n_corridors::Integer, corridor_length::Integer, discount::Float64, r_findgoal::Float64, r_timestep_penalty::Float64)\nCircularMaze(n_corridors::Integer, corridor_length::Integer; kwargs...)\nCircularMaze()\n\nA POMDP representing a circular maze environment.\n\nFields\n\nn_corridors::Integer: Number of corridors in the circular maze.\ncorridor_length::Integer: Length of each corridor.\nprobabilities::AbstractArray: Probability masses for creating von Mises distributions.\ncenter::Integer: The central position in the maze.\ndiscount::Float64: Discount factor for future rewards.\nr_findgoal::Float64: Reward for 
finding the goal.\nr_timestep_penalty::Float64: Penalty for each timestep taken.\nstates::AbstractArray: Array of all possible states in the maze.\ngoals::AbstractArray: Array of goal states in the maze.\n\nExample\n\nusing CompressedBeliefMDPs\n\nn_corridors = 8\ncorridor_length = 25\nmaze = CircularMaze(n_corridors, corridor_length)\n\n\n\n\n\n","category":"type"},{"location":"circular/","page":"Environments","title":"Environments","text":"CircularMazeState","category":"page"},{"location":"circular/#CompressedBeliefMDPs.CircularMazeState","page":"Environments","title":"CompressedBeliefMDPs.CircularMazeState","text":"CircularMazeState(corridor::Integer, x::Integer)\n\nThe CircularMazeState struct represents the state of an agent in a circular maze.\n\nFields\n\ncorridor::Integer: The corridor number. The value ranges from 1 to n_corridors.\nx::Integer: The position of the state within the corridor. The value ranges from 1 to the corridor_length.\n\n\n\n\n\n","category":"type"},{"location":"#CompressedBeliefMDPs.jl","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"","category":"section"},{"location":"#Introduction","page":"CompressedBeliefMDPs.jl","title":"Introduction","text":"","category":"section"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"Welcome to CompressedBeliefMDPs.jl! This package is part of the POMDPs.jl ecosystem and takes inspiration from Exponential Family PCA for Belief Compression in POMDPs. 
","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"This package provides a general framework for applying belief compression in large POMDPs with generic compression, sampling, and planning algorithms.","category":"page"},{"location":"#Installation","page":"CompressedBeliefMDPs.jl","title":"Installation","text":"","category":"section"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"You can install CompressedBeliefMDPs.jl using Julia's package manager. Open the Julia REPL (press ] to enter the package manager mode) and run the following command:","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"pkg> add CompressedBeliefMDPs","category":"page"},{"location":"#Quickstart","page":"CompressedBeliefMDPs.jl","title":"Quickstart","text":"","category":"section"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"Using belief compression is easy. 
Simply pick a Sampler, a Compressor, and a base Policy, and then use the standard POMDPs.jl interface.","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"using POMDPs, POMDPTools, POMDPModels\nusing CompressedBeliefMDPs\n\npomdp = BabyPOMDP()\ncompressor = PCACompressor(1)\nupdater = DiscreteUpdater(pomdp)\nsampler = BeliefExpansionSampler(pomdp)\nsolver = CompressedBeliefSolver(\n pomdp;\n compressor=compressor,\n sampler=sampler,\n updater=updater,\n verbose=true, \n max_iterations=100, \n n_generative_samples=50, \n k=2\n)\npolicy = solve(solver, pomdp)","category":"page"},{"location":"#Continuous-Example","page":"CompressedBeliefMDPs.jl","title":"Continuous Example","text":"","category":"section"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"This example demonstrates using CompressedBeliefMDP in a continuous setting with the LightDark1D POMDP. It combines particle filters for belief updating and Monte Carlo Tree Search (MCTS) as the solver. 
While compressing a 1D space is a trivial toy problem, this architecture can be easily scaled to larger POMDPs with continuous state and action spaces.","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"using POMDPs, POMDPModels, POMDPTools\nusing ParticleFilters\nusing MCTS\nusing CompressedBeliefMDPs\n\npomdp = LightDark1D()\npomdp.movement_cost = 1\nbase_solver = MCTSSolver(n_iterations=10, depth=50, exploration_constant=5.0)\nupdater = BootstrapFilter(pomdp, 100)\nsolver = CompressedBeliefSolver(\n pomdp,\n base_solver;\n updater=updater,\n sampler=PolicySampler(pomdp; updater=updater)\n)\npolicy = solve(solver, pomdp)\nrs = RolloutSimulator(max_steps=50)\nr = simulate(rs, pomdp, policy)","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"Note: We use MCTS here as a proof of concept that CompressedBeliefMDPs can handle continuous state and action spaces. In reality, belief compression has no effect on MCTS with double progressive widening. If you want to solve continuous POMDPs, we suggest implementing a custom solver or looking into Crux.jl.","category":"page"},{"location":"#Large-Example","page":"CompressedBeliefMDPs.jl","title":"Large Example","text":"","category":"section"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"In this example, we tackle a more realistic scenario with the TMaze POMDP, which has 123 states. To handle the larger state space efficiently, we employ a variational auto-encoder (VAE) to compress the belief simplex. 
By leveraging the VAE's ability to learn a compact representation of the belief state, we focus computational power on the relevant compressed belief states during each Bellman update.","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"using POMDPs, POMDPModels, POMDPTools\nusing CompressedBeliefMDPs\n\npomdp = TMaze(60, 0.9)\nsolver = CompressedBeliefSolver(\n pomdp;\n compressor=VAECompressor(123, 6; hidden_dim=10, verbose=true, epochs=2),\n sampler=PolicySampler(pomdp, n=500),\n verbose=true, \n max_iterations=1000, \n n_generative_samples=30,\n k=2\n)\npolicy = solve(solver, pomdp)\nrs = RolloutSimulator(max_steps=50)\nr = simulate(rs, pomdp, policy)","category":"page"},{"location":"#Concepts-and-Architecture","page":"CompressedBeliefMDPs.jl","title":"Concepts and Architecture","text":"","category":"section"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"CompressedBeliefMDPs.jl aims to implement a generalization of the belief compression algorithm for solving large POMDPs. 
The algorithm has four steps:","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"collect belief samples,\ncompress the samples,\ncreate the compressed belief-state MDP,\nsolve the MDP.","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"Each step is handled by Sampler, Compressor, CompressedBeliefMDP, and CompressedBeliefSolver respectively.","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"For more details, please see the rest of the documentation or the associated paper.","category":"page"},{"location":"samplers/#Samplers","page":"Samplers","title":"Samplers","text":"","category":"section"},{"location":"samplers/#Defining-a-Sampler","page":"Samplers","title":"Defining a Sampler","text":"","category":"section"},{"location":"samplers/","page":"Samplers","title":"Samplers","text":"In this section, we outline the requirements and guidelines for defining a belief Sampler.","category":"page"},{"location":"samplers/#Interface","page":"Samplers","title":"Interface","text":"","category":"section"},{"location":"samplers/","page":"Samplers","title":"Samplers","text":"The Sampler interface only has one method: the functor. 
For example, if you wanted to implement your own Sampler, you could write something like this:","category":"page"},{"location":"samplers/","page":"Samplers","title":"Samplers","text":"struct MySampler <: Sampler\n foo\n bar\nend\n\n# functor definition\nfunction (c::MySampler)(pomdp::POMDP)\n # YOUR CODE HERE\n return sampled_beliefs\nend","category":"page"},{"location":"samplers/#Implemented-Sampler","page":"Samplers","title":"Implemented Sampler","text":"","category":"section"},{"location":"samplers/","page":"Samplers","title":"Samplers","text":"CompressedBeliefMDPs provides the following generic belief samplers:","category":"page"},{"location":"samplers/","page":"Samplers","title":"Samplers","text":"an exploratory belief expansion sampler\na Policy rollout sampler\nan ExplorationPolicy rollout sampler","category":"page"},{"location":"samplers/#Exploratory-Belief-Expansion","page":"Samplers","title":"Exploratory Belief Expansion","text":"","category":"section"},{"location":"samplers/","page":"Samplers","title":"Samplers","text":"BeliefExpansionSampler","category":"page"},{"location":"samplers/#CompressedBeliefMDPs.BeliefExpansionSampler","page":"Samplers","title":"CompressedBeliefMDPs.BeliefExpansionSampler","text":"BeliefExpansionSampler\n\nFast extension of exploratory belief expansion (Algorithm 21.13 in Algorithms for Decision Making) that uses k-d trees.\n\nFields\n\nupdater::Updater: The updater used to update beliefs.\nmetric::NearestNeighbors.MinkowskiMetric: The metric used to measure distances between beliefs. It must be a Minkowski metric.\nn::Integer: The number of belief expansions to perform.\n\nConstructors\n\nBeliefExpansionSampler(pomdp::POMDP; updater::Updater=DiscreteUpdater(pomdp),\nmetric::NearestNeighbors.MinkowskiMetric=Euclidean(), n::Integer=3)\n\nMethods\n\n(s::BeliefExpansionSampler)(pomdp::POMDP)\n\nCreates an initial belief and performs exploratory belief expansion. Returns the unique belief states. 
Only works for POMDPs with discrete state, action, and observation spaces.\n\nExample Usage\n\njulia> pomdp = TigerPOMDP();\njulia> sampler = BeliefExpansionSampler(pomdp; n=2);\njulia> beliefs = sampler(pomdp)\nSet{DiscreteBelief{TigerPOMDP, Bool}} with 4 elements:\n  DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.15000000000000002, 0.85])\n  DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])\n  DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.85, 0.15000000000000002])\n  DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.9697986577181208, 0.030201342281879207])\n\n\n\n\n\n","category":"type"},{"location":"samplers/#Policy-Sampler","page":"Samplers","title":"Policy Sampler","text":"","category":"section"},{"location":"samplers/","page":"Samplers","title":"Samplers","text":"PolicySampler","category":"page"},{"location":"samplers/#CompressedBeliefMDPs.PolicySampler","page":"Samplers","title":"CompressedBeliefMDPs.PolicySampler","text":"PolicySampler\n\nSamples belief states by rolling out a Policy.\n\nFields\n\npolicy::Policy: The policy used for decision making.\nupdater::Updater: The updater used for updating beliefs.\nn::Integer: The maximum number of simulated steps.\nrng::AbstractRNG: The random number generator used for sampling.\nverbose::Bool: Whether to use a progress bar while sampling.\n\nConstructors\n\nPolicySampler(pomdp::POMDP; policy::Policy=RandomPolicy(pomdp), \nupdater::Updater=DiscreteUpdater(pomdp), n::Integer=10, \nrng::AbstractRNG=Random.GLOBAL_RNG)\n\nMethods\n\n(s::PolicySampler)(pomdp::POMDP)\n\nReturns a vector of unique belief states.\n\nExample\n\njulia> pomdp = TigerPOMDP();\njulia> sampler = PolicySampler(pomdp; n=3);\njulia> sampler(pomdp)\n2-element Vector{Any}:\nDiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 
0.5])\nDiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.15000000000000002, 0.85])\n\n\n\n\n\n","category":"type"},{"location":"samplers/#ExplorationPolicy-Sampler","page":"Samplers","title":"ExplorationPolicy Sampler","text":"","category":"section"},{"location":"samplers/","page":"Samplers","title":"Samplers","text":"ExplorationPolicySampler","category":"page"},{"location":"samplers/#CompressedBeliefMDPs.ExplorationPolicySampler","page":"Samplers","title":"CompressedBeliefMDPs.ExplorationPolicySampler","text":"ExplorationPolicySampler\n\nSamples belief states by rolling out an ExplorationPolicy. Essentially identical to PolicySampler.\n\nFields\n\nexplorer::ExplorationPolicy: The ExplorationPolicy used for decision making.\non_policy::Policy: The fallback Policy used for decision making when not exploring.\nupdater::Updater: The updater used for updating beliefs.\nn::Integer: The maximum number of simulated steps.\nrng::AbstractRNG: The random number generator used for sampling.\nverbose::Bool: Whether to use a progress bar while sampling.\n\nConstructors\n\nExplorationPolicySampler(pomdp::POMDP; rng::AbstractRNG=Random.GLOBAL_RNG,\nexplorer::ExplorationPolicy=EpsGreedyPolicy(pomdp, 0.1; rng=rng), on_policy=RandomPolicy(pomdp),\nupdater::Updater=DiscreteUpdater(pomdp), n::Integer=10)\n\nMethods\n\n(s::ExplorationPolicySampler)(pomdp::POMDP)\n\nReturns a vector of unique belief states.\n\nExample Usage\n\njulia> pomdp = TigerPOMDP()\njulia> sampler = ExplorationPolicySampler(pomdp; n=30)\njulia> sampler(pomdp)\n3-element Vector{Any}:\n DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])\n DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.85, 0.15000000000000002])\n DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.9697986577181208, 0.030201342281879207])\n\n\n\n\n\n","category":"type"}] 
+[{"location":"api/#API-Documentation","page":"API Documentation","title":"API Documentation","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"CurrentModule = CompressedBeliefMDPs","category":"page"},{"location":"api/#Contents","page":"API Documentation","title":"Contents","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"Pages = [\"api.md\"]","category":"page"},{"location":"api/#Index","page":"API Documentation","title":"Index","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"Pages = [\"api.md\"]","category":"page"},{"location":"api/#Types/Functors","page":"API Documentation","title":"Types/Functors","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"Sampler\nCompressor\nCompressedBeliefMDP\nCompressedBeliefPolicy\nCompressedBeliefSolver","category":"page"},{"location":"api/#CompressedBeliefMDPs.Sampler","page":"API Documentation","title":"CompressedBeliefMDPs.Sampler","text":"Abstract type for an object that defines how the belief should be sampled.\n\n\n\n\n\n","category":"type"},{"location":"api/#CompressedBeliefMDPs.Compressor","page":"API Documentation","title":"CompressedBeliefMDPs.Compressor","text":"Abstract type for an object that defines how the belief should be compressed.\n\n\n\n\n\n","category":"type"},{"location":"api/#CompressedBeliefMDPs.CompressedBeliefMDP","page":"API Documentation","title":"CompressedBeliefMDPs.CompressedBeliefMDP","text":"CompressedBeliefMDP{B, A}\n\nThe CompressedBeliefMDP struct is a generalization of the compressed belief-state MDP presented in Exponential Family PCA for Belief Compression in POMDPs.\n\nType Parameters\n\nB: The type of compressed belief states.\nA: The type of actions.\n\nFields\n\nbmdp::GenerativeBeliefMDP: The generative belief-state 
MDP.\ncompressor::Compressor: The compressor used to compress belief states.\nϕ::Bijection: A bijection representing the mapping from uncompressed belief states to compressed belief states. See notes. \n\nConstructors\n\nCompressedBeliefMDP(pomdp::POMDP, updater::Updater, compressor::Compressor)\nCompressedBeliefMDP(pomdp::POMDP, sampler::Sampler, updater::Updater, compressor::Compressor)\n\nConstructs a CompressedBeliefMDP using the specified POMDP, updater, and compressor.\n\nwarning: Warning\nThe 4-argument constructor is a quality-of-life constructor that calls fit! on the given compressor. \n\nExample Usage\n\npomdp = TigerPOMDP()\nupdater = DiscreteUpdater(pomdp)\ncompressor = PCACompressor(1)\nmdp = CompressedBeliefMDP(pomdp, updater, compressor)\n\nFor continuous POMDPs, see ParticleFilters.jl.\n\nNotes\n\nWhile compressions aren't usually injective, we cache beliefs and their compressions on a first-come, first-served basis, so we can effectively use a bijection without loss of generality.\n\n\n\n\n\n","category":"type"},{"location":"api/#CompressedBeliefMDPs.CompressedBeliefPolicy","page":"API Documentation","title":"CompressedBeliefMDPs.CompressedBeliefPolicy","text":"CompressedBeliefPolicy\n\nMaps a base policy for the compressed belief-state MDP to a policy for the true POMDP.\n\nFields\n\nm::CompressedBeliefMDP: The compressed belief-state MDP.\nbase_policy::Policy: The base policy used for decision-making in the compressed belief-state MDP.\n\nConstructors\n\nCompressedBeliefPolicy(m::CompressedBeliefMDP, base_policy::Policy)\n\nConstructs a CompressedBeliefPolicy using the specified compressed belief-state MDP and base policy.\n\nExample Usage\n\npolicy = solve(solver, pomdp)\ns = initialstate(pomdp)\na = action(policy, s) # returns the approximately optimal action for state s\nv = value(policy, s) # returns the approximately optimal value for state 
s\n\n\n\n\n\n","category":"type"},{"location":"api/#CompressedBeliefMDPs.CompressedBeliefSolver","page":"API Documentation","title":"CompressedBeliefMDPs.CompressedBeliefSolver","text":"CompressedBeliefSolver\n\nThe CompressedBeliefSolver struct represents a solver for compressed belief-state MDPs. It combines a compressed belief-state MDP with a base solver to approximate the value function.\n\nFields\n\nm::CompressedBeliefMDP: The compressed belief-state MDP.\nbase_solver::Solver: The base solver used to solve the compressed belief-state MDP.\n\nConstructors\n\nCompressedBeliefSolver(pomdp::POMDP, base_solver::Solver; updater::Updater=DiscreteUpdater(pomdp), sampler::Sampler=BeliefExpansionSampler(pomdp), compressor::Compressor=PCACompressor(1))\nCompressedBeliefSolver(pomdp::POMDP; updater::Updater=DiscreteUpdater(pomdp), sampler::Sampler=BeliefExpansionSampler(pomdp), compressor::Compressor=PCACompressor(1), interp::Union{Nothing, LocalFunctionApproximator}=nothing, k::Int=1, verbose::Bool=false, max_iterations::Int=1000, n_generative_samples::Int=10, belres::Float64=1e-3)\n\nConstructs a CompressedBeliefSolver using the specified POMDP, base solver, updater, sampler, and compressor. Alternatively, you can omit the base solver, in which case a LocalApproximationValueIterationSolver (https://github.com/JuliaPOMDP/LocalApproximationValueIteration.jl) will be created instead. 
For example, different base solvers are needed if the POMDP state and action space are continuous.\n\nExample Usage\n\njulia> pomdp = TigerPOMDP();\njulia> solver = CompressedBeliefSolver(pomdp; verbose=true, max_iterations=10);\njulia> solve(solver, pomdp);\n[Iteration 1 ] residual: 8.51 | iteration runtime: 635.870 ms, ( 0.636 s total)\n[Iteration 2 ] residual: 3.63 | iteration runtime: 0.504 ms, ( 0.636 s total)\n[Iteration 3 ] residual: 10.1 | iteration runtime: 0.445 ms, ( 0.637 s total)\n[Iteration 4 ] residual: 15.2 | iteration runtime: 0.494 ms, ( 0.637 s total)\n[Iteration 5 ] residual: 6.72 | iteration runtime: 0.432 ms, ( 0.638 s total)\n[Iteration 6 ] residual: 7.38 | iteration runtime: 0.508 ms, ( 0.638 s total)\n[Iteration 7 ] residual: 6.03 | iteration runtime: 0.495 ms, ( 0.639 s total)\n[Iteration 8 ] residual: 5.73 | iteration runtime: 0.585 ms, ( 0.639 s total)\n[Iteration 9 ] residual: 4.02 | iteration runtime: 0.463 ms, ( 0.64 s total)\n[Iteration 10 ] residual: 7.28 | iteration runtime: 0.576 ms, ( 0.64 s total)\n\n\n\n\n\n","category":"type"},{"location":"api/#Functions","page":"API Documentation","title":"Functions","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"fit!\nmake_cache\nmake_numerical\ncompress_POMDP","category":"page"},{"location":"api/#CompressedBeliefMDPs.fit!","page":"API Documentation","title":"CompressedBeliefMDPs.fit!","text":"fit!(compressor::Compressor, beliefs)\n\nFit the compressor to beliefs.\n\n\n\n\n\n","category":"function"},{"location":"api/#CompressedBeliefMDPs.make_cache","page":"API Documentation","title":"CompressedBeliefMDPs.make_cache","text":"make_cache(B, B̃)\n\nHelper function that creates a cache that maps each unique belief from the set B to its corresponding compressed representation in B̃.\n\nArguments\n\nB::Vector{<:Any}: A vector of beliefs.\nB̃::Matrix{Float64}: A matrix where each row corresponds to the compressed representation of 
the beliefs in B.\n\nReturns\n\nDict{<:Any, Vector{Float64}}: A dictionary mapping each unique belief in B to its corresponding compressed representation in B̃.\n\nExample Usage\n\nB = [belief1, belief2, belief3]\nB̃ = [compressed_belief1; compressed_belief2; compressed_belief3]\nϕ = make_cache(B, B̃)\n\n\n\n\n\n","category":"function"},{"location":"api/#CompressedBeliefMDPs.make_numerical","page":"API Documentation","title":"CompressedBeliefMDPs.make_numerical","text":"make_numerical(B, pomdp)\n\nHelper function that converts a set of beliefs B into a numerical matrix representation suitable for processing by numerical algorithms/compressors.\n\nArguments\n\nB::Vector{<:Any}: A vector of beliefs.\npomdp::POMDP: The POMDP model associated with the beliefs.\n\nReturns\n\nMatrix{Float64}: A matrix where each row corresponds to a numerical representation of a belief in B.\n\nExample Usage\n\nB = [belief1, belief2, belief3]\nB_numerical = make_numerical(B, pomdp)\n\n\n\n\n\n","category":"function"},{"location":"api/#CompressedBeliefMDPs.compress_POMDP","page":"API Documentation","title":"CompressedBeliefMDPs.compress_POMDP","text":"compress_POMDP(pomdp, sampler, updater, compressor)\n\nCreates a compressed belief-state MDP by sampling, compressing, and caching beliefs from the given POMDP.\n\nArguments\n\npomdp::POMDP: The POMDP model to be compressed.\nsampler::Sampler: A sampler to generate a set of beliefs from the POMDP.\nupdater::Updater: An updater to initialize beliefs from states.\ncompressor::Compressor: A compressor to reduce the dimensionality of the beliefs.\n\nReturns\n\nCompressedBeliefMDP: The constructed compressed belief-state MDP.\nMatrix{Float64}: A matrix where each row corresponds to the compressed representation of the sampled beliefs.\n\nExample Usage\n\npomdp = TigerPOMDP()\nsampler = BeliefExpansionSampler(pomdp)\nupdater = DiscreteUpdater(pomdp)\ncompressor = PCACompressor(2)\nm, B̃ = compress_POMDP(pomdp, sampler, updater, 
compressor)\n\n\n\n\n\n","category":"function"},{"location":"compressors/#Compressors","page":"Compressors","title":"Compressors","text":"","category":"section"},{"location":"compressors/#Defining-a-Belief-Compressor","page":"Compressors","title":"Defining a Belief Compressor","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"In this section, we outline the requirements and guidelines for defining a belief Compressor.","category":"page"},{"location":"compressors/#Interface","page":"Compressors","title":"Interface","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"The Compressor interface is extremely minimal. It only supports two methods: fit! and the associated functor. For example, if you wanted to implement your own Compressor, you could write something like this","category":"page"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"struct MyCompressor <: Compressor\n foo\n bar\nend\n\n# functor definition\nfunction (c::MyCompressor)(beliefs)\n # YOUR CODE HERE\n return compressed_beliefs\nend\n\nfunction fit!(c::MyCompressor, beliefs)\n # YOUR CODE HERE\nend","category":"page"},{"location":"compressors/#Implementation-Tips","page":"Compressors","title":"Implementation Tips","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"For robustness, both the functor and fit! should be able to handle AbstractVector and AbstractMatrix inputs. \nfit! is called only once after beliefs are sampled from the POMDP.\nCompressedBeliefSolver will attempt to convert each belief state (often of type DiscreteBelief) into an AbstractArray{Float64} using convert_s. As a convenience, CompressedBeliefMDP implements conversions for commonly used belief types; however, if the POMDP has a custom belief state, then it is the users' responsibility to implement the appropriate conversion. 
See the source code for help. ","category":"page"},{"location":"compressors/#Implemented-Compressors","page":"Compressors","title":"Implemented Compressors","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"CompressedBeliefMDPs currently provides wrappers for the following compression types:","category":"page"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"a principal component analysis (PCA) compressor,\na kernel PCA compressor,\na probabilistic PCA compressor,\na factor analysis compressor,\nan isomap compressor,\nan autoencoder compressor,\na variational auto-encoder (VAE) compressor","category":"page"},{"location":"compressors/#Principal-Component-Analysis-(PCA)","page":"Compressors","title":"Principal Component Analysis (PCA)","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"PCACompressor","category":"page"},{"location":"compressors/#CompressedBeliefMDPs.PCACompressor","page":"Compressors","title":"CompressedBeliefMDPs.PCACompressor","text":"Wrapper for MultivariateStats.PCA.\n\n\n\n\n\n","category":"function"},{"location":"compressors/#Kernel-PCA","page":"Compressors","title":"Kernel PCA","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"KernelPCACompressor","category":"page"},{"location":"compressors/#CompressedBeliefMDPs.KernelPCACompressor","page":"Compressors","title":"CompressedBeliefMDPs.KernelPCACompressor","text":"Wrapper for MultivariateStats.KernelPCA.\n\n\n\n\n\n","category":"function"},{"location":"compressors/#Probabilistic-PCA","page":"Compressors","title":"Probabilistic 
PCA","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"PPCACompressor","category":"page"},{"location":"compressors/#CompressedBeliefMDPs.PPCACompressor","page":"Compressors","title":"CompressedBeliefMDPs.PPCACompressor","text":"Wrapper for MultivariateStats.PPCA.\n\n\n\n\n\n","category":"function"},{"location":"compressors/#Factor-Analysis","page":"Compressors","title":"Factor Analysis","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"FactorAnalysisCompressor","category":"page"},{"location":"compressors/#CompressedBeliefMDPs.FactorAnalysisCompressor","page":"Compressors","title":"CompressedBeliefMDPs.FactorAnalysisCompressor","text":"Wrapper for MultivariateStats.FactorAnalysis\n\n\n\n\n\n","category":"function"},{"location":"compressors/#Isomap","page":"Compressors","title":"Isomap","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"IsomapCompressor","category":"page"},{"location":"compressors/#CompressedBeliefMDPs.IsomapCompressor","page":"Compressors","title":"CompressedBeliefMDPs.IsomapCompressor","text":"Wrapper for ManifoldLearning.Isomap.\n\n\n\n\n\n","category":"function"},{"location":"compressors/#Autoencoder","page":"Compressors","title":"Autoencoder","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"AutoencoderCompressor","category":"page"},{"location":"compressors/#CompressedBeliefMDPs.AutoencoderCompressor","page":"Compressors","title":"CompressedBeliefMDPs.AutoencoderCompressor","text":"Implements an autoencoder in Flux.\n\n\n\n\n\n","category":"type"},{"location":"compressors/#Variational-Auto-Encoder-(VAE)","page":"Compressors","title":"Variational Auto-Encoder 
(VAE)","text":"","category":"section"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"VAECompressor","category":"page"},{"location":"compressors/#CompressedBeliefMDPs.VAECompressor","page":"Compressors","title":"CompressedBeliefMDPs.VAECompressor","text":"Implements a VAE in Flux.\n\n\n\n\n\n","category":"type"},{"location":"compressors/","page":"Compressors","title":"Compressors","text":"warning: Warning\nSome compression algorithms aren't optimized for large belief spaces. While they pass our unit tests, they may fail on large POMDPs or without seeding. For large POMDPs, users may want a custom Compressor.","category":"page"},{"location":"circular/#Circular-Maze","page":"Environments","title":"Circular Maze","text":"","category":"section"},{"location":"circular/#Description","page":"Environments","title":"Description","text":"","category":"section"},{"location":"circular/","page":"Environments","title":"Environments","text":"This environment is a generalization of the Circular Maze POMDP described in Finding Approximate POMDP solutions Through Belief Compression.[1] The world consists of n_corridors 1D circular corridors that each have corridor_length states. The robot spawns in a random corridor. It must determine which corridor it is in, navigate to the proper goal state, and finally declare that it has finished.","category":"page"},{"location":"circular/","page":"Environments","title":"Environments","text":"(Image: )","category":"page"},{"location":"circular/","page":"Environments","title":"Environments","text":"Figure from Finding Approximate POMDP solutions Through Belief Compression.","category":"page"},{"location":"circular/","page":"Environments","title":"Environments","text":"[1]: Roy doesn't actually name his toy environment. 
For the original environment details, see the \"PCA Performance\" subsection on page 8.","category":"page"},{"location":"circular/#Action-Space","page":"Environments","title":"Action Space","text":"","category":"section"},{"location":"circular/","page":"Environments","title":"Environments","text":"Transitions left and right are noisy and non-deterministic. Transition probabilities are from a discrete von Mises distribution with unit concentration and mean at the target state. ","category":"page"},{"location":"circular/","page":"Environments","title":"Environments","text":"Num Action Description\n1 CMAZE_LEFT Move left with von Mises noise.\n2 CMAZE_RIGHT Move right with von Mises noise.\n3 CMAZE_SENSE_CORRIDOR Observe the current corridor.\n4 CMAZE_DECLARE_GOAL Ends the episode. Receive r_findgoal if at the goal.","category":"page"},{"location":"circular/#State-Space","page":"Environments","title":"State Space","text":"","category":"section"},{"location":"circular/","page":"Environments","title":"Environments","text":"The (ordered) state space is an array of all CircularMazeStates and a terminalstate: [CircularMaze(1, 1), ..., CircularMaze(n_corridors, corridor_length), TerminalState()].","category":"page"},{"location":"circular/#Observation-Space","page":"Environments","title":"Observation Space","text":"","category":"section"},{"location":"circular/","page":"Environments","title":"Environments","text":"The observation space is the union of the state space and 1:n_corridors. If the robot picks CMAZE_SENSE_CORRIDOR, they observe the index of the current corridor. Otherwise, they observe their current state with von Mises noise.","category":"page"},{"location":"circular/#Rewards","page":"Environments","title":"Rewards","text":"","category":"section"},{"location":"circular/","page":"Environments","title":"Environments","text":"The goal is to navigate to the correct goal state for the given corridor and then to declare the goal once arrived. 
If the robot correctly declares the goal, it receives r_findgoal. It incurs r_timestep_penalty for every timestep it has not yet reached the goal. By default r_findgoal is 1 and r_timestep_penalty is 0. ","category":"page"},{"location":"circular/#Starting-State","page":"Environments","title":"Starting State","text":"","category":"section"},{"location":"circular/","page":"Environments","title":"Environments","text":"The initial state is sampled from repeated, discrete von Mises distributions, each with its mean at the center of a corridor. ","category":"page"},{"location":"circular/","page":"Environments","title":"Environments","text":"(Image: )","category":"page"},{"location":"circular/#Episode-End","page":"Environments","title":"Episode End","text":"","category":"section"},{"location":"circular/","page":"Environments","title":"Environments","text":"The episode terminates once the robot declares the goal (CMAZE_DECLARE_GOAL), regardless of whether the robot is correct.","category":"page"},{"location":"circular/#Documentation","page":"Environments","title":"Documentation","text":"","category":"section"},{"location":"circular/","page":"Environments","title":"Environments","text":"CircularMaze","category":"page"},{"location":"circular/#CompressedBeliefMDPs.CircularMaze","page":"Environments","title":"CompressedBeliefMDPs.CircularMaze","text":"CircularMaze(n_corridors::Integer, corridor_length::Integer, discount::Float64, r_findgoal::Float64, r_timestep_penalty::Float64)\nCircularMaze(n_corridors::Integer, corridor_length::Integer; kwargs...)\nCircularMaze()\n\nA POMDP representing a circular maze environment.\n\nFields\n\nn_corridors::Integer: Number of corridors in the circular maze.\ncorridor_length::Integer: Length of each corridor.\nprobabilities::AbstractArray: Probability masses for creating von Mises distributions.\ncenter::Integer: The central position in the maze.\ndiscount::Float64: Discount factor for future rewards.\nr_findgoal::Float64: Reward for 
finding the goal.\nr_timestep_penalty::Float64: Penalty for each timestep taken.\nstates::AbstractArray: Array of all possible states in the maze.\ngoals::AbstractArray: Array of goal states in the maze.\n\nExample\n\nusing CompressedBeliefMDPs\n\nn_corridors = 8\ncorridor_length = 25\nmaze = CircularMaze(n_corridors, corridor_length)\n\n\n\n\n\n","category":"type"},{"location":"circular/","page":"Environments","title":"Environments","text":"CircularMazeState","category":"page"},{"location":"circular/#CompressedBeliefMDPs.CircularMazeState","page":"Environments","title":"CompressedBeliefMDPs.CircularMazeState","text":"CircularMazeState(corridor::Integer, x::Integer)\n\nThe CircularMazeState struct represents the state of an agent in a circular maze.\n\nFields\n\ncorridor::Integer: The corridor number. The value ranges from 1 to n_corridors.\nx::Integer: The position of the state within the corridor. The value ranges from 1 to the corridor_length.\n\n\n\n\n\n","category":"type"},{"location":"#CompressedBeliefMDPs.jl","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"","category":"section"},{"location":"#Introduction","page":"CompressedBeliefMDPs.jl","title":"Introduction","text":"","category":"section"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"Welcome to CompressedBeliefMDPs.jl! This package is part of the POMDPs.jl ecosystem and takes inspiration from Exponential Family PCA for Belief Compression in POMDPs. 
","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"This package provides a general framework for applying belief compression in large POMDPs with generic compression, sampling, and planning algorithms.","category":"page"},{"location":"#Installation","page":"CompressedBeliefMDPs.jl","title":"Installation","text":"","category":"section"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"You can install CompressedBeliefMDPs.jl using Julia's package manager. Open the Julia REPL (press ] to enter the package manager mode) and run the following command:","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"pkg> add CompressedBeliefMDPs","category":"page"},{"location":"#Quickstart","page":"CompressedBeliefMDPs.jl","title":"Quickstart","text":"","category":"section"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"Using belief compression is easy. 
Simply pick a Sampler, Compressor, and a base Policy and then use the standard POMDPs.jl interface.","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"using POMDPs, POMDPTools, POMDPModels\nusing CompressedBeliefMDPs\n\npomdp = BabyPOMDP()\ncompressor = PCACompressor(1)\nupdater = DiscreteUpdater(pomdp)\nsampler = BeliefExpansionSampler(pomdp)\nsolver = CompressedBeliefSolver(\n pomdp;\n compressor=compressor,\n sampler=sampler,\n updater=updater,\n verbose=true, \n max_iterations=100, \n n_generative_samples=50, \n k=2\n)\npolicy = solve(solver, pomdp)","category":"page"},{"location":"#Continuous-Example","page":"CompressedBeliefMDPs.jl","title":"Continuous Example","text":"","category":"section"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"This example demonstrates using CompressedBeliefMDP in a continuous setting with the LightDark1D POMDP. It combines particle filters for belief updating and Monte Carlo Tree Search (MCTS) as the solver. 
While compressing a 1D space is a trivial toy problem, this architecture can be easily scaled to larger POMDPs with continuous state and action spaces.","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"using POMDPs, POMDPModels, POMDPTools\nusing ParticleFilters\nusing MCTS\nusing CompressedBeliefMDPs\n\npomdp = LightDark1D()\npomdp.movement_cost = 1\nbase_solver = MCTSSolver(n_iterations=10, depth=50, exploration_constant=5.0)\nupdater = BootstrapFilter(pomdp, 100)\nsolver = CompressedBeliefSolver(\n pomdp,\n base_solver;\n updater=updater,\n sampler=PolicySampler(pomdp; updater=updater)\n)\npolicy = solve(solver, pomdp)\nrs = RolloutSimulator(max_steps=50)\nr = simulate(rs, pomdp, policy)","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"Note: We use MCTS here as a proof of concept that CompressedBeliefMDPs can handle continuous state and action spaces. In reality, belief compression has no effect on MCTS with double progressive widening. If you want to solve continuous POMDPs, we suggest implementing a custom solver or looking into Crux.jl.","category":"page"},{"location":"#Large-Example","page":"CompressedBeliefMDPs.jl","title":"Large Example","text":"","category":"section"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"In this example, we tackle a more realistic scenario with the TMaze POMDP, which has 123 states. To handle the larger state space efficiently, we employ a variational auto-encoder (VAE) to compress the belief simplex. 
By leveraging the VAE's ability to learn a compact representation of the belief state, we focus computational power on the relevant compressed belief states during each Bellman update.","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"using POMDPs, POMDPModels, POMDPTools\nusing CompressedBeliefMDPs\n\npomdp = TMaze(60, 0.9)\nsolver = CompressedBeliefSolver(\n pomdp;\n compressor=VAECompressor(123, 6; hidden_dim=10, verbose=true, epochs=2),\n sampler=PolicySampler(pomdp, n=500),\n verbose=true, \n max_iterations=1000, \n n_generative_samples=30,\n k=2\n)\npolicy = solve(solver, pomdp)\nrs = RolloutSimulator(max_steps=50)\nr = simulate(rs, pomdp, policy)","category":"page"},{"location":"#Concepts-and-Architecture","page":"CompressedBeliefMDPs.jl","title":"Concepts and Architecture","text":"","category":"section"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"CompressedBeliefMDPs.jl aims to implement a generalization of the belief compression algorithm for solving large POMDPs. 
The algorithm has four steps:","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"collect belief samples,\ncompress the samples,\ncreate the compressed belief-state MDP,\nsolve the MDP.","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"Each step is handled by Sampler, Compressor, CompressedBeliefMDP, and CompressedBeliefSolver respectively.","category":"page"},{"location":"","page":"CompressedBeliefMDPs.jl","title":"CompressedBeliefMDPs.jl","text":"For more details, please see the rest of the documentation or the associated paper.","category":"page"},{"location":"samplers/#Samplers","page":"Samplers","title":"Samplers","text":"","category":"section"},{"location":"samplers/#Defining-a-Sampler","page":"Samplers","title":"Defining a Sampler","text":"","category":"section"},{"location":"samplers/","page":"Samplers","title":"Samplers","text":"In this section, we outline the requirements and guidelines for defining a belief Sampler.","category":"page"},{"location":"samplers/#Interface","page":"Samplers","title":"Interface","text":"","category":"section"},{"location":"samplers/","page":"Samplers","title":"Samplers","text":"The Sampler interface only has one method: the functor. 
For example, if you wanted to implement your own Sampler, you could write something like this","category":"page"},{"location":"samplers/","page":"Samplers","title":"Samplers","text":"struct MySampler <: Sampler\n foo\n bar\nend\n\n# functor definition\nfunction (c::MySampler)(pomdp::POMDP)\n # YOUR CODE HERE\n return sampled_beliefs\nend","category":"page"},{"location":"samplers/#Implemented-Sampler","page":"Samplers","title":"Implemented Sampler","text":"","category":"section"},{"location":"samplers/","page":"Samplers","title":"Samplers","text":"CompressedBeliefMDPs provides the following generic belief samplers:","category":"page"},{"location":"samplers/","page":"Samplers","title":"Samplers","text":"an exploratory belief expansion sampler\na Policy rollout sampler\nan ExplorationPolicy rollout sampler","category":"page"},{"location":"samplers/#Exploratory-Belief-Expansion","page":"Samplers","title":"Exploratory Belief Expansion","text":"","category":"section"},{"location":"samplers/","page":"Samplers","title":"Samplers","text":"BeliefExpansionSampler","category":"page"},{"location":"samplers/#CompressedBeliefMDPs.BeliefExpansionSampler","page":"Samplers","title":"CompressedBeliefMDPs.BeliefExpansionSampler","text":"BeliefExpansionSampler\n\nFast extension of exploratory belief expansion (Algorithm 21.13 in Algorithms for Decision Making) that uses k-d trees.\n\nFields\n\nupdater::Updater: The updater used to update beliefs.\nmetric::NearestNeighbors.MinkowskiMetric: The metric used to measure distances between beliefs. It must be a Minkowski metric.\nn::Integer: The number of belief expansions to perform.\n\nConstructors\n\nBeliefExpansionSampler(pomdp::POMDP; updater::Updater=DiscreteUpdater(pomdp),\nmetric::NearestNeighbors.MinkowskiMetric=Euclidean(), n::Integer=3)\n\nMethods\n\n(s::BeliefExpansionSampler)(pomdp::POMDP)\n\nCreates an initial belief and performs exploratory belief expansion. Returns the unique belief states. 
Only works for POMDPs with discrete state, action, and observation spaces.\n\nExample Usage\n\njulia> pomdp = TigerPOMDP();\njulia> sampler = BeliefExpansionSampler(pomdp; n=2);\njulia> beliefs = sampler(pomdp)\nSet{DiscreteBelief{TigerPOMDP, Bool}} with 4 elements:\n DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.15000000000000002, 0.85])\n DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])\n DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.85, 0.15000000000000002])\n DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.9697986577181208, 0.030201342281879207])\n\n\n\n\n\n","category":"type"},{"location":"samplers/#Policy-Sampler","page":"Samplers","title":"Policy Sampler","text":"","category":"section"},{"location":"samplers/","page":"Samplers","title":"Samplers","text":"PolicySampler","category":"page"},{"location":"samplers/#CompressedBeliefMDPs.PolicySampler","page":"Samplers","title":"CompressedBeliefMDPs.PolicySampler","text":"PolicySampler\n\nSamples belief states by rolling out a Policy.\n\nFields\n\npolicy::Policy: The policy used for decision making.\nupdater::Updater: The updater used for updating beliefs.\nn::Integer: The maximum number of simulated steps.\nrng::AbstractRNG: The random number generator used for sampling.\nverbose::Bool: Whether to use a progress bar while sampling.\n\nConstructors\n\nPolicySampler(pomdp::POMDP; policy::Policy=RandomPolicy(pomdp), \nupdater::Updater=DiscreteUpdater(pomdp), n::Integer=10, \nrng::AbstractRNG=Random.GLOBAL_RNG)\n\nMethods\n\n(s::PolicySampler)(pomdp::POMDP)\n\nReturns a vector of unique belief states.\n\nExample\n\njulia> pomdp = TigerPOMDP();\njulia> sampler = PolicySampler(pomdp; n=3);\njulia> sampler(pomdp)\n2-element Vector{Any}:\nDiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 
0.5])\nDiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.15000000000000002, 0.85])\n\n\n\n\n\n","category":"type"},{"location":"samplers/#ExplorationPolicy-Sampler","page":"Samplers","title":"ExplorationPolicy Sampler","text":"","category":"section"},{"location":"samplers/","page":"Samplers","title":"Samplers","text":"ExplorationPolicySampler","category":"page"},{"location":"samplers/#CompressedBeliefMDPs.ExplorationPolicySampler","page":"Samplers","title":"CompressedBeliefMDPs.ExplorationPolicySampler","text":"ExplorationPolicySampler\n\nSamples belief states by rolling out an ExplorationPolicy. Essentially identical to PolicySampler.\n\nFields\n\nexplorer::ExplorationPolicy: The ExplorationPolicy used for decision making.\non_policy::Policy: The fallback Policy used for decision making when not exploring.\nupdater::Updater: The updater used for updating beliefs.\nn::Integer: The maximum number of simulated steps.\nrng::AbstractRNG: The random number generator used for sampling.\nverbose::Bool: Whether to use a progress bar while sampling.\n\nConstructors\n\nExplorationPolicySampler(pomdp::POMDP; rng::AbstractRNG=Random.GLOBAL_RNG,\nexplorer::ExplorationPolicy=EpsGreedyPolicy(pomdp, 0.1; rng=rng), on_policy=RandomPolicy(pomdp),\nupdater::Updater=DiscreteUpdater(pomdp), n::Integer=10)\n\nMethods\n\n(s::ExplorationPolicySampler)(pomdp::POMDP)\n\nReturns a vector of unique belief states.\n\nExample Usage\n\njulia> pomdp = TigerPOMDP()\njulia> sampler = ExplorationPolicySampler(pomdp; n=30)\njulia> sampler(pomdp)\n3-element Vector{Any}:\n DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])\n DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.85, 0.15000000000000002])\n DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.9697986577181208, 0.030201342281879207])\n\n\n\n\n\n","category":"type"}] }