Implemented Belief Updaters · POMDPs.jl

Implemented Belief Updaters

POMDPTools provides the following generic belief updaters:

  • a discrete belief updater
  • a k previous observation updater
  • a previous observation updater
  • a nothing updater (for when the policy does not depend on any feedback)

For particle filters see ParticleFilters.jl.

Discrete (Bayesian Filter)

The DiscreteUpdater is a default implementation of a discrete Bayesian filter. The DiscreteBelief type is provided to represent discrete beliefs for discrete state POMDPs.

A convenience function uniform_belief is provided to create a DiscreteBelief with equal probability for each state.

POMDPTools.BeliefUpdaters.DiscreteBeliefType
DiscreteBelief

A belief specified by a probability vector.

Normalization of b is assumed in some calculations (e.g. pdf), but it is only automatically enforced in update(...); the DiscreteBelief(pomdp, b) constructor issues a warning if b is not normalized.

Constructor

DiscreteBelief(pomdp, b::Vector{Float64}; check::Bool=true)

Fields

  • pomdp : the POMDP problem
  • state_list : a vector of ordered states
  • b : the probability vector
source
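
As a quick illustration of how these pieces fit together, the sketch below builds a uniform belief and performs one Bayesian update. It is not part of the original page and assumes the BabyPOMDP model from POMDPModels, which also appears in examples later in this document.

using POMDPs, POMDPModels, POMDPTools

pomdp = BabyPOMDP()              # two states: hungry (true) or not (false)
up = DiscreteUpdater(pomdp)      # discrete Bayesian filter
b0 = uniform_belief(pomdp)       # DiscreteBelief with equal probability for each state
b1 = update(up, b0, true, true)  # belief after taking action true (feed) and observing true (crying)
pdf(b1, true)                    # probability that the baby is hungry under the updated belief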

K Previous Observations

POMDPTools.BeliefUpdaters.KMarkovUpdaterType
KMarkovUpdater

Updater that stores the k most recent observations as the belief.

Example:

up = KMarkovUpdater(5)
s0 = rand(rng, initialstate(pomdp))
initial_observation = rand(rng, initialobs(pomdp, s0))
initial_obs_vec = fill(initial_observation, 5)
hr = HistoryRecorder(rng=rng, max_steps=100)
hist = simulate(hr, pomdp, policy, up, initial_obs_vec, s0)
source

Previous Observation

Nothing Updater

m = convert(POMDP, env)
planner = solve(POMCPSolver(), m)
a = action(planner, initialstate(m))

You can also use the constructors listed below to manually convert between the interfaces.

Environment Wrapper Types

Since the standard reinforcement learning environment interface offers less information about the internal workings of the environment than the POMDPs.jl interface, MDPs and POMDPs created from these environments will have limited functionality. There are two kinds of (PO)MDP types that can wrap an environment:

Generative model wrappers

If the state and setstate! CommonRLInterface functions are provided, then the environment can be wrapped in an RLEnvMDP or RLEnvPOMDP and the POMDPs.jl generative model interface will be available.

Opaque wrappers

If the state and setstate! are not provided, then the resulting POMDP or MDP can only be simulated. This case is represented using the OpaqueRLEnvPOMDP and OpaqueRLEnvMDP wrappers. From the POMDPs.jl perspective, the state of the opaque (PO)MDP is just an integer wrapped in an OpaqueRLEnvState. This keeps track of the "age" of the environment so that POMDPs.jl actions that attempt to interact with the environment at a different age are invalid.

Constructors

Creating RL environments from MDPs and POMDPs

POMDPTools.CommonRLIntegration.MDPCommonRLEnvType
MDPCommonRLEnv(m, [s])
MDPCommonRLEnv{RLO}(m, [s])

Create a CommonRLInterface environment from MDP m; optionally specify the state 's'.

The RLO parameter can be used to specify a type to convert the observation to. By default, this is AbstractArray. Use Any to disable conversion.

source
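
As a rough sketch of how the wrapped environment is used (not from the original page; it assumes SimpleGridWorld from POMDPModels and the reset!, act!, observe, and terminated functions from CommonRLInterface):

using POMDPModels
using POMDPTools
using CommonRLInterface

env = MDPCommonRLEnv(SimpleGridWorld())  # wrap the MDP as a CommonRLInterface environment
reset!(env)
r = act!(env, :right)                    # step the environment with one of SimpleGridWorld's actions
o = observe(env)                         # the state, converted to an AbstractArray by default
done = terminated(env)
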
POMDPTools.CommonRLIntegration.POMDPCommonRLEnvType
POMDPCommonRLEnv(m, [s], [o])
-POMDPCommonRLEnv{RLO}(m, [s], [o])

Create a CommonRLInterface environment from POMDP m; optionally specify the state 's' and observation 'o'.

The RLO parameter can be used to specify a type to convert the observation to. By default, this is AbstractArray. Use Any to disable conversion.

source

Creating MDPs and POMDPs from RL environments

POMDPTools.CommonRLIntegration.RLEnvMDPType
RLEnvMDP(env; discount=1.0)

Create an MDP by wrapping a CommonRLInterface.AbstractEnv. state and setstate! from CommonRLInterface must be provided, and the POMDPs generative model functionality will be provided.

source
POMDPTools.CommonRLIntegration.RLEnvPOMDPType
RLEnvPOMDP(env; discount=1.0)

Create a POMDP by wrapping a CommonRLInterface.AbstractEnv. state and setstate! from CommonRLInterface must be provided, and the POMDPs generative model functionality will be available.

source
POMDPTools.CommonRLIntegration.OpaqueRLEnvMDPType
OpaqueRLEnvMDP(env; discount=1.0)

Wrap a CommonRLInterface.AbstractEnv in an MDP object. The state will be an OpaqueRLEnvState and only simulation will be supported.

source
POMDPTools.CommonRLIntegration.OpaqueRLEnvPOMDPType
OpaqueRLEnvPOMDP(env; discount=1.0)

Wrap a CommonRLInterface.AbstractEnv in a POMDP object. The state will be an OpaqueRLEnvState and only simulation will be supported.

source

Implemented Distributions · POMDPs.jl

Implemented Distributions

POMDPTools contains several utility distributions to be used in the POMDPs transition and observation functions. These implement the appropriate methods of the functions in the distributions interface.

This package also supplies showdistribution for pretty printing distributions as unicode bar graphs to the terminal.

Sparse Categorical (SparseCat)

SparseCat is a sparse categorical distribution specified by providing a list of possible values (states or observations) and the probabilities corresponding to those values.

Example: SparseCat([1,2,3], [0.1,0.2,0.7]) is a categorical distribution that assigns probability 0.1 to 1, 0.2 to 2, 0.7 to 3, and 0 to all other values.

POMDPTools.POMDPDistributions.SparseCatType
SparseCat(values, probabilities)

Create a sparse categorical distribution.

values is an iterable object containing the possible values (can be of any type) in the distribution that have nonzero probability. probabilities is an iterable object that contains the associated probabilities.

This is optimized for value iteration with a fast implementation of weighted_iterator. Both pdf and rand are order n.

source
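
Continuing the SparseCat([1,2,3], [0.1,0.2,0.7]) example above, here is a short sketch of the calls it supports (the commented values follow from that definition):

using POMDPs, POMDPTools

d = SparseCat([1, 2, 3], [0.1, 0.2, 0.7])
pdf(d, 3)                       # 0.7
pdf(d, 5)                       # 0.0, since 5 is not in the support
rand(d)                         # samples 1, 2, or 3 according to the weights
collect(weighted_iterator(d))   # [1 => 0.1, 2 => 0.2, 3 => 0.7]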

Implicit

In situations where a distribution object is required, but the pdf is difficult to specify and only samples are required, ImplicitDistribution provides a convenient way to package a sampling function.

POMDPTools.POMDPDistributions.ImplicitDistributionType
ImplicitDistribution(sample_function, args...)

Define a distribution that can only be sampled from using rand, but has no explicit pdf.

Each time rand(rng, d::ImplicitDistribution) is called,

sample_function(args..., rng)

will be called to generate a new sample.

ImplicitDistribution is designed to be used with anonymous functions or the do syntax as follows:

Examples

ImplicitDistribution(rng->rand(rng)^2)
struct MyMDP <: MDP{Float64, Int} end
 
 function POMDPs.transition(m::MyMDP, s, a)
     ImplicitDistribution(s, a) do s, a, rng
         s + a + 0.001*randn(rng)  # reconstructed body: a sample concentrated near s + a
     end
 end
 
 td = transition(MyMDP(), 1.0, 1)
rand(td) # will return a number near 2
source

Bool Distribution

Deterministic

POMDPTools.POMDPDistributions.DeterministicType
Deterministic(value)

Create a deterministic distribution over only one value.

This is intended to be used when a distribution is required, but the outcome is deterministic. It is equivalent to a Kronecker Delta distribution.

source

Uniform

POMDPTools.POMDPDistributions.UniformType
Uniform(collection)

Create a uniform categorical distribution over a collection of objects.

The objects in the collection must be unique (this is tested on construction), and will be stored in a Set. To avoid this overhead, use UnsafeUniform.

source
POMDPTools.POMDPDistributions.UnsafeUniformType
UnsafeUniform(collection)

Create a uniform categorical distribution over a collection of objects.

No checks are performed to ensure uniqueness or check whether an object is actually in the set when evaluating the pdf.

source
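
For example, a minimal sketch of these two distributions (not from the original page):

using POMDPs, POMDPTools

d = Deterministic(:left)
rand(d)          # always :left
pdf(d, :left)    # 1.0

u = Uniform([:left, :right, :listen])
pdf(u, :right)   # 1/3
rand(u)          # one of the three values, each with probability 1/3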

Pretty Printing

POMDPTools: the standard library for POMDPs.jl · POMDPs.jl

POMDPTools: the standard library for POMDPs.jl

The POMDPs.jl package does nothing more than define an interface or language for interacting with and solving (PO)MDPs; it does not contain any implementations. In practice, defining and solving POMDPs is made vastly easier if some commonly-used structures are provided. The POMDPTools package contains these implementations. Thus, the relationship between POMDPs.jl and POMDPTools is similar to the relationship between a programming language and its standard library.

The POMDPTools package source code is hosted in the POMDPs.jl github repository in the lib/POMDPTools directory.

The contents of the library are outlined below:

julia> collect(weighted_iterator(d))
2-element Array{Pair{Bool,Float64},1}:
  true => 0.7
 false => 0.3
source

Observation Weight

Sometimes, e.g. in particle filtering, the relative likelihood of an observation is required in addition to a generative model, and it is often tedious to implement a custom observation distribution type. For this case, the shortcut function obs_weight is provided.

POMDPTools.ModelTools.obs_weightFunction
obs_weight(pomdp, s, a, sp, o)

Return a weight proportional to the likelihood of receiving observation o from state sp (and a and s if they are present).

This is a useful shortcut for particle filtering so that the observation distribution does not have to be represented.

source
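
A sketch of how a problem writer might supply obs_weight directly instead of defining a full observation distribution. The NoisyLocalization type and the Gaussian weight below are hypothetical illustrations, not part of POMDPTools; the only assumption about the package is that obs_weight can be extended for your problem type.

using POMDPs, POMDPTools

struct NoisyLocalization <: POMDP{Float64, Float64, Float64} end  # hypothetical problem

# weight proportional to a Gaussian likelihood of observing o near the next state sp
POMDPTools.obs_weight(::NoisyLocalization, s, a, sp, o) = exp(-(o - sp)^2 / 2)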

Ordered Spaces

It is often useful to have a list of states, actions, or observations ordered consistently with the respective index function from POMDPs.jl. Since the POMDPs.jl interface does not demand that spaces be ordered consistently with index, the states, actions, and observations functions are not sufficient. Thus POMDPTools provides ordered_actions, ordered_states, and ordered_observations for this purpose.

POMDPTools.ModelTools.ordered_actionsFunction
ordered_actions(mdp)

Return an AbstractVector of actions ordered according to actionindex(mdp, a).

ordered_actions(mdp) will always return an AbstractVector{A} v containing all of the actions in actions(mdp) in the order such that actionindex(mdp, v[i]) == i. You may wish to override this for your problem for efficiency.

source
POMDPTools.ModelTools.ordered_statesFunction
ordered_states(mdp)

Return an AbstractVector of states ordered according to stateindex(mdp, s).

ordered_states(mdp) will always return an AbstractVector{S} v containing all of the states in states(mdp) in the order such that stateindex(mdp, v[i]) == i. You may wish to override this for your problem for efficiency.

source
POMDPTools.ModelTools.ordered_observationsFunction
ordered_observations(pomdp)

Return an AbstractVector of observations ordered according to obsindex(pomdp, o).

ordered_observations(pomdp) will always return an AbstractVector{O} v containing all of the observations in observations(pomdp) in the order such that obsindex(pomdp, v[i]) == i. You may wish to override this for your problem for efficiency.

source
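
For instance (a sketch, not from the original page; it assumes SimpleGridWorld from POMDPModels):

using POMDPs, POMDPModels, POMDPTools

m = SimpleGridWorld()
v = ordered_actions(m)                                 # e.g. [:up, :down, :left, :right], ordered by actionindex
all(actionindex(m, v[i]) == i for i in eachindex(v))   # true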

Info Interface

It is often the case that useful information besides the belief, state, action, etc. is generated by a function in POMDPs.jl. This information can be useful for debugging or understanding the behavior of a solver, updater, or problem. The info interface provides a standard way for problems, policies, solvers, or updaters to output this information. The recording simulators from POMDPTools automatically record this information.

To specify info from policies, solvers, or updaters, implement the following functions:

POMDPTools.ModelTools.action_infoFunction
a, ai = action_info(policy, x)

Return a tuple containing the action determined by policy 'p' at state or belief 'x' and information (usually a NamedTuple, Dict or nothing) from the calculation of that action.

By default, returns nothing as info.

source
POMDPTools.ModelTools.solve_infoFunction
policy, si = solve_info(solver, problem)

Return a tuple containing the policy determined by a solver and information (usually a NamedTuple, Dict or nothing) from the calculation of that policy.

By default, returns nothing as info.

source
POMDPTools.ModelTools.update_infoFunction
bp, i = update_info(updater, b, a, o)

Return a tuple containing the new belief and information (usually a NamedTuple, Dict or nothing) from the belief update.

By default, returns nothing as info.

source
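
For example, action_info can be used uniformly even with policies that do not produce any info (a sketch; FunctionPolicy is the function-to-policy wrapper described on the policies page):

using POMDPs, POMDPTools

policy = FunctionPolicy(b -> :listen)   # any Policy works here
a, ai = action_info(policy, :some_belief)
ai === nothing                          # true unless the policy implements action_info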

Model Transformations

POMDPTools contains several tools for transforming problems into other classes so that they can be used by different solvers.

Linear Algebra Representations

For some algorithms, such as value iteration, it is convenient to use vectors that contain the reward for every state, and matrices that contain the transition probabilities. These can be constructed with the following functions:

POMDPTools.ModelTools.transition_matricesFunction
transition_matrices(p::SparseTabularProblem)

Accessor function for the transition model of a sparse tabular problem. It returns a list of sparse matrices for each action of the problem.

source
transition_matrices(m::Union{MDP,POMDP})
transition_matrices(m; sparse=true)

Construct transition matrices for (PO)MDP m.

The returned object is an associative object (usually a Dict), where the keys are actions. Each value in this object is an AbstractMatrix where the row corresponds to the state index of s and the column corresponds to the state index of s'. The entry in the matrix is the probability of transitioning from state s to state s'.

source
POMDPTools.ModelTools.reward_vectorsFunction
reward_vectors(m::Union{MDP, POMDP})

Construct reward vectors for (PO)MDP m.

The returned object is an associative object (usually a Dict), where the keys are actions. Each value in this object is an AbstractVector where the index corresponds to the state index of s and the entry is the reward for that state.

source
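
For example (a sketch, not from the original page; it assumes SimpleGridWorld from POMDPModels, whose actions are :up, :down, :left, and :right):

using POMDPs, POMDPModels, POMDPTools

m = SimpleGridWorld()
T = transition_matrices(m)   # action => |S| x |S| matrix; T[:up][i, j] is the probability of moving from state i to state j under :up
R = reward_vectors(m)        # action => length-|S| vector of rewards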

Sparse Tabular MDPs and POMDPs

The SparseTabularMDP and SparseTabularPOMDP types represent discrete problems defined using the explicit interface. The transition and observation models are represented using sparse matrices. Solver writers can leverage these data structures to write efficient vectorized code. A problem writer can define a problem using the explicit interface and have it automatically converted to a sparse tabular representation by calling the constructors SparseTabularMDP(::MDP) or SparseTabularPOMDP(::POMDP). See the following docs to learn more about the matrix representation and how to access the fields of the SparseTabular objects:

POMDPTools.ModelTools.SparseTabularPOMDPType
SparseTabularPOMDP

A POMDP object where states and actions are integers and the transition and observation distributions are represented by lists of sparse matrices. This data structure can be useful to exploit in vectorized algorithms to gain performance (e.g. see SparseValueIterationSolver). The recommended way to access the transition, reward, and observation matrices is through the provided accessor functions: transition_matrix, reward_vector, observation_matrix.

Fields

  • T::Vector{SparseMatrixCSC{Float64, Int64}} The transition model is represented as a vector of sparse matrices (one for each action). T[a][s, sp] the probability of transition from s to sp taking action a.
  • R::Array{Float64, 2} The reward is represented as a matrix where the rows are states and the columns are actions: R[s, a] is the reward of taking action a in state s.
  • O::Vector{SparseMatrixCSC{Float64, Int64}} The observation model is represented as a vector of sparse matrices (one for each action). O[a][sp, o] is the probability of observing o from state sp after having taken action a.
  • initial_probs::SparseVector{Float64, Int64} Specifies the initial state distribution
  • terminal_states::Set{Int64} Stores the terminal states
  • discount::Float64 The discount factor

Constructors

  • SparseTabularPOMDP(pomdp::POMDP) : One can provide the matrices to the default constructor or construct a SparseTabularPOMDP from any discrete state POMDP defined using the explicit interface.

Note that constructing the transition and reward matrices requires iterating over all the states and can take a while. To learn more about how to define a (PO)MDP with the explicit interface, please visit https://juliapomdp.github.io/POMDPs.jl/latest/explicit/ .

  • SparseTabularPOMDP(spomdp::SparseTabularMDP; transition, reward, observation, discount) : This constructor returns a new sparse POMDP that is a copy of the original spomdp except for the fields specified by the keyword arguments.
source
POMDPTools.ModelTools.transition_matrixFunction
transition_matrix(p::SparseTabularProblem, a)

Accessor function for the transition model of a sparse tabular problem. It returns a sparse matrix containing the transition probabilities when taking action a: T[s, sp] = Pr(sp | s, a).

source
POMDPTools.ModelTools.reward_vectorFunction
reward_vector(p::SparseTabularProblem, a)

Accessor function for the reward function of a sparse tabular problem. It returns a vector containing the reward for all the states when taking action a: R(s, a). The length of the return vector is equal to the number of states.

source
POMDPTools.ModelTools.observation_matrixFunction
observation_matrix(p::SparseTabularPOMDP, a::Int64)

Accessor function for the observation model of a sparse tabular POMDP. It returns a sparse matrix containing the observation probabilities when having taken action a: O[sp, o] = Pr(o | sp, a).

source
POMDPTools.ModelTools.reward_matrixFunction
reward_matrix(p::SparseTabularProblem)

Accessor function for the reward matrix R[s, a] of a sparse tabular problem.

source
POMDPTools.ModelTools.observation_matricesFunction
observation_matrices(p::SparseTabularPOMDP)

Accessor function for the observation model of a sparse tabular POMDP. It returns a list of sparse matrices for each action of the problem.

source
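
A sketch of converting an explicit POMDP and accessing its matrices (it assumes TigerPOMDP from POMDPModels; after conversion, states and actions are referred to by integer indices):

using POMDPs, POMDPModels, POMDPTools

pomdp = TigerPOMDP()
stp = SparseTabularPOMDP(pomdp)
T1 = transition_matrix(stp, 1)    # sparse |S| x |S| matrix for the first action
R  = reward_matrix(stp)           # |S| x |A| reward matrix
O1 = observation_matrix(stp, 1)   # sparse |S| x |O| matrix for the first action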

Fully Observable POMDP

POMDPTools.ModelTools.FullyObservablePOMDPType
FullyObservablePOMDP(mdp)

Turn MDP mdp into a POMDP where the observations are the states of the MDP.

source

Generative Belief MDP

Every POMDP is an MDP on the belief space; GenerativeBeliefMDP creates a generative model for that belief MDP.

Warning

The reward generated by the GenerativeBeliefMDP is the reward for a single state sampled from the belief; it is not the expected reward for that belief transition (though the two are, of course, equivalent in expectation). Implementing the model with the expected reward requires a custom implementation because belief updaters do not typically deal with reward.

POMDPTools.ModelTools.GenerativeBeliefMDPType
GenerativeBeliefMDP(pomdp, updater)

Create a generative model of the belief MDP corresponding to POMDP pomdp with belief updates performed by updater.

source

Example

using POMDPs
 using POMDPModels
 using POMDPTools
 
 (a, r, sp) = (true, -5.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [1.0, 0.0]))
 (a, r, sp) = (true, -5.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [1.0, 0.0]))
 (a, r, sp) = (false, 0.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [0.9759036144578314, 0.02409638554216867]))
(a, r, sp) = (false, 0.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [0.9701315984030756, 0.029868401596924433]))

Underlying MDP

POMDPTools.ModelTools.UnderlyingMDPType
UnderlyingMDP(m::POMDP)

Transform POMDP m into an MDP where the states are fully observed.

UnderlyingMDP(m::MDP)

Return m

source

State Action Reward Model

POMDPTools.ModelTools.StateActionRewardType
StateActionReward(m::Union{MDP,POMDP})

Robustly create a reward function that depends only on the state and action.

If reward(m, s, a) is implemented, that will be used, otherwise the mean of reward(m, s, a, sp) for MDPs or reward(m, s, a, sp, o) for POMDPs will be used.

Example

using POMDPs
 using POMDPModels
 using POMDPTools
 
 
 # output
 
-15.0
source

Utility Types

Terminal State

TerminalState and its singleton instance terminalstate are available to use for a terminal state in concert with another state type. It has the appropriate type promotion logic to make its use with other types friendly, similar to nothing and missing.

Note

NOTE: This is NOT a replacement for the standard POMDPs.jl isterminal function, though isterminal is implemented for the type. It is merely a convenient type to use for terminal states.

Warning

WARNING: Early tests (August 2018) suggest that the Julia 1.0 compiler will not be able to efficiently implement union splitting in cases as complex as POMDPs, so using a Union for the state type of a problem can currently have a large overhead.

POMDPTools.ModelTools.TerminalStateType
TerminalState

A type with no fields whose singleton instance terminalstate is used to represent a terminal state with no additional information.

This type has the appropriate promotion logic implemented to function like Missing when added to arrays, etc.

Note that terminal states NEED NOT be of type TerminalState. You can define any state to be terminal by implementing the appropriate isterminal method. Solvers and simulators SHOULD NOT check for this type, but should instead check using isterminal.

source
POMDPTools.ModelTools.terminalstateConstant
terminalstate

The singleton instance of type TerminalState representing a terminal state.

source
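
As a sketch of the intended usage (the CountdownMDP type below is hypothetical, not part of POMDPTools):

using POMDPs, POMDPTools

# a hypothetical MDP whose state is either a counter or the shared terminal state
struct CountdownMDP <: MDP{Union{Int, TerminalState}, Symbol} end

POMDPs.isterminal(::CountdownMDP, s) = s isa TerminalState

isterminal(CountdownMDP(), terminalstate)   # true
isterminal(CountdownMDP(), 3)               # false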

Implemented Policies · POMDPs.jl

Implemented Policies

POMDPTools currently provides the following policy types:

  • a wrapper to turn a function into a Policy
  • an alpha vector policy type
  • a random policy
  • a stochastic policy type
  • exploration policies
  • a vector policy type
  • a wrapper to collect statistics and errors about policies

In addition, it provides the showpolicy function for printing policies (similar to the way matrices are printed in the REPL) and the evaluate function for evaluating MDP policies.

Function

Wraps a Function mapping states to actions into a Policy.
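
For example, using the FunctionPolicy wrapper (a sketch, not from the original page):

using POMDPs, POMDPTools

p = FunctionPolicy(s -> s > 0 ? :east : :west)   # any state-to-action function becomes a Policy
action(p, 1.0)    # :east
action(p, -2.0)   # :west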

Alpha Vector Policy

Represents a policy with a set of alpha vectors (See AlphaVectorPolicy constructor docstring). In addition to finding the optimal action with action, the alpha vectors can be accessed with alphavectors or alphapairs.

Determining the estimated value and optimal action depends on calculating the dot product between alpha vectors and a belief vector. POMDPTools.Policies.beliefvec(pomdp, b) is used to create this vector and can be overridden for new belief types for efficiency.

POMDPTools.Policies.AlphaVectorPolicyType
AlphaVectorPolicy(pomdp::POMDP, alphas, action_map)

Construct a policy from alpha vectors.

Arguments

  • alphas: an |S| x (number of alpha vecs) matrix or a vector of alpha vectors.

  • action_map: a vector of the actions corresponding to each alpha vector

    AlphaVectorPolicy{P<:POMDP, A}

Represents a policy with a set of alpha vectors.

Use action to get the best action for a belief, and alphavectors and alphapairs to access the alpha vectors.

Fields

  • pomdp::P the POMDP problem
  • n_states::Int the number of states in the POMDP
  • alphas::Vector{Vector{Float64}} the list of alpha vectors
  • action_map::Vector{A} a list of actions corresponding to the alpha vectors
source
POMDPTools.Policies.beliefvecFunction
POMDPTools.Policies.beliefvec(m::POMDP, n_states::Int, b)

Return a vector-like representation of the belief b suitable for calculating the dot product with the alpha vectors.

source
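
A sketch of constructing and querying an alpha vector policy; the alpha vectors below are made up for illustration, and BabyPOMDP from POMDPModels is assumed:

using POMDPs, POMDPModels, POMDPTools

pomdp = BabyPOMDP()
alphas = [[-10.0, 0.0], [-6.0, -6.0]]   # one hypothetical alpha vector per action
policy = AlphaVectorPolicy(pomdp, alphas, [false, true])
b = uniform_belief(pomdp)
value(policy, b)    # maximum over the alpha vectors of their dot product with the belief vector
action(policy, b)   # action_map entry of the maximizing alpha vector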

Random Policy

A policy that returns a randomly selected action using rand(rng, actions(pomdp)).

POMDPTools.Policies.RandomPolicyType
RandomPolicy{RNG<:AbstractRNG, P<:Union{POMDP,MDP}, U<:Updater}

a generic policy that uses the actions function to create a list of actions and then randomly samples an action from it.

Constructor:

`RandomPolicy(problem::Union{POMDP,MDP};
         rng=Random.default_rng(),
         updater=NothingUpdater())`

Fields

  • rng::RNG a random number generator
  • problem::P the POMDP or MDP problem
  • updater::U a belief updater (default to NothingUpdater in the above constructor)
source

Stochastic Policies

Types for representing randomized policies:

  • StochasticPolicy samples actions from an arbitrary distribution.
  • UniformRandomPolicy samples actions uniformly (see RandomPolicy for a similar use)
  • CategoricalTabularPolicy samples actions from a categorical distribution with weights given by a ValuePolicy.
POMDPTools.Policies.StochasticPolicyType

StochasticPolicy{D, RNG <: AbstractRNG}

Represents a stochastic policy. Actions are sampled from an arbitrary distribution.

Constructor:

`StochasticPolicy(distribution; rng=Random.default_rng())`

Fields

  • distribution::D
  • rng::RNG a random number generator
source
POMDPTools.Policies.CategoricalTabularPolicyType
CategoricalTabularPolicy

represents a stochastic policy sampling an action from a categorical distribution with weights given by a ValuePolicy

constructor:

CategoricalTabularPolicy(mdp::Union{POMDP,MDP}; rng=Random.default_rng())

Fields

  • stochastic::StochasticPolicy
  • value::ValuePolicy
source

Vector Policies

Tabular policies including the following:

  • VectorPolicy holds a vector of actions, one for each state, ordered according to stateindex.
  • ValuePolicy holds a matrix of values for state-action pairs and chooses the action with the highest value at the given state
POMDPTools.Policies.VectorPolicyType
VectorPolicy{S,A}

A generic MDP policy that consists of a vector of actions. The entry at stateindex(mdp, s) is the action that will be taken in state s.

Fields

  • mdp::MDP{S,A} the MDP problem
  • act::Vector{A} a vector of size |S| mapping state indices to actions
source
POMDPTools.Policies.ValuePolicyType
 ValuePolicy{P<:Union{POMDP,MDP}, T<:AbstractMatrix{Float64}, A}

A generic MDP policy that consists of a value table. In state s, the action with the highest value in row stateindex(mdp, s) of the table is taken. It is expected that the order of the actions in the value table is consistent with the order of the actions in act. If act is not explicitly set in the construction, act is ordered according to actionindex.

Fields

  • mdp::P the MDP problem
  • value_table::T the value table as a |S|x|A| matrix
  • act::Vector{A} the possible actions
source

Value Dict Policy

ValueDictPolicy holds a dictionary of values, where the key is state-action tuple, and chooses the action with the highest value at the given state. It allows one to write solvers without enumerating state and action spaces, but actions and states must support Base.isequal() and Base.hash().

POMDPTools.Policies.ValueDictPolicyType
 ValueDictPolicy(mdp)

A generic MDP policy that consists of a Dict storing Q-values for state-action pairs. If there are no entries higher than a default value, this will fall back to a default policy.

Keyword Arguments

  • value_table::AbstractDict the value dict, key is (s, a) Tuple.
  • default_value::Float64 the default value of value_table.
  • default_policy::Policy the policy taken when no action has a value higher than default_value
source

Exploration Policies

Exploration policies are often useful for reinforcement learning algorithms to choose an action that is different from the action given by the policy being learned (on_policy).

Exploration policies are subtypes of the abstract ExplorationPolicy type and follow this interface: action(exploration_policy::ExplorationPolicy, on_policy::Policy, k, s). k is used to compute the value of the exploration parameter (see Schedule), and s is the current state or observation in which the agent is taking an action.

The action method is exported by POMDPs.jl. To use exploration policies in a solver, you must use the four argument version of action where on_policy is the policy being learned (e.g. tabular policy or neural network policy).

This package provides two exploration policies: EpsGreedyPolicy and SoftmaxPolicy.

POMDPTools.Policies.EpsGreedyPolicyType
EpsGreedyPolicy <: ExplorationPolicy

represents an epsilon greedy policy, sampling a random action with a probability eps or returning an action from a given policy otherwise. The evolution of epsilon can be controlled using a schedule. This feature is useful for using those policies in reinforcement learning algorithms.

Constructor:

EpsGreedyPolicy(problem::Union{MDP, POMDP}, eps::Union{Function, Float64}; rng=Random.default_rng(), schedule=ConstantSchedule)

If a function is passed for eps, eps(k) is called to compute the value of epsilon when calling action(exploration_policy, on_policy, k, s).

Fields

  • eps::Function
  • rng::AbstractRNG
  • m::M POMDPs or MDPs problem
source
POMDPTools.Policies.SoftmaxPolicyType
SoftmaxPolicy <: ExplorationPolicy

represents a softmax policy, sampling a random action according to a softmax function. The softmax function converts the action values of the on policy into probabilities that are used for sampling. A temperature parameter or function can be used to make the resulting distribution more or less wide.

Constructor

SoftmaxPolicy(problem, temperature::Union{Function, Float64}; rng=Random.default_rng())

If a function is passed for temperature, temperature(k) is called to compute the value of the temperature when calling action(exploration_policy, on_policy, k, s)

Fields

  • temperature::Function
  • rng::AbstractRNG
  • actions::A an indexable list of actions
source

Schedule

Exploration policies often rely on a key parameter: $\epsilon$ in $\epsilon$-greedy and the temperature in softmax, for example. Reinforcement learning algorithms often require a decay schedule for these parameters. Schedules can be passed to an exploration policy as functions. For example, one can define an epsilon-greedy policy with an exponential decay schedule as follows:

    m # your mdp or pomdp model
    exploration_policy = EpsGreedyPolicy(m, k->0.05*0.9^(k/10))

POMDPTools exports a linear decay schedule object that can be used as well.

POMDPTools.Policies.LinearDecayScheduleType
LinearDecaySchedule

A schedule that linearly decreases a value from start to stop over steps steps. Once the value reaches stop, it stays constant.

Constructor

LinearDecaySchedule(;start, stop, steps)

source
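
For instance, a linearly decaying epsilon can be supplied like this (a sketch in the spirit of the example above; m again stands for your MDP or POMDP):

using POMDPTools

m # your mdp or pomdp model
schedule = LinearDecaySchedule(start=1.0, stop=0.01, steps=10000)  # epsilon: 1.0 down to 0.01 over 10000 calls
exploration_policy = EpsGreedyPolicy(m, schedule)
# during learning: a = action(exploration_policy, on_policy, k, s)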

Playback Policy

A policy that replays a fixed sequence of actions. When all actions are used, a backup policy is used.

POMDPTools.Policies.PlaybackPolicyType
PlaybackPolicy{A<:AbstractArray, P<:Policy, V<:AbstractArray{<:Real}}

a policy that applies a fixed sequence of actions until they are all used and then falls back onto a backup policy until the end of the episode.

Constructor:

`PlaybackPolicy(actions::AbstractArray, backup_policy::Policy; logpdfs::AbstractArray{Float64, 1} = Float64[])`

Fields

  • actions::Vector{A} a vector of actions to play back
  • backup_policy::Policy the policy to use when all prescribed actions have been taken but the episode continues
  • logpdfs::Vector{Float64} the log probability (density) of actions
  • i::Int64 the current action index
source
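
For example (a sketch, not from the original page; m stands for your MDP or POMDP, and the action names are hypothetical):

using POMDPTools

m # your mdp or pomdp model
# replay three fixed actions, then act randomly for the rest of the episode
p = PlaybackPolicy([:left, :left, :listen], RandomPolicy(m))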

Utility Wrapper

A wrapper for policies to collect statistics and handle errors.

POMDPTools.Policies.PolicyWrapperType
PolicyWrapper

Flexible utility wrapper for a policy designed for collecting statistics about planning.

Carries a function, a policy, and optionally a payload (that can be any type).

The function should typically be defined with the do syntax. Each time action is called on the wrapper, this function will be called.

If there is no payload, it will be called with two arguments: the policy and the state/belief. If there is a payload, it will be called with three arguments: the policy, the payload, and the current state or belief. The function should return an appropriate action. The idea is that, in this function, action(policy, s) should be called, statistics from the policy/planner should be collected and saved in the payload, exceptions can be handled, and the action should be returned.

Constructor

PolicyWrapper(policy::Policy; payload=nothing)

Example

using POMDPModels
 using POMDPToolbox
 
 mdp = GridWorld()
     return a
 end
 
h = simulate(HistoryRecorder(max_steps=100), mdp, errwrapper)

Fields

  • f::F
  • policy::P
  • payload::PL
source
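
Part of the original example above was lost in this rendering, so here is a compact, self-contained sketch (assuming SimpleGridWorld from POMDPModels) that uses the payload to count how often each action is chosen:

using POMDPs, POMDPTools, POMDPModels

m = SimpleGridWorld()
counts = Dict{Symbol,Int}()
counting = PolicyWrapper(RandomPolicy(m), payload=counts) do policy, payload, s
    a = action(policy, s)                  # query the wrapped policy
    payload[a] = get(payload, a, 0) + 1    # collect a statistic in the payload
    return a
end
h = simulate(HistoryRecorder(max_steps=20), m, counting)
counting.payload                           # the tally gathered during the simulation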

Pretty Printing Policies

POMDPTools.Policies.showpolicyFunction
showpolicy([io], [mime], m::MDP, p::Policy)
 showpolicy([io], [mime], statelist::AbstractVector, p::Policy)
showpolicy(...; pre=" ")

Print the states in m or statelist and the actions from policy p corresponding to those states.

For the MDP version, if io[:limit] is true, will only print enough states to fill the display.

source
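
A quick hedged example, assuming SimpleGridWorld from POMDPModels:

using POMDPs, POMDPTools, POMDPModels

m = SimpleGridWorld()
p = FunctionPolicy(s -> :right)   # a trivial policy that always moves right
showpolicy(m, p)                  # prints the action chosen in each state of m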

Policy Evaluation

The evaluate function provides a policy evaluation tool for MDPs:

POMDPTools.Policies.evaluateFunction
evaluate(m::MDP, p::Policy)
 evaluate(m::MDP, p::Policy; rewardfunction=POMDPs.reward)

Calculate the value for a policy on an MDP using the approach in equation 4.2.2 of Kochenderfer, Decision Making Under Uncertainty, 2015.

Returns a DiscreteValueFunction, which maps states to values.

Example

using POMDPTools, POMDPModels
 m = SimpleGridWorld()
 u = evaluate(m, FunctionPolicy(x->:left))
u([1,1]) # value of always moving left starting at state [1,1]
source

println("in state $s")
println("took action $a")
println("received observation $o and reward $r")
end

The optional spec argument can be a string, tuple of symbols, or single symbol and follows the same pattern as eachstep called on a SimHistory object.

Under the hood, this function creates a StepSimulator with spec and returns a [PO]MDPSimIterator by calling simulate with all of the arguments except spec. All keyword arguments are passed to the StepSimulator constructor.

source
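
For example, a hedged sketch (assuming the stepthrough function described here and a SimpleGridWorld model) where the spec string selects which elements the loop receives:

using POMDPs, POMDPTools, POMDPModels

m = SimpleGridWorld()
for (s, a, r) in stepthrough(m, RandomPolicy(m), "s,a,r", max_steps=5)
    @show s a r
end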

The StepSimulator contained in this file can provide the same functionality with the following syntax:

sim = StepSimulator("s,a,r,sp")
 for (s,a,r,sp) in simulate(sim, problem, policy)
     # do something
 end

Rollouts

RolloutSimulator is the simplest MDP or POMDP simulator. When simulate is called, it simply simulates a single trajectory of the process and returns the discounted reward.

rs = RolloutSimulator()
 
 r = simulate(rs, mdp, policy)

More examples can be found in the POMDPExamples Package

POMDPTools.Simulators.RolloutSimulatorType
RolloutSimulator(rng, max_steps)
 RolloutSimulator(; <keyword arguments>)

A fast simulator that just returns the reward

The simulation will be terminated when either

  1. a terminal state is reached (as determined by isterminal()), or
  2. the discount factor is as small as eps or
  3. max_steps have been executed

Keyword arguments:

  • rng::AbstractRNG (default: Random.default_rng()) - A random number generator to use.
  • eps::Float64 (default: 0.0) - A small number; if γᵗ where γ is the discount factor and t is the time step becomes smaller than this, the simulation will be terminated.
  • max_steps::Int (default: typemax(Int)) - The maximum number of steps to simulate.

Usage (optional arguments in brackets):

ro = RolloutSimulator()
history = simulate(ro, pomdp, policy, [updater [, init_belief [, init_state]]])

See also: HistoryRecorder, run_parallel

source
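
Because it returns a single number, RolloutSimulator is convenient for Monte Carlo estimates. A hedged sketch using BabyPOMDP from POMDPModels:

using POMDPs, POMDPTools, POMDPModels
using Statistics: mean

m = BabyPOMDP()
p = RandomPolicy(m)                  # its default updater is a NothingUpdater
rs = RolloutSimulator(max_steps=50)
r_mean = mean(simulate(rs, m, p) for _ in 1:1_000)   # estimated mean discounted return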

History Recorder

A HistoryRecorder runs a simulation and records the trajectory. It returns an AbstractVector of NamedTuples - see Histories for more info.

hr = HistoryRecorder(max_steps=100)
 pomdp = TigerPOMDP()
 policy = RandomPolicy(pomdp)
 
 h = simulate(hr, pomdp, policy)

More examples can be found in the POMDPExamples Package.

POMDPTools.Simulators.HistoryRecorderType

A simulator that records the history for later examination

The simulation will be terminated when either

  1. a terminal state is reached (as determined by isterminal()), or
  2. the discount factor is as small as eps or
  3. max_steps have been executed

Keyword Arguments:

  • rng: The random number generator for the simulation
  • capture_exception::Bool: whether to capture an exception and store it in the history, or let it go uncaught, potentially killing the script
  • show_progress::Bool: show a progress bar for the simulation
  • eps
  • max_steps

Usage (optional arguments in brackets):

hr = HistoryRecorder()
history = simulate(hr, pomdp, policy, [updater [, init_belief [, init_state]]])
source

sim()

The sim function provides a convenient way to interact with a POMDP or MDP environment and return a history. The first argument is a function that is called at every time step and takes a state (in the case of an MDP) or an observation (in the case of a POMDP) as the argument and then returns an action. The second argument is a pomdp or mdp. It is intended to be used with Julia's do syntax as follows:

pomdp = TigerPOMDP()
 history = sim(pomdp, max_steps=10) do obs
     println("Observation was $obs.")
     return TIGER_OPEN_LEFT
     return a
 end

for a POMDP and a belief updater.

Keyword Arguments

All Versions

POMDP version

POMDP and updater version

source
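
Much of the keyword-argument detail was lost in this rendering. As a small additional sketch, the MDP form passes the current state to the do block (SimpleGridWorld assumed here):

using POMDPs, POMDPTools, POMDPModels

m = SimpleGridWorld()
hist = sim(m, max_steps=100) do s
    return rand(collect(actions(m)))   # pick a random action regardless of the state s
end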

Histories

The results produced by HistoryRecorders and the sim function are contained in SimHistory objects.

POMDPTools.Simulators.SimHistoryType
SimHistory

An (PO)MDP simulation history returned by simulate(::HistoryRecorder, ::Union{MDP,POMDP},...).

This is an AbstractVector of NamedTuples containing the states, actions, etc.

Examples

hist[1][:s] # returns the first state in the history
hist[:a] # returns all of the actions in the history
source

Examples

using POMDPs, POMDPTools, POMDPModels
 hr = HistoryRecorder(max_steps=10)
 hist = simulate(hr, BabyPOMDP(), FunctionPolicy(x->true))
 step = hist[1] # all information available about the first step
     println("reward $r received when state $sp was reached after action $a was taken in state $s")
 end

returns the start state, action, reward and destination state for each step of the simulation.

Alternatively, instead of expanding the steps implicitly, the elements of the step can be accessed as fields (since each step is a NamedTuple):

for step in eachstep(h, "(s, a, r, sp)")    
     println("reward $(step.r) received when state $(step.sp) was reached after action $(step.a) was taken in state $(step.s)")
end

The possible valid elements in the iteration specification are

  • Any node in the (PO)MDP Dynamic Decision network (by default :s, :a, :sp, :o, :r)
  • b - the initial belief in the step (for POMDPs only)
  • bp - the belief after being updated based on o (for POMDPs only)
  • action_info - info from the policy decision (from action_info)
  • update_info - info from the belief update (from update_info)
  • t - the timestep index
source

Examples:

collect(eachstep(h, "a,o"))

will produce a vector of action-observation named tuples.

collect(norm(sp-s) for (s,sp) in eachstep(h, "s,sp"))

will produce a vector of the distances traveled on each step (assuming the state is a Euclidean vector).

Notes

Other Functions

state_hist(h), action_hist(h), observation_hist(h), belief_hist(h), and reward_hist(h) will return vectors of the states, actions, observations, beliefs, and rewards, and undiscounted_reward(h) and discounted_reward(h) will return the total rewards collected over the trajectory. n_steps(h) returns the number of steps in the history. exception(h) and backtrace(h) can be used to hold an exception if the simulation failed to finish.

view(h, range) (e.g. view(h, 1:n_steps(h)-4)) can be used to create a view of the history object h that only contains a certain range of steps. The object returned by view is an AbstractSimHistory that can be iterated through and manipulated just like a complete SimHistory.
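
As a short hedged illustration, reusing a history h recorded as in the examples above (and assuming it has at least two steps):

rtotal = undiscounted_reward(h)
rdisc  = discounted_reward(h)
n      = n_steps(h)
tail   = view(h, n-1:n)          # an AbstractSimHistory covering the last two steps
collect(eachstep(tail, "s,a"))   # the view can be iterated just like a full history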

Parallel

POMDPTools contains a utility for running many Monte Carlo simulations in parallel to evaluate performance. The basic workflow involves the following steps:

  1. Create a vector of Sim objects, each specifying how a single simulation should be run.
  2. Use the run_parallel or run function to run the simulations.
  3. Analyze the results of the simulations contained in the DataFrame returned by run_parallel.

Example

An example can be found in the POMDPExamples Package.

Sim objects

Each simulation should be specified by a Sim object which contains all the information needed to run a simulation, including the Simulator, POMDP or MDP, Policy, Updater, and any other ingredients.

POMDPTools.Simulators.SimType
Sim(m::MDP, p::Policy[, initialstate]; kwargs...)
Sim(m::POMDP, p::Policy[, updater[, initial_belief[, initialstate]]]; kwargs...)

Create a Sim object that contains everything needed to run and record a single simulation, including model, initial conditions, and metadata.

A vector of Sim objects can be executed with run or run_parallel.

Keyword Arguments

  • rng::AbstractRNG=Random.default_rng()
  • max_steps::Int=typemax(Int)
  • simulator::Simulator=HistoryRecorder(rng=rng, max_steps=max_steps)
  • metadata::NamedTuple a named tuple (or dictionary) of metadata for the sim that will be recorded, e.g. (solver_iterations=500,).
source
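
A hedged end-to-end sketch (assuming SimpleGridWorld from POMDPModels) that builds a queue comparing two policies and hands it to run_parallel, documented below:

using POMDPs, POMDPTools, POMDPModels

m = SimpleGridWorld()
policies = Dict("random" => RandomPolicy(m), "always_left" => FunctionPolicy(s -> :left))

queue = Sim[]
for (name, p) in policies, i in 1:10
    push!(queue, Sim(m, p, max_steps=100, metadata=(policy=name, trial=i)))
end

df = run_parallel(queue)   # one row per simulation: the reward plus the metadata columns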

Running simulations

The simulations are actually carried out by the run and run_parallel functions.

POMDPTools.Simulators.run_parallelFunction
run_parallel(queue::Vector{Sim})
 run_parallel(f::Function, queue::Vector{Sim})

Run Sim objects in queue in parallel and return results as a DataFrame.

By default, the DataFrame will contain the reward for each simulation and the metadata provided to the sim.

Arguments

  • queue: List of Sim objects to be executed
  • f: Function to process the results of each simulation

This function should take two arguments, (1) the Sim that was executed and (2) the result of the simulation, by default a SimHistory. It should return a named tuple that will appear in the dataframe. See Examples below.

Keyword Arguments

  • show_progress::Bool: whether or not to show a progress meter
  • progress::ProgressMeter.Progress: determines how the progress meter is displayed

Examples

run_parallel(queue) do sim, hist
     return (n_steps=n_steps(hist), reward=discounted_reward(hist))
end

will return a dataframe with the number of steps and the reward in it.

source

The run function is also provided to run simulations in serial (this is often useful for debugging). Note that the documentation below also contains a section for the built-in Julia run function, even though it is not relevant here.

Base.runFunction
run(queue::Vector{Sim})
run(f::Function, queue::Vector{Sim})

Run the Sim objects in queue on a single process and return the results as a dataframe.

See run_parallel for more information.

source

Specifying information to be recorded

By default, only the discounted rewards from each simulation are recorded, but arbitrary information can be recorded.

The run_parallel and run functions accept a function (normally specified via the do syntax) that takes the Sim object and history of the simulation and extracts relevant statistics as a named tuple. For example, if the desired characteristics are the number of steps in the simulation and the reward, run_parallel would be invoked as follows:

df = run_parallel(queue) do sim::Sim, hist::SimHistory
     return (n_steps=n_steps(hist), reward=discounted_reward(hist))
 end

These statistics are combined into a DataFrame, with each line representing a single simulation, allowing for statistical analysis. For example,

mean(df[:reward]./df[:n_steps])

would compute the average reward per step with each simulation weighted equally regardless of length.

Display

DisplaySimulator

The DisplaySimulator displays each step of a simulation in real time through a multimedia display such as a Jupyter notebook or ElectronDisplay. Specifically it uses POMDPTools.render and the built-in Julia display function to visualize each step.

Example:

using POMDPs
 using POMDPModels
 m = SimpleGridWorld()
 simulate(ds, m, RandomPolicy(m))
POMDPTools.Simulators.DisplaySimulatorType
DisplaySimulator(;kwargs...)

Create a simulator that displays each step of a simulation.

Given a POMDP or MDP model m, this simulator roughly works like

for step in stepthrough(m, ...)
     display(render(m, step))
end

Keyword Arguments

  • display::AbstractDisplay: the display to use for the first argument to the display function. If this is nothing, display(...) will be called without an AbstractDisplay argument.
  • render_kwargs::NamedTuple: keyword arguments for POMDPTools.render(...)
  • max_fps::Number=10: maximum number of frames to be displayed per second - sleep will be used to skip extra time, so this is not designed for high precision
  • predisplay::Function: function to call before every call to display(...). The only argument to this function will be the display (if it is specified) or nothing
  • extra_initial::Bool=false: if true, display an extra step at the beginning with only elements t, sp, and bp for POMDPs (this can be useful to see the initial state if render displays only sp and not s).
  • extra_final::Bool=true: if true, display an extra step at the end with only elements t, done, s, and b for POMDPs (this can be useful to see the final state if render displays only s and not sp).
  • max_steps::Integer: maximum number of steps to run for
  • spec::NTuple{Symbol}: specification of what step elements to display (see eachstep)
  • rng::AbstractRNG: random number generator

See the POMDPSimulators documentation for more tips about using specific displays.

source

Display-specific tips

The following tips may be helpful when using particular displays.

Jupyter notebooks

By default, in a Jupyter notebook, the visualizations of all steps are displayed in the output box one after another. To make the output animated instead, where the image is overwritten at each step, one may use

DisplaySimulator(predisplay=(d)->IJulia.clear_output(true))

ElectronDisplay

By default, ElectronDisplay will open a new window for each new step. To prevent this, use

ElectronDisplay.CONFIG.single_window = true

Testing · POMDPs.jl

Testing

POMDPTools contains basic utilities for testing models and solvers.

Testing (PO)MDP Models

POMDPTools.Testing.has_consistent_distributionsFunction
has_consistent_distributions(m::MDP; atol=0)
has_consistent_distributions(m::POMDP; atol=0)

Return true if no problems are found in the distributions for a discrete problem. Print information and return false if problems are found.

Tests whether

  • All probabilities are positive
  • Probabilities for all distributions sum to 1
  • All items with positive probability are in the support

Keyword Arguments

  • atol: absolute tolerance passed to approx for all probability checks
source
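
For example (a hedged sketch using TigerPOMDP from POMDPModels), this makes a cheap guard to run before handing a new discrete model to a solver:

using POMDPTools, POMDPModels

@assert has_consistent_distributions(TigerPOMDP())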
POMDPTools.Testing.has_consistent_initial_distributionFunction
has_consistent_initial_distribution(m; atol=0)

Return true if no problems are found with the initial state distribution for a discrete problem. Print information and return false if problems are found.

See has_consistent_distributions for information on what checks are performed.

source
POMDPTools.Testing.has_consistent_transition_distributionsFunction
has_consistent_transition_distributions(m; atol=0)

Return true if no problems are found in the transition distributions for a discrete problem. Print information and return false if problems are found.

See has_consistent_distributions for information on what checks are performed.

source
POMDPTools.Testing.has_consistent_observation_distributionsFunction
has_consistent_observation_distributions(m; atol=0)

Return true if no problems are found in the observation distributions for a discrete POMDP. Print information and return false if problems are found.

See has_consistent_distributions for information on what checks are performed.

source

Testing Solvers

POMDPTools.Testing.test_solverFunction
test_solver(solver::Solver, problem::POMDP)
 test_solver(solver::Solver, problem::MDP)

Use the solver to solve the specified problem, then run a simulation.

This is designed to illustrate how solvers are expected to function. All solvers should be able to complete this standard test with the simple models in the POMDPModels package.

Note that this does NOT test the optimality of the solution, but is only a smoke test to see if the solver interacts with POMDP models as expected.

To run this with a solver called YourSolver, run

using POMDPToolbox
 using POMDPModels
 
 solver = YourSolver(# initialize with parameters #)
test_solver(solver, BabyPOMDP())
source

Visualization · POMDPs.jl

Visualization

POMDPTools contains a basic visualization interface consisting of the render function.

Problem writers should implement a method of this function so that their problem can be visualized in a variety of contexts, including Jupyter notebooks and web browsers, or saved as images or animations.

POMDPTools.ModelTools.renderFunction
render(m::Union{MDP,POMDP}, step::NamedTuple)

Return a renderable representation of the step in problem m.

The renderable representation may be anything that has show(io, mime, x) methods. It could be a plot, svg, Compose.jl context, Cairo context, or image.

Arguments

step is a NamedTuple that contains the states, action, etc. corresponding to one transition in a simulation. It may have the following fields:

  • t: the time step index
  • s: the state at the beginning of the step
  • a: the action
  • sp: the state at the end of the step (s')
  • r: the reward for the step
  • o: the observation
  • b: the belief at the beginning of the step
  • bp: the belief at the end of the step
  • i: info from the model when the state transition was calculated
  • ai: info from the policy decision
  • ui: info from the belief update

Keyword arguments are reserved for the problem implementer and can be used to control appearance, etc.

Important Notes

  • step may not contain all of the elements listed above, so render should check for them and render only what is available
  • o typically corresponds to sp, so it is often clearer for POMDPs to render sp rather than s.
source
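
A minimal sketch of a render method for a hypothetical problem type (MyGridProblem and the step contents below are assumptions for illustration); it returns a plain String, which already has the required show methods, and it only uses the step elements that are present:

using POMDPs, POMDPTools

struct MyGridProblem <: MDP{Tuple{Int,Int}, Symbol} end   # stand-in problem type

function POMDPTools.render(m::MyGridProblem, step::NamedTuple)
    lines = String[]
    haskey(step, :a)  && push!(lines, "action: $(step.a)")
    haskey(step, :sp) && push!(lines, "next state: $(step.sp)")
    haskey(step, :r)  && push!(lines, "reward: $(step.r)")
    return join(lines, "\n")
end

render(MyGridProblem(), (a=:up, sp=(2,1), r=-1.0))   # "action: up\nnext state: (2, 1)\nreward: -1.0"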

Sometimes it is important to have control over how the problem is rendered with different mimetypes. One way to handle this is to have render return a custom type, e.g.

struct MyProblemVisualization
     mdp::MyProblem
     step::NamedTuple
 end
 
POMDPTools.render(mdp, step) = MyProblemVisualization(mdp, step)

and then implement custom show methods, e.g.

show(io::IO, mime::MIME"text/html", v::MyProblemVisualization)

API Documentation · POMDPs.jl

API Documentation

Docstrings for POMDPs.jl interface members can be accessed through Julia's built-in documentation system or in the list below.

Contents

Index

Types

POMDPs.POMDPType
POMDP{S,A,O}

Abstract base type for a partially observable Markov decision process.

S: state type
 A: action type
O: observation type
source
POMDPs.MDPType
MDP{S,A}

Abstract base type for a fully observable Markov decision process.

S: state type
A: action type
source
POMDPs.PolicyType

Base type for a policy (a map from every possible belief, or more abstract policy state, to an optimal or suboptimal action)

source
POMDPs.UpdaterType

Abstract type for an object that defines how the belief should be updated

A belief is a general construct that represents the knowledge an agent has about the state of the system. This can be a probability distribution, an action observation history or a more general representation.

source

Model Functions

Dynamics

POMDPs.transitionFunction
transition(m::POMDP, state, action)
transition(m::MDP, state, action)

Return the transition distribution from the current state-action pair.

If it is difficult to define the probability density or mass function explicitly, consider using POMDPModelTools.ImplicitDistribution to define a generative model.

source
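
As an illustrative, hedged sketch (the model, its states, and its actions are invented here), an explicit transition distribution can be returned with SparseCat from POMDPTools:

using POMDPs, POMDPTools

struct TwoStateMDP <: MDP{Int, Int} end   # toy model: states 1 and 2, actions 1 and 2

# the chosen action succeeds with probability 0.8; otherwise the process ends up in the other state
POMDPs.transition(m::TwoStateMDP, s::Int, a::Int) = SparseCat([a, 3 - a], [0.8, 0.2])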
POMDPs.observationFunction
observation(m::POMDP, statep)
 observation(m::POMDP, action, statep)
 observation(m::POMDP, state, action, statep)

Return the observation distribution. You need only define the method with the fewest arguments needed to determine the observation distribution.

If it is difficult to define the probability density or mass function explicitly, consider using POMDPModelTools.ImplicitDistribution to define a generative model.

Example

using POMDPModelTools # for SparseCat
 
 struct MyPOMDP <: POMDP{Int, Int, Int} end
 
observation(p::MyPOMDP, sp::Int) = SparseCat([sp-1, sp, sp+1], [0.1, 0.8, 0.1])
source
POMDPs.rewardFunction
reward(m::POMDP, s, a)
 reward(m::MDP, s, a)

Return the immediate reward for the s-a pair.

reward(m::POMDP, s, a, sp)
reward(m::MDP, s, a, sp)

Return the immediate reward for the s-a-s' triple

reward(m::POMDP, s, a, sp, o)

Return the immediate reward for the s-a-s'-o quad

For some problems, it is easier to express reward(m, s, a, sp) or reward(m, s, a, sp, o) than reward(m, s, a), but some solvers, e.g. SARSOP, can only use reward(m, s, a). Both can be implemented for a problem, but when reward(m, s, a) is implemented, it should be consistent with reward(m, s, a, sp[, o]), that is, it should be the expected value over all destination states and observations.

source
POMDPs.genFunction
gen(m::Union{MDP,POMDP}, s, a, rng::AbstractRNG)

Function for implementing the entire MDP/POMDP generative model by returning a NamedTuple.

gen should only be implemented in the case where two or more of the next state, observation, and reward need to be generated at the same time. If the state transition model can be separated from the reward and observation models, you should implement transition with an ImplicitDistribution instead of gen.

Solver and simulator writers should use the @gen macro to call a generative model.

Arguments

  • m: an MDP or POMDP model
  • s: the current state
  • a: the action
  • rng: a random number generator (Typically a MersenneTwister)

Return

The function should return a NamedTuple with a subset of the following entries:

MDP

  • sp: the next state
  • r: the reward for the step
  • info: extra debugging information, typically in an associative container like a NamedTuple

POMDP

  • sp: the next state
  • o: the observation
  • r: the reward for the step
  • info: extra debugging information, typically in an associative container like a NamedTuple

Some elements can be left out. For instance if o is left out of the return, the problem-writer can also implement observation and POMDPs.jl will automatically use it when needed.

Example

struct LQRMDP <: MDP{Float64, Float64} end
 
POMDPs.gen(m::LQRMDP, s, a, rng) = (sp = s + a + randn(rng), r = -s^2 - a^2)
source
POMDPs.@genMacro
@gen(X)(m, s, a)
@gen(X)(m, s, a, rng::AbstractRNG)

Call the generative model for a (PO)MDP m; sample values from several nodes in the dynamic decision network. X is one or more symbols indicating which nodes to output.

Solvers and simulators should call this rather than the gen function. Problem writers should implement a method of the transition or gen function instead of altering @gen.

Arguments

  • m: an MDP or POMDP model
  • s: the current state
  • a: the action
  • rng (optional): a random number generator (Typically a MersenneTwister)

Return

If X is a symbol, return a value sampled from the corresponding node. If X is several symbols, return a Tuple of values sampled from the specified nodes.

Examples

Let m be an MDP or POMDP, s be a state of m, a be an action of m, and rng be an AbstractRNG.

  • @gen(:sp, :r)(m, s, a) returns a Tuple containing the next state and reward.
  • @gen(:sp, :o, :r)(m, s, a, rng) returns a Tuple containing the next state, observation, and reward.
  • @gen(:sp)(m, s, a, rng) returns the next state.
source
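
Continuing the gen example above, a hedged usage sketch of the macro (LQRMDP and its gen method are the ones defined in the previous docstring's example):

using POMDPs, Random

m = LQRMDP()
rng = Random.default_rng()
sp, r = @gen(:sp, :r)(m, 0.0, 1.0, rng)   # sample a next state and reward from the generative model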

Static Properties

POMDPs.statesFunction
states(problem::POMDP)
states(problem::MDP)

Returns the complete state space of a POMDP or MDP.

source
POMDPs.actionsFunction
actions(m::Union{MDP,POMDP})

Returns the entire action space of a (PO)MDP.


actions(m::Union{MDP,POMDP}, s)

Return the actions that can be taken from state s.


actions(m::POMDP, b)

Return the actions that can be taken from belief b.

To implement an observation-dependent action space, use currentobs(b) to get the observation associated with belief b within the implementation of actions(m, b).

source
POMDPs.isterminalFunction
isterminal(m::Union{MDP,POMDP}, s)

Check if state s is terminal.

If a state is terminal, no actions will be taken in it and no additional rewards will be accumulated. Thus, the value function at such a state is, by definition, zero.

source
POMDPs.discountFunction
discount(m::POMDP)
discount(m::MDP)

Return the discount factor for the problem.

source
POMDPs.initialstateFunction
initialstate(m::Union{POMDP,MDP})

Return a distribution of initial states for (PO)MDP m.

If it is difficult to define the probability density or mass function explicitly, consider using POMDPModelTools.ImplicitDistribution to define a model for sampling.

source
POMDPs.initialobsFunction
initialobs(m::POMDP, s)

Return a distribution of initial observations for POMDP m and state s.

If it is difficult to define the probability density or mass function explicitly, consider using POMDPModelTools.ImplicitDistribution to define a model for sampling.

This function is only used in cases where the policy expects an initial observation rather than an initial belief, e.g. in a reinforcement learning setting. It is not used in a standard POMDP simulation.

source
POMDPs.stateindexFunction
stateindex(problem::POMDP, s)
stateindex(problem::MDP, s)

Return the integer index of state s. Used for discrete models only.

source
POMDPs.actionindexFunction
actionindex(problem::POMDP, a)
actionindex(problem::MDP, a)

Return the integer index of action a. Used for discrete models only.

source
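
A hedged sketch filling in these discrete-model properties for the toy TwoStateMDP introduced under transition above (Deterministic is the simple distribution provided by POMDPTools):

POMDPs.states(m::TwoStateMDP) = 1:2
POMDPs.actions(m::TwoStateMDP) = 1:2
POMDPs.discount(m::TwoStateMDP) = 0.95
POMDPs.initialstate(m::TwoStateMDP) = Deterministic(1)
POMDPs.stateindex(m::TwoStateMDP, s::Int) = s
POMDPs.actionindex(m::TwoStateMDP, a::Int) = a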
POMDPs.obsindexFunction
obsindex(problem::POMDP, o)

Return the integer index of observation o. Used for discrete models only.

source
POMDPs.convert_sFunction
convert_s(::Type{V}, s, problem::Union{MDP,POMDP}) where V<:AbstractArray
convert_s(::Type{S}, vec::V, problem::Union{MDP,POMDP}) where {S,V<:AbstractArray}

Convert a state to vectorized form or vice versa.

source
POMDPs.convert_aFunction
convert_a(::Type{V}, a, problem::Union{MDP,POMDP}) where V<:AbstractArray
convert_a(::Type{A}, vec::V, problem::Union{MDP,POMDP}) where {A,V<:AbstractArray}

Convert an action to vectorized form or vice versa.

source
POMDPs.convert_oFunction
convert_o(::Type{V}, o, problem::Union{MDP,POMDP}) where V<:AbstractArray
convert_o(::Type{O}, vec::V, problem::Union{MDP,POMDP}) where {O,V<:AbstractArray}

Convert an observation to vectorized form or vice versa.

source
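
A hedged sketch of the state conversions for the toy TwoStateMDP above (convert_a and convert_o follow the same pattern): the forward method produces a one-hot vector and the reverse method inverts it.

POMDPs.convert_s(::Type{Vector{Float32}}, s::Int, m::TwoStateMDP) = Float32[s == 1, s == 2]
POMDPs.convert_s(::Type{Int}, v::AbstractVector, m::TwoStateMDP) = v[1] > 0.5 ? 1 : 2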

Type Inference

POMDPs.statetypeFunction
statetype(t::Type)
 statetype(p::Union{POMDP,MDP})

Return the state type for a problem type (the S in POMDP{S,A,O}).

type A <: POMDP{Int, Bool, Bool} end
 
statetype(A) # returns Int
source
POMDPs.actiontypeFunction
actiontype(t::Type)
 actiontype(p::Union{POMDP,MDP})

Return the action type for a problem type (the A in POMDP{S,A,O}).

type A <: POMDP{Bool, Int, Bool} end
 
actiontype(A) # returns Int
source
POMDPs.obstypeFunction
obstype(t::Type)

Return the observation type for a problem type (the O in POMDP{S,A,O}).

type A <: POMDP{Bool, Bool, Int} end
 
obstype(A) # returns Int
source

Distributions and Spaces

Base.randFunction
rand(rng::AbstractRNG, d::Any)

Return a random element from distribution or space d.

If d is a state or transition distribution, the sample will be a state; if d is an action distribution, the sample will be an action or if d is an observation distribution, the sample will be an observation.

source
Distributions.pdfFunction
pdf(d::Any, x::Any)

Evaluate the probability density of distribution d at sample x.

source
Distributions.supportFunction
support(d::Any)

Return an iterable object containing the possible values that can be sampled from distribution d. Values with zero probability may be skipped.

source

Belief Functions

POMDPs.updateFunction
update(updater::Updater, belief_old, action, observation)

Return a new instance of an updated belief given belief_old and the latest action and observation.

source
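
A hedged sketch of a single update step, using BabyPOMDP from POMDPModels and the DiscreteUpdater from POMDPTools (initialize_belief is documented just below):

using POMDPs, POMDPTools, POMDPModels

m = BabyPOMDP()
up = DiscreteUpdater(m)
b0 = initialize_belief(up, initialstate(m))
b1 = update(up, b0, true, false)   # belief after feeding (action true) and observing no crying (false)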

POMDPs.initialize_beliefFunction
initialize_belief(updater::Updater,
                      state_distribution::Any)
POMDPs.SimulatorType

Base type for an object defining how simulations should be carried out.

source
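
As a sketch of what a custom Simulator might look like, the hypothetical simulator below steps an MDP forward using the explicit transition interface and returns only the number of steps taken. It assumes the problem implements transition, isterminal, and initialstate, and it is not a substitute for the simulators provided by POMDPTools.

using POMDPs
using Random

# Hypothetical simulator type for illustration
struct StepCountSimulator <: Simulator
    max_steps::Int
end

function POMDPs.simulate(sim::StepCountSimulator, m::MDP, p::Policy,
                         s0=rand(initialstate(m)))
    s = s0
    steps = 0
    while !isterminal(m, s) && steps < sim.max_steps
        a = action(p, s)               # ask the policy for an action
        s = rand(transition(m, s, a))  # sample the next state
        steps += 1
    end
    return steps                       # this simulator's flexible return type is a step count
end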
diff --git a/dev/concepts/index.html b/dev/concepts/index.html index 45a88f97..f2f0ff95 100644 --- a/dev/concepts/index.html +++ b/dev/concepts/index.html @@ -1,2 +1,2 @@ -Concepts and Architecture · POMDPs.jl

Concepts and Architecture

POMDPs.jl aims to coordinate the development of three software components: 1) a problem, 2) a solver, 3) an experiment. Each of these components has a set of abstract types associated with it and a set of functions that allow a user to define each component's behavior in a standardized way. An outline of the architecture is shown below.

[figure: concepts]

The MDP and POMDP types are associated with the problem definition. The Solver and Policy types are associated with the solver or decision-making agent. Typically, the Updater type is also associated with the solver, but a solver may sometimes be used with an updater that was implemented separately. The Simulator type is associated with the experiment.

The code components of the POMDPs.jl ecosystem relevant to problems and solvers are shown below. The arrows represent the flow of information from the problems to the solvers. The figure shows the two interfaces that form POMDPs.jl - Explicit and Generative. Details about these interfaces can be found in the section on Defining POMDPs.

[figure: interface_relationships]

POMDPs and MDPs

An MDP is a mathematical framework for sequential decision making under uncertainty, where all of the uncertainty arises from outcomes that are partially random and partially under the control of a decision maker. Mathematically, an MDP is a tuple $(S,A,T,R,\gamma)$, where $S$ is the state space, $A$ is the action space, $T$ is a transition function defining the probability of transitioning to each state given the state and action at the previous time, and $R$ is a reward function mapping every possible transition $(s,a,s')$ to a real reward value. Finally, $\gamma$ is a discount factor that defines the relative weighting of current and future rewards. For more information see a textbook such as [1]. In POMDPs.jl an MDP is represented by a concrete subtype of the MDP abstract type and a set of methods that define each of its components as described in the problem definition section.

A POMDP is a more general sequential decision making problem in which the agent is not sure what state they are in. The state is only partially observable by the decision making agent. Mathematically, a POMDP is a tuple $(S,A,T,R,O,Z,\gamma)$ where $S$, $A$, $T$, $R$, and $\gamma$ have the same meaning as in an MDP, $Z$ is the agent's observation space, and $O$ defines the probability of receiving each observation at a transition. In POMDPs.jl, a POMDP is represented by a concrete subtype of the POMDP abstract type, and the methods described in the problem definition section.

POMDPs.jl contains additional functions for defining optional problem behavior such as an initial state distribution or terminal states. More information can be found in the Defining POMDPs section.

Beliefs and Updaters

In a POMDP domain, the decision-making agent does not have complete information about the state of the problem, so the agent can only make choices based on its "belief" about the state. In the POMDP literature, the term "belief" is typically defined to mean a probability distribution over all possible states of the system. However, in practice, the agent often makes decisions based on an incomplete or lossy record of past observations that has a structure much different from a probability distribution. For example, if the agent is represented by a finite-state controller, as is the case for Monte-Carlo Value Iteration [2], the belief is the controller state, which is a node in a graph. Another example is an agent represented by a recurrent neural network. In this case, the agent's belief is the state of the network. In order to accommodate a wide variety of decision-making approaches in POMDPs.jl, we use the term "belief" to denote the set of information that the agent makes a decision on, which could be an exact state distribution, an action-observation history, a set of weighted particles, or the examples mentioned before. In code, the belief can be represented by any built-in or user-defined type.

When an action is taken and a new observation is received, the belief is updated by the belief updater. In code, a belief updater is represented by a concrete subtype of the Updater abstract type, and the update(updater, belief, action, observation) function defines how the belief is updated when a new observation is received.

Although the agent may use a specialized belief structure to make decisions, the information initially given to the agent about the state of the problem is usually most conveniently represented as a state distribution, thus the initialize_belief function is provided to convert a state distribution to a specialized belief structure that an updater can work with.

In many cases, the belief structure is closely related to the solution technique, so it will be implemented by the programmer who writes the solver. In other cases, the agent can use a variety of belief structures to make decisions, so a domain-specific updater implemented by the programmer that wrote the problem description may be appropriate. Finally, some advanced generic belief updaters such as particle filters may be implemented by a third party. The convenience function updater(policy) can be used to get a suitable default updater for a policy; however, many policies can work with other updaters.
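
A short sketch of how these pieces fit together, using the BabyPOMDP model and the generic DiscreteUpdater from POMDPTools as stand-ins; the always-feed FunctionPolicy and the hard-coded observation are purely illustrative.

using POMDPs
using POMDPModels   # BabyPOMDP
using POMDPTools    # DiscreteUpdater, FunctionPolicy

m = BabyPOMDP()
policy = FunctionPolicy(b -> true)            # illustrative policy: always feed
up = DiscreteUpdater(m)                       # a generic discrete Bayesian filter
b = initialize_belief(up, initialstate(m))    # convert the state distribution to the updater's belief type
a = action(policy, b)                         # choose an action from the belief
o = false                                     # suppose the baby is observed not crying
b = update(up, b, a, o)                       # belief after acting and observing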

For more information on implementing a belief updater, see Defining a Belief Updater.

Solvers and Policies

Sequential decision making under uncertainty involves both online and offline calculations. In the broad sense, the term "solver" as used in the node in the figure at the top of the page refers to the software package that performs the calculations at both of these times. However, the code is broken up into two pieces, the solver that performs calculations offline and the policy that performs calculations online.

In the abstract, a policy is a mapping from every belief that an agent might hold to an action. A policy is represented in code by a concrete subtype of the Policy abstract type. The programmer implements action to describe what computations need to be done online. For an online solver such as POMCP, all of the decision computation occurs within action, while for an offline solver like SARSOP, there is very little computation within action. See Interacting with Policies for more information.

The offline portion of the computation is carried out by the solver, which is represented by a concrete subtype of the Solver abstract type. Computations occur within the solve function. For an offline solver like SARSOP, nearly all of the decision computation occurs within this function, but for some online solvers such as POMCP, solve merely embeds the problem in the policy.

Simulators

A simulator defines a way to run one or more simulations. It is represented by a concrete subtype of the Simulator abstract type, and the simulation is an implementation of simulate. Depending on the simulator, simulate may return a variety of data about the simulation, such as the discounted reward or the state history. All simulators should perform simulations consistent with the Simulation Standard.

[1] Decision Making Under Uncertainty: Theory and Application by Mykel J. Kochenderfer, MIT Press, 2015

[2] Bai, H., Hsu, D., & Lee, W. S. (2014). Integrated perception and planning in the continuous space: A POMDP approach. The International Journal of Robotics Research, 33(9), 1288-1302

diff --git a/dev/def_pomdp/index.html b/dev/def_pomdp/index.html index ebd50ac1..379c00e1 100644 --- a/dev/def_pomdp/index.html +++ b/dev/def_pomdp/index.html @@ -196,4 +196,4 @@ R = [-1. -100. 10.; -1. 10. -100.] -m = TabularPOMDP(T, R, O, 0.95)

Here T is a $|S| \times |A| \times |S|$ array representing the transition probabilities, with T[sp, a, s] $= T(s' | s, a)$. Similarly, O is an $|O| \times |A| \times |S|$ array encoding the observation distribution with O[o, a, sp] $= Z(o | a, s')$, and R is a $|S| \times |A|$ matrix that encodes the reward function. The final argument, 0.95, is the discount factor.
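
As a quick sanity check of these conventions (assuming T and O are the arrays constructed just above), every column of probabilities should sum to one:

# T[:, a, s] is the distribution over next states for (s, a);
# O[:, a, sp] is the distribution over observations for (a, sp).
@assert size(T, 1) == size(T, 3)                                      # |S| x |A| x |S|
@assert all(sum(T[:, a, s]) ≈ 1.0 for a in axes(T, 2), s in axes(T, 3))
@assert all(sum(O[:, a, sp]) ≈ 1.0 for a in axes(O, 2), sp in axes(O, 3))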

diff --git a/dev/def_solver/index.html b/dev/def_solver/index.html index 672b4476..54b8b5c6 100644 --- a/dev/def_solver/index.html +++ b/dev/def_solver/index.html @@ -1,2 +1,2 @@ -Solvers · POMDPs.jl

Solvers

Defining a solver involves creating or using four pieces of code:

  1. A subtype of Solver that holds the parameters and configuration options for the solver.
  2. A subtype of Policy that holds all of the data needed to choose actions online.
  3. A method of solve that takes the Solver and a (PO)MDP as arguments, performs all of the offline computations for solving the problem, and returns the policy.
  4. A method of action that takes in the policy and a state or belief and returns an action.

In many cases, items 2 and 4 can be satisfied with an off-the-shelf Policy from the POMDPTools package, which also contains many tools that are useful for defining solvers in a robust, concise, and readable manner.
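
For instance, the following hypothetical solver covers pieces 1 and 3 itself and reuses POMDPTools' FunctionPolicy for pieces 2 and 4; the solver and its trivial "algorithm" are invented purely for illustration.

using POMDPs
using POMDPTools   # provides FunctionPolicy

# Piece 1: a Solver subtype holding configuration (none needed here)
struct FirstActionSolver <: Solver end

# Piece 3: solve does the "offline" work and returns a policy
function POMDPs.solve(::FirstActionSolver, m::Union{MDP,POMDP})
    a = first(actions(m))          # trivial offline computation: pick one action
    return FunctionPolicy(x -> a)  # pieces 2 and 4 come from the off-the-shelf FunctionPolicy
end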

Online and Offline Solvers

Generally, solvers can be grouped into two categories: Offline solvers that do most of their computational work before interacting with the environment, and online solvers that do their work online as each new state or observation is encountered. Although offline and online solvers both use the exact same Solver, solve, Policy, action structure, the work of defining online and offline solvers is focused on different portions.

For an offline solver, most of the implementation effort will be spent on the solve function, and an off-the-shelf policy from POMDPTools will typically be used.

For an online solver, the solve function typically does little or no work, but merely creates a Policy object that will carry out computation online. It is typical in POMDPs.jl to use the term "Planner" to name a Policy object for an online solver that carries out a large amount of computation ("planning") at interaction time. In this case most of the effort will be focused on implementing the action method for the "Planner" Policy type.
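
A skeleton of this pattern might look like the following; the MyOnlineSolver and MyPlanner names and the placeholder "planning" step are hypothetical.

using POMDPs

struct MyOnlineSolver <: Solver
    n_iterations::Int             # configuration carried into the planner
end

struct MyPlanner{M<:POMDP} <: Policy
    m::M
    n_iterations::Int
end

# solve does almost nothing: it just embeds the problem in the planner
POMDPs.solve(sol::MyOnlineSolver, m::POMDP) = MyPlanner(m, sol.n_iterations)

function POMDPs.action(p::MyPlanner, b)
    # All of the planning work happens here at interaction time,
    # e.g. running p.n_iterations of tree search rooted at belief b.
    return first(actions(p.m))    # placeholder result
end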

Examples

Solver implementation is most clearly explained through examples. The following sections contain examples of both online and offline solver definitions:

diff --git a/dev/def_updater/index.html b/dev/def_updater/index.html index f381da14..44b7edf4 100644 --- a/dev/def_updater/index.html +++ b/dev/def_updater/index.html @@ -29,4 +29,4 @@ b = Any[POMDPModels.BoolDistribution(0.0), false, false] b = Any[POMDPModels.BoolDistribution(0.0), false, false, false, false] b = Any[POMDPModels.BoolDistribution(0.0), false, false, false, false, true, false] -b = Any[POMDPModels.BoolDistribution(0.0), false, false, false, false, true, false, true, false] +b = Any[POMDPModels.BoolDistribution(0.0), false, false, false, false, true, false, true, false] diff --git a/dev/faq/index.html b/dev/faq/index.html index ddc2aceb..1493a5e1 100644 --- a/dev/faq/index.html +++ b/dev/faq/index.html @@ -14,4 +14,4 @@ end end -POMDPs.reward(m, s, a) = rdict[(s, a)]

Why do I need to put type assertions pomdp::POMDP into the function signature?

Specifying the type in your function signature allows Julia to call the appropriate function when your custom type is passed into it. For example, if a POMDPs.jl solver calls states on the POMDP that you passed into it, the correct states function will only get dispatched if you specified that the states function you wrote works with your POMDP type. Because Julia supports multiple dispatch, these type assertions are a way of doing object-oriented programming in Julia.
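
For example, a minimal sketch (with a hypothetical problem type) of how the annotation drives dispatch:

using POMDPs

struct MyPOMDP <: POMDP{Int,Int,Int} end   # hypothetical state/action/observation types

# Because of the ::MyPOMDP annotation, a solver calling states(m) on an
# instance of MyPOMDP dispatches to this method instead of a generic fallback.
POMDPs.states(::MyPOMDP) = 1:3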

diff --git a/dev/get_started/index.html b/dev/get_started/index.html index 08fbd757..f85e6535 100644 --- a/dev/get_started/index.html +++ b/dev/get_started/index.html @@ -13,4 +13,4 @@ init_dist = initialstate(pomdp) # from POMDPModels hr = HistoryRecorder(max_steps=100) # from POMDPTools hist = simulate(hr, pomdp, policy, belief_updater, init_dist) # run 100 step simulation -println("reward: $(discounted_reward(hist))")

The first part of the code loads the desired packages and initializes the problem and the solver. Next, we compute a POMDP policy. Lastly, we evaluate the results.

There are a few things to mention here. First, the TigerPOMDP type implements all the functions required by QMDPSolver to compute a policy. Second, each policy has a default updater (essentially a filter used to update the belief of the POMDP). To learn more about Updaters, check out the Concepts section.

diff --git a/dev/index.html b/dev/index.html index e3153461..d4a3a402 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1,2 +1,2 @@ -POMDPs.jl · POMDPs.jl

POMDPs.jl

A Julia interface for defining, solving and simulating partially observable Markov decision processes and their fully observable counterparts.

Package and Ecosystem Features

  • General interface that can handle problems with discrete and continuous state/action/observation spaces
  • A number of popular state-of-the-art solvers implemented for use out-of-the-box
  • Tools that make it easy to define problems and simulate solutions
  • Simple integration of custom solvers into the existing interface

Available Packages

The POMDPs.jl package contains only the interface used for expressing and solving Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). The POMDPTools package acts as a "standard library" for the POMDPs.jl interface, providing implementations of commonly-used components such as policies, belief updaters, distributions, and simulators. The list of solver and support packages maintained by the JuliaPOMDP community is available at the POMDPs.jl Readme.

Documentation Outline

Documentation comes in three forms:

  1. An explanatory guide is available in the sections outlined below.
  2. How-to examples are available in pages in this document with "Example" in the title and in the POMDPExamples package.
  3. Reference docstrings for the entire POMDPs.jl interface are available in the API Documentation section.
Note

When updating these documents, make sure this is synced with docs/make.jl!!

Basics

Defining POMDP Models

Writing Solvers and Updaters

Analyzing Results

POMDPTools - the standard library for POMDPs.jl

Reference

diff --git a/dev/install/index.html b/dev/install/index.html index 63aa9d57..09b5190e 100644 --- a/dev/install/index.html +++ b/dev/install/index.html @@ -1,3 +1,3 @@ Installation · POMDPs.jl

Installation

If you have a running Julia distribution (Julia 0.4 or greater), you have everything you need to install POMDPs.jl. To install the package, simply run the following from the Julia REPL:

import Pkg
Pkg.add("POMDPs") # installs the POMDPs.jl package

Some auxiliary packages and older versions of solvers may be found in the JuliaPOMDP registry. To install this registry, run:

using Pkg; pkg"registry add https://github.com/JuliaPOMDP/Registry"

Note: to use this registry, JuliaPro users must also run edit(normpath(Sys.BINDIR,"..","etc","julia","startup.jl")), comment out the line ENV["DISABLE_FALLBACK"] = "true", save the file, and restart JuliaPro as described in this issue.

diff --git a/dev/interfaces/index.html b/dev/interfaces/index.html index 40621cbf..60dff5ab 100644 --- a/dev/interfaces/index.html +++ b/dev/interfaces/index.html @@ -1,2 +1,2 @@ -Spaces and Distributions · POMDPs.jl

Spaces and Distributions

Two important components of the definitions of MDPs and POMDPs are spaces, which specify the possible states, actions, and observations in a problem, and distributions, which define probability distributions. In order to provide for maximum flexibility, spaces and distributions may be of any type (i.e. there are no abstract base types). Solvers and simulators will interact with space and distribution types using the functions defined below.

Spaces

A space object should contain the information needed to define the set of all possible states, actions or observations. The implementation will depend on the attributes of the elements. For example, if the space is continuous, the space object may only contain the limits of the continuous range. In the case of a discrete problem, a vector containing all states is appropriate for representing a space.
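
For example, a discrete problem might simply return a vector of all elements as its space objects; the GridMDP type below is hypothetical and exists only to illustrate the idea.

using POMDPs

struct GridMDP <: MDP{Tuple{Int,Int},Symbol} end   # hypothetical 3x3 grid problem

# A plain vector of all states serves as the state space object
POMDPs.states(::GridMDP) = vec([(x, y) for x in 1:3, y in 1:3])
POMDPs.actions(::GridMDP) = [:up, :down, :left, :right]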

The following functions may be called on a space object (Click on a function to read its documentation):

Distributions

A distribution object represents a probability distribution.

The following functions may be called on a distribution object (Click on a function to read its documentation):

You can find some useful pre-made distribution objects in Distributions.jl or POMDPTools.

  • [1] Distributions should support both rand(rng::AbstractRNG, d) and rand(d). The recommended way to do this is by implementing Base.rand(rng::AbstractRNG, s::Random.SamplerTrivial{<:YourDistribution}) from the Julia rand interface.
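
A minimal sketch of that pattern for a hypothetical distribution type is shown below. The pdf and support methods are extended through the names POMDPs exports, which is an assumption about typical usage rather than a required spelling.

using POMDPs
using Random

# Hypothetical discrete uniform distribution over an integer range
struct MyUniformInt
    lo::Int
    hi::Int
end

# Implementing the sampler form provides both rand(rng, d) and rand(d)
function Base.rand(rng::Random.AbstractRNG, s::Random.SamplerTrivial{MyUniformInt})
    d = s[]                      # the wrapped distribution object
    return rand(rng, d.lo:d.hi)
end

POMDPs.pdf(d::MyUniformInt, x::Int) = d.lo <= x <= d.hi ? 1/(d.hi - d.lo + 1) : 0.0
POMDPs.support(d::MyUniformInt) = d.lo:d.hi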
diff --git a/dev/offline_solver/index.html b/dev/offline_solver/index.html index ac5ddfd3..11e07f36 100644 --- a/dev/offline_solver/index.html +++ b/dev/offline_solver/index.html @@ -70,4 +70,4 @@ @assert action(policy, Deterministic(TIGER_LEFT)) == TIGER_OPEN_RIGHT @assert action(policy, Deterministic(TIGER_RIGHT)) == TIGER_OPEN_LEFT -@assert action(policy, Uniform(states(tiger))) == TIGER_LISTEN +@assert action(policy, Uniform(states(tiger))) == TIGER_LISTEN diff --git a/dev/online_solver/index.html b/dev/online_solver/index.html index 6486f7e6..cbc17176 100644 --- a/dev/online_solver/index.html +++ b/dev/online_solver/index.html @@ -56,4 +56,4 @@ @assert action(planner, Deterministic(TIGER_LEFT)) == TIGER_OPEN_RIGHT @assert action(planner, Deterministic(TIGER_RIGHT)) == TIGER_OPEN_LEFT -# note action(planner, Uniform(states(tiger))) is not very reliable with this number of samples +# note action(planner, Uniform(states(tiger))) is not very reliable with this number of samples diff --git a/dev/policy_interaction/index.html b/dev/policy_interaction/index.html index 8e0c6a2f..66e67478 100644 --- a/dev/policy_interaction/index.html +++ b/dev/policy_interaction/index.html @@ -1,2 +1,2 @@ -Interacting with Policies · POMDPs.jl

Interacting with Policies

A solution to a POMDP is a policy that maps beliefs or action-observation histories to actions. In POMDPs.jl, these are represented by Policy objects. See Solvers and Policies for more information about what a policy can represent in general.

One common task in evaluating POMDP solutions is examining the policies themselves. Since the internal representation of a policy is an esoteric implementation detail, it is best to interact with policies through the action and value interface functions. There are three relevant methods:

  • action(policy, s) returns the best action (or one of the best) for the given state or belief.
  • value(policy, s) returns the expected sum of future rewards if the policy is executed.
  • value(policy, s, a) returns the "Q-value", that is, the expected sum of rewards if action a is taken on the next step and then the policy is executed.

Note that the quantities returned by these functions are what the policy/solver expects to be the case after its (usually approximate) computations; they may be far from the true value if the solution is not exactly optimal.

diff --git a/dev/run_simulation/index.html b/dev/run_simulation/index.html index eafb337a..fa15e08b 100644 --- a/dev/run_simulation/index.html +++ b/dev/run_simulation/index.html @@ -1,3 +1,3 @@ Running Simulations · POMDPs.jl

Running Simulations

Running a simulation consists of two steps: creating a simulator and calling the simulate function. For example, given a POMDP or MDP model m, and a policy p, one can use the RolloutSimulator from POMDPTools to find the accumulated discounted reward from a single simulated trajectory as follows:

sim = RolloutSimulator()
r = simulate(sim, m, p)

More inputs, such as a belief updater, initial state, initial belief, etc. may be specified as arguments to simulate. See the docstring for simulate and the appropriate "Input" sections in the Simulation Standard page for more information.
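
For instance, continuing with the same m and p as above, one possible call that also fixes the maximum number of steps, the belief updater, and the initial belief might look like the following sketch; the specific keyword and argument choices are assumptions for illustration.

sim = RolloutSimulator(max_steps=50)   # cap the simulation length
up = updater(p)                        # default belief updater for the policy
b0 = initialstate(m)                   # initial state distribution used as the initial belief
r = simulate(sim, m, p, up, b0)        # the initial state s0 defaults to rand(b0)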

More examples can be found in the POMDPExamples package. A variety of simulators that return more information and interact in different ways can be found in POMDPTools.

diff --git a/dev/search/index.html b/dev/search/index.html index 5c1f2b35..5cdbf21b 100644 --- a/dev/search/index.html +++ b/dev/search/index.html @@ -1,2 +1,2 @@ -Search · POMDPs.jl

      diff --git a/dev/search_index.js b/dev/search_index.js index 734950ac..24d66844 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"POMDPTools/model/#Model-Tools","page":"Model Tools","title":"Model Tools","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"POMDPTools contains assorted tools that are not part of the core POMDPs.jl interface for working with (PO)MDP Models.","category":"page"},{"location":"POMDPTools/model/#Interface-Extensions","page":"Model Tools","title":"Interface Extensions","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"POMDPTools contains several interface extensions that provide shortcuts and standardized ways of dealing with extra data.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"Programmers should use these functions whenever possible in case optimized implementations are available, but all of the functions have default implementations based on the core POMDPs.jl interface. Thus, if the core interface is implemented, all of these functions will also be available.","category":"page"},{"location":"POMDPTools/model/#Weighted-Iteration","page":"Model Tools","title":"Weighted Iteration","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"Many solution techniques, for example value iteration, require iteration through the support of a distribution and evaluating the probability mass for each value. In some cases, looking up the probability mass is expensive, so it is more efficient to iterate through value => probability pairs. weighted_iterator provides a standard interface for this.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"weighted_iterator","category":"page"},{"location":"POMDPTools/model/#POMDPTools.POMDPDistributions.weighted_iterator","page":"Model Tools","title":"POMDPTools.POMDPDistributions.weighted_iterator","text":"weighted_iterator(d)\n\nReturn an iterator through pairs of the values and probabilities in distribution d.\n\nThis is designed to speed up value iteration. Distributions are encouraged to provide a custom optimized implementation if possible.\n\nExample\n\njulia> d = BoolDistribution(0.7)\nBoolDistribution(0.7)\n\njulia> collect(weighted_iterator(d))\n2-element Array{Pair{Bool,Float64},1}:\n true => 0.7\n false => 0.3\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#Observation-Weight","page":"Model Tools","title":"Observation Weight","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"Sometimes, e.g. in particle filtering, the relative likelihood of an observation is required in addition to a generative model, and it is often tedious to implement a custom observation distribution type. 
For this case, the shortcut function obs_weight is provided.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"obs_weight","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.obs_weight","page":"Model Tools","title":"POMDPTools.ModelTools.obs_weight","text":"obs_weight(pomdp, s, a, sp, o)\n\nReturn a weight proportional to the likelihood of receiving observation o from state sp (and a and s if they are present).\n\nThis is a useful shortcut for particle filtering so that the observation distribution does not have to be represented.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#Ordered-Spaces","page":"Model Tools","title":"Ordered Spaces","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"It is often useful to have a list of states, actions, or observations ordered consistently with the respective index function from POMDPs.jl. Since the POMDPs.jl interface does not demand that spaces be ordered consistently with index, the states, actions, and observations functions are not sufficient. Thus POMDPModelTools provides ordered_actions, ordered_states, and ordered_observations to provide this capability.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"ordered_actions\nordered_states\nordered_observations","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.ordered_actions","page":"Model Tools","title":"POMDPTools.ModelTools.ordered_actions","text":"ordered_actions(mdp)\n\nReturn an AbstractVector of actions ordered according to actionindex(mdp, a).\n\nordered_actions(mdp) will always return an AbstractVector{A} v containing all of the actions in actions(mdp) in the order such that actionindex(mdp, v[i]) == i. You may wish to override this for your problem for efficiency.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.ordered_states","page":"Model Tools","title":"POMDPTools.ModelTools.ordered_states","text":"ordered_states(mdp)\n\nReturn an AbstractVector of states ordered according to stateindex(mdp, a).\n\nordered_states(mdp) will always return a AbstractVector{A} v containing all of the states in states(mdp) in the order such that stateindex(mdp, v[i]) == i. You may wish to override this for your problem for efficiency.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.ordered_observations","page":"Model Tools","title":"POMDPTools.ModelTools.ordered_observations","text":"ordered_observations(pomdp)\n\nReturn an AbstractVector of observations ordered according to obsindex(pomdp, a).\n\nordered_observations(mdp) will always return a AbstractVector{A} v containing all of the observations in observations(pomdp) in the order such that obsindex(pomdp, v[i]) == i. You may wish to override this for your problem for efficiency.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#Info-Interface","page":"Model Tools","title":"Info Interface","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"It is often the case that useful information besides the belief, state, action, etc is generated by a function in POMDPs.jl. This information can be useful for debugging or understanding the behavior of a solver, updater, or problem. 
The info interface provides a standard way for problems, policies, solvers or updaters to output this information. The recording simulators from POMDPTools automatically record this information.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"To specify info from policies, solvers, or updaters, implement the following functions:","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"action_info\nsolve_info\nupdate_info","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.action_info","page":"Model Tools","title":"POMDPTools.ModelTools.action_info","text":"a, ai = action_info(policy, x)\n\nReturn a tuple containing the action determined by policy 'p' at state or belief 'x' and information (usually a NamedTuple, Dict or nothing) from the calculation of that action.\n\nBy default, returns nothing as info.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.solve_info","page":"Model Tools","title":"POMDPTools.ModelTools.solve_info","text":"policy, si = solve_info(solver, problem)\n\nReturn a tuple containing the policy determined by a solver and information (usually a NamedTuple, Dict or nothing) from the calculation of that policy.\n\nBy default, returns nothing as info.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.update_info","page":"Model Tools","title":"POMDPTools.ModelTools.update_info","text":"bp, i = update_info(updater, b, a, o)\n\nReturn a tuple containing the new belief and information (usually a NamedTuple, Dict or nothing) from the belief update.\n\nBy default, returns nothing as info.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#Model-Transformations","page":"Model Tools","title":"Model Transformations","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"POMDPTools contains several tools for transforming problems into other classes so that they can be used by different solvers.","category":"page"},{"location":"POMDPTools/model/#Linear-Algebra-Representations","page":"Model Tools","title":"Linear Algebra Representations","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"For some algorithms, such as value iteration, it is convenient to use vectors that contain the reward for every state, and matrices that contain the transition probabilities. These can be constructed with the following functions:","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"transition_matrices\nreward_vectors","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.transition_matrices","page":"Model Tools","title":"POMDPTools.ModelTools.transition_matrices","text":"transition_matrices(p::SparseTabularProblem)\n\nAccessor function for the transition model of a sparse tabular problem. It returns a list of sparse matrices for each action of the problem.\n\n\n\n\n\ntransition_matrices(m::Union{MDP,POMDP})\ntransition_matrices(m; sparse=true)\n\nConstruct transition matrices for (PO)MDP m.\n\nThe returned object is an associative object (usually a Dict), where the keys are actions. Each value in this object is an AbstractMatrix where the row corresponds to the state index of s and the column corresponds to the state index of s'. 
The entry in the matrix is the probability of transitioning from state s to state s'.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.reward_vectors","page":"Model Tools","title":"POMDPTools.ModelTools.reward_vectors","text":"reward_vectors(m::Union{MDP, POMDP})\n\nConstruct reward vectors for (PO)MDP m.\n\nThe returned object is an associative object (usually a Dict), where the keys are actions. Each value in this object is an AbstractVector where the index corresponds to the state index of s and the entry is the reward for that state.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#Sparse-Tabular-MDPs-and-POMDPs","page":"Model Tools","title":"Sparse Tabular MDPs and POMDPs","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"The SparseTabularMDP and SparseTabularPOMDP represents discrete problems defined using the explicit interface. The transition and observation models are represented using sparse matrices. Solver writers can leverage these data structures to write efficient vectorized code. A problem writer can define its problem using the explicit interface and it can be automatically converted to a sparse tabular representation by calling the constructors SparseTabularMDP(::MDP) or SparseTabularPOMDP(::POMDP). See the following docs to know more about the matrix representation and how to access the fields of the SparseTabular objects:","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"SparseTabularMDP\nSparseTabularPOMDP\ntransition_matrix\nreward_vector\nobservation_matrix\nreward_matrix\nobservation_matrices","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.SparseTabularPOMDP","page":"Model Tools","title":"POMDPTools.ModelTools.SparseTabularPOMDP","text":"SparseTabularPOMDP\n\nA POMDP object where states and actions are integers and the transition and observation distributions are represented by lists of sparse matrices. This data structure can be useful to exploit in vectorized algorithms to gain performance (e.g. see SparseValueIterationSolver). The recommended way to access the transition, reward, and observation matrices is through the provided accessor functions: transition_matrix, reward_vector, observation_matrix.\n\nFields\n\nT::Vector{SparseMatrixCSC{Float64, Int64}} The transition model is represented as a vector of sparse matrices (one for each action). T[a][s, sp] the probability of transition from s to sp taking action a.\nR::Array{Float64, 2} The reward is represented as a matrix where the rows are states and the columns actions: R[s, a] is the reward of taking action a in sate s.\nO::Vector{SparseMatrixCSC{Float64, Int64}} The observation model is represented as a vector of sparse matrices (one for each action). O[a][sp, o] is the probability of observing o from state sp after having taken action a.\ninitial_probs::SparseVector{Float64, Int64} Specifies the initial state distribution\nterminal_states::Set{Int64} Stores the terminal states\ndiscount::Float64 The discount factor\n\nConstructors\n\nSparseTabularPOMDP(pomdp::POMDP) : One can provide the matrices to the default constructor or one can construct a SparseTabularPOMDP from any discrete state MDP defined using the explicit interface. \n\nNote that constructing the transition and reward matrices requires to iterate over all the states and can take a while. 
To learn more information about how to define an MDP with the explicit interface please visit https://juliapomdp.github.io/POMDPs.jl/latest/explicit/ .\n\nSparseTabularPOMDP(spomdp::SparseTabularMDP; transition, reward, observation, discount) : This constructor returns a new sparse POMDP that is a copy of the original smdp except for the field specified by the keyword arguments.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.transition_matrix","page":"Model Tools","title":"POMDPTools.ModelTools.transition_matrix","text":"transition_matrix(p::SparseTabularProblem, a)\n\nAccessor function for the transition model of a sparse tabular problem. It returns a sparse matrix containing the transition probabilities when taking action a: T[s, sp] = Pr(sp | s, a).\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.reward_vector","page":"Model Tools","title":"POMDPTools.ModelTools.reward_vector","text":"reward_vector(p::SparseTabularProblem, a)\n\nAccessor function for the reward function of a sparse tabular problem. It returns a vector containing the reward for all the states when taking action a: R(s, a). The length of the return vector is equal to the number of states.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.observation_matrix","page":"Model Tools","title":"POMDPTools.ModelTools.observation_matrix","text":"observation_matrix(p::SparseTabularPOMDP, a::Int64)\n\nAccessor function for the observation model of a sparse tabular POMDP. It returns a sparse matrix containing the observation probabilities when having taken action a: O[sp, o] = Pr(o | sp, a).\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.reward_matrix","page":"Model Tools","title":"POMDPTools.ModelTools.reward_matrix","text":"reward_matrix(p::SparseTabularProblem)\n\nAccessor function for the reward matrix R[s, a] of a sparse tabular problem.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.observation_matrices","page":"Model Tools","title":"POMDPTools.ModelTools.observation_matrices","text":"observation_matrices(p::SparseTabularPOMDP)\n\nAccessor function for the observation model of a sparse tabular POMDP. 
It returns a list of sparse matrices for each action of the problem.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#Fully-Observable-POMDP","page":"Model Tools","title":"Fully Observable POMDP","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"FullyObservablePOMDP","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.FullyObservablePOMDP","page":"Model Tools","title":"POMDPTools.ModelTools.FullyObservablePOMDP","text":"FullyObservablePOMDP(mdp)\n\nTurn MDP mdp into a POMDP where the observations are the states of the MDP.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/model/#Generative-Belief-MDP","page":"Model Tools","title":"Generative Belief MDP","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"Every POMDP is an MDP on the belief space GenerativeBeliefMDP creates a generative model for that MDP.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"warning: Warning\nThe reward generated by the GenerativeBeliefMDP is the reward for a single state sampled from the belief; it is not the expected reward for that belief transition (though, in expectation, they are equivalent of course). Implementing the model with the expected reward requires a custom implementation because belief updaters do not typically deal with reward.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"GenerativeBeliefMDP","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.GenerativeBeliefMDP","page":"Model Tools","title":"POMDPTools.ModelTools.GenerativeBeliefMDP","text":"GenerativeBeliefMDP(pomdp, updater)\n\nCreate a generative model of the belief MDP corresponding to POMDP pomdp with belief updates performed by updater.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/model/#Example","page":"Model Tools","title":"Example","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"using POMDPs\nusing POMDPModels\nusing POMDPTools\n\npomdp = BabyPOMDP()\nupdater = DiscreteUpdater(pomdp)\n\nbelief_mdp = GenerativeBeliefMDP(pomdp, updater)\n@show statetype(belief_mdp) # POMDPModels.BoolDistribution\n\nfor (a, r, sp) in stepthrough(belief_mdp, RandomPolicy(belief_mdp), \"a,r,sp\", max_steps=5)\n @show a, r, sp\nend\n\n# output\nstatetype(belief_mdp) = DiscreteBelief{POMDPModels.BabyPOMDP, Bool}Bool}\n(a, r, sp) = (true, -5.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [1.0, 0.0]))\n(a, r, sp) = (true, -5.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [1.0, 0.0]))\n(a, r, sp) = (true, -5.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [1.0, 0.0]))\n(a, r, sp) = (false, 0.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [0.9759036144578314, 0.02409638554216867]))\n(a, r, sp) = (false, 0.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [0.9701315984030756, 0.029868401596924433]))","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"DocTestSetup = 
nothing","category":"page"},{"location":"POMDPTools/model/#Underlying-MDP","page":"Model Tools","title":"Underlying MDP","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"UnderlyingMDP","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.UnderlyingMDP","page":"Model Tools","title":"POMDPTools.ModelTools.UnderlyingMDP","text":"UnderlyingMDP(m::POMDP)\n\nTransform POMDP m into an MDP where the states are fully observed.\n\nUnderlyingMDP(m::MDP)\n\nReturn m\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/model/#State-Action-Reward-Model","page":"Model Tools","title":"State Action Reward Model","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"StateActionReward","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.StateActionReward","page":"Model Tools","title":"POMDPTools.ModelTools.StateActionReward","text":"StateActionReward(m::Union{MDP,POMDP})\n\nRobustly create a reward function that depends only on the state and action.\n\nIf reward(m, s, a) is implemented, that will be used, otherwise the mean of reward(m, s, a, sp) for MDPs or reward(m, s, a, sp, o) for POMDPs will be used.\n\nExample\n\nusing POMDPs\nusing POMDPModels\nusing POMDPTools\n\nm = BabyPOMDP()\n\nrm = StateActionReward(m)\n\nrm(true, true)\n\n# output\n\n-15.0\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/model/#Utility-Types","page":"Model Tools","title":"Utility Types","text":"","category":"section"},{"location":"POMDPTools/model/#Terminal-State","page":"Model Tools","title":"Terminal State","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"TerminalState and its singleton instance terminalstate are available to use for a terminal state in concert with another state type. It has the appropriate type promotion logic to make its use with other types friendly, similar to nothing and missing.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"note: Note\nNOTE: This is NOT a replacement for the standard POMDPs.jl isterminal function, though isterminal is implemented for the type. It is merely a convenient type to use for terminal states.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"warning: Warning\nWARNING: Early tests (August 2018) suggest that the Julia 1.0 compiler will not be able to efficiently implement union splitting in cases as complex as POMDPs, so using a Union for the state type of a problem can currently have a large overhead.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"TerminalState\nterminalstate","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.TerminalState","page":"Model Tools","title":"POMDPTools.ModelTools.TerminalState","text":"TerminalState\n\nA type with no fields whose singleton instance terminalstate is used to represent a terminal state with no additional information.\n\nThis type has the appropriate promotion logic implemented to function like Missing when added to arrays, etc.\n\nNote that terminal states NEED NOT be of type TerminalState. You can define any state to be terminal by implementing the appropriate isterminal method. Solvers and simulators SHOULD NOT check for this type, but should instead check using isterminal. 
\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.terminalstate","page":"Model Tools","title":"POMDPTools.ModelTools.terminalstate","text":"terminalstate\n\nThe singleton instance of type TerminalState representing a terminal state.\n\n\n\n\n\n","category":"constant"},{"location":"policy_interaction/#Interacting-with-Policies","page":"Interacting with Policies","title":"Interacting with Policies","text":"","category":"section"},{"location":"policy_interaction/","page":"Interacting with Policies","title":"Interacting with Policies","text":"A solution to a POMDP is a policy that maps beliefs or action-observation histories to actions. In POMDPs.jl, these are represented by Policy objects. See Solvers and Policies for more information about what a policy can represent in general.","category":"page"},{"location":"policy_interaction/","page":"Interacting with Policies","title":"Interacting with Policies","text":"One common task in evaluating POMDP solutions is examining the policies themselves. Since the internal representation of a policy is an esoteric implementation detail, it is best to interact with policies through the action and value interface functions. There are three relevant methods","category":"page"},{"location":"policy_interaction/","page":"Interacting with Policies","title":"Interacting with Policies","text":"action(policy, s) returns the best action (or one of the best) for the given state or belief.\nvalue(policy, s) returns the expected sum of future rewards if the policy is executed.\nvalue(policy, s, a) returns the \"Q-value\", that is, the expected sum of rewards if action a is taken on the next step and then the policy is executed.","category":"page"},{"location":"policy_interaction/","page":"Interacting with Policies","title":"Interacting with Policies","text":"Note that the quantities returned by these functions are what the policy/solver expects to be the case after its (usually approximate) computations; they may be far from the true value if the solution is not exactly optimal.","category":"page"},{"location":"install/#Installation","page":"Installation","title":"Installation","text":"","category":"section"},{"location":"install/","page":"Installation","title":"Installation","text":"If you have a running Julia distribution (Julia 0.4 or greater), you have everything you need to install POMDPs.jl. To install the package, simply run the following from the Julia REPL:","category":"page"},{"location":"install/","page":"Installation","title":"Installation","text":"import Pkg\nPkg.add(\"POMDPs\") # installs the POMDPs.jl package","category":"page"},{"location":"install/","page":"Installation","title":"Installation","text":"Some auxiliary packages and older versions of solvers may be found in the JuliaPOMDP registry. 
To install this registry, run:","category":"page"},{"location":"install/","page":"Installation","title":"Installation","text":"using Pkg; pkg\"registry add https://github.com/JuliaPOMDP/Registry\"","category":"page"},{"location":"install/","page":"Installation","title":"Installation","text":"Note: to use this registry, JuliaPro users must also run edit(normpath(Sys.BINDIR,\"..\",\"etc\",\"julia\",\"startup.jl\")), comment out the line ENV[\"DISABLE_FALLBACK\"] = \"true\", save the file, and restart JuliaPro as described in this issue.","category":"page"},{"location":"POMDPTools/visualization/#Visualization","page":"Visualization","title":"Visualization","text":"","category":"section"},{"location":"POMDPTools/visualization/","page":"Visualization","title":"Visualization","text":"POMDPTools contains a basic visualization interface consisting of the render function.","category":"page"},{"location":"POMDPTools/visualization/","page":"Visualization","title":"Visualization","text":"Problem writers should implement a method of this function so that their problem can be visualized in a variety of contexts, including Jupyter notebooks and web browsers, or saved as images or animations.","category":"page"},{"location":"POMDPTools/visualization/","page":"Visualization","title":"Visualization","text":"render","category":"page"},{"location":"POMDPTools/visualization/#POMDPTools.ModelTools.render","page":"Visualization","title":"POMDPTools.ModelTools.render","text":"render(m::Union{MDP,POMDP}, step::NamedTuple)\n\nReturn a renderable representation of the step in problem m.\n\nThe renderable representation may be anything that has show(io, mime, x) methods. It could be a plot, svg, Compose.jl context, Cairo context, or image.\n\nArguments\n\nstep is a NamedTuple that contains the states, action, etc. corresponding to one transition in a simulation. It may have the following fields:\n\nt: the time step index\ns: the state at the beginning of the step\na: the action\nsp: the state at the end of the step (s')\nr: the reward for the step\no: the observation\nb: the belief at the beginning of the step\nbp: the belief at the end of the step\ni: info from the model when the state transition was calculated\nai: info from the policy decision\nui: info from the belief update\n\nKeyword arguments are reserved for the problem implementer and can be used to control appearance, etc.\n\nImportant Notes\n\nstep may not contain all of the elements listed above, so render should check for them and render only what is available\no typically corresponds to sp, so it is often clearer for POMDPs to render sp rather than s.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/visualization/","page":"Visualization","title":"Visualization","text":"Sometimes it is important to have control over how the problem is rendered with different mimetypes. 
One way to handle this is to have render return a custom type, e.g.","category":"page"},{"location":"POMDPTools/visualization/","page":"Visualization","title":"Visualization","text":"struct MyProblemVisualization\n mdp::MyProblem\n step::NamedTuple\nend\n\nPOMDPTools.render(mdp, step) = MyProblemVisualization(mdp, step)","category":"page"},{"location":"POMDPTools/visualization/","page":"Visualization","title":"Visualization","text":"and then implement custom show methods, e.g.","category":"page"},{"location":"POMDPTools/visualization/","page":"Visualization","title":"Visualization","text":"show(io::IO, mime::MIME\"text/html\", v::MyProblemVisualization)","category":"page"},{"location":"def_pomdp/#defining_pomdps","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"As described in the Concepts and Architecture section, an MDP is defined by the state space, action space, transition distributions, reward function, and discount factor, (SATRgamma). A POMDP also includes the observation space, and observation probability distributions, for a definition of (SATROZgamma). A problem definition in POMDPs.jl consists of an implicit or explicit definition of each of these elements.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"It is possible to define a (PO)MDP with a more traditional object-oriented approach in which the user defines a new type to represent the (PO)MDP and methods of interface functions to define the tuple elements. However, the QuickPOMDPs package provides a more concise way to get started, using keyword arguments instead of new types and methods. Essentially each keyword argument defines a corresponding POMDPs api function. Since the important concepts are the same for the object oriented approach and the QuickPOMDP approach, we will use the latter for this discussion.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"This guide has three parts: First, it explains a very simple example (the Tiger POMDP), then uses a more complex example to illustrate the broader capabilities of the interface. Finally, some alternative ways of defining (PO)MDPs are discussed.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"note: Note\nThis guide assumes that you are comfortable programming in Julia, especially familiar with various ways of defining anonymous functions. Users should consult the Julia documentation to learn more about programming in Julia.","category":"page"},{"location":"def_pomdp/#tiger","page":"Defining POMDPs and MDPs","title":"A Basic Example: The Tiger POMDP","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In the first section of this guide, we will explain a QuickPOMDP implementation of a very simple problem: the classic Tiger POMDP. In the tiger POMDP, the agent is tasked with escaping from a room. There are two doors leading out of the room. Behind one of the doors is a tiger, and behind the other is sweet, sweet freedom. If the agent opens the door and finds the tiger, it gets eaten (and receives a reward of -100). If the agent opens the other door, it escapes and receives a reward of 10. The agent can also listen. 
Listening gives a noisy measurement of which door the tiger is hiding behind. Listening gives the agent the correct location of the tiger 85% of the time. The agent receives a reward of -1 for listening. The complete implementation looks like this:","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"using QuickPOMDPs: QuickPOMDP\nusing POMDPTools: Deterministic, Uniform, SparseCat\n\nm = QuickPOMDP(\n states = [\"left\", \"right\"],\n actions = [\"left\", \"right\", \"listen\"],\n observations = [\"left\", \"right\"],\n discount = 0.95,\n\n transition = function (s, a)\n if a == \"listen\"\n return Deterministic(s) # tiger stays behind the same door\n else # a door is opened\n return Uniform([\"left\", \"right\"]) # reset\n end\n end,\n\n observation = function (a, sp)\n if a == \"listen\"\n if sp == \"left\"\n return SparseCat([\"left\", \"right\"], [0.85, 0.15]) # sparse categorical\n else\n return SparseCat([\"right\", \"left\"], [0.85, 0.15])\n end\n else\n return Uniform([\"left\", \"right\"])\n end\n end,\n\n reward = function (s, a)\n if a == \"listen\"\n return -1.0\n elseif s == a # the tiger was found\n return -100.0\n else # the tiger was escaped\n return 10.0\n end\n end,\n\n initialstate = Uniform([\"left\", \"right\"]),\n);","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The next sections explain how each of the elements of the POMDP tuple are defined in this implementation:","category":"page"},{"location":"def_pomdp/#State,-action-and-observation-spaces","page":"Defining POMDPs and MDPs","title":"State, action and observation spaces","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In this example, each state, action, and observation is a String. The state, action and observation spaces (S, A, and O), are defined with the states, actions and observations keyword arguments. In this case, they are simply Vectors containing all the elements in the space.","category":"page"},{"location":"def_pomdp/#Transition-and-observation-distributions","page":"Defining POMDPs and MDPs","title":"Transition and observation distributions","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The transition and observation keyword arguments are used to define the transition distribution, T, and observation distribution, Z, respectively. These models are defined using functions that return distribution objects (more info below). The transition function takes state and action arguments and returns a distribution of the resulting next state. The observation function takes in an action and the resulting next state (sp, short for \"s prime\") and returns the distribution of the observation emitted at this state.","category":"page"},{"location":"def_pomdp/#Reward-function","page":"Defining POMDPs and MDPs","title":"Reward function","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The reward keyword argument defines R. 
It is a function that takes in a state and action and returns a number.","category":"page"},{"location":"def_pomdp/#Discount-and-initial-state-distribution","page":"Defining POMDPs and MDPs","title":"Discount and initial state distribution","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The discount factor, gamma, is defined with the discount keyword, and is simply a number between 0 and 1. The initial state distribution, b_0, is defined with the initialstate argument, and is a distribution object.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The example above shows a complete implementation of a very simple discrete-space POMDP. However, POMDPs.jl is capable of concisely expressing much more complex models with continuous and hybrid spaces. The guide below introduces a more complex example to fully explain the ways that a POMDP can be defined.","category":"page"},{"location":"def_pomdp/#Guide-to-Defining-POMDPs","page":"Defining POMDPs and MDPs","title":"Guide to Defining POMDPs","text":"","category":"section"},{"location":"def_pomdp/#po-mountaincar","page":"Defining POMDPs and MDPs","title":"A more complex example: A partially-observable mountain car","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Mountain car is a classic problem in reinforcement learning. A car starts in a valley between two hills, and must reach the goal at the top of the hill to the right (see wikipedia for image). The actions are left and right acceleration and neutral and the state consists of the car's position and velocity. In this partially-observable version, there is a small amount of acceleration noise and observations are normally-distributed noisy measurements of the position. This problem can be implemented as follows:","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"import QuickPOMDPs: QuickPOMDP\nimport POMDPTools: ImplicitDistribution\nimport Distributions: Normal\n\nmountaincar = QuickPOMDP(\n actions = [-1., 0., 1.],\n obstype = Float64,\n discount = 0.95,\n\n transition = function (s, a) \n ImplicitDistribution() do rng\n x, v = s\n vp = v + a*0.001 + cos(3*x)*-0.0025 + 0.0002*randn(rng)\n vp = clamp(vp, -0.07, 0.07)\n xp = x + vp\n return (xp, vp)\n end\n end,\n\n observation = (a, sp) -> Normal(sp[1], 0.15),\n\n reward = function (s, a, sp)\n if sp[1] > 0.5\n return 100.0\n else\n return -1.0\n end\n end,\n\n initialstate = ImplicitDistribution(rng -> (-0.2*rand(rng), 0.0)),\n isterminal = s -> s[1] > 0.5\n)","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The following sections provide a detailed guide to defining the components of a POMDP using this example and the tiger pomdp further above.","category":"page"},{"location":"def_pomdp/#space_representation","page":"Defining POMDPs and MDPs","title":"State, action, and observation spaces","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In POMDPs.jl, a state, action, or observation can be represented by any Julia object, for example an integer, a floating point number, a string or Symbol, or a vector. 
For example, in the tiger problem, the states are Strings, and in the mountaincar problem, the state is a Tuple of two floating point numbers, and the actions and observations are floating point numbers. These types are usually inferred from the space or initial state distribution definitions.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"warn: Warn\nObjects representing individual states, actions, and observations should not be altered once they are created, since they may be used as dictionary keys or stored in histories. Hence it is usually best to use immutable objects such as integers or StaticArrays. If the states need to be mutable (e.g. aggregate types with vectors in them), make sure the states are not actually mutated and that hash and == functions are implemented (see AutoHashEquals).","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The state, action, and observation spaces are defined with the states, actions, and observations Quick(PO)MDP keyword arguments. The simplest way to define these spaces is with a Vector of states, e.g. states = [\"left\", \"right\"] in the tiger problem. More complicated spaces, such as vector spaces and other continuous, uncountable, or hybrid sets, can be defined with custom objects that adhere to the space interface. However, it should be noted that, for many solvers, an explicit enumeration of the state and observation spaces is not needed. Instead, it is sufficient to specify the state or observation type using the statetype or obstype arguments, e.g. obstype = Float64 in the mountaincar problem.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"tip: Tip\nIf you are having a difficult time representing the state or observation space, it is likely that you will not be able to use a solver that requires an explicit representation. It is usually best to omit that space from the definition and try solvers to see if they work.","category":"page"},{"location":"def_pomdp/#state-dep-action","page":"Defining POMDPs and MDPs","title":"State- or belief-dependent action spaces","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In some problems, the set of allowable actions depends on the state or belief. This can be implemented by providing a function of the state or belief to the actions argument, e.g. if, in an MDP, you can only take action 1 in state 1, but can take the full action space 1, 2, and 3 in the other states, you might use","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"# add a default value \"s = nothing\" so that \"actions(mdp)\" won't throw an error.\nactions = function (s = nothing) \n if s == 1\n return [1] #<--- return state-dep-actions\n else\n return [1,2,3] #<--- return full action space here\n end\nend","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Similarly, in a POMDP, you may wish to only allow action 1 if the belief b assigns a nonzero probability to state 1. 
This can be accomplished with","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"actions = function (b)\n if pdf(b, 1) > 0.0\n return [1,2,3]\n else\n return [2,3]\n end\nend","category":"page"},{"location":"def_pomdp/#Transition-and-observation-distributions-2","page":"Defining POMDPs and MDPs","title":"Transition and observation distributions","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The transition and observation distributions are specified through functions that return distributions. A distribution object implements parts of the distribution interface, most importantly a rand function that provides a way to sample the distribution and, for explicit distributions, a pdf function that evaluates the probability mass or density of a given outcome. In most simple cases, you will be able to use a pre-defined distribution like the ones listed below, but occasionally you will define your own for more complex problems.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"tip: Tip\nSince the transition and observation functions return distributions, you should not call rand within these functions (unless it is within an ImplicitDistribution sampling function (see below)).","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The transition function takes in a state s and action a and returns a distribution object that defines the distribution of next states given that the current state is s and the action is a, that is T(sp | s, a). Similarly, the observation function takes in the action a and the next state sp and returns a distribution object defining O(z | a, sp).","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"note: Note\nIt is also possible to define the observation function in terms of the previous state s, along with a, and sp. This is necessary, for example, when the observation is a measurement of change in state, e.g. sp - s. However, some solvers may use the a, sp method (and hence cannot solve problems where the observation is conditioned on both s and sp). Since providing an a, sp method automatically defines the s, a, sp method, problem writers should usually define only the a, sp method, and only define the s, a, sp method if it is necessary. Except for special performance cases, problem writers should never need to define both methods.","category":"page"},{"location":"def_pomdp/#Commonly-used-distributions","page":"Defining POMDPs and MDPs","title":"Commonly-used distributions","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In most cases, the following pre-defined distributions found in the POMDPTools and Distributions packages will be sufficient to define models.","category":"page"},{"location":"def_pomdp/#Deterministic","page":"Defining POMDPs and MDPs","title":"Deterministic","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The Deterministic distribution should be used when there is no randomness in the state or observation given the state and action inputs. 
This commonly occurs when the new state is a deterministic function of the state and action or the state stays the same, for example when the action is \"listen\" in the tiger example above, the transition function returns Deterministic(s).","category":"page"},{"location":"def_pomdp/#SparseCat","page":"Defining POMDPs and MDPs","title":"SparseCat","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In discrete POMDPs, it is common for the state or observation to have a few possible outcomes with specified probabilities. This can be represented with a sparse categorical SparseCat distribution that takes a list of outcomes and a list of associated probabilities as arguments. For instance, in the tiger example above, when the action is \"listen\", there is an 85% chance of receiving the correct observation. Thus if the state is \"left\", the observation distribution is SparseCat([\"left\", \"right\"], [0.85, 0.15]), and SparseCat([\"right\", \"left\"], [0.85, 0.15]) if the state is \"right\".","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Another example where SparseCat distributions are useful is in grid-world problems, where there is a high probability of transitioning along the direction of the action, a low probability of transitioning to other adjacent states, and zero probability of transitioning to any other states.","category":"page"},{"location":"def_pomdp/#Uniform","page":"Defining POMDPs and MDPs","title":"Uniform","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Another common case is a uniform distribution over a space or set of outcomes. This can be represented with a Uniform object that takes a set of outcomes as an argument. For example, the initial state distribution in the tiger problem is represented with Uniform([\"left\", \"right\"]) indicating that both states are equally likely.","category":"page"},{"location":"def_pomdp/#Distributions.jl","page":"Defining POMDPs and MDPs","title":"Distributions.jl","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"If the states or observations have numerical or vector values, the Distributions.jl package provides a suite of suitable distributions. For example, the observation function in the partially-observable mountain car example above,","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"observation = (a, sp) -> Normal(sp[1], 0.15)","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"returns a Normal distribution from this package with a mean that depends on the car's location (the first element of state sp) and a standard deviation of 0.15.","category":"page"},{"location":"def_pomdp/#implicit_distribution_section","page":"Defining POMDPs and MDPs","title":"ImplicitDistribution","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In many cases, especially when the state or observation spaces are continuous or hybrid, it is difficult or impossible to specify the probability density explicitly. 
Fortunately, many solvers for these problems do not require explicit density information and instead need only samples from the distribution. In this case, an \"implicit distribution\" or \"generative model\" is sufficient. In POMDPs.jl, this can be represented using an ImplicitDistribution object.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The argument to an ImplicitDistribution constructor is a function that takes a random number generator as an argument and returns a sample from the distribution. To see how this works, we'll look at an example inspired by the mountaincar initial state distribution. Samples from this distribution are position-velocity tuples where the velocity is always zero, but the position is uniformly distributed between -0.2 and 0. Consider the following code:","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"using Random: MersenneTwister\nusing POMDPTools: ImplicitDistribution\n\nrng = MersenneTwister(1)\n\nd = ImplicitDistribution(rng -> (-0.2*rand(rng), 0.0))\nrand(rng, d)\n# output\n(-0.04720666913240939, 0.0)","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Here, rng is the random number generator. When rand(rng, d) is called, the sampling function, rng -> (-0.2*rand(rng), 0.0), is called to generate a state. The sampling function uses rng to generate a random number between 0 and 1 (rand(rng)), multiplies it by -0.2 to get the position, and creates a tuple with the position and a velocity of 0.0 and returns an initial state that might be, for instance (-0.11, 0.0). Any time that a solver, belief updater, or simulator needs an initial state for the problem, it will be sampled in this way.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"note: Note\nThe random number generator is a subtype of AbstractRNG. It is important to use this random number generator for all calls to rand in the sample function for reproducible results. Moreover some solvers use specialized random number generators that allow them to reduce variance. See also the What if I don't use the rng argument? FAQ.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"It is also common to use Julia's do block syntax to define more complex sampling functions. 
For instance the transition function in the mountaincar example returns an ImplicitDistribution with a sampling function that (1) generates a new noisy velocity through a randn call, then (2) clamps the velocity, and finally (3) integrates the position with Euler's method:","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"transition = function (s, a) \n ImplicitDistribution() do rng\n x, v = s\n vp = v + a*0.001 + cos(3*x)*-0.0025 + 0.0002*randn(rng)\n vp = clamp(vp, -0.07, 0.07)\n xp = x + vp\n return (xp, vp)\n end\nend","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Because of the nonlinear clamp operation, it would be difficult to represent this distribution explicitly.","category":"page"},{"location":"def_pomdp/#Custom-distributions","page":"Defining POMDPs and MDPs","title":"Custom distributions","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"If none of the distributions above are suitable, for example if you need to represent an explicit distribution with hybrid support, it is not difficult to define your own distributions by implementing the functions in the distribution interface.","category":"page"},{"location":"def_pomdp/#Reward-functions","page":"Defining POMDPs and MDPs","title":"Reward functions","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The reward function maps a combination of state, action, and observation arguments to the reward for a step. For instance, the reward function in the mountaincar problem,","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"reward = function (s, a, sp)\n if sp[1] > 0.5\n return 100.0\n else\n return -1.0\n end\nend","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"takes in the previous state, s, the action, a, and the resulting state, sp and returns a large positive reward if the resulting position, sp[1], is beyond a threshold (note the coupling of the terminal reward) and a small negative reward on all other steps. If the reward in the problem is stochastic, the reward function implemented in POMDPs.jl should return the mean reward.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"There are two possible reward function argument signatures that a problem-writer might consider implementing for an MDP: (s, a) and (s, a, sp). For a POMDP, there is an additional version, (s, a, sp, o). The (s, a, sp) version is useful when transition to a terminal state results in a reward, and the (s, a, sp, o) version is useful for cases when the reward is associated with an observation, such as a negative reward for the stress caused by a medical diagnostic test that indicates the possibility of a disease. 
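As a hedged sketch of that observation-dependent case using the object-oriented interface (the DiagnosisPOMDP type and the numbers are invented for illustration):

import POMDPs

# made-up POMDP type: state = sick or not, action = :test or :wait, observation = test result
struct DiagnosisPOMDP <: POMDPs.POMDP{Bool, Symbol, Bool} end

# the reward depends on the observation: the test itself costs 1, and a positive
# result (o == true) carries an additional stress penalty
function POMDPs.reward(m::DiagnosisPOMDP, s, a, sp, o)
    r = a == :test ? -1.0 : 0.0
    if a == :test && o
        r -= 0.5
    end
    return r
end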
Problem writers should implement the version with the fewest arguments possible, since the versions with more arguments are automatically provided to solvers and simulators if a version with fewer arguments is implemented.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In rare cases, it may make sense to implement two or more versions of the function, for example if a solver requires (s, a), but the user wants an observation-dependent reward to show up in simulation. It is OK to implement two methods of the reward function as long as the following relationships hold: R(s, a) = E_{sp ~ T(sp|s,a)}[R(s, a, sp)] and R(s, a, sp) = E_{o ~ Z(o|s,a,sp)}[R(s, a, sp, o)]. That is, the versions with fewer arguments must be expectations of versions with more arguments.","category":"page"},{"location":"def_pomdp/#Other-Components","page":"Defining POMDPs and MDPs","title":"Other Components","text":"","category":"section"},{"location":"def_pomdp/#Discount-factors","page":"Defining POMDPs and MDPs","title":"Discount factors","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The discount keyword argument is simply a number between 0 and 1 used to discount rewards in the future.","category":"page"},{"location":"def_pomdp/#Initial-state-distribution","page":"Defining POMDPs and MDPs","title":"Initial state distribution","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The initialstate argument should be a distribution object (see above) that defines the initial state distribution (and initial belief for POMDPs).","category":"page"},{"location":"def_pomdp/#Terminal-states","page":"Defining POMDPs and MDPs","title":"Terminal states","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The function supplied to the isterminal keyword argument defines which states in the POMDP are terminal. The function should take a state as an argument and return true if the state is terminal and false otherwise. For example, in the mountaincar example above, isterminal = s -> s[1] > 0.5 indicates that all states where the position, s[1], is greater than 0.5 are terminal.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"It is assumed that the system will take no further steps once it has reached a terminal state. Since reward is assigned for taking steps, no additional reward can be accumulated from a terminal state. Consequently, the most important property of terminal states is that the value of a terminal state is always zero. Many solvers leverage this property for efficiency. 
The mountaincar example above uses isterminal in exactly this way: once the car's position exceeds 0.5, the state is terminal and no further reward can be accumulated.","category":"page"},{"location":"def_pomdp/#Other-ways-to-define-a-(PO)MDP","page":"Defining POMDPs and MDPs","title":"Other ways to define a (PO)MDP","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Besides the Quick(PO)MDP approach above, there are several alternative ways to define (PO)MDP models:","category":"page"},{"location":"def_pomdp/#Object-oriented","page":"Defining POMDPs and MDPs","title":"Object-oriented","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"First, it is possible to create your own (PO)MDP types and implement the components of the POMDP directly as methods of POMDPs.jl interface functions. This approach can be thought of as the \"low-level\" way to define a POMDP, and the QuickPOMDP as merely a syntactic convenience. There are a few things that make this object-oriented approach more cumbersome than the QuickPOMDP approach, but the structure is similar. For example, the tiger QuickPOMDP shown above can be implemented as follows:","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"import POMDPs\nusing POMDPs: POMDP\nusing POMDPTools: Deterministic, Uniform, SparseCat\n\nstruct TigerPOMDP <: POMDP{String, String, String}\n p_correct::Float64\n indices::Dict{String, Int}\n\n TigerPOMDP(p_correct=0.85) = new(p_correct, Dict(\"left\"=>1, \"right\"=>2, \"listen\"=>3))\nend\n\nPOMDPs.states(m::TigerPOMDP) = [\"left\", \"right\"]\nPOMDPs.actions(m::TigerPOMDP) = [\"left\", \"right\", \"listen\"]\nPOMDPs.observations(m::TigerPOMDP) = [\"left\", \"right\"]\nPOMDPs.discount(m::TigerPOMDP) = 0.95\nPOMDPs.stateindex(m::TigerPOMDP, s) = m.indices[s]\nPOMDPs.actionindex(m::TigerPOMDP, a) = m.indices[a]\nPOMDPs.obsindex(m::TigerPOMDP, o) = m.indices[o]\n\nfunction POMDPs.transition(m::TigerPOMDP, s, a)\n if a == \"listen\"\n return Deterministic(s) # tiger stays behind the same door\n else # a door is opened\n return Uniform([\"left\", \"right\"]) # reset\n end\nend\n\nfunction POMDPs.observation(m::TigerPOMDP, a, sp)\n if a == \"listen\"\n if sp == \"left\"\n return SparseCat([\"left\", \"right\"], [m.p_correct, 1.0-m.p_correct])\n else\n return SparseCat([\"right\", \"left\"], [m.p_correct, 1.0-m.p_correct])\n end\n else\n return Uniform([\"left\", \"right\"])\n end\nend\n\nfunction POMDPs.reward(m::TigerPOMDP, s, a)\n if a == \"listen\"\n return -1.0\n elseif s == a # the tiger was found\n return -100.0\n else # the tiger was escaped\n return 10.0\n end\nend\n\nPOMDPs.initialstate(m::TigerPOMDP) = Uniform([\"left\", \"right\"])\n# output","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"It is easy to see that the new methods are similar to the keyword arguments in the QuickPOMDP approach, except that every function has an initial m argument of the newly created POMDP type. There are several differences from the QuickPOMDP approach: First, the POMDP is represented by a new struct that is a subtype of POMDP{S,A,O}. The state, action, and observation types must be specified as the S, A, and O parameters of the POMDP abstract type. Second, this new struct may contain problem-specific fields, which makes it easy for others to construct POMDPs that have the same structure but different parameters. 
For example, in the code above, the struct has a p_correct parameter that specifies the probability of receiving a correct observation when the \"listen\" action is taken. The final and most cumbersome difference between this object-oriented approach and using QuickPOMDPs is that the user must implement stateindex, actionindex, and obsindex to map states, actions, and observations to appropriate indices so that data such as values can be stored and accessed efficiently in vectors.","category":"page"},{"location":"def_pomdp/#Using-a-single-generative-function-instead-of-separate-T,-Z,-and-R","page":"Defining POMDPs and MDPs","title":"Using a single generative function instead of separate T, Z, and R","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In some cases, you may wish to use a simulator that generates the next state, observation, and/or reward (s, o, and r) simultaneously. This is sometimes called a \"generative model\".","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"For example if you are working on an autonomous driving POMDP, the car may travel for one or more seconds in between POMDP decision steps during which it may accumulate reward and observation measurements. In this case it might be very difficult to create a reward or observation function based on s, a, and s arguments.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"For situations like this, gen is an alternative to transition, observation, and reward. The gen function should take in state, action, and random number generator arguments and return a NamedTuple with keys sp (for \"s-prime\", the next state), o, and r. The mountaincar example above can be implemented with gen as shown below.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"note: Note\ngen is intended only for the case where two or more of the next state, observation, and reward need to be generated at the same time. If the state transition model can be separated from the reward and observation models, you should implement transition with an ImplicitDistribution instead of gen. See also the \"What is the difference between transition, gen, and @gen?\" FAQ.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"using QuickPOMDPs: QuickPOMDP\nusing POMDPTools: ImplicitDistribution\n\nmountaincar = QuickPOMDP(\n actions = [-1., 0., 1.],\n obstype = Float64,\n discount = 0.95,\n\n gen = function (s, a, rng)\n x, v = s\n vp = v + a*0.001 + cos(3*x)*-0.0025 + 0.0002*randn(rng)\n vp = clamp(vp, -0.07, 0.07)\n xp = x + vp\n if xp > 0.5\n r = 100.0\n else\n r = -1.0\n end\n o = xp + 0.15*randn(rng)\n return (sp=(xp, vp), o=o, r=r)\n end,\n\n initialstate = ImplicitDistribution(rng -> (-0.2*rand(rng), 0.0)),\n isterminal = s -> s[1] > 0.5\n)","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"tip: Tip\ngen is not tied to the QuickPOMDP approach; it can also be used in the object-oriented paradigm.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"tip: Tip\nIt is possible to mix and match gen with transtion, observation, and reward. 
For example, if the gen function returns a NamedTuple with sp and r keys, POMDPs.jl will try to use gen to generate states and rewards and the observation function to generate observations.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"note: Note\nImplementing gen instead of transition, observation, and reward will limit which solvers you can use; for example, it is impossible to use a solver that requires an explicit transition distribution.","category":"page"},{"location":"def_pomdp/#Tabular","page":"Defining POMDPs and MDPs","title":"Tabular","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Finally, it is sometimes convenient to define (PO)MDPs with tables that define the transition and observation probabilities and rewards. In this case, the states, actions, and observations must simply be integers.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The code below is a tabular implementation of the tiger example with the states, actions, and observations mapped to the following integers:","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"integer state, action, or observation\n1 \"left\"\n2 \"right\"\n3 \"listen\"","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"using POMDPModels: TabularPOMDP\n\nT = zeros(2,3,2)\nT[:,:,1] = [1. 0.5 0.5; \n 0. 0.5 0.5]\nT[:,:,2] = [0. 0.5 0.5; \n 1. 0.5 0.5]\n\nO = zeros(2,3,2)\nO[:,:,1] = [0.85 0.5 0.5; \n 0.15 0.5 0.5]\nO[:,:,2] = [0.15 0.5 0.5; \n 0.85 0.5 0.5]\n\nR = [-1. -100. 10.; \n -1. 10. -100.]\n\nm = TabularPOMDP(T, R, O, 0.95)","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Here T is a |S| × |A| × |S| array representing the transition probabilities, with T[sp, a, s] = T(sp | s, a). Similarly, O is an |O| × |A| × |S| array encoding the observation distribution with O[o, a, sp] = Z(o | a, sp), and R is a |S| × |A| matrix that encodes the reward function. 0.95 is the discount factor.","category":"page"},{"location":"concepts/#Concepts-and-Architecture","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"","category":"section"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"POMDPs.jl aims to coordinate the development of three software components: 1) a problem, 2) a solver, 3) an experiment. Each of these components has a set of abstract types associated with it and a set of functions that allow a user to define each component's behavior in a standardized way. An outline of the architecture is shown below.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"(Image: concepts)","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"The MDP and POMDP types are associated with the problem definition. The Solver and Policy types are associated with the solver or decision-making agent. Typically, the Updater type is also associated with the solver, but a solver may sometimes be used with an updater that was implemented separately. 
The Simulator type is associated with the experiment.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"The code components of the POMDPs.jl ecosystem relevant to problems and solvers are shown below. The arrows represent the flow of information from the problems to the solvers. The figure shows the two interfaces that form POMDPs.jl - Explicit and Generative. Details about these interfaces can be found in the section on Defining POMDPs.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"(Image: interface_relationships)","category":"page"},{"location":"concepts/#POMDPs-and-MDPs","page":"Concepts and Architecture","title":"POMDPs and MDPs","text":"","category":"section"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"An MDP is a mathematical framework for sequential decision making under uncertainty, and where all of the uncertainty arises from outcomes that are partially random and partially under the control of a decision maker. Mathematically, an MDP is a tuple (SATRgamma), where S is the state space, A is the action space, T is a transition function defining the probability of transitioning to each state given the state and action at the previous time, and R is a reward function mapping every possible transition (sas) to a real reward value. Finally, gamma is a discount factor that defines the relative weighting of current and future rewards. For more information see a textbook such as [1]. In POMDPs.jl an MDP is represented by a concrete subtype of the MDP abstract type and a set of methods that define each of its components as described in the problem definition section.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"A POMDP is a more general sequential decision making problem in which the agent is not sure what state they are in. The state is only partially observable by the decision making agent. Mathematically, a POMDP is a tuple (SATROZgamma) where S, A, T, R, and gamma have the same meaning as in an MDP, Z is the agent's observation space, and O defines the probability of receiving each observation at a transition. In POMDPs.jl, a POMDP is represented by a concrete subtype of the POMDP abstract type, and the methods described in the problem definition section.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"POMDPs.jl contains additional functions for defining optional problem behavior such as an initial state distribution or terminal states. More information can be found in the Defining POMDPs section.","category":"page"},{"location":"concepts/#Beliefs-and-Updaters","page":"Concepts and Architecture","title":"Beliefs and Updaters","text":"","category":"section"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"In a POMDP domain, the decision-making agent does not have complete information about the state of the problem, so the agent can only make choices based on its \"belief\" about the state. In the POMDP literature, the term \"belief\" is typically defined to mean a probability distribution over all possible states of the system. 
However, in practice, the agent often makes decisions based on an incomplete or lossy record of past observations that has a structure much different from a probability distribution. For example, if the agent is represented by a finite-state controller, as is the case for Monte-Carlo Value Iteration [2], the belief is the controller state, which is a node in a graph. Another example is an agent represented by a recurrent neural network. In this case, the agent's belief is the state of the network. In order to accommodate a wide variety of decision-making approaches in POMDPs.jl, we use the term \"belief\" to denote the set of information that the agent makes a decision on, which could be an exact state distribution, an action-observation history, a set of weighted particles, or the examples mentioned before. In code, the belief can be represented by any built-in or user-defined type.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"When an action is taken and a new observation is received, the belief is updated by the belief updater. In code, a belief updater is represented by a concrete subtype of the Updater abstract type, and the update(updater, belief, action, observation) function defines how the belief is updated when a new observation is received.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"Although the agent may use a specialized belief structure to make decisions, the information initially given to the agent about the state of the problem is usually most conveniently represented as a state distribution, thus the initialize_belief function is provided to convert a state distribution to a specialized belief structure that an updater can work with.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"In many cases, the belief structure is closely related to the solution technique, so it will be implemented by the programmer who writes the solver. In other cases, the agent can use a variety of belief structures to make decisions, so a domain-specific updater implemented by the programmer that wrote the problem description may be appropriate. Finally, some advanced generic belief updaters such as particle filters may be implemented by a third party. The convenience function updater(policy) can be used to get a suitable default updater for a policy, however many policies can work with other updaters.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"For more information on implementing a belief updater, see Defining a Belief Updater","category":"page"},{"location":"concepts/#Solvers-and-Policies","page":"Concepts and Architecture","title":"Solvers and Policies","text":"","category":"section"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"Sequential decision making under uncertainty involves both online and offline calculations. In the broad sense, the term \"solver\" as used in the node in the figure at the top of the page refers to the software package that performs the calculations at both of these times. 
However, the code is broken up into two pieces, the solver that performs calculations offline and the policy that performs calculations online.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"In the abstract, a policy is a mapping from every belief that an agent might take to an action. A policy is represented in code by a concrete subtype of the Policy abstract type. The programmer implements action to describe what computations need to be done online. For an online solver such as POMCP, all of the decision computation occurs within action while for an offline solver like SARSOP, there is very little computation within action. See Interacting with Policies for more information.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"The offline portion of the computation is carried out by the solver, which is represented by a concrete subtype of the Solver abstract type. Computations occur within the solve function. For an offline solver like SARSOP, nearly all of the decision computation occurs within this function, but for some online solvers such as POMCP, solve merely embeds the problem in the policy.","category":"page"},{"location":"concepts/#Simulators","page":"Concepts and Architecture","title":"Simulators","text":"","category":"section"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"A simulator defines a way to run one or more simulations. It is represented by a concrete subtype of the Simulator abstract type and the simulation is an implemention of simulate. Depending on the simulator, simulate may return a variety of data about the simulation, such as the discounted reward or the state history. All simulators should perform simulations consistent with the Simulation Standard.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"[1] Decision Making Under Uncertainty: Theory and Application by Mykel J. Kochenderfer, MIT Press, 2015","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"[2] Bai, H., Hsu, D., & Lee, W. S. (2014). Integrated perception and planning in the continuous space: A POMDP approach. The International Journal of Robotics Research, 33(9), 1288-1302","category":"page"},{"location":"interfaces/#Spaces-and-Distributions","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"","category":"section"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"Two important components of the definitions of MDPs and POMDPs are spaces, which specify the possible states, actions, and observations in a problem and distributions, which define probability distributions. In order to provide for maximum flexibility spaces and distributions may be of any type (i.e. there are no abstract base types). Solvers and simulators will interact with space and distribution types using the functions defined below.","category":"page"},{"location":"interfaces/#space-interface","page":"Spaces and Distributions","title":"Spaces","text":"","category":"section"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"A space object should contain the information needed to define the set of all possible states, actions or observations. 
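For the discrete case, a small runnable illustration (plain Base Julia; the particular states are arbitrary):

using Random: MersenneTwister

S = ["left", "right"]          # a discrete space can be as simple as a Vector of states
collect(S)                     # the iteration interface enumerates the space
rand(MersenneTwister(1), S)    # rand samples an element, as solvers and simulators expect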
The implementation will depend on the attributes of the elements. For example, if the space is continuous, the space object may only contain the limits of the continuous range. In the case of a discrete problem, a vector containing all states is appropriate for representing a space.","category":"page"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"The following functions may be called on a space object (Click on a function to read its documentation):","category":"page"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"rand\niterate and the rest of the iteration interface for discrete spaces.","category":"page"},{"location":"interfaces/#Distributions","page":"Spaces and Distributions","title":"Distributions","text":"","category":"section"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"A distribution object represents a probability distribution.","category":"page"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"The following functions may be called on a distribution object (Click on a function to read its documentation):","category":"page"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"rand([rng,] d) [1]\nsupport\npdf\nmode\nmean","category":"page"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"You can find some useful pre-made distribution objects in Distributions.jl or POMDPTools.","category":"page"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"[1]: Distributions should support both rand(rng::AbstractRNG, d) and rand(d). The recommended way to do this is by implementing Base.rand(rng::AbstractRNG, s::Random.SamplerTrivial{<:YourDistribution}) from the Julia rand interface.","category":"page"},{"location":"POMDPTools/#pomdptools_section","page":"POMDPTools: the standard library for POMDPs.jl","title":"POMDPTools: the standard library for POMDPs.jl","text":"","category":"section"},{"location":"POMDPTools/","page":"POMDPTools: the standard library for POMDPs.jl","title":"POMDPTools: the standard library for POMDPs.jl","text":"The POMDPs.jl package does nothing more than define an interface or language for interacting with and solving (PO)MDPs; it does not contain any implementations. In practice, defining and solving POMDPs is made vastly easier if some commonly-used structures are provided. The POMDPTools package contains these implementations. 
Thus, the relationship between POMDPs.jl and POMDPTools is similar to the relationship between a programming language and its standard library.","category":"page"},{"location":"POMDPTools/","page":"POMDPTools: the standard library for POMDPs.jl","title":"POMDPTools: the standard library for POMDPs.jl","text":"The POMDPTools package source code is hosted in the POMDPs.jl github repository in the lib/POMDPTools directory.","category":"page"},{"location":"POMDPTools/","page":"POMDPTools: the standard library for POMDPs.jl","title":"POMDPTools: the standard library for POMDPs.jl","text":"The contents of the library are outlined below:","category":"page"},{"location":"POMDPTools/","page":"POMDPTools: the standard library for POMDPs.jl","title":"POMDPTools: the standard library for POMDPs.jl","text":"Pages = [\"distributions.md\", \"model.md\", \"visualization.md\", \"beliefs.md\", \"policies.md\", \"simulators.md\", \"common_rl.md\", \"testing.md\"]","category":"page"},{"location":"POMDPTools/policies/#Implemented-Policies","page":"Implemented Policies","title":"Implemented Policies","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"POMDPTools currently provides the following policy types:","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"a wrapper to turn a function into a Policy\nan alpha vector policy type\na random policy\na stochastic policy type\nexploration policies\na vector policy type\na wrapper to collect statistics and errors about policies","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"In addition, it provides the showpolicy function for printing policies similar to the way that matrices are printed in the repl and the evaluate function for evaluating MDP policies.","category":"page"},{"location":"POMDPTools/policies/#Function","page":"Implemented Policies","title":"Function","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"Wraps a Function mapping states to actions into a Policy. ","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"FunctionPolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.FunctionPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.FunctionPolicy","text":"FunctionPolicy\n\nPolicy p=FunctionPolicy(f) returns f(x) when action(p, x) is called.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"FunctionSolver","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.FunctionSolver","page":"Implemented Policies","title":"POMDPTools.Policies.FunctionSolver","text":"FunctionSolver\n\nSolver for a FunctionPolicy.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Alpha-Vector-Policy","page":"Implemented Policies","title":"Alpha Vector Policy","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"Represents a policy with a set of alpha vectors (See AlphaVectorPolicy constructor docstring). 
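As a small, hedged sketch (using the TigerPOMDP from POMDPModels and hand-written alpha vectors rather than vectors computed by a real solver): \n\nusing POMDPs, POMDPTools, POMDPModels\n\npomdp = TigerPOMDP()                                   # two states: tiger behind the left or right door\nalphas = [[-10.0, 10.0], [10.0, -10.0], [-1.0, -1.0]]  # one hand-made alpha vector per action\npolicy = AlphaVectorPolicy(pomdp, alphas, collect(actions(pomdp)))\n\nb = uniform_belief(pomdp)   # DiscreteBelief with probability 0.5 on each state\na = action(policy, b)       # action of the maximizing alpha vector\nv = value(policy, b)        # the corresponding dot product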
In addition to finding the optimal action with action, the alpha vectors can be accessed with alphavectors or alphapairs.","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"Determining the estimated value and optimal action depends on calculating the dot product between alpha vectors and a belief vector. POMDPTools.Policies.beliefvec(pomdp, b) is used to create this vector and can be overridden for new belief types for efficiency.","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"AlphaVectorPolicy\nalphavectors\nalphapairs\nPOMDPTools.Policies.beliefvec","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.AlphaVectorPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.AlphaVectorPolicy","text":"AlphaVectorPolicy(pomdp::POMDP, alphas, action_map)\n\nConstruct a policy from alpha vectors.\n\nArguments\n\nalphas: an |S| x (number of alpha vecs) matrix or a vector of alpha vectors.\naction_map: a vector of the actions corresponding to each alpha vector\nAlphaVectorPolicy{P<:POMDP, A}\n\nRepresents a policy with a set of alpha vectors.\n\nUse action to get the best action for a belief, and alphavectors and alphapairs to access the alpha vectors.\n\nFields\n\npomdp::P the POMDP problem \nn_states::Int the number of states in the POMDP\nalphas::Vector{Vector{Float64}} the list of alpha vectors\naction_map::Vector{A} a list of actions corresponding to the alpha vectors\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#POMDPTools.Policies.alphavectors","page":"Implemented Policies","title":"POMDPTools.Policies.alphavectors","text":"Return the alpha vectors.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/policies/#POMDPTools.Policies.alphapairs","page":"Implemented Policies","title":"POMDPTools.Policies.alphapairs","text":"Return an iterator of alpha vector-action pairs in the policy.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/policies/#POMDPTools.Policies.beliefvec","page":"Implemented Policies","title":"POMDPTools.Policies.beliefvec","text":"POMDPTools.Policies.beliefvec(m::POMDP, n_states::Int, b)\n\nReturn a vector-like representation of the belief b suitable for calculating the dot product with the alpha vectors.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/policies/#Random-Policy","page":"Implemented Policies","title":"Random Policy","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"A policy that returns a randomly selected action using rand(rng, actions(pomdp)).","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"RandomPolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.RandomPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.RandomPolicy","text":"RandomPolicy{RNG<:AbstractRNG, P<:Union{POMDP,MDP}, U<:Updater}\n\na generic policy that uses the actions function to create a list of actions and then randomly samples an action from it.\n\nConstructor:\n\n`RandomPolicy(problem::Union{POMDP,MDP};\n rng=Random.default_rng(),\n updater=NothingUpdater())`\n\nFields\n\nrng::RNG a random number generator \nproblem::P the POMDP or MDP problem \nupdater::U a belief updater (defaults to NothingUpdater in the above 
constructor)\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"RandomSolver","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.RandomSolver","page":"Implemented Policies","title":"POMDPTools.Policies.RandomSolver","text":"solver that produces a random policy\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Stochastic-Policies","page":"Implemented Policies","title":"Stochastic Policies","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"Types for representing randomized policies:","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"StochasticPolicy samples actions from an arbitrary distribution.\nUniformRandomPolicy samples actions uniformly (see RandomPolicy for a similar use)\nCategoricalTabularPolicy samples actions from a categorical distribution with weights given by a ValuePolicy.","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"StochasticPolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.StochasticPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.StochasticPolicy","text":"StochasticPolicy{D, RNG <: AbstractRNG}\n\nRepresents a stochastic policy. Action are sampled from an arbitrary distribution.\n\nConstructor:\n\n`StochasticPolicy(distribution; rng=Random.default_rng())`\n\nFields\n\ndistribution::D\nrng::RNG a random number generator\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"CategoricalTabularPolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.CategoricalTabularPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.CategoricalTabularPolicy","text":"CategoricalTabularPolicy\n\nrepresents a stochastic policy sampling an action from a categorical distribution with weights given by a ValuePolicy\n\nconstructor:\n\nCategoricalTabularPolicy(mdp::Union{POMDP,MDP}; rng=Random.default_rng())\n\nFields\n\nstochastic::StochasticPolicy\nvalue::ValuePolicy\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Vector-Policies","page":"Implemented Policies","title":"Vector Policies","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"Tabular policies including the following:","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"VectorPolicy holds a vector of actions, one for each state, ordered according to stateindex.\nValuePolicy holds a matrix of values for state-action pairs and chooses the action with the highest value at the given state","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"VectorPolicy ","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.VectorPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.VectorPolicy","text":"VectorPolicy{S,A}\n\nA generic MDP policy that consists of a vector of actions. 
The entry at stateindex(mdp, s) is the action that will be taken in state s.\n\nFields\n\nmdp::MDP{S,A} the MDP problem\nact::Vector{A} a vector of size |S| mapping state indices to actions\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"VectorSolver","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.VectorSolver","page":"Implemented Policies","title":"POMDPTools.Policies.VectorSolver","text":"VectorSolver{A}\n\nSolver for VectorPolicy. Doesn't do any computation - just sets the action vector.\n\nFields\n\nact::Vector{A} the action vector\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"ValuePolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.ValuePolicy","page":"Implemented Policies","title":"POMDPTools.Policies.ValuePolicy","text":" ValuePolicy{P<:Union{POMDP,MDP}, T<:AbstractMatrix{Float64}, A}\n\nA generic MDP policy that consists of a value table. The action with the highest value in the row at stateindex(mdp, s) will be taken in state s. It is expected that the order of the actions in the value table is consistent with the order of the actions in act. If act is not explicitly set in the construction, act is ordered according to actionindex.\n\nFields\n\nmdp::P the MDP problem\nvalue_table::T the value table as a |S|x|A| matrix\nact::Vector{A} the possible actions\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Value-Dict-Policy","page":"Implemented Policies","title":"Value Dict Policy","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"ValueDictPolicy holds a dictionary of values, where the key is a state-action tuple, and chooses the action with the highest value at the given state. It allows one to write solvers without enumerating state and action spaces, but actions and states must support Base.isequal() and Base.hash().","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"ValueDictPolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.ValueDictPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.ValueDictPolicy","text":" ValueDictPolicy(mdp)\n\nA generic MDP policy that consists of a Dict storing Q-values for state-action pairs. If there are no entries higher than a default value, this will fall back to a default policy.\n\nKeyword Arguments\n\nvalue_table::AbstractDict the value dict, key is (s, a) Tuple.\ndefault_value::Float64 the default value of value_dict.\ndefault_policy::Policy the policy taken when no action has a value higher than default_value\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Exploration-Policies","page":"Implemented Policies","title":"Exploration Policies","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"Exploration policies are often useful for Reinforcement Learning algorithms to choose an action that is different from the action given by the policy being learned (on_policy). 
","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"Exploration policies are subtypes of the abstract ExplorationPolicy type and follow this interface: action(exploration_policy::ExplorationPolicy, on_policy::Policy, k, s). k is used to compute the value of the exploration parameter (see Schedule), and s is the current state or observation in which the agent is taking an action.","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"The action method is exported by POMDPs.jl. To use exploration policies in a solver, you must use the four argument version of action where on_policy is the policy being learned (e.g. tabular policy or neural network policy).","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"This package provides two exploration policies: EpsGreedyPolicy and SoftmaxPolicy","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":" EpsGreedyPolicy\n SoftmaxPolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.EpsGreedyPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.EpsGreedyPolicy","text":"EpsGreedyPolicy <: ExplorationPolicy\n\nrepresents an epsilon greedy policy, sampling a random action with a probability eps or returning an action from a given policy otherwise. The evolution of epsilon can be controlled using a schedule. This feature is useful for using those policies in reinforcement learning algorithms. \n\nConstructor:\n\nEpsGreedyPolicy(problem::Union{MDP, POMDP}, eps::Union{Function, Float64}; rng=Random.default_rng(), schedule=ConstantSchedule)\n\nIf a function is passed for eps, eps(k) is called to compute the value of epsilon when calling action(exploration_policy, on_policy, k, s).\n\nFields\n\neps::Function\nrng::AbstractRNG\nm::M the POMDP or MDP problem\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#POMDPTools.Policies.SoftmaxPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.SoftmaxPolicy","text":"SoftmaxPolicy <: ExplorationPolicy\n\nrepresents a softmax policy, sampling a random action according to a softmax function. The softmax function converts the action values of the on_policy into probabilities that are used for sampling. A temperature parameter or function can be used to make the resulting distribution more or less wide.\n\nConstructor\n\nSoftmaxPolicy(problem, temperature::Union{Function, Float64}; rng=Random.default_rng())\n\nIf a function is passed for temperature, temperature(k) is called to compute the value of the temperature when calling action(exploration_policy, on_policy, k, s)\n\nFields\n\ntemperature::Function\nrng::AbstractRNG\nactions::A an indexable list of actions\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Schedule","page":"Implemented Policies","title":"Schedule","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"Exploration policies often rely on a key parameter: epsilon in epsilon-greedy and the temperature in softmax, for example. Reinforcement learning algorithms often require a decay schedule for these parameters. Schedules can be passed to an exploration policy as functions. 
For example, one can define an epsilon greedy policy with an exponential decay schedule as follows: ","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":" m # your mdp or pomdp model\n exploration_policy = EpsGreedyPolicy(m, k->0.05*0.9^(k/10))","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"POMDPTools exports a linear decay schedule object that can be used as well. ","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":" LinearDecaySchedule ","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.LinearDecaySchedule","page":"Implemented Policies","title":"POMDPTools.Policies.LinearDecaySchedule","text":"LinearDecaySchedule\n\nA schedule that linearly decreases a value from start to stop in steps steps. Once the value reaches stop, it stays constant.\n\nConstructor\n\nLinearDecaySchedule(;start, stop, steps)\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Playback-Policy","page":"Implemented Policies","title":"Playback Policy","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"A policy that replays a fixed sequence of actions. When all actions are used, a backup policy is used.","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"PlaybackPolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.PlaybackPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.PlaybackPolicy","text":"PlaybackPolicy{A<:AbstractArray, P<:Policy, V<:AbstractArray{<:Real}}\n\na policy that applies a fixed sequence of actions until they are all used and then falls back onto a backup policy until the end of the episode.\n\nConstructor:\n\n`PlaybackPolicy(actions::AbstractArray, backup_policy::Policy; logpdfs::AbstractArray{Float64, 1} = Float64[])`\n\nFields\n\nactions::Vector{A} a vector of actions to play back\nbackup_policy::Policy the policy to use when all prescribed actions have been taken but the episode continues\nlogpdfs::Vector{Float64} the log probability (density) of actions\ni::Int64 the current action index\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Utility-Wrapper","page":"Implemented Policies","title":"Utility Wrapper","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"A wrapper for policies to collect statistics and handle errors.","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"PolicyWrapper","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.PolicyWrapper","page":"Implemented Policies","title":"POMDPTools.Policies.PolicyWrapper","text":"PolicyWrapper\n\nFlexible utility wrapper for a policy designed for collecting statistics about planning.\n\nCarries a function, a policy, and optionally a payload (that can be any type).\n\nThe function should typically be defined with the do syntax. Each time action is called on the wrapper, this function will be called.\n\nIf there is no payload, it will be called with two arguments: the policy and the state/belief. 
If there is a payload, it will be called with three arguments: the policy, the payload, and the current state or belief. The function should return an appropriate action. The idea is that, in this function, action(policy, s) should be called, statistics from the policy/planner should be collected and saved in the payload, exceptions can be handled, and the action should be returned.\n\nConstructor\n\nPolicyWrapper(policy::Policy; payload=nothing)\n\nExample\n\nusing POMDPModels\nusing POMDPToolbox\n\nmdp = GridWorld()\npolicy = RandomPolicy(mdp)\ncounts = Dict(a=>0 for a in actions(mdp))\n\n# with a payload\nstatswrapper = PolicyWrapper(policy, payload=counts) do policy, counts, s\n a = action(policy, s)\n counts[a] += 1\n return a\nend\n\nh = simulate(HistoryRecorder(max_steps=100), mdp, statswrapper)\nfor (a, count) in payload(statswrapper)\n println(\"policy chose action $a $count of $(n_steps(h)) times.\")\nend\n\n# without a payload\nerrwrapper = PolicyWrapper(policy) do policy, s\n try\n a = action(policy, s)\n catch ex\n @warn(\"Caught error in policy; using default\")\n a = :left\n end\n return a\nend\n\nh = simulate(HistoryRecorder(max_steps=100), mdp, errwrapper)\n\nFields\n\nf::F\npolicy::P\npayload::PL\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Pretty-Printing-Policies","page":"Implemented Policies","title":"Pretty Printing Policies","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"showpolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.showpolicy","page":"Implemented Policies","title":"POMDPTools.Policies.showpolicy","text":"showpolicy([io], [mime], m::MDP, p::Policy)\nshowpolicy([io], [mime], statelist::AbstractVector, p::Policy)\nshowpolicy(...; pre=\" \")\n\nPrint the states in m or statelist and the actions from policy p corresponding to those states.\n\nFor the MDP version, if io[:limit] is true, will only print enough states to fill the display.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/policies/#Policy-Evaluation","page":"Implemented Policies","title":"Policy Evaluation","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"The evaluate function provides a policy evaluation tool for MDPs:","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"evaluate","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.evaluate","page":"Implemented Policies","title":"POMDPTools.Policies.evaluate","text":"evaluate(m::MDP, p::Policy)\nevaluate(m::MDP, p::Policy; rewardfunction=POMDPs.reward)\n\nCalculate the value for a policy on an MDP using the approach in equation 4.2.2 of Kochenderfer, Decision Making Under Uncertainty, 2015.\n\nReturns a DiscreteValueFunction, which maps states to values.\n\nExample\n\nusing POMDPTools, POMDPModels\nm = SimpleGridWorld()\nu = evaluate(m, FunctionPolicy(x->:left))\nu([1,1]) # value of always moving left starting at state [1,1]\n\n\n\n\n\n","category":"function"},{"location":"def_updater/#Defining-a-Belief-Updater","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"","category":"section"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"In this section we list the requirements for defining a belief updater. 
For a description of what a belief updater is, see Concepts and Architecture - Beliefs and Updaters. Typically a belief updater will have an associated belief type, and may be closely tied to a particular policy/planner.","category":"page"},{"location":"def_updater/#Defining-a-Belief-Type","page":"Defining a Belief Updater","title":"Defining a Belief Type","text":"","category":"section"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"A belief object should contain all of the information needed for the next belief update and for the policy to make a decision. The belief type could be a pre-defined type such as a distribution from Distributions.jl or DiscreteBelief or SparseCat from the POMDPTools package, or it could be a custom type.","category":"page"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"Often, but not always, the belief will represent a probability distribution. In this case, the functions in the distribution interface should be implemented if possible. Implementing these functions will make the belief usable with many of the policies and planners in the POMDPs.jl ecosystem, and will make it easy for others to convert between beliefs and to interpret what a belief means.","category":"page"},{"location":"def_updater/#Histories-associated-with-a-belief","page":"Defining a Belief Updater","title":"Histories associated with a belief","text":"","category":"section"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"If a complete or partial record of the action-observation history leading up to a belief is available, it is often helpful to give access to this by implementing the history or currentobs functions (see the docstrings for more details). This is especially useful if a problem-writer wants to implement a belief- or observation-dependent action space. Belief type implementers need only implement history, and currentobs will automatically be provided, though sometimes it is more convenient to implement currentobs directly.","category":"page"},{"location":"def_updater/#Defining-an-Updater","page":"Defining a Belief Updater","title":"Defining an Updater","text":"","category":"section"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"To create an updater, one should define a subtype of the Updater abstract type and implement two methods, one to create the initial belief from the problem's initial state distribution and one to perform a belief update:","category":"page"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"initialize_belief(updater, d) creates a belief from state distribution d appropriate to use with the updater. To extract information from d, use the functions from the distribution interface.\nupdate(updater, b, a, o) returns an updated belief given belief b, action a, and observation o. 
One can usually expect b to be the same type returned by initialize_belief because a careful user will always call initialize_belief before update, but it would also be reasonable to implement update for b of a different type if it is desirable to handle multiple belief types.","category":"page"},{"location":"def_updater/#Example:-History-Updater","page":"Defining a Belief Updater","title":"Example: History Updater","text":"","category":"section"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"One trivial type of belief would be the action-observation history, a list containing the initial state distribution and every action taken and observation received. The history contains all of the information received up to the current time, but it is not usually very useful because most policies make decisions based on a state probability distribution. Here the belief type is simply the built in Vector{Any}, so we need only create the updater and write update and initialize_belief. Normally, update would contain belief update probability calculations, but in this example, we simply append the action and observation to the history.","category":"page"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"(Note that this example is designed for readability rather than efficiency.)","category":"page"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"import POMDPs\n\nstruct HistoryUpdater <: POMDPs.Updater end\n\nPOMDPs.initialize_belief(up::HistoryUpdater, d) = Any[d]\n\nfunction POMDPs.update(up::HistoryUpdater, b, a, o)\n bp = copy(b)\n push!(bp, a)\n push!(bp, o)\n return bp\nend","category":"page"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"At each step, the history starts with the original distribution, then contains all the actions and observations received up to that point. 
The example below shows this for the crying baby problem (observations are true/false for crying and actions are true/false for feeding).","category":"page"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"using POMDPTools\nusing POMDPModels\nusing Random\n\npomdp = BabyPOMDP()\npolicy = RandomPolicy(pomdp, rng=MersenneTwister(1))\nup = HistoryUpdater()\n\n# within stepthrough initialize_belief is called on the initial state distribution of the pomdp, then update is called at each step.\nfor b in stepthrough(pomdp, policy, up, \"b\", rng=MersenneTwister(2), max_steps=5)\n @show b\nend\n\n# output\n\nb = Any[POMDPModels.BoolDistribution(0.0)]\nb = Any[POMDPModels.BoolDistribution(0.0), false, false]\nb = Any[POMDPModels.BoolDistribution(0.0), false, false, false, false]\nb = Any[POMDPModels.BoolDistribution(0.0), false, false, false, false, true, false]\nb = Any[POMDPModels.BoolDistribution(0.0), false, false, false, false, true, false, true, false]","category":"page"},{"location":"faq/#Frequently-Asked-Questions-(FAQ)","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"","category":"section"},{"location":"faq/#What-is-the-difference-between-transition,-gen,-and-@gen?","page":"Frequently Asked Questions (FAQ)","title":"What is the difference between transition, gen, and @gen?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"(See also: Using a single generative function instead of separate T, Z, and R)","category":"page"},{"location":"faq/#For-problem-implementers","page":"Frequently Asked Questions (FAQ)","title":"For problem implementers","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"transition should be implemented to define the state transition distribution, either explicitly, or, if only samples from the distribution are available, with an ImplicitDistribution.\ngen should only be implemented if your simulator can only output samples of two or more of the next state, observation, and reward at the same time, e.g. if rewards are calculated as a robot moves from the current state to the next state so it is difficult to define the reward function separately from the state transitions.\n@gen should never be implemented or modified by the problem writer; it is only used in simulators and solvers (see below).","category":"page"},{"location":"faq/#For-solver/simulator-implementers","page":"Frequently Asked Questions (FAQ)","title":"For solver/simulator implementers","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"@gen should be called whenever a sample of the next state, observation, and or reward is needed. 
It automatically combines calls to rand, transition, observation, reward, and gen, depending on what is implemented for the problem and the outputs requested by the caller, without any overhead.\ntransition should be called only when you need access to the explicit transition probability distribution.\ngen should never be called directly by a solver or simulator; it is only a tool for implementers (see above).","category":"page"},{"location":"faq/#How-do-I-save-my-policies?","page":"Frequently Asked Questions (FAQ)","title":"How do I save my policies?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"We recommend using JLD2 to save the whole policy object:","category":"page"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"using JLD2\nsave(\"my_policy.jld2\", \"policy\", policy)","category":"page"},{"location":"faq/#Why-is-my-solver-producing-a-suboptimal-policy?","page":"Frequently Asked Questions (FAQ)","title":"Why is my solver producing a suboptimal policy?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"There could be a number of things that are going wrong. If you have a discrete POMDP or MDP and you're using a solver that requires the explicit transition probabilities, the first thing to try is to make sure that your probability masses sum up to unity. We've provided some tools in POMDPTools that can check this for you. If you have a POMDP called pomdp, you can run the checks by doing the following:","category":"page"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"using POMDPTools\n@assert has_consistent_distributions(pomdp)","category":"page"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"If this throws an error, you may need to fix your transition or observation functions. ","category":"page"},{"location":"faq/#What-if-I-don't-use-the-rng-argument?","page":"Frequently Asked Questions (FAQ)","title":"What if I don't use the rng argument?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"POMDPs.jl uses Julia's built-in random number generator system to provide for reproducible simulations. To tie into this system, the gen function, the sampling function for the ImplicitDistribution, and the rand function for custom distributions all have an rng argument that should be used to generate random numbers. However, in some cases, for example when wrapping a simulator that is tied to the global random number generator or written in another language, it may be impossible or impractical to use this rng.","category":"page"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"It is natural to wonder if ignoring this rng argument will cause problems. For many use cases, it is OK to ignore this argument - the only consequence will be that simulations will not be exactly reproducible unless the random seed is managed separately. Some algorithms, most notably DESPOT, rely on \"determinized scenarios\" that are implemented with a special rng. 
Some of the guarantees of these algorithms may not be met if the rng argument is ignored.","category":"page"},{"location":"faq/#Why-are-all-the-solvers-in-separate-modules?","page":"Frequently Asked Questions (FAQ)","title":"Why are all the solvers in separate modules?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"We did not put all the solvers and support tools into POMDPs.jl, because we wanted POMDPs.jl to be a lightweight interface package. This has a number of advantages. The first is that if a user only wants to use a few solvers from the JuliaPOMDP organization, they do not have to install all the other solvers and their dependencies. The second advantage is that people who are not directly part of the JuliaPOMDP organization can write their own solvers without going into the source code of other solvers. This makes the framework easier to adopt and to extend.","category":"page"},{"location":"faq/#How-can-I-implement-terminal-actions?","page":"Frequently Asked Questions (FAQ)","title":"How can I implement terminal actions?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"Terminal actions are actions that cause the MDP to terminate without generating a new state. POMDPs.jl handles terminal conditions via the isterminal function on states, and does not directly support terminal actions. If your MDP has a terminal action, you need to implement the model functions accordingly to generate a terminal state. In both generative and explicit cases, you will need some dummy state, say spt, that can be recognized as terminal by the isterminal function. One way to do this is to give spt a state value that is out of bounds (e.g. a vector of NaNs or -1s) and then check for that in isterminal, so that this does not clash with any conventional termination conditions on the state.","category":"page"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"If a terminal action is taken, regardless of current state, the transition function should return a distribution with only one next state, spt, with probability 1.0. In the generative case, the new state generated should be spt. 
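A minimal sketch of this pattern (with a hypothetical integer-state MDP in which -1 plays the role of the dummy terminal state spt): \n\nusing POMDPs, POMDPTools   # POMDPTools provides the Deterministic distribution\n\nstruct QuitMDP <: MDP{Int, Symbol} end\n\nconst SPT = -1   # dummy terminal state, outside the normal state range 1:10\n\nPOMDPs.isterminal(m::QuitMDP, s::Int) = s == SPT\n\nfunction POMDPs.transition(m::QuitMDP, s::Int, a::Symbol)\n    if a == :quit\n        return Deterministic(SPT)             # terminal action: always jump to the dummy state\n    else\n        return Deterministic(min(s + 1, 10))  # ordinary (toy) dynamics\n    end\nend\n\nPOMDPs.reward(m::QuitMDP, s::Int, a::Symbol) = a == :quit ? -5.0 : 1.0   # cost of the terminal action\n\ntransition(QuitMDP(), 3, :quit)   # Deterministic(-1)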
The reward function or the r in generate_sr can be set according to the cost of the terminal action.","category":"page"},{"location":"faq/#Why-are-there-two-versions-of-reward?","page":"Frequently Asked Questions (FAQ)","title":"Why are there two versions of reward?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"Both reward(m, s, a) and reward(m, s, a, sp) are included because of these two facts:","category":"page"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"Some non-native solvers use reward(m, s, a)\nSometimes the reward depends on s and sp.","category":"page"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"It is reasonable to implement both as long as the (s, a) version is the expectation of the (s, a, s') version (see below).","category":"page"},{"location":"faq/#How-do-I-implement-reward(m,-s,-a)-if-the-reward-depends-on-the-next-state?","page":"Frequently Asked Questions (FAQ)","title":"How do I implement reward(m, s, a) if the reward depends on the next state?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"The solvers that require reward(m, s, a) only work on problems with finite state and action spaces. In this case, you can define reward(m, s, a) in terms of reward(m, s, a, sp) with the following code:","category":"page"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"const rdict = Dict{Tuple{S,A}, Float64}()\n\nfor s in states(m)\n for a in actions(m)\n r = 0.0\n td = transition(m, s, a) # transition distribution for s, a\n for sp in support(td)\n r += pdf(td, sp)*reward(m, s, a, sp)\n end\n rdict[(s, a)] = r\n end\nend\n\nPOMDPs.reward(m, s, a) = rdict[(s, a)]","category":"page"},{"location":"faq/#Why-do-I-need-to-put-type-assertions-pomdp::POMDP-into-the-function-signature?","page":"Frequently Asked Questions (FAQ)","title":"Why do I need to put type assertions pomdp::POMDP into the function signature?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"Specifying the type in your function signature allows Julia to call the appropriate function when your custom type is passed into it. For example if a POMDPs.jl solver calls states on the POMDP that you passed into it, the correct states function will only get dispatched if you specified that the states function you wrote works with your POMDP type. 
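For instance (a tiny sketch with a hypothetical problem type): \n\nusing POMDPs\n\nstruct MyProblem <: POMDP{Int, Int, Bool} end\n\n# the ::MyProblem annotation is what routes a solver's call to states(pomdp) to this method\nPOMDPs.states(m::MyProblem) = 1:10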
Because Julia supports multiple dispatch, these type assertions are a way of doing object-oriented programming in Julia.","category":"page"},{"location":"POMDPTools/beliefs/#Implemented-Belief-Updaters","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"","category":"section"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"POMDPTools provides the following generic belief updaters:","category":"page"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"a discrete belief updater\na k previous observation updater\na previous observation updater \na nothing updater (for when the policy does not depend on any feedback)","category":"page"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"For particle filters see ParticleFilters.jl.","category":"page"},{"location":"POMDPTools/beliefs/#Discrete-(Bayesian-Filter)","page":"Implemented Belief Updaters","title":"Discrete (Bayesian Filter)","text":"","category":"section"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"The DiscreteUpdater is a default implementation of a discrete Bayesian filter. The DiscreteBelief type is provided to represent discrete beliefs for discrete state POMDPs. ","category":"page"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"A convenience function uniform_belief is provided to create a DiscreteBelief with equal probability for each state. ","category":"page"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"DiscreteBelief","category":"page"},{"location":"POMDPTools/beliefs/#POMDPTools.BeliefUpdaters.DiscreteBelief","page":"Implemented Belief Updaters","title":"POMDPTools.BeliefUpdaters.DiscreteBelief","text":"DiscreteBelief\n\nA belief specified by a probability vector.\n\nNormalization of b is assumed in some calculations (e.g. 
pdf), but it is only automatically enforced in update(...), and a warning is given if normalized incorrectly in DiscreteBelief(pomdp, b).\n\nConstructor\n\nDiscreteBelief(pomdp, b::Vector{Float64}; check::Bool=true)\n\nFields\n\npomdp : the POMDP problem \nstate_list : a vector of ordered states\nb : the probability vector \n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"DiscreteUpdater","category":"page"},{"location":"POMDPTools/beliefs/#POMDPTools.BeliefUpdaters.DiscreteUpdater","page":"Implemented Belief Updaters","title":"POMDPTools.BeliefUpdaters.DiscreteUpdater","text":"DiscreteUpdater\n\nAn updater type to update discrete belief using the discrete Bayesian filter.\n\nConstructor\n\nDiscreteUpdater(pomdp::POMDP)\n\nFields\n\npomdp <: POMDP\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"uniform_belief(pomdp)","category":"page"},{"location":"POMDPTools/beliefs/#POMDPTools.BeliefUpdaters.uniform_belief-Tuple{Any}","page":"Implemented Belief Updaters","title":"POMDPTools.BeliefUpdaters.uniform_belief","text":" uniform_belief(pomdp)\n\nReturn a DiscreteBelief with equal probability for each state.\n\n\n\n\n\n","category":"method"},{"location":"POMDPTools/beliefs/#K-Previous-Observations","page":"Implemented Belief Updaters","title":"K Previous Observations","text":"","category":"section"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"KMarkovUpdater","category":"page"},{"location":"POMDPTools/beliefs/#POMDPTools.BeliefUpdaters.KMarkovUpdater","page":"Implemented Belief Updaters","title":"POMDPTools.BeliefUpdaters.KMarkovUpdater","text":"KMarkovUpdater\n\nUpdater that stores the k most recent observations as the belief.\n\nExample:\n\nup = KMarkovUpdater(5)\ns0 = rand(rng, initialstate(pomdp))\ninitial_observation = rand(rng, initialobs(pomdp, s0))\ninitial_obs_vec = fill(initial_observation, 5)\nhr = HistoryRecorder(rng=rng, max_steps=100)\nhist = simulate(hr, pomdp, policy, up, initial_obs_vec, s0)\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/beliefs/#Previous-Observation","page":"Implemented Belief Updaters","title":"Previous Observation","text":"","category":"section"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"PreviousObservationUpdater","category":"page"},{"location":"POMDPTools/beliefs/#POMDPTools.BeliefUpdaters.PreviousObservationUpdater","page":"Implemented Belief Updaters","title":"POMDPTools.BeliefUpdaters.PreviousObservationUpdater","text":"Updater that stores the most recent observation as the belief. If an initial distribution is provided, it will pass that as the initial belief.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/beliefs/#Nothing-Updater","page":"Implemented Belief Updaters","title":"Nothing Updater","text":"","category":"section"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"NothingUpdater","category":"page"},{"location":"POMDPTools/beliefs/#POMDPTools.BeliefUpdaters.NothingUpdater","page":"Implemented Belief Updaters","title":"POMDPTools.BeliefUpdaters.NothingUpdater","text":"An updater useful for when a belief is not necessary (i.e. for a random policy). 
update always returns nothing.\n\n\n\n\n\n","category":"type"},{"location":"api/#API-Documentation","page":"API Documentation","title":"API Documentation","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"Docstrings for POMDPs.jl interface members can be accessed through Julia's built-in documentation system or in the list below.","category":"page"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"CurrentModule = POMDPs","category":"page"},{"location":"api/#Contents","page":"API Documentation","title":"Contents","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"Pages = [\"api.md\"]","category":"page"},{"location":"api/#Index","page":"API Documentation","title":"Index","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"Pages = [\"api.md\"]","category":"page"},{"location":"api/#Types","page":"API Documentation","title":"Types","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"POMDP\nMDP\nSolver\nPolicy\nUpdater","category":"page"},{"location":"api/#POMDPs.POMDP","page":"API Documentation","title":"POMDPs.POMDP","text":"POMDP{S,A,O}\n\nAbstract base type for a partially observable Markov decision process.\n\nS: state type\nA: action type\nO: observation type\n\n\n\n\n\n","category":"type"},{"location":"api/#POMDPs.MDP","page":"API Documentation","title":"POMDPs.MDP","text":"MDP{S,A}\n\nAbstract base type for a fully observable Markov decision process.\n\nS: state type\nA: action type\n\n\n\n\n\n","category":"type"},{"location":"api/#POMDPs.Solver","page":"API Documentation","title":"POMDPs.Solver","text":"Base type for an MDP/POMDP solver\n\n\n\n\n\n","category":"type"},{"location":"api/#POMDPs.Policy","page":"API Documentation","title":"POMDPs.Policy","text":"Base type for a policy (a map from every possible belief, or more abstract policy state, to an optimal or suboptimal action)\n\n\n\n\n\n","category":"type"},{"location":"api/#POMDPs.Updater","page":"API Documentation","title":"POMDPs.Updater","text":"Abstract type for an object that defines how the belief should be updated\n\nA belief is a general construct that represents the knowledge an agent has about the state of the system. 
This can be a probability distribution, an action observation history or a more general representation.\n\n\n\n\n\n","category":"type"},{"location":"api/#Model-Functions","page":"API Documentation","title":"Model Functions","text":"","category":"section"},{"location":"api/#Dynamics","page":"API Documentation","title":"Dynamics","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"transition\nobservation\nreward\ngen\n@gen","category":"page"},{"location":"api/#POMDPs.transition","page":"API Documentation","title":"POMDPs.transition","text":"transition(m::POMDP, state, action)\ntransition(m::MDP, state, action)\n\nReturn the transition distribution from the current state-action pair.\n\nIf it is difficult to define the probability density or mass function explicitly, consider using POMDPModelTools.ImplicitDistribution to define a generative model.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.observation","page":"API Documentation","title":"POMDPs.observation","text":"observation(m::POMDP, statep)\nobservation(m::POMDP, action, statep)\nobservation(m::POMDP, state, action, statep)\n\nReturn the observation distribution. You need only define the method with the fewest arguments needed to determine the observation distribution.\n\nIf it is difficult to define the probability density or mass function explicitly, consider using POMDPModelTools.ImplicitDistribution to define a generative model.\n\nExample\n\nusing POMDPModelTools # for SparseCat\n\nstruct MyPOMDP <: POMDP{Int, Int, Int} end\n\nobservation(p::MyPOMDP, sp::Int) = SparseCat([sp-1, sp, sp+1], [0.1, 0.8, 0.1])\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.reward","page":"API Documentation","title":"POMDPs.reward","text":"reward(m::POMDP, s, a)\nreward(m::MDP, s, a)\n\nReturn the immediate reward for the s-a pair.\n\nreward(m::POMDP, s, a, sp)\nreward(m::MDP, s, a, sp)\n\nReturn the immediate reward for the s-a-s' triple\n\nreward(m::POMDP, s, a, sp, o)\n\nReturn the immediate reward for the s-a-s'-o quad\n\nFor some problems, it is easier to express reward(m, s, a, sp) or reward(m, s, a, sp, o), than reward(m, s, a), but some solvers, e.g. SARSOP, can only use reward(m, s, a). Both can be implemented for a problem, but when reward(m, s, a) is implemented, it should be consistent with reward(m, s, a, sp[, o]), that is, it should be the expected value over all destination states and observations.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.gen","page":"API Documentation","title":"POMDPs.gen","text":"gen(m::Union{MDP,POMDP}, s, a, rng::AbstractRNG)\n\nFunction for implementing the entire MDP/POMDP generative model by returning a NamedTuple.\n\nSolver and simulator writers should use the @gen macro to call a generative model.\n\nArguments\n\nm: an MDP or POMDP model\ns: the current state\na: the action\nrng: a random number generator (Typically a MersenneTwister)\n\nReturn\n\nThe function should return a NamedTuple. With a subset of following entries:\n\nMDP\n\nsp: the next state\nr: the reward for the step\ninfo: extra debugging information, typically in an associative container like a NamedTuple\n\nPOMDP\n\nsp: the next state\no: the observation\nr: the reward for the step\ninfo: extra debugging information, typically in an associative container like a NamedTuple\n\nSome elements can be left out. 
For instance, if o is left out of the return, the problem-writer can also implement observation and POMDPs.jl will automatically use it when needed.\n\nExample\n\nstruct LQRMDP <: MDP{Float64, Float64} end\n\nPOMDPs.gen(m::LQRMDP, s, a, rng) = (sp = s + a + randn(rng), r = -s^2 - a^2)\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.@gen","page":"API Documentation","title":"POMDPs.@gen","text":"@gen(X)(m, s, a)\n@gen(X)(m, s, a, rng::AbstractRNG)\n\nCall the generative model for a (PO)MDP m; sample values from several nodes in the dynamic decision network. X is one or more symbols indicating which nodes to output.\n\nSolvers and simulators should call this rather than the gen function. Problem writers should implement a method of the transition or gen function instead of altering @gen.\n\nArguments\n\nm: an MDP or POMDP model\ns: the current state\na: the action\nrng (optional): a random number generator (Typically a MersenneTwister)\n\nReturn\n\nIf X is a symbol, return a value sampled from the corresponding node. If X is several symbols, return a Tuple of values sampled from the specified nodes.\n\nExamples\n\nLet m be an MDP or POMDP, s be a state of m, a be an action of m, and rng be an AbstractRNG.\n\n@gen(:sp, :r)(m, s, a) returns a Tuple containing the next state and reward.\n@gen(:sp, :o, :r)(m, s, a, rng) returns a Tuple containing the next state, observation, and reward.\n@gen(:sp)(m, s, a, rng) returns the next state.\n\n\n\n\n\n","category":"macro"},{"location":"api/#Static-Properties","page":"API Documentation","title":"Static Properties","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"states\nactions\nobservations\nisterminal\ndiscount\ninitialstate\ninitialobs\nstateindex\nactionindex\nobsindex\nconvert_s\nconvert_a\nconvert_o","category":"page"},{"location":"api/#POMDPs.states","page":"API Documentation","title":"POMDPs.states","text":"states(problem::POMDP)\nstates(problem::MDP)\n\nReturns the complete state space of a POMDP or MDP. \n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.actions","page":"API Documentation","title":"POMDPs.actions","text":"actions(m::Union{MDP,POMDP})\n\nReturns the entire action space of a (PO)MDP.\n\n\n\nactions(m::Union{MDP,POMDP}, s)\n\nReturn the actions that can be taken from state s.\n\n\n\nactions(m::POMDP, b)\n\nReturn the actions that can be taken from belief b.\n\nTo implement an observation-dependent action space, use currentobs(b) to get the observation associated with belief b within the implementation of actions(m, b).\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.observations","page":"API Documentation","title":"POMDPs.observations","text":"observations(problem::POMDP)\n\nReturn the entire observation space.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.isterminal","page":"API Documentation","title":"POMDPs.isterminal","text":"isterminal(m::Union{MDP,POMDP}, s)\n\nCheck if state s is terminal.\n\nIf a state is terminal, no actions will be taken in it and no additional rewards will be accumulated. 
Thus, the value function at such a state is, by definition, zero.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.discount","page":"API Documentation","title":"POMDPs.discount","text":"discount(m::POMDP)\ndiscount(m::MDP)\n\nReturn the discount factor for the problem.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.initialstate","page":"API Documentation","title":"POMDPs.initialstate","text":"initialstate(m::Union{POMDP,MDP})\n\nReturn a distribution of initial states for (PO)MDP m.\n\nIf it is difficult to define the probability density or mass function explicitly, consider using POMDPModelTools.ImplicitDistribution to define a model for sampling.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.initialobs","page":"API Documentation","title":"POMDPs.initialobs","text":"initialobs(m::POMDP, s)\n\nReturn a distribution of initial observations for POMDP m and state s.\n\nIf it is difficult to define the probability density or mass function explicitly, consider using POMDPModelTools.ImplicitDistribution to define a model for sampling.\n\nThis function is only used in cases where the policy expects an initial observation rather than an initial belief, e.g. in a reinforcement learning setting. It is not used in a standard POMDP simulation.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.stateindex","page":"API Documentation","title":"POMDPs.stateindex","text":"stateindex(problem::POMDP, s)\nstateindex(problem::MDP, s)\n\nReturn the integer index of state s. Used for discrete models only.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.actionindex","page":"API Documentation","title":"POMDPs.actionindex","text":"actionindex(problem::POMDP, a)\nactionindex(problem::MDP, a)\n\nReturn the integer index of action a. Used for discrete models only.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.obsindex","page":"API Documentation","title":"POMDPs.obsindex","text":"obsindex(problem::POMDP, o)\n\nReturn the integer index of observation o. 
Used for discrete models only.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.convert_s","page":"API Documentation","title":"POMDPs.convert_s","text":"convert_s(::Type{V}, s, problem::Union{MDP,POMDP}) where V<:AbstractArray\nconvert_s(::Type{S}, vec::V, problem::Union{MDP,POMDP}) where {S,V<:AbstractArray}\n\nConvert a state to vectorized form or vice versa.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.convert_a","page":"API Documentation","title":"POMDPs.convert_a","text":"convert_a(::Type{V}, a, problem::Union{MDP,POMDP}) where V<:AbstractArray\nconvert_a(::Type{A}, vec::V, problem::Union{MDP,POMDP}) where {A,V<:AbstractArray}\n\nConvert an action to vectorized form or vice versa.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.convert_o","page":"API Documentation","title":"POMDPs.convert_o","text":"convert_o(::Type{V}, o, problem::Union{MDP,POMDP}) where V<:AbstractArray\nconvert_o(::Type{O}, vec::V, problem::Union{MDP,POMDP}) where {O,V<:AbstractArray}\n\nConvert an observation to vectorized form or vice versa.\n\n\n\n\n\n","category":"function"},{"location":"api/#Type-Inference","page":"API Documentation","title":"Type Inference","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"statetype\nactiontype\nobstype","category":"page"},{"location":"api/#POMDPs.statetype","page":"API Documentation","title":"POMDPs.statetype","text":"statetype(t::Type)\nstatetype(p::Union{POMDP,MDP})\n\nReturn the state type for a problem type (the S in POMDP{S,A,O}).\n\nstruct A <: POMDP{Int, Bool, Bool} end\n\nstatetype(A) # returns Int\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.actiontype","page":"API Documentation","title":"POMDPs.actiontype","text":"actiontype(t::Type)\nactiontype(p::Union{POMDP,MDP})\n\nReturn the action type for a problem type (the A in POMDP{S,A,O}).\n\nstruct A <: POMDP{Bool, Int, Bool} end\n\nactiontype(A) # returns Int\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.obstype","page":"API Documentation","title":"POMDPs.obstype","text":"obstype(t::Type)\n\nReturn the observation type for a problem type (the O in POMDP{S,A,O}).\n\nstruct A <: POMDP{Bool, Bool, Int} end\n\nobstype(A) # returns Int\n\n\n\n\n\n","category":"function"},
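As a hedged sketch of the conversion functions documented above (the GridState and GridMDP types are hypothetical, introduced only for illustration), a two-field state could be vectorized and recovered as follows:

using POMDPs

# Hypothetical state and problem types used only to illustrate convert_s.
struct GridState
    x::Int
    y::Int
end

struct GridMDP <: MDP{GridState, Symbol} end

POMDPs.convert_s(::Type{Vector{Float64}}, s::GridState, m::GridMDP) = Float64[s.x, s.y]
POMDPs.convert_s(::Type{GridState}, v::AbstractVector, m::GridMDP) = GridState(round(Int, v[1]), round(Int, v[2]))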
{"location":"api/#Distributions-and-Spaces","page":"API Documentation","title":"Distributions and Spaces","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"rand\npdf\nmode\nmean\nsupport","category":"page"},{"location":"api/#Base.rand","page":"API Documentation","title":"Base.rand","text":"rand(rng::AbstractRNG, d::Any)\n\nReturn a random element from distribution or space d.\n\nIf d is a state or transition distribution, the sample will be a state; if d is an action distribution, the sample will be an action; and if d is an observation distribution, the sample will be an observation.\n\n\n\n\n\n","category":"function"},{"location":"api/#Distributions.pdf","page":"API Documentation","title":"Distributions.pdf","text":"pdf(d::Any, x::Any)\n\nEvaluate the probability density of distribution d at sample x.\n\n\n\n\n\n","category":"function"},{"location":"api/#StatsBase.mode","page":"API Documentation","title":"StatsBase.mode","text":"mode(d::Any)\n\nReturn the most likely value in a distribution d.\n\n\n\n\n\n","category":"function"},{"location":"api/#Statistics.mean","page":"API Documentation","title":"Statistics.mean","text":"mean(d::Any)\n\nReturn the mean of a distribution d.\n\n\n\n\n\n","category":"function"},{"location":"api/#Distributions.support","page":"API Documentation","title":"Distributions.support","text":"support(d::Any)\n\nReturn an iterable object containing the possible values that can be sampled from distribution d. Values with zero probability may be skipped.\n\n\n\n\n\n","category":"function"},{"location":"api/#Belief-Functions","page":"API Documentation","title":"Belief Functions","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"update\ninitialize_belief\nhistory\ncurrentobs","category":"page"},{"location":"api/#POMDPs.update","page":"API Documentation","title":"POMDPs.update","text":"update(updater::Updater, belief_old, action, observation)\n\nReturn a new instance of an updated belief given belief_old and the latest action and observation.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.initialize_belief","page":"API Documentation","title":"POMDPs.initialize_belief","text":"initialize_belief(updater::Updater,\n state_distribution::Any)\ninitialize_belief(updater::Updater, belief::Any)\n\nReturns a belief that can be updated using updater and that has a distribution similar to state_distribution or belief.\n\nThe conversion may be lossy. This function is also idempotent, i.e. there is a default implementation that passes the belief through when it is already the correct type: initialize_belief(updater::Updater, belief) = belief\n\n\n\n\n\n","category":"function"},
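As an illustrative sketch of how these belief functions fit together (the LastObsUpdater and LastObsBelief types are hypothetical, not part of POMDPs.jl), an updater that keeps only the most recent observation as its "belief" could look like this:

using POMDPs

# Hypothetical belief type that stores only the most recent observation (deliberately lossy).
struct LastObsBelief{O}
    o::O
end

struct LastObsUpdater <: Updater end

# The updated belief is just the latest observation.
POMDPs.update(::LastObsUpdater, b_old, a, o) = LastObsBelief(o)

# Lossy conversion from an initial state distribution: this sketch simply samples from it.
POMDPs.initialize_belief(::LastObsUpdater, d) = LastObsBelief(rand(d))

# With these, history and currentobs (documented next) are one-liners.
POMDPs.history(b::LastObsBelief) = [(o=b.o,)]
POMDPs.currentobs(b::LastObsBelief) = b.o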
{"location":"api/#POMDPs.history","page":"API Documentation","title":"POMDPs.history","text":"history(b)\n\nReturn the action-observation history associated with belief b.\n\nThe history should be an AbstractVector, Tuple, or similar object that supports indexing with end, full of NamedTuples with keys :a and :o, i.e. history(b)[end][:a] should be the last action taken leading up to b, and history(b)[end][:o] should be the last observation received.\n\nIt is acceptable to return only part of the history if that is all that is available, but it should always end with the current observation. For example, it would be acceptable to return a structure containing only the last three observations in a length 3 Vector{NamedTuple{(:o,),Tuple{O}}}.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.currentobs","page":"API Documentation","title":"POMDPs.currentobs","text":"currentobs(b)\n\nReturn the latest observation associated with belief b.\n\nIf a solver or updater implements history(b) for a belief type, currentobs has a default implementation.\n\n\n\n\n\n","category":"function"},{"location":"api/#Policy-and-Solver-Functions","page":"API Documentation","title":"Policy and Solver Functions","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"solve\nupdater\naction\nvalue","category":"page"},{"location":"api/#POMDPs.solve","page":"API Documentation","title":"POMDPs.solve","text":"solve(solver::Solver, problem::POMDP)\n\nSolves the POMDP using the method associated with solver and returns a policy.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.updater","page":"API Documentation","title":"POMDPs.updater","text":"updater(policy::Policy)\n\nReturns a default Updater appropriate for a belief type that policy p can use.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.action","page":"API Documentation","title":"POMDPs.action","text":"action(policy::Policy, x)\n\nReturns the action that the policy deems best for the current state or belief, x.\n\nx is a generalized information state - it can be a state in an MDP, a distribution in a POMDP, or another specialized policy-dependent representation of the information needed to choose an action.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.value","page":"API Documentation","title":"POMDPs.value","text":"value(p::Policy, s)\nvalue(p::Policy, s, a)\n\nReturns the utility value from policy p given the state (or belief), or state-action (or belief-action) pair.\n\nThe state-action version is commonly referred to as the Q-value.\n\n\n\n\n\n","category":"function"},{"location":"api/#Simulator","page":"API Documentation","title":"Simulator","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"Simulator\nsimulate","category":"page"},{"location":"api/#POMDPs.Simulator","page":"API Documentation","title":"POMDPs.Simulator","text":"Base type for an object defining how simulations should be carried out.\n\n\n\n\n\n","category":"type"},{"location":"api/#POMDPs.simulate","page":"API Documentation","title":"POMDPs.simulate","text":"simulate(sim::Simulator, m::POMDP, p::Policy, u::Updater=updater(p), b0=initialstate(m), s0=rand(b0))\nsimulate(sim::Simulator, m::MDP, p::Policy, s0=rand(initialstate(m)))\n\nRun a simulation using the specified policy.\n\nThe return type is flexible and depends on the simulator. Simulations should adhere to the Simulation Standard.\n\n\n\n\n\n","category":"function"},{"location":"run_simulation/#Running-Simulations","page":"Running Simulations","title":"Running Simulations","text":"","category":"section"},{"location":"run_simulation/","page":"Running Simulations","title":"Running Simulations","text":"Running a simulation consists of two steps: creating a simulator and calling the simulate function. 
For example, given a POMDP or MDP model m, and a policy p, one can use the RolloutSimulator from POMDPTools to find the accumulated discounted reward from a single simulated trajectory as follows:","category":"page"},{"location":"run_simulation/","page":"Running Simulations","title":"Running Simulations","text":"sim = RolloutSimulator()\nr = simulate(sim, m, p)","category":"page"},{"location":"run_simulation/","page":"Running Simulations","title":"Running Simulations","text":"More inputs, such as a belief updater, initial state, initial belief, etc. may be specified as arguments to simulate. See the docstring for simulate and the appropriate \"Input\" sections in the Simulation Standard page for more information.","category":"page"},{"location":"run_simulation/","page":"Running Simulations","title":"Running Simulations","text":"More examples can be found in the POMDPExamples package. A variety of simulators that return more information and interact in different ways can be found in POMDPTools.","category":"page"},{"location":"simulation/#Simulation-Standard","page":"Simulation Standard","title":"Simulation Standard","text":"","category":"section"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"Important note: In most cases, users need not implement their own simulators. Several simulators that are compatible with the standard in this document are implemented in POMDPTools and allow interaction from a variety of perspectives. Moreover CommonRLInterface.jl provides an OpenAI Gym style environment interface to interact with environments that is more flexible in some cases.","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"In order to maintain consistency across the POMDPs.jl ecosystem, this page defines a standard for how simulations should be conducted. All simulators should be consistent with this page, and, if solvers are attempting to find an optimal POMDP policy, they should optimize the expected value of r_total below. In particular, this page should be consulted when questions about how less-obvious concepts like terminal states are handled.","category":"page"},{"location":"simulation/#POMDP-Simulation","page":"Simulation Standard","title":"POMDP Simulation","text":"","category":"section"},{"location":"simulation/#Inputs","page":"Simulation Standard","title":"Inputs","text":"","category":"section"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"In general, POMDP simulations take up to 5 inputs (see also the simulate docstring):","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"pomdp::POMDP: pomdp model object (see POMDPs and MDPs)\npolicy::Policy: policy (see Solvers and Policies)\nup::Updater: belief updater (see Beliefs and Updaters)\nb0: initial belief (this may be updater-specific, such as an observation if the updater just returns the previous observation)\ns: initial state","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"The last three of these inputs are optional. 
If they are not explicitly provided, they should be inferred using the following POMDPs.jl functions:","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"up = updater(policy)\nb0 = initialstate(pomdp)\ns = rand(initialstate(pomdp))","category":"page"},{"location":"simulation/#Simulation-Loop","page":"Simulation Standard","title":"Simulation Loop","text":"","category":"section"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"The main simulation loop is shown below. Note that the isterminal check prevents any actions from being taken and reward from being collected from a terminal state.","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"Before the loop begins, initialize_belief is called to create the belief based on the initial state distribution - this is especially important when the belief is solver specific, such as the finite-state-machine used by MCVI. ","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"b = initialize_belief(up, b0)\n\nr_total = 0.0\nd = 1.0\nwhile !isterminal(pomdp, s)\n a = action(policy, b)\n s, o, r = @gen(:sp,:o,:r)(pomdp, s, a)\n r_total += d*r\n d *= discount(pomdp)\n b = update(up, b, a, o)\nend","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"In terms of the explicit interface, the @gen macro above expands to the equivalent of:","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":" sp = rand(transition(pomdp, s, a))\n o = rand(observation(pomdp, s, a, sp))\n r = reward(pomdp, s, a, sp, o)\n s = sp","category":"page"},
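For concreteness, the standard POMDP loop above can be run against a model from POMDPModels as in the following self-contained sketch (the model, policy, updater, and step cap are illustrative choices; the step cap is added only because TigerPOMDP has no terminal state):

using POMDPs, POMDPTools, POMDPModels

function run_standard_simulation(pomdp, policy, up; max_steps=20)
    b0 = initialstate(pomdp)
    s = rand(b0)
    b = initialize_belief(up, b0)
    r_total = 0.0
    d = 1.0
    steps = 0
    while !isterminal(pomdp, s) && steps < max_steps
        a = action(policy, b)
        s, o, r = @gen(:sp, :o, :r)(pomdp, s, a)
        r_total += d * r
        d *= discount(pomdp)
        b = update(up, b, a, o)
        steps += 1
    end
    return r_total
end

pomdp = TigerPOMDP()
run_standard_simulation(pomdp, RandomPolicy(pomdp), DiscreteUpdater(pomdp))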
{"location":"simulation/#MDP-Simulation","page":"Simulation Standard","title":"MDP Simulation","text":"","category":"section"},{"location":"simulation/#Inputs-2","page":"Simulation Standard","title":"Inputs","text":"","category":"section"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"In general, MDP simulations take up to 3 inputs (see also the simulate docstring):","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"mdp::MDP: mdp model object (see POMDPs and MDPs)\npolicy::Policy: policy (see Solvers and Policies)\ns: initial state","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"The last of these inputs is optional. If the initial state is not explicitly provided, it should be generated using","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"s = rand(initialstate(mdp))","category":"page"},{"location":"simulation/#Simulation-Loop-2","page":"Simulation Standard","title":"Simulation Loop","text":"","category":"section"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"The main simulation loop is shown below. Note again that the isterminal check prevents any actions from being taken and reward from being collected from a terminal state.","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"r_total = 0.0\nd = 1.0\nwhile !isterminal(mdp, s)\n a = action(policy, s)\n s, r = @gen(:sp,:r)(mdp, s, a)\n r_total += d*r\n d *= discount(mdp)\nend","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"In terms of the explicit interface, the @gen macro above expands to the equivalent of:","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":" sp = rand(transition(mdp, s, a))\n r = reward(mdp, s, a, sp)\n s = sp","category":"page"},{"location":"POMDPTools/simulators/#Implemented-Simulators","page":"Implemented Simulators","title":"Implemented Simulators","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"POMDPTools contains a collection of POMDPs.jl simulators.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Usage examples can be found in the simulation tutorial in the POMDPExamples package.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"If you are just getting started, probably the easiest way to begin is with the stepthrough function. Otherwise, consult the Which Simulator Should I Use? guide below:","category":"page"},{"location":"POMDPTools/simulators/#which_simulator","page":"Implemented Simulators","title":"Which Simulator Should I Use?","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The simulators in this package provide interaction with simulations of MDP and POMDP environments from a variety of perspectives. Use these questions to choose the best simulator to suit your needs.","category":"page"},{"location":"POMDPTools/simulators/#I-want-to-run-fast-rollout-simulations-and-get-the-discounted-reward.","page":"Implemented Simulators","title":"I want to run fast rollout simulations and get the discounted reward.","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Use the Rollout Simulator.","category":"page"},{"location":"POMDPTools/simulators/#I-want-to-evaluate-performance-with-many-parallel-Monte-Carlo-simulations.","page":"Implemented Simulators","title":"I want to evaluate performance with many parallel Monte Carlo simulations.","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Use the Parallel Simulator.","category":"page"},{"location":"POMDPTools/simulators/#I-want-to-closely-examine-the-histories-of-states,-actions,-etc.-produced-by-simulations.","page":"Implemented Simulators","title":"I want to closely examine the histories of states, actions, etc. 
produced by simulations.","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Use the History Recorder.","category":"page"},{"location":"POMDPTools/simulators/#I-want-to-step-through-each-individual-step-of-a-simulation.","page":"Implemented Simulators","title":"I want to step through each individual step of a simulation.","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Use the stepthrough function.","category":"page"},{"location":"POMDPTools/simulators/#I-want-to-visualize-a-simulation.","page":"Implemented Simulators","title":"I want to visualize a simulation.","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Use the DisplaySimulator.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Also see the POMDPGifs package for creating gif animations.","category":"page"},{"location":"POMDPTools/simulators/#I-want-to-interact-with-a-MDP-or-POMDP-environment-from-the-policy's-perspective","page":"Implemented Simulators","title":"I want to interact with a MDP or POMDP environment from the policy's perspective","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Use the sim function.","category":"page"},{"location":"POMDPTools/simulators/#Stepping-through","page":"Implemented Simulators","title":"Stepping through","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The stepthrough function exposes a simulation as an iterator so that the steps can be iterated through with a for loop syntax as follows:","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"pomdp = BabyPOMDP()\npolicy = RandomPolicy(pomdp)\n\nfor (s, a, o, r) in stepthrough(pomdp, policy, \"s,a,o,r\", max_steps=10)\n println(\"in state $s\")\n println(\"took action $a\")\n println(\"received observation $o and reward $r\")\nend","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"More examples can be found in the POMDPExamples Package.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"stepthrough","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.stepthrough","page":"Implemented Simulators","title":"POMDPTools.Simulators.stepthrough","text":"stepthrough(problem, policy, [spec])\nstepthrough(problem, policy, [spec], [rng=rng], [max_steps=max_steps])\nstepthrough(mdp::MDP, policy::Policy, [init_state], [spec]; [kwargs...])\nstepthrough(pomdp::POMDP, policy::Policy, [up::Updater, [initial_belief, [initial_state]]], [spec]; [kwargs...])\n\nCreate a simulation iterator. This is intended to be used with for loop syntax to output the results of each step as the simulation is being run. 
\n\nExample:\n\npomdp = BabyPOMDP()\npolicy = RandomPolicy(pomdp)\n\nfor (s, a, o, r) in stepthrough(pomdp, policy, \"s,a,o,r\", max_steps=10)\n println(\"in state $s\")\n println(\"took action $a\")\n println(\"received observation $o and reward $r\")\nend\n\nThe optional spec argument can be a string, tuple of symbols, or single symbol and follows the same pattern as eachstep called on a SimHistory object.\n\nUnder the hood, this function creates a StepSimulator with spec and returns a [PO]MDPSimIterator by calling simulate with all of the arguments except spec. All keyword arguments are passed to the StepSimulator constructor.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The StepSimulator contained in POMDPTools can provide the same functionality with the following syntax:","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"sim = StepSimulator(\"s,a,r,sp\")\nfor (s,a,r,sp) in simulate(sim, problem, policy)\n # do something\nend","category":"page"},{"location":"POMDPTools/simulators/#Rollouts","page":"Implemented Simulators","title":"Rollouts","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"RolloutSimulator is the simplest MDP or POMDP simulator. When simulate is called, it simply simulates a single trajectory of the process and returns the discounted reward.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"rs = RolloutSimulator()\nmdp = GridWorld()\npolicy = RandomPolicy(mdp)\n\nr = simulate(rs, mdp, policy)","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"More examples can be found in the POMDPExamples Package.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"RolloutSimulator","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.RolloutSimulator","page":"Implemented Simulators","title":"POMDPTools.Simulators.RolloutSimulator","text":"RolloutSimulator(rng, max_steps)\nRolloutSimulator(; )\n\nA fast simulator that just returns the discounted reward.\n\nThe simulation will be terminated when either\n\na terminal state is reached (as determined by isterminal()), or\nthe accumulated discount γᵗ is as small as eps, or\nmax_steps have been executed\n\nKeyword arguments:\n\nrng::AbstractRNG (default: Random.default_rng()) - A random number generator to use. \neps::Float64 (default: 0.0) - A small number; if γᵗ where γ is the discount factor and t is the time step becomes smaller than this, the simulation will be terminated.\nmax_steps::Int (default: typemax(Int)) - The maximum number of steps to simulate.\n\nUsage (optional arguments in brackets):\n\nro = RolloutSimulator()\nr = simulate(ro, pomdp, policy, [updater [, init_belief [, init_state]]])\n\nSee also: HistoryRecorder, run_parallel\n\n\n\n\n\n","category":"type"},
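As a usage sketch (the model, policy, number of rollouts, and step cap below are placeholder choices), the RolloutSimulator documented above can be used to estimate an average discounted return over many independent trajectories:

using POMDPs, POMDPTools, POMDPModels
using Statistics: mean, std

m = TigerPOMDP()          # placeholder problem
p = RandomPolicy(m)       # placeholder policy

rs = RolloutSimulator(max_steps=50)
returns = [simulate(rs, m, p) for _ in 1:100]   # each call runs one rollout and returns its discounted reward
println("mean return: ", mean(returns), "  (std: ", std(returns), ")")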
{"location":"POMDPTools/simulators/#History-Recorder","page":"Implemented Simulators","title":"History Recorder","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"A HistoryRecorder runs a simulation and records the trajectory. It returns an AbstractVector of NamedTuples - see Histories for more info.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"hr = HistoryRecorder(max_steps=100)\npomdp = TigerPOMDP()\npolicy = RandomPolicy(pomdp)\n\nh = simulate(hr, pomdp, policy)","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"More examples can be found in the POMDPExamples Package.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"HistoryRecorder","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.HistoryRecorder","page":"Implemented Simulators","title":"POMDPTools.Simulators.HistoryRecorder","text":"A simulator that records the history for later examination.\n\nThe simulation will be terminated when either\n\na terminal state is reached (as determined by isterminal()), or\nthe accumulated discount γᵗ is as small as eps, or\nmax_steps have been executed\n\nKeyword Arguments:\n\nrng: The random number generator for the simulation\ncapture_exception::Bool: whether to capture an exception and store it in the history, or let it go uncaught, potentially killing the script\nshow_progress::Bool: show a progress bar for the simulation\neps\nmax_steps\n\nUsage (optional arguments in brackets):\n\nhr = HistoryRecorder()\nhistory = simulate(hr, pomdp, policy, [updater [, init_belief [, init_state]]])\n\n\n\n\n\n","category":"type"},
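Following the optional-argument pattern shown in the docstring above, a history can also be recorded from an explicit updater, initial belief, and initial state; the specific choices below are only illustrative:

using POMDPs, POMDPTools, POMDPModels

pomdp = TigerPOMDP()
policy = RandomPolicy(pomdp)
up = DiscreteUpdater(pomdp)      # belief updater used while recording
b0 = initialstate(pomdp)         # initial belief (a distribution over states)
s0 = rand(b0)                    # initial state sampled from that distribution

hr = HistoryRecorder(max_steps=20, show_progress=false)
h = simulate(hr, pomdp, policy, up, b0, s0)
discounted_reward(h)             # total discounted reward of the recorded trajectory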
{"location":"POMDPTools/simulators/#sim-function","page":"Implemented Simulators","title":"sim()","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The sim function provides a convenient way to interact with a POMDP or MDP environment and return a history. The first argument is a function that is called at every time step and takes a state (in the case of an MDP) or an observation (in the case of a POMDP) as the argument and then returns an action. The second argument is a pomdp or mdp. It is intended to be used with Julia's do syntax as follows:","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"pomdp = TigerPOMDP()\nhistory = sim(pomdp, max_steps=10) do obs\n println(\"Observation was $obs.\")\n return TIGER_OPEN_LEFT\nend","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"This allows a flexible and general way to interact with a POMDP environment without creating new Policy types.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"In the POMDP case, an updater can optionally be supplied as an additional positional argument if the policy function works with beliefs rather than directly with observations.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"More examples can be found in the POMDPExamples Package.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"sim","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.sim","page":"Implemented Simulators","title":"POMDPTools.Simulators.sim","text":"sim(polfunc::Function, mdp::MDP; [])\nsim(polfunc::Function, pomdp::POMDP; [])\n\nAlternative way of running a simulation with a function specifying how to calculate the action at each timestep.\n\nUsage\n\nsim(mdp) do s\n # code that calculates action `a` based on `s` - this is the policy\n # you can also do other things like display something\n return a\nend\n\nfor an MDP, or\n\nsim(pomdp) do o\n # code that calculates 'a' based on observation `o`\n # optionally you could save 'o' in a global variable or do a belief update\n return a\nend\n\nfor a POMDP, or\n\nsim(pomdp, updater) do b\n # code that calculates 'a' based on belief `b`\n # `b` is calculated by `updater`\n return a\nend\n\nfor a POMDP and a belief updater.\n\nKeyword Arguments\n\nAll Versions\n\ninitialstate: the initial state for the simulation\nsimulator: keyword argument to specify any simulator to run the simulation. If nothing is specified for the simulator, a HistoryRecorder will be used as the simulator, with all keyword arguments forwarded to it, e.g.\nsim(mdp, max_steps=100, show_progress=true) do s\n # ...\nend\nwill limit the simulation to 100 steps.\n\nPOMDP version\n\ninitialobs: this will control the initial observation given to the policy function. If this is not defined, rand(initialobs(m, s)) will be used if it is available. 
If it is not, missing will be used.\n\nPOMDP and updater version\n\ninitialbelief: initialize_belief(updater, initialbelief) is the first belief that will be given to the policy function.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/simulators/#Histories","page":"Implemented Simulators","title":"Histories","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The results produced by HistoryRecorders and the sim function are contained in SimHistory objects.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"SimHistory","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.SimHistory","page":"Implemented Simulators","title":"POMDPTools.Simulators.SimHistory","text":"SimHistory\n\nAn (PO)MDP simulation history returned by simulate(::HistoryRecorder, ::Union{MDP,POMDP},...).\n\nThis is an AbstractVector of NamedTuples containing the states, actions, etc.\n\nExamples\n\nhist[1][:s] # returns the first state in the history\n\nhist[:a] # returns all of the actions in the history\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/simulators/#Examples","page":"Implemented Simulators","title":"Examples","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"using POMDPs, POMDPTools, POMDPModels\nhr = HistoryRecorder(max_steps=10)\nhist = simulate(hr, BabyPOMDP(), FunctionPolicy(x->true))\nstep = hist[1] # all information available about the first step\nstep[:s] # the first state\nstep[:a] # the first action","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"To see everything available in a step, use","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"keys(first(hist))","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The entire history of each variable is available by using a Symbol instead of an index, i.e.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"hist[:s]","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"will return a vector of the starting states for each step (note the difference between :s and :sp).","category":"page"},{"location":"POMDPTools/simulators/#eachstep","page":"Implemented Simulators","title":"eachstep","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The eachstep function may also be useful:","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"eachstep","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.eachstep","page":"Implemented Simulators","title":"POMDPTools.Simulators.eachstep","text":"for t in eachstep(hist, [spec])\n ...\nend\n\nIterate through the steps in SimHistory hist. 
spec is a tuple of symbols or string that controls what is returned for each step.\n\nFor example,\n\nfor (s, a, r, sp) in eachstep(h, \"(s, a, r, sp)\") \n println(\"reward $r received when state $sp was reached after action $a was taken in state $s\")\nend\n\nreturns the start state, action, reward and destination state for each step of the simulation.\n\nAlternatively, instead of expanding the steps implicitly, the elements of the step can be accessed as fields (since each step is a NamedTuple):\n\nfor step in eachstep(h, \"(s, a, r, sp)\") \n println(\"reward $(step.r) received when state $(step.sp) was reached after action $(step.a) was taken in state $(step.s)\")\nend\n\nThe possible valid elements in the iteration specification are\n\nAny node in the (PO)MDP Dynamic Decision network (by default :s, :a, :sp, :o, :r)\nb - the initial belief in the step (for POMDPs only)\nbp - the belief after being updated based on o (for POMDPs only)\naction_info - info from the policy decision (from action_info)\nupdate_info - info from the belief update (from update_info)\nt - the timestep index\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/simulators/#Examples:","page":"Implemented Simulators","title":"Examples:","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"collect(eachstep(h, \"a,o\"))","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"will produce a vector of action-observation named tuples.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"collect(norm(sp-s) for (s,sp) in eachstep(h, \"s,sp\"))","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"will produce a vector of the distances traveled on each step (assuming the state is a Euclidean vector).","category":"page"},{"location":"POMDPTools/simulators/#Notes","page":"Implemented Simulators","title":"Notes","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The iteration specification can be specified as a tuple of symbols (e.g. (:s, :a)) instead of a string.\nFor type stability in performance-critical code, one should construct an iterator directly using HistoryIterator{typeof(h), (:a,:r)}(h) rather than eachstep(h, \"ar\").","category":"page"},{"location":"POMDPTools/simulators/#Other-Functions","page":"Implemented Simulators","title":"Other Functions","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"state_hist(h), action_hist(h), observation_hist(h) belief_hist(h), and reward_hist(h) will return vectors of the states, actions, and rewards, and undiscounted_reward(h) and discounted_reward(h) will return the total rewards collected over the trajectory. n_steps(h) returns the number of steps in the history. exception(h) and backtrace(h) can be used to hold an exception if the simulation failed to finish.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"view(h, range) (e.g. view(h, 1:n_steps(h)-4)) can be used to create a view of the history object h that only contains a certain range of steps. 
The object returned by view is an AbstractSimHistory that can be iterated through and manipulated just like a complete SimHistory.","category":"page"},{"location":"POMDPTools/simulators/#Parallel","page":"Implemented Simulators","title":"Parallel","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"POMDPTools contains a utility for running many Monte Carlo simulations in parallel to evaluate performance. The basic workflow involves the following steps:","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Create a vector of Sim objects, each specifying how a single simulation should be run.\nUse the run_parallel or run function to run the simulations.\nAnalyze the results of the simulations contained in the DataFrame returned by run_parallel.","category":"page"},{"location":"POMDPTools/simulators/#Example","page":"Implemented Simulators","title":"Example","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"An example can be found in the POMDPExamples Package.","category":"page"},{"location":"POMDPTools/simulators/#Sim-objects","page":"Implemented Simulators","title":"Sim objects","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Each simulation should be specified by a Sim object which contains all the information needed to run a simulation, including the Simulator, POMDP or MDP, Policy, Updater, and any other ingredients.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Sim","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.Sim","page":"Implemented Simulators","title":"POMDPTools.Simulators.Sim","text":"Sim(m::MDP, p::Policy[, initialstate]; kwargs...)\nSim(m::POMDP, p::Policy[, updater[, initial_belief[, initialstate]]]; kwargs...)\n\nCreate a Sim object that contains everything needed to run and record a single simulation, including model, initial conditions, and metadata.\n\nA vector of Sim objects can be executed with run or run_parallel.\n\nKeyword Arguments\n\nrng::AbstractRNG=Random.default_rng()\nmax_steps::Int=typemax(Int)\nsimulator::Simulator=HistoryRecorder(rng=rng, max_steps=max_steps)\nmetadata::NamedTuple a named tuple (or dictionary) of metadata for the sim that will be recorded, e.g.(solver_iterations=500,)`.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/simulators/#Running-simulations","page":"Implemented Simulators","title":"Running simulations","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The simulations are actually carried out by the run and run_parallel functions.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"run_parallel","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.run_parallel","page":"Implemented Simulators","title":"POMDPTools.Simulators.run_parallel","text":"run_parallel(queue::Vector{Sim})\nrun_parallel(f::Function, queue::Vector{Sim})\n\nRun Sim objects in queue in parallel and return results as a DataFrame.\n\nBy default, the DataFrame will contain the reward for each simulation and the 
metadata provided to the sim.\n\nArguments\n\nqueue: List of Sim objects to be executed\nf: Function to process the results of each simulation\n\nThis function should take two arguments, (1) the Sim that was executed and (2) the result of the simulation, by default a SimHistory. It should return a named tuple that will appear in the dataframe. See Examples below.\n\nKeyword Arguments\n\nshow_progress::Bool: whether or not to show a progress meter\nprogress::ProgressMeter.Progress: determines how the progress meter is displayed\n\nExamples\n\nrun_parallel(queue) do sim, hist\n return (n_steps=n_steps(hist), reward=discounted_reward(hist))\nend\n\nwill return a dataframe with with the number of steps and the reward in it.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The run function is also provided to run simulations in serial (this is often useful for debugging). Note that the documentation below also contains a section for the builtin julia run function, even though it is not relevant here.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"run","category":"page"},{"location":"POMDPTools/simulators/#Base.run","page":"Implemented Simulators","title":"Base.run","text":"run(queue::Vector{Sim})\nrun(f::Function, queue::Vector{Sim})\n\nRun the Sim objects in queue on a single process and return the results as a dataframe.\n\nSee run_parallel for more information.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/simulators/#Specifying-information-to-be-recorded","page":"Implemented Simulators","title":"Specifying information to be recorded","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"By default, only the discounted rewards from each simulation are recorded, but arbitrary information can be recorded.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The run_parallel and run functions accept a function (normally specified via the do syntax) that takes the Sim object and history of the simulation and extracts relevant statistics as a named tuple. For example, if the desired characteristics are the number of steps in the simulation and the reward, run_parallel would be invoked as follows:","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"df = run_parallel(queue) do sim::Sim, hist::SimHistory\n return (n_steps=n_steps(hist), reward=discounted_reward(hist))\nend","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"These statistics are combined into a DataFrame, with each line representing a single simulation, allowing for statistical analysis. 
For example,","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"mean(df[:reward]./df[:n_steps])","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"would compute the average reward per step with each simulation weighted equally regardless of length.","category":"page"},{"location":"POMDPTools/simulators/#Display","page":"Implemented Simulators","title":"Display","text":"","category":"section"},{"location":"POMDPTools/simulators/#DisplaySimulator","page":"Implemented Simulators","title":"DisplaySimulator","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The DisplaySimulator displays each step of a simulation in real time through a multimedia display such as a Jupyter notebook or ElectronDisplay. Specifically it uses POMDPTools.render and the built-in Julia display function to visualize each step.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Example:","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"using POMDPs\nusing POMDPModels\nusing POMDPTools\nusing ElectronDisplay\nElectronDisplay.CONFIG.single_window = true\n\nds = DisplaySimulator()\nm = SimpleGridWorld()\nsimulate(ds, m, RandomPolicy(m))","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"DisplaySimulator","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.DisplaySimulator","page":"Implemented Simulators","title":"POMDPTools.Simulators.DisplaySimulator","text":"DisplaySimulator(;kwargs...)\n\nCreate a simulator that displays each step of a simulation.\n\nGiven a POMDP or MDP model m, this simulator roughly works like\n\nfor step in stepthrough(m, ...)\n display(render(m, step))\nend\n\nKeyword Arguments\n\ndisplay::AbstractDisplay: the display to use for the first argument to the display function. If this is nothing, display(...) will be called without an AbstractDisplay argument.\nrender_kwargs::NamedTuple: keyword arguments for POMDPTools.render(...)\nmax_fps::Number=10: maximum number of frames to be displayed per second - sleep will be used to skip extra time, so this is not designed for high precision\npredisplay::Function: function to call before every call to display(...). 
The only argument to this function will be the display (if it is specified) or nothing\nextra_initial::Bool=false: if true, display an extra step at the beginning with only elements t, sp, and bp for POMDPs (this can be useful to see the initial state if render displays only sp and not s).\nextra_final::Bool=true: if true, display an extra step at the end with only elements t, done, s, and b for POMDPs (this can be useful to see the final state if render displays only s and not sp).\nmax_steps::Integer: maximum number of steps to run for\nspec::NTuple{Symbol}: specification of what step elements to display (see eachstep)\nrng::AbstractRNG: random number generator\n\nSee the POMDPSimulators documentation for more tips about using specific displays.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/simulators/#Display-specific-tips","page":"Implemented Simulators","title":"Display-specific tips","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The following tips may be helpful when using particular displays.","category":"page"},{"location":"POMDPTools/simulators/#Jupyter-notebooks","page":"Implemented Simulators","title":"Jupyter notebooks","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"By default, in a Jupyter notebook, the visualizations of all steps are displayed in the output box one after another. To make the output animated instead, where the image is overwritten at each step, one may use","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"DisplaySimulator(predisplay=(d)->IJulia.clear_output(true))","category":"page"},{"location":"POMDPTools/simulators/#ElectronDisplay","page":"Implemented Simulators","title":"ElectronDisplay","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"By default, ElectronDisplay will open a new window for each new step. To prevent this, use","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"ElectronDisplay.CONFIG.single_window = true","category":"page"},{"location":"POMDPTools/testing/#Testing","page":"Testing","title":"Testing","text":"","category":"section"},{"location":"POMDPTools/testing/","page":"Testing","title":"Testing","text":"POMDPTools contains basic utilities for testing models and solvers.","category":"page"},{"location":"POMDPTools/testing/#Testing-(PO)MDP-Models","page":"Testing","title":"Testing (PO)MDP Models","text":"","category":"section"},{"location":"POMDPTools/testing/","page":"Testing","title":"Testing","text":"has_consistent_distributions\nhas_consistent_initial_distribution\nhas_consistent_transition_distributions\nhas_consistent_observation_distributions","category":"page"},{"location":"POMDPTools/testing/#POMDPTools.Testing.has_consistent_distributions","page":"Testing","title":"POMDPTools.Testing.has_consistent_distributions","text":"has_consistent_distributions(m::MDP; atol=0)\nhas_consistent_distributions(m::POMDP; atol=0)\n\nReturn true if no problems are found in the distributions for a discrete problem. 
Print information and return false if problems are found.\n\nTests whether\n\nAll probabilities are positive\nProbabilities for all distributions sum to 1\nAll items with positive probability are in the support\n\nKeyword Arguments\n\natol: absolute tolerance passed to approx for all probability checks\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/testing/#POMDPTools.Testing.has_consistent_initial_distribution","page":"Testing","title":"POMDPTools.Testing.has_consistent_initial_distribution","text":"has_consistent_initial_distribution(m; atol=0)\n\nReturn true if no problems are found with the initial state distribution for a discrete problem. Print information and return false if problems are found.\n\nSee has_consistent_distributions for information on what checks are performed.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/testing/#POMDPTools.Testing.has_consistent_transition_distributions","page":"Testing","title":"POMDPTools.Testing.has_consistent_transition_distributions","text":"has_consistent_transition_distributions(m; atol=0)\n\nReturn true if no problems are found in the transition distributions for a discrete problem. Print information and return false if problems are found.\n\nSee has_consistent_distributions for information on what checks are performed.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/testing/#POMDPTools.Testing.has_consistent_observation_distributions","page":"Testing","title":"POMDPTools.Testing.has_consistent_observation_distributions","text":"has_consistent_observation_distributions(m; atol=0)\n\nReturn true if no problems are found in the observation distributions for a discrete POMDP. Print information and return false if problems are found.\n\nSee has_consistent_distributions for information on what checks are performed.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/testing/#Testing-Solvers","page":"Testing","title":"Testing Solvers","text":"","category":"section"},{"location":"POMDPTools/testing/","page":"Testing","title":"Testing","text":"test_solver","category":"page"},{"location":"POMDPTools/testing/#POMDPTools.Testing.test_solver","page":"Testing","title":"POMDPTools.Testing.test_solver","text":"test_solver(solver::Solver, problem::POMDP)\ntest_solver(solver::Solver, problem::MDP)\n\nUse the solver to solve the specified problem, then run a simulation.\n\nThis is designed to illustrate how solvers are expected to function. All solvers should be able to complete this standard test with the simple models in the POMDPModels package.\n\nNote that this does NOT test the optimality of the solution, but is only a smoke test to see if the solver interacts with POMDP models as expected.\n\nTo run this with a solver called YourSolver, run\n\nusing POMDPToolbox\nusing POMDPModels\n\nsolver = YourSolver(# initialize with parameters #)\ntest_solver(solver, BabyPOMDP())\n\n\n\n\n\n","category":"function"},{"location":"offline_solver/#Example:-Defining-an-offline-solver","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"","category":"section"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"In this example, we will define a simple offline solver that works for both POMDPs and MDPs. In order to focus on the code structure, we will not create an algorithm that finds an optimal policy, but rather a greedy policy, that is, one that optimizes the expected immediate reward. 
For information on using this solver in a simulation, see Running Simulations.","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"We begin by creating a solver type. Since there are no adjustable parameters for the solver, it is an empty type, but for a more complex solver, parameters would usually be included as type fields.","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"using POMDPs\n\nstruct GreedyOfflineSolver <: Solver end","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"Next, we define the functions that will make the solver work for both MDPs and POMDPs.","category":"page"},{"location":"offline_solver/#MDP-Case","page":"Example: Defining an offline solver","title":"MDP Case","text":"","category":"section"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"Finding a greedy policy for an MDP consists of determining the action that has the best reward for each state. First, we create a simple policy object that holds a greedy action for each state.","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"struct DictPolicy{S,A} <: Policy\n actions::Dict{S,A}\nend\n\nPOMDPs.action(p::DictPolicy, s) = p.actions[s]","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"note: Note\nA POMDPTools.VectorPolicy could be used here. We include this example to show how to define a custom policy.","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"The solve function calculates the best greedy action for each state and saves it in a policy. 
To have the widest possible compatibility with POMDP models, we want to use reward(m, s, a, sp) instead of reward(m, s, a), which means we need to calculate the expectation of the reward over transitions to every possible next state.","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"function POMDPs.solve(::GreedyOfflineSolver, m::MDP)\n\n    best_actions = Dict{statetype(m), actiontype(m)}()\n\n    for s in states(m)\n        if !isterminal(m, s)\n            best = -Inf\n            for a in actions(m)\n                td = transition(m, s, a)\n                r = 0.0\n                for sp in support(td)\n                    r += pdf(td, sp) * reward(m, s, a, sp)\n                end\n                if r >= best\n                    best_actions[s] = a\n                    best = r\n                end\n            end\n        end\n    end\n\n    return DictPolicy(best_actions)\nend","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"note: Note\nWe limited this implementation to using basic POMDPs.jl implementation functions, but tools such as POMDPTools.StateActionReward, POMDPTools.ordered_states, and POMDPTools.weighted_iterator could have been used for a more concise and efficient implementation.","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"We can now verify whether the policy produces the greedy action on an example from POMDPModels:","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"using POMDPModels\n\ngw = SimpleGridWorld(size=(2,1), rewards=Dict(GWPos(2,1)=>1.0))\npolicy = solve(GreedyOfflineSolver(), gw)\n\naction(policy, GWPos(1,1))\n\n# output\n\n:right","category":"page"},{"location":"offline_solver/#POMDP-Case","page":"Example: Defining an offline solver","title":"POMDP Case","text":"","category":"section"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"For a POMDP, the greedy solution is the action that maximizes the expected immediate reward according to the belief. Since there are an infinite number of possible beliefs, the greedy solution cannot be precomputed and stored for every belief. 
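(As the note above mentions, the expectation over next states in the MDP case can be written more compactly with POMDPTools.weighted_iterator. A sketch, where expected_reward is a hypothetical helper name, not part of any package:)

using POMDPs
using POMDPTools: weighted_iterator

# Expected immediate reward for (s, a), iterating state => probability pairs.
function expected_reward(m::MDP, s, a)
    r = 0.0
    for (sp, p) in weighted_iterator(transition(m, s, a))
        r += p * reward(m, s, a, sp)
    end
    return r
end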
However, the greedy policy can take the form of an alpha vector policy where each action has an associated alpha vector with each entry corresponding to the immediate reward from taking the action in that state.","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"Again, because a POMDP may have reward(m, s, a, sp, o) instead of reward(m, s, a), we use the former and calculate the expectation over all next states and observations.","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"using POMDPTools: AlphaVectorPolicy\n\nfunction POMDPs.solve(::GreedyOfflineSolver, m::POMDP)\n\n    alphas = Vector{Float64}[]\n\n    for a in actions(m)\n        alpha = zeros(length(states(m)))\n        for s in states(m)\n            if !isterminal(m, s)\n                r = 0.0\n                td = transition(m, s, a)\n                for sp in support(td)\n                    tp = pdf(td, sp)\n                    od = observation(m, s, a, sp)\n                    for o in support(od)\n                        r += tp * pdf(od, o) * reward(m, s, a, sp, o)\n                    end\n                end\n                alpha[stateindex(m, s)] = r\n            end\n        end\n        push!(alphas, alpha)\n    end\n\n    return AlphaVectorPolicy(m, alphas, collect(actions(m)))\nend","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"We can now verify that a policy created by the solver determines the correct greedy actions:","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"using POMDPModels\nusing POMDPTools: Deterministic, Uniform\n\ntiger = TigerPOMDP()\npolicy = solve(GreedyOfflineSolver(), tiger)\n\n@assert action(policy, Deterministic(TIGER_LEFT)) == TIGER_OPEN_RIGHT\n@assert action(policy, Deterministic(TIGER_RIGHT)) == TIGER_OPEN_LEFT\n@assert action(policy, Uniform(states(tiger))) == TIGER_LISTEN","category":"page"},{"location":"def_solver/#Solvers","page":"Solvers","title":"Solvers","text":"","category":"section"},{"location":"def_solver/","page":"Solvers","title":"Solvers","text":"Defining a solver involves creating or using four pieces of code:","category":"page"},{"location":"def_solver/","page":"Solvers","title":"Solvers","text":"A subtype of Solver that holds the parameters and configuration options for the solver.\nA subtype of Policy that holds all of the data needed to choose actions online.\nA method of solve that takes the Solver and a (PO)MDP as arguments, performs all of the offline computations for solving the problem, and returns the policy.\nA method of action that takes in the policy and a state or belief and returns an action.","category":"page"},{"location":"def_solver/","page":"Solvers","title":"Solvers","text":"In many cases, items 2 and 4 can be satisfied with an off-the-shelf Policy from the POMDPTools package, which also contains many tools that are useful for defining solvers in a robust, concise, and readable manner. A minimal skeleton of these four pieces is sketched below.","category":"page"},{"location":"def_solver/#Online-and-Offline-Solvers","page":"Solvers","title":"Online and Offline Solvers","text":"","category":"section"},{"location":"def_solver/","page":"Solvers","title":"Solvers","text":"Generally, solvers can be grouped into two categories: offline solvers, which do most of their computational work before interacting with the environment, and online solvers, which do their work online as each new state or observation is encountered. 
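(A minimal skeleton of the four pieces listed above; MySolver and MyPolicy are hypothetical names, and the action choice is a placeholder rather than a real algorithm.)

using POMDPs

struct MySolver <: Solver end          # 1. solver type: parameters and configuration

struct MyPolicy{M} <: Policy           # 2. policy type: data needed to choose actions online
    m::M
end

POMDPs.solve(::MySolver, m) = MyPolicy(m)             # 3. offline computation happens here
POMDPs.action(p::MyPolicy, s) = first(actions(p.m))   # 4. online action choice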
Although offline and online solvers both use the exact same Solver, solve, Policy, action structure, the implementation work for an offline solver and an online solver is focused on different parts of that structure.","category":"page"},{"location":"def_solver/","page":"Solvers","title":"Solvers","text":"For an offline solver, most of the implementation effort will be spent on the solve function, and an off-the-shelf policy from POMDPTools will typically be used.","category":"page"},{"location":"def_solver/","page":"Solvers","title":"Solvers","text":"For an online solver, the solve function typically does little or no work, but merely creates a Policy object that will carry out computation online. It is typical in POMDPs.jl to use the term \"Planner\" to name a Policy object for an online solver that carries out a large amount of computation (\"planning\") at interaction time. In this case most of the effort will be focused on implementing the action method for the \"Planner\" Policy type.","category":"page"},{"location":"def_solver/#Examples","page":"Solvers","title":"Examples","text":"","category":"section"},{"location":"def_solver/","page":"Solvers","title":"Solvers","text":"Solver implementation is most clearly explained through examples. The following sections contain examples of both online and offline solver definitions:","category":"page"},{"location":"def_solver/","page":"Solvers","title":"Solvers","text":"Pages = [\"offline_solver.md\", \"online_solver.md\"]","category":"page"},{"location":"online_solver/#Example:-Defining-an-online-solver","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"","category":"section"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"In this example, we will define a simple online solver that works for both POMDPs and MDPs. In order to focus on the code structure, we will not create an algorithm that finds an optimal policy, but rather a greedy policy, that is, one that optimizes the expected immediate reward. For information on using this solver in a simulation, see Running Simulations.","category":"page"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"In order to handle the widest range of problems, we will use @gen to generate Monte Carlo samples to estimate the reward even if only a simulator is available. We begin by creating the necessary types and the solve function. 
The only solver parameter is the number of samples used to estimate the reward at each step, and the solve function does nothing more than create a planner with the appropriate (PO)MDP problem definition.","category":"page"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"using POMDPs\n\nstruct MonteCarloGreedySolver <: Solver\n num_samples::Int\nend\n\nstruct MonteCarloGreedyPlanner{M} <: Policy\n m::M\n num_samples::Int\nend\n\nPOMDPs.solve(sol::MonteCarloGreedySolver, m) = MonteCarloGreedyPlanner(m, sol.num_samples)","category":"page"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"Next, we define the action function where the online work takes place.","category":"page"},{"location":"online_solver/#MDP-Case","page":"Example: Defining an online solver","title":"MDP Case","text":"","category":"section"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"function POMDPs.action(p::MonteCarloGreedyPlanner{<:MDP}, s)\n best_reward = -Inf\n local best_action\n for a in actions(p.m)\n reward_sum = sum(@gen(:r)(p.m, s, a) for _ in 1:p.num_samples)\n if reward_sum >= best_reward\n best_reward = reward_sum\n best_action = a\n end\n end\n return best_action\nend","category":"page"},{"location":"online_solver/#POMDP-Case","page":"Example: Defining an online solver","title":"POMDP Case","text":"","category":"section"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"function POMDPs.action(p::MonteCarloGreedyPlanner{<:POMDP}, b)\n best_reward = -Inf\n local best_action\n for a in actions(p.m)\n s = rand(b)\n reward_sum = sum(@gen(:r)(p.m, s, a) for _ in 1:p.num_samples)\n if reward_sum >= best_reward\n best_reward = reward_sum\n best_action = a\n end\n end\n return best_action\nend\n\n# output\n","category":"page"},{"location":"online_solver/#Verification","page":"Example: Defining an online solver","title":"Verification","text":"","category":"section"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"We can now verify that the online planner works in some simple cases:","category":"page"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"using POMDPModels\n\ngw = SimpleGridWorld(size=(2,1), rewards=Dict(GWPos(2,1)=>1.0))\nsolver = MonteCarloGreedySolver(1000)\nplanner = solve(solver, gw)\n\naction(planner, GWPos(1,1))\n\n# output\n\n:right","category":"page"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"using POMDPModels\nusing POMDPTools: Deterministic, Uniform\n\ntiger = TigerPOMDP()\nsolver = MonteCarloGreedySolver(1000)\n\nplanner = solve(solver, tiger)\n\n@assert action(planner, Deterministic(TIGER_LEFT)) == TIGER_OPEN_RIGHT\n@assert action(planner, Deterministic(TIGER_RIGHT)) == TIGER_OPEN_LEFT\n# note action(planner, Uniform(states(tiger))) is not very reliable with this number of samples","category":"page"},{"location":"get_started/#Getting-Started","page":"Getting Started","title":"Getting Started","text":"","category":"section"},{"location":"get_started/","page":"Getting Started","title":"Getting Started","text":"Before writing our own POMDP problems or 
solvers, let's try out some of the solvers and problem models available in JuliaPOMDP.","category":"page"},{"location":"get_started/","page":"Getting Started","title":"Getting Started","text":"Here is a short piece of code that solves the Tiger POMDP using QMDP, and evaluates the results. Note that you must have the QMDP, POMDPModels, and POMDPTools packages installed.","category":"page"},{"location":"get_started/","page":"Getting Started","title":"Getting Started","text":"using POMDPs, QMDP, POMDPModels, POMDPTools\n\n# initialize problem and solver\npomdp = TigerPOMDP() # from POMDPModels\nsolver = QMDPSolver() # from QMDP\n\n# compute a policy\npolicy = solve(solver, pomdp)\n\n# evaluate the policy\nbelief_updater = updater(policy) # the default QMDP belief updater (discrete Bayesian filter)\ninit_dist = initialstate(pomdp) # from POMDPModels\nhr = HistoryRecorder(max_steps=100) # from POMDPTools\nhist = simulate(hr, pomdp, policy, belief_updater, init_dist) # run 100 step simulation\nprintln(\"reward: $(discounted_reward(hist))\")","category":"page"},{"location":"get_started/","page":"Getting Started","title":"Getting Started","text":"The first part of the code loads the desired packages and initializes the problem and the solver. Next, we compute a POMDP policy. Lastly, we evaluate the results.","category":"page"},{"location":"get_started/","page":"Getting Started","title":"Getting Started","text":"There are a few things to mention here. First, the TigerPOMDP type implements all the functions required by QMDPSolver to compute a policy. Second, each policy has a default updater (essentially a filter used to update the belief of the POMDP). To learn more about Updaters check out the Concepts section.","category":"page"},{"location":"POMDPTools/distributions/#Implemented-Distributions","page":"Implemented Distributions","title":"Implemented Distributions","text":"","category":"section"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"POMDPTools contains several utility distributions to be used in the POMDPs transition and observation functions. 
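(A quick sketch of how two of these distributions behave; it assumes pdf and support are brought into scope by POMDPs, as in the examples elsewhere in this manual.)

using POMDPs
using POMDPTools: SparseCat, Deterministic

d = SparseCat([:left, :right], [0.4, 0.6])
pdf(d, :right)         # 0.6
rand(d)                # :left or :right, sampled according to the weights
collect(support(d))    # [:left, :right]

Deterministic(:left)   # a distribution that always yields :left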
These implement the appropriate methods of the functions in the distributions interface.","category":"page"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"This package also supplies showdistribution for pretty printing distributions as unicode bar graphs to the terminal.","category":"page"},{"location":"POMDPTools/distributions/#Sparse-Categorical-(SparseCat)","page":"Implemented Distributions","title":"Sparse Categorical (SparseCat)","text":"","category":"section"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"SparseCat is a sparse categorical distribution which is specified by simply providing a list of possible values (states or observations) and the probabilities corresponding to those particular objects.","category":"page"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"Example: SparseCat([1,2,3], [0.1,0.2,0.7]) is a categorical distribution that assigns probability 0.1 to 1, 0.2 to 2, 0.7 to 3, and 0 to all other values.","category":"page"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"SparseCat","category":"page"},{"location":"POMDPTools/distributions/#POMDPTools.POMDPDistributions.SparseCat","page":"Implemented Distributions","title":"POMDPTools.POMDPDistributions.SparseCat","text":"SparseCat(values, probabilities)\n\nCreate a sparse categorical distribution.\n\nvalues is an iterable object containing the possible values (can be of any type) in the distribution that have nonzero probability. probabilities is an iterable object that contains the associated probabilities.\n\nThis is optimized for value iteration with a fast implementation of weighted_iterator. 
Both pdf and rand are order n.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/distributions/#Implicit","page":"Implemented Distributions","title":"Implicit","text":"","category":"section"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"In situations where a distribution object is required, but the pdf is difficult to specify and only samples are required, ImplicitDistribution provides a convenient way to package a sampling function.","category":"page"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"ImplicitDistribution","category":"page"},{"location":"POMDPTools/distributions/#POMDPTools.POMDPDistributions.ImplicitDistribution","page":"Implemented Distributions","title":"POMDPTools.POMDPDistributions.ImplicitDistribution","text":"ImplicitDistribution(sample_function, args...)\n\nDefine a distribution that can only be sampled from using rand, but has no explicit pdf.\n\nEach time rand(rng, d::ImplicitDistribution) is called,\n\nsample_function(args..., rng)\n\nwill be called to generate a new sample.\n\nImplicitDistribution is designed to be used with anonymous functions or the do syntax as follows:\n\nExamples\n\nImplicitDistribution(rng->rand(rng)^2)\n\nstruct MyMDP <: MDP{Float64, Int} end\n\nfunction POMDPs.transition(m::MyMDP, s, a)\n ImplicitDistribution(s, a) do s, a, rng\n return s + a + 0.001*randn(rng)\n end\nend\n\ntd = transition(MyMDP(), 1.0, 1)\nrand(td) # will return a number near 2\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/distributions/#Bool-Distribution","page":"Implemented Distributions","title":"Bool Distribution","text":"","category":"section"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"BoolDistribution","category":"page"},{"location":"POMDPTools/distributions/#POMDPTools.POMDPDistributions.BoolDistribution","page":"Implemented Distributions","title":"POMDPTools.POMDPDistributions.BoolDistribution","text":"BoolDistribution(p_true)\n\nCreate a distribution over Boolean values (true or false).\n\np_true is the probability of the true outcome; the probability of false is 1-p_true.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/distributions/#Deterministic","page":"Implemented Distributions","title":"Deterministic","text":"","category":"section"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"Deterministic","category":"page"},{"location":"POMDPTools/distributions/#POMDPTools.POMDPDistributions.Deterministic","page":"Implemented Distributions","title":"POMDPTools.POMDPDistributions.Deterministic","text":"Deterministic(value)\n\nCreate a deterministic distribution over only one value.\n\nThis is intended to be used when a distribution is required, but the outcome is deterministic. 
It is equivalent to a Kronecker Delta distribution.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/distributions/#Uniform","page":"Implemented Distributions","title":"Uniform","text":"","category":"section"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"Uniform\nUnsafeUniform","category":"page"},{"location":"POMDPTools/distributions/#POMDPTools.POMDPDistributions.Uniform","page":"Implemented Distributions","title":"POMDPTools.POMDPDistributions.Uniform","text":"Uniform(collection)\n\nCreate a uniform categorical distribution over a collection of objects.\n\nThe objects in the collection must be unique (this is tested on construction), and will be stored in a Set. To avoid this overhead, use UnsafeUniform.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/distributions/#POMDPTools.POMDPDistributions.UnsafeUniform","page":"Implemented Distributions","title":"POMDPTools.POMDPDistributions.UnsafeUniform","text":"UnsafeUniform(collection)\n\nCreate a uniform categorical distribution over a collection of objects.\n\nNo checks are performed to ensure uniqueness or check whether an object is actually in the set when evaluating the pdf.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/distributions/#Pretty-Printing","page":"Implemented Distributions","title":"Pretty Printing","text":"","category":"section"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"showdistribution","category":"page"},{"location":"POMDPTools/distributions/#POMDPTools.POMDPDistributions.showdistribution","page":"Implemented Distributions","title":"POMDPTools.POMDPDistributions.showdistribution","text":"showdistribution([io], [mime], d)\n\nShow a UnicodePlots.barplot representation of a distribution.\n\nKeyword Arguments\n\ntitle::String=string(typeof(d))*\" distribution\": title for the barplot. \n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/common_rl/#CommonRLInterface-Integration","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"","category":"section"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"POMDPTools provides two-way integration with the CommonRLInterface.jl package. Using the convert function, one can convert an MDP or POMDP object to a CommonRLInterface environment, or vice-versa.","category":"page"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"For example,","category":"page"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"using POMDPs\nusing POMDPTools\nusing POMDPModels\nusing CommonRLInterface\n\nenv = convert(AbstractEnv, BabyPOMDP())\n\nr = act!(env, true)\nobserve(env)","category":"page"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"converts a Crying Baby POMDP to an RL environment and acts in and observes the environment. 
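(To make the interaction pattern concrete, here is a hedged sketch of a short random rollout through the wrapped environment; random_rollout is a hypothetical helper, and only the core CommonRLInterface functions reset!, terminated, actions, and act! are used.)

using POMDPs, POMDPTools, POMDPModels
using CommonRLInterface

function random_rollout(env; steps=10)
    reset!(env)
    total = 0.0
    for _ in 1:steps
        terminated(env) && break
        a = rand(collect(actions(env)))
        total += act!(env, a)    # act! returns the reward for the step
    end
    return total
end

random_rollout(convert(AbstractEnv, BabyPOMDP()))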
This environment (or any other CommonRLInterface environment), can be converted to an MDP or POMDP:","category":"page"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"using BasicPOMCP\n\nm = convert(POMDP, env)\nplanner = solve(POMCPSolver(), m)\na = action(planner, initialstate(m))","category":"page"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"You can also use the constructors listed below to manually convert between the interfaces.","category":"page"},{"location":"POMDPTools/common_rl/#Environment-Wrapper-Types","page":"CommonRLInterface Integration","title":"Environment Wrapper Types","text":"","category":"section"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"Since the standard reinforcement learning environment interface offers less information about the internal workings of the environment than the POMDPs.jl interface, MDPs and POMDPs created from these environments will have limited functionality. There are two types of (PO)MDP types that can wrap an environment:","category":"page"},{"location":"POMDPTools/common_rl/#Generative-model-wrappers","page":"CommonRLInterface Integration","title":"Generative model wrappers","text":"","category":"section"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"If the state and setstate! CommonRLInterface functions are provided, then the environment can be wrapped in a RLEnvMDP or RLEnvPOMDP and the POMDPs.jl generative model interface will be available.","category":"page"},{"location":"POMDPTools/common_rl/#Opaque-wrappers","page":"CommonRLInterface Integration","title":"Opaque wrappers","text":"","category":"section"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"If the state and setstate! are not provided, then the resulting POMDP or MDP can only be simulated. This case is represented using the OpaqueRLEnvPOMDP and OpaqueRLEnvMDP wrappers. From the POMDPs.jl perspective, the state of the opaque (PO)MDP is just an integer wrapped in an OpaqueRLEnvState. This keeps track of the \"age\" of the environment so that POMDPs.jl actions that attempt to interact with the environment at a different age are invalid.","category":"page"},{"location":"POMDPTools/common_rl/#Constructors","page":"CommonRLInterface Integration","title":"Constructors","text":"","category":"section"},{"location":"POMDPTools/common_rl/#Creating-RL-environments-from-MDPs-and-POMDPs","page":"CommonRLInterface Integration","title":"Creating RL environments from MDPs and POMDPs","text":"","category":"section"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"MDPCommonRLEnv\nPOMDPCommonRLEnv","category":"page"},{"location":"POMDPTools/common_rl/#POMDPTools.CommonRLIntegration.MDPCommonRLEnv","page":"CommonRLInterface Integration","title":"POMDPTools.CommonRLIntegration.MDPCommonRLEnv","text":"MDPCommonRLEnv(m, [s])\nMDPCommonRLEnv{RLO}(m, [s])\n\nCreate a CommonRLInterface environment from MDP m; optionally specify the state 's'.\n\nThe RLO parameter can be used to specify a type to convert the observation to. By default, this is AbstractArray. 
Use Any to disable conversion.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/common_rl/#POMDPTools.CommonRLIntegration.POMDPCommonRLEnv","page":"CommonRLInterface Integration","title":"POMDPTools.CommonRLIntegration.POMDPCommonRLEnv","text":"POMDPCommonRLEnv(m, [s], [o])\nPOMDPCommonRLEnv{RLO}(m, [s], [o])\n\nCreate a CommonRLInterface environment from POMDP m; optionally specify the state 's' and observation 'o'.\n\nThe RLO parameter can be used to specify a type to convert the observation to. By default, this is AbstractArray. Use Any to disable conversion.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/common_rl/#Creating-MDPs-and-POMDPs-from-RL-environments","page":"CommonRLInterface Integration","title":"Creating MDPs and POMDPs from RL environments","text":"","category":"section"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"RLEnvMDP\nRLEnvPOMDP\nOpaqueRLEnvMDP\nOpaqueRLEnvPOMDP","category":"page"},{"location":"POMDPTools/common_rl/#POMDPTools.CommonRLIntegration.RLEnvMDP","page":"CommonRLInterface Integration","title":"POMDPTools.CommonRLIntegration.RLEnvMDP","text":"RLEnvMDP(env; discount=1.0)\n\nCreate an MDP by wrapping a CommonRLInterface.AbstractEnv. state and setstate! from CommonRLInterface must be provided, and the POMDPs generative model functionality will be provided.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/common_rl/#POMDPTools.CommonRLIntegration.RLEnvPOMDP","page":"CommonRLInterface Integration","title":"POMDPTools.CommonRLIntegration.RLEnvPOMDP","text":"RLEnvPOMDP(env; discount=1.0)\n\nCreate a POMDP by wrapping a CommonRLInterface.AbstractEnv. state and setstate! from CommonRLInterface must be provided, and the POMDPs generative model functionality will be provided.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/common_rl/#POMDPTools.CommonRLIntegration.OpaqueRLEnvMDP","page":"CommonRLInterface Integration","title":"POMDPTools.CommonRLIntegration.OpaqueRLEnvMDP","text":"OpaqueRLEnvMDP(env; discount=1.0)\n\nWrap a CommonRLInterface.AbstractEnv in an MDP object. The state will be an OpaqueRLEnvState and only simulation will be supported.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/common_rl/#POMDPTools.CommonRLIntegration.OpaqueRLEnvPOMDP","page":"CommonRLInterface Integration","title":"POMDPTools.CommonRLIntegration.OpaqueRLEnvPOMDP","text":"OpaqueRLEnvPOMDP(env; discount=1.0)\n\nWrap a CommonRLInterface.AbstractEnv in a POMDP object. 
The state will be an OpaqueRLEnvState and only simulation will be supported.\n\n\n\n\n\n","category":"type"},{"location":"#[POMDPs.jl](https://github.com/JuliaPOMDP/POMDPs.jl)","page":"POMDPs.jl","title":"POMDPs.jl","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"A Julia interface for defining, solving and simulating partially observable Markov decision processes and their fully observable counterparts.","category":"page"},{"location":"#Package-and-Ecosystem-Features","page":"POMDPs.jl","title":"Package and Ecosystem Features","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"General interface that can handle problems with discrete and continuous state/action/observation spaces\nA number of popular state-of-the-art solvers implemented for use out-of-the-box\nTools that make it easy to define problems and simulate solutions\nSimple integration of custom solvers into the existing interface","category":"page"},{"location":"#Available-Packages","page":"POMDPs.jl","title":"Available Packages","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"The POMDPs.jl package contains only the interface used for expressing and solving Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). The POMDPTools package acts as a \"standard library\" for the POMDPs.jl interface, providing implementations of commonly-used components such as policies, belief updaters, distributions, and simulators. The list of solver and support packages maintained by the JuliaPOMDP community is available at the POMDPs.jl Readme.","category":"page"},{"location":"#Documentation-Outline","page":"POMDPs.jl","title":"Documentation Outline","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"Documentation comes in three forms:","category":"page"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"An explanatory guide is available in the sections outlined below.\nHow-to examples are available in pages in this document with \"Example\" in the title and in the POMDPExamples package.\nReference docstrings for the entire POMDPs.jl interface are available in the API Documentation section.","category":"page"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"note: Note\nWhen updating these documents, make sure this is synced with docs/make.jl!!","category":"page"},{"location":"#Basics","page":"POMDPs.jl","title":"Basics","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"Pages = [\"install.md\", \"get_started.md\", \"concepts.md\"]","category":"page"},{"location":"#Defining-POMDP-Models","page":"POMDPs.jl","title":"Defining POMDP Models","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"Pages = [ \"def_pomdp.md\", \"interfaces.md\"]\nDepth = 3","category":"page"},{"location":"#Writing-Solvers-and-Updaters","page":"POMDPs.jl","title":"Writing Solvers and Updaters","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"Pages = [ \"def_solver.md\", \"offline_solver.md\", \"online_solver.md\", \"def_updater.md\" ]","category":"page"},{"location":"#Analyzing-Results","page":"POMDPs.jl","title":"Analyzing Results","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"Pages = [ \"simulation.md\", \"run_simulation.md\", \"policy_interaction.md\" 
]","category":"page"},{"location":"#POMDPTools-the-standard-library-for-POMDPs.jl","page":"POMDPs.jl","title":"POMDPTools - the standard library for POMDPs.jl","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"Pages = [\"POMDPTools/index.md\", \"POMDPTools/distributions.md\", \"POMDPTools/model.md\", \"POMDPTools/visualization.md\", \"POMDPTools/beliefs.md\", \"POMDPTools/policies.md\", \"POMDPTools/simulators.md\", \"POMDPTools/common_rl.md\", \"POMDPTools/testing.md\"]","category":"page"},{"location":"#Reference","page":"POMDPs.jl","title":"Reference","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"Pages = [\"faq.md\", \"api.md\"]","category":"page"}] +[{"location":"POMDPTools/model/#Model-Tools","page":"Model Tools","title":"Model Tools","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"POMDPTools contains assorted tools that are not part of the core POMDPs.jl interface for working with (PO)MDP Models.","category":"page"},{"location":"POMDPTools/model/#Interface-Extensions","page":"Model Tools","title":"Interface Extensions","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"POMDPTools contains several interface extensions that provide shortcuts and standardized ways of dealing with extra data.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"Programmers should use these functions whenever possible in case optimized implementations are available, but all of the functions have default implementations based on the core POMDPs.jl interface. Thus, if the core interface is implemented, all of these functions will also be available.","category":"page"},{"location":"POMDPTools/model/#Weighted-Iteration","page":"Model Tools","title":"Weighted Iteration","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"Many solution techniques, for example value iteration, require iteration through the support of a distribution and evaluating the probability mass for each value. In some cases, looking up the probability mass is expensive, so it is more efficient to iterate through value => probability pairs. weighted_iterator provides a standard interface for this.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"weighted_iterator","category":"page"},{"location":"POMDPTools/model/#POMDPTools.POMDPDistributions.weighted_iterator","page":"Model Tools","title":"POMDPTools.POMDPDistributions.weighted_iterator","text":"weighted_iterator(d)\n\nReturn an iterator through pairs of the values and probabilities in distribution d.\n\nThis is designed to speed up value iteration. Distributions are encouraged to provide a custom optimized implementation if possible.\n\nExample\n\njulia> d = BoolDistribution(0.7)\nBoolDistribution(0.7)\n\njulia> collect(weighted_iterator(d))\n2-element Array{Pair{Bool,Float64},1}:\n true => 0.7\n false => 0.3\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#Observation-Weight","page":"Model Tools","title":"Observation Weight","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"Sometimes, e.g. 
in particle filtering, the relative likelihood of an observation is required in addition to a generative model, and it is often tedious to implement a custom observation distribution type. For this case, the shortcut function obs_weight is provided.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"obs_weight","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.obs_weight","page":"Model Tools","title":"POMDPTools.ModelTools.obs_weight","text":"obs_weight(pomdp, s, a, sp, o)\n\nReturn a weight proportional to the likelihood of receiving observation o from state sp (and a and s if they are present).\n\nThis is a useful shortcut for particle filtering so that the observation distribution does not have to be represented.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#Ordered-Spaces","page":"Model Tools","title":"Ordered Spaces","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"It is often useful to have a list of states, actions, or observations ordered consistently with the respective index function from POMDPs.jl. Since the POMDPs.jl interface does not demand that spaces be ordered consistently with index, the states, actions, and observations functions are not sufficient. Thus POMDPTools provides ordered_actions, ordered_states, and ordered_observations for this purpose.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"ordered_actions\nordered_states\nordered_observations","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.ordered_actions","page":"Model Tools","title":"POMDPTools.ModelTools.ordered_actions","text":"ordered_actions(mdp)\n\nReturn an AbstractVector of actions ordered according to actionindex(mdp, a).\n\nordered_actions(mdp) will always return an AbstractVector{A} v containing all of the actions in actions(mdp) in the order such that actionindex(mdp, v[i]) == i. You may wish to override this for your problem for efficiency.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.ordered_states","page":"Model Tools","title":"POMDPTools.ModelTools.ordered_states","text":"ordered_states(mdp)\n\nReturn an AbstractVector of states ordered according to stateindex(mdp, s).\n\nordered_states(mdp) will always return an AbstractVector{S} v containing all of the states in states(mdp) in the order such that stateindex(mdp, v[i]) == i. You may wish to override this for your problem for efficiency.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.ordered_observations","page":"Model Tools","title":"POMDPTools.ModelTools.ordered_observations","text":"ordered_observations(pomdp)\n\nReturn an AbstractVector of observations ordered according to obsindex(pomdp, o).\n\nordered_observations(pomdp) will always return an AbstractVector{O} v containing all of the observations in observations(pomdp) in the order such that obsindex(pomdp, v[i]) == i. You may wish to override this for your problem for efficiency.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#Info-Interface","page":"Model Tools","title":"Info Interface","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"It is often the case that useful information besides the belief, state, action, etc. is generated by a function in POMDPs.jl. 
This information can be useful for debugging or understanding the behavior of a solver, updater, or problem. The info interface provides a standard way for problems, policies, solvers or updaters to output this information. The recording simulators from POMDPTools automatically record this information.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"To specify info from policies, solvers, or updaters, implement the following functions:","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"action_info\nsolve_info\nupdate_info","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.action_info","page":"Model Tools","title":"POMDPTools.ModelTools.action_info","text":"a, ai = action_info(policy, x)\n\nReturn a tuple containing the action determined by policy 'p' at state or belief 'x' and information (usually a NamedTuple, Dict or nothing) from the calculation of that action.\n\nBy default, returns nothing as info.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.solve_info","page":"Model Tools","title":"POMDPTools.ModelTools.solve_info","text":"policy, si = solve_info(solver, problem)\n\nReturn a tuple containing the policy determined by a solver and information (usually a NamedTuple, Dict or nothing) from the calculation of that policy.\n\nBy default, returns nothing as info.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.update_info","page":"Model Tools","title":"POMDPTools.ModelTools.update_info","text":"bp, i = update_info(updater, b, a, o)\n\nReturn a tuple containing the new belief and information (usually a NamedTuple, Dict or nothing) from the belief update.\n\nBy default, returns nothing as info.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#Model-Transformations","page":"Model Tools","title":"Model Transformations","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"POMDPTools contains several tools for transforming problems into other classes so that they can be used by different solvers.","category":"page"},{"location":"POMDPTools/model/#Linear-Algebra-Representations","page":"Model Tools","title":"Linear Algebra Representations","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"For some algorithms, such as value iteration, it is convenient to use vectors that contain the reward for every state, and matrices that contain the transition probabilities. These can be constructed with the following functions:","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"transition_matrices\nreward_vectors","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.transition_matrices","page":"Model Tools","title":"POMDPTools.ModelTools.transition_matrices","text":"transition_matrices(p::SparseTabularProblem)\n\nAccessor function for the transition model of a sparse tabular problem. It returns a list of sparse matrices for each action of the problem.\n\n\n\n\n\ntransition_matrices(m::Union{MDP,POMDP})\ntransition_matrices(m; sparse=true)\n\nConstruct transition matrices for (PO)MDP m.\n\nThe returned object is an associative object (usually a Dict), where the keys are actions. 
Each value in this object is an AbstractMatrix where the row corresponds to the state index of s and the column corresponds to the state index of s'. The entry in the matrix is the probability of transitioning from state s to state s'.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.reward_vectors","page":"Model Tools","title":"POMDPTools.ModelTools.reward_vectors","text":"reward_vectors(m::Union{MDP, POMDP})\n\nConstruct reward vectors for (PO)MDP m.\n\nThe returned object is an associative object (usually a Dict), where the keys are actions. Each value in this object is an AbstractVector where the index corresponds to the state index of s and the entry is the reward for that state.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#Sparse-Tabular-MDPs-and-POMDPs","page":"Model Tools","title":"Sparse Tabular MDPs and POMDPs","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"The SparseTabularMDP and SparseTabularPOMDP represent discrete problems defined using the explicit interface. The transition and observation models are represented using sparse matrices. Solver writers can leverage these data structures to write efficient vectorized code. A problem writer can define a problem using the explicit interface and it can be automatically converted to a sparse tabular representation by calling the constructors SparseTabularMDP(::MDP) or SparseTabularPOMDP(::POMDP). See the following docs to learn more about the matrix representation and how to access the fields of the SparseTabular objects:","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"SparseTabularMDP\nSparseTabularPOMDP\ntransition_matrix\nreward_vector\nobservation_matrix\nreward_matrix\nobservation_matrices","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.SparseTabularPOMDP","page":"Model Tools","title":"POMDPTools.ModelTools.SparseTabularPOMDP","text":"SparseTabularPOMDP\n\nA POMDP object where states and actions are integers and the transition and observation distributions are represented by lists of sparse matrices. This data structure can be useful to exploit in vectorized algorithms to gain performance (e.g. see SparseValueIterationSolver). The recommended way to access the transition, reward, and observation matrices is through the provided accessor functions: transition_matrix, reward_vector, observation_matrix.\n\nFields\n\nT::Vector{SparseMatrixCSC{Float64, Int64}} The transition model is represented as a vector of sparse matrices (one for each action). T[a][s, sp] is the probability of transitioning from s to sp when taking action a.\nR::Array{Float64, 2} The reward is represented as a matrix where the rows are states and the columns actions: R[s, a] is the reward of taking action a in state s.\nO::Vector{SparseMatrixCSC{Float64, Int64}} The observation model is represented as a vector of sparse matrices (one for each action). O[a][sp, o] is the probability of observing o from state sp after having taken action a.\ninitial_probs::SparseVector{Float64, Int64} Specifies the initial state distribution\nterminal_states::Set{Int64} Stores the terminal states\ndiscount::Float64 The discount factor\n\nConstructors\n\nSparseTabularPOMDP(pomdp::POMDP) : One can provide the matrices to the default constructor or one can construct a SparseTabularPOMDP from any discrete state POMDP defined using the explicit interface. 
\n\nNote that constructing the transition and reward matrices requires iterating over all the states and can take a while. To learn more about how to define an MDP with the explicit interface please visit https://juliapomdp.github.io/POMDPs.jl/latest/explicit/ .\n\nSparseTabularPOMDP(spomdp::SparseTabularMDP; transition, reward, observation, discount) : This constructor returns a new sparse POMDP that is a copy of the original spomdp except for the fields specified by the keyword arguments.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.transition_matrix","page":"Model Tools","title":"POMDPTools.ModelTools.transition_matrix","text":"transition_matrix(p::SparseTabularProblem, a)\n\nAccessor function for the transition model of a sparse tabular problem. It returns a sparse matrix containing the transition probabilities when taking action a: T[s, sp] = Pr(sp | s, a).\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.reward_vector","page":"Model Tools","title":"POMDPTools.ModelTools.reward_vector","text":"reward_vector(p::SparseTabularProblem, a)\n\nAccessor function for the reward function of a sparse tabular problem. It returns a vector containing the reward for all the states when taking action a: R(s, a). The length of the returned vector is equal to the number of states.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.observation_matrix","page":"Model Tools","title":"POMDPTools.ModelTools.observation_matrix","text":"observation_matrix(p::SparseTabularPOMDP, a::Int64)\n\nAccessor function for the observation model of a sparse tabular POMDP. It returns a sparse matrix containing the observation probabilities when having taken action a: O[sp, o] = Pr(o | sp, a).\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.reward_matrix","page":"Model Tools","title":"POMDPTools.ModelTools.reward_matrix","text":"reward_matrix(p::SparseTabularProblem)\n\nAccessor function for the reward matrix R[s, a] of a sparse tabular problem.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.observation_matrices","page":"Model Tools","title":"POMDPTools.ModelTools.observation_matrices","text":"observation_matrices(p::SparseTabularPOMDP)\n\nAccessor function for the observation model of a sparse tabular POMDP. 
It returns a list of sparse matrices for each action of the problem.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/model/#Fully-Observable-POMDP","page":"Model Tools","title":"Fully Observable POMDP","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"FullyObservablePOMDP","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.FullyObservablePOMDP","page":"Model Tools","title":"POMDPTools.ModelTools.FullyObservablePOMDP","text":"FullyObservablePOMDP(mdp)\n\nTurn MDP mdp into a POMDP where the observations are the states of the MDP.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/model/#Generative-Belief-MDP","page":"Model Tools","title":"Generative Belief MDP","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"Every POMDP is an MDP on the belief space. GenerativeBeliefMDP creates a generative model for that MDP.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"warning: Warning\nThe reward generated by the GenerativeBeliefMDP is the reward for a single state sampled from the belief; it is not the expected reward for that belief transition (though, in expectation, they are equivalent of course). Implementing the model with the expected reward requires a custom implementation because belief updaters do not typically deal with reward.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"GenerativeBeliefMDP","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.GenerativeBeliefMDP","page":"Model Tools","title":"POMDPTools.ModelTools.GenerativeBeliefMDP","text":"GenerativeBeliefMDP(pomdp, updater)\n\nCreate a generative model of the belief MDP corresponding to POMDP pomdp with belief updates performed by updater.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/model/#Example","page":"Model Tools","title":"Example","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"using POMDPs\nusing POMDPModels\nusing POMDPTools\n\npomdp = BabyPOMDP()\nupdater = DiscreteUpdater(pomdp)\n\nbelief_mdp = GenerativeBeliefMDP(pomdp, updater)\n@show statetype(belief_mdp) # DiscreteBelief{POMDPModels.BabyPOMDP, Bool}\n\nfor (a, r, sp) in stepthrough(belief_mdp, RandomPolicy(belief_mdp), \"a,r,sp\", max_steps=5)\n    @show a, r, sp\nend\n\n# output\nstatetype(belief_mdp) = DiscreteBelief{POMDPModels.BabyPOMDP, Bool}\n(a, r, sp) = (true, -5.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [1.0, 0.0]))\n(a, r, sp) = (true, -5.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [1.0, 0.0]))\n(a, r, sp) = (true, -5.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [1.0, 0.0]))\n(a, r, sp) = (false, 0.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [0.9759036144578314, 0.02409638554216867]))\n(a, r, sp) = (false, 0.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [0.9701315984030756, 0.029868401596924433]))","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"DocTestSetup = 
nothing","category":"page"},{"location":"POMDPTools/model/#Underlying-MDP","page":"Model Tools","title":"Underlying MDP","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"UnderlyingMDP","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.UnderlyingMDP","page":"Model Tools","title":"POMDPTools.ModelTools.UnderlyingMDP","text":"UnderlyingMDP(m::POMDP)\n\nTransform POMDP m into an MDP where the states are fully observed.\n\nUnderlyingMDP(m::MDP)\n\nReturn m\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/model/#State-Action-Reward-Model","page":"Model Tools","title":"State Action Reward Model","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"StateActionReward","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.StateActionReward","page":"Model Tools","title":"POMDPTools.ModelTools.StateActionReward","text":"StateActionReward(m::Union{MDP,POMDP})\n\nRobustly create a reward function that depends only on the state and action.\n\nIf reward(m, s, a) is implemented, that will be used, otherwise the mean of reward(m, s, a, sp) for MDPs or reward(m, s, a, sp, o) for POMDPs will be used.\n\nExample\n\nusing POMDPs\nusing POMDPModels\nusing POMDPTools\n\nm = BabyPOMDP()\n\nrm = StateActionReward(m)\n\nrm(true, true)\n\n# output\n\n-15.0\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/model/#Utility-Types","page":"Model Tools","title":"Utility Types","text":"","category":"section"},{"location":"POMDPTools/model/#Terminal-State","page":"Model Tools","title":"Terminal State","text":"","category":"section"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"TerminalState and its singleton instance terminalstate are available to use for a terminal state in concert with another state type. It has the appropriate type promotion logic to make its use with other types friendly, similar to nothing and missing.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"note: Note\nNOTE: This is NOT a replacement for the standard POMDPs.jl isterminal function, though isterminal is implemented for the type. It is merely a convenient type to use for terminal states.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"warning: Warning\nWARNING: Early tests (August 2018) suggest that the Julia 1.0 compiler will not be able to efficiently implement union splitting in cases as complex as POMDPs, so using a Union for the state type of a problem can currently have a large overhead.","category":"page"},{"location":"POMDPTools/model/","page":"Model Tools","title":"Model Tools","text":"TerminalState\nterminalstate","category":"page"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.TerminalState","page":"Model Tools","title":"POMDPTools.ModelTools.TerminalState","text":"TerminalState\n\nA type with no fields whose singleton instance terminalstate is used to represent a terminal state with no additional information.\n\nThis type has the appropriate promotion logic implemented to function like Missing when added to arrays, etc.\n\nNote that terminal states NEED NOT be of type TerminalState. You can define any state to be terminal by implementing the appropriate isterminal method. Solvers and simulators SHOULD NOT check for this type, but should instead check using isterminal. 
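(A small sketch of how TerminalState can be combined with a problem-specific state type; GridPos is a hypothetical example type.)

using POMDPTools: TerminalState, terminalstate

struct GridPos
    x::Int
    y::Int
end

# The promotion logic lets terminalstate mix with other states, similar to missing:
v = [GridPos(1, 1), terminalstate]   # Vector{Union{GridPos, TerminalState}}

# A model using this state type could implement, for example:
# POMDPs.isterminal(m::MyMDP, s) = s isa TerminalState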
\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/model/#POMDPTools.ModelTools.terminalstate","page":"Model Tools","title":"POMDPTools.ModelTools.terminalstate","text":"terminalstate\n\nThe singleton instance of type TerminalState representing a terminal state.\n\n\n\n\n\n","category":"constant"},{"location":"policy_interaction/#Interacting-with-Policies","page":"Interacting with Policies","title":"Interacting with Policies","text":"","category":"section"},{"location":"policy_interaction/","page":"Interacting with Policies","title":"Interacting with Policies","text":"A solution to a POMDP is a policy that maps beliefs or action-observation histories to actions. In POMDPs.jl, these are represented by Policy objects. See Solvers and Policies for more information about what a policy can represent in general.","category":"page"},{"location":"policy_interaction/","page":"Interacting with Policies","title":"Interacting with Policies","text":"One common task in evaluating POMDP solutions is examining the policies themselves. Since the internal representation of a policy is an esoteric implementation detail, it is best to interact with policies through the action and value interface functions. There are three relevant methods","category":"page"},{"location":"policy_interaction/","page":"Interacting with Policies","title":"Interacting with Policies","text":"action(policy, s) returns the best action (or one of the best) for the given state or belief.\nvalue(policy, s) returns the expected sum of future rewards if the policy is executed.\nvalue(policy, s, a) returns the \"Q-value\", that is, the expected sum of rewards if action a is taken on the next step and then the policy is executed.","category":"page"},{"location":"policy_interaction/","page":"Interacting with Policies","title":"Interacting with Policies","text":"Note that the quantities returned by these functions are what the policy/solver expects to be the case after its (usually approximate) computations; they may be far from the true value if the solution is not exactly optimal.","category":"page"},{"location":"install/#Installation","page":"Installation","title":"Installation","text":"","category":"section"},{"location":"install/","page":"Installation","title":"Installation","text":"If you have a running Julia distribution (Julia 0.4 or greater), you have everything you need to install POMDPs.jl. To install the package, simply run the following from the Julia REPL:","category":"page"},{"location":"install/","page":"Installation","title":"Installation","text":"import Pkg\nPkg.add(\"POMDPs\") # installs the POMDPs.jl package","category":"page"},{"location":"install/","page":"Installation","title":"Installation","text":"Some auxiliary packages and older versions of solvers may be found in the JuliaPOMDP registry. 
To install this registry, run:","category":"page"},{"location":"install/","page":"Installation","title":"Installation","text":"using Pkg; pkg\"registry add https://github.com/JuliaPOMDP/Registry\"","category":"page"},{"location":"install/","page":"Installation","title":"Installation","text":"Note: to use this registry, JuliaPro users must also run edit(normpath(Sys.BINDIR,\"..\",\"etc\",\"julia\",\"startup.jl\")), comment out the line ENV[\"DISABLE_FALLBACK\"] = \"true\", save the file, and restart JuliaPro as described in this issue.","category":"page"},{"location":"POMDPTools/visualization/#Visualization","page":"Visualization","title":"Visualization","text":"","category":"section"},{"location":"POMDPTools/visualization/","page":"Visualization","title":"Visualization","text":"POMDPTools contains a basic visualization interface consisting of the render function.","category":"page"},{"location":"POMDPTools/visualization/","page":"Visualization","title":"Visualization","text":"Problem writers should implement a method of this function so that their problem can be visualized in a variety of contexts including Jupyter notebooks, web browsers, or saved as images or animations.","category":"page"},{"location":"POMDPTools/visualization/","page":"Visualization","title":"Visualization","text":"render","category":"page"},{"location":"POMDPTools/visualization/#POMDPTools.ModelTools.render","page":"Visualization","title":"POMDPTools.ModelTools.render","text":"render(m::Union{MDP,POMDP}, step::NamedTuple)\n\nReturn a renderable representation of the step in problem m.\n\nThe renderable representation may be anything that has show(io, mime, x) methods. It could be a plot, svg, Compose.jl context, Cairo context, or image.\n\nArguments\n\nstep is a NamedTuple that contains the states, action, etc. corresponding to one transition in a simulation. It may have the following fields:\n\nt: the time step index\ns: the state at the beginning of the step\na: the action\nsp: the state at the end of the step (s')\nr: the reward for the step\no: the observation\nb: the belief at the beginning of the step\nbp: the belief at the end of the step\ni: info from the model when the state transition was calculated\nai: info from the policy decision\nui: info from the belief update\n\nKeyword arguments are reserved for the problem implementer and can be used to control appearance, etc.\n\nImportant Notes\n\nstep may not contain all of the elements listed above, so render should check for them and render only what is available.\no typically corresponds to sp, so it is often clearer for POMDPs to render sp rather than s.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/visualization/","page":"Visualization","title":"Visualization","text":"Sometimes it is important to have control over how the problem is rendered with different mimetypes. 
One way to handle this is to have render return a custom type, e.g.","category":"page"},{"location":"POMDPTools/visualization/","page":"Visualization","title":"Visualization","text":"struct MyProblemVisualization\n mdp::MyProblem\n step::NamedTuple\nend\n\nPOMDPTools.render(mdp, step) = MyProblemVisualization(mdp, step)","category":"page"},{"location":"POMDPTools/visualization/","page":"Visualization","title":"Visualization","text":"and then implement custom show methods, e.g.","category":"page"},{"location":"POMDPTools/visualization/","page":"Visualization","title":"Visualization","text":"show(io::IO, mime::MIME\"text/html\", v::MyProblemVisualization)","category":"page"},{"location":"def_pomdp/#defining_pomdps","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"As described in the Concepts and Architecture section, an MDP is defined by the state space, action space, transition distributions, reward function, and discount factor, (SATRgamma). A POMDP also includes the observation space, and observation probability distributions, for a definition of (SATROZgamma). A problem definition in POMDPs.jl consists of an implicit or explicit definition of each of these elements.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"It is possible to define a (PO)MDP with a more traditional object-oriented approach in which the user defines a new type to represent the (PO)MDP and methods of interface functions to define the tuple elements. However, the QuickPOMDPs package provides a more concise way to get started, using keyword arguments instead of new types and methods. Essentially each keyword argument defines a corresponding POMDPs api function. Since the important concepts are the same for the object oriented approach and the QuickPOMDP approach, we will use the latter for this discussion.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"This guide has three parts: First, it explains a very simple example (the Tiger POMDP), then uses a more complex example to illustrate the broader capabilities of the interface. Finally, some alternative ways of defining (PO)MDPs are discussed.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"note: Note\nThis guide assumes that you are comfortable programming in Julia, especially familiar with various ways of defining anonymous functions. Users should consult the Julia documentation to learn more about programming in Julia.","category":"page"},{"location":"def_pomdp/#tiger","page":"Defining POMDPs and MDPs","title":"A Basic Example: The Tiger POMDP","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In the first section of this guide, we will explain a QuickPOMDP implementation of a very simple problem: the classic Tiger POMDP. In the tiger POMDP, the agent is tasked with escaping from a room. There are two doors leading out of the room. Behind one of the doors is a tiger, and behind the other is sweet, sweet freedom. If the agent opens the door and finds the tiger, it gets eaten (and receives a reward of -100). If the agent opens the other door, it escapes and receives a reward of 10. The agent can also listen. 
Listening gives a noisy measurement of which door the tiger is hiding behind. Listening gives the agent the correct location of the tiger 85% of the time. The agent receives a reward of -1 for listening. The complete implementation looks like this:","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"using QuickPOMDPs: QuickPOMDP\nusing POMDPTools: Deterministic, Uniform, SparseCat\n\nm = QuickPOMDP(\n states = [\"left\", \"right\"],\n actions = [\"left\", \"right\", \"listen\"],\n observations = [\"left\", \"right\"],\n discount = 0.95,\n\n transition = function (s, a)\n if a == \"listen\"\n return Deterministic(s) # tiger stays behind the same door\n else # a door is opened\n return Uniform([\"left\", \"right\"]) # reset\n end\n end,\n\n observation = function (a, sp)\n if a == \"listen\"\n if sp == \"left\"\n return SparseCat([\"left\", \"right\"], [0.85, 0.15]) # sparse categorical\n else\n return SparseCat([\"right\", \"left\"], [0.85, 0.15])\n end\n else\n return Uniform([\"left\", \"right\"])\n end\n end,\n\n reward = function (s, a)\n if a == \"listen\"\n return -1.0\n elseif s == a # the tiger was found\n return -100.0\n else # the tiger was escaped\n return 10.0\n end\n end,\n\n initialstate = Uniform([\"left\", \"right\"]),\n);","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The next sections explain how each of the elements of the POMDP tuple are defined in this implementation:","category":"page"},{"location":"def_pomdp/#State,-action-and-observation-spaces","page":"Defining POMDPs and MDPs","title":"State, action and observation spaces","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In this example, each state, action, and observation is a String. The state, action and observation spaces (S, A, and O), are defined with the states, actions and observations keyword arguments. In this case, they are simply Vectors containing all the elements in the space.","category":"page"},{"location":"def_pomdp/#Transition-and-observation-distributions","page":"Defining POMDPs and MDPs","title":"Transition and observation distributions","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The transition and observation keyword arguments are used to define the transition distribution, T, and observation distribution, Z, respectively. These models are defined using functions that return distribution objects (more info below). The transition function takes state and action arguments and returns a distribution of the resulting next state. The observation function takes in an action and the resulting next state (sp, short for \"s prime\") and returns the distribution of the observation emitted at this state.","category":"page"},{"location":"def_pomdp/#Reward-function","page":"Defining POMDPs and MDPs","title":"Reward function","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The reward keyword argument defines R. 
It is a function that takes in a state and action and returns a number.","category":"page"},{"location":"def_pomdp/#Discount-and-initial-state-distribution","page":"Defining POMDPs and MDPs","title":"Discount and initial state distribution","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The discount factor, gamma, is defined with the discount keyword, and is simply a number between 0 and 1. The initial state distribution, b_0, is defined with the initialstate argument, and is a distribution object.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The example above shows a complete implementation of a very simple discrete-space POMDP. However, POMDPs.jl is capable of concisely expressing much more complex models with continuous and hybrid spaces. The guide below introduces a more complex example to fully explain the ways that a POMDP can be defined.","category":"page"},{"location":"def_pomdp/#Guide-to-Defining-POMDPs","page":"Defining POMDPs and MDPs","title":"Guide to Defining POMDPs","text":"","category":"section"},{"location":"def_pomdp/#po-mountaincar","page":"Defining POMDPs and MDPs","title":"A more complex example: A partially-observable mountain car","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Mountain car is a classic problem in reinforcement learning. A car starts in a valley between two hills, and must reach the goal at the top of the hill to the right (see wikipedia for image). The actions are left and right acceleration and neutral and the state consists of the car's position and velocity. In this partially-observable version, there is a small amount of acceleration noise and observations are normally-distributed noisy measurements of the position. This problem can be implemented as follows:","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"import QuickPOMDPs: QuickPOMDP\nimport POMDPTools: ImplicitDistribution\nimport Distributions: Normal\n\nmountaincar = QuickPOMDP(\n actions = [-1., 0., 1.],\n obstype = Float64,\n discount = 0.95,\n\n transition = function (s, a) \n ImplicitDistribution() do rng\n x, v = s\n vp = v + a*0.001 + cos(3*x)*-0.0025 + 0.0002*randn(rng)\n vp = clamp(vp, -0.07, 0.07)\n xp = x + vp\n return (xp, vp)\n end\n end,\n\n observation = (a, sp) -> Normal(sp[1], 0.15),\n\n reward = function (s, a, sp)\n if sp[1] > 0.5\n return 100.0\n else\n return -1.0\n end\n end,\n\n initialstate = ImplicitDistribution(rng -> (-0.2*rand(rng), 0.0)),\n isterminal = s -> s[1] > 0.5\n)","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The following sections provide a detailed guide to defining the components of a POMDP using this example and the tiger pomdp further above.","category":"page"},{"location":"def_pomdp/#space_representation","page":"Defining POMDPs and MDPs","title":"State, action, and observation spaces","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In POMDPs.jl, a state, action, or observation can be represented by any Julia object, for example an integer, a floating point number, a string or Symbol, or a vector. 
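As a sketch of a custom composite state (the DroneState name and fields are hypothetical, not from any package), an immutable struct built on StaticArrays might look like:

using StaticArrays

# Immutable state for a hypothetical drone problem: a 2-D position plus a battery level
struct DroneState
    pos::SVector{2, Float64}
    battery::Float64
end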
For example, in the tiger problem, the states are Strings, and in the mountaincar problem, the state is a Tuple of two floating point numbers, and the actions and observations are floating point numbers. These types are usually inferred from the space or initial state distribution definitions.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"warn: Warn\nObjects representing individual states, actions, and observations should not be altered once they are created, since they may be used as dictionary keys or stored in histories. Hence it is usually best to use immutable objects such as integers or StaticArrays. If the states need to be mutable (e.g. aggregate types with vectors in them), make sure the states are not actually mutated and that hash and == functions are implemented (see AutoHashEquals).","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The state, action, and observation spaces are defined with the states, actions, and observations Quick(PO)MDP keyword arguments. The simplest way to define these spaces is with a Vector of states, e.g. states = [\"left\", \"right\"] in the tiger problem. More complicated spaces, such as vector spaces and other continuous, uncountable, or hybrid sets, can be defined with custom objects that adhere to the space interface. However, it should be noted that, for many solvers, an explicit enumeration of the state and observation spaces is not needed. Instead, it is sufficient to specify the state or observation type using the statetype or obstype arguments, e.g. obstype = Float64 in the mountaincar problem.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"tip: Tip\nIf you are having a difficult time representing the state or observation space, it is likely that you will not be able to use a solver that requires an explicit representation. It is usually best to omit that space from the definition and try solvers to see if they work.","category":"page"},{"location":"def_pomdp/#state-dep-action","page":"Defining POMDPs and MDPs","title":"State- or belief-dependent action spaces","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In some problems, the set of allowable actions depends on the state or belief. This can be implemented by providing a function of the state or belief to the actions argument. For example, if in an MDP you can only take action 1 in state 1, but can take any of the actions 1, 2, and 3 in all other states, you might use","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"# providing the default value \"s = nothing\" means \"actions(mdp)\" won't throw an error\nactions = function (s = nothing)\n    if s == 1\n        return [1] #<--- return the state-dependent action set\n    else\n        return [1,2,3] #<--- return the full action space here\n    end\nend","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Similarly, in a POMDP, you may wish to only allow action 1 if the belief b assigns a nonzero probability to state 1. 
This can be accomplished with","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"actions = function (b)\n if pdf(b, 1) > 0.0\n return [1,2,3]\n else\n return [2,3]\n end\nend","category":"page"},{"location":"def_pomdp/#Transition-and-observation-distributions-2","page":"Defining POMDPs and MDPs","title":"Transition and observation distributions","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The transition and observation observation distributions are specified through functions that return distributions. A distribution object implements parts of the distribution interface, most importantly a rand function that provides a way to sample the distribution and, for explicit distributions, a pdf function that evaluates the probability mass or density of a given outcome. In most simple cases, you will be able to use a pre-defined distribution like the ones listed below, but occasionally you will define your own for more complex problems.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"tip: Tip\nSince the transition and observation functions return distributions, you should not call rand within these functions (unless it is within an ImplicitDistribution sampling function (see below)).","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The transition function takes in a state s and action a and returns a distribution object that defines the distribution of next states given that the current state is s and the action is a, that is T(s s a). Similarly the observation function takes in the action a and the next state sp and returns a distribution object defining O(z a s).","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"note: Note\nIt is also possible to define the observation function in terms of the previous state s, along with a, and sp. This is necessary, for example, when the observation is a measurement of change in state, e.g. sp - s. However some solvers may use the a, sp method (and hence cannot solve problems where the observation is conditioned on s and s). Since providing an a, sp method automatically defines the s, a, sp method, problem writers should usually define only the a, sp method, and only define the s, a, sp method if it is necessary. Except for special performance cases, problem writers should never need to define both methods.","category":"page"},{"location":"def_pomdp/#Commonly-used-distributions","page":"Defining POMDPs and MDPs","title":"Commonly-used distributions","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In most cases, the following pre-defined distributions found in the POMDPTools and Distributions packages will be sufficient to define models.","category":"page"},{"location":"def_pomdp/#Deterministic","page":"Defining POMDPs and MDPs","title":"Deterministic","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The Deterministic distribution should be used when there is no randomness in the state or observation given the state and action inputs. 
This commonly occurs when the new state is a deterministic function of the state and action or the state stays the same, for example when the action is \"listen\" in the tiger example above, the transition function returns Deterministic(s).","category":"page"},{"location":"def_pomdp/#SparseCat","page":"Defining POMDPs and MDPs","title":"SparseCat","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In discrete POMDPs, it is common for the state or observation to have a few possible outcomes with specified probabilities. This can be represented with a sparse categorical SparseCat distribution that takes a list of outcomes and a list of associated probabilities as arguments. For instance, in the tiger example above, when the action is \"listen\", there is an 85% chance of receiving the correct observation. Thus if the state is \"left\", the observation distribution is SparseCat([\"left\", \"right\"], [0.85, 0.15]), and SparseCat([\"right\", \"left\"], [0.85, 0.15]) if the state is \"right\".","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Another example where SparseCat distributions are useful is in grid-world problems, where there is a high probability of transitioning along the direction of the action, a low probability of transitioning to other adjacent states, and zero probability of transitioning to any other states.","category":"page"},{"location":"def_pomdp/#Uniform","page":"Defining POMDPs and MDPs","title":"Uniform","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Another common case is a uniform distribution over a space or set of outcomes. This can be represented with a Uniform object that takes a set of outcomes as an argument. For example, the initial state distribution in the tiger problem is represented with Uniform([\"left\", \"right\"]) indicating that both states are equally likely.","category":"page"},{"location":"def_pomdp/#Distributions.jl","page":"Defining POMDPs and MDPs","title":"Distributions.jl","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"If the states or observations have numerical or vector values, the Distributions.jl package provides a suite of suitable distributions. For example, the observation function in the partially-observable mountain car example above,","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"observation = (a, sp) -> Normal(sp[1], 0.15)","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"returns a Normal distribution from this package with a mean that depends on the car's location (the first element of state sp) and a standard deviation of 0.15.","category":"page"},{"location":"def_pomdp/#implicit_distribution_section","page":"Defining POMDPs and MDPs","title":"ImplicitDistribution","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In many cases, especially when the state or observation spaces are continuous or hybrid, it is difficult or impossible to specify the probability density explicitly. 
Fortunately, many solvers for these problems do not require explicit density information and instead need only samples from the distribution. In this case, an \"implicit distribution\" or \"generative model\" is sufficient. In POMDPs.jl, this can be represented using an ImplicitDistribution object.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The argument to an ImplicitDistribution constructor is a function that takes a random number generator as an argument and returns a sample from the distribution. To see how this works, we'll look at an example inspired by the mountaincar initial state distribution. Samples from this distribution are position-velocity tuples where the velocity is always zero, but the position is uniformly distributed between -0.2 and 0. Consider the following code:","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"using Random: MersenneTwister\nusing POMDPTools: ImplicitDistribution\n\nrng = MersenneTwister(1)\n\nd = ImplicitDistribution(rng -> (-0.2*rand(rng), 0.0))\nrand(rng, d)\n# output\n(-0.04720666913240939, 0.0)","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Here, rng is the random number generator. When rand(rng, d) is called, the sampling function, rng -> (-0.2*rand(rng), 0.0), is called to generate a state. The sampling function uses rng to generate a random number between 0 and 1 (rand(rng)), multiplies it by -0.2 to get the position, and creates a tuple with the position and a velocity of 0.0 and returns an initial state that might be, for instance (-0.11, 0.0). Any time that a solver, belief updater, or simulator needs an initial state for the problem, it will be sampled in this way.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"note: Note\nThe random number generator is a subtype of AbstractRNG. It is important to use this random number generator for all calls to rand in the sample function for reproducible results. Moreover some solvers use specialized random number generators that allow them to reduce variance. See also the What if I don't use the rng argument? FAQ.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"It is also common to use Julia's do block syntax to define more complex sampling functions. 
For instance the transition function in the mountaincar example returns an ImplicitDistribution with a sampling function that (1) generates a new noisy velocity through a randn call, then (2) clamps the velocity, and finally (3) integrates the position with Euler's method:","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"transition = function (s, a) \n ImplicitDistribution() do rng\n x, v = s\n vp = v + a*0.001 + cos(3*x)*-0.0025 + 0.0002*randn(rng)\n vp = clamp(vp, -0.07, 0.07)\n xp = x + vp\n return (xp, vp)\n end\nend","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Because of the nonlinear clamp operation, it would be difficult to represent this distribution explicitly.","category":"page"},{"location":"def_pomdp/#Custom-distributions","page":"Defining POMDPs and MDPs","title":"Custom distributions","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"If none of the distributions above are suitable, for example if you need to represent an explicit distribution with hybrid support, it is not difficult to define your own distributions by implementing the functions in the distribution interface.","category":"page"},{"location":"def_pomdp/#Reward-functions","page":"Defining POMDPs and MDPs","title":"Reward functions","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The reward function maps a combination of state, action, and observation arguments to the reward for a step. For instance, the reward function in the mountaincar problem,","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"reward = function (s, a, sp)\n if sp[1] > 0.5\n return 100.0\n else\n return -1.0\n end\nend","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"takes in the previous state, s, the action, a, and the resulting state, sp and returns a large positive reward if the resulting position, sp[1], is beyond a threshold (note the coupling of the terminal reward) and a small negative reward on all other steps. If the reward in the problem is stochastic, the reward function implemented in POMDPs.jl should return the mean reward.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"There are two possible reward function argument signatures that a problem-writer might consider implementing for an MDP: (s, a) and (s, a, sp). For a POMDP, there is an additional version, (s, a, sp, o). The (s, a, sp) version is useful when transition to a terminal state results in a reward, and the (s, a, sp, o) version is useful for cases when the reward is associated with an observation, such as a negative reward for the stress caused by a medical diagnostic test that indicates the possibility of a disease. 
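For illustration, a hedged sketch of such an observation-dependent reward (the :test and :positive symbols and the numbers are invented for this example, not taken from any package) might look like:

reward = function (s, a, sp, o)
    r = -1.0                       # small cost for every step
    if a == :test && o == :positive
        r -= 5.0                   # extra penalty for a stressful positive test result
    end
    return r
end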
Problem writers should implement the version with the fewest number of arguments possible, since the versions with more arguments are automatically provided to solvers and simulators if a version with fewer arguments is implemented.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In rare cases, it may make sense to implement two or more versions of the function, for example if a solver requires (s, a), but the user wants an observation-dependent reward to show up in simulation. It is OK to implement two methods of the reward function as long as the following relationships hold: R(s, a) = E_{s' ~ T(s'|s,a)}[R(s, a, s')] and R(s, a, s') = E_{o ~ Z(o|s,a,s')}[R(s, a, s', o)]. That is, the versions with fewer arguments must be expectations of versions with more arguments.","category":"page"},{"location":"def_pomdp/#Other-Components","page":"Defining POMDPs and MDPs","title":"Other Components","text":"","category":"section"},{"location":"def_pomdp/#Discount-factors","page":"Defining POMDPs and MDPs","title":"Discount factors","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The discount keyword argument is simply a number between 0 and 1 used to discount rewards in the future.","category":"page"},{"location":"def_pomdp/#Initial-state-distribution","page":"Defining POMDPs and MDPs","title":"Initial state distribution","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The initialstate argument should be a distribution object (see above) that defines the initial state distribution (and initial belief for POMDPs).","category":"page"},{"location":"def_pomdp/#Terminal-states","page":"Defining POMDPs and MDPs","title":"Terminal states","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The function supplied to the isterminal argument defines which states in the POMDP are terminal. The function should take a state as an argument and return true if the state is terminal and false otherwise. For example, in the mountaincar example above, isterminal = s -> s[1] > 0.5 indicates that all states where the position, s[1], is greater than 0.5 are terminal.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"It is assumed that the system will take no further steps once it has reached a terminal state. Since reward is assigned for taking steps, no additional reward can be accumulated from a terminal state. Consequently, the most important property of terminal states is that the value of a terminal state is always zero. Many solvers leverage this property for efficiency. 
As in the mountaincar example","category":"page"},{"location":"def_pomdp/#Other-ways-to-define-a-(PO)MDP","page":"Defining POMDPs and MDPs","title":"Other ways to define a (PO)MDP","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Besides the Quick(PO)MDP approach above, there are several alternative ways to define (PO)MDP models:","category":"page"},{"location":"def_pomdp/#Object-oriented","page":"Defining POMDPs and MDPs","title":"Object-oriented","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"First, it is possible to create your own (PO)MDP types and implement the components of the POMDP directly as methods of POMDPs.jl interface functions. This approach can be thought of as the \"low-level\" way to define a POMDP, and the QuickPOMDP as merely a syntactic convenience. There are a few things that make this object-oriented approach more cumbersome than the QuickPOMDP approach, but the structure is similar. For example, the tiger QuickPOMDP shown above can be implemented as follows:","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"import POMDPs\nusing POMDPs: POMDP\nusing POMDPTools: Deterministic, Uniform, SparseCat\n\nstruct TigerPOMDP <: POMDP{String, String, String}\n p_correct::Float64\n indices::Dict{String, Int}\n\n TigerPOMDP(p_correct=0.85) = new(p_correct, Dict(\"left\"=>1, \"right\"=>2, \"listen\"=>3))\nend\n\nPOMDPs.states(m::TigerPOMDP) = [\"left\", \"right\"]\nPOMDPs.actions(m::TigerPOMDP) = [\"left\", \"right\", \"listen\"]\nPOMDPs.observations(m::TigerPOMDP) = [\"left\", \"right\"]\nPOMDPs.discount(m::TigerPOMDP) = 0.95\nPOMDPs.stateindex(m::TigerPOMDP, s) = m.indices[s]\nPOMDPs.actionindex(m::TigerPOMDP, a) = m.indices[a]\nPOMDPs.obsindex(m::TigerPOMDP, o) = m.indices[o]\n\nfunction POMDPs.transition(m::TigerPOMDP, s, a)\n if a == \"listen\"\n return Deterministic(s) # tiger stays behind the same door\n else # a door is opened\n return Uniform([\"left\", \"right\"]) # reset\n end\nend\n\nfunction POMDPs.observation(m::TigerPOMDP, a, sp)\n if a == \"listen\"\n if sp == \"left\"\n return SparseCat([\"left\", \"right\"], [m.p_correct, 1.0-m.p_correct])\n else\n return SparseCat([\"right\", \"left\"], [m.p_correct, 1.0-m.p_correct])\n end\n else\n return Uniform([\"left\", \"right\"])\n end\nend\n\nfunction POMDPs.reward(m::TigerPOMDP, s, a)\n if a == \"listen\"\n return -1.0\n elseif s == a # the tiger was found\n return -100.0\n else # the tiger was escaped\n return 10.0\n end\nend\n\nPOMDPs.initialstate(m::TigerPOMDP) = Uniform([\"left\", \"right\"])\n# output","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"It is easy to see that the new methods are similar to the keyword arguments in the QuickPOMDP approach, except that every function has an initial m argument that has the newly created POMDP type. There are several differences from the QuickPOMDP approach: First, the POMDP is represented by a new struct that is a subtype of POMDP{S,A,O}. The state, action, and observation types must be specified as the S, A, and O parameters of the POMDP abstract type. Second, this new struct may contain problem-specific fields, which makes it easy for others to construct POMDPs that have the same structure but different parameters. 
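With the struct defined above, a tiger problem with noisier observations can then be created just by passing a different value to the constructor (0.7 below is an arbitrary choice for illustration):

noisy_tiger = TigerPOMDP(0.7)    # listening is only correct 70% of the time
default_tiger = TigerPOMDP()     # uses the default p_correct = 0.85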
For example, in the code above, the struct has a p_correct parameter that specifies the probability of receiving a correct observation when the \"listen\" action is taken. The final and most cumbersome difference between this object-oriented approach and using QuickPOMDPs is that the user must implement stateindex, actionindex, and obsindex to map states, actions, and observations to appropriate indices so that data such as values can be stored and accessed efficiently in vectors.","category":"page"},{"location":"def_pomdp/#Using-a-single-generative-function-instead-of-separate-T,-Z,-and-R","page":"Defining POMDPs and MDPs","title":"Using a single generative function instead of separate T, Z, and R","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"In some cases, you may wish to use a simulator that generates the next state, observation, and/or reward (s, o, and r) simultaneously. This is sometimes called a \"generative model\".","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"For example if you are working on an autonomous driving POMDP, the car may travel for one or more seconds in between POMDP decision steps during which it may accumulate reward and observation measurements. In this case it might be very difficult to create a reward or observation function based on s, a, and s arguments.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"For situations like this, gen is an alternative to transition, observation, and reward. The gen function should take in state, action, and random number generator arguments and return a NamedTuple with keys sp (for \"s-prime\", the next state), o, and r. The mountaincar example above can be implemented with gen as shown below.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"note: Note\ngen is intended only for the case where two or more of the next state, observation, and reward need to be generated at the same time. If the state transition model can be separated from the reward and observation models, you should implement transition with an ImplicitDistribution instead of gen. See also the \"What is the difference between transition, gen, and @gen?\" FAQ.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"using QuickPOMDPs: QuickPOMDP\nusing POMDPTools: ImplicitDistribution\n\nmountaincar = QuickPOMDP(\n actions = [-1., 0., 1.],\n obstype = Float64,\n discount = 0.95,\n\n gen = function (s, a, rng)\n x, v = s\n vp = v + a*0.001 + cos(3*x)*-0.0025 + 0.0002*randn(rng)\n vp = clamp(vp, -0.07, 0.07)\n xp = x + vp\n if xp > 0.5\n r = 100.0\n else\n r = -1.0\n end\n o = xp + 0.15*randn(rng)\n return (sp=(xp, vp), o=o, r=r)\n end,\n\n initialstate = ImplicitDistribution(rng -> (-0.2*rand(rng), 0.0)),\n isterminal = s -> s[1] > 0.5\n)","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"tip: Tip\ngen is not tied to the QuickPOMDP approach; it can also be used in the object-oriented paradigm.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"tip: Tip\nIt is possible to mix and match gen with transtion, observation, and reward. 
For example, if the gen function returns a NamedTuple with sp and r keys, POMDPs.jl will try to use gen to generate states and rewards and the observation function to generate observations.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"note: Note\nImplementing gen instead of transition, observation, and reward will limit which solvers you can use; for example, it is impossible to use a solver that requires an explicit transition distribution","category":"page"},{"location":"def_pomdp/#Tabular","page":"Defining POMDPs and MDPs","title":"Tabular","text":"","category":"section"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Finally, it is sometimes convenient to define (PO)MDPs with tables that define the transition and observation probabilities and rewards. In this case, the states, actions, and observations must simply be integers.","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"The code below is a tabular implementation of the tiger example with the states, actions, and observations mapped to the following integers:","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"integer state, action, or observation\n1 \"left\"\n2 \"right\"\n3 \"listen\"","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"using POMDPModels: TabularPOMDP\n\nT = zeros(2,3,2)\nT[:,:,1] = [1. 0.5 0.5; \n 0. 0.5 0.5]\nT[:,:,2] = [0. 0.5 0.5; \n 1. 0.5 0.5]\n\nO = zeros(2,3,2)\nO[:,:,1] = [0.85 0.5 0.5; \n 0.15 0.5 0.5]\nO[:,:,2] = [0.15 0.5 0.5; \n 0.85 0.5 0.5]\n\nR = [-1. -100. 10.; \n -1. 10. -100.]\n\nm = TabularPOMDP(T, R, O, 0.95)","category":"page"},{"location":"def_pomdp/","page":"Defining POMDPs and MDPs","title":"Defining POMDPs and MDPs","text":"Here T is a S times A times S array representing the transition probabilities, with T[sp, a, s] = T(s s a). Similarly, O is an O times A times S encoding the observation distribution with O[o, a, sp] = Z(o a s), and R is a S times A matrix that encodes the reward function. 0.95 is the discount factor.","category":"page"},{"location":"concepts/#Concepts-and-Architecture","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"","category":"section"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"POMDPs.jl aims to coordinate the development of three software components: 1) a problem, 2) a solver, 3) an experiment. Each of these components has a set of abstract types associated with it and a set of functions that allow a user to define each component's behavior in a standardized way. An outline of the architecture is shown below.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"(Image: concepts)","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"The MDP and POMDP types are associated with the problem definition. The Solver and Policy types are associated with the solver or decision-making agent. Typically, the Updater type is also associated with the solver, but a solver may sometimes be used with an updater that was implemented separately. 
The Simulator type is associated with the experiment.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"The code components of the POMDPs.jl ecosystem relevant to problems and solvers are shown below. The arrows represent the flow of information from the problems to the solvers. The figure shows the two interfaces that form POMDPs.jl - Explicit and Generative. Details about these interfaces can be found in the section on Defining POMDPs.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"(Image: interface_relationships)","category":"page"},{"location":"concepts/#POMDPs-and-MDPs","page":"Concepts and Architecture","title":"POMDPs and MDPs","text":"","category":"section"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"An MDP is a mathematical framework for sequential decision making under uncertainty in which all of the uncertainty arises from outcomes that are partially random and partially under the control of a decision maker. Mathematically, an MDP is a tuple (S, A, T, R, γ), where S is the state space, A is the action space, T is a transition function defining the probability of transitioning to each state given the state and action at the previous time, and R is a reward function mapping every possible transition (s, a, s') to a real reward value. Finally, γ is a discount factor that defines the relative weighting of current and future rewards. For more information see a textbook such as [1]. In POMDPs.jl an MDP is represented by a concrete subtype of the MDP abstract type and a set of methods that define each of its components as described in the problem definition section.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"A POMDP is a more general sequential decision making problem in which the agent is not sure what state it is in. The state is only partially observable by the decision making agent. Mathematically, a POMDP is a tuple (S, A, T, R, O, Z, γ), where S, A, T, R, and γ have the same meaning as in an MDP, Z is the agent's observation space, and O defines the probability of receiving each observation at a transition. In POMDPs.jl, a POMDP is represented by a concrete subtype of the POMDP abstract type, and the methods described in the problem definition section.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"POMDPs.jl contains additional functions for defining optional problem behavior such as an initial state distribution or terminal states. More information can be found in the Defining POMDPs section.","category":"page"},{"location":"concepts/#Beliefs-and-Updaters","page":"Concepts and Architecture","title":"Beliefs and Updaters","text":"","category":"section"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"In a POMDP domain, the decision-making agent does not have complete information about the state of the problem, so the agent can only make choices based on its \"belief\" about the state. In the POMDP literature, the term \"belief\" is typically defined to mean a probability distribution over all possible states of the system. 
However, in practice, the agent often makes decisions based on an incomplete or lossy record of past observations that has a structure much different from a probability distribution. For example, if the agent is represented by a finite-state controller, as is the case for Monte-Carlo Value Iteration [2], the belief is the controller state, which is a node in a graph. Another example is an agent represented by a recurrent neural network. In this case, the agent's belief is the state of the network. In order to accommodate a wide variety of decision-making approaches in POMDPs.jl, we use the term \"belief\" to denote the set of information that the agent makes a decision on, which could be an exact state distribution, an action-observation history, a set of weighted particles, or the examples mentioned before. In code, the belief can be represented by any built-in or user-defined type.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"When an action is taken and a new observation is received, the belief is updated by the belief updater. In code, a belief updater is represented by a concrete subtype of the Updater abstract type, and the update(updater, belief, action, observation) function defines how the belief is updated when a new observation is received.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"Although the agent may use a specialized belief structure to make decisions, the information initially given to the agent about the state of the problem is usually most conveniently represented as a state distribution, thus the initialize_belief function is provided to convert a state distribution to a specialized belief structure that an updater can work with.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"In many cases, the belief structure is closely related to the solution technique, so it will be implemented by the programmer who writes the solver. In other cases, the agent can use a variety of belief structures to make decisions, so a domain-specific updater implemented by the programmer that wrote the problem description may be appropriate. Finally, some advanced generic belief updaters such as particle filters may be implemented by a third party. The convenience function updater(policy) can be used to get a suitable default updater for a policy, however many policies can work with other updaters.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"For more information on implementing a belief updater, see Defining a Belief Updater","category":"page"},{"location":"concepts/#Solvers-and-Policies","page":"Concepts and Architecture","title":"Solvers and Policies","text":"","category":"section"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"Sequential decision making under uncertainty involves both online and offline calculations. In the broad sense, the term \"solver\" as used in the node in the figure at the top of the page refers to the software package that performs the calculations at both of these times. 
However, the code is broken up into two pieces, the solver that performs calculations offline and the policy that performs calculations online.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"In the abstract, a policy is a mapping from every belief that an agent might take to an action. A policy is represented in code by a concrete subtype of the Policy abstract type. The programmer implements action to describe what computations need to be done online. For an online solver such as POMCP, all of the decision computation occurs within action while for an offline solver like SARSOP, there is very little computation within action. See Interacting with Policies for more information.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"The offline portion of the computation is carried out by the solver, which is represented by a concrete subtype of the Solver abstract type. Computations occur within the solve function. For an offline solver like SARSOP, nearly all of the decision computation occurs within this function, but for some online solvers such as POMCP, solve merely embeds the problem in the policy.","category":"page"},{"location":"concepts/#Simulators","page":"Concepts and Architecture","title":"Simulators","text":"","category":"section"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"A simulator defines a way to run one or more simulations. It is represented by a concrete subtype of the Simulator abstract type and the simulation is an implemention of simulate. Depending on the simulator, simulate may return a variety of data about the simulation, such as the discounted reward or the state history. All simulators should perform simulations consistent with the Simulation Standard.","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"[1] Decision Making Under Uncertainty: Theory and Application by Mykel J. Kochenderfer, MIT Press, 2015","category":"page"},{"location":"concepts/","page":"Concepts and Architecture","title":"Concepts and Architecture","text":"[2] Bai, H., Hsu, D., & Lee, W. S. (2014). Integrated perception and planning in the continuous space: A POMDP approach. The International Journal of Robotics Research, 33(9), 1288-1302","category":"page"},{"location":"interfaces/#Spaces-and-Distributions","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"","category":"section"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"Two important components of the definitions of MDPs and POMDPs are spaces, which specify the possible states, actions, and observations in a problem and distributions, which define probability distributions. In order to provide for maximum flexibility spaces and distributions may be of any type (i.e. there are no abstract base types). Solvers and simulators will interact with space and distribution types using the functions defined below.","category":"page"},{"location":"interfaces/#space-interface","page":"Spaces and Distributions","title":"Spaces","text":"","category":"section"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"A space object should contain the information needed to define the set of all possible states, actions or observations. 
The implementation will depend on the attributes of the elements. For example, if the space is continuous, the space object may only contain the limits of the continuous range. In the case of a discrete problem, a vector containing all states is appropriate for representing a space.","category":"page"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"The following functions may be called on a space object (Click on a function to read its documentation):","category":"page"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"rand\niterate and the rest of the iteration interface for discrete spaces.","category":"page"},{"location":"interfaces/#Distributions","page":"Spaces and Distributions","title":"Distributions","text":"","category":"section"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"A distribution object represents a probability distribution.","category":"page"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"The following functions may be called on a distribution object (Click on a function to read its documentation):","category":"page"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"rand([rng,] d) [1]\nsupport\npdf\nmode\nmean","category":"page"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"You can find some useful pre-made distribution objects in Distributions.jl or POMDPTools.","category":"page"},{"location":"interfaces/","page":"Spaces and Distributions","title":"Spaces and Distributions","text":"[1]: Distributions should support both rand(rng::AbstractRNG, d) and rand(d). The recommended way to do this is by implmenting Base.rand(rng::AbstractRNG, s::Random.SamplerTrivial{<:YourDistribution}) from the julia rand interface.","category":"page"},{"location":"POMDPTools/#pomdptools_section","page":"POMDPTools: the standard library for POMDPs.jl","title":"POMDPTools: the standard library for POMDPs.jl","text":"","category":"section"},{"location":"POMDPTools/","page":"POMDPTools: the standard library for POMDPs.jl","title":"POMDPTools: the standard library for POMDPs.jl","text":"The POMDPs.jl package does nothing more than define an interface or language for interacting with and solving (PO)MDPs; it does not contain any implementations. In practice, defining and solving POMDPs is made vastly easier if some commonly-used structures are provided. The POMDPTools package contains these implementations. 
Thus, the relationship between POMDPs.jl and POMDPTools is similar to the relationship between a programming language and its standard library.","category":"page"},{"location":"POMDPTools/","page":"POMDPTools: the standard library for POMDPs.jl","title":"POMDPTools: the standard library for POMDPs.jl","text":"The POMDPTools package source code is hosted in the POMDPs.jl github repository in the lib/POMDPTools directory.","category":"page"},{"location":"POMDPTools/","page":"POMDPTools: the standard library for POMDPs.jl","title":"POMDPTools: the standard library for POMDPs.jl","text":"The contents of the library are outlined below:","category":"page"},{"location":"POMDPTools/","page":"POMDPTools: the standard library for POMDPs.jl","title":"POMDPTools: the standard library for POMDPs.jl","text":"Pages = [\"distributions.md\", \"model.md\", \"visualization.md\", \"beliefs.md\", \"policies.md\", \"simulators.md\", \"common_rl.md\", \"testing.md\"]","category":"page"},{"location":"POMDPTools/policies/#Implemented-Policies","page":"Implemented Policies","title":"Implemented Policies","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"POMDPTools currently provides the following policy types:","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"a wrapper to turn a function into a Policy\nan alpha vector policy type\na random policy\na stochastic policy type\nexploration policies\na vector policy type\na wrapper to collect statistics and errors about policies","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"In addition, it provides the showpolicy function for printing policies similar to the way that matrices are printed in the repl and the evaluate function for evaluating MDP policies.","category":"page"},{"location":"POMDPTools/policies/#Function","page":"Implemented Policies","title":"Function","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"Wraps a Function mapping states to actions into a Policy. ","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"FunctionPolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.FunctionPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.FunctionPolicy","text":"FunctionPolicy\n\nPolicy p=FunctionPolicy(f) returns f(x) when action(p, x) is called.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"FunctionSolver","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.FunctionSolver","page":"Implemented Policies","title":"POMDPTools.Policies.FunctionSolver","text":"FunctionSolver\n\nSolver for a FunctionPolicy.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Alpha-Vector-Policy","page":"Implemented Policies","title":"Alpha Vector Policy","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"Represents a policy with a set of alpha vectors (See AlphaVectorPolicy constructor docstring). 
In addition to finding the optimal action with action, the alpha vectors can be accessed with alphavectors or alphapairs.","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"Determining the estimated value and optimal action depends on calculating the dot product between alpha vectors and a belief vector. POMDPTools.Policies.beliefvec(pomdp, b) is used to create this vector and can be overridden for new belief types for efficiency.","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"AlphaVectorPolicy\nalphavectors\nalphapairs\nPOMDPTools.Policies.beliefvec","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.AlphaVectorPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.AlphaVectorPolicy","text":"AlphaVectorPolicy(pomdp::POMDP, alphas, action_map)\n\nConstruct a policy from alpha vectors.\n\nArguments\n\nalphas: an |S| x (number of alpha vecs) matrix or a vector of alpha vectors.\naction_map: a vector of the actions corresponding to each alpha vector\nAlphaVectorPolicy{P<:POMDP, A}\n\nRepresents a policy with a set of alpha vectors.\n\nUse action to get the best action for a belief, and alphavectors and alphapairs to access the alpha vectors and their corresponding actions.\n\nFields\n\npomdp::P the POMDP problem \nn_states::Int the number of states in the POMDP\nalphas::Vector{Vector{Float64}} the list of alpha vectors\naction_map::Vector{A} a list of actions corresponding to the alpha vectors\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#POMDPTools.Policies.alphavectors","page":"Implemented Policies","title":"POMDPTools.Policies.alphavectors","text":"Return the alpha vectors.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/policies/#POMDPTools.Policies.alphapairs","page":"Implemented Policies","title":"POMDPTools.Policies.alphapairs","text":"Return an iterator of alpha vector-action pairs in the policy.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/policies/#POMDPTools.Policies.beliefvec","page":"Implemented Policies","title":"POMDPTools.Policies.beliefvec","text":"POMDPTools.Policies.beliefvec(m::POMDP, n_states::Int, b)\n\nReturn a vector-like representation of the belief b suitable for calculating the dot product with the alpha vectors.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/policies/#Random-Policy","page":"Implemented Policies","title":"Random Policy","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"A policy that returns a randomly selected action using rand(rng, actions(pomdp)).","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"RandomPolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.RandomPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.RandomPolicy","text":"RandomPolicy{RNG<:AbstractRNG, P<:Union{POMDP,MDP}, U<:Updater}\n\na generic policy that uses the actions function to create a list of actions and then randomly samples an action from it.\n\nConstructor:\n\n`RandomPolicy(problem::Union{POMDP,MDP};\n rng=Random.default_rng(),\n updater=NothingUpdater())`\n\nFields\n\nrng::RNG a random number generator \nproblem::P the POMDP or MDP problem \nupdater::U a belief updater (defaults to NothingUpdater in the above 
constructor)\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"RandomSolver","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.RandomSolver","page":"Implemented Policies","title":"POMDPTools.Policies.RandomSolver","text":"solver that produces a random policy\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Stochastic-Policies","page":"Implemented Policies","title":"Stochastic Policies","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"Types for representing randomized policies:","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"StochasticPolicy samples actions from an arbitrary distribution.\nUniformRandomPolicy samples actions uniformly (see RandomPolicy for a similar use).\nCategoricalTabularPolicy samples actions from a categorical distribution with weights given by a ValuePolicy.","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"StochasticPolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.StochasticPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.StochasticPolicy","text":"StochasticPolicy{D, RNG <: AbstractRNG}\n\nRepresents a stochastic policy. Actions are sampled from an arbitrary distribution.\n\nConstructor:\n\n`StochasticPolicy(distribution; rng=Random.default_rng())`\n\nFields\n\ndistribution::D\nrng::RNG a random number generator\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"CategoricalTabularPolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.CategoricalTabularPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.CategoricalTabularPolicy","text":"CategoricalTabularPolicy\n\nrepresents a stochastic policy sampling an action from a categorical distribution with weights given by a ValuePolicy\n\nconstructor:\n\nCategoricalTabularPolicy(mdp::Union{POMDP,MDP}; rng=Random.default_rng())\n\nFields\n\nstochastic::StochasticPolicy\nvalue::ValuePolicy\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Vector-Policies","page":"Implemented Policies","title":"Vector Policies","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"Tabular policies include the following:","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"VectorPolicy holds a vector of actions, one for each state, ordered according to stateindex.\nValuePolicy holds a matrix of values for state-action pairs and chooses the action with the highest value at the given state.","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"VectorPolicy ","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.VectorPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.VectorPolicy","text":"VectorPolicy{S,A}\n\nA generic MDP policy that consists of a vector of actions. 
The entry at stateindex(mdp, s) is the action that will be taken in state s.\n\nFields\n\nmdp::MDP{S,A} the MDP problem\nact::Vector{A} a vector of size |S| mapping state indices to actions\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"VectorSolver","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.VectorSolver","page":"Implemented Policies","title":"POMDPTools.Policies.VectorSolver","text":"VectorSolver{A}\n\nSolver for VectorPolicy. Doesn't do any computation - just sets the action vector.\n\nFields\n\nact::Vector{A} the action vector\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"ValuePolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.ValuePolicy","page":"Implemented Policies","title":"POMDPTools.Policies.ValuePolicy","text":" ValuePolicy{P<:Union{POMDP,MDP}, T<:AbstractMatrix{Float64}, A}\n\nA generic MDP policy that consists of a value table. The action with the highest value in the row at stateindex(mdp, s) is the action that will be taken in state s. It is expected that the order of the actions in the value table is consistent with the order of the actions in act. If act is not explicitly set in the construction, act is ordered according to actionindex.\n\nFields\n\nmdp::P the MDP problem\nvalue_table::T the value table as a |S|x|A| matrix\nact::Vector{A} the possible actions\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Value-Dict-Policy","page":"Implemented Policies","title":"Value Dict Policy","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"ValueDictPolicy holds a dictionary of values, where the key is a state-action tuple, and chooses the action with the highest value at the given state. It allows one to write solvers without enumerating state and action spaces, but actions and states must support Base.isequal() and Base.hash().","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"ValueDictPolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.ValueDictPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.ValueDictPolicy","text":" ValueDictPolicy(mdp)\n\nA generic MDP policy that consists of a Dict storing Q-values for state-action pairs. If there are no entries higher than a default value, this will fall back to a default policy.\n\nKeyword Arguments\n\nvalue_table::AbstractDict the value dict, key is (s, a) Tuple.\ndefault_value::Float64 the default value of value_dict.\ndefault_policy::Policy the policy taken when no action has a value higher than default_value\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Exploration-Policies","page":"Implemented Policies","title":"Exploration Policies","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"Exploration policies are often useful for Reinforcement Learning algorithms to choose an action that is different than the action given by the policy being learned (on_policy). 
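For instance, a single learning step that queries the exploration policy instead of the learned policy might look like this (a minimal sketch; the SimpleGridWorld model and the FunctionPolicy stand-in for the learned policy are purely illustrative):

using POMDPs, POMDPTools, POMDPModels

m = SimpleGridWorld()
learned = FunctionPolicy(s -> :right)      # stand-in for the policy being learned (on_policy)
expl = EpsGreedyPolicy(m, 0.1)             # take a uniformly random action 10% of the time

s = rand(initialstate(m))
k = 1                                      # exploration step counter (used by schedules)
a = action(expl, learned, k, s)            # four-argument version of action
sp = @gen(:sp)(m, s, a)                    # advance the simulation

The interface used here is described next.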
","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"Exploration policies are subtype of the abstract ExplorationPolicy type and they follow the following interface: action(exploration_policy::ExplorationPolicy, on_policy::Policy, k, s). k is used to compute the value of the exploration parameter (see Schedule), and s is the current state or observation in which the agent is taking an action.","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"The action method is exported by POMDPs.jl. To use exploration policies in a solver, you must use the four argument version of action where on_policy is the policy being learned (e.g. tabular policy or neural network policy).","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"This package provides two exploration policies: EpsGreedyPolicy and SoftmaxPolicy","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":" EpsGreedyPolicy\n SoftmaxPolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.EpsGreedyPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.EpsGreedyPolicy","text":"EpsGreedyPolicy <: ExplorationPolicy\n\nrepresents an epsilon greedy policy, sampling a random action with a probability eps or returning an action from a given policy otherwise. The evolution of epsilon can be controlled using a schedule. This feature is useful for using those policies in reinforcement learning algorithms. \n\nConstructor:\n\nEpsGreedyPolicy(problem::Union{MDP, POMDP}, eps::Union{Function, Float64}; rng=Random.default_rng(), schedule=ConstantSchedule)\n\nIf a function is passed for eps, eps(k) is called to compute the value of epsilon when calling action(exploration_policy, on_policy, k, s).\n\nFields\n\neps::Function\nrng::AbstractRNG\nm::M POMDPs or MDPs problem\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#POMDPTools.Policies.SoftmaxPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.SoftmaxPolicy","text":"SoftmaxPolicy <: ExplorationPolicy\n\nrepresents a softmax policy, sampling a random action according to a softmax function. The softmax function converts the action values of the on policy into probabilities that are used for sampling. A temperature parameter or function can be used to make the resulting distribution more or less wide.\n\nConstructor\n\nSoftmaxPolicy(problem, temperature::Union{Function, Float64}; rng=Random.default_rng())\n\nIf a function is passed for temperature, temperature(k) is called to compute the value of the temperature when calling action(exploration_policy, on_policy, k, s)\n\nFields\n\ntemperature::Function\nrng::AbstractRNG\nactions::A an indexable list of action\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Schedule","page":"Implemented Policies","title":"Schedule","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"Exploration policies often rely on a key parameter: epsilon in epsilon-greedy and the temperature in softmax for example. Reinforcement learning algorithms often require a decay schedule for these parameters. Schedule can be passed to an exploration policy as functions. 
For example, one can define an epsilon greedy policy with an exponential decay schedule as follows: ","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":" m # your mdp or pomdp model\n exploration_policy = EpsGreedyPolicy(m, k->0.05*0.9^(k/10))","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"POMDPTools exports a linear decay schedule object that can be used as well. ","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":" LinearDecaySchedule ","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.LinearDecaySchedule","page":"Implemented Policies","title":"POMDPTools.Policies.LinearDecaySchedule","text":"LinearDecaySchedule\n\nA schedule that linearly decreases a value from start to stop in steps steps. If the value is greater than or equal to stop, it stays constant.\n\nConstructor\n\nLinearDecaySchedule(;start, stop, steps)\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Playback-Policy","page":"Implemented Policies","title":"Playback Policy","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"A policy that replays a fixed sequence of actions. When all actions are used, a backup policy is used.","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"PlaybackPolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.PlaybackPolicy","page":"Implemented Policies","title":"POMDPTools.Policies.PlaybackPolicy","text":"PlaybackPolicy{A<:AbstractArray, P<:Policy, V<:AbstractArray{<:Real}}\n\na policy that applies a fixed sequence of actions until they are all used and then falls back onto a backup policy until the end of the episode.\n\nConstructor:\n\n`PlaybackPolicy(actions::AbstractArray, backup_policy::Policy; logpdfs::AbstractArray{Float64, 1} = Float64[])`\n\nFields\n\nactions::Vector{A} a vector of actions to play back\nbackup_policy::Policy the policy to use when all prescribed actions have been taken but the episode continues\nlogpdfs::Vector{Float64} the log probability (density) of actions\ni::Int64 the current action index\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Utility-Wrapper","page":"Implemented Policies","title":"Utility Wrapper","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"A wrapper for policies to collect statistics and handle errors.","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"PolicyWrapper","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.PolicyWrapper","page":"Implemented Policies","title":"POMDPTools.Policies.PolicyWrapper","text":"PolicyWrapper\n\nFlexible utility wrapper for a policy designed for collecting statistics about planning.\n\nCarries a function, a policy, and optionally a payload (that can be any type).\n\nThe function should typically be defined with the do syntax. Each time action is called on the wrapper, this function will be called.\n\nIf there is no payload, it will be called with two arguments: the policy and the state/belief. 
If there is a payload, it will be called with three arguments: the policy, the payload, and the current state or belief. The function should return an appropriate action. The idea is that, in this function, action(policy, s) should be called, statistics from the policy/planner should be collected and saved in the payload, exceptions can be handled, and the action should be returned.\n\nConstructor\n\nPolicyWrapper(policy::Policy; payload=nothing)\n\nExample\n\nusing POMDPModels\nusing POMDPToolbox\n\nmdp = GridWorld()\npolicy = RandomPolicy(mdp)\ncounts = Dict(a=>0 for a in actions(mdp))\n\n# with a payload\nstatswrapper = PolicyWrapper(policy, payload=counts) do policy, counts, s\n a = action(policy, s)\n counts[a] += 1\n return a\nend\n\nh = simulate(HistoryRecorder(max_steps=100), mdp, statswrapper)\nfor (a, count) in payload(statswrapper)\n println(\"policy chose action $a $count of $(n_steps(h)) times.\")\nend\n\n# without a payload\nerrwrapper = PolicyWrapper(policy) do policy, s\n try\n a = action(policy, s)\n catch ex\n @warn(\"Caught error in policy; using default\")\n a = :left\n end\n return a\nend\n\nh = simulate(HistoryRecorder(max_steps=100), mdp, errwrapper)\n\nFields\n\nf::F\npolicy::P\npayload::PL\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/policies/#Pretty-Printing-Policies","page":"Implemented Policies","title":"Pretty Printing Policies","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"showpolicy","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.showpolicy","page":"Implemented Policies","title":"POMDPTools.Policies.showpolicy","text":"showpolicy([io], [mime], m::MDP, p::Policy)\nshowpolicy([io], [mime], statelist::AbstractVector, p::Policy)\nshowpolicy(...; pre=\" \")\n\nPrint the states in m or statelist and the actions from policy p corresponding to those states.\n\nFor the MDP version, if io[:limit] is true, will only print enough states to fill the display.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/policies/#Policy-Evaluation","page":"Implemented Policies","title":"Policy Evaluation","text":"","category":"section"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"The evaluate function provides a policy evaluation tool for MDPs:","category":"page"},{"location":"POMDPTools/policies/","page":"Implemented Policies","title":"Implemented Policies","text":"evaluate","category":"page"},{"location":"POMDPTools/policies/#POMDPTools.Policies.evaluate","page":"Implemented Policies","title":"POMDPTools.Policies.evaluate","text":"evaluate(m::MDP, p::Policy)\nevaluate(m::MDP, p::Policy; rewardfunction=POMDPs.reward)\n\nCalculate the value for a policy on an MDP using the approach in equation 4.2.2 of Kochenderfer, Decision Making Under Uncertainty, 2015.\n\nReturns a DiscreteValueFunction, which maps states to values.\n\nExample\n\nusing POMDPTools, POMDPModels\nm = SimpleGridWorld()\nu = evaluate(m, FunctionPolicy(x->:left))\nu([1,1]) # value of always moving left starting at state [1,1]\n\n\n\n\n\n","category":"function"},{"location":"def_updater/#Defining-a-Belief-Updater","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"","category":"section"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"In this section we list the requirements for defining a belief updater. 
For a description of what a belief updater is, see Concepts and Architecture - Beliefs and Updaters. Typically a belief updater will have an associated belief type, and may be closely tied to a particular policy/planner.","category":"page"},{"location":"def_updater/#Defining-a-Belief-Type","page":"Defining a Belief Updater","title":"Defining a Belief Type","text":"","category":"section"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"A belief object should contain all of the information needed for the next belief update and for the policy to make a decision. The belief type could be a pre-defined type such as a distribution from Distributions.jl or DiscreteBelief or SparseCat from the POMDPTools package, or it could be a custom type.","category":"page"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"Often, but not always, the belief will represent a probability distribution. In this case, the functions in the distribution interface should be implemented if possible. Implementing these functions will make the belief usable with many of the policies and planners in the POMDPs.jl ecosystem, and will make it easy for others to convert between beliefs and to interpret what a belief means.","category":"page"},{"location":"def_updater/#Histories-associated-with-a-belief","page":"Defining a Belief Updater","title":"Histories associated with a belief","text":"","category":"section"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"If a complete or partial record of the action-observation history leading up to a belief is available, it is often helpful to give access to this by implementing the history or currentobs functions (see the docstrings for more details). This is especially useful if a problem-writer wants to implement a belief- or observation-dependent action space. Belief type implementers need only implement history, and currentobs will automatically be provided, though sometimes it is more convenient to implement currentobs directly.","category":"page"},{"location":"def_updater/#Defining-an-Updater","page":"Defining a Belief Updater","title":"Defining an Updater","text":"","category":"section"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"To create an updater, one should define a subtype of the Updater abstract type and implement two methods, one to create the initial belief from the problem's initial state distribution and one to perform a belief update:","category":"page"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"initialize_belief(updater, d) creates a belief from state distribution d appropriate to use with the updater. To extract information from d, use the functions from the distribution interface.\nupdate(updater, b, a, o) returns an updated belief given belief b, action a, and observation o. 
One can usually expect b to be the same type returned by initialize_belief because a careful user will always call initialize_belief before update, but it would also be reasonable to implement update for b of a different type if it is desirable to handle multiple belief types.","category":"page"},{"location":"def_updater/#Example:-History-Updater","page":"Defining a Belief Updater","title":"Example: History Updater","text":"","category":"section"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"One trivial type of belief would be the action-observation history, a list containing the initial state distribution and every action taken and observation received. The history contains all of the information received up to the current time, but it is not usually very useful because most policies make decisions based on a state probability distribution. Here the belief type is simply the built in Vector{Any}, so we need only create the updater and write update and initialize_belief. Normally, update would contain belief update probability calculations, but in this example, we simply append the action and observation to the history.","category":"page"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"(Note that this example is designed for readability rather than efficiency.)","category":"page"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"import POMDPs\n\nstruct HistoryUpdater <: POMDPs.Updater end\n\nPOMDPs.initialize_belief(up::HistoryUpdater, d) = Any[d]\n\nfunction POMDPs.update(up::HistoryUpdater, b, a, o)\n bp = copy(b)\n push!(bp, a)\n push!(bp, o)\n return bp\nend","category":"page"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"At each step, the history starts with the original distribution, then contains all the actions and observations received up to that point. 
The example below shows this for the crying baby problem (observations are true/false for crying and actions are true/false for feeding).","category":"page"},{"location":"def_updater/","page":"Defining a Belief Updater","title":"Defining a Belief Updater","text":"using POMDPTools\nusing POMDPModels\nusing Random\n\npomdp = BabyPOMDP()\npolicy = RandomPolicy(pomdp, rng=MersenneTwister(1))\nup = HistoryUpdater()\n\n# within stepthrough initialize_belief is called on the initial state distribution of the pomdp, then update is called at each step.\nfor b in stepthrough(pomdp, policy, up, \"b\", rng=MersenneTwister(2), max_steps=5)\n @show b\nend\n\n# output\n\nb = Any[POMDPModels.BoolDistribution(0.0)]\nb = Any[POMDPModels.BoolDistribution(0.0), false, false]\nb = Any[POMDPModels.BoolDistribution(0.0), false, false, false, false]\nb = Any[POMDPModels.BoolDistribution(0.0), false, false, false, false, true, false]\nb = Any[POMDPModels.BoolDistribution(0.0), false, false, false, false, true, false, true, false]","category":"page"},{"location":"faq/#Frequently-Asked-Questions-(FAQ)","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"","category":"section"},{"location":"faq/#What-is-the-difference-between-transition,-gen,-and-@gen?","page":"Frequently Asked Questions (FAQ)","title":"What is the difference between transition, gen, and @gen?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"(See also: Using a single generative function instead of separate T, Z, and R)","category":"page"},{"location":"faq/#For-problem-implementers","page":"Frequently Asked Questions (FAQ)","title":"For problem implementers","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"transition should be implemented to define the state transition distribution, either explicitly, or, if only samples from the distribution are available, with an ImplicitDistribution.\ngen should only be implemented if your simulator can only output samples of two or more of the next state, observation, and reward at the same time, e.g. if rewards are calculated as a robot moves from the current state to the next state so it is difficult to define the reward function separately from the state transitions.\n@gen should never be implemented or modified by the problem writer; it is only used in simulators and solvers (see below).","category":"page"},{"location":"faq/#For-solver/simulator-implementers","page":"Frequently Asked Questions (FAQ)","title":"For solver/simulator implementers","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"@gen should be called whenever a sample of the next state, observation, and or reward is needed. 
It automatically combines calls to rand, transition, observation, reward, and gen, depending on what is implemented for the problem and the outputs requested by the caller, without any overhead.\ntransition should be called only when you need access to the explicit transition probability distribution.\ngen should never be called directly by a solver or simulator; it is only a tool for implementers (see above).","category":"page"},{"location":"faq/#How-do-I-save-my-policies?","page":"Frequently Asked Questions (FAQ)","title":"How do I save my policies?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"We recommend using JLD2 to save the whole policy object:","category":"page"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"using JLD2\nsave(\"my_policy.jld2\", \"policy\", policy)","category":"page"},{"location":"faq/#Why-is-my-solver-producing-a-suboptimal-policy?","page":"Frequently Asked Questions (FAQ)","title":"Why is my solver producing a suboptimal policy?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"There could be a number of things that are going wrong. If you have a discrete POMDP or MDP and you're using a solver that requires the explicit transition probabilities, the first thing to try is to make sure that your probability masses sum up to unity. We provide some tools in POMDPTools that can check this for you. If you have a POMDP called pomdp, you can run the checks by doing the following:","category":"page"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"using POMDPTools\n@assert has_consistent_distributions(pomdp)","category":"page"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"If this throws an error, you may need to fix your transition or observation functions. ","category":"page"},{"location":"faq/#What-if-I-don't-use-the-rng-argument?","page":"Frequently Asked Questions (FAQ)","title":"What if I don't use the rng argument?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"POMDPs.jl uses Julia's built-in random number generator system to provide for reproducible simulations. To tie into this system, the gen function, the sampling function for the ImplicitDistribution, and the rand function for custom distributions all have an rng argument that should be used to generate random numbers. However, in some cases, for example when wrapping a simulator that is tied to the global random number generator or written in another language, it may be impossible or impractical to use this rng.","category":"page"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"It is natural to wonder if ignoring this rng argument will cause problems. For many use cases, it is OK to ignore this argument - the only consequence will be that simulations will not be exactly reproducible unless the random seed is managed separately. Some algorithms, most notably DESPOT, rely on \"determinized scenarios\" that are implemented with a special rng. 
Some of the guarantees of these algorithms may not be met if the rng argument is ignored.","category":"page"},{"location":"faq/#Why-are-all-the-solvers-in-separate-modules?","page":"Frequently Asked Questions (FAQ)","title":"Why are all the solvers in separate modules?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"We did not put all the solvers and support tools into POMDPs.jl, because we wanted POMDPs.jl to be a lightweight interface package. This has a number of advantages. The first is that if a user only wants to use a few solvers from the JuliaPOMDP organization, they do not have to install all the other solvers and their dependencies. The second advantage is that people who are not directly part of the JuliaPOMDP organization can write their own solvers without going into the source code of other solvers. This makes the framework easier to adopt and to extend.","category":"page"},{"location":"faq/#How-can-I-implement-terminal-actions?","page":"Frequently Asked Questions (FAQ)","title":"How can I implement terminal actions?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"Terminal actions are actions that cause the MDP to terminate without generating a new state. POMDPs.jl handles terminal conditions via the isterminal function on states, and does not directly support terminal actions. If your MDP has a terminal action, you need to implement the model functions accordingly to generate a terminal state. In both generative and explicit cases, you will need some dummy state, say spt, that can be recognized as terminal by the isterminal function. One way to do this is to give spt a state value that is out of bounds (e.g. a vector of NaNs or -1s) and then check for that in isterminal, so that this does not clash with any conventional termination conditions on the state.","category":"page"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"If a terminal action is taken, regardless of current state, the transition function should return a distribution with only one next state, spt, with probability 1.0. In the generative case, the new state generated should be spt. 
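In the explicit case, this might look roughly like the following (a hedged sketch; MyMDP, the :quit action, and the NaN dummy state are illustrative and not part of the interface):

using POMDPs
using POMDPTools # for Deterministic

struct MyMDP <: MDP{Vector{Float64}, Symbol} end

const spt = [NaN, NaN]                    # dummy terminal state with out-of-bounds values

POMDPs.isterminal(m::MyMDP, s) = any(isnan, s)

function POMDPs.transition(m::MyMDP, s, a)
    if a == :quit                         # the terminal action
        return Deterministic(spt)         # all probability mass on the dummy state
    end
    return Deterministic(s)               # placeholder for the normal dynamics
end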
The reward function or the r in generate_sr can be set according to the cost of the terminal action.","category":"page"},{"location":"faq/#Why-are-there-two-versions-of-reward?","page":"Frequently Asked Questions (FAQ)","title":"Why are there two versions of reward?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"Both reward(m, s, a) and reward(m, s, a, sp) are included because of these two facts:","category":"page"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"Some non-native solvers use reward(m, s, a)\nSometimes the reward depends on s and sp.","category":"page"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"It is reasonable to implement both as long as the (s, a) version is the expectation of the (s, a, s') version (see below).","category":"page"},{"location":"faq/#How-do-I-implement-reward(m,-s,-a)-if-the-reward-depends-on-the-next-state?","page":"Frequently Asked Questions (FAQ)","title":"How do I implement reward(m, s, a) if the reward depends on the next state?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"The solvers that require reward(m, s, a) only work on problems with finite state and action spaces. In this case, you can define reward(m, s, a) in terms of reward(m, s, a, sp) with the following code:","category":"page"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"const rdict = Dict{Tuple{S,A}, Float64}()\n\nfor s in states(m)\n for a in actions(m)\n r = 0.0\n td = transition(m, s, a) # transition distribution for s, a\n for sp in support(td)\n r += pdf(td, sp)*reward(m, s, a, sp)\n end\n rdict[(s, a)] = r\n end\nend\n\nPOMDPs.reward(m, s, a) = rdict[(s, a)]","category":"page"},{"location":"faq/#Why-do-I-need-to-put-type-assertions-pomdp::POMDP-into-the-function-signature?","page":"Frequently Asked Questions (FAQ)","title":"Why do I need to put type assertions pomdp::POMDP into the function signature?","text":"","category":"section"},{"location":"faq/","page":"Frequently Asked Questions (FAQ)","title":"Frequently Asked Questions (FAQ)","text":"Specifying the type in your function signature allows Julia to call the appropriate function when your custom type is passed into it. For example if a POMDPs.jl solver calls states on the POMDP that you passed into it, the correct states function will only get dispatched if you specified that the states function you wrote works with your POMDP type. 
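In other words, the method should be written with your concrete problem type in the signature (a minimal sketch; MyPOMDP and its state space are illustrative):

using POMDPs

struct MyPOMDP <: POMDP{Int, Int, Int} end

# The ::MyPOMDP annotation is what allows a solver's call to states(pomdp)
# to dispatch to this method rather than to a generic fallback.
POMDPs.states(pomdp::MyPOMDP) = 1:10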
Because Julia supports multiple dispatch, these type assertions are a way of doing object-oriented programming in Julia.","category":"page"},{"location":"POMDPTools/beliefs/#Implemented-Belief-Updaters","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"","category":"section"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"POMDPTools provides the following generic belief updaters:","category":"page"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"a discrete belief updater\na k previous observation updater\na previous observation updater \na nothing updater (for when the policy does not depend on any feedback)","category":"page"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"For particle filters see ParticleFilters.jl.","category":"page"},{"location":"POMDPTools/beliefs/#Discrete-(Bayesian-Filter)","page":"Implemented Belief Updaters","title":"Discrete (Bayesian Filter)","text":"","category":"section"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"The DiscreteUpdater is a default implementation of a discrete Bayesian filter. The DiscreteBelief type is provided to represent discrete beliefs for discrete state POMDPs. ","category":"page"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"A convenience function uniform_belief is provided to create a DiscreteBelief with equal probability for each state. ","category":"page"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"DiscreteBelief","category":"page"},{"location":"POMDPTools/beliefs/#POMDPTools.BeliefUpdaters.DiscreteBelief","page":"Implemented Belief Updaters","title":"POMDPTools.BeliefUpdaters.DiscreteBelief","text":"DiscreteBelief\n\nA belief specified by a probability vector.\n\nNormalization of b is assumed in some calculations (e.g. 
pdf), but it is only automatically enforced in update(...), and a warning is given if normalized incorrectly in DiscreteBelief(pomdp, b).\n\nConstructor\n\nDiscreteBelief(pomdp, b::Vector{Float64}; check::Bool=true)\n\nFields\n\npomdp : the POMDP problem \nstate_list : a vector of ordered states\nb : the probability vector \n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"DiscreteUpdater","category":"page"},{"location":"POMDPTools/beliefs/#POMDPTools.BeliefUpdaters.DiscreteUpdater","page":"Implemented Belief Updaters","title":"POMDPTools.BeliefUpdaters.DiscreteUpdater","text":"DiscreteUpdater\n\nAn updater type to update discrete belief using the discrete Bayesian filter.\n\nConstructor\n\nDiscreteUpdater(pomdp::POMDP)\n\nFields\n\npomdp <: POMDP\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"uniform_belief(pomdp)","category":"page"},{"location":"POMDPTools/beliefs/#POMDPTools.BeliefUpdaters.uniform_belief-Tuple{Any}","page":"Implemented Belief Updaters","title":"POMDPTools.BeliefUpdaters.uniform_belief","text":" uniform_belief(pomdp)\n\nReturn a DiscreteBelief with equal probability for each state.\n\n\n\n\n\n","category":"method"},{"location":"POMDPTools/beliefs/#K-Previous-Observations","page":"Implemented Belief Updaters","title":"K Previous Observations","text":"","category":"section"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"KMarkovUpdater","category":"page"},{"location":"POMDPTools/beliefs/#POMDPTools.BeliefUpdaters.KMarkovUpdater","page":"Implemented Belief Updaters","title":"POMDPTools.BeliefUpdaters.KMarkovUpdater","text":"KMarkovUpdater\n\nUpdater that stores the k most recent observations as the belief.\n\nExample:\n\nup = KMarkovUpdater(5)\ns0 = rand(rng, initialstate(pomdp))\ninitial_observation = rand(rng, initialobs(pomdp, s0))\ninitial_obs_vec = fill(initial_observation, 5)\nhr = HistoryRecorder(rng=rng, max_steps=100)\nhist = simulate(hr, pomdp, policy, up, initial_obs_vec, s0)\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/beliefs/#Previous-Observation","page":"Implemented Belief Updaters","title":"Previous Observation","text":"","category":"section"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"PreviousObservationUpdater","category":"page"},{"location":"POMDPTools/beliefs/#POMDPTools.BeliefUpdaters.PreviousObservationUpdater","page":"Implemented Belief Updaters","title":"POMDPTools.BeliefUpdaters.PreviousObservationUpdater","text":"Updater that stores the most recent observation as the belief. If an initial distribution is provided, it will pass that as the initial belief.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/beliefs/#Nothing-Updater","page":"Implemented Belief Updaters","title":"Nothing Updater","text":"","category":"section"},{"location":"POMDPTools/beliefs/","page":"Implemented Belief Updaters","title":"Implemented Belief Updaters","text":"NothingUpdater","category":"page"},{"location":"POMDPTools/beliefs/#POMDPTools.BeliefUpdaters.NothingUpdater","page":"Implemented Belief Updaters","title":"POMDPTools.BeliefUpdaters.NothingUpdater","text":"An updater useful for when a belief is not necessary (i.e. for a random policy). 
update always returns nothing.\n\n\n\n\n\n","category":"type"},{"location":"api/#API-Documentation","page":"API Documentation","title":"API Documentation","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"Docstrings for POMDPs.jl interface members can be accessed through Julia's built-in documentation system or in the list below.","category":"page"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"CurrentModule = POMDPs","category":"page"},{"location":"api/#Contents","page":"API Documentation","title":"Contents","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"Pages = [\"api.md\"]","category":"page"},{"location":"api/#Index","page":"API Documentation","title":"Index","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"Pages = [\"api.md\"]","category":"page"},{"location":"api/#Types","page":"API Documentation","title":"Types","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"POMDP\nMDP\nSolver\nPolicy\nUpdater","category":"page"},{"location":"api/#POMDPs.POMDP","page":"API Documentation","title":"POMDPs.POMDP","text":"POMDP{S,A,O}\n\nAbstract base type for a partially observable Markov decision process.\n\nS: state type\nA: action type\nO: observation type\n\n\n\n\n\n","category":"type"},{"location":"api/#POMDPs.MDP","page":"API Documentation","title":"POMDPs.MDP","text":"MDP{S,A}\n\nAbstract base type for a fully observable Markov decision process.\n\nS: state type\nA: action type\n\n\n\n\n\n","category":"type"},{"location":"api/#POMDPs.Solver","page":"API Documentation","title":"POMDPs.Solver","text":"Base type for an MDP/POMDP solver\n\n\n\n\n\n","category":"type"},{"location":"api/#POMDPs.Policy","page":"API Documentation","title":"POMDPs.Policy","text":"Base type for a policy (a map from every possible belief, or more abstract policy state, to an optimal or suboptimal action)\n\n\n\n\n\n","category":"type"},{"location":"api/#POMDPs.Updater","page":"API Documentation","title":"POMDPs.Updater","text":"Abstract type for an object that defines how the belief should be updated\n\nA belief is a general construct that represents the knowledge an agent has about the state of the system. 
This can be a probability distribution, an action observation history or a more general representation.\n\n\n\n\n\n","category":"type"},{"location":"api/#Model-Functions","page":"API Documentation","title":"Model Functions","text":"","category":"section"},{"location":"api/#Dynamics","page":"API Documentation","title":"Dynamics","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"transition\nobservation\nreward\ngen\n@gen","category":"page"},{"location":"api/#POMDPs.transition","page":"API Documentation","title":"POMDPs.transition","text":"transition(m::POMDP, state, action)\ntransition(m::MDP, state, action)\n\nReturn the transition distribution from the current state-action pair.\n\nIf it is difficult to define the probability density or mass function explicitly, consider using POMDPModelTools.ImplicitDistribution to define a generative model.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.observation","page":"API Documentation","title":"POMDPs.observation","text":"observation(m::POMDP, statep)\nobservation(m::POMDP, action, statep)\nobservation(m::POMDP, state, action, statep)\n\nReturn the observation distribution. You need only define the method with the fewest arguments needed to determine the observation distribution.\n\nIf it is difficult to define the probability density or mass function explicitly, consider using POMDPModelTools.ImplicitDistribution to define a generative model.\n\nExample\n\nusing POMDPModelTools # for SparseCat\n\nstruct MyPOMDP <: POMDP{Int, Int, Int} end\n\nobservation(p::MyPOMDP, sp::Int) = SparseCat([sp-1, sp, sp+1], [0.1, 0.8, 0.1])\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.reward","page":"API Documentation","title":"POMDPs.reward","text":"reward(m::POMDP, s, a)\nreward(m::MDP, s, a)\n\nReturn the immediate reward for the s-a pair.\n\nreward(m::POMDP, s, a, sp)\nreward(m::MDP, s, a, sp)\n\nReturn the immediate reward for the s-a-s' triple\n\nreward(m::POMDP, s, a, sp, o)\n\nReturn the immediate reward for the s-a-s'-o quad\n\nFor some problems, it is easier to express reward(m, s, a, sp) or reward(m, s, a, sp, o), than reward(m, s, a), but some solvers, e.g. SARSOP, can only use reward(m, s, a). Both can be implemented for a problem, but when reward(m, s, a) is implemented, it should be consistent with reward(m, s, a, sp[, o]), that is, it should be the expected value over all destination states and observations.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.gen","page":"API Documentation","title":"POMDPs.gen","text":"gen(m::Union{MDP,POMDP}, s, a, rng::AbstractRNG)\n\nFunction for implementing the entire MDP/POMDP generative model by returning a NamedTuple.\n\ngen should only be implemented in the case where two or more of the next state, observation, and reward need to be generated at the same time. If the state transition model can be separated from the reward and observation models, you should implement transition with an ImplicitDistribution instead of gen.\n\nSolver and simulator writers should use the @gen macro to call a generative model.\n\nArguments\n\nm: an MDP or POMDP model\ns: the current state\na: the action\nrng: a random number generator (Typically a MersenneTwister)\n\nReturn\n\nThe function should return a NamedTuple. 
It should contain a subset of the following entries:\n\nMDP\n\nsp: the next state\nr: the reward for the step\ninfo: extra debugging information, typically in an associative container like a NamedTuple\n\nPOMDP\n\nsp: the next state\no: the observation\nr: the reward for the step\ninfo: extra debugging information, typically in an associative container like a NamedTuple\n\nSome elements can be left out. For instance, if o is left out of the return, the problem-writer can also implement observation and POMDPs.jl will automatically use it when needed.\n\nExample\n\nstruct LQRMDP <: MDP{Float64, Float64} end\n\nPOMDPs.gen(m::LQRMDP, s, a, rng) = (sp = s + a + randn(rng), r = -s^2 - a^2)\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.@gen","page":"API Documentation","title":"POMDPs.@gen","text":"@gen(X)(m, s, a)\n@gen(X)(m, s, a, rng::AbstractRNG)\n\nCall the generative model for a (PO)MDP m; sample values from several nodes in the dynamic decision network. X is one or more symbols indicating which nodes to output.\n\nSolvers and simulators should call this rather than the gen function. Problem writers should implement a method of the transition or gen function instead of altering @gen.\n\nArguments\n\nm: an MDP or POMDP model\ns: the current state\na: the action\nrng (optional): a random number generator (Typically a MersenneTwister)\n\nReturn\n\nIf X is a symbol, return a value sampled from the corresponding node. If X is several symbols, return a Tuple of values sampled from the specified nodes.\n\nExamples\n\nLet m be an MDP or POMDP, s be a state of m, a be an action of m, and rng be an AbstractRNG.\n\n@gen(:sp, :r)(m, s, a) returns a Tuple containing the next state and reward.\n@gen(:sp, :o, :r)(m, s, a, rng) returns a Tuple containing the next state, observation, and reward.\n@gen(:sp)(m, s, a, rng) returns the next state.\n\n\n\n\n\n","category":"macro"},{"location":"api/#Static-Properties","page":"API Documentation","title":"Static Properties","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"states\nactions\nobservations\nisterminal\ndiscount\ninitialstate\ninitialobs\nstateindex\nactionindex\nobsindex\nconvert_s\nconvert_a\nconvert_o","category":"page"},{"location":"api/#POMDPs.states","page":"API Documentation","title":"POMDPs.states","text":"states(problem::POMDP)\nstates(problem::MDP)\n\nReturns the complete state space of a POMDP or MDP. \n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.actions","page":"API Documentation","title":"POMDPs.actions","text":"actions(m::Union{MDP,POMDP})\n\nReturns the entire action space of a (PO)MDP.\n\n\n\nactions(m::Union{MDP,POMDP}, s)\n\nReturn the actions that can be taken from state s.\n\n\n\nactions(m::POMDP, b)\n\nReturn the actions that can be taken from belief b.\n\nTo implement an observation-dependent action space, use currentobs(b) to get the observation associated with belief b within the implementation of actions(m, b).\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.observations","page":"API Documentation","title":"POMDPs.observations","text":"observations(problem::POMDP)\n\nReturn the entire observation space.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.isterminal","page":"API Documentation","title":"POMDPs.isterminal","text":"isterminal(m::Union{MDP,POMDP}, s)\n\nCheck if state s is terminal.\n\nIf a state is terminal, no actions will be taken in it and no additional rewards will be accumulated. 
Thus, the value function at such a state is, by definition, zero.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.discount","page":"API Documentation","title":"POMDPs.discount","text":"discount(m::POMDP)\ndiscount(m::MDP)\n\nReturn the discount factor for the problem.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.initialstate","page":"API Documentation","title":"POMDPs.initialstate","text":"initialstate(m::Union{POMDP,MDP})\n\nReturn a distribution of initial states for (PO)MDP m.\n\nIf it is difficult to define the probability density or mass function explicitly, consider using POMDPModelTools.ImplicitDistribution to define a model for sampling.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.initialobs","page":"API Documentation","title":"POMDPs.initialobs","text":"initialobs(m::POMDP, s)\n\nReturn a distribution of initial observations for POMDP m and state s.\n\nIf it is difficult to define the probability density or mass function explicitly, consider using POMDPModelTools.ImplicitDistribution to define a model for sampling.\n\nThis function is only used in cases where the policy expects an initial observation rather than an initial belief, e.g. in a reinforcement learning setting. It is not used in a standard POMDP simulation.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.stateindex","page":"API Documentation","title":"POMDPs.stateindex","text":"stateindex(problem::POMDP, s)\nstateindex(problem::MDP, s)\n\nReturn the integer index of state s. Used for discrete models only.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.actionindex","page":"API Documentation","title":"POMDPs.actionindex","text":"actionindex(problem::POMDP, a)\nactionindex(problem::MDP, a)\n\nReturn the integer index of action a. Used for discrete models only.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.obsindex","page":"API Documentation","title":"POMDPs.obsindex","text":"obsindex(problem::POMDP, o)\n\nReturn the integer index of observation o. 
Used for discrete models only.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.convert_s","page":"API Documentation","title":"POMDPs.convert_s","text":"convert_s(::Type{V}, s, problem::Union{MDP,POMDP}) where V<:AbstractArray\nconvert_s(::Type{S}, vec::V, problem::Union{MDP,POMDP}) where {S,V<:AbstractArray}\n\nConvert a state to vectorized form or vice versa.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.convert_a","page":"API Documentation","title":"POMDPs.convert_a","text":"convert_a(::Type{V}, a, problem::Union{MDP,POMDP}) where V<:AbstractArray\nconvert_a(::Type{A}, vec::V, problem::Union{MDP,POMDP}) where {A,V<:AbstractArray}\n\nConvert an action to vectorized form or vice versa.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.convert_o","page":"API Documentation","title":"POMDPs.convert_o","text":"convert_o(::Type{V}, o, problem::Union{MDP,POMDP}) where V<:AbstractArray\nconvert_o(::Type{O}, vec::V, problem::Union{MDP,POMDP}) where {O,V<:AbstractArray}\n\nConvert an observation to vectorized form or vice versa.\n\n\n\n\n\n","category":"function"},{"location":"api/#Type-Inference","page":"API Documentation","title":"Type Inference","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"statetype\nactiontype\nobstype","category":"page"},{"location":"api/#POMDPs.statetype","page":"API Documentation","title":"POMDPs.statetype","text":"statetype(t::Type)\nstatetype(p::Union{POMDP,MDP})\n\nReturn the state type for a problem type (the S in POMDP{S,A,O}).\n\nstruct A <: POMDP{Int, Bool, Bool} end\n\nstatetype(A) # returns Int\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.actiontype","page":"API Documentation","title":"POMDPs.actiontype","text":"actiontype(t::Type)\nactiontype(p::Union{POMDP,MDP})\n\nReturn the action type for a problem type (the A in POMDP{S,A,O}).\n\nstruct A <: POMDP{Bool, Int, Bool} end\n\nactiontype(A) # returns Int\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.obstype","page":"API Documentation","title":"POMDPs.obstype","text":"obstype(t::Type)\n\nReturn the observation type for a problem type (the O in POMDP{S,A,O}).\n\nstruct A <: POMDP{Bool, Bool, Int} end\n\nobstype(A) # returns Int\n\n\n\n\n\n","category":"function"},{"location":"api/#Distributions-and-Spaces","page":"API Documentation","title":"Distributions and Spaces","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"rand\npdf\nmode\nmean\nsupport","category":"page"},{"location":"api/#Base.rand","page":"API Documentation","title":"Base.rand","text":"rand(rng::AbstractRNG, d::Any)\n\nReturn a random element from distribution or space d.\n\nIf d is a state or transition distribution, the sample will be a state; if d is an action distribution, the sample will be an action; or if d is an observation distribution, the sample will be an observation.\n\n\n\n\n\n","category":"function"},{"location":"api/#Distributions.pdf","page":"API Documentation","title":"Distributions.pdf","text":"pdf(d::Any, x::Any)\n\nEvaluate the probability density of distribution d at sample x.\n\n\n\n\n\n","category":"function"},{"location":"api/#StatsBase.mode","page":"API Documentation","title":"StatsBase.mode","text":"mode(d::Any)\n\nReturn the most likely value in a distribution d.\n\n\n\n\n\n","category":"function"},{"location":"api/#Statistics.mean","page":"API Documentation","title":"Statistics.mean","text":"mean(d::Any)\n\nReturn the mean of a distribution 
d.\n\n\n\n\n\n","category":"function"},{"location":"api/#Distributions.support","page":"API Documentation","title":"Distributions.support","text":"support(d::Any)\n\nReturn an iterable object containing the possible values that can be sampled from distribution d. Values with zero probability may be skipped.\n\n\n\n\n\n","category":"function"},{"location":"api/#Belief-Functions","page":"API Documentation","title":"Belief Functions","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"update\ninitialize_belief\nhistory\ncurrentobs","category":"page"},{"location":"api/#POMDPs.update","page":"API Documentation","title":"POMDPs.update","text":"update(updater::Updater, belief_old, action, observation)\n\nReturn a new instance of an updated belief given belief_old and the latest action and observation.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.initialize_belief","page":"API Documentation","title":"POMDPs.initialize_belief","text":"initialize_belief(updater::Updater,\n state_distribution::Any)\ninitialize_belief(updater::Updater, belief::Any)\n\nReturns a belief that can be updated using updater that has similar distribution to state_distribution or belief.\n\nThe conversion may be lossy. This function is also idempotent, i.e. there is a default implementation that passes the belief through when it is already the correct type: initialize_belief(updater::Updater, belief) = belief\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.history","page":"API Documentation","title":"POMDPs.history","text":"history(b)\n\nReturn the action-observation history associated with belief b.\n\nThe history should be an AbstractVector, Tuple, (or similar object that supports indexing with end) full of NamedTuples with keys :a and :o, i.e. history(b)[end][:a] should be the last action taken leading up to b, and history(b)[end][:o] should be the last observation received.\n\nIt is acceptable to return only part of the history if that is all that is available, but it should always end with the current observation. 
For example, it would be acceptable to return a structure containing only the last three observations in a length 3 Vector{NamedTuple{(:o,),Tuple{O}}.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.currentobs","page":"API Documentation","title":"POMDPs.currentobs","text":"currentobs(b)\n\nReturn the latest observation associated with belief b.\n\nIf a solver or updater implements history(b) for a belief type, currentobs has a default implementation.\n\n\n\n\n\n","category":"function"},{"location":"api/#Policy-and-Solver-Functions","page":"API Documentation","title":"Policy and Solver Functions","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"solve\nupdater\naction\nvalue","category":"page"},{"location":"api/#POMDPs.solve","page":"API Documentation","title":"POMDPs.solve","text":"solve(solver::Solver, problem::POMDP)\n\nSolves the POMDP using method associated with solver, and returns a policy.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.updater","page":"API Documentation","title":"POMDPs.updater","text":"updater(policy::Policy)\n\nReturns a default Updater appropriate for a belief type that policy p can use\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.action","page":"API Documentation","title":"POMDPs.action","text":"action(policy::Policy, x)\n\nReturns the action that the policy deems best for the current state or belief, x.\n\nx is a generalized information state - can be a state in an MDP, a distribution in POMDP, or another specialized policy-dependent representation of the information needed to choose an action.\n\n\n\n\n\n","category":"function"},{"location":"api/#POMDPs.value","page":"API Documentation","title":"POMDPs.value","text":"value(p::Policy, s)\nvalue(p::Policy, s, a)\n\nReturns the utility value from policy p given the state (or belief), or state-action (or belief-action) pair.\n\nThe state-action version is commonly referred to as the Q-value.\n\n\n\n\n\n","category":"function"},{"location":"api/#Simulator","page":"API Documentation","title":"Simulator","text":"","category":"section"},{"location":"api/","page":"API Documentation","title":"API Documentation","text":"Simulator\nsimulate","category":"page"},{"location":"api/#POMDPs.Simulator","page":"API Documentation","title":"POMDPs.Simulator","text":"Base type for an object defining how simulations should be carried out.\n\n\n\n\n\n","category":"type"},{"location":"api/#POMDPs.simulate","page":"API Documentation","title":"POMDPs.simulate","text":"simulate(sim::Simulator, m::POMDP, p::Policy, u::Updater=updater(p), b0=initialstate(m), s0=rand(b0))\nsimulate(sim::Simulator, m::MDP, p::Policy, s0=rand(initialstate(m)))\n\nRun a simulation using the specified policy.\n\nThe return type is flexible and depends on the simulator. Simulations should adhere to the Simulation Standard.\n\n\n\n\n\n","category":"function"},{"location":"run_simulation/#Running-Simulations","page":"Running Simulations","title":"Running Simulations","text":"","category":"section"},{"location":"run_simulation/","page":"Running Simulations","title":"Running Simulations","text":"Running a simulation consists of two steps, creating a simulator and calling the simulate function. 
For example, given a POMDP or MDP model m, and a policy p, one can use the RolloutSimulator from POMDPTools to find the accumulated discounted reward from a single simulated trajectory as follows:","category":"page"},{"location":"run_simulation/","page":"Running Simulations","title":"Running Simulations","text":"sim = RolloutSimulator()\nr = simulate(sim, m, p)","category":"page"},{"location":"run_simulation/","page":"Running Simulations","title":"Running Simulations","text":"More inputs, such as a belief updater, initial state, initial belief, etc. may be specified as arguments to simulate. See the docstring for simulate and the appropriate \"Input\" sections in the Simulation Standard page for more information.","category":"page"},{"location":"run_simulation/","page":"Running Simulations","title":"Running Simulations","text":"More examples can be found in the POMDPExamples package. A variety of simulators that return more information and interact in different ways can be found in POMDPTools.","category":"page"},{"location":"simulation/#Simulation-Standard","page":"Simulation Standard","title":"Simulation Standard","text":"","category":"section"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"Important note: In most cases, users need not implement their own simulators. Several simulators that are compatible with the standard in this document are implemented in POMDPTools and allow interaction from a variety of perspectives. Moreover CommonRLInterface.jl provides an OpenAI Gym style environment interface to interact with environments that is more flexible in some cases.","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"In order to maintain consistency across the POMDPs.jl ecosystem, this page defines a standard for how simulations should be conducted. All simulators should be consistent with this page, and, if solvers are attempting to find an optimal POMDP policy, they should optimize the expected value of r_total below. In particular, this page should be consulted when questions about how less-obvious concepts like terminal states are handled.","category":"page"},{"location":"simulation/#POMDP-Simulation","page":"Simulation Standard","title":"POMDP Simulation","text":"","category":"section"},{"location":"simulation/#Inputs","page":"Simulation Standard","title":"Inputs","text":"","category":"section"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"In general, POMDP simulations take up to 5 inputs (see also the simulate docstring):","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"pomdp::POMDP: pomdp model object (see POMDPs and MDPs)\npolicy::Policy: policy (see Solvers and Policies)\nup::Updater: belief updater (see Beliefs and Updaters)\nb0: initial belief (this may be updater-specific, such as an observation if the updater just returns the previous observation)\ns: initial state","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"The last three of these inputs are optional. 
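When they are provided explicitly, a call takes the form result = simulate(sim, pomdp, policy, up, b0, s) (a hypothetical sketch: sim is any Simulator, up, b0, and s are the updater, initial belief, and initial state listed above, and the return value depends on the simulator). 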
If they are not explicitly provided, they should be inferred using the following POMDPs.jl functions:","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"up = updater(policy)\nb0 = initialstate(pomdp)\ns = rand(initialstate(pomdp))","category":"page"},{"location":"simulation/#Simulation-Loop","page":"Simulation Standard","title":"Simulation Loop","text":"","category":"section"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"The main simulation loop is shown below. Note that the isterminal check prevents any actions from being taken and reward from being collected from a terminal state.","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"Before the loop begins, initialize_belief is called to create the belief based on the initial state distribution - this is especially important when the belief is solver specific, such as the finite-state-machine used by MCVI. ","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"b = initialize_belief(up, b0)\n\nr_total = 0.0\nd = 1.0\nwhile !isterminal(pomdp, s)\n a = action(policy, b)\n s, o, r = @gen(:sp,:o,:r)(pomdp, s, a)\n r_total += d*r\n d *= discount(pomdp)\n b = update(up, b, a, o)\nend","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"In terms of the explicit interface, the @gen macro above expands to the equivalent of:","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":" sp = rand(transition(pomdp, s, a))\n o = rand(observation(pomdp, s, a, sp))\n r = reward(pomdp, s, a, sp, o)\n s = sp","category":"page"},{"location":"simulation/#MDP-Simulation","page":"Simulation Standard","title":"MDP Simulation","text":"","category":"section"},{"location":"simulation/#Inputs-2","page":"Simulation Standard","title":"Inputs","text":"","category":"section"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"In general, MDP simulations take up to 3 inputs (see also the simulate docstring):","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"mdp::MDP: mdp model object (see POMDPs and MDPs)\npolicy::Policy: policy (see Solvers and Policies)\ns: initial state","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"The last of these inputs is optional. If the initial state is not explicitly provided, it should be generated using","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"s = rand(initialstate(mdp))","category":"page"},{"location":"simulation/#Simulation-Loop-2","page":"Simulation Standard","title":"Simulation Loop","text":"","category":"section"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"The main simulation loop is shown below. 
Note again that the isterminal check prevents any actions from being taken and reward from being collected from a terminal state.","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"r_total = 0.0\nd = 1.0\nwhile !isterminal(mdp, s)\n a = action(policy, s)\n s, r = @gen(:sp,:r)(mdp, s, a)\n r_total += d*r\n d *= discount(mdp)\nend","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":"In terms of the explicit interface, the @gen macro above expands to the equivalent of:","category":"page"},{"location":"simulation/","page":"Simulation Standard","title":"Simulation Standard","text":" sp = rand(transition(mdp, s, a))\n r = reward(mdp, s, a, sp)\n s = sp","category":"page"},{"location":"POMDPTools/simulators/#Implemented-Simulators","page":"Implemented Simulators","title":"Implemented Simulators","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"POMDPTools contains a collection of POMDPs.jl simulators.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Usage examples can be found in the simulation tutorial in the POMDPExamples package.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"If you are just getting started, probably the easiest way to begin is with the stepthrough function. Otherwise, consult the Which Simulator Should I Use? guide below:","category":"page"},{"location":"POMDPTools/simulators/#which_simulator","page":"Implemented Simulators","title":"Which Simulator Should I Use?","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The simulators in this package provide interaction with simulations of MDP and POMDP environments from a variety of perspectives. Use these questions to choose the best simulator to suit your needs.","category":"page"},{"location":"POMDPTools/simulators/#I-want-to-run-fast-rollout-simulations-and-get-the-discounted-reward.","page":"Implemented Simulators","title":"I want to run fast rollout simulations and get the discounted reward.","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Use the Rollout Simulator.","category":"page"},{"location":"POMDPTools/simulators/#I-want-to-evaluate-performance-with-many-parallel-Monte-Carlo-simulations.","page":"Implemented Simulators","title":"I want to evaluate performance with many parallel Monte Carlo simulations.","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Use the Parallel Simulator.","category":"page"},{"location":"POMDPTools/simulators/#I-want-to-closely-examine-the-histories-of-states,-actions,-etc.-produced-by-simulations.","page":"Implemented Simulators","title":"I want to closely examine the histories of states, actions, etc. 
produced by simulations.","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Use the History Recorder.","category":"page"},{"location":"POMDPTools/simulators/#I-want-to-step-through-each-individual-step-of-a-simulation.","page":"Implemented Simulators","title":"I want to step through each individual step of a simulation.","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Use the stepthrough function.","category":"page"},{"location":"POMDPTools/simulators/#I-want-to-visualize-a-simulation.","page":"Implemented Simulators","title":"I want to visualize a simulation.","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Use the DisplaySimulator.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Also see the POMDPGifs package for creating gif animations.","category":"page"},{"location":"POMDPTools/simulators/#I-want-to-interact-with-a-MDP-or-POMDP-environment-from-the-policy's-perspective","page":"Implemented Simulators","title":"I want to interact with a MDP or POMDP environment from the policy's perspective","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Use the sim function.","category":"page"},{"location":"POMDPTools/simulators/#Stepping-through","page":"Implemented Simulators","title":"Stepping through","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The stepthrough function exposes a simulation as an iterator so that the steps can be iterated through with a for loop syntax as follows:","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"pomdp = BabyPOMDP()\npolicy = RandomPolicy(pomdp)\n\nfor (s, a, o, r) in stepthrough(pomdp, policy, \"s,a,o,r\", max_steps=10)\n println(\"in state $s\")\n println(\"took action $a\")\n println(\"received observation $o and reward $r\")\nend","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"More examples can be found in the POMDPExamples Package.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"stepthrough","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.stepthrough","page":"Implemented Simulators","title":"POMDPTools.Simulators.stepthrough","text":"stepthrough(problem, policy, [spec])\nstepthrough(problem, policy, [spec], [rng=rng], [max_steps=max_steps])\nstepthrough(mdp::MDP, policy::Policy, [init_state], [spec]; [kwargs...])\nstepthrough(pomdp::POMDP, policy::Policy, [up::Updater, [initial_belief, [initial_state]]], [spec]; [kwargs...])\n\nCreate a simulation iterator. This is intended to be used with for loop syntax to output the results of each step as the simulation is being run. 
\n\nExample:\n\npomdp = BabyPOMDP()\npolicy = RandomPolicy(pomdp)\n\nfor (s, a, o, r) in stepthrough(pomdp, policy, \"s,a,o,r\", max_steps=10)\n println(\"in state $s\")\n println(\"took action $a\")\n println(\"received observation $o and reward $r\")\nend\n\nThe optional spec argument can be a string, tuple of symbols, or single symbol and follows the same pattern as eachstep called on a SimHistory object.\n\nUnder the hood, this function creates a StepSimulator with spec and returns a [PO]MDPSimIterator by calling simulate with all of the arguments except spec. All keyword arguments are passed to the StepSimulator constructor.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The StepSimulator contained in this package can provide the same functionality with the following syntax:","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"sim = StepSimulator(\"s,a,r,sp\")\nfor (s,a,r,sp) in simulate(sim, problem, policy)\n # do something\nend","category":"page"},{"location":"POMDPTools/simulators/#Rollouts","page":"Implemented Simulators","title":"Rollouts","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"RolloutSimulator is the simplest MDP or POMDP simulator. When simulate is called, it simply simulates a single trajectory of the process and returns the discounted reward.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"rs = RolloutSimulator()\nmdp = GridWorld()\npolicy = RandomPolicy(mdp)\n\nr = simulate(rs, mdp, policy)","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"More examples can be found in the POMDPExamples Package","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"RolloutSimulator","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.RolloutSimulator","page":"Implemented Simulators","title":"POMDPTools.Simulators.RolloutSimulator","text":"RolloutSimulator(rng, max_steps)\nRolloutSimulator(; )\n\nA fast simulator that just returns the discounted reward.\n\nThe simulation will be terminated when either\n\na terminal state is reached (as determined by isterminal()) or\nthe discount factor is as small as eps or\nmax_steps have been executed\n\nKeyword arguments:\n\nrng::AbstractRNG (default: Random.default_rng()) - A random number generator to use. \neps::Float64 (default: 0.0) - A small number; if γᵗ where γ is the discount factor and t is the time step becomes smaller than this, the simulation will be terminated.\nmax_steps::Int (default: typemax(Int)) - The maximum number of steps to simulate.\n\nUsage (optional arguments in brackets):\n\nro = RolloutSimulator()\nr = simulate(ro, pomdp, policy, [updater [, init_belief [, init_state]]])\n\nSee also: HistoryRecorder, run_parallel\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/simulators/#History-Recorder","page":"Implemented Simulators","title":"History Recorder","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"A HistoryRecorder runs a simulation and records the trajectory. 
It returns an AbstractVector of NamedTuples - see Histories for more info.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"hr = HistoryRecorder(max_steps=100)\npomdp = TigerPOMDP()\npolicy = RandomPolicy(pomdp)\n\nh = simulate(hr, pomdp, policy)","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"More examples can be found in the POMDPExamples Package.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"HistoryRecorder","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.HistoryRecorder","page":"Implemented Simulators","title":"POMDPTools.Simulators.HistoryRecorder","text":"A simulator that records the history for later examination\n\nThe simulation will be terminated when either\n\na terminal state is reached (as determined by isterminal() or\nthe discount factor is as small as eps or\nmax_steps have been executed\n\nKeyword Arguments: - rng: The random number generator for the simulation - capture_exception::Bool: whether to capture an exception and store it in the history, or let it go uncaught, potentially killing the script - show_progress::Bool: show a progress bar for the simulation - eps - max_steps\n\nUsage (optional arguments in brackets):\n\nhr = HistoryRecorder()\nhistory = simulate(hr, pomdp, policy, [updater [, init_belief [, init_state]]])\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/simulators/#sim-function","page":"Implemented Simulators","title":"sim()","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The sim function provides a convenient way to interact with a POMDP or MDP environment and return a history. The first argument is a function that is called at every time step and takes a state (in the case of an MDP) or an observation (in the case of a POMDP) as the argument and then returns an action. The second argument is a pomdp or mdp. 
It is intended to be used with Julia's do syntax as follows:","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"pomdp = TigerPOMDP()\nhistory = sim(pomdp, max_steps=10) do obs\n println(\"Observation was $obs.\")\n return TIGER_OPEN_LEFT\nend","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"This allows a flexible and general way to interact with a POMDP environment without creating new Policy types.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"In the POMDP case, an updater can optionally be supplied as an additional positional argument if the policy function works with beliefs rather than directly with observations.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"More examples can be found in the POMDPExamples Package","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"More examples can be found in the POMDPExamples Package","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"sim","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.sim","page":"Implemented Simulators","title":"POMDPTools.Simulators.sim","text":"sim(polfunc::Function, mdp::MDP; [])\nsim(polfunc::Function, pomdp::POMDP; [])\n\nAlternative way of running a simulation with a function specifying how to calculate the action at each timestep.\n\nUsage\n\nsim(mdp) do s\n # code that calculates action `a` based on `s` - this is the policy\n # you can also do other things like display something\n return a\nend\n\nfor an MDP or\n\nsim(pomdp) do o\n # code that calculates 'a' based on observation `o`\n # optionally you could save 'o' in a global variable or do a belief update\n return a\nend\n\nor with a POMDP\n\nsim(pomdp, updater) do b\n # code that calculates 'a' based on belief `b`\n # `b` is calculated by `updater`\n return a\nend\n\nfor a POMDP and a belief updater.\n\nKeyword Arguments\n\nAll Versions\n\ninitialstate: the initial state for the simulation\nsimulator: keyword argument to specify any simulator to run the simulation. If nothing is specified for the simulator, a HistoryRecorder will be used as the simulator, with all keyword arguments forwarded to it, e.g.\nsim(mdp, max_steps=100, show_progress=true) do s\n # ...\nend\nwill limit the simulation to 100 steps.\n\nPOMDP version\n\ninitialobs: this will control the initial observation given to the policy function. If this is not defined, rand(initialobs(m, s)) will be used if it is available. 
If it is not, missing will be used.\n\nPOMDP and updater version\n\ninitialbelief: initialize_belief(updater, initialbelief) is the first belief that will be given to the policy function.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/simulators/#Histories","page":"Implemented Simulators","title":"Histories","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The results produced by HistoryRecorders and the sim function are contained in SimHistory objects.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"SimHistory","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.SimHistory","page":"Implemented Simulators","title":"POMDPTools.Simulators.SimHistory","text":"SimHistory\n\nAn (PO)MDP simulation history returned by simulate(::HistoryRecorder, ::Union{MDP,POMDP},...).\n\nThis is an AbstractVector of NamedTuples containing the states, actions, etc.\n\nExamples\n\nhist[1][:s] # returns the first state in the history\n\nhist[:a] # returns all of the actions in the history\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/simulators/#Examples","page":"Implemented Simulators","title":"Examples","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"using POMDPs, POMDPTools, POMDPModels\nhr = HistoryRecorder(max_steps=10)\nhist = simulate(hr, BabyPOMDP(), FunctionPolicy(x->true))\nstep = hist[1] # all information available about the first step\nstep[:s] # the first state\nstep[:a] # the first action","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"To see everything available in a step, use","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"keys(first(hist))","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The entire history of each variable is available by using a Symbol instead of an index, i.e.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"hist[:s]","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"will return a vector of the starting states for each step (note the difference between :s and :sp).","category":"page"},{"location":"POMDPTools/simulators/#eachstep","page":"Implemented Simulators","title":"eachstep","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The eachstep function may also be useful:","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"eachstep","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.eachstep","page":"Implemented Simulators","title":"POMDPTools.Simulators.eachstep","text":"for t in eachstep(hist, [spec])\n ...\nend\n\nIterate through the steps in SimHistory hist. 
spec is a tuple of symbols or string that controls what is returned for each step.\n\nFor example,\n\nfor (s, a, r, sp) in eachstep(h, \"(s, a, r, sp)\") \n println(\"reward $r received when state $sp was reached after action $a was taken in state $s\")\nend\n\nreturns the start state, action, reward, and destination state for each step of the simulation.\n\nAlternatively, instead of expanding the steps implicitly, the elements of the step can be accessed as fields (since each step is a NamedTuple):\n\nfor step in eachstep(h, \"(s, a, r, sp)\") \n println(\"reward $(step.r) received when state $(step.sp) was reached after action $(step.a) was taken in state $(step.s)\")\nend\n\nThe valid elements in the iteration specification are\n\nAny node in the (PO)MDP Dynamic Decision network (by default :s, :a, :sp, :o, :r)\nb - the initial belief in the step (for POMDPs only)\nbp - the belief after being updated based on o (for POMDPs only)\naction_info - info from the policy decision (from action_info)\nupdate_info - info from the belief update (from update_info)\nt - the timestep index\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/simulators/#Examples:","page":"Implemented Simulators","title":"Examples:","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"collect(eachstep(h, \"a,o\"))","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"will produce a vector of action-observation named tuples.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"collect(norm(sp-s) for (s,sp) in eachstep(h, \"s,sp\"))","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"will produce a vector of the distances traveled on each step (assuming the state is a Euclidean vector).","category":"page"},{"location":"POMDPTools/simulators/#Notes","page":"Implemented Simulators","title":"Notes","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The iteration specification can be given as a tuple of symbols (e.g. (:s, :a)) instead of a string.\nFor type stability in performance-critical code, one should construct an iterator directly using HistoryIterator{typeof(h), (:a,:r)}(h) rather than eachstep(h, \"ar\").","category":"page"},{"location":"POMDPTools/simulators/#Other-Functions","page":"Implemented Simulators","title":"Other Functions","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"state_hist(h), action_hist(h), observation_hist(h), belief_hist(h), and reward_hist(h) will return vectors of the states, actions, observations, beliefs, and rewards; undiscounted_reward(h) and discounted_reward(h) will return the total rewards collected over the trajectory. n_steps(h) returns the number of steps in the history. exception(h) and backtrace(h) can be used to retrieve an exception and its backtrace if the simulation failed to finish.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"view(h, range) (e.g. view(h, 1:n_steps(h)-4)) can be used to create a view of the history object h that only contains a certain range of steps. 
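For instance, one hypothetical way to total the reward over the last three steps of a history h would be sum(step.r for step in view(h, n_steps(h)-2:n_steps(h))). 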
The object returned by view is an AbstractSimHistory that can be iterated through and manipulated just like a complete SimHistory.","category":"page"},{"location":"POMDPTools/simulators/#Parallel","page":"Implemented Simulators","title":"Parallel","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"POMDPTools contains a utility for running many Monte Carlo simulations in parallel to evaluate performance. The basic workflow involves the following steps:","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Create a vector of Sim objects, each specifying how a single simulation should be run.\nUse the run_parallel or run function to run the simulations.\nAnalyze the results of the simulations contained in the DataFrame returned by run_parallel.","category":"page"},{"location":"POMDPTools/simulators/#Example","page":"Implemented Simulators","title":"Example","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"An example can be found in the POMDPExamples Package.","category":"page"},{"location":"POMDPTools/simulators/#Sim-objects","page":"Implemented Simulators","title":"Sim objects","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Each simulation should be specified by a Sim object which contains all the information needed to run a simulation, including the Simulator, POMDP or MDP, Policy, Updater, and any other ingredients.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Sim","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.Sim","page":"Implemented Simulators","title":"POMDPTools.Simulators.Sim","text":"Sim(m::MDP, p::Policy[, initialstate]; kwargs...)\nSim(m::POMDP, p::Policy[, updater[, initial_belief[, initialstate]]]; kwargs...)\n\nCreate a Sim object that contains everything needed to run and record a single simulation, including model, initial conditions, and metadata.\n\nA vector of Sim objects can be executed with run or run_parallel.\n\nKeyword Arguments\n\nrng::AbstractRNG=Random.default_rng()\nmax_steps::Int=typemax(Int)\nsimulator::Simulator=HistoryRecorder(rng=rng, max_steps=max_steps)\nmetadata::NamedTuple a named tuple (or dictionary) of metadata for the sim that will be recorded, e.g.(solver_iterations=500,)`.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/simulators/#Running-simulations","page":"Implemented Simulators","title":"Running simulations","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The simulations are actually carried out by the run and run_parallel functions.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"run_parallel","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.run_parallel","page":"Implemented Simulators","title":"POMDPTools.Simulators.run_parallel","text":"run_parallel(queue::Vector{Sim})\nrun_parallel(f::Function, queue::Vector{Sim})\n\nRun Sim objects in queue in parallel and return results as a DataFrame.\n\nBy default, the DataFrame will contain the reward for each simulation and the 
metadata provided to the sim.\n\nArguments\n\nqueue: List of Sim objects to be executed\nf: Function to process the results of each simulation\n\nThis function should take two arguments, (1) the Sim that was executed and (2) the result of the simulation, by default a SimHistory. It should return a named tuple that will appear in the dataframe. See Examples below.\n\nKeyword Arguments\n\nshow_progress::Bool: whether or not to show a progress meter\nprogress::ProgressMeter.Progress: determines how the progress meter is displayed\n\nExamples\n\nrun_parallel(queue) do sim, hist\n return (n_steps=n_steps(hist), reward=discounted_reward(hist))\nend\n\nwill return a dataframe with the number of steps and the reward in it.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The run function is also provided to run simulations in serial (this is often useful for debugging). Note that the documentation below also contains a section for the built-in Julia run function, even though it is not relevant here.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"run","category":"page"},{"location":"POMDPTools/simulators/#Base.run","page":"Implemented Simulators","title":"Base.run","text":"run(queue::Vector{Sim})\nrun(f::Function, queue::Vector{Sim})\n\nRun the Sim objects in queue on a single process and return the results as a dataframe.\n\nSee run_parallel for more information.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/simulators/#Specifying-information-to-be-recorded","page":"Implemented Simulators","title":"Specifying information to be recorded","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"By default, only the discounted rewards from each simulation are recorded, but arbitrary information can be recorded.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The run_parallel and run functions accept a function (normally specified via the do syntax) that takes the Sim object and history of the simulation and extracts relevant statistics as a named tuple. For example, if the desired characteristics are the number of steps in the simulation and the reward, run_parallel would be invoked as follows:","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"df = run_parallel(queue) do sim::Sim, hist::SimHistory\n return (n_steps=n_steps(hist), reward=discounted_reward(hist))\nend","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"These statistics are combined into a DataFrame, with each line representing a single simulation, allowing for statistical analysis. 
For example,","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"mean(df[:reward]./df[:n_steps])","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"would compute the average reward per step with each simulation weighted equally regardless of length.","category":"page"},{"location":"POMDPTools/simulators/#Display","page":"Implemented Simulators","title":"Display","text":"","category":"section"},{"location":"POMDPTools/simulators/#DisplaySimulator","page":"Implemented Simulators","title":"DisplaySimulator","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The DisplaySimulator displays each step of a simulation in real time through a multimedia display such as a Jupyter notebook or ElectronDisplay. Specifically it uses POMDPTools.render and the built-in Julia display function to visualize each step.","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"Example:","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"using POMDPs\nusing POMDPModels\nusing POMDPTools\nusing ElectronDisplay\nElectronDisplay.CONFIG.single_window = true\n\nds = DisplaySimulator()\nm = SimpleGridWorld()\nsimulate(ds, m, RandomPolicy(m))","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"DisplaySimulator","category":"page"},{"location":"POMDPTools/simulators/#POMDPTools.Simulators.DisplaySimulator","page":"Implemented Simulators","title":"POMDPTools.Simulators.DisplaySimulator","text":"DisplaySimulator(;kwargs...)\n\nCreate a simulator that displays each step of a simulation.\n\nGiven a POMDP or MDP model m, this simulator roughly works like\n\nfor step in stepthrough(m, ...)\n display(render(m, step))\nend\n\nKeyword Arguments\n\ndisplay::AbstractDisplay: the display to use for the first argument to the display function. If this is nothing, display(...) will be called without an AbstractDisplay argument.\nrender_kwargs::NamedTuple: keyword arguments for POMDPTools.render(...)\nmax_fps::Number=10: maximum number of frames to be displayed per second - sleep will be used to skip extra time, so this is not designed for high precision\npredisplay::Function: function to call before every call to display(...). 
The only argument to this function will be the display (if it is specified) or nothing.\nextra_initial::Bool=false: if true, display an extra step at the beginning with only elements t, sp, and bp for POMDPs (this can be useful to see the initial state if render displays only sp and not s).\nextra_final::Bool=true: if true, display an extra step at the end with only elements t, done, s, and b for POMDPs (this can be useful to see the final state if render displays only s and not sp).\nmax_steps::Integer: maximum number of steps to run for\nspec::NTuple{Symbol}: specification of what step elements to display (see eachstep)\nrng::AbstractRNG: random number generator\n\nSee the Display-specific tips section below for more tips about using specific displays.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/simulators/#Display-specific-tips","page":"Implemented Simulators","title":"Display-specific tips","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"The following tips may be helpful when using particular displays.","category":"page"},{"location":"POMDPTools/simulators/#Jupyter-notebooks","page":"Implemented Simulators","title":"Jupyter notebooks","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"By default, in a Jupyter notebook, the visualizations of all steps are displayed in the output box one after another. To make the output animated instead, where the image is overwritten at each step, one may use","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"DisplaySimulator(predisplay=(d)->IJulia.clear_output(true))","category":"page"},{"location":"POMDPTools/simulators/#ElectronDisplay","page":"Implemented Simulators","title":"ElectronDisplay","text":"","category":"section"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"By default, ElectronDisplay will open a new window for each new step. To prevent this, use","category":"page"},{"location":"POMDPTools/simulators/","page":"Implemented Simulators","title":"Implemented Simulators","text":"ElectronDisplay.CONFIG.single_window = true","category":"page"},{"location":"POMDPTools/testing/#Testing","page":"Testing","title":"Testing","text":"","category":"section"},{"location":"POMDPTools/testing/","page":"Testing","title":"Testing","text":"POMDPTools contains basic utilities for testing models and solvers.","category":"page"},{"location":"POMDPTools/testing/#Testing-(PO)MDP-Models","page":"Testing","title":"Testing (PO)MDP Models","text":"","category":"section"},{"location":"POMDPTools/testing/","page":"Testing","title":"Testing","text":"has_consistent_distributions\nhas_consistent_initial_distribution\nhas_consistent_transition_distributions\nhas_consistent_observation_distributions","category":"page"},{"location":"POMDPTools/testing/#POMDPTools.Testing.has_consistent_distributions","page":"Testing","title":"POMDPTools.Testing.has_consistent_distributions","text":"has_consistent_distributions(m::MDP; atol=0)\nhas_consistent_distributions(m::POMDP; atol=0)\n\nReturn true if no problems are found in the distributions for a discrete problem. 
Print information and return false if problems are found.\n\nTests whether\n\nAll probabilities are positive\nProbabilities for all distributions sum to 1\nAll items with positive probability are in the support\n\nKeyword Arguments\n\natol: absolute tolerance passed to approx for all probability checks\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/testing/#POMDPTools.Testing.has_consistent_initial_distribution","page":"Testing","title":"POMDPTools.Testing.has_consistent_initial_distribution","text":"has_consistent_initial_distribution(m; atol=0)\n\nReturn true if no problems are found with the initial state distribution for a discrete problem. Print information and return false if problems are found.\n\nSee has_consistent_distributions for information on what checks are performed.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/testing/#POMDPTools.Testing.has_consistent_transition_distributions","page":"Testing","title":"POMDPTools.Testing.has_consistent_transition_distributions","text":"has_consistent_transition_distributions(m; atol=0)\n\nReturn true if no problems are found in the transition distributions for a discrete problem. Print information and return false if problems are found.\n\nSee has_consistent_distributions for information on what checks are performed.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/testing/#POMDPTools.Testing.has_consistent_observation_distributions","page":"Testing","title":"POMDPTools.Testing.has_consistent_observation_distributions","text":"has_consistent_observation_distributions(m; atol=0)\n\nReturn true if no problems are found in the observation distributions for a discrete POMDP. Print information and return false if problems are found.\n\nSee has_consistent_distributions for information on what checks are performed.\n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/testing/#Testing-Solvers","page":"Testing","title":"Testing Solvers","text":"","category":"section"},{"location":"POMDPTools/testing/","page":"Testing","title":"Testing","text":"test_solver","category":"page"},{"location":"POMDPTools/testing/#POMDPTools.Testing.test_solver","page":"Testing","title":"POMDPTools.Testing.test_solver","text":"test_solver(solver::Solver, problem::POMDP)\ntest_solver(solver::Solver, problem::MDP)\n\nUse the solver to solve the specified problem, then run a simulation.\n\nThis is designed to illustrate how solvers are expected to function. All solvers should be able to complete this standard test with the simple models in the POMDPModels package.\n\nNote that this does NOT test the optimality of the solution, but is only a smoke test to see if the solver interacts with POMDP models as expected.\n\nTo run this with a solver called YourSolver, run\n\nusing POMDPToolbox\nusing POMDPModels\n\nsolver = YourSolver(# initialize with parameters #)\ntest_solver(solver, BabyPOMDP())\n\n\n\n\n\n","category":"function"},{"location":"offline_solver/#Example:-Defining-an-offline-solver","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"","category":"section"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"In this example, we will define a simple offline solver that works for both POMDPs and MDPs. In order to focus on the code structure, we will not create an algorithm that finds an optimal policy, but rather a greedy policy, that is, one that optimizes the expected immediate reward. 
For information on using this solver in a simulation, see Running Simulations.","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"We begin by creating a solver type. Since there are no adjustable parameters for the solver, it is an empty type, but for a more complex solver, parameters would usually be included as type fields.","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"using POMDPs\n\nstruct GreedyOfflineSolver <: Solver end","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"Next, we define the functions that will make the solver work for both MDPs and POMDPs.","category":"page"},{"location":"offline_solver/#MDP-Case","page":"Example: Defining an offline solver","title":"MDP Case","text":"","category":"section"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"Finding a greedy policy for an MDP consists of determining the action that has the best reward for each state. First, we create a simple policy object that holds a greedy action for each state.","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"struct DictPolicy{S,A} <: Policy\n actions::Dict{S,A}\nend\n\nPOMDPs.action(p::DictPolicy, s) = p.actions[s]","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"note: Note\nA POMDPTools.VectorPolicy could be used here. We include this example to show how to define a custom policy.","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"The solve function calculates the best greedy action for each state and saves it in a policy. 
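Concretely, for each non-terminal state s it picks an action a maximizing sum(pdf(td, sp)*reward(m, s, a, sp) for sp in support(td)) with td = transition(m, s, a) - an informal restatement of the loop in the code below using the standard pdf, support, transition, and reward functions. 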
To have the widest possible compatibility with POMDP models, we want to use reward(m, s, a, sp) instead of reward(m, s, a), which means we need to calculate the expectation of the reward over transitions to every possible next state.","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"function POMDPs.solve(::GreedyOfflineSolver, m::MDP)\n\n best_actions = Dict{statetype(m), actiontype(m)}()\n\n for s in states(m)\n if !isterminal(m, s)\n best = -Inf\n for a in actions(m)\n td = transition(m, s, a)\n r = 0.0\n for sp in support(td)\n r += pdf(td, sp) * reward(m, s, a, sp)\n end\n if r >= best\n best_actions[s] = a\n best = r\n end\n end\n end\n end\n \n return DictPolicy(best_actions)\nend","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"note: Note\nWe limited this implementation to using basic POMDPs.jl implementation functions, but tools such as POMDPTools.StateActionReward, POMDPTools.ordered_states, and POMDPTools.weighted_iterator could have been used for a more concise and efficient implementation.","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"We can now verify whether the policy produces the greedy action on an example from POMDPModels:","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"using POMDPModels\n\ngw = SimpleGridWorld(size=(2,1), rewards=Dict(GWPos(2,1)=>1.0))\npolicy = solve(GreedyOfflineSolver(), gw)\n\naction(policy, GWPos(1,1))\n\n# output\n\n:right","category":"page"},{"location":"offline_solver/#POMDP-Case","page":"Example: Defining an offline solver","title":"POMDP Case","text":"","category":"section"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"For a POMDP, the greedy solution is the action that maximizes the expected immediate reward according to the belief. Since there are an infinite number of possible beliefs, the greedy solution for every belief cannot be calculated online. 
However, the greedy policy can take the form of an alpha vector policy where each action has an associated alpha vector with each entry corresponding to the immediate reward from taking the action in that state.","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"Again, because a POMDP may have reward(m, s, a, sp, o) instead of reward(m, s, a), we use the former and calculate the expectation over all next states and observations.","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"using POMDPTools: AlphaVectorPolicy\n\nfunction POMDPs.solve(::GreedyOfflineSolver, m::POMDP)\n\n alphas = Vector{Float64}[]\n\n for a in actions(m)\n alpha = zeros(length(states(m)))\n for s in states(m)\n if !isterminal(m, s)\n r = 0.0\n td = transition(m, s, a)\n for sp in support(td)\n tp = pdf(td, sp)\n od = observation(m, s, a, sp)\n for o in support(od)\n r += tp * pdf(od, o) * reward(m, s, a, sp, o)\n end\n end\n alpha[stateindex(m, s)] = r\n end\n end\n push!(alphas, alpha)\n end\n \n return AlphaVectorPolicy(m, alphas, collect(actions(m)))\nend","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"We can now verify that a policy created by the solver determines the correct greedy actions:","category":"page"},{"location":"offline_solver/","page":"Example: Defining an offline solver","title":"Example: Defining an offline solver","text":"using POMDPModels\nusing POMDPTools: Deterministic, Uniform\n\ntiger = TigerPOMDP()\npolicy = solve(GreedyOfflineSolver(), tiger)\n\n@assert action(policy, Deterministic(TIGER_LEFT)) == TIGER_OPEN_RIGHT\n@assert action(policy, Deterministic(TIGER_RIGHT)) == TIGER_OPEN_LEFT\n@assert action(policy, Uniform(states(tiger))) == TIGER_LISTEN","category":"page"},{"location":"def_solver/#Solvers","page":"Solvers","title":"Solvers","text":"","category":"section"},{"location":"def_solver/","page":"Solvers","title":"Solvers","text":"Defining a solver involves creating or using four pieces of code:","category":"page"},{"location":"def_solver/","page":"Solvers","title":"Solvers","text":"A subtype of Solver that holds the parameters and configuration options for the solver.\nA subtype of Policy that holds all of the data needed to choose actions online.\nA method of solve that takes the Solver and a (PO)MDP as arguments, performs all of the offline computations for solving the problem, and returns the policy.\nA method of action that takes in the policy and a state or belief and returns an action.","category":"page"},{"location":"def_solver/","page":"Solvers","title":"Solvers","text":"In many cases, items 2 and 4 can be satisfied with an off-the-shelf Policy from the POMDPTools package. POMDPTools also contains many tools that are useful for defining solvers in a robust, concise, and readable manner.","category":"page"},{"location":"def_solver/#Online-and-Offline-Solvers","page":"Solvers","title":"Online and Offline Solvers","text":"","category":"section"},{"location":"def_solver/","page":"Solvers","title":"Solvers","text":"Generally, solvers can be grouped into two categories: offline solvers that do most of their computational work before interacting with the environment, and online solvers that do their work online as each new state or observation is encountered. 
Although offline and online solvers both use the exact same Solver, solve, Policy, action structure, the work of defining online and offline solvers is focused on different parts of that structure.","category":"page"},{"location":"def_solver/","page":"Solvers","title":"Solvers","text":"For an offline solver, most of the implementation effort will be spent on the solve function, and an off-the-shelf policy from POMDPTools will typically be used.","category":"page"},{"location":"def_solver/","page":"Solvers","title":"Solvers","text":"For an online solver, the solve function typically does little or no work, but merely creates a Policy object that will carry out computation online. It is typical in POMDPs.jl to use the term \"Planner\" to name a Policy object for an online solver that carries out a large amount of computation (\"planning\") at interaction time. In this case, most of the effort will be focused on implementing the action method for the \"Planner\" Policy type.","category":"page"},{"location":"def_solver/#Examples","page":"Solvers","title":"Examples","text":"","category":"section"},{"location":"def_solver/","page":"Solvers","title":"Solvers","text":"Solver implementation is most clearly explained through examples. The following sections contain examples of both online and offline solver definitions:","category":"page"},{"location":"def_solver/","page":"Solvers","title":"Solvers","text":"Pages = [\"offline_solver.md\", \"online_solver.md\"]","category":"page"},{"location":"online_solver/#Example:-Defining-an-online-solver","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"","category":"section"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"In this example, we will define a simple online solver that works for both POMDPs and MDPs. In order to focus on the code structure, we will not create an algorithm that finds an optimal policy, but rather one that produces a greedy policy, that is, a policy that optimizes the expected immediate reward. For information on using this solver in a simulation, see Running Simulations.","category":"page"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"In order to handle the widest range of problems, we will use @gen to generate Monte Carlo samples to estimate the reward even if only a simulator is available. We begin by creating the necessary types and the solve function. 
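The per-action quantity estimated by the planner defined below is the mean of sampled immediate rewards; as a sketch (the helper name estimate_r is illustrative and mirrors the @gen call pattern used in the planner code):

    using POMDPs

    # Monte Carlo estimate of the expected immediate reward of action a in state s,
    # using n generative samples from the model m.
    estimate_r(m, s, a, n) = sum(@gen(:r)(m, s, a) for _ in 1:n) / n

Dividing by n does not change which action maximizes the estimate, so the planner below simply compares the sums.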
The only solver parameter is the number of samples used to estimate the reward at each step, and the solve function does nothing more than create a planner with the appropriate (PO)MDP problem definition.","category":"page"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"using POMDPs\n\nstruct MonteCarloGreedySolver <: Solver\n num_samples::Int\nend\n\nstruct MonteCarloGreedyPlanner{M} <: Policy\n m::M\n num_samples::Int\nend\n\nPOMDPs.solve(sol::MonteCarloGreedySolver, m) = MonteCarloGreedyPlanner(m, sol.num_samples)","category":"page"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"Next, we define the action function where the online work takes place.","category":"page"},{"location":"online_solver/#MDP-Case","page":"Example: Defining an online solver","title":"MDP Case","text":"","category":"section"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"function POMDPs.action(p::MonteCarloGreedyPlanner{<:MDP}, s)\n best_reward = -Inf\n local best_action\n for a in actions(p.m)\n reward_sum = sum(@gen(:r)(p.m, s, a) for _ in 1:p.num_samples)\n if reward_sum >= best_reward\n best_reward = reward_sum\n best_action = a\n end\n end\n return best_action\nend","category":"page"},{"location":"online_solver/#POMDP-Case","page":"Example: Defining an online solver","title":"POMDP Case","text":"","category":"section"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"function POMDPs.action(p::MonteCarloGreedyPlanner{<:POMDP}, b)\n best_reward = -Inf\n local best_action\n for a in actions(p.m)\n s = rand(b)\n reward_sum = sum(@gen(:r)(p.m, s, a) for _ in 1:p.num_samples)\n if reward_sum >= best_reward\n best_reward = reward_sum\n best_action = a\n end\n end\n return best_action\nend\n\n# output\n","category":"page"},{"location":"online_solver/#Verification","page":"Example: Defining an online solver","title":"Verification","text":"","category":"section"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"We can now verify that the online planner works in some simple cases:","category":"page"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"using POMDPModels\n\ngw = SimpleGridWorld(size=(2,1), rewards=Dict(GWPos(2,1)=>1.0))\nsolver = MonteCarloGreedySolver(1000)\nplanner = solve(solver, gw)\n\naction(planner, GWPos(1,1))\n\n# output\n\n:right","category":"page"},{"location":"online_solver/","page":"Example: Defining an online solver","title":"Example: Defining an online solver","text":"using POMDPModels\nusing POMDPTools: Deterministic, Uniform\n\ntiger = TigerPOMDP()\nsolver = MonteCarloGreedySolver(1000)\n\nplanner = solve(solver, tiger)\n\n@assert action(planner, Deterministic(TIGER_LEFT)) == TIGER_OPEN_RIGHT\n@assert action(planner, Deterministic(TIGER_RIGHT)) == TIGER_OPEN_LEFT\n# note action(planner, Uniform(states(tiger))) is not very reliable with this number of samples","category":"page"},{"location":"get_started/#Getting-Started","page":"Getting Started","title":"Getting Started","text":"","category":"section"},{"location":"get_started/","page":"Getting Started","title":"Getting Started","text":"Before writing our own POMDP problems or 
solvers, let's try out some of the solvers and problem models available in JuliaPOMDP.","category":"page"},{"location":"get_started/","page":"Getting Started","title":"Getting Started","text":"Here is a short piece of code that solves the Tiger POMDP using QMDP and evaluates the results. Note that you must have the QMDP, POMDPModels, and POMDPTools modules installed.","category":"page"},{"location":"get_started/","page":"Getting Started","title":"Getting Started","text":"using POMDPs, QMDP, POMDPModels, POMDPTools\n\n# initialize problem and solver\npomdp = TigerPOMDP() # from POMDPModels\nsolver = QMDPSolver() # from QMDP\n\n# compute a policy\npolicy = solve(solver, pomdp)\n\n# evaluate the policy\nbelief_updater = updater(policy) # the default QMDP belief updater (discrete Bayesian filter)\ninit_dist = initialstate(pomdp) # from POMDPModels\nhr = HistoryRecorder(max_steps=100) # from POMDPTools\nhist = simulate(hr, pomdp, policy, belief_updater, init_dist) # run a 100-step simulation\nprintln(\"reward: $(discounted_reward(hist))\")","category":"page"},{"location":"get_started/","page":"Getting Started","title":"Getting Started","text":"The first part of the code loads the desired packages and initializes the problem and the solver. Next, we compute a POMDP policy. Lastly, we evaluate the results.","category":"page"},{"location":"get_started/","page":"Getting Started","title":"Getting Started","text":"There are a few things to mention here. First, the TigerPOMDP type implements all the functions required by QMDPSolver to compute a policy. Second, each policy has a default updater (essentially a filter used to update the belief of the POMDP). To learn more about Updaters, check out the Concepts section.","category":"page"},{"location":"POMDPTools/distributions/#Implemented-Distributions","page":"Implemented Distributions","title":"Implemented Distributions","text":"","category":"section"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"POMDPTools contains several utility distributions to be used in the POMDPs transition and observation functions. 
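Returning to the simulation in the Getting Started example above: a single 100-step run gives a noisy estimate of performance. A sketch of averaging over repeated rollouts with POMDPTools' RolloutSimulator, reusing the pomdp, policy, belief_updater, and init_dist variables and the packages loaded in that example (the rollout count of 100 is arbitrary):

    using Statistics: mean
    using POMDPTools: RolloutSimulator

    rs = RolloutSimulator(max_steps=100)
    # each call to simulate returns the discounted reward of one rollout
    rewards = [simulate(rs, pomdp, policy, belief_updater, init_dist) for _ in 1:100]
    println("mean discounted reward: $(mean(rewards))")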
These implement the appropriate methods of the functions in the distributions interface.","category":"page"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"This package also supplies showdistribution for pretty printing distributions as unicode bar graphs to the terminal.","category":"page"},{"location":"POMDPTools/distributions/#Sparse-Categorical-(SparseCat)","page":"Implemented Distributions","title":"Sparse Categorical (SparseCat)","text":"","category":"section"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"SparseCat is a sparse categorical distribution which is specified by simply providing a list of possible values (states or observations) and the probabilities corresponding to those particular objects.","category":"page"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"Example: SparseCat([1,2,3], [0.1,0.2,0.7]) is a categorical distribution that assigns probability 0.1 to 1, 0.2 to 2, 0.7 to 3, and 0 to all other values.","category":"page"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"SparseCat","category":"page"},{"location":"POMDPTools/distributions/#POMDPTools.POMDPDistributions.SparseCat","page":"Implemented Distributions","title":"POMDPTools.POMDPDistributions.SparseCat","text":"SparseCat(values, probabilities)\n\nCreate a sparse categorical distribution.\n\nvalues is an iterable object containing the possible values (can be of any type) in the distribution that have nonzero probability. probabilities is an iterable object that contains the associated probabilities.\n\nThis is optimized for value iteration with a fast implementation of weighted_iterator. 
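A sketch of SparseCat used inside a transition function; the two-state SimpleMachineMDP model here is hypothetical and only for illustration:

    using POMDPs
    using POMDPTools: SparseCat

    struct SimpleMachineMDP <: MDP{Symbol, Symbol} end

    # from :working, action :wait keeps the machine working with probability 0.9
    function POMDPs.transition(m::SimpleMachineMDP, s::Symbol, a::Symbol)
        if s == :working && a == :wait
            return SparseCat([:working, :broken], [0.9, 0.1])
        else
            return SparseCat([:broken], [1.0])
        end
    end

    d = transition(SimpleMachineMDP(), :working, :wait)
    pdf(d, :broken)   # 0.1
    rand(d)           # :working or :broken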
Both pdf and rand are order n.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/distributions/#Implicit","page":"Implemented Distributions","title":"Implicit","text":"","category":"section"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"In situations where a distribution object is required, but the pdf is difficult to specify and only samples are required, ImplicitDistribution provides a convenient way to package a sampling function.","category":"page"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"ImplicitDistribution","category":"page"},{"location":"POMDPTools/distributions/#POMDPTools.POMDPDistributions.ImplicitDistribution","page":"Implemented Distributions","title":"POMDPTools.POMDPDistributions.ImplicitDistribution","text":"ImplicitDistribution(sample_function, args...)\n\nDefine a distribution that can only be sampled from using rand, but has no explicit pdf.\n\nEach time rand(rng, d::ImplicitDistribution) is called,\n\nsample_function(args..., rng)\n\nwill be called to generate a new sample.\n\nImplicitDistribution is designed to be used with anonymous functions or the do syntax as follows:\n\nExamples\n\nImplicitDistribution(rng->rand(rng)^2)\n\nstruct MyMDP <: MDP{Float64, Int} end\n\nfunction POMDPs.transition(m::MyMDP, s, a)\n ImplicitDistribution(s, a) do s, a, rng\n return s + a + 0.001*randn(rng)\n end\nend\n\ntd = transition(MyMDP(), 1.0, 1)\nrand(td) # will return a number near 2\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/distributions/#Bool-Distribution","page":"Implemented Distributions","title":"Bool Distribution","text":"","category":"section"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"BoolDistribution","category":"page"},{"location":"POMDPTools/distributions/#POMDPTools.POMDPDistributions.BoolDistribution","page":"Implemented Distributions","title":"POMDPTools.POMDPDistributions.BoolDistribution","text":"BoolDistribution(p_true)\n\nCreate a distribution over Boolean values (true or false).\n\np_true is the probability of the true outcome; the probability of false is 1-p_true.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/distributions/#Deterministic","page":"Implemented Distributions","title":"Deterministic","text":"","category":"section"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"Deterministic","category":"page"},{"location":"POMDPTools/distributions/#POMDPTools.POMDPDistributions.Deterministic","page":"Implemented Distributions","title":"POMDPTools.POMDPDistributions.Deterministic","text":"Deterministic(value)\n\nCreate a deterministic distribution over only one value.\n\nThis is intended to be used when a distribution is required, but the outcome is deterministic. 
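Brief usage sketches for the BoolDistribution and Deterministic types described here (the values are illustrative):

    using POMDPs
    using POMDPTools: BoolDistribution, Deterministic

    b = BoolDistribution(0.7)
    pdf(b, true)    # 0.7
    pdf(b, false)   # 0.3

    d = Deterministic(:left)
    rand(d)         # always :left
    pdf(d, :left)   # 1.0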
It is equivalent to a Kronecker Delta distribution.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/distributions/#Uniform","page":"Implemented Distributions","title":"Uniform","text":"","category":"section"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"Uniform\nUnsafeUniform","category":"page"},{"location":"POMDPTools/distributions/#POMDPTools.POMDPDistributions.Uniform","page":"Implemented Distributions","title":"POMDPTools.POMDPDistributions.Uniform","text":"Uniform(collection)\n\nCreate a uniform categorical distribution over a collection of objects.\n\nThe objects in the collection must be unique (this is tested on construction), and will be stored in a Set. To avoid this overhead, use UnsafeUniform.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/distributions/#POMDPTools.POMDPDistributions.UnsafeUniform","page":"Implemented Distributions","title":"POMDPTools.POMDPDistributions.UnsafeUniform","text":"UnsafeUniform(collection)\n\nCreate a uniform categorical distribution over a collection of objects.\n\nNo checks are performed to ensure uniqueness or check whether an object is actually in the set when evaluating the pdf.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/distributions/#Pretty-Printing","page":"Implemented Distributions","title":"Pretty Printing","text":"","category":"section"},{"location":"POMDPTools/distributions/","page":"Implemented Distributions","title":"Implemented Distributions","text":"showdistribution","category":"page"},{"location":"POMDPTools/distributions/#POMDPTools.POMDPDistributions.showdistribution","page":"Implemented Distributions","title":"POMDPTools.POMDPDistributions.showdistribution","text":"showdistribution([io], [mime], d)\n\nShow a UnicodePlots.barplot representation of a distribution.\n\nKeyword Arguments\n\ntitle::String=string(typeof(d))*\" distribution\": title for the barplot. \n\n\n\n\n\n","category":"function"},{"location":"POMDPTools/common_rl/#CommonRLInterface-Integration","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"","category":"section"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"POMDPTools provides two-way integration with the CommonRLInterface.jl package. Using the convert function, one can convert an MDP or POMDP object to a CommonRLInterface environment, or vice-versa.","category":"page"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"For example,","category":"page"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"using POMDPs\nusing POMDPTools\nusing POMDPModels\nusing CommonRLInterface\n\nenv = convert(AbstractEnv, BabyPOMDP())\n\nr = act!(env, true)\nobserve(env)","category":"page"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"converts a Crying Baby POMDP to an RL environment and acts in and observes the environment. 
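A short usage sketch for the Uniform and UnsafeUniform distributions documented above (the collections are illustrative):

    using POMDPs
    using POMDPTools: Uniform, UnsafeUniform

    d = Uniform([:left, :right])   # uniqueness is checked; values are stored in a Set
    pdf(d, :left)                  # 0.5
    rand(d)                        # :left or :right

    u = UnsafeUniform(1:100)       # no uniqueness or membership checks
    pdf(u, 3)                      # 0.01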
This environment (or any other CommonRLInterface environment) can be converted to an MDP or POMDP:","category":"page"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"using BasicPOMCP\n\nm = convert(POMDP, env)\nplanner = solve(POMCPSolver(), m)\na = action(planner, initialstate(m))","category":"page"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"You can also use the constructors listed below to manually convert between the interfaces.","category":"page"},{"location":"POMDPTools/common_rl/#Environment-Wrapper-Types","page":"CommonRLInterface Integration","title":"Environment Wrapper Types","text":"","category":"section"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"Since the standard reinforcement learning environment interface offers less information about the internal workings of the environment than the POMDPs.jl interface, MDPs and POMDPs created from these environments will have limited functionality. There are two kinds of (PO)MDP types that can wrap an environment:","category":"page"},{"location":"POMDPTools/common_rl/#Generative-model-wrappers","page":"CommonRLInterface Integration","title":"Generative model wrappers","text":"","category":"section"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"If the state and setstate! CommonRLInterface functions are provided, then the environment can be wrapped in an RLEnvMDP or RLEnvPOMDP and the POMDPs.jl generative model interface will be available.","category":"page"},{"location":"POMDPTools/common_rl/#Opaque-wrappers","page":"CommonRLInterface Integration","title":"Opaque wrappers","text":"","category":"section"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"If state and setstate! are not provided, then the resulting POMDP or MDP can only be simulated. This case is represented using the OpaqueRLEnvPOMDP and OpaqueRLEnvMDP wrappers. From the POMDPs.jl perspective, the state of the opaque (PO)MDP is just an integer wrapped in an OpaqueRLEnvState. This keeps track of the \"age\" of the environment so that POMDPs.jl actions that attempt to interact with the environment at a different age are invalid.","category":"page"},{"location":"POMDPTools/common_rl/#Constructors","page":"CommonRLInterface Integration","title":"Constructors","text":"","category":"section"},{"location":"POMDPTools/common_rl/#Creating-RL-environments-from-MDPs-and-POMDPs","page":"CommonRLInterface Integration","title":"Creating RL environments from MDPs and POMDPs","text":"","category":"section"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"MDPCommonRLEnv\nPOMDPCommonRLEnv","category":"page"},{"location":"POMDPTools/common_rl/#POMDPTools.CommonRLIntegration.MDPCommonRLEnv","page":"CommonRLInterface Integration","title":"POMDPTools.CommonRLIntegration.MDPCommonRLEnv","text":"MDPCommonRLEnv(m, [s])\nMDPCommonRLEnv{RLO}(m, [s])\n\nCreate a CommonRLInterface environment from MDP m; optionally specify the state 's'.\n\nThe RLO parameter can be used to specify a type to convert the observation to. By default, this is AbstractArray. 
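A sketch of manual construction with MDPCommonRLEnv (as opposed to convert), under the assumptions that MDPCommonRLEnv is accessible from POMDPTools and that act! accepts the MDP's native action type, as in the BabyPOMDP example above; SimpleGridWorld is from POMDPModels:

    using POMDPs, POMDPModels
    using POMDPTools: MDPCommonRLEnv
    using CommonRLInterface

    env = MDPCommonRLEnv(SimpleGridWorld())
    reset!(env)           # initialize the environment state
    o = observe(env)      # observation converted to an AbstractArray by default
    r = act!(env, :right) # step with one of the grid world's actions
    terminated(env)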
Use Any to disable conversion.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/common_rl/#POMDPTools.CommonRLIntegration.POMDPCommonRLEnv","page":"CommonRLInterface Integration","title":"POMDPTools.CommonRLIntegration.POMDPCommonRLEnv","text":"POMDPCommonRLEnv(m, [s], [o])\nPOMDPCommonRLEnv{RLO}(m, [s], [o])\n\nCreate a CommonRLInterface environment from POMDP m; optionally specify the state 's' and observation 'o'.\n\nThe RLO parameter can be used to specify a type to convert the observation to. By default, this is AbstractArray. Use Any to disable conversion.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/common_rl/#Creating-MDPs-and-POMDPs-from-RL-environments","page":"CommonRLInterface Integration","title":"Creating MDPs and POMDPs from RL environments","text":"","category":"section"},{"location":"POMDPTools/common_rl/","page":"CommonRLInterface Integration","title":"CommonRLInterface Integration","text":"RLEnvMDP\nRLEnvPOMDP\nOpaqueRLEnvMDP\nOpaqueRLEnvPOMDP","category":"page"},{"location":"POMDPTools/common_rl/#POMDPTools.CommonRLIntegration.RLEnvMDP","page":"CommonRLInterface Integration","title":"POMDPTools.CommonRLIntegration.RLEnvMDP","text":"RLEnvMDP(env; discount=1.0)\n\nCreate an MDP by wrapping a CommonRLInterface.AbstractEnv. state and setstate! from CommonRLInterface must be provided, and the POMDPs generative model functionality will be provided.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/common_rl/#POMDPTools.CommonRLIntegration.RLEnvPOMDP","page":"CommonRLInterface Integration","title":"POMDPTools.CommonRLIntegration.RLEnvPOMDP","text":"RLEnvPOMDP(env; discount=1.0)\n\nCreate a POMDP by wrapping a CommonRLInterface.AbstractEnv. state and setstate! from CommonRLInterface must be provided, and the POMDPs generative model functionality will be provided.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/common_rl/#POMDPTools.CommonRLIntegration.OpaqueRLEnvMDP","page":"CommonRLInterface Integration","title":"POMDPTools.CommonRLIntegration.OpaqueRLEnvMDP","text":"OpaqueRLEnvMDP(env; discount=1.0)\n\nWrap a CommonRLInterface.AbstractEnv in an MDP object. The state will be an OpaqueRLEnvState and only simulation will be supported.\n\n\n\n\n\n","category":"type"},{"location":"POMDPTools/common_rl/#POMDPTools.CommonRLIntegration.OpaqueRLEnvPOMDP","page":"CommonRLInterface Integration","title":"POMDPTools.CommonRLIntegration.OpaqueRLEnvPOMDP","text":"OpaqueRLEnvPOMDP(env; discount=1.0)\n\nWrap a CommonRLInterface.AbstractEnv in a POMDP object. 
The state will be an OpaqueRLEnvState and only simulation will be supported.\n\n\n\n\n\n","category":"type"},{"location":"#[POMDPs.jl](https://github.com/JuliaPOMDP/POMDPs.jl)","page":"POMDPs.jl","title":"POMDPs.jl","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"A Julia interface for defining, solving and simulating partially observable Markov decision processes and their fully observable counterparts.","category":"page"},{"location":"#Package-and-Ecosystem-Features","page":"POMDPs.jl","title":"Package and Ecosystem Features","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"General interface that can handle problems with discrete and continuous state/action/observation spaces\nA number of popular state-of-the-art solvers implemented for use out-of-the-box\nTools that make it easy to define problems and simulate solutions\nSimple integration of custom solvers into the existing interface","category":"page"},{"location":"#Available-Packages","page":"POMDPs.jl","title":"Available Packages","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"The POMDPs.jl package contains only the interface used for expressing and solving Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). The POMDPTools package acts as a \"standard library\" for the POMDPs.jl interface, providing implementations of commonly-used components such as policies, belief updaters, distributions, and simulators. The list of solver and support packages maintained by the JuliaPOMDP community is available at the POMDPs.jl Readme.","category":"page"},{"location":"#Documentation-Outline","page":"POMDPs.jl","title":"Documentation Outline","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"Documentation comes in three forms:","category":"page"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"An explanatory guide is available in the sections outlined below.\nHow-to examples are available in pages in this document with \"Example\" in the title and in the POMDPExamples package.\nReference docstrings for the entire POMDPs.jl interface are available in the API Documentation section.","category":"page"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"note: Note\nWhen updating these documents, make sure this is synced with docs/make.jl!!","category":"page"},{"location":"#Basics","page":"POMDPs.jl","title":"Basics","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"Pages = [\"install.md\", \"get_started.md\", \"concepts.md\"]","category":"page"},{"location":"#Defining-POMDP-Models","page":"POMDPs.jl","title":"Defining POMDP Models","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"Pages = [ \"def_pomdp.md\", \"interfaces.md\"]\nDepth = 3","category":"page"},{"location":"#Writing-Solvers-and-Updaters","page":"POMDPs.jl","title":"Writing Solvers and Updaters","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"Pages = [ \"def_solver.md\", \"offline_solver.md\", \"online_solver.md\", \"def_updater.md\" ]","category":"page"},{"location":"#Analyzing-Results","page":"POMDPs.jl","title":"Analyzing Results","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"Pages = [ \"simulation.md\", \"run_simulation.md\", \"policy_interaction.md\" 
]","category":"page"},{"location":"#POMDPTools-the-standard-library-for-POMDPs.jl","page":"POMDPs.jl","title":"POMDPTools - the standard library for POMDPs.jl","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"Pages = [\"POMDPTools/index.md\", \"POMDPTools/distributions.md\", \"POMDPTools/model.md\", \"POMDPTools/visualization.md\", \"POMDPTools/beliefs.md\", \"POMDPTools/policies.md\", \"POMDPTools/simulators.md\", \"POMDPTools/common_rl.md\", \"POMDPTools/testing.md\"]","category":"page"},{"location":"#Reference","page":"POMDPs.jl","title":"Reference","text":"","category":"section"},{"location":"","page":"POMDPs.jl","title":"POMDPs.jl","text":"Pages = [\"faq.md\", \"api.md\"]","category":"page"}] } diff --git a/dev/simulation/index.html b/dev/simulation/index.html index 0053a37a..20b66424 100644 --- a/dev/simulation/index.html +++ b/dev/simulation/index.html @@ -21,4 +21,4 @@ d *= discount(mdp) end

In terms of the explicit interface, the @gen macro above expands to the equivalent of:

    sp = rand(transition(pomdp, s, a))
    r = reward(pomdp, s, a, sp)
    s = sp
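For reference, the POMDP form of the same expansion, based on the standard POMDPs.jl generative interface (this is not part of the diff above):

    sp = rand(transition(pomdp, s, a))
    o = rand(observation(pomdp, s, a, sp))
    r = reward(pomdp, s, a, sp, o)
    s = sp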