function POMDP-VALUE-ITERATION(pomdp, ε) returns a utility function
  inputs: pomdp, a POMDP with states S, actions A(s), transition model P(s′ | s, a),
              sensor model P(e | s), rewards R(s), discount γ
          ε, the maximum error allowed in the utility of any state
  local variables: U, U′, sets of plans p with associated utility vectors αp

  U′ ← a set containing just the empty plan [ ], with α[ ](s) = R(s)
  repeat
      U ← U′
      U′ ← the set of all plans consisting of an action and, for each possible next
          percept, a plan in U with utility vectors computed according to Equation (??)
      U′ ← REMOVE-DOMINATED-PLANS(U′)
  until MAX-DIFFERENCE(U, U′) < ε(1 − γ)/γ
  return U
Figure ?? A high-level sketch of the value iteration algorithm for POMDPs. The REMOVE-DOMINATED-PLANS step and MAX-DIFFERENCE test are typically implemented as linear programs.
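As a concrete illustration, here is a minimal Python sketch of this loop for a small discrete POMDP. Everything beyond the pseudocode is an assumption for illustration: the Pomdp container and its dict/array encoding of the models, the use of scipy's linprog for the dominance test, and a Hausdorff-distance stand-in for MAX-DIFFERENCE (the figure only says both are typically implemented as linear programs). All names are hypothetical, not from any library.

"""Sketch of POMDP value iteration with alpha vectors, assuming a small
discrete POMDP with discount gamma < 1. Illustrative only."""

from dataclasses import dataclass
from itertools import product

import numpy as np
from scipy.optimize import linprog


@dataclass
class Pomdp:
    states: list      # S
    actions: list     # A
    percepts: list    # E
    T: dict           # T[a]: |S| x |S| array, T[a][s, s'] = P(s' | s, a)
    O: dict           # O[e]: length-|S| array, O[e][s'] = P(e | s')
    R: np.ndarray     # R[s], reward per state
    gamma: float      # discount factor, assumed < 1


def pomdp_value_iteration(pomdp, eps):
    """Returns a set of alpha vectors representing the utility function."""
    # U' <- just the empty plan [ ], with alpha_[](s) = R(s)
    U_new = [pomdp.R.copy()]
    while True:
        U = U_new
        # Build every plan [a; e1: p1, e2: p2, ...]: one action plus a
        # subplan from U for each percept, with
        #   alpha_p(s) = R(s) + gamma * sum_s' P(s'|s,a) sum_e P(e|s') alpha_{p.e}(s')
        U_new = []
        for a in pomdp.actions:
            for subplans in product(range(len(U)), repeat=len(pomdp.percepts)):
                alpha = pomdp.R.copy()
                for e, i in zip(pomdp.percepts, subplans):
                    alpha += pomdp.gamma * pomdp.T[a] @ (pomdp.O[e] * U[i])
                U_new.append(alpha)
        U_new = remove_dominated_plans(U_new)
        if max_difference(U, U_new) < eps * (1 - pomdp.gamma) / pomdp.gamma:
            return U_new


def remove_dominated_plans(alphas):
    """Keep only plans that are strictly best at some belief state."""
    # Drop exact duplicates first so ties do not eliminate both copies.
    alphas = list({tuple(a): a for a in alphas}.values())
    kept = []
    for i, alpha in enumerate(alphas):
        others = [a for j, a in enumerate(alphas) if j != i]
        if not others or undominated_somewhere(alpha, others):
            kept.append(alpha)
    return kept


def undominated_somewhere(alpha, others):
    """LP: maximize d s.t. b.alpha >= b.alpha' + d for every other alpha',
    with b a probability distribution; keep alpha iff the optimum d > 0."""
    n = len(alpha)
    # Variables x = [b_1..b_n, d]; linprog minimizes, so the objective is -d.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    A_ub = [np.append(other - alpha, 1.0) for other in others]   # b.(a'-a)+d <= 0
    b_ub = np.zeros(len(others))
    A_eq = [np.append(np.ones(n), 0.0)]                          # sum_s b_s = 1
    b_eq = [1.0]
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.success and -res.fun > 1e-9


def max_difference(U, U_new):
    """Stand-in for the MAX-DIFFERENCE test: symmetric Hausdorff distance
    between the alpha-vector sets in max norm, which upper-bounds the true
    difference between the two piecewise-linear utility functions."""
    def one_way(A, B):
        return max(min(np.max(np.abs(a - b)) for b in B) for a in A)
    return max(one_way(U, U_new), one_way(U_new, U))

Note the blowup this sketch makes explicit: each pass generates |A| · |U|^|E| candidate plans before pruning, which is why REMOVE-DOMINATED-PLANS is essential and why exact POMDP value iteration is practical only for very small problems.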