function POMDP-VALUE-ITERATION(pomdp, ε) returns a utility function
  inputs: pomdp, a POMDP with states S, actions A(s), transition model P(s′ | s, a),
              sensor model P(e | s), rewards R(s), discount γ
          ε, the maximum error allowed in the utility of any state
  local variables: U, U′, sets of plans p with associated utility vectors αp

  U′ ← a set containing just the empty plan [ ], with α[ ](s) = R(s)
  repeat
      U ← U′
      U′ ← the set of all plans consisting of an action and, for each possible next
          percept, a plan in U with utility vectors computed according to Equation (??)
      U′ ← REMOVE-DOMINATED-PLANS(U′)
  until MAX-DIFFERENCE(U, U′) < ε(1 − γ)/γ
  return U
Figure ?? A high-level sketch of the value iteration algorithm for POMDPs. The REMOVE-DOMINATED-PLANS step and MAX-DIFFERENCE test are typically implemented as linear programs.
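As a concrete illustration, here is a minimal Python sketch of this loop for a small discrete POMDP. Everything beyond the pseudocode is an assumption for illustration: the Pomdp container and its dict/array encoding of the models, the use of scipy's linprog for the dominance test, and a Hausdorff-distance stand-in for MAX-DIFFERENCE (the figure only says both are typically implemented as linear programs). All names are hypothetical, not from any library.

"""Sketch of POMDP value iteration with alpha vectors, assuming a small
discrete POMDP with discount gamma < 1. Illustrative only."""

from dataclasses import dataclass
from itertools import product

import numpy as np
from scipy.optimize import linprog


@dataclass
class Pomdp:
    states: list      # S
    actions: list     # A
    percepts: list    # E
    T: dict           # T[a]: |S| x |S| array, T[a][s, s'] = P(s' | s, a)
    O: dict           # O[e]: length-|S| array, O[e][s'] = P(e | s')
    R: np.ndarray     # R[s], reward per state
    gamma: float      # discount factor, assumed < 1


def pomdp_value_iteration(pomdp, eps):
    """Returns a set of alpha vectors representing the utility function."""
    # U' <- just the empty plan [ ], with alpha_[](s) = R(s)
    U_new = [pomdp.R.copy()]
    while True:
        U = U_new
        # Build every plan [a; e1: p1, e2: p2, ...]: one action plus a
        # subplan from U for each percept, with
        #   alpha_p(s) = R(s) + gamma * sum_s' P(s'|s,a) sum_e P(e|s') alpha_{p.e}(s')
        U_new = []
        for a in pomdp.actions:
            for subplans in product(range(len(U)), repeat=len(pomdp.percepts)):
                alpha = pomdp.R.copy()
                for e, i in zip(pomdp.percepts, subplans):
                    alpha += pomdp.gamma * pomdp.T[a] @ (pomdp.O[e] * U[i])
                U_new.append(alpha)
        U_new = remove_dominated_plans(U_new)
        if max_difference(U, U_new) < eps * (1 - pomdp.gamma) / pomdp.gamma:
            return U_new


def remove_dominated_plans(alphas):
    """Keep only plans that are strictly best at some belief state."""
    # Drop exact duplicates first so ties do not eliminate both copies.
    alphas = list({tuple(a): a for a in alphas}.values())
    kept = []
    for i, alpha in enumerate(alphas):
        others = [a for j, a in enumerate(alphas) if j != i]
        if not others or undominated_somewhere(alpha, others):
            kept.append(alpha)
    return kept


def undominated_somewhere(alpha, others):
    """LP: maximize d s.t. b.alpha >= b.alpha' + d for every other alpha',
    with b a probability distribution; keep alpha iff the optimum d > 0."""
    n = len(alpha)
    # Variables x = [b_1..b_n, d]; linprog minimizes, so the objective is -d.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    A_ub = [np.append(other - alpha, 1.0) for other in others]   # b.(a'-a)+d <= 0
    b_ub = np.zeros(len(others))
    A_eq = [np.append(np.ones(n), 0.0)]                          # sum_s b_s = 1
    b_eq = [1.0]
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.success and -res.fun > 1e-9


def max_difference(U, U_new):
    """Stand-in for the MAX-DIFFERENCE test: symmetric Hausdorff distance
    between the alpha-vector sets in max norm, which upper-bounds the true
    difference between the two piecewise-linear utility functions."""
    def one_way(A, B):
        return max(min(np.max(np.abs(a - b)) for b in B) for a in A)
    return max(one_way(U, U_new), one_way(U_new, U))

Note the blowup this sketch makes explicit: each pass generates |A| · |U|^|E| candidate plans before pruning, which is why REMOVE-DOMINATED-PLANS is essential and why exact POMDP value iteration is practical only for very small problems.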