-
Notifications
You must be signed in to change notification settings - Fork 7
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #1 from JuliaPOMDP/discrete-explicit
Discrete explicit
- Loading branch information
Showing
9 changed files
with
450 additions
and
107 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,123 @@ | ||
# QuickPOMDPs | ||
|
||
Eventually this will be a repository containing more simplified interfaces for expressing certain classes of POMDPs. The goal is for [POMDPs.jl]( https://github.com/JuliaPOMDP/POMDPs.jl) to act as a low level interface (like [MathProgBase](https://github.com/JuliaOpt/MathProgBase.jl)) and for the interface(s) defined here to act as concise and convenient high-level interface (like [JuMP](https://github.com/JuliaOpt/JuMP.jl) or [Convex](https://github.com/JuliaOpt/Convex.jl)). | ||
|
||
Another package that should be referenced when designing this is [PLite.jl](https://github.com/sisl/PLite.jl/blob/master/docs/README.md). | ||
|
||
Contributions of new interfaces for defining specific classes of problems are welcome! | ||
|
||
For now, there are just a few sketches of interfaces outlined below: | ||
|
||
# Interface Ideas | ||
|
||
## Basic Discrete | ||
|
||
Can represent any problem with discrete actions, observations, and states using the POMDPs.jl explicit interface. This would just be a tight wrapper over the POMDPs.jl interface and would look very similar to a pure POMDPs.jl implementation. Advantages over direct POMDPs.jl are that it's slightly more compact and **you don't have to understand object-oriented programming**. | ||
|
||
The Tiger problem would look like this: | ||
|
||
```julia | ||
pomdp = @discretePOMDP begin | ||
@states [:tiger_l, :tiger_r] | ||
@actions [:open_l, :open_r, :listen] | ||
@observations [:tiger_l, :tiger_r] | ||
|
||
@transition function (s, a) | ||
if a == :listen | ||
return [s]=>[1.0] | ||
else | ||
return [TIGER_L, TIGER_R]=>[0.5, 0.5] # reset | ||
end | ||
end | ||
|
||
@reward Dict((:tiger_l, :open_l) => -100., | ||
(:tiger_r, :open_r) => -100., | ||
(:tiger_l, :open_r) => 10., | ||
(:tiger_r, :open_l) => 10. | ||
) | ||
|
||
@default_reward -1.0 | ||
|
||
@observation function (a, sp) | ||
if a == :listen | ||
if sp == :tiger_l | ||
return [:tiger_l, :tiger_r]=>[0.85, 0.15] | ||
else | ||
return [:tiger_r, :tiger_l]=>[0.85, 0.15] | ||
end | ||
else | ||
return [:tiger_l, :tiger_r]=>[0.5, 0.5] | ||
end | ||
end | ||
|
||
@initial [:tiger_l, :tiger_r]=>[0.5, 0.5] | ||
@discount 0.95 | ||
end | ||
``` | ||
|
||
Note, this could also be done without any macros as a constructor with keyword arguments. Perhaps that would be easier to understand? | ||
|
||
## Generative Function | ||
|
||
Another common problem is one where the dynamics are given by a function. The crying baby problem would look something like this: | ||
|
||
```julia | ||
pomdp = @generativePOMDP begin | ||
@initial rng -> rand(rng) > 0.5 | ||
|
||
@dynamics function (s, a, rng) | ||
if s # hungry | ||
sp = true | ||
else # not hungry | ||
sp = rand(rng) < 0.1 ? true : false | ||
end | ||
if sp # hungry | ||
o = rand(rng) < 0.8 ? true : false | ||
else # not hungry | ||
o = rand(rng) < 0.1 ? true : false | ||
end | ||
r = (s ? -10.0 : 0.0) + (a ? -5.0 : 0.0) | ||
return s, o, r | ||
end | ||
|
||
@discount 0.95 | ||
end | ||
``` | ||
|
||
Again, you could do this without macros, and just use keyword arguments. | ||
|
||
## Named Variables | ||
|
||
It might also be more clear what is going on if we declared variables with names as shown in the example below. | ||
|
||
This would be tougher to compile though, and it's not clear what the easiest way to express distributions or reward would be. | ||
|
||
Ideas welcome! | ||
|
||
```julia | ||
mdp = @MDP begin | ||
xmax = 10 | ||
ymax = 10 | ||
|
||
@states begin | ||
x in 1:10 | ||
y in 1:10 | ||
end | ||
|
||
@actions begin | ||
dir in [:up, :down, :left, :right] | ||
end | ||
|
||
@reward rdict = Dict( | ||
#XXX no idea how to define this in terms of x and y | ||
) | ||
default_reward = 0.0 | ||
|
||
@transition #XXX what is the most concise way to define the transition distribution?? | ||
|
||
terminal = vals(reward) | ||
discount = 0.95 | ||
|
||
initial | ||
end | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
name = "QuickPOMDPs" | ||
uuid = "8af83fb2-a731-493c-9049-9e19dbce6165" | ||
authors = ["Zachary Sunberg <[email protected]>"] | ||
version = "0.1.0" | ||
|
||
[deps] | ||
BeliefUpdaters = "8bb6e9a1-7d73-552c-a44a-e5dc5634aac4" | ||
POMDPModelTools = "08074719-1b2a-587c-a292-00f91cc44415" | ||
POMDPTesting = "92e6a534-49c2-5324-9027-86e3c861ab81" | ||
POMDPs = "a93abf59-7444-517b-a68a-c42f96afdd7d" | ||
|
||
[extras] | ||
POMDPPolicies = "182e52fb-cfd0-5e46-8c26-fd0667c990f4" | ||
POMDPSimulators = "e0d0a172-29c6-5d4e-96d0-f262df5d01fd" | ||
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40" | ||
|
||
[targets] | ||
test = ["Test", "POMDPPolicies", "POMDPSimulators"] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,127 +1,63 @@ | ||
# QuickPOMDPs | ||
|
||
[![Build Status](https://travis-ci.org/zsunberg/QuickPOMDPs.jl.svg?branch=master)](https://travis-ci.org/zsunberg/QuickPOMDPs.jl) | ||
[![Coverage Status](https://coveralls.io/repos/zsunberg/QuickPOMDPs.jl/badge.svg?branch=master&service=github)](https://coveralls.io/github/zsunberg/QuickPOMDPs.jl?branch=master) | ||
[![codecov.io](http://codecov.io/github/zsunberg/QuickPOMDPs.jl/coverage.svg?branch=master)](http://codecov.io/github/zsunberg/QuickPOMDPs.jl?branch=master) | ||
[![Build Status](https://travis-ci.org/JuliaPOMDP/QuickPOMDPs.jl.svg?branch=master)](https://travis-ci.org/JuliaPOMDP/QuickPOMDPs.jl) | ||
[![Coverage Status](https://coveralls.io/repos/JuliaPOMDP/QuickPOMDPs.jl/badge.svg?branch=master&service=github)](https://coveralls.io/github/JuliaPOMDP/QuickPOMDPs.jl?branch=master) | ||
[![codecov.io](http://codecov.io/github/JuliaPOMDP/QuickPOMDPs.jl/coverage.svg?branch=master)](http://codecov.io/github/JuliaPOMDP/QuickPOMDPs.jl?branch=master) | ||
|
||
Eventually this will be a repository containing one or more simplified interfaces for expressing certain classes of POMDPs. The goal is for [POMDPs.jl]( https://github.com/JuliaPOMDP/POMDPs.jl) to act as a low level interface (like [MathProgBase](https://github.com/JuliaOpt/MathProgBase.jl)) and for the interface(s) defined here to act as concise and convenient high-level interface (like [JuMP](https://github.com/JuliaOpt/JuMP.jl) or [Convex](https://github.com/JuliaOpt/Convex.jl)). | ||
Simplified Interface for specifying [POMDPs.jl](https://github.com/JuliaPOMDP/POMDPs.jl) models. | ||
|
||
Another package that should be referenced when designing this is [PLite.jl](https://github.com/sisl/PLite.jl/blob/master/docs/README.md). | ||
For now there is only one interface (Discrete Explicit), but more may be added (see [IDEAS.md](IDEAS.md)). | ||
|
||
Contributions of new interfaces for defining specific classes of problems are welcome! | ||
## Discrete Explicit Interface | ||
|
||
For now, there are just a few sketches of interfaces outlined below: | ||
This interface is designed to match the standard definition of a POMDP in the literature as closely as possible. The standard definition uses the tuple (S,A,O,T,Z,R,γ) for a POMDP and (S,A,T,R,γ) for an MDP, where | ||
|
||
# Interface Ideas | ||
- S, A, and O are the state, action, and observation spaces, | ||
- T and Z are the transition and observation probability distribution functions (pdfs), | ||
- R is the reward function, and | ||
- γ is the discount factor. | ||
|
||
## Basic Discrete | ||
The `DiscreteExplicitPOMDP` and `DiscreteExplicitMDP` types are provided for POMDPs and MDPs with discrete spaces and explicitly defined distributions. They should offer moderately good performance on small to medium-sized problems. | ||
|
||
Can represent any problem with discrete actions, observations, and states using the POMDPs.jl explicit interface. This would just be a tight wrapper over the POMDPs.jl interface and would look very similar to a pure POMDPs.jl implementation. Advantages over direct POMDPs.jl are that it's slightly more compact and **you don't have to understand object-oriented programming**. | ||
### Example | ||
|
||
The Tiger problem would look like this: | ||
The classic tiger POMDP [Kaelbling et al. 98](http://www.sciencedirect.com/science/article/pii/S000437029800023X) can be defined as follows: | ||
|
||
```julia | ||
pomdp = @discretePOMDP begin | ||
@states [:tiger_l, :tiger_r] | ||
@actions [:open_l, :open_r, :listen] | ||
@observations [:tiger_l, :tiger_r] | ||
S = [:left, :right] # S, A, and O may contain any objects | ||
A = [:left, :right, :listen] # including user-defined types | ||
O = [:left, :right] | ||
γ = 0.95 | ||
|
||
@transition function (s, a) | ||
function T(s, a, sp) | ||
if a == :listen | ||
return [s]=>[1.0] | ||
else | ||
return [TIGER_L, TIGER_R]=>[0.5, 0.5] # reset | ||
return s == sp | ||
else # a door is opened | ||
return 0.5 #reset | ||
end | ||
end | ||
|
||
@reward Dict((:tiger_l, :open_l) => -100., | ||
(:tiger_r, :open_r) => -100., | ||
(:tiger_l, :open_r) => 10., | ||
(:tiger_r, :open_l) => 10. | ||
) | ||
|
||
@default_reward -1.0 | ||
|
||
@observation function (a, sp) | ||
function Z(a, sp, o) | ||
if a == :listen | ||
if sp == :tiger_l | ||
return [:tiger_l, :tiger_r]=>[0.85, 0.15] | ||
if o == sp | ||
return 0.85 | ||
else | ||
return [:tiger_r, :tiger_l]=>[0.85, 0.15] | ||
return 0.15 | ||
end | ||
else | ||
return [:tiger_l, :tiger_r]=>[0.5, 0.5] | ||
return 0.5 | ||
end | ||
end | ||
|
||
@initial [:tiger_l, :tiger_r]=>[0.5, 0.5] | ||
@discount 0.95 | ||
end | ||
``` | ||
|
||
Note, this could also be done without any macros as a constructor with keyword arguments. Perhaps that would be easier to understand? | ||
|
||
## Generative Function | ||
|
||
Another common problem is one where the dynamics are given by a function. The crying baby problem would look something like this: | ||
|
||
```julia | ||
pomdp = @generativePOMDP begin | ||
@initial rng -> rand(rng) > 0.5 | ||
|
||
@dynamics function (s, a, rng) | ||
if s # hungry | ||
sp = true | ||
else # not hungry | ||
sp = rand(rng) < 0.1 ? true : false | ||
end | ||
if sp # hungry | ||
o = rand(rng) < 0.8 ? true : false | ||
else # not hungry | ||
o = rand(rng) < 0.1 ? true : false | ||
function R(s, a) | ||
if a == :listen | ||
return -1.0 | ||
elseif s == a # the tiger was found | ||
return -100.0 | ||
else # the tiger was escaped | ||
return 10.0 | ||
end | ||
r = (s ? -10.0 : 0.0) + (a ? -5.0 : 0.0) | ||
return s, o, r | ||
end | ||
|
||
@discount 0.95 | ||
end | ||
``` | ||
|
||
Again, you could do this without macros, and just use keyword arguments. | ||
|
||
## Named Variables | ||
|
||
It might also be more clear what is going on if we declared variables with names as shown in the example below. | ||
|
||
This would be tougher to compile though, and it's not clear what the easiest way to express distributions or reward would be. | ||
|
||
Ideas welcome! | ||
|
||
```julia | ||
mdp = @MDP begin | ||
xmax = 10 | ||
ymax = 10 | ||
|
||
@states begin | ||
x in 1:10 | ||
y in 1:10 | ||
end | ||
|
||
@actions begin | ||
dir in [:up, :down, :left, :right] | ||
end | ||
|
||
@reward rdict = Dict( | ||
#XXX no idea how to define this in terms of x and y | ||
) | ||
default_reward = 0.0 | ||
|
||
@transition #XXX what is the most concise way to define the transition distribution?? | ||
|
||
terminal = vals(reward) | ||
discount = 0.95 | ||
|
||
initial | ||
end | ||
m = DiscreteExplicitPOMDP(S,A,O,T,Z,R,γ) | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,14 @@ | ||
module QuickPOMDPs | ||
|
||
# package code goes here | ||
using POMDPs | ||
using POMDPModelTools | ||
using BeliefUpdaters | ||
using POMDPTesting | ||
|
||
export | ||
DiscreteExplicitPOMDP, | ||
DiscreteExplicitMDP | ||
|
||
include("discrete_explicit.jl") | ||
|
||
end # module |
Oops, something went wrong.