Issues using MaxUCB criterion #38

Open
JvThunder opened this issue Oct 22, 2023 · 2 comments
@JvThunder

I was trying to use POMCPOWSolver(criterion=MaxUCB(1.0)) for my project, but I got an error.
I then reproduced the error in a very simple environment with simple transitions, using the following code:

from julia.POMDPs import solve, simulate
from julia.POMDPTools import Deterministic, HistoryRecorder, RandomPolicy
from julia.POMCPOW import POMCPOWSolver, MaxUCB
from julia.CommonRLSpaces import Box
from quickpomdps import QuickPOMDP

def transition(state, action):
    return Deterministic([state[0] + 1])

def observation(state, action, next_state):
    return Deterministic(next_state)

def reward(state, action, next_state):
    return 1

def terminal(state):
    return (state[0] >= 2)

pomdp = QuickPOMDP(
    states = Box([0], [3]),
    actions = Box([0], [1]),
    observations = Box([0], [3]),
    discount = 0.9,
    isterminal = terminal,
    transition = transition,
    observation = observation,
    reward = reward,
    initialstate = Deterministic([1])
)

# This configuration works well:
# solver = POMCPOWSolver(max_time = 1, tree_queries = 15)
# This configuration fails with "MethodError: no method matching insert!":
solver = POMCPOWSolver(criterion=MaxUCB(1.0))

policy = solve(solver, pomdp)
hr = HistoryRecorder(max_steps=2)
hist = simulate(hr, pomdp, policy)
rhist = simulate(hr, pomdp, RandomPolicy(pomdp))

it = 0
for step in hist:
    print(f"____step:{it}____")
    print("State: ", step.s)
    print("Action: ", step.a)
    print("Reward: ", step.r)
    print("__________________")
    it += 1

Note that I am using python-jl to run this. I also tried POMCPOWSolver(max_time = 1, tree_queries = 15) and it works fine, so I think the issue might be with MaxUCB. The error I got is:

Traceback (most recent call last):
  File "/home/jvthunder/anaconda/envs/pomdp/lib/python3.8/site-packages/julia/pseudo_python_cli.py", line 308, in main
    python(**vars(ns))
  File "/home/jvthunder/anaconda/envs/pomdp/lib/python3.8/site-packages/julia/pseudo_python_cli.py", line 59, in python
    scope = runpy.run_path(script, run_name="__main__")
  File "/home/jvthunder/anaconda/envs/pomdp/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/jvthunder/anaconda/envs/pomdp/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/jvthunder/anaconda/envs/pomdp/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "test.py", line 60, in <module>
    hist = simulate(hr, pomdp, policy)
RuntimeError: <PyCall.jlwrap (in a Julia function called from Python)
JULIA: MethodError: no method matching insert!(::POMCPOW.CategoricalVector{Tuple{StaticArraysCore.SVector{1, Float64}, Float64}}, ::Tuple{Vector{Int64}, Float64}, ::Float64)

Closest candidates are:
  insert!(!Matched::DataStructures.SortedMultiDict{K, D, Ord}, ::Any, ::Any) where {K, D, Ord<:Base.Order.Ordering}
   @ DataStructures ~/.julia/packages/DataStructures/MKv4P/src/sorted_multi_dict.jl:167
  insert!(::POMCPOW.CategoricalVector{T}, !Matched::T, ::Float64) where T
   @ POMCPOW ~/.julia/packages/POMCPOW/f6XAQ/src/categorical_vector.jl:12
  insert!(!Matched::DataStructures.BalancedTree23{K, D, Ord}, ::Any, ::Any, !Matched::Bool) where {K, D, Ord<:Base.Order.Ordering}
   @ DataStructures ~/.julia/packages/DataStructures/MKv4P/src/balanced_tree.jl:358
  ...

Can you please tell me how to make this work with MaxUCB?

@zsunberg
Member

You might be able to fix the error by adding from julia.StaticArrays import SVector to your imports and then replacing your current transition function with

def transition(state, action):
    return Deterministic(SVector(state[0] + 1))

Let me know if that works and/or if you need more explanation.
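
For a bit more explanation, one way to read the MethodError in the traceback: the container is a CategoricalVector{Tuple{SVector{1, Float64}, Float64}}, but the value passed to insert! was a Tuple{Vector{Int64}, Float64}, presumably because the Python list returned by transition arrives in Julia as a Vector{Int64}. The pure-Python sketch below is only an analogy for that mismatch. TypedStateVector is a hypothetical toy class, not part of POMCPOW, and tuple and list merely stand in for SVector and Vector.

```python
class TypedStateVector:
    """Toy container that accepts (state, reward) pairs with one state type only."""

    def __init__(self, state_type):
        self.state_type = state_type
        self.items = []
        self.weights = []

    def insert(self, item, weight):
        state, _reward = item
        # Julia would fail at dispatch with a MethodError; this toy raises TypeError.
        if type(state) is not self.state_type:
            raise TypeError(
                f"no method matching insert for state type {type(state).__name__}; "
                f"expected {self.state_type.__name__}"
            )
        self.items.append(item)
        self.weights.append(float(weight))


# tuple plays the role of SVector, list the role of Vector (analogy only).
tree_states = TypedStateVector(tuple)

try:
    tree_states.insert(([2], 1.0), 1.0)  # list state: rejected
    list_accepted = True
except TypeError:
    list_accepted = False

tree_states.insert(((2.0,), 1.0), 1.0)  # tuple state: accepted
```

Returning an SVector from transition is the same idea: the state handed back then has the type the container expects.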

@zsunberg
Member

I think that #39 fixes the problem, so you should now be able to use the original code as is.
