-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
An implementation of CC-POMCP #68
base: main
Are you sure you want to change the base?
Conversation
…r; added example problem for CCPOMCP.
Thanks for the effort. Much better organized than before. Will take a pass soon. One thing - the CIs are failing. Could you fix this first? I just merged this PR that will make CI actions trigger when you update the source branch of this PR. Please update your branch with the latest changes in main. |
I merged the changes from |
Great. Looks like CIs passed except for pre-commit. I will review the code soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the super delay... I was planning to test this out locally and merge if there's no issue. But when I ran python -m pomdp_py -r ccrocksample
, the program failed due to particle deprivation - I think you should have at least one working example of your algorithm. I was curious what advantage or difference CC-POMCP has over POMCP but this is not clear to me. I hope that your example could make that clear. Thanks.
*** Testing CC-POMCP ***
==== Step 1 ====
True state: State((0, 3) | ('good', 'good', 'bad', 'bad', 'good', 'good', 'good', 'good') | False)
Action: check-5
Observation: good
Response: reward=0, cost=1
Response (Cumulative): reward=0, cost=1
Response (Cumulative Discounted): reward=0, cost=1
__num_sims__: 10000
__plan_time__: 12.15742
World:
______ID______
4......>
7....0.>
3..2...>
R......>
..6....>
1......>
.....5.>
_____G/B_____
$......>
$....$.>
x..x...>
R......>
..$....>
$......>
.....$.>
==== Step 2 ====
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/kzheng/repo/pomdp-py/pomdp_py/__main__.py", line 51, in <module>
main()
File "/home/kzheng/repo/pomdp-py/pomdp_py/problems/cc_rocksample/cc_rocksample_problem.py", line 215, in main
total_response, total_discounted_response = test_planner(
File "/home/kzheng/repo/pomdp-py/pomdp_py/problems/cc_rocksample/cc_rocksample_problem.py", line 158, in test_planner
ccpomcp.update(cc_rocksample.agent, action, real_observation)
File "pomdp_py/algorithms/pomcp.pyx", line 96, in pomdp_py.algorithms.pomcp.POMCP.update
cpdef update(self, Agent agent, Action real_action, Observation real_observation,
File "pomdp_py/algorithms/pomcp.pyx", line 120, in pomdp_py.algorithms.pomcp.POMCP.update
agent.set_belief(particle_reinvigoration(tree_belief,
File "pomdp_py/representations/belief/particles.pyx", line 29, in pomdp_py.representations.belief.particles.particle_reinvigoration
raise ValueError("Particle deprivation.")
ValueError: Particle deprivation.
Also, I noticed the added unit test for cvec; That's good. Could you add a test that involves calling CC-POMCP? It's true that such tests aren't there yet even for existing algorithms, but it'd be good for new algorithms to have tests still.
Left other comments, mostly minor and docstring-related. I hope the delay here didn't delay the progress of your work. Thanks again for the contribution!
cdef void vector_copy(double[:] src, double[:] dst) | ||
|
||
|
||
cdef class Vector: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a docstring that describes what this is for?
Option | ||
) | ||
from typing import Optional | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of generalization
? Please add docstring
raise NotImplementedError | ||
|
||
|
||
cdef class ResponseModel: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does "response" differ from "observation" or "reward"? Please clarify. Please include relevant references since this is not common terminology. If it is a wrapper for "reward", why is this wrapper necessary?
Perhaps, it is best to point out the gist of CC-POMCP: Maximize cumulative reward while constraining cumulative cost.
print("Observation: %s" % str(real_observation)) | ||
print("Response: %s" % str(env_response)) | ||
print("Response (Cumulative): %s" % str(total_response)) | ||
print("Response (Cumulative Discounted): %s" % str(total_discounted_response)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To showcase the benefit of CC-POMCP, you should indicate what the cost constraint is, and demonstrate that the constraint isn't crossed.
State, | ||
) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be really nice to have some other domains; Rocksample is really old and kind of detached from reality. If cost-constraint is really useful it should find applications in other more realistic domains.
@@ -295,6 +295,20 @@ cdef class POUCT(Planner): | |||
""" | |||
self._rollout_policy = rollout_policy | |||
|
|||
cpdef QNode _create_qnode( | |||
self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lgtm. Please add docstring though; Looks like it's necessary to create QNode with parameters that differ from default ones e.g. self._num_visits_init
in CC-POMCP. Why?
This pull request implements the cost-constrained POMCP (CC-POMCP) algorithm and its dependencies. The algorithm is in
algorithms/ccpomcp.p*
, while the dependencies are in theframework/generalization.p*
andutils/cvec.p*
files. This pull request also proposes a generic model, called aResponseModel
, and a corresponding output, called aResponse
. The name "response" comes from the notion of independent and dependent variables, where a response (reward, cost, etc.) depends on the interaction with the real or simulated environment. Thus, a response model is a wrapper for more specific models, such as reward and cost models (and any others that will follow in the future). By extension, a response is a wrapper for the reward, cost, etc.The
framework/generalization.p*
files containThe pull request has the following:
test_util_vector_ops.py
), and