An implementation of CC-POMCP #68

troiwill · 2024-04-19T05:21:23Z

This pull request implements the cost-constrained POMCP (CC-POMCP) algorithm and its dependencies. The algorithm lives in algorithms/ccpomcp.p*, and its dependencies live in framework/generalization.p* and utils/cvec.p*. The pull request also proposes a generic model, ResponseModel, and its corresponding output, Response. The name "response" comes from the notion of independent and dependent variables: a response (reward, cost, etc.) depends on the interaction with the real or simulated environment. A ResponseModel is therefore a wrapper for more specific models, such as reward and cost models (and any others added in the future). By extension, a Response is a wrapper for the reward, cost, etc.
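To illustrate the wrapper idea, here is a minimal sketch in plain Python. It is not the code in framework/generalization.p*; the class name `SketchResponse`, its fields, and the helper `discounted_total` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchResponse:
    """Hypothetical stand-in for a Response: bundles a reward and a cost."""

    reward: float = 0.0
    cost: float = 0.0

    def __add__(self, other):
        # Responses accumulate component-wise.
        return SketchResponse(self.reward + other.reward, self.cost + other.cost)

    def __mul__(self, scalar):
        # Scaling (e.g. by a discount factor) applies to every component.
        return SketchResponse(self.reward * scalar, self.cost * scalar)


def discounted_total(responses, discount):
    """Sum a sequence of responses, discounting step t by discount**t."""
    total = SketchResponse()
    factor = 1.0
    for response in responses:
        total = total + response * factor
        factor *= discount
    return total
```

With this shape, a planner can accumulate reward and cost together instead of tracking two separate scalars.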

The framework/generalization.p* files contain

The pull request has the following:

Implementation of the CCPOMCP and its dependencies,
Some updates to pre-existing algorithms to reduce copied-and-pasted code,
Some updates to the code for rocksample,
Implementation of the rocksample from the CC-POMCP paper,
Test script for Vector operations (in test_util_vector_ops.py), and
Passes for the test_all.py script.

zkytony · 2024-04-19T12:54:05Z

Thanks for the effort. Much better organized than before. Will take a pass soon. One thing: the CI checks are failing. Could you fix this first? I just merged a PR that makes CI actions trigger whenever you update the source branch of this PR. Please update your branch with the latest changes in main.

troiwill · 2024-05-06T23:23:33Z

I merged the changes from main into ccpomcp. Let me know if I need to update anything.

zkytony · 2024-05-20T10:57:41Z

Great. Looks like the CI checks passed except for pre-commit. I will review the code soon. ~~I should also enable CIs for all branches.~~ It should be enabled after you rebase onto or merge the latest changes in main. @troiwill

zkytony

Apologies for the long delay. I was planning to test this out locally and merge if there were no issues. But when I ran python -m pomdp_py -r ccrocksample, the program failed due to particle deprivation. I think you should have at least one working example of your algorithm. I was also curious what advantage or difference CC-POMCP has over POMCP, but this is not clear to me. I hope your example can make that clear. Thanks.

*** Testing CC-POMCP ***
==== Step 1 ====
True state: State((0, 3) | ('good', 'good', 'bad', 'bad', 'good', 'good', 'good', 'good') | False)
Action: check-5
Observation: good
Response: reward=0, cost=1
Response (Cumulative): reward=0, cost=1
Response (Cumulative Discounted): reward=0, cost=1
__num_sims__: 10000
__plan_time__: 12.15742
World:

______ID______
4......>
7....0.>
3..2...>
R......>
..6....>
1......>
.....5.>
_____G/B_____
$......>
$....$.>
x..x...>
R......>
..$....>
$......>
.....$.>

==== Step 2 ====
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/kzheng/repo/pomdp-py/pomdp_py/__main__.py", line 51, in <module>
    main()
  File "/home/kzheng/repo/pomdp-py/pomdp_py/problems/cc_rocksample/cc_rocksample_problem.py", line 215, in main
    total_response, total_discounted_response = test_planner(
  File "/home/kzheng/repo/pomdp-py/pomdp_py/problems/cc_rocksample/cc_rocksample_problem.py", line 158, in test_planner
    ccpomcp.update(cc_rocksample.agent, action, real_observation)
  File "pomdp_py/algorithms/pomcp.pyx", line 96, in pomdp_py.algorithms.pomcp.POMCP.update
    cpdef update(self, Agent agent, Action real_action, Observation real_observation,
  File "pomdp_py/algorithms/pomcp.pyx", line 120, in pomdp_py.algorithms.pomcp.POMCP.update
    agent.set_belief(particle_reinvigoration(tree_belief,
  File "pomdp_py/representations/belief/particles.pyx", line 29, in pomdp_py.representations.belief.particles.particle_reinvigoration
    raise ValueError("Particle deprivation.")
ValueError: Particle deprivation.
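For context on the failure mode, here is a toy sketch of how a particle belief can end up empty. It is not the pomdp_py implementation; `consistent_particles` and `reinvigorate` are hypothetical helpers, and the only detail taken from the traceback above is the error message.

```python
import random


def consistent_particles(particles, is_consistent):
    """Keep only the particles that agree with the real observation."""
    return [p for p in particles if is_consistent(p)]


def reinvigorate(particles, num_particles, rng=None):
    """Top the particle set back up to num_particles by resampling.

    Resampling needs at least one surviving particle, so an empty
    set cannot be repaired and is reported as particle deprivation.
    """
    if len(particles) == 0:
        raise ValueError("Particle deprivation.")
    rng = rng or random.Random(0)
    missing = max(0, num_particles - len(particles))
    return list(particles) + [rng.choice(particles) for _ in range(missing)]
```

If no simulated particle matches the real observation, the filtered set is empty and the update fails, which is the kind of situation the example problem would need to avoid or handle.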

Also, I noticed the added unit test for cvec; that's good. Could you add a test that calls CC-POMCP? It's true that such tests don't exist yet even for the existing algorithms, but new algorithms should still come with tests.

Left other comments, mostly minor and docstring-related. I hope the delay here didn't delay the progress of your work. Thanks again for the contribution!

zkytony · 2024-10-10T15:35:50Z

pomdp_py/utils/cvec.pxd

+cdef void vector_copy(double[:] src, double[:] dst)
+
+
+cdef class Vector:


Could you add a docstring that describes what this is for?

zkytony · 2024-10-10T15:48:51Z

pomdp_py/framework/generalization.pyx

+    Option
+)
+from typing import Optional
+


What is the purpose of generalization? Please add a docstring.

zkytony · 2024-10-10T15:50:02Z

pomdp_py/framework/generalization.pyx

+        raise NotImplementedError
+
+
+cdef class ResponseModel:


How does "response" differ from "observation" or "reward"? Please clarify. Please include relevant references since this is not common terminology. If it is a wrapper for "reward", why is this wrapper necessary?

Perhaps, it is best to point out the gist of CC-POMCP: Maximize cumulative reward while constraining cumulative cost.
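Written out, that gist is the usual constrained-POMDP objective. The notation below is an assumption for illustration and is not taken from this PR: $\pi$ is the policy, $b_0$ the initial belief, $\gamma$ the discount factor, $R$ the reward function, $C_k$ the $k$-th cost function, and $\hat{c}_k$ its budget.

```latex
\begin{aligned}
\max_{\pi}\quad & \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \,\middle|\, b_0\right] \\
\text{s.t.}\quad & \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} C_k(s_t, a_t) \,\middle|\, b_0\right] \le \hat{c}_k,
\qquad k = 1, \dots, K
\end{aligned}
```

A docstring that states the objective in this form would also make clear why a single scalar reward is not enough here.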

zkytony · 2024-10-10T16:03:04Z

pomdp_py/problems/cc_rocksample/cc_rocksample_problem.py

+        print("Observation: %s" % str(real_observation))
+        print("Response: %s" % str(env_response))
+        print("Response (Cumulative): %s" % str(total_response))
+        print("Response (Cumulative Discounted): %s" % str(total_discounted_response))


To showcase the benefit of CC-POMCP, you should indicate what the cost constraint is, and demonstrate that the constraint isn't crossed.
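Something along these lines could back that up in the example output. This is only a sketch: `discounted_cost`, `mean_discounted_cost`, and `cost_limit` are hypothetical names, not part of this PR. It assumes the constraint is on expected discounted cost, so it averages over several runs rather than checking a single one.

```python
def discounted_cost(costs, discount):
    """Discounted sum of the per-step costs from one run."""
    total = 0.0
    factor = 1.0
    for cost in costs:
        total += factor * cost
        factor *= discount
    return total


def mean_discounted_cost(runs, discount):
    """Average discounted cost over several independent runs."""
    return sum(discounted_cost(costs, discount) for costs in runs) / len(runs)


def within_limit(runs, discount, cost_limit):
    """True if the empirical mean discounted cost respects the limit."""
    return mean_discounted_cost(runs, discount) <= cost_limit
```

Printing the limit next to the cumulative discounted cost at each step, and the mean over runs at the end, would show whether the constraint holds.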

zkytony · 2024-10-10T16:04:00Z

pomdp_py/problems/cc_rocksample/cc_rocksample_problem.py

+    State,
+)
+
+


It would be really nice to have some other domains; RockSample is really old and somewhat detached from reality. If the cost constraint is really useful, it should find applications in other, more realistic domains.

zkytony · 2024-10-10T16:07:20Z

pomdp_py/algorithms/po_uct.pyx

@@ -295,6 +295,20 @@ cdef class POUCT(Planner):
        """
        self._rollout_policy = rollout_policy

+    cpdef QNode _create_qnode(
+        self,


This LGTM. Please add a docstring, though. It looks like CC-POMCP needs to create QNodes with parameters that differ from the default ones (e.g., self._num_visits_init). Why?

troiwill added 30 commits March 27, 2024 17:31

Intial commit for Vector, GenericResponse, RewardCost, and an initial test file.

034cd45

Updated ignore file.

5b3321a

Added CCPOMCP algorithm and dependencies; added test script for Vector; added example problem for CCPOMCP.

9223f94

Fixed error.

0571618

Updated code to improve speed.

954c404

Removed complex way of handling null responses.

a6610eb

Implemented NumPy vectors and reduced Python references.

6dbcbf7

Updated and added tests.

62fa04b

Added example problem for rocksample for CCPOMCP.

bd26db7

Added profiling for cython.

2e68bfb

Limited nsteps for profiling.

555bb68

Limited nsteps for profiling.

c759d42

Added code for profiling.

e899754

Removed except * from c functions.

5ddaaae

Minor additions.

0b88307

Added profiling.

dd4705f

Added profiling.

b946db0

Minor changes.

f166ad8

Added the comments to function calls.

fa8dac1

Removed except * from function names.

19e779a

Added _create_qnode function to reduce code.

a7e666c

Minor update.

1a99ff7

Removed unneeded test.

57c0568

Added code comments.

ce008e8

Removed profiling.

915b4c0

Removed profiling code.

900b6a8

Merged ccpomcp-fast-greedy.

e06d4ec

Changed nsteps to 100.

cc2e218

Corrected the description for the Response class.

cad3e92

Removed print statement used for debugging.

835449d

zkytony self-requested a review April 19, 2024 12:35

Merge remote-tracking branch 'upstream/main' into ccpomcp-fix-ci

f90d0d7

Fixed issue with missing numpy dependency during pip install.

cf33420

zkytony requested changes Oct 10, 2024
