Prune Changes #19
base: main
Conversation
Thanks for the PR! You're entirely correct: the current intersection distance implementation is wrong. However, I think the significant increase in allocations here needs to be addressed.
This PR is absolutely necessary. I do, however, think it needs to be a bit more performant before a merge.
Great work
@@ -5,6 +5,7 @@ end

function prune!(solver::SARSOPSolver, tree::SARSOPTree)
    prune!(tree)
    prune_strictly_dominated!(tree::SARSOPTree)
Pruning strictly dominated alpha vecs at every iteration seems to take up a lot of time. Have you found performance to be worse when lumping it together with the conditional block containing prune_alpha!?
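For context, here is a toy sketch of the two placements being compared in this thread. The types and helper stubs are stand-ins assumed from the discussion (the guard name should_prune_alphas and the prune_alpha! call appear in this PR, but the bodies below are illustrative, not the package's actual code):

```julia
# Toy stand-ins: not the package's real solver, tree, or pruning logic.
struct ToySolver
    delta::Float64
end

mutable struct ToyTree
    strict_prunes::Int   # counts prune_strictly_dominated! calls
    delta_prunes::Int    # counts prune_alpha! calls
    n_alphas::Int
end

prune_beliefs!(t::ToyTree) = t                             # stand-in for prune!(tree)
prune_strictly_dominated!(t::ToyTree) = (t.strict_prunes += 1; t)
should_prune_alphas(t::ToyTree) = t.n_alphas > 2           # assumed guard condition
prune_alpha!(t::ToyTree, delta) = (t.delta_prunes += 1; t)

# Option A (this PR): strict pruning at every iteration.
function prune_every!(solver::ToySolver, t::ToyTree)
    prune_beliefs!(t)
    prune_strictly_dominated!(t)
    should_prune_alphas(t) && prune_alpha!(t, solver.delta)
    return t
end

# Option B (suggested here): strict pruning only inside the guard.
function prune_guarded!(solver::ToySolver, t::ToyTree)
    prune_beliefs!(t)
    if should_prune_alphas(t)
        prune_strictly_dominated!(t)
        prune_alpha!(t, solver.delta)
    end
    return t
end
```

Option B would skip the strict-domination pass on most iterations, trading pruning frequency for per-iteration cost; the benchmarks below compare the two.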
I do remember considering this, but I don't remember the details when I compared the two. I can run some comparisons and report back. (timeline TBD)
TL;DR: I'd recommend keeping it as pruning at every iteration.

Just ran the comparison for delta = 1e-4. The allocations are a bit higher when running at every iteration, but the overall process is faster. The difference between these benchmarks and the original ones posted at the submission of the PR is due to the suggested changes (which help out quite a bit!).
BabyPOMDP
Settings:
- epsilon: 0.1
- precision: 0.001
- delta: 0.0001
- max_steps: 50 (for benchmarking)
- max_time: 5.0 (for policy run)
Benchmark
At Every Iteration
BenchmarkTools.Trial: 3262 samples with 1 evaluation.
Range (min … max): 1.384 ms … 69.646 ms ┊ GC (min … max): 0.00% … 97.80%
Time (median): 1.423 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.531 ms ± 1.607 ms ┊ GC (mean ± σ): 3.23% ± 6.40%
█▆▅▄▂ ▁
██████▆▆▅▅▅▅▃▁▃▁▁▃▁▁▁▁▁▁▃▄▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃ █
1.38 ms Histogram: log(frequency) by time 3.84 ms <
Memory estimate: 662.32 KiB, allocs estimate: 8721.
Only when should_prune_alphas
BenchmarkTools.Trial: 3284 samples with 1 evaluation.
Range (min … max): 1.420 ms … 68.969 ms ┊ GC (min … max): 0.00% … 97.74%
Time (median): 1.456 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.521 ms ± 1.236 ms ┊ GC (mean ± σ): 3.64% ± 6.95%
▁▇█▄▂ ▁
▂▅██████▇████▆▆▆▄▄▄▃▄▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▂▂ ▃
1.42 ms Histogram: frequency by time 1.67 ms <
Memory estimate: 771.65 KiB, allocs estimate: 9099.
Policy Run
At Every Iteration
--------------------------------------------------------------------------------------
Time Iter LB UB Precision # Alphas # Beliefs
--------------------------------------------------------------------------------------
0.00 0 -28.3501795 -15.6037314 12.7464480985 2 30
0.00 10 -16.3057342 -16.2819897 0.0237444537 2 98
0.00 20 -16.3054833 -16.3024060 0.0030772894 2 133
--------------------------------------------------------------------------------------
0.00 28 -16.3054833 -16.3045949 0.0008883805 2 134
--------------------------------------------------------------------------------------
Only when should_prune_alphas
--------------------------------------------------------------------------------------
Time Iter LB UB Precision # Alphas # Beliefs
--------------------------------------------------------------------------------------
0.00 0 -28.3501795 -15.6037314 12.7464480985 2 30
0.00 10 -16.3057342 -16.2819897 0.0237444537 2 98
0.00 20 -16.3054833 -16.3024060 0.0030772894 2 133
--------------------------------------------------------------------------------------
0.00 28 -16.3054833 -16.3045949 0.0008883805 2 134
--------------------------------------------------------------------------------------
TigerPOMDP
Settings:
- epsilon: 0.1
- precision: 0.001
- delta: 0.0001
- max_steps: 50 (for benchmarking)
- max_time: 5.0 (for policy run)
Benchmark
At Every Iteration
BenchmarkTools.Trial: 188 samples with 1 evaluation.
Range (min … max): 25.590 ms … 94.617 ms ┊ GC (min … max): 0.00% … 71.50%
Time (median): 26.162 ms ┊ GC (median): 0.00%
Time (mean ± σ): 26.683 ms ± 5.040 ms ┊ GC (mean ± σ): 1.70% ± 5.60%
▁ ▂▅▆██▅▁
▆█▇███████▁▅█▆▅▆▅▁▅▅▁▁▁▅▁▁▁▅▁▁▅▁▁▁▅▅▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▅▁▁▁▅ ▅
25.6 ms Histogram: log(frequency) by time 30.5 ms <
Memory estimate: 4.02 MiB, allocs estimate: 39898.
Only when should_prune_alphas
BenchmarkTools.Trial: 184 samples with 1 evaluation.
Range (min … max): 26.127 ms … 91.703 ms ┊ GC (min … max): 0.00% … 71.08%
Time (median): 26.702 ms ┊ GC (median): 0.00%
Time (mean ± σ): 27.259 ms ± 4.871 ms ┊ GC (mean ± σ): 1.89% ± 5.92%
▄▇█▆
▆▄▆██████▆▇▇▁▆▄█▁▁▆▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆▁▁▁▁▄▄ ▄
26.1 ms Histogram: log(frequency) by time 31.5 ms <
Memory estimate: 4.81 MiB, allocs estimate: 40544.
Policy Run
At Every Iteration
--------------------------------------------------------------------------------------
Time Iter LB UB Precision # Alphas # Beliefs
--------------------------------------------------------------------------------------
0.00 0 -10.7779872 87.0980496 97.8760368879 6 44
0.01 10 14.2589622 51.5049954 37.2460332328 5 464
...
0.02 40 19.3709835 19.6684906 0.2975071211 5 463
0.03 50 19.3713674 19.3833253 0.0119579046 5 212
--------------------------------------------------------------------------------------
0.03 57 19.3713684 19.3722266 0.0008581957 5 467
--------------------------------------------------------------------------------------
Only when should_prune_alphas
--------------------------------------------------------------------------------------
Time Iter LB UB Precision # Alphas # Beliefs
--------------------------------------------------------------------------------------
0.00 0 -10.7779872 87.0980496 97.8760368879 6 44
0.01 10 14.2589622 51.5049954 37.2460332328 5 464
...
0.02 40 19.3709835 19.6684906 0.2975071211 5 463
0.03 50 19.3713674 19.3833253 0.0119579046 5 212
--------------------------------------------------------------------------------------
0.03 57 19.3713684 19.3722266 0.0008581957 5 467
--------------------------------------------------------------------------------------
RockSamplePOMDP(5,5)
Settings:
- epsilon: 0.1
- precision: 0.001
- delta: 0.0001
- max_steps: 50 (for benchmarking)
- max_time: 5.0 (for policy run)
Benchmark
At Every Iteration
BenchmarkTools.Trial: 235 samples with 1 evaluation.
Range (min … max): 19.918 ms … 94.401 ms ┊ GC (min … max): 0.00% … 77.84%
Time (median): 20.551 ms ┊ GC (median): 0.00%
Time (mean ± σ): 21.292 ms ± 4.979 ms ┊ GC (mean ± σ): 3.92% ± 7.37%
▁▄▅▆▄▄▆█▃▁▁ ▁
███████████▅█▇▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▅▆▅▅▁▆▆▆▅█▁▇▁▅▁▁▆▅▆▅▅▁▇ ▆
19.9 ms Histogram: log(frequency) by time 25.3 ms <
Memory estimate: 11.77 MiB, allocs estimate: 51341.
Only when should_prune_alphas
BenchmarkTools.Trial: 227 samples with 1 evaluation.
Range (min … max): 21.107 ms … 88.421 ms ┊ GC (min … max): 0.00% … 75.12%
Time (median): 21.362 ms ┊ GC (median): 0.00%
Time (mean ± σ): 22.097 ms ± 4.550 ms ┊ GC (mean ± σ): 2.80% ± 6.42%
▃█▇▃ ▂▄▂ ▃▄
████▆███▄███▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▆▁▄▆▆▁▁▁▁▆▄▁▄▁▄▁▁▁▁▁▆▄▄▄ ▆
21.1 ms Histogram: log(frequency) by time 26.2 ms <
Memory estimate: 8.71 MiB, allocs estimate: 41919.
Policy Run
At Every Iteration
--------------------------------------------------------------------------------------
Time Iter LB UB Precision # Alphas # Beliefs
--------------------------------------------------------------------------------------
0.00 0 15.2423183 18.3402991 3.0979807840 13 14
0.01 10 16.9264164 18.1564052 1.2299888405 50 85
...
0.17 270 16.9264164 16.9318942 0.0054778577 140 270
0.17 280 16.9264164 16.9291465 0.0027301069 140 236
--------------------------------------------------------------------------------------
0.18 290 16.9264164 16.9273809 0.0009645004 140 103
--------------------------------------------------------------------------------------
Only when should_prune_alphas
--------------------------------------------------------------------------------------
Time Iter LB UB Precision # Alphas # Beliefs
--------------------------------------------------------------------------------------
0.00 0 15.2423183 18.3402991 3.0979807840 13 14
0.00 10 16.9264164 18.1564052 1.2299888405 50 85
...
0.17 270 16.9264164 16.9318942 0.0054778577 140 270
0.18 280 16.9264164 16.9291465 0.0027301069 146 236
--------------------------------------------------------------------------------------
0.18 290 16.9264164 16.9273809 0.0009645004 154 103
--------------------------------------------------------------------------------------
TagPOMDP
Settings:
- epsilon: 0.1
- precision: 0.001
- delta: 0.0001
- max_steps: 50 (for benchmarking)
- max_time: 5.0 (for policy run)
Benchmark
At Every Iteration
BenchmarkTools.Trial: 6 samples with 1 evaluation.
Range (min … max): 901.824 ms … 983.309 ms ┊ GC (min … max): 0.00% … 8.59%
Time (median): 906.289 ms ┊ GC (median): 0.00%
Time (mean ± σ): 924.674 ms ± 33.572 ms ┊ GC (mean ± σ): 2.44% ± 3.75%
█ ▁ ▁ ▁ ▁
█▁█▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
902 ms Histogram: frequency by time 983 ms <
Memory estimate: 196.05 MiB, allocs estimate: 899624.
Only when should_prune_alphas
BenchmarkTools.Trial: 5 samples with 1 evaluation.
Range (min … max): 1.086 s … 1.235 s ┊ GC (min … max): 0.00% … 8.29%
Time (median): 1.093 s ┊ GC (median): 0.00%
Time (mean ± σ): 1.120 s ± 64.031 ms ┊ GC (mean ± σ): 1.83% ± 3.71%
███ █ █
███▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
1.09 s Histogram: frequency by time 1.23 s <
Memory estimate: 175.33 MiB, allocs estimate: 879666.
Policy Run
At Every Iteration
--------------------------------------------------------------------------------------
Time Iter LB UB Precision # Alphas # Beliefs
--------------------------------------------------------------------------------------
0.01 0 -19.3713764 -4.6944443 14.6769321680 18 48
0.09 10 -12.1605923 -4.9155579 7.2450343269 108 345
...
4.14 150 -11.1016246 -5.3819350 5.7196895946 608 2516
4.58 160 -11.1016246 -5.3932110 5.7084135463 612 2633
--------------------------------------------------------------------------------------
5.02 169 -11.1003199 -5.4068507 5.6934692423 641 2743
--------------------------------------------------------------------------------------
Only when should_prune_alphas
--------------------------------------------------------------------------------------
Time Iter LB UB Precision # Alphas # Beliefs
--------------------------------------------------------------------------------------
0.01 0 -19.3713764 -4.6944443 14.6769321680 18 48
0.10 10 -12.1605923 -4.9155579 7.2450343269 108 345
...
4.31 140 -11.1202809 -5.3649098 5.7553711760 575 2380
4.78 150 -11.1016246 -5.3819350 5.7196895946 629 2516
--------------------------------------------------------------------------------------
5.02 156 -11.1016246 -5.3906492 5.7109754343 652 2579
--------------------------------------------------------------------------------------
TagPOMDP
Settings:
- epsilon: 0.1
- precision: 0.001
- delta: 0.0001
- max_steps: 50 (for benchmarking)
- max_time: 60.0 (for policy run)
Policy Run
At Every Iteration
--------------------------------------------------------------------------------------
Time Iter LB UB Precision # Alphas # Beliefs
--------------------------------------------------------------------------------------
0.01 0 -19.3713764 -4.6944443 14.6769321680 18 48
0.09 10 -12.1605923 -4.9155579 7.2450343269 108 345
...
57.21 550 -10.9227351 -5.7750679 5.1476671438 1100 7249
59.61 560 -10.9227351 -5.7818944 5.1408406919 1113 7351
--------------------------------------------------------------------------------------
60.08 563 -10.9227351 -5.7841025 5.1386325673 1114 7373
--------------------------------------------------------------------------------------
Only when should_prune_alphas
--------------------------------------------------------------------------------------
Time Iter LB UB Precision # Alphas # Beliefs
--------------------------------------------------------------------------------------
0.01 0 -19.3713764 -4.6944443 14.6769321680 18 48
0.11 10 -12.1605923 -4.9155579 7.2450343269 108 345
...
56.39 530 -10.9280846 -5.7611180 5.1669666193 1082 6906
59.07 540 -10.9280846 -5.7686969 5.1593877438 1152 7060
--------------------------------------------------------------------------------------
60.22 544 -10.9280846 -5.7698137 5.1582709201 1110 7119
--------------------------------------------------------------------------------------
RockSamplePOMDP(15,10)
Settings:
- epsilon: 0.1
- precision: 0.001
- delta: 0.0001
- max_steps: 50 (for benchmarking)
- max_time: 120.0 (for policy run)
- init_lower: BlindLowerBound(9223372036854775807, 60.0, 0.001, Float64[], Float64[])
- init_upper: FastInformedBound(9223372036854775807, 60.0, 0.001, 0.0, Float64[], Float64[])
Policy Run
At Every Iteration
--------------------------------------------------------------------------------------
Time Iter LB UB Precision # Alphas # Beliefs
--------------------------------------------------------------------------------------
1.16 0 14.9526422 18.9521036 3.9994614354 31 36
23.02 10 15.5252008 18.4262322 2.9010313613 299 472
56.36 20 15.6821109 18.2229256 2.5408147418 526 819
106.26 30 15.7936995 18.0692991 2.2755996353 766 1124
--------------------------------------------------------------------------------------
123.90 34 15.8091284 18.0564031 2.2472746795 837 1236
--------------------------------------------------------------------------------------
Only when should_prune_alphas
--------------------------------------------------------------------------------------
Time Iter LB UB Precision # Alphas # Beliefs
--------------------------------------------------------------------------------------
1.16 0 14.9526422 18.9521036 3.9994614354 31 36
24.87 10 15.5252008 18.4262322 2.9010313613 299 472
66.25 20 15.6821109 18.2229256 2.5408147418 526 819
127.53 30 15.7936995 18.0692991 2.2755996353 766 1124
--------------------------------------------------------------------------------------
127.53 31 15.7936995 18.0692991 2.2755996353 766 1124
--------------------------------------------------------------------------------------
Co-authored-by: Tyler Becker <[email protected]>
@WhiffleFish I updated the benchmark results after the suggested changes. We are doing WAY better on allocations than before and are better than or on par with the original implementation now. I left the conversations open on the change requests. We can resolve the conversations if you are satisfied with the comments/changes. Let me know if there are any other changes we should consider.
This PR introduces some changes to the pruning process. The changes don't address witness pruning or adapting the delta parameters, but they do modify the prune_alpha! function and add a method to prune strictly dominated alpha vectors (comparing vectors only, not at specific belief points).

The original intersection_distance implementation did not return the actual distance from the alpha vectors' intersection to the belief point. It took advantage of the sparsity of the belief vector and only calculated the denominator (the squared difference of the alpha vectors) for non-zero values of the belief vector. This was a good idea for speed, but I think it was an error. For example, check what the original function would return with [1.0, 0.0] for a1, [0.0, 1.0] for a2, and [0.5, 0.5] for b: it returns 1.0 as the distance, but the actual distance is 0.707. This PR has a revised implementation. It is a bit slower but returns a more accurate distance.

Based on the slightly slower distance calculation, I added
prune_strictly_dominated!(::SARSOPTree) to prune strictly dominated alpha vectors. This function removes any vectors that are dominated across the entire belief space (by just comparing the values of the vectors, not at specific belief points). It is called in prune!(::SARSOPSolver, ::SARSOPTree)
after the beliefs are pruned. This addition ended up speeding up the process a bit and almost made up for the slower distance calculation (we still saw some slower times on the larger problems, e.g. RockSample(15,10)).

The last change is to prune_alpha!. Instead of looping through each vector and then checking all belief points, we now first find the dominating alpha vector at each belief point. Then we only need to compare the dominated alpha vector at each point against all other vectors to check for delta dominance. This change doesn't catch duplicate alpha vectors, but the added prune_strictly_dominated! function catches those. This method also sets up the framework for an adaptive delta implementation (related issue), since we need to track which belief points each vector is dominant at.

I did some ablation studies for each change. However, I have only included the final performance comparisons here for different delta values. The only hit in performance is RockSample(5,5), but the larger problems see a decent performance increase.
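The two new pruning ideas described above can be sketched in isolation. This is illustrative toy code under assumed conventions (alpha vectors and beliefs as plain dense Vector{Float64}); it is not the package's implementation:

```julia
using LinearAlgebra: dot

# Strict domination: a vector is dominated across the whole belief
# simplex iff another vector is componentwise >= it at every state.
# Exact duplicates also count, so only one copy of each survives.
function prune_strictly_dominated(Γ::Vector{Vector{Float64}})
    keep = trues(length(Γ))
    for i in eachindex(Γ), j in eachindex(Γ)
        (i == j || !keep[i] || !keep[j]) && continue
        if all(Γ[j] .>= Γ[i]) && (Γ[j] != Γ[i] || j < i)
            keep[i] = false   # Γ[i] is dominated (or a later duplicate)
        end
    end
    return Γ[keep]
end

# First step of the restructured prune_alpha!: record which belief
# points each vector is dominant (max-valued) at. This is also the
# bookkeeping an adaptive-delta scheme would need.
function dominance_sets(Γ::Vector{Vector{Float64}}, beliefs::Vector{Vector{Float64}})
    dom = [Int[] for _ in Γ]
    for (bi, b) in enumerate(beliefs)
        best = argmax([dot(b, α) for α in Γ])
        push!(dom[best], bi)
    end
    return dom
end
```

With the dominance sets in hand, only the non-dominant vectors need the pairwise delta-dominance comparison, rather than every vector against every belief point.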
Performance Comparison (delta = 0.1)

[Collapsed sections: Original vs. New benchmark and policy-run results for BabyPOMDP, TigerPOMDP, RockSamplePOMDP(5,5), TagPOMDP, and RockSamplePOMDP(15,10), plus the same comparison at additional delta values; the table contents were not expanded in this capture.]