Value Function for (belief, action) pairs missing #35

Open
mmcelikok opened this issue Mar 16, 2021 · 6 comments

Comments

@mmcelikok

mmcelikok commented Mar 16, 2021

Hi,

I have noticed that the documentation says value(policy, b, a) should return Q(b, a), but this is not defined for AlphaVectorPolicy. I see that the actionvalues function already computes this, so a simple tweak should do the trick for value(policy, b, a), right?
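
For concreteness, a minimal sketch of the tweak, reusing actionvalues (this assumes the AlphaVectorPolicy stores its problem in a `pomdp` field and that `actionindex` is defined for it; not the actual package method):

```julia
using POMDPs
using POMDPPolicies  # AlphaVectorPolicy, actionvalues

# Hypothetical sketch of the proposed tweak.
function POMDPs.value(p::AlphaVectorPolicy, b, a)
    # actionvalues returns a vector of Q(b, a) ordered by actionindex
    return actionvalues(p, b)[actionindex(p.pomdp, a)]
end
```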

@zsunberg
Member

If there is an alpha vector corresponding to the action, it shouldn't be too hard to implement, but what should the function return if there are no alpha vectors for that action?

@mmcelikok
Author

Hmm, when would that happen? If an action is not possible from a belief state? Right now I am using the actionvalues function to access Q(b, a) for each b in a belief history, but I have only tried it on RockSample. The workaround looks roughly like the sketch below.
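
(A sketch only; `policy` is the solved AlphaVectorPolicy and `belief_hist` is a placeholder for the recorded beliefs:)

```julia
# Hypothetical usage: one vector of Q(b, a) values per belief in the
# history, indexed by actionindex.
qvals = [actionvalues(policy, b) for b in belief_hist]
```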

@zsunberg
Member

It would happen if an action is never optimal for any belief. All of the alpha vectors corresponding to that action might be pruned from the alpha vector set.

I suppose the right thing would be to throw an ArgumentError, but I don't know how that would impact performance. Another option would be returning NaN or -Inf, but I have a feeling that indicating errors in that way is bad practice.

@mmcelikok
Author

Ah yeah, true, I did not take pruning into account. Is this an error though? I wouldn't call it one; pruning reduces the search space by forgoing unnecessary value calculations for actions that are clearly never optimal. Then, from a cost-minimization perspective, returning +Inf (or -Inf for reward maximization) makes sense, no? In the end, if an action is never optimal for any belief, setting its value to +Inf for every belief makes sense.

@zsunberg
Member

Yes, pruning is certainly not an error, but trying to access a value for an action without an alpha vector might be considered an error.

That being said, actionvalues already returns -Inf for actions without an alpha vector, so I think it would be OK to do the same here.
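
Roughly, that behavior would look like this (a sketch only; the field names `alphas` and `action_map` and the vector form of the belief are assumptions):

```julia
using LinearAlgebra: dot

function qvalue(p, b_vec::AbstractVector, a)
    q = -Inf  # default when action a has no alpha vector (e.g. it was pruned)
    for (α, α_a) in zip(p.alphas, p.action_map)
        if α_a == a
            # value at b_vec of the conditional plan this alpha vector represents
            q = max(q, dot(α, b_vec))
        end
    end
    return q
end
```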

Do you want to submit a PR? I can review it.

@mmcelikok
Author

Will do as soon as I can, thanks!
