Value Function for (belief, action) pairs missing #35
Comments
If there is an alpha vector corresponding to the action, this shouldn't be too hard to implement. But what should the function return if there are no alpha vectors for that action?
Hmm, when would that happen? If an action is not possible from a belief state? Right now I am using the actionvalues function to get Q(b,a) for each b in a belief history, but I have only tried it on RockSample.
It would happen if an action is never optimal for any belief. In that case, all of the alpha vectors corresponding to that action might be pruned from the alpha vector set. I suppose the right thing would be to throw an ArgumentError, but I don't know how that would impact performance. Another option would be to return NaN or -Inf, but I have a feeling that signaling errors that way is bad practice.
Ah yes, true, I did not take pruning into account. Is this an error, though? I wouldn't call it one: pruning reduces the search space by skipping value calculations for actions that are clearly never optimal. So if we take, say, a cost-minimization perspective, returning +Inf (or -Inf for reward maximization) makes sense, no? In the end, if an action is never optimal for any belief, setting its value to +Inf for every belief seems reasonable.
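To make the two options in this thread concrete, here is a minimal sketch in Python rather than in the package's own language. It is not the library's API: the function name `action_value`, the `missing_value` parameter, and the toy alpha vectors are all hypothetical. It assumes the policy is stored as a list of alpha vectors plus a parallel list giving the action each vector belongs to, and it assumes reward maximization, so a sentinel would be -inf.

```python
import math
from typing import Optional, Sequence


def action_value(
    alphas: Sequence[Sequence[float]],
    action_map: Sequence[str],
    belief: Sequence[float],
    action: str,
    missing_value: Optional[float] = None,
) -> float:
    """Best dot(alpha, belief) over the alpha vectors tagged with `action`.

    If pruning removed every vector for `action`, either raise (the default)
    or return `missing_value` when the caller supplies one.
    """
    values = [
        sum(a * b for a, b in zip(alpha, belief))
        for alpha, tagged in zip(alphas, action_map)
        if tagged == action
    ]
    if values:
        return max(values)
    if missing_value is None:
        # Option 1: treat the lookup as an error.
        raise KeyError(f"no alpha vector for action {action!r}")
    # Option 2: return a sentinel such as -inf (reward maximization).
    return missing_value


# Hypothetical two-state policy in which "wait" was pruned away entirely.
alphas = [[10.0, 0.0], [0.0, 10.0]]
action_map = ["left", "right"]
belief = [0.75, 0.25]

left_value = action_value(alphas, action_map, belief, "left")
wait_value = action_value(alphas, action_map, belief, "wait", missing_value=-math.inf)
```

With this shape the caller chooses the behavior: a strict lookup raises, while code that only compares or maximizes over actions can pass -inf and never select the pruned action.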
Yes, pruning is certainly not an error, but trying to access a value for an action that has no alpha vector might be considered one. That being said, do you want to submit a PR? I can review it.
Will do as soon as I can, thanks!
Hi,
I noticed that, according to the documentation, value(policy, b, a) should return Q(b,a). This method is not defined for AlphaVectorPolicies. I see that the actionvalues function already computes this, so a simple tweak should do the trick for value(policy, b, a), right?
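As a rough illustration of what is being asked for, the sketch below computes one value per action from a set of alpha vectors by taking, for each action, the largest dot product between the belief and the alpha vectors tagged with that action. It is written in Python for illustration only. The name `q_values`, the data layout (a list of alpha vectors with a parallel list of actions), and the numbers are assumptions, not the package's implementation.

```python
from typing import Dict, Sequence


def q_values(
    alphas: Sequence[Sequence[float]],
    action_map: Sequence[str],
    belief: Sequence[float],
) -> Dict[str, float]:
    """Map each action that has an alpha vector to its best dot(alpha, belief)."""
    best: Dict[str, float] = {}
    for alpha, action in zip(alphas, action_map):
        value = sum(a * b for a, b in zip(alpha, belief))
        # Keep the largest value seen so far for this action.
        if action not in best or value > best[action]:
            best[action] = value
    return best


# Hypothetical two-state example: two vectors for "left", one for "right".
alphas = [[10.0, 0.0], [0.0, 10.0], [4.0, 4.0]]
action_map = ["left", "right", "left"]
belief = [0.75, 0.25]

q = q_values(alphas, action_map, belief)
print(q)  # {'left': 7.5, 'right': 2.5}
```

Under these assumptions, a per-action lookup is a single dictionary access on the result. An action with no alpha vector simply has no entry, which is the case a value(policy, b, a) method would need to decide how to handle.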