Value Function for (belief, action) pairs missing #35

Open
mmcelikok opened this issue Mar 16, 2021 · 6 comments

Comments

@mmcelikok

mmcelikok commented Mar 16, 2021

Hi,

I have noticed that the documentation says value(policy, b, a) should return Q(b, a), but this is not defined for AlphaVectorPolicy. I see that the actionvalues function already computes this, so a simple tweak should do the trick for value(policy, b, a), right?
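
For concreteness, a minimal sketch of the tweak, reusing actionvalues (this assumes the AlphaVectorPolicy stores its problem in a `pomdp` field and that `actionindex` is defined for it; not the actual package method):

```julia
using POMDPs
using POMDPPolicies  # AlphaVectorPolicy, actionvalues

# Hypothetical sketch of the proposed tweak.
function POMDPs.value(p::AlphaVectorPolicy, b, a)
    # actionvalues returns a vector of Q(b, a) ordered by actionindex
    return actionvalues(p, b)[actionindex(p.pomdp, a)]
end
```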

@zsunberg
Member

If there is an alpha vector corresponding to the action, it shouldn't be too hard to implement, but what should the function return if there are no alpha vectors for that action?

@mmcelikok
Author

Hmm, when would that happen? If an action is not possible from a belief state? Right now I am using the actionvalues function to access Q(b, a) for each b in a belief history, but I have only tried it on RockSample. The workaround looks roughly like the sketch below.
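
(A sketch only; `policy` is the solved AlphaVectorPolicy and `belief_hist` is a placeholder for the recorded beliefs:)

```julia
# Hypothetical usage: one vector of Q(b, a) values per belief in the
# history, indexed by actionindex.
qvals = [actionvalues(policy, b) for b in belief_hist]
```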

@zsunberg
Member

It would happen if an action is never optimal for any belief. All of the alpha vectors corresponding to that action might be pruned from the alpha vector set.

I suppose the right thing would be to throw an ArgumentError, but I don't know how that would impact performance. Another option would be returning NaN or -Inf, but I have a feeling that indicating errors in that way is bad practice.

@mmcelikok
Author

Ah yeah, true, I did not take pruning into account. Is this an error though? I wouldn't call it one; pruning reduces the search space by forgoing unnecessary value calculations for actions that are clearly never optimal. Then, from a cost-minimization perspective, returning +Inf (or -Inf for reward maximization) makes sense, no? In the end, if an action is never optimal for any belief, setting its value to +Inf for every belief makes sense.

@zsunberg
Member

Yes, pruning is certainly not an error, but trying to access a value for an action without an alpha vector might be considered an error.

That being said, actionvalues already returns -Inf for actions without an alpha vector, so I think it would be OK to do the same here.
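
Roughly, that behavior would look like this (a sketch only; the field names `alphas` and `action_map` and the vector form of the belief are assumptions):

```julia
using LinearAlgebra: dot

function qvalue(p, b_vec::AbstractVector, a)
    q = -Inf  # default when action a has no alpha vector (e.g. it was pruned)
    for (α, α_a) in zip(p.alphas, p.action_map)
        if α_a == a
            # value at b_vec of the conditional plan this alpha vector represents
            q = max(q, dot(α, b_vec))
        end
    end
    return q
end
```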

Do you want to submit a PR? I can review it.

@mmcelikok
Author

Will do as soon as I can, thanks!
