
Callable policy type directly (like alg4dm) #42

Open · mossr opened this issue Sep 21, 2021 · 7 comments

@mossr (Member) commented Sep 21, 2021

Has it been considered to allow calling policies directly? The algorithms book (alg4dm) uses this syntax extensively.

For POMDPPolicies, we'd only need to define:

(π::Policy)(s) = action(π, s)

then we can do things like

π = solve(solver, mdp)
a = π(s)
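
To make this concrete, here's a minimal self-contained sketch of the proposal (ConstantPolicy is just an illustrative toy type, not an existing one):

using POMDPs  # provides the abstract Policy type and the action function

# Toy policy for illustration: always returns the same action.
struct ConstantPolicy{A} <: Policy
    a::A
end
POMDPs.action(π::ConstantPolicy, s) = π.a

# The proposed sugar: calling a policy forwards to action.
(π::Policy)(s) = action(π, s)

π = ConstantPolicy(:left)
π(:s1)  # :left, same as action(π, :s1)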

Thoughts?

@zsunberg (Member) commented Sep 21, 2021

Yeah, I often think about this, e.g. JuliaPOMDP/POMDPs.jl#252.

Changing the way people interact with policies would be a big change, so we would want to make sure that there is a good plan for transitioning.

The other related issue is whether policies should return distributions of actions.

If I were redesigning POMDPs.jl today, I think I would say action(policy, s) should return a distribution of actions and policy(s) should return a sample from that distribution, with a default implementation of

(p::Policy)(s) = rand(action(p, s))
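
As a rough sketch of those semantics (NoisyPolicy is a made-up example; SparseCat is the categorical distribution from POMDPModelTools):

using POMDPs
using POMDPModelTools: SparseCat

# Hypothetical stochastic policy: action returns a distribution of actions.
struct NoisyPolicy <: Policy end
POMDPs.action(p::NoisyPolicy, s) = SparseCat([:stay, :go], [0.9, 0.1])

# Default implementation: calling the policy samples from that distribution.
(p::Policy)(s) = rand(action(p, s))

p = NoisyPolicy()
p(:s1)  # :stay with probability 0.9, :go with probability 0.1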

I have not thought very much about the utility of having the function call syntax as just syntactic sugar like you suggest.

@zsunberg (Member)

Happy to consider proposals

@mossr (Member, Author) commented Sep 21, 2021

I like the distribution idea, but I haven't thought too much about the impact of that conceptual change. My original proposal was simply syntactic sugar to mimic the action function exactly.

I'll keep thinking about this.

@rejuvyesh (Member)

> I think I would say action(policy, s) should return a distribution of actions and policy(s) should return a sample from that distribution

I would expect the opposite.
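
In other words, roughly (a sketch; distribution is a hypothetical interface function, not an existing one):

# Flipped convention: the callable returns the distribution,
# and action samples from it.
(p::Policy)(s) = distribution(p, s)       # p(s) -> action distribution
POMDPs.action(p::Policy, s) = rand(p(s))  # action(p, s) -> sampled action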

@zsunberg (Member)

@rejuvyesh Good to know. That would be much easier to transition to from our current semantics. The reason I thought action should return a distribution is that transition and observation also return distributions.

It would also be nice for policy(s) to return an action because I think people would be more likely to write functions that return actions rather than distributions, e.g. policy(s) = -K*s for a linear policy rather than policy(s) = Deterministic(-K*s).
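
For instance, POMDPPolicies' FunctionPolicy already wraps a plain action-returning function, which the callable sugar would compose nicely with (a quick sketch):

using POMDPs, POMDPPolicies

K = 2.0
π = FunctionPolicy(s -> -K * s)  # user writes a plain function that returns an action
action(π, 1.5)                   # -3.0
# with the proposed sugar, π(1.5) would return the same thing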

@zsunberg (Member)

But I would be open to having policy(s) return a distribution if other people like that a lot.

@rejuvyesh (Member)

I think in the [PO]MDP literature, a policy $\pi$ is understood to map states to actions or to action distributions. But when explicitly calling the function action on the policy object, I would expect to get back an action (potentially sampled). I understand that functions like transition and observation return distributions, so if we want to adhere to that, in my opinion we should just eliminate either action or the callable policy.
