
Callable policy type directly (like alg4dm) #42

Open · mossr opened this issue Sep 21, 2021 · 7 comments

@mossr (Member) commented Sep 21, 2021

Has it been considered to allow calling policies directly? The algorithms book (alg4dm) uses this syntax extensively.

For POMDPPolicies, we'd only need to define:

(π::Policy)(s) = action(π, s)

then we can do things like

π = solve(solver, mdp)
a = π(s)
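
To make this concrete, here's a minimal self-contained sketch of the proposal (ConstantPolicy is just an illustrative toy type, not an existing one):

using POMDPs  # provides the abstract Policy type and the action function

# Toy policy for illustration: always returns the same action.
struct ConstantPolicy{A} <: Policy
    a::A
end
POMDPs.action(π::ConstantPolicy, s) = π.a

# The proposed sugar: calling a policy forwards to action.
(π::Policy)(s) = action(π, s)

π = ConstantPolicy(:left)
π(:s1)  # :left, same as action(π, :s1)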

Thoughts?

@zsunberg (Member) commented Sep 21, 2021

Yeah, I often think about this, e.g. JuliaPOMDP/POMDPs.jl#252.

Changing the way people interact with policies would be a big change, so we would want to make sure that there is a good plan for transitioning.

The other related issue is whether policies should return distributions of actions.

If I were redesigning POMDPs.jl today, I think I would say action(policy, s) should return a distribution of actions and policy(s) should return a sample from that distribution, with a default implementation of

(p::Policy)(s) = rand(action(p, s))
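
As a rough sketch of those semantics (NoisyPolicy is a made-up example; SparseCat is the categorical distribution from POMDPModelTools):

using POMDPs
using POMDPModelTools: SparseCat

# Hypothetical stochastic policy: action returns a distribution of actions.
struct NoisyPolicy <: Policy end
POMDPs.action(p::NoisyPolicy, s) = SparseCat([:stay, :go], [0.9, 0.1])

# Default implementation: calling the policy samples from that distribution.
(p::Policy)(s) = rand(action(p, s))

p = NoisyPolicy()
p(:s1)  # :stay with probability 0.9, :go with probability 0.1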

I have not thought very much about the utility of having the function call syntax as just syntactic sugar like you suggest.

@zsunberg (Member)

Happy to consider proposals

@mossr (Member, Author) commented Sep 21, 2021

I like the distribution idea, but I haven't thought too much about the impact of that conceptual change. My original proposal was simply syntactic sugar to mimic the action function exactly.

I'll keep thinking about this.

@rejuvyesh (Member)

> I think I would say action(policy, s) should return a distribution of actions and policy(s) should return a sample from that distribution

I would expect the opposite.
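
In other words, roughly (a sketch; distribution is a hypothetical interface function, not an existing one):

# Flipped convention: the callable returns the distribution,
# and action samples from it.
(p::Policy)(s) = distribution(p, s)       # p(s) -> action distribution
POMDPs.action(p::Policy, s) = rand(p(s))  # action(p, s) -> sampled action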

@zsunberg (Member)

@rejuvyesh Good to know. That would be much easier to transition to from our current semantics. The reason I thought action should return a distribution is that transition and observation also return distributions.

It would also be nice for policy(s) to return an action because I think people would be more likely to write functions that return actions rather than distributions, e.g. policy(s) = -K*s for a linear policy rather than policy(s) = Deterministic(-K*s).
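
For instance, POMDPPolicies' FunctionPolicy already wraps a plain action-returning function, which the callable sugar would compose nicely with (a quick sketch):

using POMDPs, POMDPPolicies

K = 2.0
π = FunctionPolicy(s -> -K * s)  # user writes a plain function that returns an action
action(π, 1.5)                   # -3.0
# with the proposed sugar, π(1.5) would return the same thing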

@zsunberg (Member)

But I would be open to having policy(s) return a distribution if other people like that a lot.

@rejuvyesh (Member)

I think in the [PO]MDP literature, a policy $\pi$ is understood to map states to actions or to action distributions. But when explicitly calling the function action on the policy object, I would expect to get back an action (potentially sampled). I understand that functions like transition and observation return distributions, so if we want to adhere to that, in my opinion we should just eliminate either action or the callable policy.
