
Commit
Merge pull request #420 from fzyzcjy/patch-1
Super tiny fix format
simoninithomas authored Mar 4, 2024
2 parents 262cc0c + 59bce06 commit bf5a72a
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions units/en/unit4/policy-gradient.mdx
@@ -54,6 +54,7 @@ Let's give some more details on this formula:


- \\(R(\tau)\\) : Return from an arbitrary trajectory. To turn this quantity into the expected return, we need to weight it by the probability of each possible trajectory.

- \\(P(\tau;\theta)\\) : Probability of each possible trajectory \\(\tau\\). This probability depends on \\( \theta\\), since \\( \theta\\) defines the policy used to select the actions of the trajectory, which in turn has an impact on the states visited.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit6/probability.png" alt="Probability"/>
