format
johnjim0816 committed Aug 14, 2023
1 parent e802497 commit d8024ee
Showing 5 changed files with 48 additions and 25 deletions.
2 changes: 1 addition & 1 deletion docs/ch5/main.md
@@ -395,5 +395,5 @@ self.epsilon_decay = 200 # decay rate of epsilon in the e-greedy policy
</div>
<div align=center>Figure $\text{5.11}$ Test curve of the $\text{Sarsa}$ algorithm </div>

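-We find that, whereas the $\text{Q-learning}$ algorithm converges in about $300$ episodes, the $\text{Sarsa}$ algorithm needs an additional $100$ episodes to converge; once converged, however, it is more stable, with no excessively large fluctuations, which leads to the topic we cover next: on-policy versus off-policy learning
+We find that, whereas the $\text{Q-learning}$ algorithm converges in about $300$ episodes, the $\text{Sarsa}$ algorithm needs an additional $100$ episodes to converge; once converged, however, it is more stable, with no excessively large fluctuations.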
