
Commit

W3d4_fix (#1075)
* Add files via upload

* Created using Colaboratory

* Created using Colaboratory

* Created using Colaboratory

* Process tutorial notebooks

* Created using Colaboratory

* Process tutorial notebooks

---------

Co-authored-by: GitHub Action <[email protected]>
GaganaB and actions-user authored Jul 31, 2023
1 parent 46bc687 commit 7d2a3fe
Showing 7 changed files with 6 additions and 6 deletions.
2 changes: 1 addition & 1 deletion tutorials/W3D4_ReinforcementLearning/W3D4_Tutorial1.ipynb
@@ -974,7 +974,7 @@
"\n",
"With a high learning rate, the value function tracks each observed reward, changing quickly whenever there is a reward prediction error. In a probabilistic scenario case, this behavior results in the value function changing too quickly and never stabilizing (converging). Using a low learning rate can stabilize the value function by smoothing out any variation in the reward signal, leading the value function to converge to the average reward over time. However, using a low learning rate can result in slow learning.\n",
"\n",
"To get the best of all worls, it is often useful to use a high learning rate early on (producing fast learning), and to reduce the learning rate gradually throughout learning (so that the value function converges to the average reward). This is sometimes called \"learning rate schedule\"."
"To get the best of all worlds, it is often useful to use a high learning rate early on (producing fast learning), and to reduce the learning rate gradually throughout learning (so that the value function converges to the average reward). This is sometimes called \"learning rate schedule\"."
]
},
{
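A note on the "learning rate schedule" mentioned in the cell above: as a rough illustration (not code from these notebooks), the sketch below anneals a 1/t-style learning rate while a single value estimate is updated toward a probabilistic reward. The schedule form and the parameter names `alpha0` and `decay` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 500
rewards = rng.binomial(1, 0.7, size=n_trials)  # probabilistic reward, mean 0.7

def alpha_schedule(t, alpha0=0.5, decay=0.05):
    """Hypothetical schedule: high learning rate early, decaying over trials."""
    return alpha0 / (1.0 + decay * t)

value = 0.0
for t, r in enumerate(rewards):
    delta = r - value                    # reward prediction error
    value += alpha_schedule(t) * delta   # large updates early, small updates late

print(f"final value estimate: {value:.3f}  (true mean reward: 0.7)")
```

With a fixed high learning rate the estimate would keep jumping with every reward; the decaying schedule lets it settle near the average reward, as the cell above describes.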
2 changes: 1 addition & 1 deletion tutorials/W3D4_ReinforcementLearning/W3D4_Tutorial3.ipynb
@@ -351,7 +351,7 @@
"\n",
"For our discussion we will be looking at the classic Cliff World, or Cliff Walker, environment. This is a 4x10 grid with a starting position in the lower-left and the goal position in the lower-right. Every tile between these two is the \"cliff\", and should the agent enter the cliff, they will receive a -100 reward and be sent back to the starting position. Every tile other than the cliff produces a -1 reward when entered. The goal tile ends the episode after taking any action from it.\n",
"\n",
"<img alt=\"CliffWorld\" width=\"577\" height=\"308\" src=\"https://github.com/NeuromatchAcademy/course-content/blob/main/tutorials/static/W3D4_Tutorial3_CliffWorld.png?raw=true\">\n",
"<img alt=\"CliffWorld\" width=\"577\" height=\"308\" src=\"https://github.com/NeuromatchAcademy/course-content/blob/main/tutorials/static/W3D4_Tutorial3_GridWorld410.png?raw=true\">\n",
"\n",
"Given these conditions, the maximum achievable reward is -11 (1 up, 9 right, 1 down). Using negative rewards is a common technique to encourage the agent to move and seek out the goal state as fast as possible."
]
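The "-11" maximum reward quoted in the cell above can be checked directly: the shortest safe path is 11 moves at -1 each. Below is a minimal sketch of the Cliff World dynamics as described (a toy illustration, not the tutorial's environment code): a 4x10 grid, -1 per move, and -100 plus a reset to the start for stepping into the cliff.

```python
N_ROWS, N_COLS = 4, 10
START, GOAL = (3, 0), (3, 9)              # lower-left start, lower-right goal
CLIFF = {(3, c) for c in range(1, 9)}     # bottom-row tiles between them
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Return (next_state, reward) for one move, clipping at the grid edges."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), N_ROWS - 1)
    c = min(max(state[1] + dc, 0), N_COLS - 1)
    if (r, c) in CLIFF:
        return START, -100                # fell into the cliff: reset, big penalty
    return (r, c), -1                     # every other move costs -1

# Optimal path: 1 up, 9 right, 1 down -> 11 moves, total reward -11.
state, total = START, 0
for a in ["up"] + ["right"] * 9 + ["down"]:
    state, reward = step(state, a)
    total += reward
print(state == GOAL, total)  # True -11
```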
@@ -976,7 +976,7 @@
"\n",
"With a high learning rate, the value function tracks each observed reward, changing quickly whenever there is a reward prediction error. In a probabilistic scenario case, this behavior results in the value function changing too quickly and never stabilizing (converging). Using a low learning rate can stabilize the value function by smoothing out any variation in the reward signal, leading the value function to converge to the average reward over time. However, using a low learning rate can result in slow learning.\n",
"\n",
"To get the best of all worls, it is often useful to use a high learning rate early on (producing fast learning), and to reduce the learning rate gradually throughout learning (so that the value function converges to the average reward). This is sometimes called \"learning rate schedule\"."
"To get the best of all worlds, it is often useful to use a high learning rate early on (producing fast learning), and to reduce the learning rate gradually throughout learning (so that the value function converges to the average reward). This is sometimes called \"learning rate schedule\"."
]
},
{
@@ -351,7 +351,7 @@
"\n",
"For our discussion we will be looking at the classic Cliff World, or Cliff Walker, environment. This is a 4x10 grid with a starting position in the lower-left and the goal position in the lower-right. Every tile between these two is the \"cliff\", and should the agent enter the cliff, they will receive a -100 reward and be sent back to the starting position. Every tile other than the cliff produces a -1 reward when entered. The goal tile ends the episode after taking any action from it.\n",
"\n",
"<img alt=\"CliffWorld\" width=\"577\" height=\"308\" src=\"https://github.com/NeuromatchAcademy/course-content/blob/main/tutorials/static/W3D4_Tutorial3_CliffWorld.png?raw=true\">\n",
"<img alt=\"CliffWorld\" width=\"577\" height=\"308\" src=\"https://github.com/NeuromatchAcademy/course-content/blob/main/tutorials/static/W3D4_Tutorial3_GridWorld410.png?raw=true\">\n",
"\n",
"Given these conditions, the maximum achievable reward is -11 (1 up, 9 right, 1 down). Using negative rewards is a common technique to encourage the agent to move and seek out the goal state as fast as possible."
]
@@ -911,7 +911,7 @@
"\n",
"With a high learning rate, the value function tracks each observed reward, changing quickly whenever there is a reward prediction error. In a probabilistic scenario case, this behavior results in the value function changing too quickly and never stabilizing (converging). Using a low learning rate can stabilize the value function by smoothing out any variation in the reward signal, leading the value function to converge to the average reward over time. However, using a low learning rate can result in slow learning.\n",
"\n",
"To get the best of all worls, it is often useful to use a high learning rate early on (producing fast learning), and to reduce the learning rate gradually throughout learning (so that the value function converges to the average reward). This is sometimes called \"learning rate schedule\"."
"To get the best of all worlds, it is often useful to use a high learning rate early on (producing fast learning), and to reduce the learning rate gradually throughout learning (so that the value function converges to the average reward). This is sometimes called \"learning rate schedule\"."
]
},
{
@@ -351,7 +351,7 @@
"\n",
"For our discussion we will be looking at the classic Cliff World, or Cliff Walker, environment. This is a 4x10 grid with a starting position in the lower-left and the goal position in the lower-right. Every tile between these two is the \"cliff\", and should the agent enter the cliff, they will receive a -100 reward and be sent back to the starting position. Every tile other than the cliff produces a -1 reward when entered. The goal tile ends the episode after taking any action from it.\n",
"\n",
"<img alt=\"CliffWorld\" width=\"577\" height=\"308\" src=\"https://github.com/NeuromatchAcademy/course-content/blob/main/tutorials/static/W3D4_Tutorial3_CliffWorld.png?raw=true\">\n",
"<img alt=\"CliffWorld\" width=\"577\" height=\"308\" src=\"https://github.com/NeuromatchAcademy/course-content/blob/main/tutorials/static/W3D4_Tutorial3_GridWorld410.png?raw=true\">\n",
"\n",
"Given these conditions, the maximum achievable reward is -11 (1 up, 9 right, 1 down). Using negative rewards is a common technique to encourage the agent to move and seek out the goal state as fast as possible."
]
Binary file added tutorials/static/W3D4_Tutorial3_GridWorld410.png
