
Commit

W3d4_fix (#1075)
* Add files via upload

* Created using Colaboratory

* Created using Colaboratory

* Created using Colaboratory

* Process tutorial notebooks

* Created using Colaboratory

* Process tutorial notebooks

---------

Co-authored-by: GitHub Action <[email protected]>
GaganaB and actions-user authored Jul 31, 2023
1 parent 46bc687 commit 7d2a3fe
Showing 7 changed files with 6 additions and 6 deletions.
2 changes: 1 addition & 1 deletion tutorials/W3D4_ReinforcementLearning/W3D4_Tutorial1.ipynb
@@ -974,7 +974,7 @@
"\n",
"With a high learning rate, the value function tracks each observed reward, changing quickly whenever there is a reward prediction error. In a probabilistic scenario case, this behavior results in the value function changing too quickly and never stabilizing (converging). Using a low learning rate can stabilize the value function by smoothing out any variation in the reward signal, leading the value function to converge to the average reward over time. However, using a low learning rate can result in slow learning.\n",
"\n",
"To get the best of all worls, it is often useful to use a high learning rate early on (producing fast learning), and to reduce the learning rate gradually throughout learning (so that the value function converges to the average reward). This is sometimes called \"learning rate schedule\"."
"To get the best of all worlds, it is often useful to use a high learning rate early on (producing fast learning), and to reduce the learning rate gradually throughout learning (so that the value function converges to the average reward). This is sometimes called \"learning rate schedule\"."
]
},
{
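A note on the "learning rate schedule" mentioned in the cell above: as a rough illustration (not code from these notebooks), the sketch below anneals a 1/t-style learning rate while a single value estimate is updated toward a probabilistic reward. The schedule form and the parameter names `alpha0` and `decay` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 500
rewards = rng.binomial(1, 0.7, size=n_trials)  # probabilistic reward, mean 0.7

def alpha_schedule(t, alpha0=0.5, decay=0.05):
    """Hypothetical schedule: high learning rate early, decaying over trials."""
    return alpha0 / (1.0 + decay * t)

value = 0.0
for t, r in enumerate(rewards):
    delta = r - value                    # reward prediction error
    value += alpha_schedule(t) * delta   # large updates early, small updates late

print(f"final value estimate: {value:.3f}  (true mean reward: 0.7)")
```

With a fixed high learning rate the estimate would keep jumping with every reward; the decaying schedule lets it settle near the average reward, as the cell above describes.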
2 changes: 1 addition & 1 deletion tutorials/W3D4_ReinforcementLearning/W3D4_Tutorial3.ipynb
@@ -351,7 +351,7 @@
"\n",
"For our discussion we will be looking at the classic Cliff World, or Cliff Walker, environment. This is a 4x10 grid with a starting position in the lower-left and the goal position in the lower-right. Every tile between these two is the \"cliff\", and should the agent enter the cliff, they will receive a -100 reward and be sent back to the starting position. Every tile other than the cliff produces a -1 reward when entered. The goal tile ends the episode after taking any action from it.\n",
"\n",
"<img alt=\"CliffWorld\" width=\"577\" height=\"308\" src=\"https://github.com/NeuromatchAcademy/course-content/blob/main/tutorials/static/W3D4_Tutorial3_CliffWorld.png?raw=true\">\n",
"<img alt=\"CliffWorld\" width=\"577\" height=\"308\" src=\"https://github.com/NeuromatchAcademy/course-content/blob/main/tutorials/static/W3D4_Tutorial3_GridWorld410.png?raw=true\">\n",
"\n",
"Given these conditions, the maximum achievable reward is -11 (1 up, 9 right, 1 down). Using negative rewards is a common technique to encourage the agent to move and seek out the goal state as fast as possible."
]
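The "-11" maximum reward quoted in the cell above can be checked directly: the shortest safe path is 11 moves at -1 each. Below is a minimal sketch of the Cliff World dynamics as described (a toy illustration, not the tutorial's environment code): a 4x10 grid, -1 per move, and -100 plus a reset to the start for stepping into the cliff.

```python
N_ROWS, N_COLS = 4, 10
START, GOAL = (3, 0), (3, 9)              # lower-left start, lower-right goal
CLIFF = {(3, c) for c in range(1, 9)}     # bottom-row tiles between them
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Return (next_state, reward) for one move, clipping at the grid edges."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), N_ROWS - 1)
    c = min(max(state[1] + dc, 0), N_COLS - 1)
    if (r, c) in CLIFF:
        return START, -100                # fell into the cliff: reset, big penalty
    return (r, c), -1                     # every other move costs -1

# Optimal path: 1 up, 9 right, 1 down -> 11 moves, total reward -11.
state, total = START, 0
for a in ["up"] + ["right"] * 9 + ["down"]:
    state, reward = step(state, a)
    total += reward
print(state == GOAL, total)  # True -11
```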
@@ -976,7 +976,7 @@
"\n",
"With a high learning rate, the value function tracks each observed reward, changing quickly whenever there is a reward prediction error. In a probabilistic scenario case, this behavior results in the value function changing too quickly and never stabilizing (converging). Using a low learning rate can stabilize the value function by smoothing out any variation in the reward signal, leading the value function to converge to the average reward over time. However, using a low learning rate can result in slow learning.\n",
"\n",
"To get the best of all worls, it is often useful to use a high learning rate early on (producing fast learning), and to reduce the learning rate gradually throughout learning (so that the value function converges to the average reward). This is sometimes called \"learning rate schedule\"."
"To get the best of all worlds, it is often useful to use a high learning rate early on (producing fast learning), and to reduce the learning rate gradually throughout learning (so that the value function converges to the average reward). This is sometimes called \"learning rate schedule\"."
]
},
{
@@ -351,7 +351,7 @@
"\n",
"For our discussion we will be looking at the classic Cliff World, or Cliff Walker, environment. This is a 4x10 grid with a starting position in the lower-left and the goal position in the lower-right. Every tile between these two is the \"cliff\", and should the agent enter the cliff, they will receive a -100 reward and be sent back to the starting position. Every tile other than the cliff produces a -1 reward when entered. The goal tile ends the episode after taking any action from it.\n",
"\n",
"<img alt=\"CliffWorld\" width=\"577\" height=\"308\" src=\"https://github.com/NeuromatchAcademy/course-content/blob/main/tutorials/static/W3D4_Tutorial3_CliffWorld.png?raw=true\">\n",
"<img alt=\"CliffWorld\" width=\"577\" height=\"308\" src=\"https://github.com/NeuromatchAcademy/course-content/blob/main/tutorials/static/W3D4_Tutorial3_GridWorld410.png?raw=true\">\n",
"\n",
"Given these conditions, the maximum achievable reward is -11 (1 up, 9 right, 1 down). Using negative rewards is a common technique to encourage the agent to move and seek out the goal state as fast as possible."
]
@@ -911,7 +911,7 @@
"\n",
"With a high learning rate, the value function tracks each observed reward, changing quickly whenever there is a reward prediction error. In a probabilistic scenario case, this behavior results in the value function changing too quickly and never stabilizing (converging). Using a low learning rate can stabilize the value function by smoothing out any variation in the reward signal, leading the value function to converge to the average reward over time. However, using a low learning rate can result in slow learning.\n",
"\n",
"To get the best of all worls, it is often useful to use a high learning rate early on (producing fast learning), and to reduce the learning rate gradually throughout learning (so that the value function converges to the average reward). This is sometimes called \"learning rate schedule\"."
"To get the best of all worlds, it is often useful to use a high learning rate early on (producing fast learning), and to reduce the learning rate gradually throughout learning (so that the value function converges to the average reward). This is sometimes called \"learning rate schedule\"."
]
},
{
@@ -351,7 +351,7 @@
"\n",
"For our discussion we will be looking at the classic Cliff World, or Cliff Walker, environment. This is a 4x10 grid with a starting position in the lower-left and the goal position in the lower-right. Every tile between these two is the \"cliff\", and should the agent enter the cliff, they will receive a -100 reward and be sent back to the starting position. Every tile other than the cliff produces a -1 reward when entered. The goal tile ends the episode after taking any action from it.\n",
"\n",
"<img alt=\"CliffWorld\" width=\"577\" height=\"308\" src=\"https://github.com/NeuromatchAcademy/course-content/blob/main/tutorials/static/W3D4_Tutorial3_CliffWorld.png?raw=true\">\n",
"<img alt=\"CliffWorld\" width=\"577\" height=\"308\" src=\"https://github.com/NeuromatchAcademy/course-content/blob/main/tutorials/static/W3D4_Tutorial3_GridWorld410.png?raw=true\">\n",
"\n",
"Given these conditions, the maximum achievable reward is -11 (1 up, 9 right, 1 down). Using negative rewards is a common technique to encourage the agent to move and seek out the goal state as fast as possible."
]
Binary file added tutorials/static/W3D4_Tutorial3_GridWorld410.png
