[coursera] Proofread recaps and week 1 assignments #470
base: coursera
Conversation
Reviewed week 1 assignments. Haven't looked at the recaps yet.
@@ -267,7 +267,7 @@
  " policy[s_i,a_i] ~ #[occurrences of s_i and a_i in elite states/actions]\n",
  "\n",
  " Don't forget to normalize the policy to get valid probabilities and handle the 0/0 case.\n",
- " For states that you never visited, use a uniform distribution (1/n_actions for all states).\n",
+ " For states, that you never visited, use a uniform distribution (1/n_actions for all states).\n",
As far as I remember English punctuation, this comma shouldn't be there.
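For reference, the policy update this cell describes can be sketched in plain Python (hypothetical helper names, not code from the PR): count elite (state, action) occurrences, normalize each row into probabilities, and fall back to a uniform distribution for states that were never visited, which also avoids the 0/0 case.

```python
def update_policy(elite_states, elite_actions, n_states, n_actions):
    """Estimate policy[s][a] from elite (state, action) pairs."""
    counts = [[0] * n_actions for _ in range(n_states)]
    for s, a in zip(elite_states, elite_actions):
        counts[s][a] += 1

    policy = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # never-visited state: uniform distribution avoids 0/0
            policy.append([1.0 / n_actions] * n_actions)
        else:
            policy.append([c / total for c in row])
    return policy
```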
"\n", | ||
"In case CEM failed to learn how to win from one distinct starting point, it will simply discard it because no sessions from that starting point will make it into the \"elites\".\n", | ||
"In case CEM failed to learn, how to win from one distinct starting point, it will simply discard it because no sessions from that starting point will make it into the \"elites\".\n", |
Same for this comma.
"# This code creates a virtual display to draw game images on.\n", | ||
"# It will have no effect if your machine has a monitor.\n", | ||
"# This code creates a virtual display for drawing game images on.\n", | ||
"# It won't have any effect if your machine has a monitor.\n", |
We have this block in (almost?) every notebook. I wouldn't stress that too much, but if this block is changed in one notebook, it should probably be changed similarly in all other notebooks.
Luckily, that can be done quite easily, since this block is absolutely identical across all notebooks.
@@ -82,7 +82,7 @@
  " activation='tanh',\n",
  ")\n",
  "\n",
- "# initialize agent to the dimension of state space and number of actions\n",
+ "# initialize agent to the dimension of state space and a number of actions\n",
It should probably be "the" here, since it's a very specific number of actions.
" * Try re-using samples from 3-5 last iterations when computing threshold and training\n", | ||
" * Experiment with amount of training iterations and learning rate of the neural network (see params)\n", | ||
" * Try re-using samples from 3-5 last iterations when computing threshold and during training\n", | ||
" * Experiment with an amount of training iterations and the learning rate of the neural network (see params)\n", |
"an amount" → "the number"?
@@ -359,7 +361,7 @@
  " ax.set_xlabel('position (x)')\n",
  " ax.set_ylabel('velocity (v)')\n",
  " \n",
- " # Sample a trajectory and draw it\n",
+ " # Sample the trajectory and draw it\n",
I'm reasonably sure that it should be "a trajectory", since we are sampling an arbitrary trajectory, not a specific one.
"* `reset()`: reset environment to the initial state, _return first observation_\n", | ||
"* `render()`: show current environment state (a more colorful version :) )\n", | ||
"* `step(a)`: commit action `a` and return `(new_observation, reward, is_done, info)`\n", | ||
"The three main methods of this environment are:\n", |
It should be "an environment", since there are many possible environments, and the interface is identical for all of them.
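As an aside, the `reset()` / `render()` / `step(a)` interface quoted above is easy to mimic with a toy, made-up environment (a sketch of the gym convention, not gym itself):

```python
class CounterEnv:
    """Toy environment: count up to a target; reward 1.0 on reaching it."""

    def __init__(self, target=3):
        self.target = target
        self.state = 0

    def reset(self):
        # reset environment to the initial state, return first observation
        self.state = 0
        return self.state

    def step(self, a):
        # commit action `a`, return (new_observation, reward, is_done, info)
        if a == 1:
            self.state += 1
        done = self.state >= self.target
        return self.state, (1.0 if done else 0.0), done, {}

    def render(self):
        # show current environment state (text-only here)
        print(f"state: {self.state}/{self.target}")
```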
@@ -119,7 +119,7 @@
  "source": [
  "### Play with it\n",
  "\n",
- "Below is the code that drives the car to the right. However, if you simply use the default policy, the car will not reach the flag at the far right due to gravity.\n",
+ "Below is the code that drives the car to the right. However, if you simply use the default policy, the car won't reach the flag at the far right due to the gravity.\n",
Not sure about this "the". My internal English language model is confused.
" # Your goal is to fix that. You don't need anything sophisticated here,\n", | ||
" # and you can hard-code any policy that seems to work.\n", | ||
" # Hint: think how you would make a swing go farther and faster.\n", | ||
" # Hint: think how you would make a swing go faster and faster.\n", |
Well, we meant "farther" here. This change doesn't alter the meaning much, though.
I will also revert the metadata changes later.
@@ -23,13 +23,13 @@
  "source": [
  "### Part I: Jupyter notebooks in a nutshell\n",
  "* You are reading this line in a jupyter notebook.\n",
The word "Jupyter" should be capitalized throughout.
" * This cell contains hypertext, while the next one contains code.\n", | ||
"* You can __run a cell__ with code by selecting it (click) and pressing `Ctrl + Enter` to execute the code and display the output (if any).\n", | ||
"* If you're running this on a device with no keyboard, ~~you are doing it wrong~~ use the top bar (esp. play/stop/restart buttons) to run the code.\n", | ||
"* Behind the curtains, there's a Python interpreter, that runs the code and remembers everything, that was defined.\n", |
I am reasonably sure that these commas aren't needed.
"* Use top menu → Kernel → Interrupt (or Stop button), if you want to stop running the cell midway.\n", | ||
"* Use top menu → Kernel → Restart (or cyclic arrow button), if interruption doesn't fix the problem (be aware, you will lose all defined variables).\n", |
Same about these commas.
@@ -103,9 +103,9 @@
  "metadata": {},
  "source": [
  "### Part II: Loading data with Pandas\n",
- "Pandas is a library that helps you load the data, prepare it and perform some lightweight analysis. The god object here is the `pandas.DataFrame` - a 2D table with batteries included. \n",
+ "Pandas is the library that helps you to load the data, prepare it and perform some lightweight analysis. The god object here is the `pandas.DataFrame` - a 2D table with included batteries. \n",
Should be "a" here, I think. In addition, "batteries included" is a common expression (e.g. https://www.python.org/dev/peps/pep-0206/).
week1_intro/primer/recap_ml.ipynb
Outdated
"metadata": { | ||
"tags": [] | ||
}, |
I'll remove these changes myself.
@@ -2303,7 +2325,7 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "# plot a histogram of age and a histogram of ticket fares on separate plots\n",
+ "# plot the histogram of age and the histogram of ticket fares on separate plots\n",
I think "a" is fine in both cases here.
@@ -336,7 +339,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "If you compute gradient from multiple losses, the gradients will add up at variables, therefore it's useful to __zero the gradients__ between iteratons."
+ "If you compute gradient from multiple losses, the gradients will be added to variables, therefore, it's useful to __zero the gradients__ between iteratons."
The meaning is different. The gradient isn't added to variables; rather, each variable has an attached field called `.grad`, which is where the gradients accumulate.
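The accumulation semantics can be mimicked in a few lines of plain Python (this is an illustration of the idea, not actual PyTorch internals):

```python
class Param:
    """Stand-in for a trainable variable with an attached .grad field."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0  # gradients accumulate here


def backward_of_square(p):
    # d(x^2)/dx = 2x; each backward pass ADDS into .grad
    p.grad += 2 * p.value


p = Param(3.0)
backward_of_square(p)  # p.grad is now 6.0
backward_of_square(p)  # without zeroing, gradients add up: 12.0
p.grad = 0.0           # the "zero the gradients" step between iterations
backward_of_square(p)  # back to 6.0
```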
"* In general, PyTorch's error messages are quite helpful, read 'em before you google 'em.\n", | ||
"* if you see nan/inf, print what happens at each iteration to find our where exactly it occurs.\n", | ||
"* In general, PyTorch's error messages are quite helpful, read them before you google them.\n", | ||
"* if you see nan/inf, print what happens at each iteration to find out, where exactly it occurs.\n", |
Redundant comma?
"* PyTorch examples - a repo, that implements many cool DL models in PyTorch - [link](https://github.com/pytorch/examples)\n", | ||
"* Practical PyTorch - a repo, that implements some... other cool DL models... yes, in PyTorch - [link](https://github.com/spro/practical-pytorch)\n", |
Two redundant commas?
" \n", | ||
"* If anything seems wrong, try going through one step of training and printing everything you compute.\n", | ||
"* If something seems wrong, try going through each step of training and print everything you compute.\n", |
I think the original text meant "try going through a single training step and print everything you compute".
Reviewed weeks 2 & 3. 4-6 still pending.
@@ -6,9 +6,9 @@
  "source": [
  "### Markov decision process\n",
  "\n",
- "This week's methods are all built to solve __M__arkov __D__ecision __P__rocesses. In the broadest sense, an MDP is defined by how it changes states and how rewards are computed.\n",
+ "This week methods are all built to solve __M__arkov __D__ecision __P__rocesses. In the broadest sense, the MDP is defined by how it changes the states and how rewards are computed.\n",
I think "an MDP" is fine here.
@@ -78,7 +78,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "We can now use MDP just as any other gym environment:"
+ "We can now use the MDP just as any other gym environment:"
Probably `MDP` (written in monospace) would be better here, since it's a name.
"num_iter = 100 # maximum iterations, excluding initialization\n", | ||
"# stop VI if new values are this close to old values (or closer)\n", | ||
"# stop VI if new values are as close to old values (or closer)\n", |
I think this isn't valid English. "As close" implies a continuation (as close as...), which is missing here.
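For reference, the stopping rule this line configures is just a max-difference test between consecutive value tables; a minimal sketch (hypothetical names, values stored as a dict):

```python
def values_converged(old_values, new_values, min_difference=1e-4):
    """Stop VI once new values are this close to old values (or closer)."""
    diff = max(abs(new_values[s] - old_values[s]) for s in old_values)
    return diff < min_difference
```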
@@ -677,7 +677,7 @@
  "source": [
  "### Let's visualize!\n",
  "\n",
- "It's usually interesting to see what your algorithm actually learned under the hood. To do so, we'll plot state value functions and optimal actions at each VI step."
+ "It's usually interesting to see, what your algorithm actually learned under the hood. To do so, we'll plot the state value functions and optimal actions at each VI step."
Redundant comma?
"\n", | ||
"<img src=https://github.com/yandexdataschool/Practical_RL/raw/master/yet_another_week/_resource/exp_replay.png width=480>\n", | ||
"\n", | ||
"#### Training with experience replay\n", | ||
"1. Play game, sample `<s,a,r,s'>`.\n", | ||
"2. Update q-values based on `<s,a,r,s'>`.\n", | ||
"3. Store `<s,a,r,s'>` transition in a buffer. \n", | ||
" 3. If buffer is full, delete earliest data.\n", | ||
"4. Sample K such transitions from that buffer and update q-values based on them.\n", | ||
" 3. If buffer is full, delete the earliest data.\n", |
the buffer?
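Steps 1-4 above boil down to a fixed-capacity buffer with uniform sampling; a minimal sketch (not the assignment's actual class):

```python
import random
from collections import deque


class ReplayBuffer:
    def __init__(self, capacity):
        # deque with maxlen: when the buffer is full, the earliest data is dropped
        self.storage = deque(maxlen=capacity)

    def add(self, s, a, r, s_next):
        self.storage.append((s, a, r, s_next))

    def sample(self, k):
        # uniformly sample up to k stored transitions
        return random.sample(list(self.storage), min(k, len(self.storage)))
```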
" An agent, that changes some of q-learning functions to implement Expected Value SARSA. \n", | ||
" Note: this demo assumes, that your implementation of QLearningAgent.update uses get_value(next_state).\n", |
Redundant commas?
@@ -220,7 +220,7 @@
  "source": [
  "### Cliff World\n",
  "\n",
- "Let's now see how our algorithm compares against q-learning in case where we force agent to explore all the time.\n",
+ "Let's now see, how our algorithm compares against q-learning in case, where we force an agent to explore all the time.\n",
Redundant commas?
I'm also not sure about "an" here — it might have to be "the".
@@ -328,7 +328,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Let's now see what did the algorithms learn by visualizing their actions at every state."
+ "Let's now see, what did the algorithms learn by visualizing their actions at every state."
Redundant comma?
It should also be "what the algorithms learned" instead of "what did the algorithms learn".
@@ -395,7 +395,7 @@
  "\n",
  "Here are some of the things you can do if you feel like it:\n",
  "\n",
- "* Play with epsilon. See learned how policies change if you set epsilon to higher/lower values (e.g. 0.75).\n",
+ "* Play with an epsilon. See how learned policies change if you set an epsilon to the higher/lower values (e.g. 0.75).\n",
It's a very specific epsilon, so it should be "the".
On the other hand, you may regard "epsilon" as a name, in which case there should be no article.
@@ -395,7 +395,7 @@
  "\n",
  "Here are some of the things you can do if you feel like it:\n",
  "\n",
- "* Play with epsilon. See learned how policies change if you set epsilon to higher/lower values (e.g. 0.75).\n",
+ "* Play with an epsilon. See how learned policies change if you set an epsilon to the higher/lower values (e.g. 0.75).\n",
  "* Expected Value SASRSA for softmax policy:\n",
Can you replace "SASRSA" with "SARSA" here?
There are a lot of redundant commas: roughly ⅔ of the comments I made were about them. Other than that, there are only a few problems here and there.
"\n", | ||
"Ideally you should start small with maybe 1-2 hidden layers with < 200 neurons and then increase network size if agent doesn't beat the target score." | ||
"Ideally you should start small, with maybe 1-2 hidden layers with < 200 neurons and then increase the network size, if agent doesn't beat the target score." |
There shouldn't be a comma here, according to this English.SE answer.
"assert isinstance(list(network.modules(\n", | ||
"))[-1], nn.Linear), \"please make sure you predict q-values without nonlinearity (ignore if you know what you're doing)\"\n", | ||
"))[-1], nn.Linear), \"please, make sure you predict q-values without nonlinearity (ignore, if you know, what you're doing)\"\n", |
Similarly, we don't need commas here.
" ) == 2, \"make sure, that you predicted q-values for all actions in next state\"\n", | ||
" assert next_state_values.data.dim(\n", | ||
" ) == 1, \"make sure you computed V(s') as maximum over just the actions axis and not all axes\"\n", | ||
" ) == 1, \"make sure, that you computed V(s') as maximum over just the actions axis and not all axes\"\n", |
Again, it seems that commas are unnecessary here.
"\n", | ||
"Seriously though,\n", | ||
"* __ mean reward__ is the average reward per game. For a correct implementation it may stay low for some 10 epochs, then start growing while oscilating insanely and converges by ~50-100 steps depending on the network architecture. \n", | ||
"* __ mean reward__ is the average reward per game. For the correct implementation it may stay low for some 10 epochs, then start growing, while oscilating insanely and converges by ~50-100 steps depending on the network architecture. \n", |
I think there are a few remaining problems with this line. How about:
__ mean reward__ is the average reward per game. When you implement everything correctly, it may stay low for some 10 epochs, then start growing while oscillating insanely, and converge after around 50-100 epochs depending on the network architecture.
"* If it never reaches target score by the end of for loop, try increasing the number of hidden neurons or look at the epsilon.\n", | ||
"* __ epsilon__ - agent's willingness to explore. If you see that agent's already at < 0.01 epsilon before it's is at least 200, just reset it back to 0.1 - 0.5." | ||
"* __ epsilon__ - agent's willingness to explore. If you see, that agent's already at < 0.01 epsilon before it's is at least 200, just reset it back to 0.1 - 0.5." |
Again, this is a subordinating conjunction that is not at the beginning of the sentence, and the comma is unnecessary. However, parts of the sentence are unclear for other reasons. How about:
__ epsilon__ — the agent's willingness to explore. If you see that the agent is already at epsilon < 0.01 before the score is at least 200, just reset the epsilon back to 0.1 - 0.5.
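For context, the epsilon being discussed controls epsilon-greedy exploration; a minimal sketch (made-up helper, not the assignment's agent):

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore; otherwise exploit the best action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```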
"\n", | ||
"* Optimizer: In the original paper RMSProp was used (they did not have Adam in 2013) and it can work not worse than Adam. For us Adam was default and it worked.\n", | ||
"* Optimizer: In the original paper RMSProp was used (they did not have Adam in 2013) and it can work as good as Adam. For us Adam was a default and it worked.\n", |
"as well as"?
"\n", | ||
"* target network update frequency: has something in common with learning rate. Too frequent updates can lead to divergence. Too rare can lead to slow leraning. For millions of total timesteps thousands of inner steps seem ok. One iteration of target network updating is an iteration of the (this time approximate) $\\gamma$-compression that stands behind Q-learning. The more inner steps it makes the more accurate is the compression.\n", | ||
"* target network update frequency: has something in common with the learning rate. Too frequent updates can lead to divergence. Too rare can lead to slow learning. For millions of total timesteps thousands of inner steps seem ok. One iteration of target network updating is an iteration of (this time approximate) $\\gamma$-compression, that stands behind Q-learning. The more inner steps it makes the more accurate is the compression.\n", |
Redundant comma? (in "compression, that")
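The update scheme being discussed, copying the online network into the target network every so many inner steps, can be sketched like this (weights shown as plain dicts purely for illustration):

```python
def maybe_update_target(step, update_freq, online_weights, target_weights):
    # every update_freq steps, hard-copy the online parameters into the target
    if step % update_freq == 0:
        target_weights.update(online_weights)
    return target_weights
```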
@@ -1217,11 +1215,11 @@
  "If you want to play with DQN a bit more, here's a list of things you can try with it:\n",
  "\n",
  "### Easy:\n",
- "* Implementing __double q-learning__ shouldn't be a problem if you've already have target networks in place.\n",
+ "* Implementing __double q-learning__ shouldn't be a problem, if you already have target networks in place.\n",
Redundant comma?
" * You will probably need `tf.argmax` to select best actions\n", | ||
" * Here's an original [article](https://arxiv.org/abs/1509.06461)\n", | ||
"\n", | ||
"* __Dueling__ architecture is also quite straightforward if you have standard DQN.\n", | ||
"* __Dueling__ architecture is also quite straightforward, if you have standard DQN.\n", |
Redundant comma?
"* You can now sample such transitions in proportion to the error (see [article](https://arxiv.org/abs/1511.05952)) for training.\n", | ||
"\n", | ||
"It's probably more convenient to explicitly declare inputs for \"sample observations\", \"sample actions\" and so on to plug them into q-learning.\n", | ||
"\n", | ||
"Prioritized (and even normal) experience replay should greatly reduce amount of game sessions you need to play in order to achieve good performance. \n", | ||
"Prioritized (and even normal) experience replay should greatly reduce an amount of game sessions you need to play in order to achieve good performance. \n", |
"an amount" → "the number"