diff --git a/2023/tama/griddy/index.html b/2023/tama/griddy/index.html
index f736f0e..2a85205 100644
--- a/2023/tama/griddy/index.html
+++ b/2023/tama/griddy/index.html
@@ -91,7 +91,7 @@
Anyway, the most straightforward solution would be to just do it as stated, a.k.a., brute force: enumerate all \(2^{rc}\) grids, compute \(B(G)\) for each of them, and then sum up all these \(B(G)^3\). Enumerating grids is relatively straightforward with backtracking, and for the first subtask, \(2^{rc} = 2^{25} = 33554432\) which is quite manageable for a computer. The only missing ingredient to fully implement this solution is being able to compute \(B(G)\) for a given grid \(G\).
We are given a grid with \(r\) rows and \(c\) columns, and we want to make it based, i.e., change it so that every row and every column has an even number of cringe memes.
-Let’s convert a cool meme (🗿) into a \(0\) and a cringe meme (😬) into a \(1\), so the condition translates to: the sum of every row and every column is even.
+Let’s convert a cool meme (🗿) into a \(0\) and a cringe meme (😬) into a \(1\), so the condition translates to: the sum of every row and every column is even.
Now, the effect of flipping a cell is to flip the parity of exactly one row and exactly one column, namely the row and column containing the cell. Thus, if there are \(R\) odd rows, then you need at least \(R\) flips to make all these odd rows even. Similarly, if there are \(C\) odd columns, then you need at least \(C\) flips. Combining these tells us that we need \(\max(R, C)\) or more moves to make the grid based.
On the other hand, for every move, we can choose the row and column to flip independently. Thus, it seems intuitive that \(\max(R, C)\) moves are enough. And indeed, it is:
-Remark: You can also just compute the Lucas numbers modulo \(2m\); that way, reducing them modulo \(m\) and \(2\) is still valid. (Can you see why?)
+Remark: You could also just compute the Lucas numbers modulo \(2m\); that way, reducing them modulo \(m\) and \(2\) is still valid. (Can you see why?)
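To make the \(\max(R, C)\) derivation concrete, here is a minimal sketch of computing \(B(G)\) this way (my own illustration, not code from the repository), assuming the grid is given as a list of 0/1 rows:

```python
def based_cost(grid):
    # grid: list of rows, each a list of 0s (cool) and 1s (cringe)
    odd_rows = sum(sum(row) % 2 for row in grid)        # R: rows with odd sum
    odd_cols = sum(sum(col) % 2 for col in zip(*grid))  # C: columns with odd sum
    return max(odd_rows, odd_cols)
```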
-In this editorial, I'll pretend that we're computing the full answer (a rational number) instead of the answer modulo \(998244353\). It turns out that many exact solutions can be adapted to compute the answer modulo something instead. A bonus section below describes how to do it.
+In this editorial, I’ll pretend that we’re computing the full answer (a rational number) instead of the answer modulo \(998244353\). It turns out that many exact solutions can be adapted to compute the answer modulo something instead. A bonus section below describes how to do it.
Subtask 1
-For Subtask 1, I'll describe a solution that doesn't use a lot of insights and essentially only uses dynamic programming (DP) (aside from the definition of expected value). You could also solve this subtask with pen and paper by using the solution for Subtask 2, which is perfectly doable by hand (and easier to implement as well).
+For Subtask 1, I’ll describe a solution that doesn’t use a lot of insights and essentially only uses dynamic programming (DP) (aside from the definition of expected value). You could also solve this subtask with pen and paper by using the solution for Subtask 2, which is perfectly doable by hand (and easier to implement as well).
If you have some sort of “random variable” \(X\), then we say that the expected value of \(X\), denoted \(\operatorname{E}[X]\), is the weighted sum of the possible results of \(X\), weighted by their probabilities. More formally, if the possible results are \(\{x_1, x_2, \ldots, x_k\}\) with respective probabilities \(p_1, p_2, \ldots, p_k\), then \[\operatorname{E}[X] := p_1x_1 + p_2x_2 + \ldots + p_kx_k,\] or in summation notation, \[\operatorname{E}[X] := \sum_{i=1}^k p_ix_i.\] The expected value of \(X\) can be thought of as the average value of \(X\) when the experiment is performed many, many times, averaging the value of \(X\) across all those runs.
Here are some examples:
-So let's define a random variable \(T\) representing the result of the process outlined in the problem statement. The process chooses \(w\) numbers randomly1 between \(1\) and \(k\), and \(T\) is calculated as the sum of the \(n\) largest elements, so the possible results are between \(n\) and \(nk\). If we write the probability of obtaining the result \(t\) as \(p_t\), then the answer is \[\operatorname{E}[T] = \sum_{t=n}^{nk}\, p_t\,t.\] So we are done if we can compute \(p_t\) for each \(t\) from \(n\) to \(nk\).
+So let’s define a random variable \(T\) representing the result of the process outlined in the problem statement. The process chooses \(w\) numbers randomly1 between \(1\) and \(k\), and \(T\) is calculated as the sum of the \(n\) largest elements, so the possible results are between \(n\) and \(nk\). If we write the probability of obtaining the result \(t\) as \(p_t\), then the answer is \[\operatorname{E}[T] = \sum_{t=n}^{nk}\, p_t\,t.\] So we are done if we can compute \(p_t\) for each \(t\) from \(n\) to \(nk\).
Now, the process has \(k^w\) possible outcomes—namely all the sequences of length \(w\), each element of which is between \(1\) and \(k\)—and each of those outcomes is equally likely. Therefore, we can simply count the number of outcomes that result in a sum of \(t\), then divide by \(k^w\) to get the probability. If we write the number of sequences whose sum of \(n\) largest elements is \(t\) as \(c_t\), then we simply have \[p_t = \frac{c_t}{k^w}.\]
-So we've now reduced the problem to computing the \(c_t\)s. Now, a sum of \(t\) can arise in multiple ways. For example, if \(n = 3\) and \(t = 10\), then the top \(3\) values of the sequence (each in sorted order) could be \([2, 3, 5]\), or it could be \([2, 4, 4]\), or \([1, 1, 8]\), or something else. So, to count the number of sequences whose sum of \(n\) largest elements is \(t\), we need to enumerate all possible sequences of top \(n\) values whose sum is \(t\), and for each one, count the number of sequences of length \(w\) whose sequence of top \(n\) values is that sequence.
-If that's confusing, let's formalize a bit. Let's define a winner sequence as a sorted sequence of \(n\) values, each of which is between \(1\) and \(k\). Winner sequences are exactly the possible “sequences of \(n\) largest values”. Now, if \(W\) is a winner sequence, let's define \(c(W, w)\) as the number of length-\(w\) sequences whose sequence of \(n\) largest values is \(W\). Then you may check that the following equation holds \[c_t = \sum_{\substack{\text{$W$ is a winner sequence} \\ \mathit{sum}(W) = t}} c(W, w).\] Thus, we've further reduced the problem to that of computing \(c(W, w)\) across all winner sequences \(W\). And as it turns out, for Subtask 1, there aren't that many winner sequences. We can see this by simply enumerating them all (say with a computer). Finding a formula for the number of them isn't that hard either:
+So we’ve now reduced the problem to computing the \(c_t\)s. Now, a sum of \(t\) can arise in multiple ways. For example, if \(n = 3\) and \(t = 10\), then the top \(3\) values of the sequence (each in sorted order) could be \([2, 3, 5]\), or it could be \([2, 4, 4]\), or \([1, 1, 8]\), or something else. So, to count the number of sequences whose sum of \(n\) largest elements is \(t\), we need to enumerate all possible sequences of top \(n\) values whose sum is \(t\), and for each one, count the number of sequences of length \(w\) whose sequence of top \(n\) values is that sequence.
+If that’s confusing, let’s formalize a bit. Let’s define a winner sequence as a sorted sequence of \(n\) values, each of which is between \(1\) and \(k\). Winner sequences are exactly the possible “sequences of \(n\) largest values”. Now, if \(W\) is a winner sequence, let’s define \(c(W, w)\) as the number of length-\(w\) sequences whose sequence of \(n\) largest values is \(W\). Then you may check that the following equation holds \[c_t = \sum_{\substack{\text{$W$ is a winner sequence} \\ \mathit{sum}(W) = t}} c(W, w).\] Thus, we’ve further reduced the problem to that of computing \(c(W, w)\) across all winner sequences \(W\). And as it turns out, for Subtask 1, there aren’t that many winner sequences. We can see this by simply enumerating them all (say with a computer). Finding a formula for the number of them isn’t that hard either:
Exercise: Show that the number of winner sequences is exactly \(\binom{n + k - 1}{n}\).
For Subtask 1, \(n = 5\) and \(k = 5\), so \(\binom{n + k - 1}{n} = 126\), so there are indeed only a few of them.
Thinking “DP-cally”, we now attempt to build the length-\(w\) sequence element by element. As we build the sequence, its “sequence of \(n\) largest elements” changes as well.
-Let's be more precise. For a sequence \(S\), let's call the “sequence of \(n\) largest elements of \(S\)” its winning sequence, and denote it by \(W_S\). Now, suppose we insert the value \(v\) to \(S\). Let's denote the updated sequence by \(S + [v]\). Then the winning sequence might change because of \(v\). Specifically, the new winning sequence is obtained by inserting \(v\) to \(W_S\) in its proper sorted location, and then dropping the lowest element. (Can you see why?) Let's denote the process of “inserting a value \(v\) to a sequence \(W\) in its proper sorted location, and then dropping the lowest element” as a pushpop operation, and denote it by \(\mathit{pushpop}(W, v)\). Then what we're saying is that the winning sequence of \(S + [v]\) is related to the winning sequence of \(S\) via a pushpop operation—specifically, \[W_{S + [v]} = \mathit{pushpop}(W_S, v).\]
-We can now think recursively, and find a recurrence for \(c(W, w)\), as follows. Every sequence of length \(w\) can be obtained by taking a sequence \(S\) of length \(w - 1\) and then appending some value \(v\) (between \(1\) to \(k\)) to it. And as described above, the new winning sequence \(W_{S + [v]}\) is just \(\mathit{pushpop}(W_S, v)\). Notice that this latter expression only depends on \(W_S\), not on \(S\) itself. Thus, for each possible winner sequence \(W'\), we could simply collect the sequences \(S\) with \(W'\) as their winning sequence, and notice that the new winning sequence must be \(\mathit{pushpop}(W', v)\). In other words, we have the equation \[c(W, w) = \!\!\!\!\sum_{\substack{W'\,\,\,\, \\ \text{$W'$ is a winner sequence}}} \sum_{\substack{1 \le v \le k \,\,\,\, \\ \mathit{pushpop}(W', v) = W}} \!\!\!\!(\text{number of sequences $S$ of length $w - 1$ whose winning sequence is $W'$}).\] But the summand is just \(c(W', w - 1)\) by definition! Therefore, we obtain the recurrence \[c(W, w) = \sum_{\substack{W'\,\,\,\, \\ \text{$W'$ is a winner sequence}}} \sum_{\substack{1 \le v \le k \,\,\,\, \\ \mathit{pushpop}(W', v) = W}} c(W', w - 1),\] and we can use this to compute all \(c(W, w')\) we need, via DP: we build a table of results, one for each winner sequence \(W\) and each \(w' \le w\). Each entry of the table can be computed using the summation above. Since our formula for \(c(W, w')\) only depends on \(c(W', w' - 1)\), i.e., those with a smaller \(w'\) value, if we compute the table in increasing order of \(w'\), those values have already been computed, and are already on the table. Thus, we'll be able to compute the final result all the way up to \(w\), which is what we wanted.
+Let’s be more precise. For a sequence \(S\), let’s call the “sequence of \(n\) largest elements of \(S\)” its winning sequence, and denote it by \(W_S\). Now, suppose we insert the value \(v\) to \(S\). Let’s denote the updated sequence by \(S + [v]\). Then the winning sequence might change because of \(v\). Specifically, the new winning sequence is obtained by inserting \(v\) to \(W_S\) in its proper sorted location, and then dropping the lowest element. (Can you see why?) Let’s denote the process of “inserting a value \(v\) to a sequence \(W\) in its proper sorted location, and then dropping the lowest element” as a pushpop operation, and denote it by \(\mathit{pushpop}(W, v)\). Then what we’re saying is that the winning sequence of \(S + [v]\) is related to the winning sequence of \(S\) via a pushpop operation—specifically, \[W_{S + [v]} = \mathit{pushpop}(W_S, v).\]
+We can now think recursively, and find a recurrence for \(c(W, w)\), as follows. Every sequence of length \(w\) can be obtained by taking a sequence \(S\) of length \(w - 1\) and then appending some value \(v\) (between \(1\) and \(k\)) to it. And as described above, the new winning sequence \(W_{S + [v]}\) is just \(\mathit{pushpop}(W_S, v)\). Notice that this latter expression only depends on \(W_S\), not on \(S\) itself. Thus, for each possible winner sequence \(W'\), we could simply collect the sequences \(S\) with \(W'\) as their winning sequence, and notice that the new winning sequence must be \(\mathit{pushpop}(W', v)\). In other words, we have the equation \[c(W, w) = \!\!\!\!\sum_{\substack{W'\,\,\,\, \\ \text{$W'$ is a winner sequence}}} \sum_{\substack{1 \le v \le k \,\,\,\, \\ \mathit{pushpop}(W', v) = W}} \!\!\!\!(\text{number of sequences $S$ of length $w - 1$ whose winning sequence is $W'$}).\] But the summand is just \(c(W', w - 1)\) by definition! Therefore, we obtain the recurrence \[c(W, w) = \sum_{\substack{W'\,\,\,\, \\ \text{$W'$ is a winner sequence}}} \sum_{\substack{1 \le v \le k \,\,\,\, \\ \mathit{pushpop}(W', v) = W}} c(W', w - 1),\] and we can use this to compute all \(c(W, w')\) we need, via DP: we build a table of results, one for each winner sequence \(W\) and each \(w' \le w\). Each entry of the table can be computed using the summation above. Since our formula for \(c(W, w')\) only depends on \(c(W', w' - 1)\), i.e., those with a smaller \(w'\) value, if we compute the table in increasing order of \(w'\), those values have already been computed, and are already on the table. Thus, we’ll be able to compute the final result all the way up to \(w\), which is what we wanted.
Now, as for the base case, you could just directly count the sequences for, say, \(w' = n\), since the winning sequence is basically the sorted version of the sequence itself. Alternatively, we can use \(w' = 0\) as our base case, though we need to think about what the winning sequence of a sequence with fewer than \(n\) elements should be. Well, it makes sense to say that the winning sequence must be the whole sequence as well, just sorted. And instead of a pushpop operation, we could simply use a push operation, at least while the sequence still has length less than \(n\).
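As an illustration, here is a small sketch of the pushpop operation (my own; winner sequences represented as sorted tuples that already have \(n\) elements):

```python
import bisect

def pushpop(W, v):
    # insert v into the sorted sequence W, then drop the smallest element
    W2 = list(W)
    bisect.insort(W2, v)
    return tuple(W2[1:])
```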
-With this, we now have a solution! What's the running time? Well, the table has an entry for each \((W, w')\) with \(W\) a winner sequence and \(w' \le w\). Recall that there are \(\binom{n + k - 1}{n}\) winner sequences, so there are \(\approx \binom{n + k - 1}{n}w\) entries. Each entry is computed with the sum above, which clearly has at most \(\binom{n + k - 1}{n}k\) summands (often much less). Therefore, the amount of steps is roughly proportional to \[\approx \binom{n + k - 1}{n}w\cdot \binom{n + k - 1}{n}k = \binom{n + k - 1}{n}^2 wk.\] For Subtask 1, this is good enough; my straightforward Python implementation computes the full answer in less than one second.
+With this, we now have a solution! What’s the running time? Well, the table has an entry for each \((W, w')\) with \(W\) a winner sequence and \(w' \le w\). Recall that there are \(\binom{n + k - 1}{n}\) winner sequences, so there are \(\approx \binom{n + k - 1}{n}w\) entries. Each entry is computed with the sum above, which clearly has at most \(\binom{n + k - 1}{n}k\) summands (often much less). Therefore, the number of steps is roughly \[\binom{n + k - 1}{n}w\cdot \binom{n + k - 1}{n}k = \binom{n + k - 1}{n}^2 wk.\] For Subtask 1, this is good enough; my straightforward Python implementation computes the full answer in less than one second.
Note: Understanding this implementation is not required to understand the following sections, so you may skip it.
Remark: The implementation tries to copy our formulas above as closely as possible. As a result, it's highly unoptimized, and there are definitely several improvements that be made. But the main point is that even such unoptimized code is enough to solve the subtask.
+Remark: The implementation tries to copy our formulas above as closely as possible. As a result, it’s highly unoptimized, and there are definitely several improvements that could be made. But the main point is that even such unoptimized code is enough to solve the subtask.
Subtask 2
-The first one is quite intuitive; after all, \(\alpha X\) is just \(X\) with all values scaled by \(\alpha\), so the average should just be scaled in the same way. However, the second property—additivity—may be surprising. The property could be intuitive in the case where \(X_1\) and \(X_2\) are independent, but linearity doesn't require them to be—it's simply always true!
-In a bonus section below, we'll explain why this is true, but for now, let's first try to apply this to the problem. Let \(T\) be the same random variable as before, so it denotes the sum of the \(n\) largest values of the sequence produced. Now, we define \(n\) new random variables \(T_1, T_2, \ldots T_n\), where \(T_i\) denotes the \(i\)th largest value of the sequence. Then clearly we have \[T = T_1 + T_2 + \ldots + T_n = \sum_{i=1}^n T_i.\] Now, the \(T_i\)'s are definitely not independent, e.g., knowing the largest value constrains the possible values of the second value, and vice versa. Regardless, expectation is always additive, so we have the equality \[\operatorname{E}[T] = \operatorname{E}[T_1] + \operatorname{E}[T_2] + \ldots + \operatorname{E}[T_n] = \sum_{i=1}^n \operatorname{E}[T_i].\] Thus, we've reduced the problem to computing \(\operatorname{E}[T_i]\) for \(1 \le i \le n\), which is potentially more manageable!
+The first one is quite intuitive; after all, \(\alpha X\) is just \(X\) with all values scaled by \(\alpha\), so the average should just be scaled in the same way. However, the second property—additivity—may be surprising. The property could be intuitive in the case where \(X_1\) and \(X_2\) are independent, but linearity doesn’t require them to be—it’s simply always true!
+In a bonus section below, we’ll explain why this is true, but for now, let’s first try to apply this to the problem. Let \(T\) be the same random variable as before, so it denotes the sum of the \(n\) largest values of the sequence produced. Now, we define \(n\) new random variables \(T_1, T_2, \ldots, T_n\), where \(T_i\) denotes the \(i\)th largest value of the sequence. Then clearly we have \[T = T_1 + T_2 + \ldots + T_n = \sum_{i=1}^n T_i.\] Now, the \(T_i\)’s are definitely not independent, e.g., knowing the largest value constrains the possible values of the second largest, and vice versa. Regardless, expectation is always additive, so we have the equality \[\operatorname{E}[T] = \operatorname{E}[T_1] + \operatorname{E}[T_2] + \ldots + \operatorname{E}[T_n] = \sum_{i=1}^n \operatorname{E}[T_i].\] Thus, we’ve reduced the problem to computing \(\operatorname{E}[T_i]\) for \(1 \le i \le n\), which is potentially more manageable!
-Let's now try to compute \(\operatorname{E}[T_i]\), the expected value of the \(i\)th largest element of the sequence. The possible values are between \(1\) and \(k\), so by definition, we have \[\operatorname{E}[T_i] = \sum_{v=1}^k \operatorname{P}[T_i = v]\cdot v,\] where \(\operatorname{P}[T_i = v]\) denotes the probability that \(T_i = v\). Next, we again turn probability into counting; noting that there are \(k^w\) equally likely possibilities, we have something like \[\operatorname{P}[T_i = v] = \frac{\mathit{count}_{=v}(i)}{k^w}\] where \(\mathit{count}_{=v}(i)\) denotes the number of sequences whose \(i\)th largest value is \(v\). Thus, we're done if we can compute \(\mathit{count}_{=v}(i)\).
+Let’s now try to compute \(\operatorname{E}[T_i]\), the expected value of the \(i\)th largest element of the sequence. The possible values are between \(1\) and \(k\), so by definition, we have \[\operatorname{E}[T_i] = \sum_{v=1}^k \operatorname{P}[T_i = v]\cdot v,\] where \(\operatorname{P}[T_i = v]\) denotes the probability that \(T_i = v\). Next, we again turn probability into counting; noting that there are \(k^w\) equally likely possibilities, we have something like \[\operatorname{P}[T_i = v] = \frac{\mathit{count}_{=v}(i)}{k^w}\] where \(\mathit{count}_{=v}(i)\) denotes the number of sequences whose \(i\)th largest value is \(v\). Thus, we’re done if we can compute \(\mathit{count}_{=v}(i)\).
We can compute \(\mathit{count}_{=v}(i)\) by noting that:
Thus, all in all, there are \[c(\ell, g, v) = \binom{w}{\ell} \cdot \binom{w - \ell}{g} \cdot (v-1)^{\ell} \cdot (k-v)^g\] such sequences.
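In code, the formula might look like this sketch (the argument names mirror the text; math.comb needs Python 3.8+):

```python
from math import comb

def c(ell, g, v, w, k):
    # ell elements below v, g elements above v, the remaining w - ell - g equal v
    return comb(w, ell) * comb(w - ell, g) * (v - 1)**ell * (k - v)**g
```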
We now have a complete solution! How fast does it run? Well, we need to compute \(\operatorname{E}[T_i]\) for \(1 \le i \le n\), which in turn require the values \(\mathit{count}_{=v}(i)\) for \(1 \le i \le n\) and \(1 \le v \le k\), which in turn require the values \(c(\ell, g, v)\) for \(0 \le \ell \le w - 1\), \(0 \le g \le n - 1\) and \(1 \le v \le k\).
-Each \(c(\ell, g, v)\) value is a product of some binomial coefficients and powers. The powers can all be computed with fast exponentiation, or they could just be precomputed in a table at the beginning (since all powers we need have bases less than \(k\), and exponents less than \(w\)), and the binomial coefficients can also be precomputed in a table, either via Pascal's identity, or precomputing factorials and using \[\binom{a}{b} = \frac{a!}{(a - b)!b!}.\] Therefore, we could say that each \(c(\ell, g, v)\) can be computed in a constant amount of steps, and since there are \(\approx wnk\) of them, the total number of steps to compute them all is \(\approx wnk\).
+Each \(c(\ell, g, v)\) value is a product of some binomial coefficients and powers. The powers can all be computed with fast exponentiation, or they could just be precomputed in a table at the beginning (since all powers we need have bases less than \(k\), and exponents less than \(w\)), and the binomial coefficients can also be precomputed in a table, either via Pascal’s identity, or precomputing factorials and using \[\binom{a}{b} = \frac{a!}{(a - b)!b!}.\] Therefore, we could say that each \(c(\ell, g, v)\) can be computed in a constant amount of steps, and since there are \(\approx wnk\) of them, the total number of steps to compute them all is \(\approx wnk\).
To compute the \(\mathit{count}_{=v}(i)\) values, note that there are \(kn\) such values, and each one is computed with a summation with \(\approx wn\) summands. Therefore, it takes \(\approx wn^2 k\) steps to compute them all.
The formula for \(\operatorname{E}[T]\) has \(n\) summands, each of which has a formula with \(k\) summands, so this takes \(\approx nk\) steps.
Finally, we also need to account for the precomputation of factorials and powers. There are \(\approx w\) factorials and \(\approx kw\) powers to precompute, so their precomputation takes \(\approx kw\) steps.
-Thus, the running time is dominated by the computation of \(\mathit{count}_{=v}(i)\). For Subtask 2, we have \(wn^2 k = 6\cdot 10^9\), so the number of steps seems small enough for this to be waitable if you use a fast language and a highly optimized implementation. It may be slow though, so instead of that, let's just improve our algorithm further.
+Thus, the running time is dominated by the computation of \(\mathit{count}_{=v}(i)\). For Subtask 2, we have \(wn^2 k = 6\cdot 10^9\), so the number of steps seems small enough for this to be waitable if you use a fast language and a highly optimized implementation. It may be slow though, so instead of that, let’s just improve our algorithm further.
-Let's look at \(\mathit{count}_{=v}(i)\) again. It denotes the number of sequences whose \(i\)th largest value is exactly \(v\). It turns out that it's easier to count the number of sequences whose \(i\)th largest value is at most \(v\). Even more nicely, it turns out that you can use the latter to compute the former!
-To see this, let's define \(\mathit{count}_{\le v}(i)\) to be the number of sequences whose \(i\)th largest value is at most \(v\). Then we easily have:
+Let’s look at \(\mathit{count}_{=v}(i)\) again. It denotes the number of sequences whose \(i\)th largest value is exactly \(v\). It turns out that it’s easier to count the number of sequences whose \(i\)th largest value is at most \(v\). Even more nicely, it turns out that you can use the latter to compute the former!
+To see this, let’s define \(\mathit{count}_{\le v}(i)\) to be the number of sequences whose \(i\)th largest value is at most \(v\). Then we easily have:
Claim: \(\mathit{count}_{=v}(i) = \mathit{count}_{\le v}(i) - \mathit{count}_{\le v - 1}(i)\).
Proof: Left as an exercise to the reader.
Theorem 2: The \(i\)th largest value of a sequence is at most \(v\) if and only if the sequence has \(< i\) elements greater than \(v\).
We can now use a similar counting argument as before. Let \(g\) be the number of elements greater than \(v\), so that \(g < i\), and we can again write \[\mathit{count}_{\le v}(i) = \sum_{g=0}^{i-1} c(g, v)\] where now, \(c(g, v)\) denotes the number of sequences with exactly \(g\) elements greater than \(v\). Then we can count \(c(g, v)\) similarly to before, except it’s even simpler:
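Presumably (this matches Hint 2 below), \(c(g, v) = \binom{w}{g}(k - v)^g v^{w - g}\): choose which \(g\) positions exceed \(v\), and the rest stay at most \(v\). A sketch of the resulting computation:

```python
from math import comb

def count_le(i, v, w, k):
    # sequences whose i-th largest value is at most v:
    # fewer than i elements exceed v
    return sum(comb(w, g) * (k - v)**g * v**(w - g) for g in range(i))
```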
-The main change from Subtask 2 to Subtask 3 is that \(w\) is vastly increased, which means the portion of our previous algorithm that takes \(\approx wk\) steps is now unacceptable. Let's recap what those steps are:
+The main change from Subtask 2 to Subtask 3 is that \(w\) is vastly increased, which means the portion of our previous algorithm that takes \(\approx wk\) steps is now unacceptable. Let’s recap what those steps are:
-Among these, the second one clearly dominates the running time. But we can essentially get rid of the second one by simply not precomputing powers, and instead just fast exponentiation to compute them when needed! This makes the running time slightly worse—fast exponentiation takes \(\mathcal{O}(\lg w)\) steps for an exponent the size of \(w\)—but that's a very worthwhile tradeoff, because you can check that the number of steps improves from \(\mathcal{O}(wk + n^2 k)\) to \[\mathcal{O}(w + n^2 k + nk \lg w).\] This is now acceptable for Subtask 3 🙂.
-Now, there's still that factor \(w\) in the running time, which in the current subtask is probably ok since \(w = 10^8\). However, in later subtasks, \(w = 10^{16}\), which suggests that that bit can still be improved further.
-How can we improve it? Well, the main reason for needing factorials up to \(w\) is so that we can compute binomial coefficients. But looking closer, notice that we actually only need binomial coefficients at exactly row \(w\). Furthermore, we actually only need the first \(n\) coefficients in it. And as it turns out, there's a way to compute a row of binomial coefficients one by one, starting from the leftmost one, by using the following recurrence (which is easy to prove using the factorial formula): \[\binom{w}{g} = \binom{w}{g - 1}\cdot \frac{w - g + 1}{g},\] with base case simply \(\binom{w}{0} = 1\). So now, instead of precomputing factorials, we may simply precompute the needed binomial coefficients using this recurrence with just \(\approx n\) steps! The running time then improves to \[\mathcal{O}(n^2 k + nk \lg w),\] which is really cool.
+Among these, the second one clearly dominates the running time. But we can essentially get rid of the second one by simply not precomputing powers, and instead just using fast exponentiation to compute them when needed! This makes the running time slightly worse—fast exponentiation takes \(\mathcal{O}(\lg w)\) steps for an exponent the size of \(w\)—but that’s a very worthwhile tradeoff, because you can check that the number of steps improves from \(\mathcal{O}(wk + n^2 k)\) to \[\mathcal{O}(w + n^2 k + nk \lg w).\] This is now acceptable for Subtask 3 🙂.
+Now, there’s still that term \(w\) in the running time, which in the current subtask is probably ok since \(w = 10^8\). However, in later subtasks, \(w = 10^{16}\), which suggests that that bit can still be improved further.
+How can we improve it? Well, the main reason for needing factorials up to \(w\) is so that we can compute binomial coefficients. But looking closer, notice that we actually only need binomial coefficients at exactly row \(w\). Furthermore, we actually only need the first \(n\) coefficients in it. And as it turns out, there’s a way to compute a row of binomial coefficients one by one, starting from the leftmost one, by using the following recurrence (which is easy to prove using the factorial formula): \[\binom{w}{g} = \binom{w}{g - 1}\cdot \frac{w - g + 1}{g},\] with base case simply \(\binom{w}{0} = 1\). So now, instead of precomputing factorials, we may simply precompute the needed binomial coefficients using this recurrence with just \(\approx n\) steps! The running time then improves to \[\mathcal{O}(n^2 k + nk \lg w),\] which is really cool.
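A sketch of that precomputation, working modulo the prime \(m\) so that dividing by \(g\) becomes multiplying by its inverse (pow(g, -1, m) needs Python 3.8+):

```python
def binom_row(w, n, m=998244353):
    # first n entries of row w of Pascal's triangle, mod m
    row = [1]
    for g in range(1, n):
        row.append(row[-1] * ((w - g + 1) % m) % m * pow(g, -1, m) % m)
    return row
```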
Subtasks 4 & 5
-Our previous algorithm is now too slow; in particular, that \(\mathcal{O}(n^2 k)\) bit in the running time is now too large. For the rest of the subtasks, I'll just give a couple of hints to guide you towards faster solutions.
+Our previous algorithm is now too slow; in particular, that \(\mathcal{O}(n^2 k)\) bit in the running time is now too large. For the rest of the subtasks, I’ll just give a couple of hints to guide you towards faster solutions.
Hint 1
Do you really have to compute the whole sum \[\mathit{count}_{\le v}(i) = \sum_{g=0}^{i-1} c(g, v)\] every time?
Hint 2
Notice that \[(k - v)^g\cdot v^{w - g} = v^w \cdot \left(\frac{k - v}{v}\right)^g.\] Letting \(x_v := \frac{k - v}{v}\), this is the same as \(v^w x_v^g\).
The first one is simple enough, and you should be able to prove it yourself 🙂. The real surprise is the second, which holds even if \(X_1\) and \(X_2\) are not independent. (For independent variables, this may not be a surprise, since “clearly” the variables have nothing to do with each other,2 so the averages should “just add up.”)
-Let's see an example of this, using our current problem itself, with \(n = 2\), \(w = 3\) and \(k = 2\). In this case, we have \[T = T_1 + T_2\] where \(T_i\) is the value of the \(i\)th largest element. Clearly, \(T_1\) and \(T_2\) are not independent; for example, we know that \(T_1\) is at least \(T_2\), so if \(T_2\) is \(2\), then \(T_1\) must be \(2\) as well.
+Let’s see an example of this, using our current problem itself, with \(n = 2\), \(w = 3\) and \(k = 2\). In this case, we have \[T = T_1 + T_2\] where \(T_i\) is the value of the \(i\)th largest element. Clearly, \(T_1\) and \(T_2\) are not independent; for example, we know that \(T_1\) is at least \(T_2\), so if \(T_2\) is \(2\), then \(T_1\) must be \(2\) as well.
Regardless, we will now illustrate that \[\operatorname{E}[T] = \operatorname{E}[T_1] + \operatorname{E}[T_2]\] by simply enumerating all \(2^3 = 8\) possible sequences:
-But actually, this little calculation illustrates pretty well why expectation is additive; we're simply adding the same things in different ways! To illustrate this further, we can tabulate everything as follows: \[\begin{array}{l|l|lll}
+But actually, this little calculation illustrates pretty well why expectation is additive; we’re simply adding the same things in different ways! To illustrate this further, we can tabulate everything as follows: \[\begin{array}{l|l|lll}
s & p_s & T_1 & T_2 & T \\
\hline
[1, 1, 1] & \frac{1}{8} & 1 & 1 & 2 \\
@@ -326,8 +326,8 @@
The \(p_sT\) column is still the sum of the \(p_sT_1\) and \(p_sT_2\) columns. Finally, computing \(\operatorname{E}[T]\) amounts to taking the sum of the \(p_sT\) column, while computing \(\operatorname{E}[T_1] + \operatorname{E}[T_2]\) amounts to taking the sum of the \(p_sT_1\) and \(p_sT_2\) columns separately, then adding them. But these are clearly the same! (And this worked even if \(T_1\) and \(T_2\) aren't independent.)
-It should now not be too hard to formalize this argument and make it more general. If you're interested, here it is:
+It should now not be too hard to formalize this argument and make it more general. If you’re interested, here it is:
Proof
Suppose the sample space has \(k\) elements \(\{\omega_1, \omega_2, \ldots, \omega_k\}\) with respective probabilities \(p_1, p_2, \ldots, p_k\). Because \(T = T_1 + T_2\), we must always have \(T(\omega_i) = T_1(\omega_i) + T_2(\omega_i),\) for every \(i\).
Thus, by the law of the unconscious statistician, \[\begin{align*}
\operatorname{E}[T]
@@ -346,16 +346,16 @@ Computing
All solutions we described above compute the full answer, i.e., we pretend we are working in \(\mathbb{R}\) (or maybe \(\mathbb{C}\)), where we can add, subtract, multiply, and, crucially, divide numbers. Actually, we could also pretend we are working in \(\mathbb{Q}\), i.e., the rationals, since all intermediate results are clearly rational, and we can also do the same arithmetic operations there.
-Now, in many problems, we can usually convert such full-answer solutions into solutions that compute the answer mod \(m\), say \(m = 998244353\), because we can also add, subtract and multiply numbers mod \(m\). However, division mod \(m\) is more complicated; it sometimes doesn't work at all. To see this, let \(m = 10\), and note that \(12 \equiv 32 \pmod{10}\), but dividing by \(4\) fails: \[\frac{12}{4} = 3 \not\equiv 8 = \frac{32}{4} \pmod{10}.\]
+Now, in many problems, we can usually convert such full-answer solutions into solutions that compute the answer mod \(m\), say \(m = 998244353\), because we can also add, subtract and multiply numbers mod \(m\). However, division mod \(m\) is more complicated; it sometimes doesn’t work at all. To see this, let \(m = 10\), and note that \(12 \equiv 32 \pmod{10}\), but dividing by \(4\) fails: \[\frac{12}{4} = 3 \not\equiv 8 = \frac{32}{4} \pmod{10}.\]
-Before we tackle this issue, let's first see if we can compute \(a/b \bmod m\) based solely on the definition given in the problem statement. Suppose you've computed the full answer as \(a/b\), and let's say it's in lowest terms. Then the problem guarantees us that \(a/b \bmod m\) is well-defined, and it is the unique number \(q\) such that “\(a/b - q = \frac{a - qb}{b}\) is divisible by \(m\)”, which by definition means that \(\frac{a - qb}{b}\) can be written as a fraction whose numerator is divisible by \(m\) but whose denominator is not. Now, the fraction \(\frac{a - qb}{b}\) is already in lowest terms (why?), so this means two things:
+Before we tackle this issue, let’s first see if we can compute \(a/b \bmod m\) based solely on the definition given in the problem statement. Suppose you’ve computed the full answer as \(a/b\), and let’s say it’s in lowest terms. Then the problem guarantees us that \(a/b \bmod m\) is well-defined, and it is the unique number \(q\) such that “\(a/b - q = \frac{a - qb}{b}\) is divisible by \(m\)”, which by definition means that \(\frac{a - qb}{b}\) can be written as a fraction whose numerator is divisible by \(m\) but whose denominator is not. Now, the fraction \(\frac{a - qb}{b}\) is already in lowest terms (why?), so this means two things:
-All in all, this takes \(\approx m\) steps in the worst case to find \(q\), which is the answer we're looking for. With \(m = 998244353 \approx 10^9\), that isn't so bad, especially if \(a/b\) doesn't have too many digits. So for Subtasks 1 and 2, that's more-or-less okay. But for the larger subtasks the numbers become too large3 which makes it not okay, and we clearly need to do something else.
+All in all, this takes \(\approx m\) steps in the worst case to find \(q\), which is the answer we’re looking for. With \(m = 998244353 \approx 10^9\), that isn’t so bad, especially if \(a/b\) doesn’t have too many digits. So for Subtasks 1 and 2, that’s more-or-less okay. But for the larger subtasks, the numbers become too large3 which makes it not okay, and we clearly need to do something else.
-You might suspect that the reason that dividing by \(4\) failed modulo \(10\) is that \(4\) and \(10\) share a common factor. And indeed, that's a good hunch. For example, dividing by \(3\) seems to work modulo \(10\), which you can check with lots of small examples, or maybe by using a program to do several checks for you, e.g.: Code (Python)
You might suspect that the reason that dividing by \(4\) failed modulo \(10\) is that \(4\) and \(10\) share a common factor. And indeed, that’s a good hunch. For example, dividing by \(3\) seems to work modulo \(10\), which you can check with lots of small examples, or maybe by using a program to do several checks for you, e.g.: Code (Python)
from math import gcd
def congruent(m, a, b):
@@ -380,7 +380,7 @@ Working “modulo \(m\)
assert congruent(m, num1 // den, num2 // den)
print("All OK")
-You can replace m = 10 with other numbers and it still seems to work! So clearly, there seems to be some sense in which division “kinda makes sense”, as long as the number you're dividing with is coprime with the modulus \(m\).
+You can replace m = 10 with other numbers and it still seems to work! So clearly, there seems to be some sense in which division “kinda makes sense”, as long as the number you’re dividing with is coprime with the modulus \(m\).
And as it turns out, we can prove that fact!
Theorem A: If \(da \equiv db \pmod{m}\) and \(\gcd(m, d) = 1\), then \(a \equiv b \pmod{m}\).
@@ -388,60 +388,65 @@
Proof: Fairly straightforward, so we leave it to the reader.
-Now that's all well and good, but what we really want is to be able to divide modulo \(m\). For this, we should answer the following question first: what is division, really? Well, dividing is the same as multiplying by the multiplicative inverse, that is, \(a/b\) is the same as \(ab^{-1}\), where \(b^{-1} = 1/b\) is the multiplicative inverse of \(b\). But what is a multiplicative inverse? Well, \(b^{-1}\) is defined as the unique number such that \(bb^{-1} = 1\).
-Now, as it turns out, multiplicative inverses sometimes exist modulo \(m\). In the mod \(m\) world, the multiplicative inverse of \(b\) is still denoted \(b^{-1}\), but this time, it's not a fraction. Nonetheless, it's still defined analogously; \(b^{-1}\) is the “unique” number such that \[bb^{-1} \equiv 1 \pmod{m}.\] Note that I put “unique” in quotes because if \(x\) is a multiplicative inverse, then \(x + m\) is also one, as is \(x + 2m\), \(x - m\), etc. But as it turns out, all these numbers are the same mod \(m\), which is what we mean by “unique“ here.
-We can actually prove that fact, and in fact, something stronger:
+Now Theorem A’s nice and all, but it only allows us to divide by \(d\) if the number was already divisible by \(d\). What we really want is to be able to divide modulo \(m\) anytime we want, that is, to make sense of things like \[7/3 \bmod 10.\] In this particular example, even though \(7/3\) is not an integer, note that \(7 \equiv 27 \pmod{10}\) and \(27/3 = 9\) is an integer, so if Theorem A were to extend even to non-integer settings, then we ought to have \(7/3 \equiv 27/3 \pmod{10}\), so \(7/3 \bmod 10\) ought to be \(9\). We’d like to generalize this reasoning.
+For this, we should answer the following question first: what is division, really? Well, dividing is the same as multiplying by the multiplicative inverse, that is, \(a/b\) is the same as \(ab^{-1}\), where \(b^{-1} = 1/b\) is the multiplicative inverse of \(b\). But what is a multiplicative inverse? Well, \(b^{-1}\) is defined as the unique number such that \(bb^{-1} = 1\).
+Now, as it turns out, multiplicative inverses sometimes exist modulo \(m\). In the mod \(m\) world, the multiplicative inverse of \(b\) is still denoted \(b^{-1}\), but this time, it’s not a fraction. Nonetheless, it’s still defined analogously; \(b^{-1}\) is the “unique” number such that \[bb^{-1} \equiv 1 \pmod{m}.\] Note that I put “unique” in quotes because if \(x\) is a multiplicative inverse, then \(x + m\) is also one, as is \(x + 2m\), \(x - m\), etc. But as it turns out, all these numbers are the same mod \(m\), which is what we mean by “unique” here.
+We can actually prove that fact, and in fact, something stronger; we can say precisely when there’s a multiplicative inverse:
-Theorem B: \(b\) has a multiplicative inverse if and only if \(b\) and \(m\) and coprime, and it is unique if it exists.
+Theorem B: For any \(b \in \mathbb{Z}\), \(b\) has a multiplicative inverse if and only if \(b\) and \(m\) are coprime, and it is unique (mod \(m\)) if it exists.
Proof
-(⇒) Suppose \(b\) has a multiplicative inverse \(b'\), so that \[bb' \equiv 1 \pmod{m}.\] This is equivalent to saying that there's a \(k\) such that \[bb' - mk = 1.\] Now, if \(d\) is a common divisor of \(b\) and \(m\), then \(d\) divides the left-hand side, so it must also divide the right-hand side, which is \(1\). Thus, all common divisors of \(b\) and \(m\) divide \(1\), which means they are coprime.
-(⇐) Suppose \(b\) and \(m\) are coprime, so their gcd is \(1\). By Bézout's, there are integers \(x\) and \(y\) such that \[bx + my = 1.\] Reducing this modulo \(m\) gives \[bx \equiv 1 \pmod{m},\] so \(x\) is a multiplicative inverse of \(b\).
+(⇒) Suppose \(b\) has a multiplicative inverse \(b'\), so that \[bb' \equiv 1 \pmod{m}.\] This is equivalent to saying that there’s a \(k\) such that \[bb' - mk = 1.\] Now, if \(d\) is a common divisor of \(b\) and \(m\), then \(d\) divides the left-hand side, so it must also divide the right-hand side, which is \(1\). Thus, all common divisors of \(b\) and \(m\) divide \(1\), which means they are coprime.
+(⇐) Suppose \(b\) and \(m\) are coprime, so their gcd is \(1\). By Bézout’s, there are integers \(x\) and \(y\) such that \[bx + my = 1.\] Reducing this modulo \(m\) gives \[bx \equiv 1 \pmod{m},\] so \(x\) is a multiplicative inverse of \(b\).
(Uniqueness) Suppose \(b'\) and \(b''\) are both multiplicative inverses of \(b\). Then \[\begin{align*} bb' &\equiv 1 \pmod{m} \\ bb'' &\equiv 1 \pmod{m}, \end{align*}\] so \[bb' \equiv bb'' \pmod{m}.\] But \(m\) and \(b\) are coprime (since a multiplicative inverse exists), so by using Theorem A, \(b' \equiv b'' \pmod{m}\), so any two multiplicative inverses of \(b\) are the same mod \(m\).
-Remark: The proof can actually be turned into an algorithm to compute the multiplicative inverse, since the integers \(x\) and \(y\) guaranteed by Bézout's identity can be computed using the extended version of Euclid's gcd algorithm.
+Remark: The proof can actually be turned into an algorithm to compute the multiplicative inverse, since the integers \(x\) and \(y\) guaranteed by Bézout’s identity can be computed using the extended version of Euclid’s gcd algorithm.
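For instance, a sketch of that algorithm (recursive extended Euclid; the function names are mine):

```python
def ext_gcd(a, b):
    # returns (g, x, y) with a*x + b*y == g == gcd(a, b)
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

def inverse(b, m):
    g, x, _ = ext_gcd(b % m, m)
    assert g == 1, "inverse exists only when b and m are coprime"
    return x % m
```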
-So with this, we're now fairly able to “divide modulo \(m\)”, as long as the divisors are coprime with \(m\). Since we're using the modulus \(m = 998244353\) which is prime, most numbers are coprime! The only ones we can't divide with are those divisible by \(m\) itself, but since such numbers are \(\equiv 0 \pmod{m}\), it makes sense not to be able to divide with them since that's sort of equivalent to dividing by \(0\).
-Now, that's well and good, but we still need to relate this way of dividing modulo \(m\) with the definition given in the statement. As it turns out, everything is okay; we can prove that \(a/b \bmod m\), as defined in the statement, is the same as \(ab^{-1} \bmod m\), using the following theorem:
+So with this, we’re now fairly able to “divide modulo \(m\)”, as long as the divisors are coprime with \(m\). Since we’re using the modulus \(m = 998244353\) which is prime, most numbers are coprime! The only ones we can’t divide with are those divisible by \(m\) itself, but since such numbers are \(\equiv 0 \pmod{m}\), it makes sense not to be able to divide with them since that’s sort of equivalent to dividing by \(0\).
+Now, that’s well and good, but we still need to relate this way of dividing modulo \(m\) with the definition given in the statement. As it turns out, everything is okay; we can prove that \(a/b \bmod m\), as defined in the statement, is the same as \(ab^{-1} \bmod m\), using the following theorem:
Theorem C: For a rational \(r\), \(r \bmod m\) exists if and only if \(r\) can be written as \(a/b\) with \(b\) coprime with \(m\), and if it exists, then we have the equality \[(a/b \bmod m) = (ab^{-1} \bmod m).\]
-For this theorem to work, we will amend the definition given in the statement as follows: We say a rational is divisible by \(m\) if it can be written as \(a/b\) with \(a\) divisible by \(m\) and \(b\) coprime with \(m\). This is equivalent to the definition in the statement when \(m\) is prime, but it's friendlier to nonprime moduli.
+For this theorem to work, we will amend the definition given in the statement as follows: We say a rational is divisible by \(m\) if it can be written as \(a/b\) with \(a\) divisible by \(m\) and \(b\) coprime with \(m\). This is equivalent to the definition in the statement when \(m\) is prime, but it’s friendlier to nonprime moduli.
Proof
-(⇒) Suppose \(r \bmod m\) exists, i.e., there's a unique \(q\) such that \(r - q\) is “divisible by \(m\)” (as defined above). Writing \(r\) in lowest terms as \(a/b\), we note that \(a/b - q = \frac{a - bq}{b}\) is also in lowest terms.
+(⇒) Suppose \(r \bmod m\) exists, i.e., there’s a unique \(q\) such that \(r - q\) is “divisible by \(m\)” (as defined above). Writing \(r\) in lowest terms as \(a/b\), we note that \(a/b - q = \frac{a - bq}{b}\) is also in lowest terms.
By definition of divisibility, \(\frac{a - qb}{b}\) can be written as \(a'/b'\) with \(m\) dividing \(a'\) but coprime with \(b'\). Since \[\frac{a - qb}{b} = \frac{a'}{b'}\] and the former is in lowest terms, it follows that \(a - qb\) is a divisor of \(a'\) and \(b\) is a divisor of \(b'\). But if \(m\) and \(b'\) are coprime and \(b \mid b'\), then \(m\) and \(b\) must be coprime as well.
-(⇐) Suppose \(r = a/b\) with \(b\) is coprime with \(m\). Then I claim that \[q := (ab^{-1} \bmod m)\] satisfies the definition of \(r \bmod m\). Note that \[r - q = \frac{a - qb}{b},\] and we already know \(b\) is coprime with \(m\), so it's sufficient to show that \(a - qb\) is divisible by \(m\), i.e., \(a \equiv qb \pmod{m}\). That's shown as follows: \[\begin{align*}
+(⇐) Suppose \(r = a/b\) with \(b\) coprime with \(m\). Then I claim that \[q := (ab^{-1} \bmod m)\] satisfies the definition of \(r \bmod m\). Note that \[r - q = \frac{a - qb}{b},\] and we already know \(b\) is coprime with \(m\), so it’s sufficient to show that \(a - qb\) is divisible by \(m\), i.e., \(a \equiv qb \pmod{m}\). That’s shown as follows: \[\begin{align*}
qb
&\equiv (ab^{-1})b \\
&\equiv a(bb^{-1}) \\
&\equiv a\cdot(1) \\
&= a \pmod{m}.
\end{align*}\]
-So \(r - q\) is indeed divisible by \(m\). All that remains is to show that \(q\) is the unique one satisfying the definition. If \(q'\) also satisfies the definition, then \(\frac{a - q'b}{b}\) is also divisible by \(m\), so we can write it as \[\frac{a - q'b}{b} = \frac{a'}{b'}\] with \(m\) dividing \(a'\) and coprime with \(b'\). Rearranging this gives \[(a - q'b)b' = a'b.\] Because \(m \mid a'\), \(m\) must divide the left-hand side \((a - q'b)b'\) as well, but since \(m\) and \(b'\) are coprime, \(m\) must divide \(a - q'b\), i.e., \[a \equiv q'b \pmod{m}.\] Multiplying both sides by \(b^{-1}\), we get \[q' \equiv ab^{-1} \equiv q \pmod{m}.\] In other words, any other possible value \(q'\) of \((r \bmod m)\) must be equal to \(q = (ab^{-1} \bmod m)\), so it's unique.
+So \(r - q\) is indeed divisible by \(m\). All that remains is to show that \(q\) is the unique one satisfying the definition. If \(q'\) also satisfies the definition, then \(\frac{a - q'b}{b}\) is also divisible by \(m\), so we can write it as \[\frac{a - q'b}{b} = \frac{a'}{b'}\] with \(m\) dividing \(a'\) and coprime with \(b'\). Rearranging this gives \[(a - q'b)b' = a'b.\] Because \(m \mid a'\), reducing this modulo \(m\) gives \[\begin{align*}
+(a - q'b)b' &\equiv 0 \pmod{m} \\
+a - q'b &\equiv 0 && \text{using Theorem A} \\
+a &\equiv q'b
+\end{align*}\] Multiplying both sides by \(b^{-1}\), we get \[q' \equiv ab^{-1} \equiv q \pmod{m}.\] In other words, any other possible value \(q'\) of \((r \bmod m)\) must be equal to \(q = (ab^{-1} \bmod m)\), so it’s unique.
Corollary: Suppose \(r\) cannot be written as \(a/b\) with \(b\) coprime with \(m\). Then there is no integer \(q\) such that \(r - q\) is divisible by \(m\).
-Note that this doesn't follow immediately from the definition, since if \(r \bmod m\) doesn't exist, then all we can say from the definition is that there isn't exactly one \(q\) such that \(r - q\) is divisible by \(m\). In particular, there may be zero, or there may be more than one. This corollary rules out the latter.
+Note that this doesn’t follow immediately from the definition, since if \(r \bmod m\) doesn’t exist, then all we can say from the definition is that there isn’t exactly one \(q\) such that \(r - q\) is divisible by \(m\). In particular, there may be zero, or there may be more than one. This corollary rules out the latter.
-We prove the contrapositive. Suppose there is a \(q\) such that \(r - q\) is divisible by \(m\). Notice that the “(⇒)” portion of the previous proof doesn't really use the fact that \(q\) is unique, so the proof also goes through here just fine, and it proves that \(r\) can be written as \(a/b\) with \(b\) coprime with \(m\). With this, we can now completely work modulo \(m = 998244353\) all throughout! All that we need now is to check that we're only ever dividing with numbers without \(m\) as a prime factor. The possible divisors come from \(k^w\) and the numbers coming from the computation of \(\binom{w}{g}\) with \(g < n\). The number \(k\) is less than \(m\) in all inputs, so \(k^w\) is coprime with \(m\). And in the first few subtasks, \(w\) is also less than \(m\), so all factors in \(\binom{w}{g}\) are all coprime as well. Finally, in the subtasks where \(w\) is very large, recall that we're only computing the first \(n\) terms of row \(w\) of the binomial coefficient table, and that we're using the recurrence \[\binom{w}{g} = \binom{w}{g - 1}\cdot \frac{w - g + 1}{g},\] so we only need to divide with numbers \(g < n\). Since \(n < m\) for all inputs, this is ok too.
+Suppose there is a \(q\) such that \(r - q\) is divisible by \(m\). Notice that the “(⇒)” portion of the previous proof doesn’t really use the fact that \(q\) is unique, so the proof also goes through here just fine, and it proves that \(r\) can be written as \(a/b\) with \(b\) coprime with \(m\). With this, we can now completely work modulo \(m = 998244353\) all throughout! All that we need now is to check that we’re only ever dividing with numbers without \(m\) as a prime factor. The possible divisors come from \(k^w\) and the numbers coming from the computation of \(\binom{w}{g}\) with \(g < n\). The number \(k\) is less than \(m\) in all inputs, so \(k^w\) is coprime with \(m\). And in the first few subtasks, \(w\) is also less than \(m\), so all factors in \(\binom{w}{g}\) are coprime with \(m\) as well. Finally, in the subtasks where \(w\) is very large, recall that we’re only computing the first \(n\) terms of row \(w\) of the binomial coefficient table, and that we’re using the recurrence \[\binom{w}{g} = \binom{w}{g - 1}\cdot \frac{w - g + 1}{g},\] so we only need to divide with numbers \(g < n\). Since \(n < m\) for all inputs, this is ok too.
Thus, we can safely divide whenever we need to, and all is well in the world.
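Concretely, the whole “divide modulo \(m\)” recipe then fits in a couple of lines (a sketch; Python's three-argument pow with exponent \(-1\) computes the modular inverse, and needs Python 3.8+):

```python
M = 998244353

def div_mod(a, b, m=M):
    # a/b mod m, valid whenever b is coprime with m
    return a % m * pow(b, -1, m) % m

assert div_mod(7, 3, 10) == 9  # the 7/3 mod 10 example from earlier
```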
-Remark: In math, when we're doing this idea of “working modulo \(m\)” we usually say we're “working in \(\mathbb{Z}/m\mathbb{Z}\)”. Here, “\(\mathbb{Z}/m\mathbb{Z}\)” is a formalization of the “set of integers modulo \(m\)”. It is just like the integers \(\mathbb{Z}\), but we make two numbers equal iff they are the same mod \(m\). In this setting, we can also add, subtract, and multiply, and we can divide by any number coprime with \(m\) (as shown above). If \(m\) is prime, then this means we can divide by any “nonzero number” (where you need to remember that “nonzero” means “not divisible by \(m\)”), which makes \(\mathbb{Z}/m\mathbb{Z}\) a field, just like \(\mathbb{R}\), \(\mathbb{C}\), and \(\mathbb{Q}\).
+Remark: In math, when we’re doing this idea of “working modulo \(m\)” we usually say we’re “working in \(\mathbb{Z}/m\mathbb{Z}\)”. Here, “\(\mathbb{Z}/m\mathbb{Z}\)” is a formalization of the “set of integers modulo \(m\)”. It is just like the integers \(\mathbb{Z}\), but we make two numbers equal iff they are the same mod \(m\). In this setting, we can also add, subtract, and multiply, and we can divide by any number coprime with \(m\) (as shown above). If \(m\) is prime, then this means we can divide by any “nonzero number” (where you need to remember that “nonzero” means “not divisible by \(m\)”), which makes \(\mathbb{Z}/m\mathbb{Z}\) behave very much like \(\mathbb{R}\), \(\mathbb{C}\), and \(\mathbb{Q}\) where arithmetic operations are all defined except only division by zero; we say it’s a field.
technically, we should say “uniformly randomly, and independently of each other” here...↩
-technically, independent doesn't really mean has nothing to do with each other; it means more like the probabilities of one are not affected if you know the other↩
-and even if it isn't, the overhead of having to compute with large numbers makes things slower, which we probably couldn't afford for later subtasks↩
+technically, independent doesn’t really mean has nothing to do with each other; it means more like the probabilities of one are not affected if you know the other↩
+and even if it isn’t, the overhead of having to compute with large numbers makes things slower, which we probably couldn’t afford for later subtasks↩
Proof
We can check that this is correct by running it on one of the examples, say \(n = 3\) and \(b = 18\).
-Unfortunately, when you try to pass in the actual input \(n = 10\) and \(b = 48\), you’ll find that it doesn’t seem to finish. Indeed, there are \(46\) possible values, which means there are \(46^{10} \approx 4\cdot 10^{16}\) possible sequences. Even if we could process \(10^9\) sequences per second, this program will take more than one year to finish!
+Unfortunately, when you try to pass in the actual input \(n = 10\) and \(b = 48\), you’ll find that it doesn’t seem to finish. Indeed, there are \(46\) possible values, which means there are \(46^{10} \approx 4\cdot 10^{16}\) possible sequences. Even if we could process \(10^9\) sequences per second, this program will take more than one year to finish!
We can improve this slightly with some observations.
-First, the numbers must be distinct, so we could just enumerate all sequences without repeated values. This reduces the number of candidates from \(46^{10}\) to \(46\cdot 45\cdot 44 \cdots 37\). However, this number is still large—it’s \(\approx 1.5\cdot 10^{16}\), which isn’t a huge improvement. With \(10^9\) sequences per second, our program would still take several months.
-Another insight would be to notice that for every set of \(n\) distinct numbers, there is at most one ordering of them that could potentially work, because we want their largest (or smallest) prime factors to be increasing as well. So for every set of \(n\) distinct numbers, we can simply sort them by their largest prime factor, and check if that ordering works. This reduces the number of candidates further to \(\binom{46}{10} \approx 4\cdot 10^9\), which is much smaller than before, and the program may now be waitable.
+First, the numbers must be distinct, so we could just try to enumerate sequences without repeated values. This reduces the number of candidates from \(46^{10}\) to \(46\cdot 45\cdot 44 \cdots 37\). However, this number is still large—it’s \(\approx 1.5\cdot 10^{16}\), which isn’t a huge improvement. With \(10^9\) sequences per second, our program would still take several months.
+Another insight would be to notice that for every set of \(n\) distinct numbers, there is at most one ordering of them that could potentially work, because we want their largest (or smallest) prime factors to be increasing as well. So for every set of \(n\) distinct numbers, we can simply sort them by their largest prime factor, and check if that ordering works. This reduces the number of candidates further to \(\binom{46}{10} \approx 4\cdot 10^9\), which is much smaller than before, and the program may now be waitable.
However, we can do even better than this. We could attempt to build the sequence number by number, and stop the construction as soon as one of the conditions already fails.
Specifically, the goal is to construct the sequence \([a_1, a_2, \ldots, a_n]\) number by number. At every point in the construction, we’re attempting to choose the value of some \(a_i\) between \(2\) and \(b-1\). We could just try each of them in turn, but we could do better: We know that \(a_i\)’s smallest and largest prime factors must be larger than those of \(a_{i-1}\), so it’s enough to only try the values with that property.
After successfully choosing \(n\) such numbers this way, we’re guaranteed that the sequence we produced is valid (since we already checked all the necessary conditions), so the running time of this solution is now basically proportional to the number of sequences itself!1 So we simply hope that there aren’t too many of them, so that the program finishes quickly. And sure enough, if you implement and run this with \(n = 10\) and \(b = 48\), we find that it finishes in just a few seconds, even in Python!
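For reference, here is a hedged sketch of that pruned search (my own reconstruction; I'm assuming the conditions are exactly that both the smallest and the largest prime factor strictly increase along the sequence, which also forces distinctness):

```python
def count_valid(n, b):
    def min_max_prime(x):
        ps, d = [], 2
        while d * d <= x:
            if x % d == 0:
                ps.append(d)
                while x % d == 0:
                    x //= d
            d += 1
        if x > 1:
            ps.append(x)
        return ps[0], ps[-1]

    pf = {x: min_max_prime(x) for x in range(2, b)}  # values 2..b-1

    def extend(prev_small, prev_large, length):
        if length == n:
            return 1
        # only try values whose prime factors beat the previous ones
        return sum(extend(s, l, length + 1)
                   for s, l in pf.values()
                   if s > prev_small and l > prev_large)

    return extend(0, 0, 0)
```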
Using this recurrence, we can now build a table of values of \(S(n', i)\), for all \((n', i)\) such that \(1 \le n' \le n\) and \(1 < i < b\). We can build this table in increasing order of \(n'\), because each entry \(S(n', i)\) only depends on the “previous layer” (because the summands are \(S(n' - 1, j)\)), whose values we’ve already computed. Finally, once we fill in the \(n\)th layer, we could then compute the answer using our summation formula above.
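Here is a sketch of that table-filling (under my reading that \(S(n', i)\) counts valid sequences of length \(n'\) ending at the value \(i\), with \((x_i, y_i)\) the smallest and largest prime factors of \(i\)):

```python
def count_dp(n, b, pf):
    # pf[i] = (smallest prime factor of i, largest prime factor of i)
    values = list(range(2, b))
    S = {i: 1 for i in values}  # layer n' = 1: single-element sequences
    for _ in range(n - 1):
        S = {i: sum(S[j] for j in values
                    if pf[j][0] < pf[i][0] and pf[j][1] < pf[i][1])
             for i in values}
    return sum(S.values())  # the summation formula above
```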
What’s the running time of this solution? Well, there are \(\approx nb\) possible arguments \((n', i)\), and each one is computed with a summation with \(\approx b\) terms, so the amount of work is roughly \(\approx nb\cdot b = nb^2\). (In algorithm parlance, we say that the running time is “\(\mathcal{O}(nb^2)\).”) The amount of steps needed is small enough that this algorithm can be used to solve Subtask 1 by hand (or maybe with a spreadsheet). For Subtask 2, this is already quite waitable, but we can slightly speed it up by noticing that \(S(n, i)\) doesn’t really depend on \(i\), only on \((x_i, y_i)\), so such values are equal for multiple points that happen to coincide. Formally, if \((x_i, y_i) = (x_j, y_j)\), then \(S(n, i) = S(n, j)\). Using this, we only need to compute it once for every distinct point in \(\{(x_i, y_i) \mid 1 < i < b \}\). This speeds up the running time from \(\approx nb\cdot b\) steps to \(\approx np\cdot b\) steps, where \(p\) is the number of distinct points. (For \(b = 4000\), you could check that \(p = 1637\).)
This technique of building a table of results whose elements depend on earlier entries is called dynamic programming, or DP.
+Exercise: There’s another slight tweak that can be done to improve this from \(\approx np\cdot b\) steps to \(\approx np\cdot p\) steps. Explain how to do it.
+Subtasks 3 & 4
For the remaining subtasks, I’ll only give hints. The previous solution is now too slow, so we need something faster. I’ll give you a few hints that you can use to speed up your solution in different ways. A combination of some of them (plus maybe a few other insights) can be used to solve the remaining subtasks.
diff --git a/2023/tama/primes/index.md b/2023/tama/primes/index.md
index bf4623c..84a8468 100644
--- a/2023/tama/primes/index.md
+++ b/2023/tama/primes/index.md
@@ -77,13 +77,13 @@ def solve(n, b):
```
We can check that this is correct by running it on one of the examples, say $n = 3$ and $b = 18$.
-Unfortunately, when you try to pass in the actual input $n = 10$ and $b = 48$, you’ll find that it doesn’t seem to finish. Indeed, there are $46$ possible values, which means there are $46^{10} \approx 4\cdot 10^{16}$ possible sequences. Even if we could process $10^9$ sequences per second, this program will take more than one year to finish!
+Unfortunately, when you try to pass in the actual input $n = 10$ and $b = 48$, you’ll find that it doesn’t seem to finish. Indeed, there are $46$ possible values, which means there are $46^{10} \approx 4\cdot 10^{16}$ possible sequences. Even if we could process $10^9$ sequences per second, this program will take *more than one year* to finish!
We can improve this slightly with some observations.
-- First, the numbers must be *distinct*, so we could just enumerate all sequences **without repeated values**. This reduces the number of candidates from $46^{10}$ to $46\cdot 45\cdot 44 \cdots 37$. However, this number is still large—it’s $\approx 1.5\cdot 10^{16}$, which isn’t a huge improvement. With $10^9$ sequences per second, our program would still take several months.
+- First, the numbers must be *distinct*, so we could just try to enumerate sequences **without repeated values**. This reduces the number of candidates from $46^{10}$ to $46\cdot 45\cdot 44 \cdots 37$. However, this number is still large—it’s $\approx 1.5\cdot 10^{16}$, which isn’t a huge improvement. With $10^9$ sequences per second, our program would still take several months.
-- Another insight would be to notice that for every set of $n$ distinct numbers, there is at most one ordering of them that could potentially work, because we want their largest (or smallest) prime factors to be increasing as well. So for every *set* of $n$ distinct numbers, we can simply **sort them by their largest prime factor**, and check if that ordering works. This reduces the number of candidates further to $\binom{46}{10} \approx 4\cdot 10^9$, which is much smaller than before, and the program may now be waitable.
+- Another insight would be to notice that for every *set* of $n$ distinct numbers, there is at most one ordering of them that could potentially work, because we want their largest (or smallest) prime factors to be increasing as well. So for every *set* of $n$ distinct numbers, we can simply **sort them by their largest prime factor**, and check if that ordering works. This reduces the number of candidates further to $\binom{46}{10} \approx 4\cdot 10^9$, which is much smaller than before, and the program may now be waitable.
- However, we can do even better than this. We could attempt to build the sequence number by number, and stop the construction **as soon as one of the conditions already fails**.
@@ -197,6 +197,11 @@ What’s the running time of this solution? Well, there are $\approx nb$ pos
This technique of building a table of results whose elements depend on earlier entries is called **dynamic programming**, or DP.
+