diff --git a/2023/tama/griddy/index.html b/2023/tama/griddy/index.html
index f736f0e..2a85205 100644
--- a/2023/tama/griddy/index.html
+++ b/2023/tama/griddy/index.html
@@ -91,7 +91,7 @@

A straightforward approach

Anyway, the most straightforward solution would be to just do it as stated, a.k.a., brute force: enumerate all \(2^{rc}\) grids, compute \(B(G)\) for each of them, and then sum up all these \(B(G)^3\). Enumerating grids is relatively straightforward with backtracking, and for the first subtask, \(2^{rc} = 2^{25} = 33554432\), which is quite manageable for a computer. The only missing ingredient to fully implement this solution is being able to compute \(B(G)\) for a given grid \(G\).
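To make this concrete, here's a minimal sketch of the brute force in Python. It's hedged: it enumerates grids with itertools.product rather than explicit backtracking, the function name is a placeholder, and it relies on the formula \(B(G) = \max(R, C)\) derived in the next section.

```python
# A sketch of the brute force for the first subtask. Assumes the formula
# B(G) = max(R, C) derived in "Computing B(G)" below, where R and C count
# the rows and columns with an odd number of cringe memes.
from itertools import product

def brute_force(r, c):
    total = 0
    for cells in product((0, 1), repeat=r * c):       # all 2^(r*c) grids
        grid = [cells[i*c:(i+1)*c] for i in range(r)]
        R = sum(sum(row) % 2 for row in grid)         # rows with odd sum
        C = sum(sum(col) % 2 for col in zip(*grid))   # columns with odd sum
        total += max(R, C) ** 3                       # add B(G)^3
    return total
```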

Computing \(B(G)\)

We are given a grid with \(r\) rows and \(c\) columns, and we want to make it based, i.e., change it so that every row and every column has an even number of cringe memes.

-

Let’s convert a cool meme (🗿) into a \(0\) and a cringe meme (😬) into a \(1\), so the condition translates to: the sum of every row and every column is even.

+

Let’s convert a cool meme (🗿) into a \(0\) and a cringe meme (😬) into a \(1\), so the condition translates to: the sum of every row and every column is even.

Now, the effect of flipping a cell is to flip the parity of exactly one row and exactly one column, namely the row and column containing the cell. Thus, if there are \(R\) odd rows, then you need at least \(R\) flips to make all these odd rows even. Similarly, if there are \(C\) odd columns, then you need at least \(C\) flips. Combining these tells us that we need \(\max(R, C)\) or more moves to make the grid based.

On the other hand, for every move, we can choose the row and column to flip independently. Thus, it seems intuitive that \(\max(R, C)\) moves are enough. And indeed, it is:
diff --git a/2023/tama/griddy/index.md b/2023/tama/griddy/index.md
index 1778bf7..157ce88 100644
--- a/2023/tama/griddy/index.md
+++ b/2023/tama/griddy/index.md
@@ -29,7 +29,7 @@ Anyway, the most straightforward solution would be to *just do it* as stated, a.

We are given a grid with $r$ rows and $c$ columns, and we want to make it *based*, i.e., change it so that every row and every column has an even number of cringe memes.

-Let's convert a cool meme (🗿) into a $0$ and a cringe meme (😬) into a $1$, so the condition translates to: the sum of every row and every column is even.
+Let's convert a cool meme (🗿) into a $0$ and a cringe meme (😬) into a $1$, so the condition translates to: **the sum of every row and every column is even.**

Now, the effect of flipping a cell is to flip the parity of exactly one row and exactly one column, namely the row and column containing the cell. Thus, if there are $R$ odd rows, then you need at least $R$ flips to make all these odd rows even. Similarly, if there are $C$ odd columns, then you need at least $C$ flips. Combining these tells us that we need $\max(R, C)$ or more moves to make the grid based.
diff --git a/2023/tama/lucas/index.html b/2023/tama/lucas/index.html
index d095e08..f4d337b 100644
--- a/2023/tama/lucas/index.html
+++ b/2023/tama/lucas/index.html
@@ -117,7 +117,7 @@

Generate the parities separately

if parity == 0: total = (total + value**2) % m
-

Remark: You can also just compute the Lucas numbers modulo \(2m\); that way, reducing them modulo \(m\) and \(2\) is still valid. (Can you see why?)

+

Remark: You could also just compute the Lucas numbers modulo \(2m\); that way, reducing them modulo \(m\) and \(2\) is still valid. (Can you see why?)
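Here's a sketch of that remark, assuming the same task as the snippet above (summing the squares of the even Lucas values modulo \(m\)); the function name and the count \(n\) of Lucas numbers to process are placeholders for this sketch.

```python
# Sketch: generate Lucas numbers modulo 2*m only. A residue mod 2*m still
# determines both the value mod m and the parity, so no separate parity
# sequence is needed. (Hypothetical signature: first n Lucas numbers.)
def sum_of_even_lucas_squares(n, m):
    total = 0
    a, b = 2, 1                          # L(0) = 2, L(1) = 1
    for _ in range(n):
        if a % 2 == 0:                   # parity read off the mod-2m residue
            total = (total + (a % m) ** 2) % m
        a, b = b, (a + b) % (2 * m)      # advance the recurrence mod 2m
    return total
```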

Pen-and-paper insight

It turns out that there's a simple criterion that gives us the parity of a Lucas number given only its index.
diff --git a/2023/tama/lucas/index.md b/2023/tama/lucas/index.md
index d543536..bfe0392 100644
--- a/2023/tama/lucas/index.md
+++ b/2023/tama/lucas/index.md
@@ -61,7 +61,7 @@ for value, parity in zip(L, L_parity):
```
-**Remark:** You can also just compute the Lucas numbers modulo $2m$; that way, reducing them modulo $m$ and $2$ is still valid. (Can you see why?)
+**Remark:** You could also just compute the Lucas numbers modulo $2m$; that way, reducing them modulo $m$ and $2$ is still valid. (Can you see why?)
### Pen-and-paper insight
diff --git a/2023/tama/pillage/index.html b/2023/tama/pillage/index.html
index e49accf..29f1479 100644
--- a/2023/tama/pillage/index.html
+++ b/2023/tama/pillage/index.html
@@ -85,35 +85,35 @@

Solution Writeup

Solution Writeup: Kevin Atienza

Remark: \(a/b\) modulo \(998244353\)?

-

In this editorial, I'll pretend that we're computing the full answer (a rational number) instead of the answer modulo \(998244353\). It turns out that many exact solutions can be adapted to compute the answer modulo something instead. A bonus section below describes how to do it.

+

In this editorial, I’ll pretend that we’re computing the full answer (a rational number) instead of the answer modulo \(998244353\). It turns out that many exact solutions can be adapted to compute the answer modulo something instead. A bonus section below describes how to do it.

Subtask 1

-

For Subtask 1, I'll describe a solution that doesn't use a lot of insights and essentially only uses dynamic programming (DP) (aside from the definition of expected value). You could also solve this subtask with pen and paper by using the solution for Subtask 2, which is perfectly doable by hand (and easier to implement as well).

+

For Subtask 1, I’ll describe a solution that doesn’t use a lot of insights and essentially only uses dynamic programming (DP) (aside from the definition of expected value). You could also solve this subtask with pen and paper by using the solution for Subtask 2, which is perfectly doable by hand (and easier to implement as well).

Expected value ⇝ Counting

If you have some sort of “random variable” \(X\), then we say that the expected value of \(X\), denoted \(\operatorname{E}[X]\), is the weighted sum of the possible results of \(X\), weighted by their probabilities. More formally, if the possible results are \(\{x_1, x_2, \ldots, x_k\}\) with respective probabilities \(p_1, p_2, \ldots, p_k\), then \[\operatorname{E}[X] := p_1x_1 + p_2x_2 + \ldots + p_kx_k,\] or in summation notation, \[\operatorname{E}[X] := \sum_{i=1}^k p_ix_i.\] The expected value of \(X\) can be thought of as the average value of \(X\) when the experiment is performed many, many times, averaging the value of \(X\) across all those runs.

Here are some examples:

-

So let's define a random variable \(T\) representing the result of the process outlined in the problem statement. The process chooses \(w\) numbers randomly1 between \(1\) and \(k\), and \(T\) is calculated as the sum of the \(n\) largest elements, so the possible results are between \(n\) and \(nk\). If we write the probability of obtaining the result \(t\) as \(p_t\), then the answer is \[\operatorname{E}[T] = \sum_{t=n}^{nk}\, p_t\,t.\] So we are done if we can compute \(p_t\) for each \(t\) from \(n\) to \(nk\).

+

So let’s define a random variable \(T\) representing the result of the process outlined in the problem statement. The process chooses \(w\) numbers randomly1 between \(1\) and \(k\), and \(T\) is calculated as the sum of the \(n\) largest elements, so the possible results are between \(n\) and \(nk\). If we write the probability of obtaining the result \(t\) as \(p_t\), then the answer is \[\operatorname{E}[T] = \sum_{t=n}^{nk}\, p_t\,t.\] So we are done if we can compute \(p_t\) for each \(t\) from \(n\) to \(nk\).

Now, the process has \(k^w\) possible outcomes—namely all the sequences of length \(w\), each element of which is between \(1\) and \(k\)—and each of those outcomes is equally likely. Therefore, we can simply count the number of outcomes that result in a sum of \(t\), then divide by \(k^w\) to get the probability. If we write the number of sequences whose sum of \(n\) largest elements is \(t\) as \(c_t\), then we simply have \[p_t = \frac{c_t}{k^w}.\]

-

So we've now reduced the problem to computing the \(c_t\)s. Now, a sum of \(t\) can arise in multiple ways. For example, if \(n = 3\) and \(t = 10\), then the top \(3\) values of the sequence (each in sorted order) could be \([2, 3, 5]\), or it could be \([2, 4, 4]\), or \([1, 1, 8]\), or something else. So, to count the number of sequences whose sum of \(n\) largest elements is \(t\), we need to enumerate all possible sequences of top \(n\) values whose sum is \(t\), and for each one, count the number of sequences of length \(w\) whose sequence of top \(n\) values is that sequence.

-If that's confusing, let's formalize a bit. Let's define a winner sequence as a sorted sequence of \(n\) values, each of which is between \(1\) and \(k\). Winner sequences are exactly the possible “sequences of \(n\) largest values”. Now, if \(W\) is a winner sequence, let's define \(c(W, w)\) as the number of length-\(w\) sequences whose sequence of \(n\) largest values is \(W\). Then you may check that the following equation holds \[c_t = \sum_{\substack{\text{$W$ is a winner sequence} \\ \mathit{sum}(W) = t}} c(W, w).\] Thus, we've further reduced the problem to that of computing \(c(W, w)\) across all winner sequences \(W\). And as it turns out, for Subtask 1, there aren't that many winner sequences. We can see this by simply enumerating them all (say with a computer). Finding a formula for the number of them isn't that hard either:

+

So we’ve now reduced the problem to computing the \(c_t\)s. Now, a sum of \(t\) can arise in multiple ways. For example, if \(n = 3\) and \(t = 10\), then the top \(3\) values of the sequence (each in sorted order) could be \([2, 3, 5]\), or it could be \([2, 4, 4]\), or \([1, 1, 8]\), or something else. So, to count the number of sequences whose sum of \(n\) largest elements is \(t\), we need to enumerate all possible sequences of top \(n\) values whose sum is \(t\), and for each one, count the number of sequences of length \(w\) whose sequence of top \(n\) values is that sequence.

+If that’s confusing, let’s formalize a bit. Let’s define a winner sequence as a sorted sequence of \(n\) values, each of which is between \(1\) and \(k\). Winner sequences are exactly the possible “sequences of \(n\) largest values”. Now, if \(W\) is a winner sequence, let’s define \(c(W, w)\) as the number of length-\(w\) sequences whose sequence of \(n\) largest values is \(W\). Then you may check that the following equation holds \[c_t = \sum_{\substack{\text{$W$ is a winner sequence} \\ \mathit{sum}(W) = t}} c(W, w).\] Thus, we’ve further reduced the problem to that of computing \(c(W, w)\) across all winner sequences \(W\). And as it turns out, for Subtask 1, there aren’t that many winner sequences. We can see this by simply enumerating them all (say with a computer). Finding a formula for the number of them isn’t that hard either:

Exercise: Show that the number of winner sequences is exactly \(\binom{n + k - 1}{n}\).

For Subtask 1, \(n = 5\) and \(k = 5\), so \(\binom{n + k - 1}{n} = 126\), so there are indeed only a few of them.
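As a quick sanity check, here's a short sketch that enumerates the winner sequences directly and compares the count against the formula.

```python
# Winner sequences are sorted length-n sequences over 1..k, i.e., exactly
# the size-n multisets of {1,...,k}, so combinations_with_replacement
# enumerates them (in non-decreasing order).
from itertools import combinations_with_replacement
from math import comb

n, k = 5, 5
winners = list(combinations_with_replacement(range(1, k + 1), n))
assert len(winners) == comb(n + k - 1, n) == 126
```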

Computing \(c(W, w)\)

Thinking “DP-cally”, we now attempt to build the length-\(w\) sequence element by element. As we build the sequence, its “sequence of \(n\) largest elements” changes as well.

-

Let's be more precise. For a sequence \(S\), let's call the “sequence of \(n\) largest elements of \(S\)” its winning sequence, and denote it by \(W_S\). Now, suppose we insert the value \(v\) to \(S\). Let's denote the updated sequence by \(S + [v]\). Then the winning sequence might change because of \(v\). Specifically, the new winning sequence is obtained by inserting \(v\) to \(W_S\) in its proper sorted location, and then dropping the lowest element. (Can you see why?) Let's denote the process of “inserting a value \(v\) to a sequence \(W\) in its proper sorted location, and then dropping the lowest element” as a pushpop operation, and denote it by \(\mathit{pushpop}(W, v)\). Then what we're saying is that the winning sequence of \(S + [v]\) is related to the winning sequence of \(S\) via a pushpop operation—specifically, \[W_{S + [v]} = \mathit{pushpop}(W_S, v).\]

-

We can now think recursively, and find a recurrence for \(c(W, w)\), as follows. Every sequence of length \(w\) can be obtained by taking a sequence \(S\) of length \(w - 1\) and then appending some value \(v\) (between \(1\) to \(k\)) to it. And as described above, the new winning sequence \(W_{S + [v]}\) is just \(\mathit{pushpop}(W_S, v)\). Notice that this latter expression only depends on \(W_S\), not on \(S\) itself. Thus, for each possible winner sequence \(W'\), we could simply collect the sequences \(S\) with \(W'\) as their winning sequence, and notice that the new winning sequence must be \(\mathit{pushpop}(W', v)\). In other words, we have the equation \[c(W, w) = \!\!\!\!\sum_{\substack{W'\,\,\,\, \\ \text{$W'$ is a winner sequence}}} \sum_{\substack{1 \le v \le k \,\,\,\, \\ \mathit{pushpop}(W', v) = W}} \!\!\!\!(\text{number of sequences $S$ of length $w - 1$ whose winning sequence is $W'$}).\] But the summand is just \(c(W', w - 1)\) by definition! Therefore, we obtain the recurrence \[c(W, w) = \sum_{\substack{W'\,\,\,\, \\ \text{$W'$ is a winner sequence}}} \sum_{\substack{1 \le v \le k \,\,\,\, \\ \mathit{pushpop}(W', v) = W}} c(W', w - 1),\] and we can use this to compute all \(c(W, w')\) we need, via DP: we build a table of results, one for each winner sequence \(W\) and each \(w' \le w\). Each entry of the table can be computed using the summation above. Since our formula for \(c(W, w')\) only depends on \(c(W', w' - 1)\), i.e., those with a smaller \(w'\) value, if we compute the table in increasing order of \(w'\), those values have already been computed, and are already on the table. Thus, we'll be able to compute the final result all the way up to \(w\), which is what we wanted.

+

Let’s be more precise. For a sequence \(S\), let’s call the “sequence of \(n\) largest elements of \(S\)” its winning sequence, and denote it by \(W_S\). Now, suppose we insert the value \(v\) into \(S\). Let’s denote the updated sequence by \(S + [v]\). Then the winning sequence might change because of \(v\). Specifically, the new winning sequence is obtained by inserting \(v\) into \(W_S\) at its proper sorted location, and then dropping the lowest element. (Can you see why?) Let’s denote the process of “inserting a value \(v\) into a sequence \(W\) at its proper sorted location, and then dropping the lowest element” as a pushpop operation, and denote it by \(\mathit{pushpop}(W, v)\). Then what we’re saying is that the winning sequence of \(S + [v]\) is related to the winning sequence of \(S\) via a pushpop operation—specifically, \[W_{S + [v]} = \mathit{pushpop}(W_S, v).\]
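In code, \(\mathit{pushpop}\) is only a few lines; here's a minimal sketch, with winner sequences represented as sorted tuples.

```python
import bisect

def pushpop(W, v):
    """Insert v into the sorted sequence W, then drop the lowest element."""
    W = list(W)
    bisect.insort(W, v)    # insert v at its proper sorted location
    return tuple(W[1:])    # drop the lowest element
```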

+

We can now think recursively, and find a recurrence for \(c(W, w)\), as follows. Every sequence of length \(w\) can be obtained by taking a sequence \(S\) of length \(w - 1\) and then appending some value \(v\) (between \(1\) and \(k\)) to it. And as described above, the new winning sequence \(W_{S + [v]}\) is just \(\mathit{pushpop}(W_S, v)\). Notice that this latter expression only depends on \(W_S\), not on \(S\) itself. Thus, for each possible winner sequence \(W'\), we could simply collect the sequences \(S\) with \(W'\) as their winning sequence, and notice that the new winning sequence must be \(\mathit{pushpop}(W', v)\). In other words, we have the equation \[c(W, w) = \!\!\!\!\sum_{\substack{W'\,\,\,\, \\ \text{$W'$ is a winner sequence}}} \sum_{\substack{1 \le v \le k \,\,\,\, \\ \mathit{pushpop}(W', v) = W}} \!\!\!\!(\text{number of sequences $S$ of length $w - 1$ whose winning sequence is $W'$}).\] But the summand is just \(c(W', w - 1)\) by definition! Therefore, we obtain the recurrence \[c(W, w) = \sum_{\substack{W'\,\,\,\, \\ \text{$W'$ is a winner sequence}}} \sum_{\substack{1 \le v \le k \,\,\,\, \\ \mathit{pushpop}(W', v) = W}} c(W', w - 1),\] and we can use this to compute all \(c(W, w')\) we need, via DP: we build a table of results, one for each winner sequence \(W\) and each \(w' \le w\). Each entry of the table can be computed using the summation above. Since our formula for \(c(W, w')\) only depends on \(c(W', w' - 1)\), i.e., those with a smaller \(w'\) value, if we compute the table in increasing order of \(w'\), those values have already been computed, and are already on the table. Thus, we’ll be able to compute the final result all the way up to \(w\), which is what we wanted.

Now, as for the base case, you could just directly count the sequences for, say, \(w' = n\), since the winning sequence is basically the sorted version of the sequence itself. Alternatively, we can use \(w' = 0\) as our base case, though we need to think about what the winning sequence of a sequence with less than \(n\) elements should be. Well, it makes sense to say that the winning sequence must be the whole sequence as well, just sorted. And instead of a pushpop operation, we could simply use a push operation, at least while the sequence still has length less than \(n\).
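Putting the recurrence and the \(w' = 0\) base case together gives a compact sketch. This mirrors, but is independent of, the fuller implementation below; the function name is a placeholder, and it returns \(\operatorname{E}[T]\) directly from the \(c(W, w)\) table.

```python
# DP over winner sequences: dp maps each winner sequence W (a sorted tuple)
# to c(W, w'), iterating w' from 0 (empty sequence) up to w. While the
# sequence is shorter than n we push; afterwards we pushpop.
import bisect
from collections import defaultdict
from fractions import Fraction

def expected_sum_dp(n, k, w):
    dp = {(): 1}                               # base case: w' = 0
    for _ in range(w):
        ndp = defaultdict(int)
        for W, ways in dp.items():
            for v in range(1, k + 1):          # append each possible value v
                W2 = list(W)
                bisect.insort(W2, v)           # push...
                if len(W2) > n:
                    W2 = W2[1:]                # ...pop the lowest if too long
                ndp[tuple(W2)] += ways
        dp = ndp
    # E[T] = sum over winner sequences W of sum(W) * c(W, w) / k^w
    return sum(Fraction(sum(W) * ways, k ** w) for W, ways in dp.items())
```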

-

With this, we now have a solution! What's the running time? Well, the table has an entry for each \((W, w')\) with \(W\) a winner sequence and \(w' \le w\). Recall that there are \(\binom{n + k - 1}{n}\) winner sequences, so there are \(\approx \binom{n + k - 1}{n}w\) entries. Each entry is computed with the sum above, which clearly has at most \(\binom{n + k - 1}{n}k\) summands (often much less). Therefore, the amount of steps is roughly proportional to \[\approx \binom{n + k - 1}{n}w\cdot \binom{n + k - 1}{n}k = \binom{n + k - 1}{n}^2 wk.\] For Subtask 1, this is good enough; my straightforward Python implementation computes the full answer in less than one second.

+

With this, we now have a solution! What’s the running time? Well, the table has an entry for each \((W, w')\) with \(W\) a winner sequence and \(w' \le w\). Recall that there are \(\binom{n + k - 1}{n}\) winner sequences, so there are \(\approx \binom{n + k - 1}{n}w\) entries. Each entry is computed with the sum above, which clearly has at most \(\binom{n + k - 1}{n}k\) summands (often much less). Therefore, the number of steps is roughly proportional to \[\approx \binom{n + k - 1}{n}w\cdot \binom{n + k - 1}{n}k = \binom{n + k - 1}{n}^2 wk.\] For Subtask 1, this is good enough; my straightforward Python implementation computes the full answer in less than one second.

Note: Understanding this implementation is not required to understand the following sections, so you may skip it.

@@ -156,7 +156,7 @@ Computing \(c(W, w)\)

return sum(p_(t) * t for t in range(n, n*k + 1))

-

Remark: The implementation tries to copy our formulas above as closely as possible. As a result, it's highly unoptimized, and there are definitely several improvements that be made. But the main point is that even such unoptimized code is enough to solve the subtask.

+

Remark: The implementation tries to copy our formulas above as closely as possible. As a result, it’s highly unoptimized, and there are definitely several improvements that can be made. But the main point is that even such unoptimized code is enough to solve the subtask.

Subtask 2

@@ -166,10 +166,10 @@

Linearity of expectation

  • \(\operatorname{E}[\alpha X] = \alpha \operatorname{E}[X]\) for any random variable \(X\) and any constant \(\alpha\), and
  • \(\operatorname{E}[X_1 + X_2] = \operatorname{E}[X_1] + \operatorname{E}[X_2]\) for any two random variables \(X_1\) and \(X_2\).
-

    The first one is quite intuitive; after all, \(\alpha X\) is just \(X\) with all values scaled by \(\alpha\), so the average should just be scaled in the same way. However, the second property—additivity—may be surprising. The property could be intuitive in the case where \(X_1\) and \(X_2\) are independent, but linearity doesn't require them to be—it's simply always true!

    -

    In a bonus section below, we'll explain why this is true, but for now, let's first try to apply this to the problem. Let \(T\) be the same random variable as before, so it denotes the sum of the \(n\) largest values of the sequence produced. Now, we define \(n\) new random variables \(T_1, T_2, \ldots T_n\), where \(T_i\) denotes the \(i\)th largest value of the sequence. Then clearly we have \[T = T_1 + T_2 + \ldots + T_n = \sum_{i=1}^n T_i.\] Now, the \(T_i\)'s are definitely not independent, e.g., knowing the largest value constrains the possible values of the second value, and vice versa. Regardless, expectation is always additive, so we have the equality \[\operatorname{E}[T] = \operatorname{E}[T_1] + \operatorname{E}[T_2] + \ldots + \operatorname{E}[T_n] = \sum_{i=1}^n \operatorname{E}[T_i].\] Thus, we've reduced the problem to computing \(\operatorname{E}[T_i]\) for \(1 \le i \le n\), which is potentially more manageable!

    +

    The first one is quite intuitive; after all, \(\alpha X\) is just \(X\) with all values scaled by \(\alpha\), so the average should just be scaled in the same way. However, the second property—additivity—may be surprising. The property could be intuitive in the case where \(X_1\) and \(X_2\) are independent, but linearity doesn’t require them to be—it’s simply always true!

    +

    In a bonus section below, we’ll explain why this is true, but for now, let’s first try to apply this to the problem. Let \(T\) be the same random variable as before, so it denotes the sum of the \(n\) largest values of the sequence produced. Now, we define \(n\) new random variables \(T_1, T_2, \ldots T_n\), where \(T_i\) denotes the \(i\)th largest value of the sequence. Then clearly we have \[T = T_1 + T_2 + \ldots + T_n = \sum_{i=1}^n T_i.\] Now, the \(T_i\)’s are definitely not independent, e.g., knowing the largest value constrains the possible values of the second largest, and vice versa. Regardless, expectation is always additive, so we have the equality \[\operatorname{E}[T] = \operatorname{E}[T_1] + \operatorname{E}[T_2] + \ldots + \operatorname{E}[T_n] = \sum_{i=1}^n \operatorname{E}[T_i].\] Thus, we’ve reduced the problem to computing \(\operatorname{E}[T_i]\) for \(1 \le i \le n\), which is potentially more manageable!

    Computing \(\operatorname{E}[T_i]\)

    -

    Let's now try to compute \(\operatorname{E}[T_i]\), the expected value of the \(i\)th largest element of the sequence. The possible values are between \(1\) and \(k\), so by definition, we have \[\operatorname{E}[T_i] = \sum_{v=1}^k \operatorname{P}[T_i = v]\cdot v,\] where \(\operatorname{P}[T_i = v]\) denotes the probability that \(T_i = v\). Next, we again turn probability into counting; noting that there are \(k^w\) equally likely possibilities, we have something like \[\operatorname{P}[T_i = v] = \frac{\mathit{count}_{=v}(i)}{k^w}\] where \(\mathit{count}_{=v}(i)\) denotes the number of sequences whose \(i\)th largest value is \(v\). Thus, we're done if we can compute \(\mathit{count}_{=v}(i)\).

    +

    Let’s now try to compute \(\operatorname{E}[T_i]\), the expected value of the \(i\)th largest element of the sequence. The possible values are between \(1\) and \(k\), so by definition, we have \[\operatorname{E}[T_i] = \sum_{v=1}^k \operatorname{P}[T_i = v]\cdot v,\] where \(\operatorname{P}[T_i = v]\) denotes the probability that \(T_i = v\). Next, we again turn probability into counting; noting that there are \(k^w\) equally likely possibilities, we have something like \[\operatorname{P}[T_i = v] = \frac{\mathit{count}_{=v}(i)}{k^w}\] where \(\mathit{count}_{=v}(i)\) denotes the number of sequences whose \(i\)th largest value is \(v\). Thus, we’re done if we can compute \(\mathit{count}_{=v}(i)\).

    Computing \(\mathit{count}_{=v}(i)\)

    We can compute \(\mathit{count}_{=v}(i)\) by noting that:

@@ -230,22 +230,22 @@ Computing \(\mathit{count}_{=v}(i)\)

    Thus, all in all, there are \[c(\ell, g, v) = \binom{w}{\ell} \cdot \binom{w - \ell}{g} \cdot (v-1)^{\ell} \cdot (k-v)^g\] such sequences.
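As code, this closed form is a one-liner; a sketch follows, where the function name is a placeholder, \(\ell\) counts the elements less than \(v\), and \(g\) counts the elements greater than \(v\).

```python
from math import comb

def c_lgv(l, g, v, w, k):
    # choose positions for the l elements < v and the g elements > v;
    # the remaining w - l - g elements are all equal to v
    return comb(w, l) * comb(w - l, g) * (v - 1)**l * (k - v)**g
```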

    We now have a complete solution! How fast does it run? Well, we need to compute \(\operatorname{E}[T_i]\) for \(1 \le i \le n\), which in turn require the values \(\mathit{count}_{=v}(i)\) for \(1 \le i \le n\) and \(1 \le v \le k\), which in turn require the values \(c(\ell, g, v)\) for \(0 \le \ell \le w - 1\), \(0 \le g \le n - 1\) and \(1 \le v \le k\).

      -
    • Each \(c(\ell, g, v)\) value is a product of some binomial coefficients and powers. The powers can all be computed with fast exponentiation, or they could just be precomputed in a table at the beginning (since all powers we need have bases less than \(k\), and exponents less than \(w\)), and the binomial coefficients can also be precomputed in a table, either via Pascal's identity, or precomputing factorials and using \[\binom{a}{b} = \frac{a!}{(a - b)!b!}.\] Therefore, we could say that each \(c(\ell, g, v)\) can be computed in a constant amount of steps, and since there are \(\approx wnk\) of them, the total number of steps to compute them all is \(\approx wnk\).

+
    • Each \(c(\ell, g, v)\) value is a product of some binomial coefficients and powers. The powers can all be computed with fast exponentiation, or they could just be precomputed in a table at the beginning (since all powers we need have bases less than \(k\), and exponents less than \(w\)), and the binomial coefficients can also be precomputed in a table, either via Pascal’s identity, or precomputing factorials and using \[\binom{a}{b} = \frac{a!}{(a - b)!b!}.\] Therefore, we could say that each \(c(\ell, g, v)\) can be computed in a constant amount of steps, and since there are \(\approx wnk\) of them, the total number of steps to compute them all is \(\approx wnk\).

    • To compute the \(\mathit{count}_{=v}(i)\) values, note that there are \(kn\) such values, and each one is computed with a summation with \(\approx wn\) summands. Therefore, it takes \(\approx wn^2 k\) steps to compute them all.

    • The formula for \(\operatorname{E}[T]\) has \(n\) summands, each of which has a formula with \(k\) summands, so this takes \(\approx nk\) steps.

    • Finally, we also need to account for the precomputation of factorials and powers. There are \(\approx w\) factorials and \(\approx kw\) powers to precompute, so their precomputation takes \(\approx kw\) steps.

    -

    Thus, the running time is dominated by the computation of \(\mathit{count}_{=v}(i)\). For Subtask 2, we have \(wn^2 k = 6\cdot 10^9\), so the number of steps seems small enough for this to be waitable if you use a fast language and a highly optimized implementation. It may be slow though, so instead of that, let's just improve our algorithm further.

    +

    Thus, the running time is dominated by the computation of \(\mathit{count}_{=v}(i)\). For Subtask 2, we have \(wn^2 k = 6\cdot 10^9\), so the number of steps seems small enough for this to be waitable if you use a fast language and a highly optimized implementation. It may be slow though, so instead of that, let’s just improve our algorithm further.

    Computing \(\mathit{count}_{=v}(i)\) more quickly

    -

    Let's look at \(\mathit{count}_{=v}(i)\) again. It denotes the number of sequences whose \(i\)th largest value is exactly \(v\). It turns out that it's easier to count the number of sequences whose \(i\)th largest value is at most \(v\). Even more nicely, it turns out that you can use the latter to compute the former!

-To see this, let's define \(\mathit{count}_{\le v}(i)\) to be the number of sequences whose \(i\)th largest value is at most \(v\). Then we easily have:

+

    Let’s look at \(\mathit{count}_{=v}(i)\) again. It denotes the number of sequences whose \(i\)th largest value is exactly \(v\). It turns out that it’s easier to count the number of sequences whose \(i\)th largest value is at most \(v\). Even more nicely, it turns out that you can use the latter to compute the former!

    +To see this, let’s define \(\mathit{count}_{\le v}(i)\) to be the number of sequences whose \(i\)th largest value is at most \(v\). Then we easily have:

    Claim: \(\mathit{count}_{=v}(i) = \mathit{count}_{\le v}(i) - \mathit{count}_{\le v - 1}(i)\).

    Proof: Left as an exercise to the reader.

-So we've reduced the problem to computing \(\mathit{count}_{\le v}(i)\) for \(0 \le v \le k\) and \(1 \le i \le n\). So what? Well, here's what. It turns out that we can find a version of Theorem 1 that applies to \(\mathit{count}_{\le v}(i)\):
+So we’ve reduced the problem to computing \(\mathit{count}_{\le v}(i)\) for \(0 \le v \le k\) and \(1 \le i \le n\). So what? Well, here’s what. It turns out that we can find a version of Theorem 1 that applies to \(\mathit{count}_{\le v}(i)\):

    Theorem 2: The \(i\)th largest value of a sequence is at most \(v\) if and only if the sequence has \(< i\) elements greater than \(v\).

@@ -253,7 +253,7 @@ Computing

-

We can now use a similar counting argument as before. Let \(g\) be the number of elements greater than \(v\), so that \(g < i\), and we can again write \[\mathit{count}_{\le v}(i) = \sum_{g=0}^{i-1} c(g, v)\] where now, \(c(g, v)\) denotes the number of sequences with exactly \(g\) elements greater than \(v\). Then we can count \(c(g, v)\) similarly as before, except it's even simpler:

    +

    We can now use a similar counting argument as before. Let \(g\) be the number of elements greater than \(v\), so that \(g < i\), and we can again write \[\mathit{count}_{\le v}(i) = \sum_{g=0}^{i-1} c(g, v)\] where now, \(c(g, v)\) denotes the number of sequences with exactly \(g\) elements greater than \(v\). Then we can count \(c(g, v)\) similarly as before, except it’s even simpler:

    1. First, choose the \(g\) indices that will be \(> v\). There are \(\binom{w}{g}\) ways to do this. The rest of the elements will be \(\le v\).
    2. Then, we choose the actual values of the elements \(> v\). There are \(g\) values to choose, and each one is an independent choice of a number between \(v+1\) and \(k\), so there are \((k-v)^g\) ways to do this.
3. Finally, we choose the actual values of the elements \(\le v\). There are \(w - g\) values to choose, and each one is an independent choice of a number between \(1\) and \(v\), so there are \(v^{w-g}\) ways to do this.
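Assembled, the whole Subtask 2 computation fits in a short sketch (exact rational arithmetic, matching this editorial's pretend-exact setting; the names are placeholders). It should agree with the DP sketch from Subtask 1 on small inputs.

```python
from math import comb
from fractions import Fraction

def expected_sum_counting(n, k, w):
    def c(g, v):
        # sequences with exactly g elements > v (the other w - g are <= v)
        return comb(w, g) * (k - v)**g * v**(w - g)
    def count_le(i, v):
        # i-th largest <= v  <=>  fewer than i elements exceed v (Theorem 2)
        return sum(c(g, v) for g in range(i))
    def count_eq(i, v):
        return count_le(i, v) - count_le(i, v - 1)
    # E[T] = sum_i E[T_i] = sum_i sum_v P[T_i = v] * v
    return sum(Fraction(count_eq(i, v) * v, k**w)
               for i in range(1, n + 1) for v in range(1, k + 1))
```

For instance, expected_sum_counting(2, 2, 3) returns Fraction(27, 8), which you can verify against the eight-sequence enumeration in the bonus section.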

@@ -266,17 +266,17 @@ Computing

Subtask 3

      -

      The main change from Subtask 2 to Subtask 3 is that \(w\) is vastly increased, which means the portion of our previous algorithm that takes \(\approx wk\) steps is now unacceptable. Let's recap what those steps are:

      +

      The main change from Subtask 2 to Subtask 3 is that \(w\) is vastly increased, which means the portion of our previous algorithm that takes \(\approx wk\) steps is now unacceptable. Let’s recap what those steps are:

      1. precomputing factorials up to \(w\), and
      2. precomputing powers up to base \(k\) and up to exponent \(w\).
      -

      Among these, the second one clearly dominates the running time. But we can essentially get rid of the second one by simply not precomputing powers, and instead just fast exponentiation to compute them when needed! This makes the running time slightly worse—fast exponentiation takes \(\mathcal{O}(\lg w)\) steps for an exponent the size of \(w\)—but that's a very worthwhile tradeoff, because you can check that the number of steps improves from \(\mathcal{O}(wk + n^2 k)\) to \[\mathcal{O}(w + n^2 k + nk \lg w).\] This is now acceptable for Subtask 3 🙂.

      -

      Now, there's still that factor \(w\) in the running time, which in the current subtask is probably ok since \(w = 10^8\). However, in later subtasks, \(w = 10^{16}\), which suggests that that bit can still be improved further.

      -

      How can we improve it? Well, the main reason for needing factorials up to \(w\) is so that we can compute binomial coefficients. But looking closer, notice that we actually only need binomial coefficients at exactly row \(w\). Furthermore, we actually only need the first \(n\) coefficients in it. And as it turns out, there's a way to compute a row of binomial coefficients one by one, starting from the leftmost one, by using the following recurrence (which is easy to prove using the factorial formula): \[\binom{w}{g} = \binom{w}{g - 1}\cdot \frac{w - g + 1}{g},\] with base case simply \(\binom{w}{0} = 1\). So now, instead of precomputing factorials, we may simply precompute the needed binomial coefficients using this recurrence with just \(\approx n\) steps! The running time then improves to \[\mathcal{O}(n^2 k + nk \lg w),\] which is really cool.

      +

Among these, the second one clearly dominates the running time. But we can essentially get rid of it by simply not precomputing powers, and instead just using fast exponentiation to compute them when needed! This makes the running time slightly worse—fast exponentiation takes \(\mathcal{O}(\lg w)\) steps for an exponent the size of \(w\)—but that’s a very worthwhile tradeoff, because you can check that the number of steps improves from \(\mathcal{O}(wk + n^2 k)\) to \[\mathcal{O}(w + n^2 k + nk \lg w).\] This is now acceptable for Subtask 3 🙂.
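For reference, here's square-and-multiply as a sketch, in the modular setting of the bonus section; Python's built-in three-argument pow already does exactly this.

```python
def fast_pow(b, e, m):
    """Compute b**e mod m in O(lg e) multiplications."""
    r, b = 1, b % m
    while e:
        if e & 1:
            r = r * b % m    # multiply in the current bit's contribution
        b = b * b % m        # square for the next bit
        e >>= 1
    return r

assert fast_pow(3, 10**16, 998244353) == pow(3, 10**16, 998244353)
```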

      +

      Now, there’s still that term \(w\) in the running time, which in the current subtask is probably ok since \(w = 10^8\). However, in later subtasks, \(w = 10^{16}\), which suggests that that bit can still be improved further.

      +

      How can we improve it? Well, the main reason for needing factorials up to \(w\) is so that we can compute binomial coefficients. But looking closer, notice that we actually only need binomial coefficients at exactly row \(w\). Furthermore, we actually only need the first \(n\) coefficients in it. And as it turns out, there’s a way to compute a row of binomial coefficients one by one, starting from the leftmost one, by using the following recurrence (which is easy to prove using the factorial formula): \[\binom{w}{g} = \binom{w}{g - 1}\cdot \frac{w - g + 1}{g},\] with base case simply \(\binom{w}{0} = 1\). So now, instead of precomputing factorials, we may simply precompute the needed binomial coefficients using this recurrence with just \(\approx n\) steps! The running time then improves to \[\mathcal{O}(n^2 k + nk \lg w),\] which is really cool.
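Here's that recurrence as a sketch, in exact integer arithmetic (in the modular setting, dividing by \(g\) becomes multiplying by a modular inverse); the function name is a placeholder.

```python
from math import comb

def binomial_row_prefix(w, n):
    """C(w, 0), C(w, 1), ..., C(w, n) via the row recurrence, in ~n steps."""
    row = [1]                                    # C(w, 0) = 1
    for g in range(1, n + 1):
        row.append(row[-1] * (w - g + 1) // g)   # exact: g divides the product
    return row

assert binomial_row_prefix(10**16, 4) == [comb(10**16, g) for g in range(5)]
```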

    Subtasks 4 & 5

    -

    Our previous algorithm is now too slow; in particular, that \(\mathcal{O}(n^2 k)\) bit in the running time is now too large. For the rest of the subtasks, I'll just give a couple of hints to guide you towards faster solutions.

    +

    Our previous algorithm is now too slow; in particular, that \(\mathcal{O}(n^2 k)\) bit in the running time is now too large. For the rest of the subtasks, I’ll just give a couple of hints to guide you towards faster solutions.

    Hint 1 Do you really have to compute the whole sum \[\mathit{count}_{\le v}(i) = \sum_{g=0}^{i-1} c(g, v)\] every time?

    Hint 2 Notice that \[(k - v)^g\cdot v^{w - g} = v^w \cdot \left(\frac{k - v}{v}\right)^g.\] Letting \(x_v := \frac{k - v}{v}\), this is the same as \(v^w x_v^g\).

@@ -287,7 +287,7 @@ Computing

\(\operatorname{E}[X_1 + X_2] = \operatorname{E}[X_1] + \operatorname{E}[X_2]\) for any two random variables \(X_1\) and \(X_2\).

    The first one is simple enough, and you should be able to prove it yourself 🙂. The real surprise is the second, which holds even if \(X_1\) and \(X_2\) are not independent. (For independent variables, this may not be a surprise, since “clearly” the variables have nothing to do with each other,2 so the averages should “just add up.”)

    -

    Let's see an example of this, using our current problem itself, with \(n = 2\), \(w = 3\) and \(k = 2\). In this case, we have \[T = T_1 + T_2\] where \(T_i\) is the value of the \(i\)th largest element. Clearly, \(T_1\) and \(T_2\) are not independent; for example, we know that \(T_1\) is at least \(T_2\), so if \(T_2\) is \(2\), then \(T_1\) must be \(2\) as well.

    +

    Let’s see an example of this, using our current problem itself, with \(n = 2\), \(w = 3\) and \(k = 2\). In this case, we have \[T = T_1 + T_2\] where \(T_i\) is the value of the \(i\)th largest element. Clearly, \(T_1\) and \(T_2\) are not independent; for example, we know that \(T_1\) is at least \(T_2\), so if \(T_2\) is \(2\), then \(T_1\) must be \(2\) as well.

    Regardless, we will now illustrate that \[\operatorname{E}[T] = \operatorname{E}[T_1] + \operatorname{E}[T_2]\] by simply enumerating all \(2^3 = 8\) possible sequences:
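Here's that enumeration as a quick sketch, checking additivity numerically.

```python
# Enumerate all 2^3 = 8 equally likely sequences for n = 2, w = 3, k = 2,
# and check E[T] = E[T_1] + E[T_2] even though T_1 and T_2 are dependent.
from itertools import product
from fractions import Fraction

n, w, k = 2, 3, 2
seqs = list(product(range(1, k + 1), repeat=w))

def E(f):
    return Fraction(sum(f(s) for s in seqs), len(seqs))

ET  = E(lambda s: sum(sorted(s)[-n:]))    # E[T], sum of the top n = 2
ET1 = E(lambda s: max(s))                 # E[T_1], the largest value
ET2 = E(lambda s: sorted(s)[-2])          # E[T_2], the second largest value
assert ET == ET1 + ET2 == Fraction(27, 8)
```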