algo/hw1: update notes
tiankaima committed Jun 11, 2024
1 parent eeeb5a2 commit 4c3eb82
Showing 5 changed files with 180 additions and 25 deletions.
84 changes: 75 additions & 9 deletions 7e1810-algo_hw/hw1.typ
Original file line number Diff line number Diff line change
@@ -1,12 +1,22 @@
#import "utils.typ": *

== HW 1 (Week 2)
Due: 2024.03.17

#let ans(it) = [
#pad(1em)[
#text(fill: blue)[
#it
]
]
#rev1_note[
Review: asymptotic notation

The definitions of $o, O, Theta, omega, Omega$ are as follows:

$
O(g(n)) = {f(n) mid(|) exists c > 0, n_0 > 0, forall n >= n_0 quad 0 <= f(n) <= c dot g(n)}\
o(g(n)) = {f(n) mid(|) forall c > 0, exists n_0 > 0, forall n >= n_0 quad 0 <= f(n) < c dot g(n)}\
Theta(g(n)) = {
f(n) mid(|) exists c_1, c_2 > 0, n_0 > 0, forall n >= n_0 quad 0 <= c_1 dot g(n) <= f(n) <= c_2 dot g(n)
}\
Omega(g(n)) = {f(n) mid(|) exists c > 0, n_0 > 0, forall n >= n_0 quad 0 <= c dot g(n) <= f(n)}\
omega(g(n)) = {f(n) mid(|) forall c > 0, exists n_0 > 0, forall n >= n_0 quad 0 <= c dot g(n) < f(n)}
$
]

=== Question 2.3-5
@@ -28,14 +38,17 @@ You can also think of insertion sort as a recursive algorithm. In order to sort
A[i + 1] = key
```

#rev1_note[
Worst case: when sorting $[1,k]$, $A[k]$ has to be compared once with every element of $A[1:k-1]$ to determine its insertion position. Sorting $[1,n]$ therefore takes $1 + 2 + dots.c + (n-1) = Theta(n^2)$ comparisons.
]

The recurrence for its worst-case running time is

$
T(n) = cases(T(n - 1) + Theta(n) space.quad & n>1, Theta(1) & n=1)
$

The solution to the recurrence is $Theta(n^2)$ worst-case time.

]
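
The worst-case count can be checked with a small Python sketch of the recursive formulation. The function name `insertion_sort_recursive` and the comparison counter are illustrative additions, not part of the original pseudocode, and indices are 0-based here.

```python
def insertion_sort_recursive(a, n=None, counter=None):
    """Sort a[0:n] in place recursively; return the number of key comparisons."""
    if n is None:
        n = len(a)
    if counter is None:
        counter = [0]  # mutable cell shared by all recursive calls
    if n <= 1:
        return counter[0]
    # First sort the prefix a[0:n-1], then insert a[n-1] into it.
    insertion_sort_recursive(a, n - 1, counter)
    key = a[n - 1]
    i = n - 2
    while i >= 0:
        counter[0] += 1          # one comparison of key against a[i]
        if a[i] <= key:
            break
        a[i + 1] = a[i]          # shift the larger element to the right
        i -= 1
    a[i + 1] = key
    return counter[0]
```

On a reversed input of length $n$ the counter comes out as $n(n-1)\/2$, matching $1 + 2 + dots.c + (n-1)$. Python's recursion limit restricts this sketch to small inputs.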

=== Question 2-1
@@ -51,6 +64,7 @@ Although merge sort runs in $Theta(n lg n)$ worst-case time and insertion sort r

#ans[
+ For each sublist, the insertion sort can sort the $k$ elements in $Theta(k^2)$ worst-case time. Thus, the insertion sort can sort the $n\/k$ sublists, each of length $k$, in $Theta(n k)$ worst-case time.

+ Given $n\/k$ sorted sublists, each of length $k$, the recurrence for merging the sublists is
$
T(n) = cases(2 dot.c T(n\/2) + Theta(n) space.quad & n>k, 0 & n=k)
@@ -59,23 +73,75 @@ Although merge sort runs in $Theta(n lg n)$ worst-case time and insertion sort r

*This can also be viewed as a tree with $lg(n\/k)$ levels and $n$ elements on each level, so the worst case is $Theta(n lg (n\/k))$.*

#rev1_note[
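Treat the $n\/k$ sorted sublists as $n\/k$ elements that form the leaves of the merge sort tree. A tree with $n\/k$ leaves has $lg(n\/k)$ levels, and each level merges $n$ elements in total, so the total time is $Theta(n lg(n\/k))$.

Merging the sublists one after another with $n\/k-1$ sequential merges does not work: that takes $Theta(n^2\/k)$ time, which does not meet the requirement.

Another workable approach is to merge the $n\/k$ sorted sublists directly: at each step, compare the smallest not-yet-extracted element of each of the $n\/k$ sublists and take the smallest of them.

Concretely, maintain a min-heap of size $n\/k$ together with an array of size $n\/k$ that records the current position in each sublist. Each step pops the top of the heap and pushes the next element of the corresponding sublist. Each extraction (including restoring the min-heap) takes $O(lg(n\/k))$ time, so the total time is $O(n lg(n\/k))$.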
]

+ Take $Theta(n k + n lg(n \/ k)) = Theta(n lg n)$, consider $k = Theta(lg n)$:
$
Theta(n k + n lg(n \/ k))
&= Theta (n k + n lg n - n lg k) \
&= Theta (n lg n + n lg n - n lg (lg n)) \
&= Theta (n lg n)
$

#rev1_note[
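Idea: we need
$
Theta(n k +n log(n\/k))=O(n log n)
$
It suffices to have $k = O(log n)$ and $log(n\/k) = O(log n)$. The second condition holds for every $1 <= k <= n$, so the binding constraint is $k=O(log n)$. Taking the largest bound $k=Theta(log n)$, the verification above shows that the tight bound $Theta(n log n)$ holds, so the largest value of $k$ is $Theta(log n)$ (in the asymptotic sense).

Conversely, when $k=omega(log n)$, the term $n k$ is $omega(n log n)$, so the modified algorithm is asymptotically slower than merge sort. Such $k$ does not meet the requirement.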
]

+ Choose $k$ to be the largest length of sublist for which insertion sort is faster than merge sort. Use a small constant such as $5$ or $10$.

#rev1_note[
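The main point here is that comparing two algorithms whose running times are equal in the $Theta$ sense requires taking constant factors into account. In practice these can be obtained by counting the operations the algorithms actually perform.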
]
]
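
The heap-based merge described in the note for part 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `insertion_sort`, `merge_sorted_sublists` and `hybrid_sort` are made up here, and the standard `heapq` module stands in for the min-heap.

```python
import heapq


def insertion_sort(chunk):
    """Sort a short list in place with insertion sort and return it."""
    for j in range(1, len(chunk)):
        key = chunk[j]
        i = j - 1
        while i >= 0 and chunk[i] > key:
            chunk[i + 1] = chunk[i]
            i -= 1
        chunk[i + 1] = key
    return chunk


def merge_sorted_sublists(sublists):
    """Merge sorted lists using a min-heap that holds one entry per sublist."""
    # Each heap entry is (value, sublist index, position inside that sublist).
    heap = [(sub[0], idx, 0) for idx, sub in enumerate(sublists) if sub]
    heapq.heapify(heap)
    merged = []
    while heap:
        value, idx, pos = heapq.heappop(heap)
        merged.append(value)
        if pos + 1 < len(sublists[idx]):
            heapq.heappush(heap, (sublists[idx][pos + 1], idx, pos + 1))
    return merged


def hybrid_sort(values, k):
    """Insertion-sort chunks of length k, then merge the sorted chunks."""
    sublists = [insertion_sort(list(values[s:s + k]))
                for s in range(0, len(values), k)]
    return merge_sorted_sublists(sublists)
```

The heap never holds more than $n\/k$ entries, so each pop or push costs $O(lg(n\/k))$, in line with the bound in the note.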

#rev1_note[
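Review: master theorem

For the recurrence of a divide-and-conquer algorithm

$T(n) = a T(n / b) + f(n)$

the master theorem gives a quick way to determine the asymptotic complexity:

1. If $f(n) = O(n^c)$ with $c < log_b a$, then $T(n) = Theta(n^(log_b a))$.
2. If $f(n) = Theta(n^(log_b a))$, then $T(n) = Theta(n^(log_b a) lg n)$.
3. If $f(n) = Omega(n^c)$ with $c > log_b a$, and the regularity condition $a f(n\/b) <= d f(n)$ holds for some constant $d < 1$ and all sufficiently large $n$, then $T(n) = Theta(f(n))$.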

]
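
As a quick aid for applying the theorem, here is a small Python sketch that classifies recurrences whose driving term is a plain polynomial $Theta(n^c)$. The function name `master_theorem` and its return format are assumptions made for this illustration. It does not handle driving terms with logarithmic factors.

```python
import math


def master_theorem(a, b, c):
    """Classify T(n) = a*T(n/b) + Theta(n^c) for a >= 1, b > 1, c >= 0.

    Returns (case, bound), where bound is a human-readable string.
    Only polynomial driving terms are handled; for them the regularity
    condition of case 3 holds automatically.
    """
    critical = math.log(a, b)  # the critical exponent log_b a
    if math.isclose(c, critical, abs_tol=1e-12):
        return 2, f"Theta(n^{c:g} lg n)"
    if c < critical:
        return 1, f"Theta(n^{critical:.3f})"
    return 3, f"Theta(n^{c:g})"
```

For example, `master_theorem(2, 2, 1)` falls into case 2, which is the merge sort recurrence.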

=== Question 4.2-3
What is the largest $k$ such that if you can multiply $3 times 3$ matrices using $k$ multiplications (not assuming commutativity of multiplication), then you can multiply $n times n$ matrices in $o(n lg 7)$ time? What is the running time of this algorithm?
What is the largest $k$ such that if you can multiply $3 times 3$ matrices using $k$ multiplications (not assuming commutativity of multiplication), then you can multiply $n times n$ matrices in $o(n^(log 7))$ time? What is the running time of this algorithm?

#ans[
#rev1_note[
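A slightly reworded version of the question:

Suppose you have an algorithm that multiplies $3 times 3$ matrices using $k$ multiplications. Can it be used to build an algorithm that multiplies $n times n$ matrices in $o(n^(log_2 7))$ time? What is the largest $k$ for which this is possible?

The recurrence is $T(n) = k T(n\/3) + O(n^2)$. We apply the master theorem case by case:

// - When $k=27$ this is just the basic block matrix multiplication algorithm, so we may assume $k<27$.

- $log_3 k < 2$: the regularity condition $k dot (n\/3)^2 < n^2$ gives $k < 9$, and the running time is $T(n)=O(n^2)subset O(n^(log_2 7))$.
- $log_3 k = 2$: here $T(n)=O(n^2 lg n) subset O(n^(log_2 7))$.
- $log_3 k > 2$: to have $T(n)=O(n^(log_3 k)) subset O(n^(log_2 7))$ we need $log_3 k < log_2 7$, so the largest $k$ is $21$.

The recurrence in the answer below is wrong and should be corrected.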
]

Assume $n = 3^m$. Using block matrix multiplication, the recurrence for the running time is $T(n) = k T(n\/3) + O(1)$.

When $log_3 k > 2 $, using master theorem, the largest $k$ to satisfy $log_3 k < lg 7$ is $k=21$.

]
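
The value $k=21$ can be double-checked numerically. The sketch below assumes the recurrence $T(n) = k T(n\/3) + Theta(n^2)$ and simply searches for the largest integer $k$ with $log_3 k < log_2 7$. The function name `largest_k` and the search limit of $27$ (the count for naive block multiplication) are choices made for this illustration.

```python
import math


def largest_k(block=3, limit=27):
    """Largest k with log_block(k) < log2(7), searching k = 1..limit."""
    target = math.log2(7)  # exponent of Strassen's algorithm
    return max(k for k in range(1, limit + 1) if math.log(k, block) < target)


k = largest_k()
exponent = math.log(k, 3)  # exponent of the resulting running time
```

With $k = 21$ the first case of the master theorem gives a running time of $Theta(n^(log_3 21))$, and the exponent evaluates to roughly $2.771$, just below $log_2 7$.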
66 changes: 58 additions & 8 deletions 7e1810-algo_hw/hw2.typ
@@ -1,17 +1,26 @@
#import "utils.typ": *

== HW 2 (Week 3)
Due: 2024.03.24

#let ans(it) = [
#pad(1em)[
#text(fill: blue)[
#it
]
]
]

=== Question 6.2-6
The code for MAX-HEAPIFY is quite efficient in terms of constant factors, except possibly for the recursive call in line 10, for which some compilers might produce inefficient code. Write an efficient MAX-HEAPIFY that uses an iterative control construct (a loop) instead of recursion.

#ans[
#rev1_note[
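Review: max-heap / min-heap

A max-heap is a complete binary tree stored in an array that satisfies $A["PARENT"(i)] >= A[i]$, where $"PARENT"(i) = floor(i\/2)$. A min-heap is the same structure satisfying $A["PARENT"(i)] <= A[i]$.

Insertion: add a new leaf at the rightmost position of the bottom level (that is, append to the end of the array), then sift it up until the max-(min-)heap property holds again. The adjustment takes $O(log n)$ time.

Deletion: after the last element takes the place of the removed root, sift down: find the largest (smallest) value among the children, swap it with the current node, and continue recursively. The adjustment takes $O(log n)$ time.

Building a heap: starting from the last non-leaf node and moving towards the front of the array, sift each node down so that it satisfies the max-(min-)heap property. The running time is $O(n)$. The idea is to "merge" already-built heaps from the leaves towards the root. Every step is a sift-down, and sifting down node $i$, which sits at depth $floor(log i)$, costs $O(log n - log i)$. The total is therefore $O(n log n - sum_(i=1)^n log i) = O(n)$, because $sum_(i=1)^n log i = log n! = n log n - Theta(n)$.

Below is a version with the recursion rewritten as a loop: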
]
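
For comparison with the pseudocode, here is a minimal runnable Python sketch of the loop-based sift-down together with the bottom-up heap construction from the review. It uses 0-based indices, so the children of node `i` are `2*i + 1` and `2*i + 2`, and the function names are illustrative.

```python
def max_heapify(a, i, heap_size):
    """Sift a[i] down iteratively until the subtree rooted at i is a max-heap."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < heap_size and a[left] > a[largest]:
            largest = left
        if right < heap_size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return                      # heap property holds at i
        a[i], a[largest] = a[largest], a[i]
        i = largest                     # continue from the child we swapped with


def build_max_heap(a):
    """Turn the list a into a max-heap in place in O(n) time."""
    for i in range(len(a) // 2 - 1, -1, -1):  # last non-leaf down to the root
        max_heapify(a, i, len(a))
```

The loop replaces the tail-recursive call: instead of recursing on the child, the index `i` is updated and the body runs again.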

Consider the following pseudocode:
```txt
MAX-HEAPIFY(A, i)
@@ -65,6 +74,18 @@ Show how to implement a first-in, first-out queue with a priority queue. Show ho

=== Question 7.4-6
Consider modifying the PARTITION procedure by randomly picking three elements from subarray $A[p : r]$ and partitioning about their median (the middle value of the three elements). Approximate the probability of getting worse than an $alpha$-to-$(1 - alpha)$ split, as a function of $alpha$ in the range $0 < alpha < 1/2$.

#rev1_note[
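Assume that elements may be picked more than once, or that $r-p$ is large enough, so that the three picks can be treated as independent.

Consider the distribution of the median of the three picks, with positions normalized to $[0,1]$. Whenever the median falls in $[0,alpha]union[1-alpha,1]$, the resulting choice of $q$ gives a split worse than $alpha$-to-$(1-alpha)$. By symmetry we only need to compute the left part. There are two ways for the median to fall in $[0,alpha]$:

- exactly two of the three picks fall in $[0,alpha]$ and the remaining one falls in $[alpha,1]$: $binom(3,2) times alpha^2 (1-alpha)$
- all three picks fall in $[0,alpha]$: $alpha^3$

It is easy to check that these cases are disjoint and cover every way the median can fall in $[0,alpha]$. Multiplying by $2$ gives the probability $2(3 alpha^2 (1-alpha) + alpha^3) = 6 alpha^2 - 4 alpha^3$.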
]
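
The counting argument can be sanity-checked with a short simulation. The sketch below assumes the continuous model from the note (three independent uniform picks on $[0,1]$). The function names, the number of trials and the fixed seed are arbitrary choices for illustration.

```python
import random


def worse_split_probability(alpha):
    """Closed form: 2 * (3*alpha^2*(1 - alpha) + alpha^3)."""
    return 2 * (3 * alpha**2 * (1 - alpha) + alpha**3)


def simulate(alpha, trials=100_000, seed=0):
    """Estimate the probability that the median of three uniform picks
    lands outside the interval [alpha, 1 - alpha]."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        median = sorted(rng.random() for _ in range(3))[1]
        if median < alpha or median > 1 - alpha:
            bad += 1
    return bad / trials
```

For a given $alpha$, the simulated frequency should be close to the closed form, up to sampling noise.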

#ans[
*Assuming the same element could be picked more than once*(which should be the case in real world).

@@ -78,6 +99,21 @@ Consider modifying the PARTITION procedure by randomly picking three elements fr

=== Question 8.2-7
Counting sort can also work efficiently if the input values have fractional parts, but the number of digits in the fractional part is small. Suppose that you are given n numbers in the range $0$ to $k$, each with at most $d$ decimal (base $10$) digits to the right of the decimal point. Modify counting sort to run in $Theta(n + 10^d k)$ time.

#rev1_note[
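Review: counting sort

One way to keep the sort stable is as follows, for input of length $n$ with values in the range $[0,k]$.
- Allocate an array $C$ indexed by $[0,k]$ and zero it.
- Scan the $n$ input items once, incrementing the counter of each value.
- Compute prefix sums: $C[i] = C[i] + C[i-1]$
- Starting from the end of the input, place each item at the output position given by $C$ for its value, then decrement that entry of $C$, so that the positions read from $C$ are always distinct.

Time complexity: $O(n+k)$. This form is used when the relative sizes of $n$ and $k$ are not known: if $n>>k$ the time is $O(n)$, and if $k>>n$ the time is $O(k)$.

In the example below we simply shift the fractional part into the integer part and then run counting sort (which guarantees that each value can be used as an index).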
]
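
The steps of the review, combined with the scaling trick for fractional inputs, can be sketched in Python. This is an illustration under assumptions: inputs are floats in $[0, k]$ with at most $d$ decimal digits, `round` is used to recover the scaled integer key, and the name `counting_sort_decimals` is made up here.

```python
def counting_sort_decimals(values, k, d):
    """Stable counting sort for numbers in [0, k] with at most d decimals."""
    scale = 10 ** d
    keys = [round(v * scale) for v in values]   # integer keys in [0, k * 10^d]
    counts = [0] * (k * scale + 1)
    for key in keys:                            # histogram of the keys
        counts[key] += 1
    for i in range(1, len(counts)):             # prefix sums
        counts[i] += counts[i - 1]
    output = [None] * len(values)
    # Walk the input backwards so that equal keys keep their relative order.
    for value, key in zip(reversed(values), reversed(keys)):
        counts[key] -= 1
        output[counts[key]] = value
    return output
```

The count array has $10^d k + 1$ entries, which is where the $10^d k$ term of the running time comes from.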

#ans[
To achieve $Theta(n + 10^d k)$ time, we first use $Theta(n)$ time to multiply each number by $10^d$, then change the $C[0, k]$ to $C[0, 10^d k]$, and finally use $Theta(10^d k)$ time to sort the numbers.

@@ -102,6 +138,15 @@ Counting sort can also work efficiently if the input values have fractional part

=== Question 8.3-5
Show how to sort $n$ integers in the range $0$ to $n^3 - 1$ in $O(n)$ time.

#rev1_note[
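Review: radix sort

Suppose each item has $k$ keys, each with its own ordering. Items are ordered by the first key, from small to large. Items with equal first keys are compared by the second key, and so on down to the last key.

In the problem below, counting sort is used as the inner sort, with passes running from the least significant key to the most significant and relying on the stability of counting sort. Each pass sorts all $n$ items by one key (one base-$n$ digit), which takes $O(n)$ with counting sort, so three passes are still $O(n)$.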
]
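
A compact Python sketch of this idea: write every number in base $n$, which gives three digits for values below $n^3$, and run one stable counting sort pass per digit. The function name `sort_cubic_range` is illustrative, and the input is assumed to consist of $n$ integers in the range $0$ to $n^3 - 1$.

```python
def sort_cubic_range(values):
    """Sort n integers from the range [0, n^3 - 1] with three counting passes."""
    n = len(values)
    if n <= 1:
        return list(values)
    result = list(values)
    for digit in range(3):                 # least significant base-n digit first
        divisor = n ** digit
        counts = [0] * n
        for v in result:
            counts[(v // divisor) % n] += 1
        for i in range(1, n):              # prefix sums
            counts[i] += counts[i - 1]
        output = [0] * n
        for v in reversed(result):         # backwards scan keeps the pass stable
            d = (v // divisor) % n
            counts[d] -= 1
            output[counts[d]] = v
        result = output
    return result
```

Each pass touches $n$ items and $n$ counters, so the three passes together take $O(n)$ time.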

#ans[
First convert each number to base $n$, then use counting sort to sort the numbers.

@@ -111,6 +156,11 @@ Show how to sort $n$ integers in the range $0$ to $n^3 - 1$ in $O(n)$ time.

=== Question 9.3-9
Describe an $O(n)$-time algorithm that, given a set $S$ of $n$ distinct numbers and a positive integer $k <= n$, determines the $k$ numbers in $S$ that are closest to the median of $S$.

#rev1_note[
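The answer below needs a correction: in Step 2, compute $abs(y-x)$ for each element $y$, and store it as the key with the original value as the associated value. Finally, take the $k$ elements with the smallest keys.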
]
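
A sketch of the corrected procedure in Python. One caveat: the helper `quickselect` below is a randomized selection with expected linear time, used here as a simple stand-in for the worst-case linear SELECT. Both function names are illustrative, and the lower median is used when $n$ is even.

```python
import random


def quickselect(items, rank):
    """Return the element of 0-based rank `rank` (expected O(n) time)."""
    items = list(items)
    while True:
        pivot = random.choice(items)
        lower = [v for v in items if v < pivot]
        equal = [v for v in items if v == pivot]
        if rank < len(lower):
            items = lower
        elif rank < len(lower) + len(equal):
            return pivot
        else:
            rank -= len(lower) + len(equal)
            items = [v for v in items if v > pivot]


def k_closest_to_median(values, k):
    """Return the k elements of values closest to the median."""
    values = list(values)
    median = quickselect(values, (len(values) - 1) // 2)    # lower median
    distances = [abs(v - median) for v in values]           # keys |y - x|
    threshold = quickselect(distances, k - 1)               # k-th smallest key
    closest = [v for v in values if abs(v - median) < threshold]
    ties = [v for v in values if abs(v - median) == threshold]
    return closest + ties[: k - len(closest)]
```

Selecting the $k$-th smallest key and then filtering avoids sorting the keys, which keeps the whole procedure linear (in expectation for this sketch).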

#ans[
+ $O(n)$: Using SELECT, we can find $x$ to be the median of $S$.
+ $O(n)$: Subtract $x$ from each element in $S$.
32 changes: 24 additions & 8 deletions 7e1810-algo_hw/hw3.typ
@@ -1,17 +1,29 @@
#import "utils.typ": *

== HW 3 (Week 4)
Due: 2024.03.31

#let ans(it) = [
#pad(1em)[
#text(fill: blue)[
#it
]
]
]

=== Question 12.2-3
Write the `TREE-PREDECESSOR` procedure(which is symmetric to `TREE-SUCCESSOR`).

#rev1_note[

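Review: binary search trees

A binary search tree is a binary tree in which every node $x$ has a key $"key"[x]$, a pointer $p[x]$ to its parent, and pointers $"left"[x]$ and $"right"[x]$ to its left and right children. The binary search tree property:

+ For every node $x$, *all* keys in its left subtree are smaller than $"key"[x]$.
+ For every node $x$, *all* keys in its right subtree are larger than $"key"[x]$.

An in-order traversal of a binary search tree yields the keys in sorted order.

How to find the predecessor:

- If the left child is not empty, the predecessor is the maximum of the left subtree (keep going right and down as far as possible).
- If the left child is empty, walk up until reaching the first parent entered from its right child, that is, a parent for which the current node is a right child. While the current node is a left child, keep walking up.
- Return that parent. If the root is reached without finding such a parent, the starting node is the leftmost node of the whole tree and has no predecessor, so return nil.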
]
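
The search logic above can be sketched in Python. The `Node` class and the `bst_insert` helper are assumptions added only to make the example self-contained; `None` plays the role of nil.

```python
class Node:
    """A binary search tree node with parent and child pointers."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.parent = None


def bst_insert(root, key):
    """Insert key below root (which may be None) and return the new node."""
    node = Node(key)
    parent, current = None, root
    while current is not None:
        parent = current
        current = current.left if key < current.key else current.right
    node.parent = parent
    if parent is not None:
        if key < parent.key:
            parent.left = node
        else:
            parent.right = node
    return node


def tree_maximum(x):
    """Follow right pointers to the largest key in the subtree rooted at x."""
    while x.right is not None:
        x = x.right
    return x


def tree_predecessor(x):
    """Return the node with the largest key smaller than x.key, or None."""
    if x.left is not None:
        return tree_maximum(x.left)
    y = x.parent
    while y is not None and x is y.left:   # climb while x is a left child
        x = y
        y = y.parent
    return y
```

The structure mirrors the successor procedure with left and right swapped and the maximum taking the place of the minimum.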

#ans[
```txt
TREE-PREDECESSOR(x)
@@ -28,6 +40,10 @@ Write the `TREE-PREDECESSOR` procedure(which is symmetric to `TREE-SUCCESSOR`).
=== Question 13.1-5
Show that the longest simple path from a node $x$ in a red-black tree to a descendant leaf has length at most twice that of the shortest simple path from node $x$ to a descendant leaf.

#rev1_note[
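In other words: prove that the longest simple path from a node $x$ of a red-black tree to a leaf is at most twice as long as the shortest such path.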
]

#ans[
Consider the longest simple path $(a_1, ... ,a_s)$ and the shortest simple path $(b_1, ... ,b_t)$. They contain the same number of black nodes (Property 5).
Neither path can contain two consecutive red nodes (Property 4).
8 changes: 8 additions & 0 deletions 7e1810-algo_hw/main.typ
@@ -1,6 +1,8 @@
#import "@preview/cetz:0.2.2": *
#import "@preview/diagraph:0.2.1": *

#import "utils.typ": *

#set text(
font: ("linux libertine", "Source Han Serif SC", "Source Han Serif"),
size: 10pt,
@@ -22,6 +24,12 @@

This document is released under the CC BY-NC-SA 4.0 license. Please observe academic integrity; commercial use is not permitted.

#rev1_note[
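\* Revision 2024/06/11: added some explanatory notes while reviewing for the final exam, marked in red.

Parts of the teaching assistants' answers were consulted; many thanks to them.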
]

#image("imgs/sticker_1.jpg", width: 30%)
]

15 changes: 15 additions & 0 deletions 7e1810-algo_hw/utils.typ
@@ -0,0 +1,15 @@
#let ans(it) = [
#pad(1em)[
#text(fill: blue)[
#it
]
]
]

#let rev1_note(it) = [
#pad(1em)[
#text(fill: red)[
#it
]
]
]
