\setsecnumdepth{subsection}
\begin{alttitles}
\chapter{Eigenapplications}
\label{ch:eigenapps}
We end this book with a few more ultra-cool real-world applications of linear
algebra. Unlike those in Chapter~\ref{ch:apps}, though, these will all involve
the \textbf{eigenvalue} concepts you learned in Chapter~\ref{ch:eigen}. Our
new ability to penetrate to the heart of a matrix and understand its inner
structure will enable us to do things we could only dream about before.
\pagebreak
\renewcommand{\thesubsection}{V\arabic{subsection}.}%... from subsections
\section{Video compression}
Imagine you owned a business, and you needed to send a $7\times 7$ matrix over
the Internet to one of your customers. Say, this matrix:
\vspace{-.15in}
\begin{align*}
M_1 =
\begingroup
\renewcommand*{\arraystretch}{1.2}
\begin{bmatrix}
5 & -1 & 45 & 16 & 16 & 3 & 37 \\
3 & -9 & 9 & 27 & -100 & 24 & 601 \\
13 & 0 & 2 & 13 & -18 & 21 & 11 \\
-4 & 33 & 9 & 1 & 4 & 14 & 50 \\
-21 & 51 & 9 & 17 & 5 & 73 & -5 \\
31 & -5 & 9 & -99 & -22 & 1 & 7 \\
6 & 8 & 9 & 24 & 8 & 8 & 8 \\
\end{bmatrix}.
\endgroup
\end{align*}
\vspace{-.15in}
How many numbers in total would you have to send your customer to communicate
the information in this matrix?
Easy question. There are 49 entries in a 7-by-7 matrix, so you need to send 49
numbers. Duh.
\medskip
Now suppose you wanted to send this matrix instead:
\vspace{-.15in}
\begin{align*}
M_2 =
\begingroup
\renewcommand*{\arraystretch}{1.2}
\begin{bmatrix}
5 & 5 & 10 & -5 & 20 & 500 & 2\frac{1}{2} \\
3 & 3 & 6 & -3 & 12 & 300 & 1\frac{1}{2} \\
13 & 13 & 26 & -13 & 52 & 1300 & 6\frac{1}{2} \\
-4 & -4 & -8 & 4 & -16 & -400 & -2 \\
-21 & -21 & -42 & 21 & -84 & -2100 & -10\frac{1}{2} \\
31 & 31 & 62 & -31 & 124 & 3100 & 15\frac{1}{2} \\
6 & 6 & 12 & -6 & 24 & 600 & 3 \\
\end{bmatrix}.
\endgroup
\end{align*}
\vspace{-.15in}
How many numbers would you have to send this time?
If you answered, ``again, 49,'' take another look. $M_2$ is way different than
$M_1$, because it has a very regular structure. Look at $M_2$'s first
(leftmost) column. Then realize that its second column is identical to the
first. Then further realize that its third column is exactly \textit{twice} the
first. And its fourth column is \textit{negative} the first. And its last three
columns are four times, a hundred times, and one-half times the first column,
respectively.
So if your customer knew the matrix would have this type of structure, I claim
you could send them all the necessary information in just \textit{fourteen}
numbers instead of 49. Here's how:
\begin{compactenum}
\item First, send them the first column: $[\ 5\ \ 3\ \ 13\ \ -4\ \ -21\ \ 31\ \
6\ ]$. That's seven numbers.
\index{Oakenshield, Thorin}
\item Then, send them the multiplier for each of the other columns. Namely, $[\
1\ \ 1\ \ 2\ \ -1\ \ 4\ \ 100\ \ \frac{1}{2}\ ]$. That, too, is seven
numbers.\footnote{Yeah, yeah, I know we could omit the first ``1'' in this
second set of seven, because the customer will know that the first column
should be multiplied by 1 without us having to tell them. That's a detail, and
it's actually messier to take advantage of that. Thirteen is unlucky anyway --
ask Thorin Oakenshield if you don't believe me.}
\end{compactenum}
\index{outer product}
From these fourteen numbers alone, the customer can reconstruct the original
matrix. All they have to do is compute the outer product (recall
p.~\pageref{outerProduct}) of the two 7-element vectors they received:
\vspace{-.15in}
\begin{gather*}
\begin{bmatrix}
5 \\ 3 \\ 13 \\ -4 \\ -21 \\ 31 \\ 6 \\
\end{bmatrix} \ \textrm{\textbullet} \
\begin{bmatrix}
1 & 1 & 2 & -1 & 4 & 100 & \frac{1}{2} \\
\end{bmatrix} =\\
\begingroup
\renewcommand*{\arraystretch}{1.2}
\begin{bmatrix}
5 & 5 & 10 & -5 & 20 & 500 & 2\frac{1}{2} \\
3 & 3 & 6 & -3 & 12 & 300 & 1\frac{1}{2} \\
13 & 13 & 26 & -13 & 52 & 1300 & 6\frac{1}{2} \\
-4 & -4 & -8 & 4 & -16 & -400 & -2 \\
-21 & -21 & -42 & 21 & -84 & -2100 & -10\frac{1}{2} \\
31 & 31 & 62 & -31 & 124 & 3100 & 15\frac{1}{2} \\
6 & 6 & 12 & -6 & 24 & 600 & 3 \\
\end{bmatrix}.
\endgroup
\end{gather*}
\vspace{-.15in}
\medskip
\label{rank-deficient}
What's going on here? Simply put, $M_2$ has \textit{less information} in it
than $M_1$ does, despite the fact that they are ostensibly the same size. In
fact, you may have realized that $M_2$ is a \textbf{rank-1} matrix. It has only
one linearly independent column: all the others are simply (scaled) copies of
it. This is what a rank-deficient matrix fundamentally is: a matrix that can
have some of its entries ``squeezed'' out of it without losing any information.
\smallskip
$M_1$ and $M_2$ represent the far extremes of this phenomenon. $M_1$ is full
rank (rank-7), and $M_2$ is just rank-1. In general, a $7\times 7$
rank-deficient matrix might not be \textit{as} deficient as $M_2$ is: it could
have a rank anywhere from 1 to 6.
\smallskip
Incidentally, 14 numbers might not seem like that big a savings over 49 (it's
only 3.5 times fewer entries), but consider what happens as the matrix gets
larger. Suppose it were $1920\times 1920$. Transmitting a rank-1 matrix of that
size would require $1920+1920=$\ 3,840 numbers to go across the wire. But a
full-rank matrix of that size would need \textit{3,686,400} entries. That's
nearly 1,000 times as much data.
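If you'd like to see this ``squeezing'' in action, here's a minimal NumPy
sketch (the variable names are mine) that rebuilds $M_2$ from just the fourteen
numbers using \texttt{np.outer}:
\begin{verbatim}
import numpy as np

# The fourteen numbers: M2's first column, plus the per-column multipliers.
first_col   = np.array([5, 3, 13, -4, -21, 31, 6])
multipliers = np.array([1, 1, 2, -1, 4, 100, 0.5])

# The outer product rebuilds the full 7x7 matrix from those 14 numbers.
M2 = np.outer(first_col, multipliers)
print(M2)
print(np.linalg.matrix_rank(M2))   # 1
\end{verbatim}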
\subsection{Enter Netflix}
\index{frame (of a video)}
\index{gray scale}
\index{bit / byte}
Now what's the application here? Well, it's probably one you use every day.
Suppose your aforementioned business is a video streaming service, and your
customer is watching one of your videos. The matrix we've been talking about is
one \textbf{frame} of this video. For simplicity, we'll consider images
(frames) that are in \textbf{gray scale} rather than color. Each \textbf{pixel}
of the image is represented by a single number on some scale: perhaps 0
indicates a pitch black pixel in the current frame, and 255 is bright white,
and all the numbers in between are shades of gray.\footnote{Why a range of 0
through 255? Because as you may remember from Chapters 6 and 7 of \textit{A
Cool Brisk Walk}, that's how many different values can fit in a single byte (8
bits) of data.}
If each frame of our movie is a $1920\times 1920$-pixel square\footnote{At this
point it may occur to you that when you watch a movie, your screen typically
isn't perfectly square, but rectangular. Its \index{aspect ratio}
\textbf{aspect ratio} (width-to-height) might be, say, 1.875:1, which yields a
$1920\times 1024$ canvas. \\ \indent When your matrix isn't square, instead of using
the eigenvalue decomposition we learned last chapter, we can use the
\index{singular value decomposition (SVD)} \textbf{singular value decomposition
(SVD)}, a very closely related technique. The ``singular values'' and
``singular vectors'' that the SVD gives you play the same role for a non-square
matrix that eigenvalues and eigenvectors play for a square one.}, and we're shoveling 30
frames per second at our viewer, that would be about 105 Megabytes of
information \textit{every second} we'd be trying to send. The Internet
ain't got no time for that.
So we'd like to instead send a compressed version of each frame: a matrix whose
rank is far less than 1,920 and yet which looks pretty close to the original.
Our problem thus reduces to: \textit{what matrix, of the same size as the
original image's matrix but of at most rank-k (for some number k) is the
``closest'' to the original?}
\subsection{Yet another norm}
\index{Frobenius norm}
\index{norm (of a matrix)}
\index{Euclidean norm}
First, we have to settle on what we mean by one matrix being ``close'' to
another. Here, we'll subtract one matrix from the other, and then take the
\textbf{Frobenius norm} of this difference. Subtracting matrices, of course, is
just the inverse of adding them (p.~\pageref{matrixAddition}): we do it element
by element and get a matrix of the differences. The ``Frobenius norm'' (which
always sounded to me like a character from a Willy Wonka story, btw) is just
what you would expect it to be: the square-root of the sum of all these squared
differences. In fact it's exactly like the Euclidean norm of a vector
(p.~\pageref{Euclideannorm}), except that we have a two-dimensional pane of
entries to work with instead of just a one-dimensional list.
To be concrete, let's say we have a matrix $M$, and an ``approximation'' to
matrix $M$ called $\hat{M}$, with the following values:
\vspace{-.15in}
\begin{align*}
M =
\begin{bmatrix}
17 & 2 \\
-3 & 6 \\
\end{bmatrix}, \quad
\hat{M} =
\begin{bmatrix}
14 & 4 \\
-5 & -1 \\
\end{bmatrix},
\end{align*}
\vspace{-.15in}
How ``close'' are these two matrices to each other? To quantify this, we first
subtract one from the other (doesn't matter which order):
\vspace{-.15in}
\begin{align*}
M - \hat{M} =
\begin{bmatrix}
3 & -2 \\
2 & 7 \\
\end{bmatrix}.
\end{align*}
\vspace{-.15in}
And then we take the square-root of the sum of the squared entries to give us
the Frobenius norm:
\vspace{-.15in}
\begin{align*}
{\left\lVert{M - \hat{M}}\right\rVert_F} =
\sqrt{3^2 + (-2)^2 + 2^2 + 7^2} = 8.124.
\end{align*}
\vspace{-.15in}
This measure passes a quick sanity check: the further apart the entries of $M$
and $\hat{M}$ at a particular row and column, the larger the difference that is
added towards the Frobenius norm. So this metric gives pairs of matrices whose
entries are more similar to each other a lower overall norm, indicating a
higher similarity.
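By the way, this is a one-liner in NumPy: \texttt{np.linalg.norm} with the
\texttt{'fro'} option computes exactly this square-root-of-summed-squares on the
matrix of differences. A quick sketch:
\begin{verbatim}
import numpy as np

M     = np.array([[17, 2], [-3, 6]])
M_hat = np.array([[14, 4], [-5, -1]])

# Frobenius norm of the difference: sqrt of the sum of squared entries.
dist = np.linalg.norm(M - M_hat, ord='fro')
print(round(dist, 3))   # 8.124
\end{verbatim}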
\subsection{Best low-rank matrix approximations}
\index{dominant eigenvector}
And now for our eigenstuff. Recall (p.~\pageref{theScoopAboutEigenvectors} and
following) that every eigenvector of a matrix has a corresponding eigenvalue,
and that we could arrange these eigenvectors by decreasing eigenvalue if we
like. The one with the largest eigenvalue had the special name ``dominant
eigenvector.'' I'll also refer to ``the top eigenvector,'' ``the top two
eigenvectors,'' ``the top ten eigenvectors,'' and so forth, by which I just
mean ``the $k$ eigenvectors with the highest eigenvalues.''
Okay. It turns out that the best rank-1 approximation to a square matrix (where
``best'' means ``closest to the original when using the Frobenius norm'') is
\textit{the dominant eigenvector, times its eigenvalue, times the transpose of
the dominant eigenvector.} To illustrate, let's say we had this $5\times 5$
matrix\footnote{You may notice that this example matrix happens to be
\textit{symmetric}. Things actually get slightly weird for non-symmetric
matrices, in which case you can again turn to the SVD,\index{singular value
decomposition (SVD)} which is almost the same thing as the
\index{eigendecomposition} eigendecomposition.}:
\vspace{-.15in}
\label{Amatrix}
\begin{align*}
A =
\begin{bmatrix}
12 & 2 & 1 & 9 & 13 \\
2 & 20 & 12 & 7 & 5 \\
1 & 12 & 6 & -4 & 5 \\
9 & 7 & -4 & 8 & 8 \\
13 & 5 & 5 & 8 & 13 \\
\end{bmatrix}
,
\end{align*}
\vspace{-.15in}
and we wanted the best rank-1 approximation to it. We ask Python for its
dominant eigenvector, and the corresponding eigenvalue, and get this:
\vspace{-.15in}
\begin{align*}
\overrightarrow{x_1} =
\begin{bmatrix}
.462 \\ .538 \\ .257 \\ .38 \\ .535 \\
\end{bmatrix}, \quad
\lambda_1 = 37.363.
\end{align*}
\vspace{-.15in}
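(If you're curious how to actually ask, here's a minimal NumPy sketch. It uses
\texttt{np.linalg.eigh}, which is meant for symmetric matrices and returns the
eigenvalues in \textit{ascending} order, so the dominant one comes last. The
eigenvector may come back with all its signs flipped, which is harmless: it's
still an eigenvector.)
\begin{verbatim}
import numpy as np

A = np.array([[12,  2,  1,  9, 13],
              [ 2, 20, 12,  7,  5],
              [ 1, 12,  6, -4,  5],
              [ 9,  7, -4,  8,  8],
              [13,  5,  5,  8, 13]])

# eigh() is for symmetric matrices; eigenvalues come back smallest-to-largest.
eigvals, eigvecs = np.linalg.eigh(A)
lambda1 = eigvals[-1]      # the dominant (largest) eigenvalue
x1      = eigvecs[:, -1]   # its eigenvector (possibly with flipped signs)
print(lambda1, x1)
\end{verbatim}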
Multiplying these out as indicated above, we get:
\vspace{-.15in}
\begin{gather*}
\overrightarrow{x_1} \ \textrm{\textbullet} \ \begin{bmatrix} \lambda_1
\end{bmatrix} \ \textrm{\textbullet} \ \overrightarrow{x_1}^\intercal =\\
\begin{bmatrix}
.462 \\ .538 \\ .257 \\ .38 \\ .535 \\
\end{bmatrix} \ \textrm{\textbullet}\
\begin{bmatrix}
37.363
\end{bmatrix} \ \textrm{\textbullet}\
\begin{bmatrix}
.462 & .538 & .257 & .38 & .535 \\
\end{bmatrix} = \\
\begin{bmatrix}
7.963 & 9.288 & 4.441 & 6.563 & 9.222\\
9.288 & 10.834 & 5.18 & 7.655 & 10.757\\
4.441 & 5.18 & 2.477 & 3.66 & 5.143\\
6.563 & 7.655 & 3.66 & 5.409 & 7.6 \\
9.222 & 10.757 & 5.143 & 7.6 & 10.68 \\
\end{bmatrix}.
\end{gather*}
\vspace{-.05in}
\index{Spectral Theorem}
Take careful note to see that \textit{this is actually the \textbf{Spectral
Theorem} in action} (from p.~\pageref{SpectralTheorem}) but using only a subset
of the eigenvectors/eigenvalues (namely, the top one) instead of all of them.
(This is why I put the top eigenvalue, 37.363, into a $1\times 1$ matrix in the
equation above -- to help you see that connection.)
\medskip
Now how close is this approximation to our original $A$ matrix? Not that great,
actually. If you run your eyes over the entries, and compare to $A$
(p.~\pageref{Amatrix}) it doesn't even look like it's trying very hard. The
Frobenius norm of the difference between the two, by the way, is a whopping
573.04, which tells you that limiting ourselves to rank-1 isn't producing a
very good approximation. We wouldn't want to send such an image to our viewer,
even though it would only take 10 bytes instead of 25, because they might not
even know whether they were on HBO or the Disney Channel.
\bigskip
Okay, let's move up to rank-2 then. What's the closest rank-2 matrix to our
original? Python says the second highest eigenvalue (and its eigenvector) are:
\vspace{-.15in}
\begin{align*}
\overrightarrow{x_2} =
\begin{bmatrix}
-.458 \\ .658 \\ .448 \\ -.266 \\ -.293 \\
\end{bmatrix}, \quad
\lambda_2 = 27.712.
\end{align*}
\vspace{-.15in}
Adding this second eigenvector into the mix, we get:
\vspace{-.15in}
\begin{gather*}
\begin{bmatrix}
| & | \\[6pt]
\overrightarrow{\textbf{x}_{1}} &
\overrightarrow{\textbf{x}_{2}} \\
| & | \\[6pt]
\end{bmatrix} \ \textrm{\textbullet}\
\begin{bmatrix}
\lambda_1 & 0 \\
0 & \lambda_2 \\
\end{bmatrix} \ \textrm{\textbullet}\
\begin{bmatrix}
| & | \\[6pt]
\overrightarrow{\textbf{x}_{1}} &
\overrightarrow{\textbf{x}_{2}} \\
| & | \\[6pt]
\end{bmatrix}^\intercal = \\
\small
\begingroup
\setlength\arraycolsep{2pt}
\begin{bmatrix}
.462 & -.458 \\
.538 & .658 \\
.257 & .448 \\
.38 & -.266 \\
.535 & -.293 \\
\end{bmatrix} \ \textrm{\textbullet}\
\begin{bmatrix}
37.363 & 0 \\
0 & 27.712 \\
\end{bmatrix} \ \textrm{\textbullet}\
\begin{bmatrix}
.462 & .538 & .257 & .38 & .535 \\
-.458 & .658 & .448 & -.266 & -.293 \\
\end{bmatrix}\endgroup\\
=
\begin{bmatrix}
12.515 & 2.748 & -0.011 & 9.211 & 12.138 \\
2.748 & 20.231 & 11.577 & 3.849 & 6.567 \\
-0.011 & 11.577 & 6.831 & 1.07 & 2.291 \\
9.211 & 3.849 & 1.07 & 6.95 & 9.297 \\
12.138 & 6.567 & 2.291 & 9.297 & 12.548 \\
\end{bmatrix}.
\end{gather*}
\vspace{-.05in}
Now we're talking. Flip back and forth between those numbers and the original
$A$ on p.~\pageref{Amatrix}, and you'll see that with just two eigenvectors,
we're starting to get a remarkably close approximation. Frobenius just
plummeted to 101.62. Repeating this for the best rank-3 approximation, we get:
\vspace{-.15in}
\begin{gather*}
\begin{bmatrix}
| & | & | \\[6pt]
\overrightarrow{\textbf{x}_{1}} &
\overrightarrow{\textbf{x}_{2}} &
\overrightarrow{\textbf{x}_{3}} \\
| & | & | \\[6pt]
\end{bmatrix} \ \textrm{\textbullet}\
\begin{bmatrix}
\lambda_1 & 0 & 0 \\
0 & \lambda_2 & 0 \\
0 & 0 & \lambda_3 \\
\end{bmatrix} \ \textrm{\textbullet}\
\begin{bmatrix}
| & | & | \\[6pt]
\overrightarrow{\textbf{x}_{1}} &
\overrightarrow{\textbf{x}_{2}} &
\overrightarrow{\textbf{x}_{3}} \\
| & | & | \\[6pt]
\end{bmatrix}^\intercal = \\
\begingroup
\setlength\arraycolsep{2pt}
\footnotesize
\begin{bmatrix}
.462 & -.458 & .156 \\
.538 & .658 & -.33 \\
.257 & .448 & .525 \\
.38 & -.266 & -.651 \\
.535 & -.293 & .408 \\
\end{bmatrix} \ \textrm{\textbullet}\
\begin{bmatrix}
37.363 & 0 & 0 \\
0 & 27.712 & 0 \\
0 & 0 & 7.6 \\
\end{bmatrix} \ \textrm{\textbullet}\
\begin{bmatrix}
.462 & .538 & .257 & .38 & .535 \\
-.458 & .658 & .448 & -.266 & -.293 \\
.156 & -.33 & .525 & -.651 & .408 \\
\end{bmatrix} \endgroup \\
=
\begin{bmatrix}
12.701 & 2.356 & 0.612 & 8.438 & 12.623 \\
2.356 & 21.06 & 10.259 & 5.484 & 5.542 \\
0.612 & 10.259 & 8.925 & -1.528 & 3.921 \\
8.438 & 5.484 & -1.528 & 10.173 & 7.276 \\
12.623 & 5.542 & 3.921 & 7.276 & 13.815 \\
\end{bmatrix},
\end{gather*}
\vspace{-.15in}
which is even closer, with a Frobenius norm of just 59.07.
\bigskip
Each time we add another eigenvector, we give our approximation another degree
of freedom which it can use to bend closer to the original. And finally, if we
use all 5, we of course get the original matrix back:
\vspace{-.15in}
\footnotesize
\begin{gather*}
\begin{bmatrix}
| & | & | & | & | \\[6pt]
\overrightarrow{\textbf{x}_{1}} &
\overrightarrow{\textbf{x}_{2}} &
\overrightarrow{\textbf{x}_{3}} &
\overrightarrow{\textbf{x}_{4}} &
\overrightarrow{\textbf{x}_{5}} \\
| & | & | & | & | \\[6pt]
\end{bmatrix} \ \textrm{\textbullet}\
\begin{bmatrix}
\lambda_1 & 0 & 0 & 0 & 0 \\
0 & \lambda_2 & 0 & 0 & 0 \\
0 & 0 & \lambda_3 & 0 & 0 \\
0 & 0 & 0 & \lambda_4 & 0 \\
0 & 0 & 0 & 0 & \lambda_5 \\
\end{bmatrix} \ \textrm{\textbullet}\
\begin{bmatrix}
| & | & | & | & | \\[6pt]
\overrightarrow{\textbf{x}_{1}} &
\overrightarrow{\textbf{x}_{2}} &
\overrightarrow{\textbf{x}_{3}} &
\overrightarrow{\textbf{x}_{4}} &
\overrightarrow{\textbf{x}_{5}} \\
| & | & | & | & | \\[6pt]
\end{bmatrix}^\intercal = \\
\begingroup
\setlength\arraycolsep{2pt}
\footnotesize
\begin{bmatrix}
-0.462 & -0.458 & 0.156 & 0.735 & -0.109 \\
-0.538 & 0.658 & -0.33 & 0.082 & -0.402 \\
-0.257 & 0.448 & 0.525 & 0.105 & 0.668 \\
-0.38 & -0.266 & -0.651 & -0.181 & 0.572 \\
-0.535 & -0.293 & 0.408 & -0.639 & -0.23 \\
\end{bmatrix}\endgroup \ \textrm{\textbullet} \quad \quad \quad \quad \quad \quad \quad \quad
\quad \quad
\\
\quad \quad \quad \quad \quad \quad \quad \quad
\quad \quad
\begingroup
\setlength\arraycolsep{2pt}
\begin{bmatrix}
37.363 & 0 & 0 & 0 & 0 \\
0 & 27.712 & 0 & 0 & 0 \\
0 & 0 & 7.599 & 0 & 0 \\
0 & 0 & 0 & -.151 & 0 \\
0 & 0 & 0 & 0 & -6.523 \\
\end{bmatrix}\endgroup \ \textrm{\textbullet} \quad \quad \quad \quad \quad \quad \quad \quad
\quad \quad
\\
\end{gather*}
\pagebreak
\begin{gather*}
\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad
\begingroup
\setlength\arraycolsep{2pt}
\begin{bmatrix}
-0.462 & -0.538 & -0.257 & -0.38 & -0.535 \\
-0.458 & 0.658 & 0.448 & -0.266 & -0.293 \\
0.156 & -0.33 & 0.525 & -0.651 & 0.408 \\
0.735 & 0.082 & 0.105 & -0.181 & -0.639 \\
-0.109 & -0.402 & 0.668 & 0.572 & -0.23 \\
\end{bmatrix}\endgroup = \\
\normalsize
\begin{bmatrix}
12 & 2 & 1 & 9 & 13 \\
2 & 20 & 12 & 7 & 5 \\
1 & 12 & 6 & -4 & 5 \\
9 & 7 & -4 & 8 & 8 \\
13 & 5 & 5 & 8 & 13 \\
\end{bmatrix} = A \quad \checkmark
\end{gather*}
\vspace{-.05in}
\normalsize
by the Spectral Theorem. (And a Frobenius norm of 0.)
\bigskip
\bigskip
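If you want to replicate this whole progression yourself, here's a short
sketch. The \texttt{rank\_k\_approx} helper is my own, not a library routine;
it just applies the Spectral Theorem using only the $k$ eigenpairs with the
largest eigenvalue magnitudes:
\begin{verbatim}
import numpy as np

A = np.array([[12,  2,  1,  9, 13],
              [ 2, 20, 12,  7,  5],
              [ 1, 12,  6, -4,  5],
              [ 9,  7, -4,  8,  8],
              [13,  5,  5,  8, 13]])

def rank_k_approx(A, k):
    """Best rank-k approximation (Frobenius sense) of a symmetric matrix A."""
    eigvals, eigvecs = np.linalg.eigh(A)       # A must be symmetric
    top = np.argsort(-np.abs(eigvals))[:k]     # the k largest |eigenvalues|
    X, L = eigvecs[:, top], np.diag(eigvals[top])
    return X @ L @ X.T                         # Spectral Theorem, k terms only

print(np.round(rank_k_approx(A, 2), 3))        # best rank-2 approximation
print(np.allclose(rank_k_approx(A, 5), A))     # all 5 eigenpairs: True
\end{verbatim}
Note that the helper sorts by the \textit{absolute value} of the eigenvalues:
for a symmetric (but not positive definite) matrix some eigenvalues can be
negative, and it's the magnitude that measures how much each eigenpair
contributes.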
Finally, let's look at this work on an actual image.
Figure~\ref{fig:stormtrooper} shows an $1800\times 1800$ gray scale still
frame from a movie. At 1 byte (8 bits) per pixel, the entire original image
would take 3.24 MBytes to transmit in full. That's a lot of data for one frame
of a movie that the viewer will only see for $\frac{1}{30}^\textrm{th}$ of a
second.
\begin{figure}[hb]
\centering
\includegraphics[width=0.5\textwidth]{stormtrooper.png}
\caption{A 3,240,000-byte gray scale image.}
\label{fig:stormtrooper}
\end{figure}
Let's see if we can do better. Figure~\ref{fig:approximations} starts with a
lowly rank-1 matrix (using just the dominant eigenvector), and then increases
the rank as the images move left-to-right down the page.
\begin{figure}[H]
\centering
\begin{tabular}{ccc}
\scriptsize{Rank-1 (3600 bytes)} &
\scriptsize{Rank-2 (7200 bytes)} &
\scriptsize{Rank-3 (10,800 bytes)} \\
\includegraphics[width=0.3\textwidth]{approx01.png} &
\includegraphics[width=0.3\textwidth]{approx02.png} &
\includegraphics[width=0.3\textwidth]{approx03.png}\\
\smallskip
\scriptsize{Rank-6 (21,600 bytes)} &
\scriptsize{Rank-9 (32,400 bytes)} &
\scriptsize{Rank-12 (43,200 bytes)} \\
\includegraphics[width=0.3\textwidth]{approx06.png} &
\includegraphics[width=0.3\textwidth]{approx09.png} &
\includegraphics[width=0.3\textwidth]{approx12.png}\\
\smallskip
\scriptsize{Rank-20 (72,000 bytes)} &
\scriptsize{Rank-30 (108,000 bytes)} &
\scriptsize{Rank-40 (144,000 bytes)} \\
\includegraphics[width=0.3\textwidth]{approx20.png} &
\includegraphics[width=0.3\textwidth]{approx30.png} &
\includegraphics[width=0.3\textwidth]{approx40.png}\\
\scriptsize{Rank-100 (360,000 bytes)} &
\scriptsize{Rank-200 (720,000 bytes)} &
\scriptsize{Rank-400 (1,440,000 bytes)} \\
\includegraphics[width=0.3\textwidth]{approx100.png} &
\includegraphics[width=0.3\textwidth]{approx200.png} &
\includegraphics[width=0.3\textwidth]{approx400.png}\\
\end{tabular}
\caption{Low-rank approximations of an image matrix, using increasing numbers
of eigenvectors (and thus an increasing rank, with increasing
storage/transmission costs.)}
\label{fig:approximations}
\end{figure}
\pagebreak
It's interesting to watch how the image emerges as we add more eigenvectors.
With only the dominant eigenvector, all we can make out is a blurry,
right-angle-centric splash of black and white. Still, it's not bad for a rank-1
matrix, and after adding just a few more eigenvectors we can already see the
basic shape of the helmet come through.
Another observation is that we pretty quickly reach a point of diminishing
returns. Compare the rank-100 and rank-400 matrices in the bottom row of
Figure~\ref{fig:approximations}. Is it really worth quadrupling the size to get
the second one?
\begin{figure}[H]
\centering
\includegraphics[width=0.7\textwidth]{frobenius.png}
\caption{The closeness to the original matrix as a function of the
approximation's rank.}
\label{fig:frobenius}
\end{figure}
Figure~\ref{fig:frobenius} quantifies this by plotting the rank of the matrix
against the Frobenius norm of the difference from the original. As you can see,
fewer than fifty or so eigenvectors (out of the full 1800) are enough to
eliminate nearly all of the error.
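If you'd like to try this experiment on an image of your own, here's a rough
sketch. A real photograph isn't symmetric, so it leans on the SVD mentioned in
the earlier footnote rather than the eigendecomposition, and it assumes you
have the Pillow imaging library installed; substitute any gray scale image file
you like for the placeholder filename.
\begin{verbatim}
import numpy as np
from PIL import Image   # assumes the Pillow package is installed

# Load a gray scale image as a matrix of pixel values (0-255).
img = np.asarray(Image.open('stormtrooper.png').convert('L'), dtype=float)

# The SVD plays the role of the eigendecomposition here: U and Vt hold the
# singular vectors, s the singular values, already sorted largest-first.
U, s, Vt = np.linalg.svd(img, full_matrices=False)

def rank_k_image(k):
    """Best rank-k approximation of the image, Frobenius-wise."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

for k in (1, 3, 12, 40, 100):
    err = np.linalg.norm(img - rank_k_image(k), ord='fro')
    print(k, round(err, 1))
\end{verbatim}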
\vfill
\pagebreak
\renewcommand{\thesubsection}{M\arabic{subsection}.}%... from subsections
\section{Markov chains}
\index{probability}
For our next eigenapplication, we'll need to review a few concepts from
Chapter 4 of \textit{A Cool Brisk Walk}: probability.
Recall that we can quantify the likelihood of some event happening by assigning
it a \textbf{probability} between 0 and 1, where 0 means the event is
impossible and 1 means it's a certainty. We use the notation ``Pr($E$)'' for
this, where $E$ is some event. Sometimes the probability is based on actual
counts of past occurrences, and sometimes it is estimated based on intuition
and common sense. For instance:
\begin{center}
Pr(home team wins in college basketball) = .6765 \\
\smallskip
Pr(worldwide pandemic next year) = .1
\end{center}
\index{conditional probability}
You may also recall the notion of \textbf{conditional probability}, in which
our estimate of an event's likelihood changes based on some other event being
known to have (or have not) occurred. We use the notation ``Pr($A|B$),''
pronounced as ``the probability of $A$ given $B$'' for this. Examples:
\begin{center}
Pr(speeding ticket) = .05 \\
\smallskip
Pr(speeding ticket\ $|$\ driving a red car) = .15 \\[12pt]
\medskip
Pr(Jets win tomorrow) = .7 \\
\smallskip
Pr(Jets win tomorrow\ $|$\ starting QB hurt in practice) = .4 \\
\end{center}
In one of these examples the conditional probability is higher than the
original, and in the other case it's lower, but either way it's always between
0 and 1.
\subsection{Systems and states}
\index{system}
\index{state}
\index{Monopoly}
Now recall from Chapter~\ref{ch:apps} that we often want to predict how the
\textbf{state} of some \textbf{system} will unfold over its lifetime. A
system's ``state'' is just a way of talking about the (temporary) condition it
is in at any one point in time. In this section, we're going to focus on just a
single aspect of a system's state, which has a single value at a point in time,
not a bunch of stuff like in the Monopoly example. Perhaps our system is the
U.S. economy, and in any given quarter its ``state'' is whether or not it's in
a recession. Or maybe we're interested in the system of local weather
conditions, for which each day's state is either sunny, cloudy, or rainy.
\index{independence}
Let's expand on this last example because we all have so much experience with
weather. One way we could get a handle on what tomorrow may bring is to count
how many days in the past have been sunny, cloudy, and rainy in our region, and
then assume \textbf{independence} between days. Remember that if two events are
independent, that means that the outcome of one of them has no impact on the
other. A Little League player's \textsl{jersey color} and \textsl{position} are
independent of each other: if you tell me Henry's uniform is blue, that doesn't
give me any information about whether he's an infielder or a pitcher.
So assuming independence, we could count up the past 100 days in
Fredericksburg, and conclude:
\vspace{-.15in}
\begin{align*}
\textrm{Pr(sunny)} &= .5 \\
\textrm{Pr(cloudy)} &= .4 \\
\textrm{Pr(rainy)} &= .1 \\
\end{align*}
\vspace{-.25in}
With this simple model, if someone asked us ``how likely is it to rain
tomorrow?''~we'd reply ``about a 10\% chance,'' no matter what the weather was
today, yesterday, or any other day. We figure that it doesn't matter what
happened on those other days, because we're assuming every day is independent
of the others.
Now this simplistic response is unsatisfying on several levels. For one, it
doesn't take into account the season we're in, mushing all months together into
a mediocre gray. But more to the point, it doesn't even take into account the
recent past. As we've all observed, weather patterns tend to form and transform
on a slower time scale than individual days. If it's hot today, it's pretty
likely to remain hot for at least a little while -- we're probably not going to
get a string of hot, cold, hot, cold, hot, and cold days consecutively.
\index{Markov chain}
\index{Markov property}
\index{Markov, Andrey}
\index{first-order}
\index{Ren, Kylo}
\index{Kylo Ren}
\index{Supreme Leader Snoke}
Remember that when the current state of some system is influenced only by its
\textit{previous} state, it has the so-called ``Markov property.'' In this
case, we can build a powerful structure called a \textbf{first-order Markov
chain} to analyze it. Markov chains are named for the brilliant Russian
mathematician Andrey Markov, one of the most underrated minds in history, in my
opinion. The phrase ``first order'' here has nothing to do with Kylo Ren or
Supreme Leader Snoke; it means that the system's current state depends only on
its immediately preceding state, and is conditionally independent of all states
longer ago than that.\footnote{If we said that a system's current state
depended on its \textit{two} immediately preceding states, we'd build a
second-order Markov chain, and so forth.} In weather terms, this means knowing
that it rained on Tuesday tells you something important about whether it will
also rain on Wednesday, but nothing (directly) about Thursday or beyond.
\subsection{Stochastic matrices}
A Markov chain can be modeled as (surprise!)~a matrix, which encodes all its
\textit{conditional probabilities}. Each one says: ``if the previous state is
$X$, here's the probability that the current state will be $Y$.''
For example, maybe it's true that heat waves tend to last longer than a day. If
it was sunny yesterday, it's likely to remain sunny today. In symbols, we might
say:
\vspace{-.15in}
\begin{align*}
\textrm{Pr(sunny}_k\ |\ \textrm{sunny}_{k-1}) &= .7 \\
\end{align*}
\vspace{-.25in}
The $k$ subscript is just to number the days. This formula says: ``the chances
of it being sunny on any particular day, given that it was sunny the day
before, are 70\%.''
\smallskip
Perhaps it's also true in our region that rainstorms tend not to last longer
than one day. (Once a storm finally breaks, the atmosphere has ``gotten the
rain out of its system'' and drier days will likely follow.) So we might
estimate these quantities:
\vspace{-.15in}
\begin{align*}
\textrm{Pr(rainy}_k\ |\ \textrm{rainy}_{k-1}) &= .05 \\
\textrm{Pr(cloudy}_k\ |\ \textrm{rainy}_{k-1}) &= .55 \\
\textrm{Pr(sunny}_k\ |\ \textrm{rainy}_{k-1}) &= .4 \\
\end{align*}
\vspace{-.25in}
We're saying that two rainy days in a row are very unlikely: if it rained
yesterday, it's most likely to be cloudy (and dry) today, not wet again.
\index{mutually exclusive}
\index{collectively exhaustive}
Note carefully that the above three numbers \textit{add up to exactly 1}, as
they must. The weather on day $k$ must either be rainy, cloudy, or sunny in our
simple weather model. Since these three possibilities are mutually exclusive
and collectively exhaustive, their probabilities must total 1.
\smallskip
If you nodded to the previous paragraph, make sure you also understand that the
\textit{following} three numbers do \textit{not} have to add up to 1 (and
normally won't):
\vspace{-.15in}
\begin{align*}
\textrm{Pr(cloudy}_k\ |\ \textrm{rainy}_{k-1}) &= .55 \\
\textrm{Pr(cloudy}_k\ |\ \textrm{cloudy}_{k-1}) &= .6 \\
\textrm{Pr(cloudy}_k\ |\ \textrm{sunny}_{k-1}) &= .2 \\
\end{align*}
\vspace{-.25in}
% WARNING: "top of the page"
Students sometimes get confused here. The three expressions look an awful lot
like the ones at the top of the page, yet the bottom three don't represent
mutually exclusive and collectively exhaustive options at all. The first of the
bottom three says ``what's the probability it'll be cloudy today if it was
rainy yesterday?'' The second asks a question about a completely different
circumstance: ``what about if it was \textit{cloudy} yesterday? Now how likely
are clouds today?'' These two numbers (.55 and .6) are unrelated to one
another, and not bound by any probability axioms.
\bigskip
And now, the matrix. There are nine different conditional probabilities for
this system, and we'll arrange them in a matrix called $W$ (for ``weather'') as
follows:
\vspace{-.2in}
\[
\begin{blockarray}{ccccccccc}
& & & & \leftarrow & {\overrightarrow{\textrm{day}_{k-1}}} & \rightarrow \\[11pt]
& & & & {\footnotesize\textrm{sunny}} & {\footnotesize\textrm{cloudy}} &
{\footnotesize\textrm{rainy}} \\
\begin{block}{cccc[ccccc]}
& & \uparrow & {\footnotesize\textrm{sunny}} & .7 & .3 & .4 \\
{\large W =} \quad\quad\quad & & {\textrm{day}_k} \ \ & {\footnotesize\textrm{cloudy}} & .2 & .6 & .55 \\
& & \downarrow & {\footnotesize\textrm{rainy}} & .1 & .1 & .05 \\
\end{block}
\end{blockarray}
\]
% WARNING: "last couple of pages"
The columns correspond to yesterday's weather, and the rows correspond to
today's weather. So if it was cloudy yesterday, the probability of sun today is
.3 (top-middle entry). If it was sunny yesterday, the probability of rain today
is .1 (bottom-left corner). Take a moment to look at this matrix and verify
your understanding of what each entry means. Also verify that the example
numbers on the previous couple of pages have all been entered in the right
places.
\index{stochastic matrix}
\index{probability matrix}
\index{substitution matrix}
\index{transition matrix}
A matrix of the form above goes by many names: some call them \textbf{Markov
matrices}, others \textbf{probability matrices}, still others
\textbf{transition matrices}, and yet others \textbf{substitution matrices}.
Me? I like the term \textbf{stochastic matrices} (since it sounds cool) so
that's what we'll go with here. (``Stochastic,'' by the way, is a word that
basically means ``random'' or ``indeterminate.'' Any time you're dealing with
probability, you have a stochastic system.)
\index{Leslie matrix}
Just like our Leslie matrices from p.~\pageref{sec:leslie} did, stochastic
matrices have certain constraints they must adhere to in order to join the
club. For a matrix to be a stochastic matrix:
\begin{compactenum}
\item all its entries must be between 0 and 1, and
\item the sum of each of its columns must be exactly 1.
\end{compactenum}
(Notice I wrote ``the sum of each \textit{column},'' not row, for the same
reason I took pains to explain a few paragraphs ago. The rows don't normally
sum to 1, and there's no reason they should.)
Before going further, take a moment and verify that the $W$ matrix above is
indeed a stochastic matrix.
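(If you'd rather let the computer do the verifying, here's a tiny sketch; the
two checks mirror the two conditions above.)
\begin{verbatim}
import numpy as np

W = np.array([[0.70, 0.30, 0.40],    # rows:    today     (S, C, R)
              [0.20, 0.60, 0.55],    # columns: yesterday (S, C, R)
              [0.10, 0.10, 0.05]])

# Condition 1: every entry lies between 0 and 1.
print(np.all((W >= 0) & (W <= 1)))       # True

# Condition 2: each *column* sums to exactly 1.
print(np.allclose(W.sum(axis=0), 1.0))   # True
\end{verbatim}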
Then, complete this puzzle. Given the stochastic matrix $W$, which of the
following weeks of weather is most likely to actually occur? And which is least
likely? (``S''=sunny, ``C''=cloudy, ``R''=rainy)
\label{markovPuzzle}
\begin{adjustwidth}{38pt}{30pt}
\begin{multicols}{2}
\begin{compactenum}[a.]
\item S-R-S-R-S-R-S
\item S-R-R-R-S-S-S
\item S-C-S-C-S-C-S
\item S-S-S-S-C-C-C
\end{compactenum}
\end{multicols}
\end{adjustwidth}
(Answers on p.~\pageref{markovPuzzleSol}.)
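(And if you'd like to check your answers by brute force after working the
puzzle by hand, here's one way: multiply together the conditional probabilities
for each consecutive pair of days. The \texttt{week\_probability} helper below
is my own concoction; it deliberately ignores the probability of the very first
day, which is the same ``S'' in all four options anyway.)
\begin{verbatim}
import numpy as np

W = np.array([[0.70, 0.30, 0.40],    # rows:    today     (S, C, R)
              [0.20, 0.60, 0.55],    # columns: yesterday (S, C, R)
              [0.10, 0.10, 0.05]])
idx = {'S': 0, 'C': 1, 'R': 2}

def week_probability(week):
    """Product of Pr(today | yesterday) over a week like 'S-R-S-R-S-R-S'."""
    days = week.split('-')
    p = 1.0
    for yesterday, today in zip(days, days[1:]):
        p *= W[idx[today], idx[yesterday]]
    return p

for week in ['S-R-S-R-S-R-S', 'S-R-R-R-S-S-S',
             'S-C-S-C-S-C-S', 'S-S-S-S-C-C-C']:
    print(week, week_probability(week))
\end{verbatim}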
\subsection{Simulating the Markov chain}
\index{linear operator}
The $W$ matrix is square, and thus a linear operator. It can be multiplied by
$3\times 1$ column vectors to produce other $3\times 1$ column vectors. Its
domain is vectors that represent yesterday's weather, and its codomain is
vectors that represent today's weather.
Suppose yesterday -- which we'll call ``day 1'' since it's the first day we'll
consider -- was a sunny day. We now want to predict what the weather will be
today. If we create a column vector for yesterday that has 1 for ``sunny'' and
0 for the other weather options, all we need to do is multiply it by $W$, and
whammo:
\vspace{-.15in}
\begin{align*}
W \cdot \overrightarrow{\textrm{day}_1} = \overrightarrow{\textrm{day}_2}.
\end{align*}
\vspace{-.15in}
By the actual numbers, we get:
\[
\begin{blockarray}{cccccc}
& {\footnotesize\textrm{sunny}} & {\footnotesize\textrm{cloudy}} &
{\footnotesize\textrm{rainy}} \\
\begin{block}{c[ccccc]}
{\footnotesize\textrm{sunny}} & .7 & .3 & .4 \\
{\footnotesize\textrm{cloudy}} & .2 & .6 & .55 \\
{\footnotesize\textrm{rainy}} & .1 & .1 & .05 \\
\end{block}
\end{blockarray} \cdot
\begin{bmatrix}
1 \\ 0 \\ 0 \\
\end{bmatrix} =
\begin{bmatrix}
.7 \\ .2 \\ .1
\end{bmatrix}.
\]
We set $\overrightarrow{\textrm{day}_1}$ to $[\ 1\ \ 0\ \ 0\ ]^\intercal$
because we \textit{knew} yesterday's weather. When we set out to make our
forecast, we knew that there was a 100\% chance it was sunny yesterday, and 0\%
chance it was either cloudy or rainy.
The resulting $\overrightarrow{\textrm{day}_2}$ vector, of course, does not
turn out to have only 0's and 1's. That's because the forecast is uncertain.
What the $\overrightarrow{\textrm{day}_2}$ vector means is that there will be a
10\% chance of rain today, a 20\% chance of overcast skies, and a 70\% chance
of clear blue.
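In code, this one-step forecast is just a matrix-vector product. A sketch:
\begin{verbatim}
import numpy as np

W = np.array([[0.70, 0.30, 0.40],
              [0.20, 0.60, 0.55],
              [0.10, 0.10, 0.05]])

day1 = np.array([1, 0, 0])   # we *know* yesterday was sunny
day2 = W @ day1              # today's forecast
print(day2)                  # [0.7  0.2  0.1]
\end{verbatim}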
Just like with Leslie matrices, we can run this forward any number of times we
want to predict into the distant future. Let's find out what the weather is
likely to be on day 4 (the day after tomorrow):
\vspace{-.15in}
\begin{align*}
W \cdot \overrightarrow{\textrm{day}_1} &= \overrightarrow{\textrm{day}_2} \\
W \cdot \overrightarrow{\textrm{day}_2} &= \overrightarrow{\textrm{day}_3} \\
W \cdot \overrightarrow{\textrm{day}_3} &= \overrightarrow{\textrm{day}_4} \\
\textrm{so} \\
W^3 \cdot \overrightarrow{\textrm{day}_1} &= \overrightarrow{\textrm{day}_4}.
\end{align*}
\vspace{-.15in}
\index{associative (operation)}
That last step takes advantage of the associative property of matrix
multiplication. This yields:
\[
\begin{blockarray}{cccccc}
& {\footnotesize\textrm{sunny}} & {\footnotesize\textrm{cloudy}} &
{\footnotesize\textrm{rainy}} \\
\begin{block}{c[ccccc]}
{\footnotesize\textrm{sunny}} & .5455 & .4815 & .49575 \\
{\footnotesize\textrm{cloudy}} & .35925 & .42325 & .409125 \\
{\footnotesize\textrm{rainy}} & .09525 & .09525 & .095125 \\
\end{block}
\end{blockarray} \cdot
\begin{bmatrix}
1 \\ 0 \\ 0 \\
\end{bmatrix} =
\begin{bmatrix}
.5455 \\ .35925 \\ .09525
\end{bmatrix}.
\]
Apparently, then, on the day after tomorrow we have only about a 55\% chance of
sunny weather. That is, if it had been sunny yesterday. What if it had been
cloudy yesterday?