-
Notifications
You must be signed in to change notification settings - Fork 0
/
ec2experience.txt
73 lines (49 loc) · 2.66 KB
/
ec2experience.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
1. How long did each of the six runs take? How many mappers and how many reducers did you use?
run 1: (freedom, 0) on the 2005 dataset with combiner off
time: 16 minutes, 52 seconds
mappers: 242
reducers: 33
run 2: (freedom, 0) on the 2005 dataset with combiner on
time: 6 minutes, 34 seconds
mappers: 242
reducers: 33
run 3: (capital, 0) on the 2006 dataset with combiner on
time: 15 mins, 20 seconds
mappers: 348
reducers: 33
processing rate: 0.01938 gb/s
run 4: (capital, 0) on the 2006 dataset with combiner on
time: 8 mins, 58 seconds
mappers: 348
reducers: 33
processing rate: 0.03313 gb/s
run 5: (landmark, 1) on the 2006 dataset with combiner on
time: 8 mins, 50 seconds
mappers: 348
reducers: 33
processing rate: 0.03363 gb/s
run 6: (monument, 2) on the 2006 dataset with combiner on
time: 8 mins, 51 seconds
mappers: 348
reducers: 33
processing rate: 0.03357 gb/s
2. For the two runs with (freedom, 0), how much faster did your code run on the 5 workers with the combiner turned on than with the combiner turned off? Express your answer as a percentage.
((16 minutes 52 seconds)-(6 minutes 34 seconds))/(6 minutes 34 seconds)=1.5685, so 156.85% faster
3. For the runs on the 2006 dataset, what was the median processing rate per GB (= 2^30 bytes) of input for the tests using 5 workers? Using 9 workers?
The median processing rates are shown above in part 1. The median processing rate for 5 workers is 0.01938 gb/s, and the median processing rate for 9 workers is 0.03357 gb/s.
4. What was the percent speedup of running (capital, 0) with 9 workers over 5 workers? What is the maximum possible speedup, assuming your code is fully parallelizable? How well, in your opinion, does Hadoop parallelize your code? Justify your answer in 1-2 sentences.
(0.03357-0.01938)/0.01938 = 0.7322 = 73.22% faster
Optimal is (9-5)/5 = 0.8 = 80% faster
73.22/80 = 0.9153 = 91.53% efficient
In my opinion, Hadoop parallelizes code pretty well and it parallelizes at a speedup that is 91.53% of the maximum speedup.
5. For a single run on the 2006 dataset, what was the price per GB processed on with 5 workers? With 9 workers? (Recall that an extra-large instance costs $0.58 per hour, rounded up to the nearest hour.)
($0.58)*(5 workers)*(1 hour) = $2.90
GB Processed: (19,139,821,102 bytes)*(1/2^30 gb/bytes) = 17.82534 gb
$2.90/17.82534 gb =
$0.16 per gb
($0.58)*(9 workers)*(1 hour) = $5.22
GB Processed: (19,141,786,065 bytes)*(1/2^30 gb/bytes) = 17.82718 gb
$5.33/17.82718 gb =
$0.30 per gb
6. How much total money did you use to complete this project?
($0.58)*(5 workers)*(1 hour)*(3 jobs) + ($0.58)*(9 workers)*(1 hour)*(3 jobs) = $24.36