<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Cloud9: A Hadoop toolkit for working with big data</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="">
<meta name="author" content="">
<!-- Le styles -->
<link href="docs/assets/css/bootstrap.css" rel="stylesheet">
<link href="docs/assets/css/bootstrap-responsive.css" rel="stylesheet">
<link href="docs/assets/css/docs.css" rel="stylesheet">
<link href="docs/assets/js/google-code-prettify/prettify.css" rel="stylesheet">
<!-- Le HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body data-spy="scroll" data-target=".bs-docs-sidebar">
<!-- Navbar
================================================== -->
<div class="navbar navbar-inverse navbar-fixed-top">
<div class="navbar-inner">
<div class="container">
<button type="button" class="btn btn-navbar" data-toggle="collapse" data-target=".nav-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<div class="nav-collapse collapse">
<ul class="nav">
<li class="active">
<a href="./index.html">Home</a>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jumbotron masthead">
<div class="container">
<h1>Cloud<sup>9</sup></h1>
<p>A Hadoop toolkit for working with big data</p>
<ul class="masthead-links">
<li>
<a href="https://github.com/lintool/Cloud9">GitHub project</a>
</li>
<li>
Version 2.0.0
</li>
</ul>
</div>
</div>
<div class="container">
<p class="lead" style="padding-top: 60px">Cloud<sup>9</sup> is a
collection of Hadoop tools that tries to make working with big data a
bit easier.</p>
<p>This software was designed with two goals in mind: First, to serve
as a teaching tool for MapReduce and MapReduce algorithm
design. Second, to provide a collection of useful tools on which to
build other "big data" systems. Here are just a few features:</p>
<ul>
<li>API for working with various text collections, including
Wikipedia, TREC document collections for information retrieval
research, and
the <a href="http://boston.lti.cs.cmu.edu/clueweb09/wiki/tiki-index.php?page=ClueWeb09%20Wiki">ClueWeb09
web crawl</a>.</li>
<li>Reference implementations of a few common MapReduce algorithms,
including PageRank, breadth-first search, and co-occurrence matrix
computation.</li>
<li>Implementations of various useful Hadoop data types, including
primitive collections, via the
<a href="https://github.com/lintool/tools/tree/master/lintools-datatypes"><code>lintools-datatypes</code> package</a>.</li>
</ul>
<p class="lead" style="padding-top: 10px">Getting Started</p>
<ul>
<li><a href="docs/word-count.html">Word count tutorial</a></li>
<!--li><a href="collections/clue.html">Guide to working on the ClueWeb09
collection</a></li>
<li><a href="docs/content/clue-access.html">Random access to ClueWeb09
WARC records</a></li-->
<li><a href="docs/content/wikipedia.html">Guide to working with Wikipedia</a></li>
<!--li><a href="docs/content/collections.html">Guide to working with standard document collections</a></li-->
</ul>
<p class="lead" style="padding-top: 10px">Exercises</p>
<ul>
<li><a href="docs/exercises/bigrams.html">Bigram counts</a></li>
<li><a href="docs/exercises/indexing.html">Inverted indexing</a></li>
<li><a href="docs/exercises/retrieval.html">Boolean retrieval</a></li>
<li><a href="docs/exercises/pagerank.html">PageRank</a></li>
</ul>
<p class="lead" style="padding-top: 10px">Reference Implementations</p>
<p>Cloud<sup>9</sup> provides reference implementations of many design
patterns and algorithms introduced in the
book <a href="http://mapreduce.me">Data-Intensive Text Processing with
MapReduce</a> by Lin and Dyer. Some of these examples also serve as
solutions to the exercises included with the library, which have been
used in MapReduce courses at the University of Maryland.</p>
<ul>
<li><a href="docs/content/order-inversion.html">Order inversion</a> design pattern for computing relative frequencies (Chapter 3)</li>
<li><a href="docs/content/pairs-stripes.html">Pairs and stripes</a> design pattern for computing term co-occurrences (Chapter 3)</li>
<li><a href="docs/exercises/indexing.html">Inverted indexing</a>, covered as an exercise (Chapter 4)</li>
<li><a href="docs/content/bfs.html">Parallel breadth-first search</a> (Chapter 5)</li>
<li><a href="docs/content/pagerank.html">PageRank</a> and design patterns for efficient graph algorithms (partially covered in Chapter 5)</li>
</ul>
</div>
<!-- Footer
================================================== -->
<footer class="footer">
<div class="container">
<p class="pull-right"><a href="#">Back to top</a></p>
<p>Designed using <a href="http://twitter.github.com/bootstrap/">Bootstrap</a>.</p>
<p>Code licensed under <a href="http://www.apache.org/licenses/LICENSE-2.0" target="_blank">Apache License v2.0</a>, documentation under <a href="http://creativecommons.org/licenses/by/3.0/">CC BY 3.0</a>.</p>
<p style="padding-top: 40px">This work is or has been supported by the following sources: NSF under awards <a href="http://www.nsf.gov/awardsearch/showAward?AWD_ID=0705832">IIS-0705832</a>, <a href="http://www.nsf.gov/awardsearch/showAward?AWD_ID=0836560">IIS-0836560</a>, <a href="http://www.nsf.gov/awardsearch/showAward?AWD_ID=0916043">IIS-0916043</a>, <a href="http://www.nsf.gov/awardsearch/showAward?AWD_ID=1018625">CCF-1018625</a>, <a href="http://www.nsf.gov/awardsearch/showAward?AWD_ID=1144034">IIS-1144034</a>, and <a href="http://www.nsf.gov/awardsearch/showAward?AWD_ID=1218043">IIS-1218043</a>; Google and IBM under the Academic Cloud Computing Initiative (ACCI); the Intramural Research Program of the NIH, National Library of Medicine; DARPA/IPTO under Contract No. HR0011-06-2-0001 (GALE program) and Contract No. HR0011-12-C-0015 (BOLT program); and Amazon Web Services. Any opinions, findings, conclusions, or recommendations expressed here do not necessarily reflect those of the sponsors.</p>
</div>
</footer>
<!-- Le javascript
================================================== -->
<!-- Placed at the end of the document so the pages load faster -->
<script src="docs/assets/js/jquery.js"></script>
<script src="docs/assets/js/google-code-prettify/prettify.js"></script>
<script src="docs/assets/js/bootstrap-transition.js"></script>
<script src="docs/assets/js/bootstrap-alert.js"></script>
<script src="docs/assets/js/bootstrap-modal.js"></script>
<script src="docs/assets/js/bootstrap-dropdown.js"></script>
<script src="docs/assets/js/bootstrap-scrollspy.js"></script>
<script src="docs/assets/js/bootstrap-tab.js"></script>
<script src="docs/assets/js/bootstrap-tooltip.js"></script>
<script src="docs/assets/js/bootstrap-popover.js"></script>
<script src="docs/assets/js/bootstrap-button.js"></script>
<script src="docs/assets/js/bootstrap-collapse.js"></script>
<script src="docs/assets/js/bootstrap-carousel.js"></script>
<script src="docs/assets/js/bootstrap-typeahead.js"></script>
<script src="docs/assets/js/bootstrap-affix.js"></script>
</body>
</html>