Ditp - ch2 1
Figure 2.1: Illustration of map and fold, two higher-order functions commonly
used together in functional programming: map takes a function f and applies it
to every element in a list, while fold iteratively applies a function g to aggregate
results.
functional programming. As we will discuss in detail shortly, the MapReduce
execution framework coordinates the map and reduce phases of processing over
large amounts of data on large clusters of commodity machines.
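The two operations in Figure 2.1 can be sketched in a few lines of Python. This is a minimal illustration of our own, not code from any MapReduce implementation. The choices of `f` (squaring) and `g` (addition) are arbitrary. Fold is expressed with the standard library's `functools.reduce`, which folds from the left starting from an explicit initial value.

```python
from functools import reduce

def f(x):
    # Applied independently to every element (the "map" step).
    return x * x

def g(acc, y):
    # Combines the running accumulator with the next element (the "fold" step).
    return acc + y

values = [1, 2, 3, 4, 5]

# map: apply f to each element; no application depends on any other.
mapped = list(map(f, values))

# fold: aggregate the mapped results left to right, starting from 0.
total = reduce(g, mapped, 0)

print(mapped)  # [1, 4, 9, 16, 25]
print(total)   # 55
```

Each application of `f` is independent of the others, which makes the map step easy to parallelize. The fold step instead combines results in sequence through the accumulator.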
Viewed from a slightly different angle, MapReduce codifies a generic “recipe”
for processing large datasets that consists of two stages. In the first stage, a
user-specified computation is applied over all input records in a dataset. These
operations occur in parallel and yield intermediate output that is then
aggregated by another user-specified computation. The programmer defines these
two types of computations, and the execution framework coordinates the actual
processing (very loosely, MapReduce provides a functional abstraction).
Although such a two-stage processing structure may appear to be very
restrictive, many interesting algorithms can be expressed quite concisely, especially
if one decomposes complex algorithms into a sequence of MapReduce jobs.
Subsequent chapters in this book focus on how a number of algorithms can be
implemented in MapReduce.
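The two-stage recipe, together with the idea of chaining jobs, can be sketched on a single machine in Python. This sketch rests on our own assumptions and does not reflect the API of Hadoop or Google's implementation. `run_job` and its parameters `compute`, `aggregate`, and `initial` are hypothetical names, and a thread pool stands in for the cluster. The two example jobs are illustrative choices only: counting words, then picking the most frequent word.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def run_job(records, compute, aggregate, initial):
    """Hypothetical helper: one two-stage 'job' run in memory on one machine."""
    # Stage 1: apply the user-specified per-record computation. Records are
    # independent, so a thread pool can process them concurrently;
    # pool.map returns results in input order.
    with ThreadPoolExecutor() as pool:
        intermediate = list(pool.map(compute, records))
    # Stage 2: aggregate the intermediate output with a second
    # user-specified computation, folding from `initial`.
    return reduce(aggregate, intermediate, initial)

lines = ["the quick brown fox", "the lazy dog", "the fox"]

# Job 1: count the words in each line, then merge the per-line counts.
word_counts = run_job(
    lines,
    compute=lambda line: Counter(line.split()),
    aggregate=lambda acc, counts: acc + counts,
    initial=Counter(),
)

# Job 2: consume job 1's output and keep the (word, count) pair
# with the highest count.
most_frequent = run_job(
    list(word_counts.items()),
    compute=lambda pair: pair,  # identity: no per-record work needed here
    aggregate=lambda best, pair: pair if pair[1] > best[1] else best,
    initial=("", 0),
)

print(word_counts["the"], word_counts["fox"])  # 3 2
print(most_frequent)  # ('the', 3)
```

The second job reads the output of the first, which mirrors the idea of decomposing a complex algorithm into a sequence of jobs. A real MapReduce implementation also partitions data across machines, groups intermediate output by key, and recovers from failures. This sketch attempts none of those.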
To be precise, MapReduce can refer to three distinct but related concepts.
First, MapReduce is a programming model, which is the sense discussed above.
Second, MapReduce can refer to the execution framework (i.e., the “runtime”)
that coordinates the execution of programs written in this particular style.
Finally, MapReduce can refer to the software implementation of the programming
model and the execution framework: for example, Google’s proprietary
implementation vs. the open-source Hadoop implementation in Java. In fact,
there are many implementations of MapReduce, e.g., ones targeted specifically
at multi-core processors [127], GPGPUs [71], and the CELL architecture [126].
There are some differences between the MapReduce programming model