
tails by introducing partitioners and combiners, which provide greater control
over data flow. MapReduce would not be practical without a tightly-integrated
distributed file system that manages the data being processed; Section 2.5
covers this in detail. Tying everything together, a complete cluster architecture
is described in Section 2.6 before the chapter ends with a summary.

2.1 Functional Programming Roots


MapReduce has its roots in functional programming, which is exemplified in
languages such as Lisp and ML.4 A key feature of functional languages is the
concept of higher-order functions, or functions that can accept other functions
as arguments. Two common built-in higher-order functions are map and fold,
illustrated in Figure 2.1. Given a list, map takes as an argument a function f
(that takes a single argument) and applies it to all elements in a list (the top
part of the diagram). Given a list, fold takes as arguments a function g (that
takes two arguments) and an initial value: g is first applied to the initial value
and the first item in the list, the result of which is stored in an intermediate
variable. This intermediate variable and the next item in the list serve as
the arguments to a second application of g, the results of which are stored in
the intermediate variable. This process repeats until all items in the list have
been consumed; fold then returns the final value of the intermediate variable.
Typically, map and fold are used in combination. For example, to compute
the sum of squares of a list of integers, one could map a function that squares
its argument (i.e., λx.x²) over the input list, and then fold the resulting list
with the addition function (more precisely, λx.λy.x + y) using an initial value
of zero.
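The sum-of-squares example can be sketched directly in Python, whose built-in map and functools.reduce correspond to the map and fold operations described above:

```python
from functools import reduce

nums = [1, 2, 3, 4]
squares = map(lambda x: x * x, nums)            # map: apply f to every element
total = reduce(lambda x, y: x + y, squares, 0)  # fold: aggregate with g, initial value 0
print(total)  # 30
```

Note that reduce takes the initial value as its last argument, mirroring the description of fold: g is first applied to the initial value and the first list item, and the intermediate result is threaded through subsequent applications.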
We can view map as a concise way to represent the transformation of a
dataset (as defined by the function f ). In the same vein, we can view fold as an
aggregation operation, as defined by the function g. One immediate observation
is that the application of f to each item in a list (or more generally, to elements
in a large dataset) can be parallelized in a straightforward manner, since each
functional application happens in isolation. In a cluster, these operations can
be distributed across many different machines. The fold operation, on the
other hand, has more restrictions on data locality—elements in the list must
be “brought together” before the function g can be applied. However, many
real-world applications do not require g to be applied to all elements of the
list. To the extent that elements in the list can be divided into groups, the fold
aggregations can also proceed in parallel. Furthermore, for operations that are
commutative and associative, significant efficiencies can be gained in the fold
operation through local aggregation and appropriate reordering.
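A minimal single-process sketch of this idea: because addition is commutative and associative, each group can be folded independently (in a cluster, on a different machine) and the partial results combined afterwards. The function name and round-robin grouping scheme here are illustrative, not from the original text:

```python
from functools import reduce

def grouped_fold(g, initial, data, num_groups=4):
    # Divide the list into groups; each group's fold could run in parallel.
    groups = [data[i::num_groups] for i in range(num_groups)]
    # Local aggregation within each group.
    partials = [reduce(g, grp, initial) for grp in groups]
    # Combine the partial results with the same function g.
    return reduce(g, partials, initial)

print(grouped_fold(lambda x, y: x + y, 0, [1, 2, 3, 4, 5]))  # 15
```

This only yields the correct answer because addition is commutative and associative; for an operation without those properties, the grouping and reordering would change the result.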
In a nutshell, we have described MapReduce. The map phase in MapReduce
roughly corresponds to the map operation in functional programming, whereas
the reduce phase in MapReduce roughly corresponds to the fold operation in
4 However, there are important characteristics of MapReduce that make it non-functional
in nature—this will become apparent later.


Figure 2.1: Illustration of map and fold, two higher-order functions commonly
used together in functional programming: map takes a function f and applies it
to every element in a list, while fold iteratively applies a function g to aggregate
results.
functional programming. As we will discuss
in detail shortly, the MapReduce
execution framework coordinates the map and reduce phases of processing over
large amounts of data on large clusters of commodity machines.
Viewed from a slightly different angle, MapReduce codifies a generic “recipe”
for processing large datasets that consists of two stages. In the first stage, a
user-specified computation is applied over all input records in a dataset. These
operations occur in parallel and yield intermediate output that is then aggre-
gated by another user-specified computation. The programmer defines these
two types of computations, and the execution framework coordinates the ac-
tual processing (very loosely, MapReduce provides a functional abstraction).
Although such a two-stage processing structure may appear to be very restric-
tive, many interesting algorithms can be expressed quite concisely—especially
if one decomposes complex algorithms into a sequence of MapReduce jobs.
Subsequent chapters in this book focus on how a number of algorithms can be
implemented in MapReduce.
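The two-stage recipe can be made concrete with the canonical word-count job, sketched here in plain Python. The mapper and reducer are the two user-specified computations; the driver function is a stand-in for the execution framework (the names are illustrative, not part of any real MapReduce API):

```python
from collections import defaultdict

def mapper(record):
    # First stage: applied to every input record, in parallel across records.
    for word in record.split():
        yield (word, 1)

def reducer(key, values):
    # Second stage: aggregates all intermediate values sharing a key.
    yield (key, sum(values))

def run_job(records):
    # Stand-in for the execution framework: group intermediate
    # (key, value) pairs by key, then invoke the reducer on each group.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    results = {}
    for key, values in groups.items():
        for k, v in reducer(key, values):
            results[k] = v
    return results

print(run_job(["a rose is a rose"]))  # {'a': 2, 'rose': 2, 'is': 1}
```

In a real implementation the grouping step is the distributed shuffle, and mapper and reducer invocations run on many machines; the programmer writes only the two functions.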
To be precise, MapReduce can refer to three distinct but related concepts.
First, MapReduce is a programming model, which is the sense discussed above.
Second, MapReduce can refer to the execution framework (i.e., the “runtime”)
that coordinates the execution of programs written in this particular style. Fi-
nally, MapReduce can refer to the software implementation of the programming
model and the execution framework: for example, Google’s proprietary imple-
mentation vs. the open-source Hadoop implementation in Java. And in fact,
there are many implementations of MapReduce, e.g., targeted specifically for
multi-core processors [127], for GPGPUs [71], for the CELL architecture [126],
etc. There are some differences between the MapReduce programming model
