Map Reduce 2

MapReduce is a distributed computing model that processes data through two main tasks: Map, which converts input data into key/value pairs, and Reduce, which combines those pairs into a smaller set. The framework operates on <key, value> pairs and includes stages such as mapping, shuffling, and reducing, with Hadoop managing task distribution and data handling. Examples include counting word frequencies and identifying top salaried employees from a dataset.

Uploaded by

Kavvya Mridul


MAP REDUCE EXAMPLES

WHAT IS MAPREDUCE?
• MapReduce is a processing technique and a programming model for distributed computing
based on Java. The MapReduce algorithm contains two important tasks, namely Map
and Reduce. The Map task takes a set of data and converts it into another set of data, where
individual elements are broken down into tuples (key/value pairs). The Reduce
task takes the output from a map as its input and combines those data tuples
into a smaller set of tuples. As the order of the words in the name MapReduce implies, the
reduce task is always performed after the map job.
• The major advantage of MapReduce is that it is easy to scale data processing over
multiple computing nodes. Under the MapReduce model, the data processing
primitives are called mappers and reducers. Decomposing a data processing
application into mappers and reducers is sometimes nontrivial. But, once we write an
application in the MapReduce form, scaling the application to run over hundreds,
thousands, or even tens of thousands of machines in a cluster is merely a
configuration change. This simple scalability is what has attracted many programmers
to use the MapReduce model.
THE ALGORITHM
• Generally, the MapReduce paradigm is based on sending the computation to where
the data resides.
• A MapReduce program executes in three stages, namely the map stage, the shuffle
stage, and the reduce stage.
• Map stage − The map or mapper’s job is to process the input data.
Generally the input data is in the form of a file or directory and is stored in
the Hadoop Distributed File System (HDFS). The input file is passed to the mapper
function line by line. The mapper processes the data and creates several
small chunks of data.
• Reduce stage − This stage is the combination of the Shuffle stage and the
Reduce stage. The Reducer’s job is to process the data that comes from the
mapper. After processing, it produces a new set of output, which is
stored in HDFS.
• During a MapReduce job, Hadoop sends the Map and Reduce tasks to the appropriate
servers in the cluster.
• The framework manages all the details of data-passing such as issuing tasks, verifying
task completion, and copying data around the cluster between the nodes.
• Most of the computing takes place on nodes that hold the data on their local disks, which reduces
network traffic.
• After completion of the given tasks, the cluster collects and reduces the data to form an
appropriate result, and sends it back to the Hadoop server.
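
The three stages above can be sketched in a single process. This is only an illustration of the data flow, not how Hadoop distributes work across a cluster; the function `run_mapreduce`, the sample job, and the input lines are hypothetical names and values chosen for the example.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Run a map -> shuffle -> reduce pipeline in a single process."""
    # Map stage: each input record yields zero or more (key, value) pairs.
    intermediate = []
    for record in records:
        intermediate.extend(mapper(record))

    # Shuffle stage: group all values that share a key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce stage: each key and its list of values yields output pairs.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

# Hypothetical job: sum the amounts recorded for each name.
def sum_mapper(line):
    name, amount = line.split()
    yield name, int(amount)

def sum_reducer(name, amounts):
    yield name, sum(amounts)

lines = ["a 1", "b 2", "a 3"]
result = run_mapreduce(lines, sum_mapper, sum_reducer)
print(result)  # [('a', 4), ('b', 2)]
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network; here all three stages are ordinary loops.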
INPUTS AND OUTPUTS (JAVA PERSPECTIVE)
• The MapReduce framework operates on <key, value> pairs, that is, the framework views the
input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the
output of the job, conceivably of different types.

• The key and value classes have to be serializable by the framework and hence
need to implement the Writable interface. Additionally, the key classes have to implement the
WritableComparable interface to facilitate sorting by the framework.

• Input and output types of a MapReduce job: (Input) <k1, v1> → map → <k2, v2> → reduce → <k3, v3> (Output).

            Input              Output
Map         <k1, v1>           list(<k2, v2>)
Reduce      <k2, list(v2)>     list(<k3, v3>)
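
The input and output types can be written down as type signatures. The slide describes the Java view; the sketch below uses Python type hints instead, purely to show the shape of the two functions. The aliases `Mapper` and `Reducer` and the functions `to_pairs` and `to_report` are hypothetical names, not part of any Hadoop API.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2")
V2 = TypeVar("V2")
K3 = TypeVar("K3")
V3 = TypeVar("V3")

# map:    <k1, v1>        -> list(<k2, v2>)
Mapper = Callable[[K1, V1], Iterable[Tuple[K2, V2]]]
# reduce: <k2, list(v2)>  -> list(<k3, v3>)
Reducer = Callable[[K2, List[V2]], Iterable[Tuple[K3, V3]]]

def to_pairs(offset: int, line: str) -> List[Tuple[str, int]]:
    # <k1, v1> = <int, str>; <k2, v2> = <str, int>
    return [(word, 1) for word in line.split()]

def to_report(word: str, ones: List[int]) -> List[Tuple[str, str]]:
    # <k2, list(v2)> = <str, list(int)>; <k3, v3> = <str, str>
    return [(word, f"{sum(ones)} occurrence(s)")]

print(to_pairs(0, "big data big"))  # [('big', 1), ('data', 1), ('big', 1)]
print(to_report("big", [1, 1]))     # [('big', '2 occurrence(s)')]
```

Note that the value type changes from int to str between the map output and the reduce output, which is what "conceivably of different types" allows.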
TERMINOLOGY
• Mapper − Maps the input key/value pairs to a set of intermediate key/value
pairs.
• NameNode − Node that manages the Hadoop Distributed File System (HDFS).
• DataNode − Node where the data is present in advance, before any processing takes
place.
• MasterNode − Node where JobTracker runs and which accepts job requests from
clients.
• SlaveNode − Node where the Map and Reduce programs run.
• JobTracker − Schedules jobs and tracks the jobs assigned to the TaskTracker.
• TaskTracker − Tracks the task and reports status to JobTracker.
• Job − An execution of a Mapper and Reducer across a dataset.
• Task − An execution of a Mapper or a Reducer on a slice of data.
• Task Attempt − A particular instance of an attempt to execute a task on a SlaveNode.
STEPS IN MAP REDUCE

• The MapReduce algorithm contains two important tasks, namely Map
and Reduce.
• The Map task takes a set of data and converts it into another set of
data, where individual elements are broken down into tuples (key-value
pairs).
• The Reduce task takes the output from the Map as an input and
combines those data tuples (key-value pairs) into a smaller set of tuples.
• The reduce task is always performed after the map job.
LET US NOW TAKE A CLOSE LOOK AT EACH OF THE PHASES AND TRY TO UNDERSTAND THEIR SIGNIFICANCE.
• Input Phase − Here we have a Record Reader that translates
each record in an input file and sends the parsed data to the
mapper in the form of key-value pairs.
• Map − Map is a user-defined function, which takes a series of
key-value pairs and processes each one of them to generate
zero or more key-value pairs.
• Intermediate Keys − The key-value pairs generated by the
mapper are known as intermediate keys.
• Combiner − A combiner is a type of local Reducer that groups
similar data from the map phase into identifiable sets. It
takes the intermediate keys from the mapper as input and
applies user-defined code to aggregate the values within the
small scope of one mapper. It is not a part of the main
MapReduce algorithm; it is optional.
• Shuffle and Sort − The Reducer task starts with the Shuffle and Sort
step. It downloads the grouped key-value pairs onto the local
machine, where the Reducer is running. The individual key-value
pairs are sorted by key into a larger data list. The data list groups the
equivalent keys together so that their values can be iterated easily
in the Reducer task.
• Reducer − The Reducer takes the grouped key-value paired data as
input and runs a Reducer function on each one of them. Here, the
data can be aggregated, filtered, and combined in a number of ways,
which can require a wide range of processing. Once the execution is
over, it passes zero or more key-value pairs to the final step.
• Output Phase − In the output phase, we have an output formatter
that translates the final key-value pairs from the Reducer function
and writes them onto a file using a record writer.
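
The phases listed above can be sketched as one small function each. This is a single-process illustration under stated assumptions: the job (maximum value per key), the "name value" line format, the two input splits, and every function name are hypothetical. A maximum is used because it is safe to apply in a combiner: taking the maximum of local maxima gives the same answer as taking the maximum of all values.

```python
from collections import defaultdict

def record_reader(split):
    """Input phase: turn raw text into (byte offset, line) pairs."""
    offset = 0
    for line in split.splitlines():
        yield offset, line
        offset += len(line) + 1  # +1 for the newline (assumes ASCII text)

def mapper(offset, line):
    """Map: parse 'name value' and emit (name, value)."""
    name, value = line.split()
    yield name, int(value)

def combiner(pairs):
    """Combiner: keep only the local maximum per key within one mapper."""
    local = {}
    for key, value in pairs:
        local[key] = max(value, local.get(key, value))
    return list(local.items())

def shuffle_and_sort(mapper_outputs):
    """Shuffle and sort: gather pairs from all mappers, group by key, sort keys."""
    groups = defaultdict(list)
    for pairs in mapper_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    """Reduce: emit the global maximum for each key."""
    yield key, max(values)

def output_format(pairs):
    """Output phase: format the final pairs as tab-separated lines."""
    return "\n".join(f"{key}\t{value}" for key, value in pairs)

# Two hypothetical input splits, one per mapper.
splits = ["x 3\ny 7\nx 9", "y 2\nx 4"]

mapper_outputs = []
for split in splits:
    pairs = [kv for off, line in record_reader(split) for kv in mapper(off, line)]
    mapper_outputs.append(combiner(pairs))

grouped = shuffle_and_sort(mapper_outputs)
final = [kv for key, values in grouped for kv in reducer(key, values)]
report = output_format(final)
print(report)
```

With the combiner in place, the first mapper sends two pairs to the shuffle instead of three, which is the kind of saving a combiner is meant to provide.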
LET US TRY TO UNDERSTAND THE TWO TASKS MAP & REDUCE WITH THE HELP OF A SMALL DIAGRAM −
1. MapReduce example to count the frequency of each word in
a given input text. Our input text is, “Big data comes in
various formats. This data can be stored in multiple data
servers.”
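
A minimal single-process sketch of the word count for this input text follows. It assumes that words are compared case-insensitively and that punctuation is dropped; the slide does not say how tokens are split, so that rule and the function names are assumptions.

```python
import re
from itertools import groupby

text = "Big data comes in various formats. This data can be stored in multiple data servers."

def map_words(line):
    # Assumption: a word is a run of letters, compared in lower case.
    return [(word, 1) for word in re.findall(r"[a-z]+", line.lower())]

def reduce_counts(word, ones):
    return word, sum(ones)

intermediate = map_words(text)
intermediate.sort(key=lambda kv: kv[0])  # shuffle and sort by key

word_counts = dict(
    reduce_counts(word, [value for _, value in group])
    for word, group in groupby(intermediate, key=lambda kv: kv[0])
)
print(word_counts["data"], word_counts["in"], word_counts["big"])  # 3 2 1
```

Under these assumptions the text has 15 words and 12 distinct keys; "data" appears three times, "in" twice, and every other word once.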
2. Find the top 3 salaried employees in the following data using
MapReduce:
George Vetti caden 3300
Jamie Engesser 3300
Paul Coddin 2800
Joe Niemiec 3100
Adis Cesir 3200
Rohit Bakshi 3300
Tom McCuch 3000
Eric Mizell 3300
Grant Liu 3200
Ajay Singh 2500
Chris Harris 2900
Jeff Markham 3100
Nadeem Asghar 3300
Adam Diaz 3300
Don Hilborn 3300
Jean-Philippe Playe 3400
Michael Aube 3300
Mark Lochbihler 3300
Olivier Renault 3300
Teddy Choi 1200
Dan Rice 2500
Rommel Garcia 3300
Ryan Templeton 3300
Sridhara Sabbella 3300
Frank Romano 3300
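
One common way to approach this exercise is to let each mapper emit only its local top 3 under a single shared key, and let one reducer pick the global top 3 from those candidates. The sketch below simulates that in one process. The split size of 10 lines, the shared key "top", and the tie-breaking rule (equal salaries keep their input order) are assumptions; the slide does not say how the many 3300 ties should be resolved. Names are kept exactly as they appear in the data.

```python
DATA = """\
George Vetti caden 3300
Jamie Engesser 3300
Paul Coddin 2800
Joe Niemiec 3100
Adis Cesir 3200
Rohit Bakshi 3300
Tom McCuch 3000
Eric Mizell 3300
Grant Liu 3200
Ajay Singh 2500
Chris Harris 2900
Jeff Markham 3100
Nadeem Asghar 3300
Adam Diaz 3300
Don Hilborn 3300
Jean-Philippe Playe 3400
Michael Aube 3300
Mark Lochbihler 3300
Olivier Renault 3300
Teddy Choi 1200
Dan Rice 2500
Rommel Garcia 3300
Ryan Templeton 3300
Sridhara Sabbella 3300
Frank Romano 3300"""

def parse(line):
    # The salary is the last whitespace-separated field; the rest is the name.
    name, salary = line.rsplit(None, 1)
    return name, int(salary)

def top_n(records, n=3):
    # sorted() is stable, so records with equal salaries keep their input order.
    return sorted(records, key=lambda rec: rec[1], reverse=True)[:n]

def mapper(split_lines, n=3):
    # Each mapper emits only its local top n, all under one shared key.
    return [("top", rec) for rec in top_n([parse(line) for line in split_lines], n)]

def reducer(key, candidates, n=3):
    # A single reducer sees every candidate and keeps the global top n.
    return top_n(candidates, n)

lines = DATA.splitlines()
splits = [lines[i:i + 10] for i in range(0, len(lines), 10)]  # assumed split size

candidates = [rec for split in splits for _, rec in mapper(split)]
top3 = reducer("top", candidates)
for name, salary in top3:
    print(name, salary)
# Jean-Philippe Playe 3400
# George Vetti caden 3300
# Jamie Engesser 3300
```

If the exercise instead means the three highest distinct salaries, the answer would be 3400, 3300, and 3200, and the mapper would emit distinct salary values rather than employee records.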
