3 MapReduce Framework
By Dinesh Amatya
MapReduce: Introduction
MapReduce is a programming model for processing and generating large data sets.
Conceptually, MapReduce programs transform lists of input data elements into lists of output data elements.
A MapReduce program does this twice, using two different list-processing idioms: map and reduce.
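The two idioms can be illustrated with Java streams on an in-memory list. `map` transforms each input element independently into an output element, and `reduce` combines a list of elements into a single value. This is only an analogy using a hypothetical `ListIdioms` class, not Hadoop's API; in Hadoop the same two steps run in parallel over distributed data.

```java
import java.util.List;
import java.util.stream.Collectors;

class ListIdioms {
    // map: apply a function to every element independently, producing a new list
    static List<Integer> lengths(List<String> words) {
        return words.stream().map(String::length).collect(Collectors.toList());
    }

    // reduce: fold a list of values into a single value with an associative operation
    static int sum(List<Integer> values) {
        return values.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> mapped = lengths(List.of("map", "shuffle", "reduce"));
        System.out.println(mapped);      // [3, 7, 6]
        System.out.println(sum(mapped)); // 16
    }
}
```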
MapReduce: Programming Model
Data (sample weather records, one record per line; parts of each record are abbreviated with ...)
0067011990999991950051507004...9999999N9+00001+99999999999...
0043011990999991950051512004...9999999N9+00221+99999999999...
0043011990999991950051518004...9999999N9-00111+99999999999...
0043012650999991949032412004...0500001N9+01111+99999999999...
0043012650999991949032418004...0500001N9+00781+99999999999...
MapReduce: Example
Input to map: (byte offset of the line in the file, the line itself)
(0, 0067011990999991950051507004...9999999N9+00001+99999999999...)
(106, 0043011990999991950051512004...9999999N9+00221+99999999999...)
(212, 0043011990999991950051518004...9999999N9-00111+99999999999...)
(318, 0043012650999991949032412004...0500001N9+01111+99999999999...)
(424, 0043012650999991949032418004...0500001N9+00781+99999999999...)
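The map function for this example turns each (offset, line) pair into a (year, temperature) pair. Below is a plain-Java sketch of that parsing step, without Hadoop types. It assumes the fixed-width layout of the full, untruncated records: year at characters 15 to 19, a signed temperature at 87 to 92, a quality code at 92, 9999 marking a missing reading, and quality codes 0, 1, 4, 5 or 9 counting as valid. These offsets and codes are assumptions, and the records shown above are abbreviated with "...", so they cannot be parsed directly. `RecordParser` and `parse` are hypothetical names.

```java
import java.util.Map;
import java.util.Optional;

class RecordParser {
    // Sentinel value assumed to mark a missing temperature reading
    static final int MISSING = 9999;

    // Map step: extract (year, temperature) from one full-length record,
    // or return empty if the reading is missing or of doubtful quality
    static Optional<Map.Entry<String, Integer>> parse(String line) {
        String year = line.substring(15, 19);
        // Signed field such as "+0022" or "-0011"; parseInt accepts a leading '+' or '-'
        int temperature = Integer.parseInt(line.substring(87, 92));
        String quality = line.substring(92, 93);
        if (temperature == MISSING || !quality.matches("[01459]")) {
            return Optional.empty();
        }
        return Optional.of(Map.entry(year, temperature));
    }
}
```

In a Hadoop `Mapper`, the same logic would sit inside `map()` and emit the pair through `context.write(...)`, typically as `Text` and `IntWritable`.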
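MapReduce: Example
Map output: (year, temperature)
(1950, 0)
(1950, 22)
(1950, −11)
(1949, 111)
(1949, 78)
MapReduce: Example
Reduce input after shuffle and sort: (year, [temperatures])
(1949, [111, 78])
(1950, [0, 22, −11])
Reduce output: (year, maximum temperature)
(1949, 111)
(1950, 22)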
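The shuffle and reduce steps can be simulated in memory: group the map output by key (sorted, since Hadoop sorts keys before the reduce phase), then reduce each group to its maximum. This single-process sketch uses a hypothetical `MaxTemperatureSimulation` class and the five map output pairs, one per input record; in Hadoop the grouping happens across the network and each reducer handles only a subset of the keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MaxTemperatureSimulation {
    // Shuffle and sort: group values by key; TreeMap keeps the keys in sorted order
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapOutput) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce: one call per key, emitting the maximum of that key's values
    static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> groups) {
        TreeMap<String, Integer> output = new TreeMap<>();
        groups.forEach((year, temps) -> output.put(year, Collections.max(temps)));
        return output;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("1950", 0), Map.entry("1950", 22), Map.entry("1950", -11),
                Map.entry("1949", 111), Map.entry("1949", 78));
        TreeMap<String, List<Integer>> groups = shuffle(mapOutput);
        System.out.println(groups);         // {1949=[111, 78], 1950=[0, 22, -11]}
        System.out.println(reduce(groups)); // {1949=111, 1950=22}
    }
}
```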
MapReduce: Example
MapReduce: Execution Overview
MapReduce: Data Flow
MapReduce: Fault Tolerance
MapReduce: Locality
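MapReduce: Partitioner
→ decides which reducer receives each intermediate key
→ the partitioning phase takes place after the map phase and before the reduce phase
→ the number of partitions is equal to the number of reducers
→ the default partitioner is HashPartitioner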
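Hadoop's default `HashPartitioner` computes the partition as `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the mask clears the sign bit so that a negative hash code still yields a valid index. The sketch below applies the same formula to plain Java objects as a stand-in. `String.hashCode()` differs from Hadoop's `Text.hashCode()`, so the partition numbers in a real job will differ; only the technique is the same. `HashPartitionSketch` is a hypothetical name.

```java
class HashPartitionSketch {
    // Same formula as Hadoop's HashPartitioner, applied to any Java object's hashCode()
    static int partition(Object key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE clears the sign bit, so the result is never negative
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String year : new String[] {"1949", "1950"}) {
            // Every record with the same key lands in the same partition, hence the same reducer
            System.out.println(year + " -> partition " + partition(year, 2));
        }
    }
}
```

Because the partition depends only on the key, all values for one year reach the same reducer, which is what lets that reducer compute the year's maximum.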
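MapReduce: Combiner
→ acts as a local reducer: it runs on the output of each map task, before the shuffle
→ reduces the amount of data transferred from the mappers to the reducers
→ Hadoop may run it zero, one, or more times, so the job's result must not depend on it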
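Java MapReduce:
Main class to run the job
package first.loader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "Max temperature");
    job.setJarByClass(MaxTemperature.class);
    job.setInputFormatClass(TextInputFormat.class);          // key = byte offset, value = line
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory, must not exist yet
    job.setMapperClass(MaxTemperatureMapper.class);
    job.setReducerClass(MaxTemperatureReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}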
Java MapReduce:
Command to run the job
hadoop jar target/hadoop-first-1.0-SNAPSHOT.jar first.loader.MaxTemperature <input path> <output path>
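Java MapReduce:
Partitioner
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Type parameters must match the map output: key = year (Text), value = temperature (IntWritable)
public class TemperaturePartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    int year = Integer.parseInt(key.toString());
    // Years before 1980 go to partition 0, later years to partition 1 (modulo guards against < 2 reducers)
    return (year < 1980 ? 0 : 1) % numPartitions;
  }
}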
→ job.setPartitionerClass(TemperaturePartitioner.class);
→ job.setNumReduceTasks(2);
Java MapReduce:
Combiner
job.setCombinerClass(MaxTemperatureReducer.class);
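Reusing `MaxTemperatureReducer` as the combiner is safe because taking a maximum is commutative and associative: the maximum of the per-map-task maxima equals the maximum of all values. The in-memory sketch below uses a hypothetical `CombinerSketch` class and assumes, for illustration only, that the five map output records were split across two map tasks. It shows the combiner shrinking each task's output before the shuffle while leaving the final result unchanged. A function such as the mean lacks this property, so it could not be reused as its own combiner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerSketch {
    // Max per key; used both as the combiner (per map task) and as the reducer (global)
    static Map<String, Integer> maxByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            result.merge(p.getKey(), p.getValue(), Math::max);
        }
        return result;
    }

    // Run the combiner on each map task's output, then reduce the combined records
    static Map<String, Integer> withCombiner(List<List<Map.Entry<String, Integer>>> mapTasks) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> taskOutput : mapTasks) {
            shuffled.addAll(maxByKey(taskOutput).entrySet()); // fewer records cross the network
        }
        return maxByKey(shuffled);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> task1 = List.of(
                Map.entry("1950", 0), Map.entry("1950", 22), Map.entry("1949", 111));
        List<Map.Entry<String, Integer>> task2 = List.of(
                Map.entry("1950", -11), Map.entry("1949", 78));
        List<Map.Entry<String, Integer>> all = new ArrayList<>(task1);
        all.addAll(task2);
        System.out.println(maxByKey(all));                          // {1949=111, 1950=22}
        System.out.println(withCombiner(List.of(task1, task2)));    // {1949=111, 1950=22}
    }
}
```

Here the combiner sends 4 records to the reduce phase instead of 5; with many records per key in each map task, the saving is much larger.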