
UNIT – II

Writing MapReduce Programs: A Weather Dataset, Understanding Hadoop API for MapReduce Framework (Old and New), Basic programs of Hadoop MapReduce: Driver code, Mapper code, Reducer code, RecordReader, Combiner, Partitioner
A Weather Dataset
Weather sensors collecting data every hour at many locations across the globe gather a large volume
of log data, which is a good candidate for analysis with MapReduce, since it is semi-structured and
record-oriented.
Data Format
The data we will use is from the National Climatic Data Center (NCDC, http://www.ncdc.noaa.gov/).
The data is stored using a line-oriented ASCII format, in which each line is a record.
Classic MapReduce (MapReduce 1)
A job run in classic MapReduce is illustrated in Figure 1. At the highest level, there are four
independent entities:
• The client, which submits the MapReduce job.
• The jobtracker, which coordinates the job run. The jobtracker is a Java application whose main
class is JobTracker.
• The tasktrackers, which run the tasks that the job has been split into. Tasktrackers are Java
applications whose main class is TaskTracker.
• The distributed filesystem (normally HDFS, covered in Chapter 3), which is used for sharing job
files between the other entities.

Figure 1. How Hadoop runs a MapReduce job using the classic framework


Job Submission
The submit() method on Job creates an internal JobSubmitter instance and calls submitJobInternal()
on it (step 1 in Figure 1). Having submitted the job, waitForCompletion() polls the job’s progress
once a second and reports the progress to the console if it has changed since the last report.
The job submission process implemented by JobSubmitter does the following:
• Asks the jobtracker for a new job ID (by calling getNewJobId() on JobTracker) (step 2).
• Checks the output specification of the job. For example, if the output directory has not been
specified or it already exists, the job is not submitted and an error is thrown to the MapReduce program.
• Computes the input splits for the job. If the splits cannot be computed, because the input paths
don’t exist, for example, then the job is not submitted and an error is thrown to the MapReduce
program.
• Copies the resources needed to run the job, including the job JAR file, the configuration file, and
the computed input splits, to the jobtracker’s filesystem in a directory named after the job ID. The job
JAR is copied with a high replication factor (controlled by the mapred.submit.replication property,
which defaults to 10) so that there are lots of copies across the cluster for the tasktrackers to access
when they run tasks for the job (step 3).
• Tells the jobtracker that the job is ready for execution (by calling submitJob() on JobTracker) (step
4).
Job Initialization
When the JobTracker receives a call to its submitJob() method, it puts it into an internal queue
from where the job scheduler will pick it up and initialize it (step 5).
To create the list of tasks to run, the job scheduler first retrieves the input splits computed by the
client from the shared filesystem (step 6). It then creates one map task for each split. The number of
reduce tasks to create is determined by the mapred.reduce.tasks property in the Job, which is set by the
setNumReduceTasks() method, and the scheduler simply creates this number of reduce tasks to be run.
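
As a minimal sketch of how this is controlled from user code (assuming the new-API Job class used in the driver examples later in this unit; the class name ReduceTaskCountSketch is illustrative), the driver fixes the reduce task count before submission, which is just a typed setter for the reduce-task property described above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReduceTaskCountSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        // The scheduler creates one map task per input split, but the number
        // of reduce tasks comes only from this setting (it defaults to 1).
        job.setNumReduceTasks(4);
        System.out.println(job.getNumReduceTasks());   // prints 4
    }
}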
In addition to the map and reduce tasks, two further tasks are created: a job setup task and a job
cleanup task. These are run by tasktrackers and are used to run code to set up the job before any map
tasks run and to clean up after all the reduce tasks are complete.

Task Assignment
Tasktrackers run a simple loop that periodically sends heartbeat method calls to the jobtracker.

Heartbeats tell the jobtracker that a tasktracker is alive. As a part of the heartbeat, a tasktracker will
indicate whether it is ready to run a new task, and if it is, the jobtracker will allocate it a task, which it
communicates to the tasktracker using the heartbeat return value (step 7).

Tasktrackers have a fixed number of slots for map tasks and for reduce tasks: for example, a
tasktracker may be able to run two map tasks and two reduce tasks simultaneously. (The precise
number depends on the number of cores and the amount of memory on the tasktracker.)
Task Execution
Now that the tasktracker has been assigned a task, the next step is for it to run the task. First, it
localizes the job JAR by copying it from the shared filesystem to the tasktracker’s filesystem. It also
copies any files needed from the distributed cache by the application to the local disk. Second, it
creates a local working directory for the task and un-jars the contents of the JAR into this directory.
Third, it creates an instance of TaskRunner to run the task. TaskRunner launches a new Java Virtual
Machine (step 9) to run each task in (step 10), so that any bugs in the user-defined map and reduce
functions don’t affect the tasktracker.
Progress and Status Updates
MapReduce jobs are long-running batch jobs, taking anything from minutes to hours to run.
Because this is a significant length of time, it’s important for the user to get feedback on how the job
is progressing. A job and each of its tasks have a status, which includes such things as the state of the
job or task (e.g., running, successfully completed, failed), the progress of maps and reduces, the values
of the job’s counters, and a status message or description.
When a task is running, it keeps track of its progress, that is, the proportion of the task completed.
For map tasks, this is the proportion of the input that has been processed. For reduce tasks, it’s a little
more complex, but the system can still estimate the proportion of the reduce input processed.
Job Completion
When the jobtracker receives a notification that the last task for a job is complete (this will be
the special job cleanup task), it changes the status for the job to “successful.” Then, when the Job polls
for status, it learns that the job has completed successfully, so it prints a message to tell the user and
then returns from the waitForCompletion() method. Last, the jobtracker cleans up its working state for
the job and instructs tasktrackers to do the same (so intermediate output is deleted, for example).
For very large clusters in the region of 4000 nodes and higher, the MapReduce system described
in the previous section begins to hit scalability bottlenecks, so in 2010 a group at Yahoo! began to
design the next generation of MapReduce. The result was YARN, short for Yet Another Resource
Negotiator.


YARN (MapReduce 2)
YARN addresses the scalability shortcomings of “classic” MapReduce by splitting the
responsibilities of the jobtracker into separate entities. The jobtracker takes care of both job scheduling
(matching tasks with tasktrackers) and task progress monitoring (keeping track of tasks, restarting
failed or slow tasks, and doing task bookkeeping such as maintaining counter totals).
YARN separates these two roles into two independent daemons:
I. The resource manager, which manages the use of resources across the cluster.
II. The application master, which manages the lifecycle of applications running on the cluster.
The idea is that an application master negotiates with the resource manager for cluster resources—
described in terms of a number of containers each with a certain memory limit—then runs
application-specific processes in those containers. The containers are overseen by node managers
running on cluster nodes, which ensure that the application does not use more resources than it has
been allocated. In contrast to the jobtracker, each instance of an application—here a MapReduce job
—has a dedicated application master, which runs for the duration of the application.
The beauty of YARN’s design is that different YARN applications can co-exist on the same
cluster—so a MapReduce application can run at the same time as an MPI application, for example—
which brings great benefits for manageability and cluster utilization. Furthermore, it is even possible
for users to run different versions of MapReduce on the same YARN cluster, which makes the process
of upgrading MapReduce more manageable.


Figure 2. How Hadoop runs a MapReduce job using YARN


MapReduce on YARN involves more entities than classic MapReduce. They are:
• The client, which submits the MapReduce job.
• The YARN resource manager, which coordinates the allocation of compute resources on the
cluster.
• The YARN node managers, which launch and monitor the compute containers on machines in the
cluster.
• The MapReduce application master, which coordinates the tasks running the MapReduce job. The
application master and the MapReduce tasks run in containers that are scheduled by the resource
manager, and managed by the node managers.
• The distributed filesystem (normally HDFS, covered in Chapter 3), which is used for sharing job
files between the other entities. The process of running a job is shown in Figure 2, and described in the
following sections.
Job Submission
Jobs are submitted in MapReduce 2 using the same user API as MapReduce 1 (step 1). MapReduce
2 has an implementation of ClientProtocol that is activated when mapreduce.framework.name is set to
yarn. The submission process is very similar to the classic implementation. The new job ID is retrieved
from the resource manager rather than the jobtracker.
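A small sketch of that opt-in follows (the class name is illustrative; in practice mapreduce.framework.name is usually set cluster-wide in mapred-site.xml rather than in client code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class YarnSubmissionSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "yarn" selects the MapReduce 2 ClientProtocol implementation;
        // "classic" selects the jobtracker-based runtime, and "local" the
        // local job runner.
        conf.set("mapreduce.framework.name", "yarn");
        Job job = new Job(conf);
        // The rest of the driver (mapper, reducer, input and output paths)
        // is written exactly as in MapReduce 1.
    }
}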
Job Initialization
When the resource manager receives a call to its submitApplication(), it hands off the request to
the scheduler. The scheduler allocates a container, and the resource manager then launches the
application master’s process there, under the node manager’s management (steps 5a and 5b).
The application master initializes the job by creating a number of bookkeeping objects to keep track
of the job’s progress, as it will receive progress and completion reports from the tasks (step 6). Next,
it retrieves the input splits computed in the client from the shared filesystem (step 7). It then creates a
map task object for each split, and a number of reduce task objects determined by the
mapreduce.job.reduces property.
Task Assignment
The application master requests containers for all the map and reduce tasks in the job from the
resource manager (step 8). Each request, which is piggybacked on a heartbeat call, includes
information about each map task’s data locality, in particular the hosts and corresponding racks that
the input split resides on.
Task Execution

Once a task has been assigned a container by the resource manager’s scheduler, the application
master starts the container by contacting the node manager (steps 9a and 9b). The task is executed by
a Java application whose main class is YarnChild. Before it can run the task it localizes the resources
that the task needs, including the job configuration and JAR file, and any files from the distributed
cache (step 10). Finally, it runs the map or reduce task (step 11).
Understanding Hadoop API for MapReduce Framework (Old and New)
Hadoop provides two Java MapReduce APIs, referred to as the old and new APIs respectively.
There are several notable differences between the two APIs (a short side-by-side sketch follows this list):
1. The new API favors abstract classes over interfaces, since these are easier to evolve. For example,
you can add a method (with a default implementation) to an abstract class without breaking old
implementations of the class. Thus, the Mapper and Reducer interfaces in the old API are abstract
classes in the new API.
2. The new API is in the org.apache.hadoop.mapreduce package (and
subpackages). The old API can still be found in org.apache.hadoop.mapred.
3. The new API makes extensive use of context objects that allow the user code to communicate
with the MapReduce system. The new Context, for example, essentially unifies the role of the JobConf,
the OutputCollector, and the Reporter from the old API.
4. In both APIs, key-value record pairs are pushed to the mapper and reducer, but in addition, the
new API allows both mappers and reducers to control the execution flow by overriding the run()
method.
In the old API this is possible for mappers by writing a MapRunnable, but no equivalent exists for
reducers.
5. Configuration has been unified. The old API has a special JobConf object for job configuration.
In the new API, this distinction is dropped, so job configuration is done through a Configuration.
6. Job control is performed through the Job class in the new API, rather than the old JobClient, which
no longer exists in the new API.
7. Output files are named slightly differently: in the old API both map and reduce outputs are named
part-nnnnn, while in the new API map outputs are named part-m-nnnnn and reduce outputs are named
part-r-nnnnn (where nnnnn is an integer designating the part number, starting from zero).
8. In the new API the reduce() method passes values as a java.lang.Iterable, rather than a
java.lang.Iterator (as the old API does). This change makes it easier to iterate over the values using
Java’s for-each loop construct: for (VALUEIN value : values) { ...}
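To make differences 1 to 4 concrete, here is a sketch of the same trivial map function written against both APIs (the class names OldApiYearMapper and NewApiYearMapper are illustrative, not part of the weather program, and each class lives in its own source file). First, the old API, where Mapper is an interface, output goes through an OutputCollector, and a separate Reporter carries status:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class OldApiYearMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Emit the year column of each line as the key.
        output.collect(new Text(value.toString().substring(15, 19)), value);
    }
}

And the same function in the new API, where Mapper is an abstract class and a single Context object takes over the roles of JobConf, OutputCollector, and Reporter:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NewApiYearMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Same logic, but output and status reporting both go through Context.
        context.write(new Text(value.toString().substring(15, 19)), value);
    }
}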
Basic programs of Hadoop MapReduce:

Driver code
A Job object forms the specification of the job. It gives you control over how the job is run. When
we run this job on a Hadoop cluster, we will package the code into a JAR file (which Hadoop will
distribute around the cluster). Rather than explicitly specify the name of the JAR file, we can pass a
class in the Job’s setJarByClass() method, which Hadoop will use to locate the relevant JAR file by
looking for the JAR file containing this class.
Having constructed a Job object, we specify the input and output paths. An input path is specified by
calling the static addInputPath() method on FileInputFormat, and it can be a single file, a directory (in
which case, the input forms all the files in that directory), or a file pattern.
The output path (of which there is only one) is specified by the static setOutputPath() method on
FileOutputFormat. It specifies a directory where the output files from the reducer functions are written.
Next, we specify the map and reduce types to use via the setMapperClass() and setReducerClass()
methods.
The setOutputKeyClass() and setOutputValueClass() methods control the output types for the map and
the reduce functions, which are often the same, as they are in our case.
If they are different, then the map output types can be set using the methods
setMapOutputKeyClass() and setMapOutputValueClass().
The input types are controlled via the input format, which we have not explicitly set since we are using
the default TextInputFormat.
After setting the classes that define the map and reduce functions, we are ready to run the job. The
waitForCompletion() method on Job submits the job and waits for it to finish.
The return value of the waitForCompletion() method is a boolean indicating success (true) or failure
(false), which we translate into the program’s exit code of 0 or 1. The driver code for weather program
is specified below.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }

        Job job = new Job();
        job.setJarByClass(MaxTemperature.class);
        job.setJobName("Max temperature");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Mapper code
The Mapper class is a generic type, with four formal type parameters that specify the input key,
input value, output key, and output value types of the map function. For the present example, the input
key is a long integer offset, the input value is a line of text, the output key is a year, and the output
value is an air temperature (an integer). The following example shows the implementation of our map
method.
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        String line = value.toString();
        String year = line.substring(15, 19);

        int airTemperature;
        if (line.charAt(87) == '+') {
            // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }

        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
The map() method is passed a key and a value. We convert the Text value containing the line of
input into a Java String, then use its substring() method to extract the columns we are interested in.
The map() method also provides an instance of Context to write the output to. In this case, we write
the year as a Text object (since we are just using it as a key), and the temperature is wrapped in an
IntWritable. We write an output record only if the temperature is present and the quality code
indicates the temperature reading is OK.
Reducer code
Again, four formal type parameters are used to specify the input and output types, this time for the
reduce function. The input types of the reduce function must match the output types of the map
function: Text and IntWritable. And in this case, the output types of the reduce function are Text and
IntWritable, for a year and its maximum temperature, which we find by iterating through the
temperatures and comparing each with a record of the highest found so far.
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {

        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
RecordReader
RecordReader is responsible for creating the key/value pairs that are fed to the map task. Each
InputFormat has to provide its own RecordReader implementation to generate key/value pairs. For
example, the default TextInputFormat provides a LineRecordReader, which generates the byte offset
within the file as the key and each newline-terminated line of the input file as the value.
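As a sketch of this contract (the class names are hypothetical and not part of the weather example), a custom InputFormat can supply its own RecordReader; here one that delegates the file handling to the stock LineRecordReader and simply upper-cases each line before handing it to the mapper:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class UpperCaseTextInputFormat extends FileInputFormat<LongWritable, Text> {

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new UpperCaseLineRecordReader();
    }

    public static class UpperCaseLineRecordReader
            extends RecordReader<LongWritable, Text> {

        private final LineRecordReader delegate = new LineRecordReader();
        private final Text upperCased = new Text();

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context)
                throws IOException, InterruptedException {
            delegate.initialize(split, context);
        }

        @Override
        public boolean nextKeyValue() throws IOException, InterruptedException {
            if (!delegate.nextKeyValue()) {
                return false;                  // no more records in this split
            }
            upperCased.set(delegate.getCurrentValue().toString().toUpperCase());
            return true;
        }

        @Override
        public LongWritable getCurrentKey() throws IOException, InterruptedException {
            return delegate.getCurrentKey();   // still the byte offset in the file
        }

        @Override
        public Text getCurrentValue() throws IOException, InterruptedException {
            return upperCased;
        }

        @Override
        public float getProgress() throws IOException, InterruptedException {
            return delegate.getProgress();
        }

        @Override
        public void close() throws IOException {
            delegate.close();
        }
    }
}

A driver would then select it with job.setInputFormatClass(UpperCaseTextInputFormat.class).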
Combiner code
Many MapReduce jobs are limited by the bandwidth available on the cluster, so it pays to
minimize the data transferred between map and reduce tasks. Hadoop allows the user to specify a
combiner function to be run on the map output—the combiner function’s output forms the input to the
reduce function. Since the combiner function is an optimization, Hadoop does not provide a guarantee
of how many times it will call it for a particular map output record, if at all. In other words, calling the
combiner function zero, one, or many times should produce the same output from the reducer. The
combiner function doesn’t replace the reduce function. But it can help cut down the amount of data
shuffled between the maps and reduces.
public class MaxTemperatureWithCombiner
{


public static void main(String[] args) throws Exception


{
if (args.length != 2)
{
System.err.println("Usage: MaxTemperatureWithCombiner <input path> " +"<output
path>"); System.exit(-1);
}
Job job = new Job(); job.setJarByClass(MaxTemperatureWithCombiner.class);
job.setJobName("Max temperature"); FileInputFormat.addInputPath(job, new
Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(MaxTemperatureMapper.class);
job.setCombinerClass(MaxTemperatureReducer.class);
job.setReducerClass(MaxTemperatureReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Partitioner code
The partitioning phase takes place after the map phase and before the reduce phase. The number
of partitions is equal to the number of reducers. The data gets partitioned across the reducers according
to the partitioning function. The difference between a partitioner and a combiner is that the partitioner
divides the data according to the number of reducers so that all the data in a single partition gets
processed by a single reducer. However, the combiner functions similarly to the reducer and processes
the data in each partition. The combiner is an optimization to the reducer. The default partitioning
function is the hash partitioning function, where the hashing is done on the key. However, it might be
useful to partition the data according to some other function of the key or the value.
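A minimal sketch of such a custom partitioner for the weather job follows (DecadePartitioner is a hypothetical name; the driver would register it with job.setPartitionerClass(DecadePartitioner.class) together with an appropriate job.setNumReduceTasks() call):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes all years of the same decade to the same reducer, instead of the
// default behaviour of hashing each year key independently.
public class DecadePartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text year, IntWritable temperature, int numPartitions) {
        int decade = Integer.parseInt(year.toString()) / 10;
        // Mask off the sign bit and take the value modulo the number of reduce
        // tasks, mirroring what the default HashPartitioner does with hashCode().
        return (decade & Integer.MAX_VALUE) % numPartitions;
    }
}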

