213nt1306- Big Data Analytics Lab Manual
LAB MANUAL
Year: ………………...
Semester: ……………
Branch: ………………………………………
DEPARTMENT OF INFORMATION TECHNOLOGY
BONAFIDE CERTIFICATE
Submitted for the practical examination held at Kalasalingam Academy of Research and
Education, Anand Nagar, Krishnankovil on ……………………..
Register No.
1 Installation of Hadoop and understanding different Hadoop modes, startup scripts and
configuration files
2 Hadoop implementation of file management tasks, such as adding files and directories,
retrieving files and deleting files
3 Implementation of Matrix Multiplication with Hadoop MapReduce
4 Run a basic Word Count MapReduce program to understand the MapReduce paradigm.
Aim : To install Hadoop and understand different Hadoop modes, startup scripts and configuration files.
A. Installation of Hadoop:
The Oracle JDK is the official JDK; however, it is no longer provided by Oracle as a default
installation for Ubuntu. You can still install a JDK using apt-get.
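For example (a sketch using the OpenJDK package from the standard Ubuntu repositories as a substitute; the package name openjdk-8-jdk is an assumption that depends on your Ubuntu release):
$ sudo apt-get update
$ sudo apt-get install openjdk-8-jdk
$ java -version (verifies the installation)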
a. sudo adduser hadoop_dev (Upon executing this command, you will be prompted to enter the
new password for this user. Please enter the password and the other details. Don't forget to save
the details at the end.)
b. su - hadoop_dev (Switches from the current user to the newly created user, i.e.
hadoop_dev.)
8. Let us verify if the installation is successful or not (change to the home directory: cd /home/
hadoop_dev/hadoop2/):
10. Let us run a sample Hadoop program that is provided to you in the download package:
$ cat output/* (look for the output in the output directory that Hadoop creates for you).
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Note: This change sets the default replication count for blocks used by HDFS.
3. We need to set up password-less login so that the master will be able to do a password-less ssh
to start the daemons on all the slaves.
a. ssh localhost (enter your password; if you are able to log in, then the ssh server is running).
4. We can run Hadoop jobs locally or on YARN in this mode. In this exercise, we will focus on
running the jobs locally.
5. Format the file system. When we format the NameNode, it formats the metadata related to the
DataNodes. By doing that, all the information on the DataNodes is lost and they can be reused for
new data:
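The standard Hadoop 2 commands for formatting and then starting HDFS (a sketch, assuming the hadoop2 directory layout used above) are:
$ bin/hdfs namenode -format
$ sbin/start-dfs.sh (starts the NameNode and DataNode daemons)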
You can check if the NameNode has started successfully or not by using the following web interface:
http://0.0.0.0:50070 . If you are unable to see this, check the logs in the /home/
hadoop_dev/hadoop2/logs folder.
7. You can check whether the daemons are running or not by issuing the jps command.
10. Stop the daemons when you are done executing the jobs, with the below
command: sbin/stop-dfs.sh
Hadoop Installation – Pseudo-Distributed Mode (YARN)
Steps for Installation
sbin/start-yarn.sh
Once this command is run, you can check if the ResourceManager is running or not by visiting the
following URL in a browser: http://0.0.0.0:8088 . If you are unable to see this, check the logs in
the directory /home/hadoop_dev/hadoop2/logs.
5. To check whether the services are running, issue a jps command. The following shows all the
services necessary to run YARN on a single server:
$ jps
15933 Jps
15567 ResourceManager
15785 NodeManager
6. Let us run the same grep example as we ran before, using the regular expression 'dfs[a-z.]+':
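The full command (a sketch; the examples jar name must match your Hadoop version) is:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
$ bin/hdfs dfs -cat output/* (view the result stored in HDFS)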
7. Stop the daemons when you are done executing the jobs, with the below command:
sbin/stop-yarn.sh
Steps
1. Prerequisites before installing
2. Set Environment
3. Hadoop set-up
Google Search
Oracle Link
https://www.oracle.com/in/java/technologies/javase/javase8-archive-downloads.html
Click download
Install
Download Hadoop
Link : https://hadoop.apache.org/releases.html
2. Set Environment Variable
a. Click Start Button-> Settings
b. Click System
Checking the Java installation
a. Go to the command prompt
b. Type javac and press Enter
core-site.xml:
<configuration>
<property>
<name>fs.default.name</name> <value>hdfs://localhost:50071</value>
</property>
</configuration>
Create a new data folder inside the Hadoop folder (with namenode and datanode subfolders, as referenced below)
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///C:/hadoop-3.3.6/data/namenode</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///C:/hadoop-3.3.6/data/datanode</value>
<final>true</final>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Aim : To add files and directories, and to retrieve and delete files in the Hadoop environment.
1. Create a directory in HDFS at given path(s).
Usage : hadoop fs -mkdir <paths>
Example : hadoop fs -mkdir /user/saurzcode/dir1 /user/saurzcode/dir2
2. hadoop fs -put : Copy a single src file, or multiple src files, from the local file system to the
Hadoop file system.
3. copyFromLocal : Similar to the put command, except that the source is restricted to a local file
reference.
4. copyToLocal : Similar to the get command, except that the destination is restricted to a local file
reference.
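A few illustrative commands following the same pattern (the paths below are placeholders, matching the example directories created above):
hadoop fs -put localfile.txt /user/saurzcode/dir1 (copy a local file into HDFS)
hadoop fs -copyFromLocal localfile.txt /user/saurzcode/dir1
hadoop fs -get /user/saurzcode/dir1/localfile.txt . (copy a file from HDFS to the local file system)
hadoop fs -copyToLocal /user/saurzcode/dir1/localfile.txt .
hadoop fs -ls /user/saurzcode (list the contents of a directory)
hadoop fs -rm /user/saurzcode/dir1/localfile.txt (delete a file)
hadoop fs -rm -r /user/saurzcode/dir2 (delete a directory recursively)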
Program:
MatrixMultiplication.java
package matrix;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class MatrixMultiplication {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
// A is an m-by-n matrix; B is an n-by-p matrix.
conf.set("m", "2");
conf.set("n", "5");
conf.set("p", "3");
Job job = new Job(conf, "MatrixMultiplication");
job.setJarByClass(MatrixMultiplication.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(MatrixMapper.class);
job.setReducerClass(MatrixReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}
MatrixMapper.java
package matrix;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MatrixMapper extends Mapper<LongWritable, Text,
Text, Text> {
public void map(LongWritable key, Text value, Context
context) throws IOException, InterruptedException {
Configuration conf = context.getConfiguration();
int m = Integer.parseInt(conf.get("m"));
int p = Integer.parseInt(conf.get("p"));
String line = value.toString();
String[] indicesAndValue = line.split(",");
Text outputKey = new Text();
Text outputValue = new Text();
if (indicesAndValue[0].equals("A")) {
for (int k = 0; k < p; k++) {
outputKey.set(indicesAndValue[1] + "," + k);
outputValue.set("A," + indicesAndValue[2] + "," +
indicesAndValue[3]);
context.write(outputKey, outputValue);
}
} else {
for (int i = 0; i < m; i++) {
outputKey.set(i + "," + indicesAndValue[2]);
outputValue.set("B," + indicesAndValue[1] + "," +
indicesAndValue[3]);
context.write(outputKey, outputValue);
}
}
}
}
MatrixReducer.java
package matrix;
import java.io.IOException;
import java.util.HashMap;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MatrixReducer extends Reducer<Text, Text, Text, Text> {
public void reduce(Text key, Iterable<Text> values, Context
context) throws IOException, InterruptedException {
String[] value;
HashMap<Integer, Float> hashA = new HashMap<Integer,
Float>();
HashMap<Integer, Float> hashB = new HashMap<Integer,
Float>();
for (Text val : values) {
value = val.toString().split(",");
if (value[0].equals("A")) {
hashA.put(Integer.parseInt(value[1]),
Float.parseFloat(value[2]));
} else {
hashB.put(Integer.parseInt(value[1]),
Float.parseFloat(value[2]));
}
}
int n =
Integer.parseInt(context.getConfiguration().get("n"));
float result = 0.0f;
float a_ij;
float b_jk;
for (int j = 0; j < n; j++) {
a_ij = hashA.containsKey(j) ? hashA.get(j) : 0.0f;
b_jk = hashB.containsKey(j) ? hashB.get(j) : 0.0f;
result += a_ij * b_jk;
}
if (result != 0.0f) {
context.write(null, new Text(key.toString() + "," +
Float.toString(result)));
}
}
}
Running Steps:
1. Open Eclipse
2. Create Java Project as “matrix”
3. Add three class files in it
4. Add the referenced libraries by right-clicking the matrix project -> Build Path -> Add External Archives:
o hadoop-common.jar
o hadoop-mapreduce-client-core-0.23.1.jar
Input Data:
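The input format assumed by MatrixMapper is one matrix entry per line, written as matrixName,rowIndex,columnIndex,value. A small illustrative fragment (not the actual lab data) for the 2x5 matrix A and 5x3 matrix B configured above would look like:
A,0,0,1.0
A,0,1,2.0
A,1,4,3.0
B,0,0,4.0
B,1,2,5.0
B,4,1,6.0
Entries that are not listed are treated as zero by the reducer.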
Run and Output:
Ex. No. 4 Date :
Aim : To run a basic Word Count MapReduce program to understand the MapReduce paradigm.
Procedure :
• After installing Hadoop and starting the Hadoop services, write the program for the
word count in Java.
• Before writing it, we need to create an input file and a directory to store the input and output
files.
• Create an input file with vi <file_name>.txt and write something in it.
• After creating the file, create a directory and put the file inside the directory by
using the commands:
hdfs dfs -mkdir /map
hdfs dfs -put test.txt /map
• Then create a mapreduce folder to store the Java program using the command: mkdir
mapreduce
• Inside the mapreduce folder, write the program for the word count.
WordCountMapper.java:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context) throws IOException,
InterruptedException
{
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer (line);
while(tokenizer.hasMoreTokens())
{
word.set(tokenizer.nextToken());
context.write(word,one);
}
}
}
WordCountReducer.java:
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Reducer;
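A minimal sketch of the reducer, consistent with the mapper above (it sums the counts emitted for each word); this is an assumed implementation, not the lab's exact listing:
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException,
InterruptedException
{
// add up all the 1s emitted by the mapper for this word
int sum = 0;
for (IntWritable value : values)
{
sum += value.get();
}
// emit the word together with its total count
context.write(key, new IntWritable(sum));
}
}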
WordCount.java:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
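Given the imports above (Configured, Tool, ToolRunner), a minimal sketch of the driver is shown below; it is an assumption of how the job is wired together, not the lab's exact listing:
public class WordCount extends Configured implements Tool
{
public int run(String[] args) throws Exception
{
// args[0] is the input path in HDFS, args[1] is the output path
Job job = Job.getInstance(getConf(), "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception
{
System.exit(ToolRunner.run(new Configuration(), new WordCount(), args));
}
}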
• After writing the program, we need hadoop-core-3.3.6.jar. Download the jar file from
(https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/3.3.6/) and move
hadoop-core-3.3.6.jar into that folder using the command: cp hadoop-core-
3.3.6.jar /opt/hadoop/mapreducer/
• Then extract hadoop-core-3.3.6.jar using the command: jar -xvf hadoop-core-3.3.6.jar.
• Package the compiled classes into a jar file using the command: jar cvfe WordCount.jar WordCount *.class (the full compile-and-package sequence is sketched after the run command below).
• Then give the paths of the input file and output file for the word count program using the
command:
hadoop jar WordCount.jar /map/test.txt /map/out.txt
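The .class files referenced by the jar command come from compiling the sources against the Hadoop classes first; a typical sequence (a sketch, using the hadoop classpath command to supply the required libraries) is:
javac -classpath "$(hadoop classpath)" WordCountMapper.java WordCountReducer.java WordCount.java
jar cvfe WordCount.jar WordCount *.class
hadoop jar WordCount.jar /map/test.txt /map/out.txt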
Download Apache Derby, extract it, and rename the extracted folder as "derby"
Download hive-2.1.0 using the link https://archive.apache.org/dist/hive/hive-2.1.0
Navigate to the derby folder->Open it->lib->Copy all the files in the folder
Open the command prompt -> Type "hive" -> type the following commands to create a database and insert
data into it.
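For example (illustrative HiveQL; the database, table and column names are assumptions, not the lab's prescribed data):
CREATE DATABASE studentdb;
USE studentdb;
CREATE TABLE student (id INT, name STRING, marks INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
INSERT INTO student VALUES (1, 'Arun', 85), (2, 'Divya', 92);
SELECT * FROM student;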
Hive Installation on Ubuntu :
Ex. No. 6 Date :
Installation of HBase along with practice examples
Step 1:
Unzip the downloaded HBase and place it in some common path, say C:/Document/hbase-2.2.5
Unzipped file :
Step 2:
Create folders as shown below inside the root folder for HBase data and ZooKeeper
-> C:/Document/hbase-2.2.5/hbase
-> C:/Document/hbase-2.2.5/zookeeper
Step 3:
Edit conf/hbase-env.cmd and add the following settings:
set JAVA_HOME=%JAVA_HOME%
set HBASE_CLASSPATH=%HBASE_HOME%\lib\client-facing-thirdparty\*
set HBASE_HEAPSIZE=8000
set HBASE_OPTS="-XX:+UseConcMarkSweepGC" "-Djava.net.preferIPv4Stack=true"
set SERVER_GC_OPTS="-verbose:gc" "-XX:+PrintGCDetails" "-XX:+PrintGCDateStamps"
%HBASE_GC_OPTS%
set HBASE_USE_GC_LOGFILE=true
Step 5:
Edit conf/hbase-site.xml and add the following properties inside the <configuration> tags:
<property>
<name>hbase.rootdir</name>
<value>file:///C:/Documents/hbase-2.2.5/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/C:/Documents/hbase-2.2.5/zookeeper</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
Step 6:
Set up the environment variable HBASE_HOME and add bin to the Path variable as shown
in the image below.
Check the HBase version
To open the HBase shell window
To create a table
Insert data into the table
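Illustrative HBase shell commands for these steps (the table and column family names are assumptions, not the lab's prescribed data):
hbase version
hbase shell
create 'student', 'details'
put 'student', '1', 'details:name', 'Arun'
put 'student', '1', 'details:marks', '85'
scan 'student'
get 'student', '1'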