Complete Hadoop Map Reduce Hive Setup Step by Step
NOTE: While creating the virtual machine, point its storage to the D: or E: drive; by default the VM files occupy space on the C: drive and you will run into storage issues.
Create a folder such as bigdata on the D: drive, and set the OS type to Linux and the version to Ubuntu (64-bit).
You can start with 2048 MB (i.e. 2 GB) of memory and increase it later to 3 GB or 4 GB.
Select the Create Virtual Hard Disk Now option and click the Create button.
Next, point the VM to the ISO image for Ubuntu 20.04 (64-bit) by following the steps below:
Select the bigdata VM, right-click, and choose Settings.
Click the Storage option; a pop-up window like the following appears.
Select the disk symbol marked Empty; in the right panel, under Attributes, click the down arrow next to the disk icon to choose the optical drive/ISO file.
You can go with the 16.04.7 server version for a full install, or choose the 20.04.4 desktop-amd64 file.
Once you select the right ISO image (ubuntu-20.04.4-desktop-amd64…), your screen looks like the above.
Keep the network setting as NAT so the Linux guest can access the internet.
Select OK and proceed.
Now you are ready to start your virtual machine, which will be booted and initialized with the Ubuntu 20.04 ISO image (this process takes up to about 30 minutes depending on your system). Select the VM, click Start, and choose the Normal Start option as below:
Maximize the VM screen where the installation is happening.
By default it selects the India (Calcutta) region; go with it. If it shows a different region, select yours accordingly.
THE NEXT SCREEN IS VERY IMPORTANT: REMEMBER TO GIVE THE RIGHT USER NAME AND PASSWORD, AND STORE/REMEMBER IT.
bigdata2023
*** Do not upgrade any packages in the Ubuntu/Linux OS, even if it asks insistently. ***
Use the following command to update your system before initiating a new installation:
Step-1
sudo apt update
drdvenkat@drdvenkat-VirtualBox:~/Desktop$ sudo apt update
[sudo] password for drdvenkat:
Hit:1 http://security.ubuntu.com/ubuntu focal-security InRelease
Hit:2 http://in.archive.ubuntu.com/ubuntu focal InRelease
Hit:3 http://in.archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:4 http://in.archive.ubuntu.com/ubuntu focal-backports InRelease
Reading package lists... Done
Building dependency tree
Reading state information... Done
329 packages can be upgraded. Run 'apt list --upgradable' to see them.
drdvenkat@drdvenkat-VirtualBox:~/Desktop$
To change the long default shell prompt to a short one of your choice, use the following step:
drdvenkat@drdvenkat-VirtualBox:~/Desktop$ export PS1=$:
$:
Step-2
sudo apt install openjdk-8-jdk -y
$:which java
/usr/bin/java
$:java -version
openjdk version "1.8.0_352"
OpenJDK Runtime Environment (build 1.8.0_352-8u352-ga-1~20.04-b08)
OpenJDK 64-Bit Server VM (build 25.352-b08, mixed mode)
$:which javac
/usr/bin/javac
$:javac -version
javac 1.8.0_352
$:
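To confirm where the JDK actually lives (this path is needed later for JAVA_HOME in hadoop-env.sh), a quick check; the location shown is the usual one for the openjdk-8-jdk package and may differ on your machine:
$:readlink -f /usr/bin/javac
/usr/lib/jvm/java-8-openjdk-amd64/bin/javac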
Set Up a Non-Root User for Hadoop Environment
It is advisable to create a non-root user, specifically for the
Hadoop environment. A distinct user improves security and helps you
manage your cluster more efficiently. To ensure the smooth
functioning of Hadoop services, the user should have the ability to
establish a passwordless SSH connection with the localhost.
Install the OpenSSH server and client using the following command:
Step-4
sudo apt install openssh-server openssh-client -y
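The intermediate steps (creating the dedicated user, switching to it, and generating the SSH key pair) are not reproduced above. A minimal sketch, assuming the username hdoop used in the rest of this guide:
sudo adduser hdoop
su - hdoop
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
After this, continue with Step-9 below to restrict permissions on the authorized_keys file.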
Step-9
chmod 0600 ~/.ssh/authorized_keys
$:ls -lt ~/.ssh/authorized_keys
-rw------- 1 hdoop hdoop 580 Jan 24 22:04 /home/hdoop/.ssh/authorized_keys
$:
The new user is now able to SSH without needing to enter a password every time.
Verify everything is set up correctly by using the hdoop user to SSH to localhost:
Step-10
ssh localhost
$:ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is
SHA256:CXz9eqInsu9wcBTgemSUKUdujMiDkgM91L0lU758Yj0.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 20.04.4 LTS (GNU/Linux 5.15.0-58-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
330 updates can be applied immediately.
255 of these updates are standard security updates.
hdoop@drdvenkat-VirtualBox:~$
Download and Install Hadoop on Ubuntu
Visit the official Apache Hadoop project page, and select the version of Hadoop you want to
implement.
Go to the archive page and select Hadoop 3.2.1:
https://archive.apache.org/dist/hadoop/common/
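Download the tarball into the hdoop home directory with wget; the command below assumes the standard layout of the archive URL above:
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz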
hadoop-3.2.1.tar.gz 100%[===================================>]
342.56M 2.91MB/s in 2m 0s
hdoop@drdvenkat-VirtualBox:~$
Extract the files to initiate the Hadoop installation
Step-12:
tar xzf hadoop-3.2.1.tar.gz
hdoop@drdvenkat-VirtualBox:~$ tar xzf hadoop-3.2.1.tar.gz
The Hadoop binary files are now located within the hadoop-3.2.1 directory
hdoop@drdvenkat-VirtualBox:~$ PS1=$:
$:id
uid=1001(hdoop) gid=1001(hdoop) groups=1001(hdoop)
$:ls -lt
total 350788
-rw-rw-r-- 1 hdoop hdoop 359196911 Jul 3 2020 hadoop-3.2.1.tar.gz
drwxr-xr-x 9 hdoop hdoop 4096 Sep 10 2019 hadoop-3.2.1
Single Node Hadoop Deployment (Pseudo-Distributed Mode)
Hadoop excels when deployed in a fully distributed mode on a large cluster of networked
servers. However, if you are new to Hadoop and want to explore basic commands or test
applications, you can configure Hadoop on a single node. This setup, also called pseudo-
distributed mode, allows each Hadoop daemon to run as a single Java process.
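Before editing the Hadoop configuration files, the Hadoop environment variables are added to ~/.bashrc. The exact entries are not reproduced here; a sketch matching the variables shown in the env output below (adjust the paths if you extracted Hadoop elsewhere):
export HADOOP_HOME=/home/hdoop/hadoop-3.2.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"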
Once you have edited and added the entries, save the file and exit, then source the environment file using the following command:
hdoop@drdvenkat-VirtualBox:~$ source .bashrc
hdoop@drdvenkat-VirtualBox:~$ env | grep HADOOP
HADOOP_OPTS=-Djava.library.path=/home/hdoop/hadoop-3.2.1/lib/nativ
HADOOP_INSTALL=/home/hdoop/hadoop-3.2.1
HADOOP_MAPRED_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_COMMON_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_HDFS_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_COMMON_LIB_NATIVE_DIR=/home/hdoop/hadoop-3.2.1/lib/native
hdoop@drdvenkat-VirtualBox:~$
Edit hadoop-env.sh File
The hadoop-env.sh file serves as a master file to configure YARN, HDFS, MapReduce, and
Hadoop-related project settings. When setting up a single node Hadoop cluster, you need to
define which Java implementation is to be utilized. Use the previously created
$HADOOP_HOME variable to access the hadoop-env.sh file:
vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
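Inside hadoop-env.sh, uncomment the JAVA_HOME line and point it at the JDK installation directory; with the openjdk-8-jdk package installed earlier, the path is usually:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64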
Edit core-site.xml File
The core-site.xml file defines the HDFS address and the Hadoop temporary directory. Create the temporary data directory, then open the file for editing and add the following configuration:
mkdir /home/hdoop/tmpdata
vi $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdoop/tmpdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
Edit hdfs-site.xml File
YOU HAVE TO CREATE TWO DIRECTORIES TO HOLD THE NAMENODE AND DATANODE DATA:
$:mkdir -p /home/hdoop/dfsdata/namenode
$:mkdir -p /home/hdoop/dfsdata/datanode
The properties in the hdfs-site.xml file govern the location for storing node metadata, fsimage
file, and edit log file. Configure the file by defining the NameNode and DataNode storage
directories.
Additionally, the default dfs.replication value of 3 needs to be changed to 1 to match
the single node setup.
Use the following command to open the hdfs-site.xml file for editing:
$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following configuration to the file and, if needed,
adjust the NameNode and DataNode directories to your custom
locations:
<!-- ENTRIES ADDED BY Dr D VENKAT. -->
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hdoop/dfsdata/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hdoop/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Edit mapred-site.xml File
Use the following command to access the mapred-site.xml file and define MapReduce
values:
vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
<!-- Entries Added by Dr D VENKAT for MAP REDUCE to use YARN scheduler -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
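Note: on Hadoop 3.x this entry alone is sometimes not enough; if a MapReduce job later fails with "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster", the usual remedy (an addition, not part of the original steps) is to also add the following properties to mapred-site.xml, pointing at this guide's install path:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/home/hdoop/hadoop-3.2.1</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/home/hdoop/hadoop-3.2.1</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/home/hdoop/hadoop-3.2.1</value>
</property>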
Edit yarn-site.xml File
The yarn-site.xml file is used to define settings relevant to YARN. It contains configurations
for the Node Manager, Resource Manager, Containers, and Application Master.
Open the yarn-site.xml file in a text editor:
vi $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
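The remaining yarn-site.xml entries are not reproduced above. For a pseudo-distributed setup, the additional properties commonly placed inside the same <configuration> block (shown here as an assumption based on the standard single-node configuration, not the original file) are:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>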
##########SETTING UP THE HADOOP SINGLE NODE CLUSTER######################
STEP-14: Format HDFS NameNode
It is important to format the NameNode before starting Hadoop services for the first time:
hdfs namenode -format
$:which hdfs
/home/hdoop/hadoop-3.2.1/bin/hdfs
$:which hadoop
/home/hdoop/hadoop-3.2.1/bin/hadoop
…………………………
SHUTDOWN_MSG: Shutting down NameNode at drdvenkat-VirtualBox/127.0.1.1
************************************************************/
If you see any error at this point, do not proceed; look up the error message online and fix it first.
Step-15: Start Hadoop Cluster
Navigate to the hadoop-3.2.1/sbin directory and execute the following commands to start the
NameNode and DataNode:
$:source .bashrc
hdoop@drdvenkat-VirtualBox:~$ PS1=$:
$:env | grep hadoop
YARN_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_OPTS=-Djava.library.path=/home/hdoop/hadoop-3.2.1/lib/nativ
HADOOP_INSTALL=/home/hdoop/hadoop-3.2.1
HADOOP_MAPRED_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_COMMON_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_HDFS_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_COMMON_LIB_NATIVE_DIR=/home/hdoop/hadoop-3.2.1/lib/native
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/
games:/usr/local/games:/snap/bin:/home/hdoop/hadoop-3.2.1/sbin:/home/
hdoop/hadoop-3.2.1/bin
$:cd $HADOOP_HOME
$:cd sbin
$:./start-dfs.sh
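The YARN daemons in the process list below come from start-yarn.sh, and the list itself from jps; those two commands (their intermediate output is omitted here) are:
$:./start-yarn.sh
$:jps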
12882 NodeManager
12499 SecondaryNameNode
12757 ResourceManager
13225 Jps
12330 DataNode
12186 NameNode
$:
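As a quick sanity check (not part of the original steps), you can open the web UIs on the default Hadoop 3.x ports and run a small MapReduce job from the bundled examples jar; the HDFS paths below are illustrative:
NameNode UI: http://localhost:9870  -  ResourceManager UI: http://localhost:8088
$:hdfs dfs -mkdir -p /user/hdoop/input
$:hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /user/hdoop/input
$:hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount /user/hdoop/input /user/hdoop/output
$:hdfs dfs -cat /user/hdoop/output/part-r-00000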
FOR HIVE
FOR KAFKA…..