Complete Hadoop Map Reduce Hive Setup Step by Step

MAKE SURE YOU ARE CONNECTED TO THE RIGHT INTERNET:

WIFI (YOUR OWN HOTSPOT) OR THE UNIVERSITY NETWORK. SOMETIMES
FIREWALL RULES WOULD PREVENT SOME INSTALLATIONS/DOWNLOADS FROM THE INTERNET**
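If installations or downloads fail later, a quick connectivity check from any Linux terminal (a minimal sketch, not part of the original steps; it simply tests the Ubuntu and Apache hosts used later in this guide) can tell you whether the firewall is the problem:

ping -c 3 archive.ubuntu.com
wget -q --spider https://archive.apache.org/dist/hadoop/common/ && echo "apache archive reachable"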

1. Download the Ubuntu 20.04 version


or
copy it from the pen drive given by the SME/trainer.

2. Create a virtual machine named bigdata

NOTE: While creating the virtual machine, point it to the D: or E: drive, as VM files occupy space in the C:\ drive by
default and you will get into storage issues.
Create a folder like bigdata in the D:\ drive and use OS type Linux and version Ubuntu (64-bit).
You can start with 2048 MB, i.e. 2 GB, of memory and later increase it to 3 GB or 4 GB.
Select the Create Virtual Hard Disk Now option and click the Create button.

The next screen will look like the one below:


Give the file location on the D: drive (it will be chosen by default, just double check); give a minimum of 45 GB, as we
will be downloading and installing Hadoop, Hive, MongoDB and other BIG DATA related software
in the same Linux box.
3. The VM should have been created now and you can see it in the VirtualBox Manager as below:

It is time to point to the right ISO image corresponding to the Ubuntu 20.04 (64-bit) version by following
the steps given below:
Select the VM bigdata, right-click, and choose Settings.
Click the Storage option; the pop-up window looks like the following.

Now select the disk symbol (Empty) and, in the right panel under Attributes, click the down
arrow next to the disk icon to choose the optical drive/ISO file.

We can go with the 16.04.7 server version, or choose the ubuntu-20.04.4-desktop-amd64 file.
Once you select the right ISO image (ubuntu-20.04.4-desktop-amd64), your screen will look like the one above.

Keep the network setting as NAT for accessing the internet from your Linux guest.
SELECT OK and proceed.

Now you are ready to start your VIRTUAL MACHINE, WHICH WILL BE BOOTED AND INITIALIZED
WITH THE UBUNTU 20.04 ISO IMAGE (this process takes up to about 30 minutes, depending on your
system). Select the VM, select Start, and choose the Normal Start option as below:
Maximize the VM screen where the installation is happening.

Initially it is an all-black screen with a lot of lines; no worries.

Select Install Ubuntu

Select all the defaults and continue.

SELECT NORMAL INSTALLATION AND CONTINUE.


It will show "Erase disk and install Ubuntu". NO WORRIES: IT ERASES ONLY THE SELECTED
FOLDER/SIZE (36 GB) OF THE VIRTUAL DISK VOLUME/MOUNT POINT; IT DOESN'T DO ANYTHING TO WINDOWS,
IT IS ONLY THE VIRTUAL DISK. Select Install Now.
Continue in the next screen.

By default it selects the India (Calcutta) region; go with it. If it shows a different region, select
accordingly.
THE NEXT SCREEN IS VERY IMPORTANT: REMEMBER TO GIVE THE RIGHT USER NAME AND PASSWORD, AND
STORE/REMEMBER IT.

bigdata2023

You will see the installation and copying taking place; be patient.


Finally you get the message to Restart.
INSTRUCTIONS FOR SETTING UP HADOOP & MAP REDUCE

Do not upgrade any packages in the Ubuntu/Linux OS, even if it prompts you to do so. ***

Prerequisites before installing/setting up Hadoop and Map Reduce (Java Based..)


 Access to a terminal window/command line
 Sudo or root privileges on local /remote machines
Install OpenJDK on Ubuntu
The Hadoop framework is written in Java, and its services require a compatible Java Runtime
Environment (JRE) and Java Development Kit (JDK).

Use the following command to update your system before initiating a new installation:

Step-1
sudo apt update
drdvenkat@drdvenkat-VirtualBox:~/Desktop$ sudo apt update
[sudo] password for drdvenkat:
Hit:1 http://security.ubuntu.com/ubuntu focal-security InRelease
Hit:2 http://in.archive.ubuntu.com/ubuntu focal InRelease
Hit:3 http://in.archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:4 http://in.archive.ubuntu.com/ubuntu focal-backports InRelease
Reading package lists... Done
Building dependency tree
Reading state information... Done
329 packages can be upgraded. Run 'apt list --upgradable' to see them.
drdvenkat@drdvenkat-VirtualBox:~/Desktop$

To change the shell prompt from the long one to a short one (or your favourite), use the following step:
drdvenkat@drdvenkat-VirtualBox:~/Desktop$ export PS1=$:
$:

Type the following command in your terminal to install OpenJDK 8:

Step-2
sudo apt install openjdk-8-jdk -y

$:sudo apt install openjdk-8-jdk -y


Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
...................
it takes a while..
Step-3

Once the installation process is complete, verify the current Java version:
java -version
javac -version

$:which java
/usr/bin/java
$:java -version
openjdk version "1.8.0_352"
OpenJDK Runtime Environment (build 1.8.0_352-8u352-ga-1~20.04-b08)
OpenJDK 64-Bit Server VM (build 25.352-b08, mixed mode)
$:which javac
/usr/bin/javac
$:javac -version
javac 1.8.0_352
$:
Set Up a Non-Root User for Hadoop Environment
It is advisable to create a non-root user, specifically for the
Hadoop environment. A distinct user improves security and helps you
manage your cluster more efficiently. To ensure the smooth
functioning of Hadoop services, the user should have the ability to
establish a passwordless SSH connection with the localhost.
Install the OpenSSH server and client using the following
command
Step-4
sudo apt install openssh-server openssh-client -y

$:sudo apt install openssh-server openssh-client -y


[sudo] password for drdvenkat:
Reading package lists... Done
Building dependency tree
....................takes a while..
rescue-ssh.target is a disabled or a static unit, not starting it.
Processing triggers for systemd (245.4-4ubuntu3.15) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for ufw (0.36-6ubuntu1) ...
$:
Create Hadoop User
Utilize the adduser command to create a new Hadoop user:
Step-5
sudo adduser hdoop
$:sudo adduser hdoop
Adding user `hdoop' ...
Adding new group `hdoop' (1001) ...
Adding new user `hdoop' (1001) with group `hdoop' ...
Creating home directory `/home/hdoop' ...
Copying files from `/etc/skel' ...
New password:
Retype new password:
passwd: password updated successfully
Changing the user information for hdoop
Enter the new value, or press ENTER for the default
Full Name []: hadoop user account
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] Y
Switch to the newly created user and enter the corresponding
password:
Step-6
su - hdoop
$:su - hdoop
Password:
hdoop@drdvenkat-VirtualBox:~$ PS=$:
hdoop@drdvenkat-VirtualBox:~$ export PS1=$:
$:who am i
$:pwd
/home/hdoop
$:
Enable Passwordless SSH for Hadoop User
Generate an SSH key pair and define the location it is to be stored in:
Step-7
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$:ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Created directory '/home/hdoop/.ssh'.
Your identification has been saved in /home/hdoop/.ssh/id_rsa
Your public key has been saved in /home/hdoop/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:3rXw8t/Hk44PWcJqfXKpdguirPUlc4hduk1YdrTsMR0 hdoop@drdvenkat-
VirtualBox
The key's randomart image is:
+---[RSA 3072]----+
| |
| |
| E |
| . o +|
| S . .* B.|
| . .o+X.* +|
| .oo@oX *.|
| o +oX.O=o|
| ..o oo==+=|
+----[SHA256]-----+
$:
Use the cat command to store the public key as authorized_keys in
the ssh directory:
Step-8
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Set the permissions for your user with the chmod command:

Step-9
chmod 0600 ~/.ssh/authorized_keys
$:ls -lt ~/.ssh/authorized_keys
-rw------- 1 hdoop hdoop 580 Jan 24 22:04 /home/hdoop/.ssh/authorized_keys
$:
The new user is now able to SSH without needing to enter a password every time.
Verify everything is set up correctly by using the hdoop user to SSH to localhost:
Step-10
ssh localhost
$:ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is
SHA256:CXz9eqInsu9wcBTgemSUKUdujMiDkgM91L0lU758Yj0.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 20.04.4 LTS (GNU/Linux 5.15.0-58-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
330 updates can be applied immediately.
255 of these updates are standard security updates.
hdoop@drdvenkat-VirtualBox:~$
Download and Install Hadoop on Ubuntu
Visit the official Apache Hadoop project page, and select the version of Hadoop you want to
implement.
Go to the archive page.. and select Hadoop 3.2.1
https://archive.apache.org/dist/hadoop/common/

You will get to the following page:


https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/
Select the following correctly..
hadoop-3.2.1.tar.gz 2020-07-03 04:38 343M
Step-11
Save the tar.gz file in the default Downloads directory.
Once the download is complete, first copy the file from the Downloads directory to the home
directory,
or you can directly get the file using the wget command as in the step below.
Use the provided mirror link and download the Hadoop package with the wget command:
Step-11 **RECOMMENDED ONE
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
hdoop@drdvenkat-VirtualBox:~$ wget
https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-
3.2.1.tar.gz
--2023-01-24 22:29:52--
https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-
3.2.1.tar.gz
Resolving archive.apache.org (archive.apache.org)... 138.201.131.134,
2a01:4f8:172:2ec5::2
Connecting to archive.apache.org (archive.apache.org)|
138.201.131.134|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 359196911 (343M) [application/x-gzip]
Saving to: ‘hadoop-3.2.1.tar.gz’

hadoop-3.2.1.tar.gz 100%[===================================>]
342.56M 2.91MB/s in 2m 0s

2023-01-24 22:31:52 (2.85 MB/s) - ‘hadoop-3.2.1.tar.gz’ saved


[359196911/359196911]

hdoop@drdvenkat-VirtualBox:~$
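Optionally, before extracting, you can sanity-check the download (a small optional step, not in the original instructions; archive.apache.org publishes a .sha512 file next to each release tarball) by comparing the checksums:

hdoop@drdvenkat-VirtualBox:~$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz.sha512
hdoop@drdvenkat-VirtualBox:~$ sha512sum hadoop-3.2.1.tar.gz
hdoop@drdvenkat-VirtualBox:~$ cat hadoop-3.2.1.tar.gz.sha512

The hash values printed by the last two commands should match.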
Extract the files to initiate the Hadoop installation
Step-12:
tar xzf hadoop-3.2.1.tar.gz
hdoop@drdvenkat-VirtualBox:~$ tar xzf hadoop-3.2.1.tar.gz
The Hadoop binary files are now located within the hadoop-3.2.1 directory
hdoop@drdvenkat-VirtualBox:~$ PS1=$:
$:id
uid=1001(hdoop) gid=1001(hdoop) groups=1001(hdoop)
$:ls -lt
total 350788
-rw-rw-r-- 1 hdoop hdoop 359196911 Jul 3 2020 hadoop-3.2.1.tar.gz
drwxr-xr-x 9 hdoop hdoop 4096 Sep 10 2019 hadoop-3.2.1
Single Node Hadoop Deployment (Pseudo-Distributed Mode)
Hadoop excels when deployed in a fully distributed mode on a large cluster of networked
servers. However, if you are new to Hadoop and want to explore basic commands or test
applications, you can configure Hadoop on a single node. This setup, also called pseudo-
distributed mode, allows each Hadoop daemon to run as a single Java process.

Step-13: The Hadoop environment is configured by editing the set of configuration files listed below:
 .bashrc
 hadoop-env.sh
 core-site.xml
 hdfs-site.xml
 mapred-site.xml
 yarn-site.xml
Edit the .bashrc shell configuration file using a text editor of your choice (we will be using the nano or
vi editor):
vi .bashrc
(add the following at the end of the file; you need to know basic vi editor commands)
## entries added by Dr.D.VENKAT for HADOOP ENVIRONMENT SETTINGS
export HADOOP_HOME=/home/hdoop/hadoop-3.2.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

Once you have edited and added the entries, save and exit, then
source the environment file using the following command:
hdoop@drdvenkat-VirtualBox:~$ source .bashrc
hdoop@drdvenkat-VirtualBox:~$ env | grep HADOOP
HADOOP_OPTS=-Djava.library.path=/home/hdoop/hadoop-3.2.1/lib/native
HADOOP_INSTALL=/home/hdoop/hadoop-3.2.1
HADOOP_MAPRED_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_COMMON_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_HDFS_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_COMMON_LIB_NATIVE_DIR=/home/hdoop/hadoop-3.2.1/lib/native
hdoop@drdvenkat-VirtualBox:~$
Edit hadoop-env.sh File
The hadoop-env.sh file serves as a master file to configure YARN, HDFS, MapReduce, and
Hadoop-related project settings. When setting up a single node Hadoop cluster, you need to
define which Java implementation is to be utilized. Use the previously created
$HADOOP_HOME variable to access the hadoop-env.sh file:
vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Uncomment the $JAVA_HOME variable (i.e., remove the # sign) and add the full path to the OpenJDK installation on your system:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Steps are listed below:
vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
# export JAVA_HOME=
# New entry for JAVA PATH is added by Dr D VENKAT
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
You can find the OpenJDK directory on your system with the following command:
readlink -f /usr/bin/javac
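On a typical Ubuntu 20.04 install of openjdk-8-jdk the output looks like the following (your path may differ); dropping the trailing /bin/javac gives the JAVA_HOME value used above:

$:readlink -f /usr/bin/javac
/usr/lib/jvm/java-8-openjdk-amd64/bin/javac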
Edit core-site.xml File
YOU NEED TO MAKE SURE THAT YOU CREATE A DIRECTORY FOR TEMPORARY DATA as
below:

mkdir /home/hdoop/tmpdata

The core-site.xml file defines HDFS and Hadoop core properties.


To set up Hadoop in a pseudo-distributed mode, you need to specify the URL for your
NameNode, and the temporary directory Hadoop uses for the map and reduce process.
vi $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following configuration to override the default values
for the temporary directory and add your HDFS URL to replace
the default local file system setting:
<!-- Put site-specific property overrides in this file. -->
<!-- Dr D Venkat Setup for hadoop site configuration... -->

<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdoop/tmpdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
Edit hdfs-site.xml File
YOU HAVE TO CREATE TWO DIRECTORIES TO HOLD THE NAMENODE AND DATANODE
DATA:

$:mkdir -p /home/hdoop/dfsdata/namenode

$:mkdir -p /home/hdoop/dfsdata/datanode

The properties in the hdfs-site.xml file govern the location for storing node metadata, fsimage
file, and edit log file. Configure the file by defining the NameNode and DataNode storage
directories.
Additionally, the default dfs.replication value of 3 needs to be changed to 1 to match
the single node setup.
Use the following command to open the hdfs-site.xml file for editing:
$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following configuration to the file and, if needed,
adjust the NameNode and DataNode directories to your custom
locations:
$: vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<!-- ENTRIES ADDED BY Dr D VENKAT. -->
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hdoop/dfsdata/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hdoop/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
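Before moving on, it is worth a quick sanity check (an extra step, not in the original instructions) that the directories referenced in core-site.xml and hdfs-site.xml exist and are owned by the hdoop user:

$:ls -ld /home/hdoop/tmpdata /home/hdoop/dfsdata/namenode /home/hdoop/dfsdata/datanode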
Edit mapred-site.xml File
Use the following command to access the mapred-site.xml file and define MapReduce
values:
vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
<!-- Entries Added by Dr D VENKAT for MAP REDUCE to use YARN scheduler -->

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

</configuration>
Edit yarn-site.xml File
The yarn-site.xml file is used to define settings relevant to YARN. It contains configurations
for the Node Manager, Resource Manager, Containers, and Application Master.
Open the yarn-site.xml file in a text editor:
vi $HADOOP_HOME/etc/hadoop/yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->


<!-- Added by Dr D VENKAT -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>

<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>

</configuration>
##########SETTING UP THE HADOOP SINGLE NODE CLUSTER######################
STEP-14: Format HDFS NameNode
It is important to format the NameNode before starting Hadoop services for the first time:
hdfs namenode -format

$:which hdfs
/home/hdoop/hadoop-3.2.1/bin/hdfs

$:which hadoop
/home/hdoop/hadoop-3.2.1/bin/hadoop

$:hdfs namenode -format

…………………………
SHUTDOWN_MSG: Shutting down NameNode at drdvenkat-VirtualBox/127.0.1.1
************************************************************/

If you see the above message, you are on the right track.

If you see any error, do not proceed; search for the error message online and resolve it first.
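One extra way to confirm the format succeeded (an optional check, assuming the NameNode directory configured in hdfs-site.xml above) is to look inside the namenode directory, which should now contain a current/ folder with VERSION and fsimage files:

$:ls /home/hdoop/dfsdata/namenode/current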
Step-15: Start Hadoop Cluster
Navigate to the hadoop-3.2.1/sbin directory and execute the following commands to start the
NameNode and DataNode:
$:source .bashrc
hdoop@drdvenkat-VirtualBox:~$ PS1=$:
$:env | grep hadoop
YARN_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_OPTS=-Djava.library.path=/home/hdoop/hadoop-3.2.1/lib/native
HADOOP_INSTALL=/home/hdoop/hadoop-3.2.1
HADOOP_MAPRED_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_COMMON_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_HDFS_HOME=/home/hdoop/hadoop-3.2.1
HADOOP_COMMON_LIB_NATIVE_DIR=/home/hdoop/hadoop-3.2.1/lib/native
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/
games:/usr/local/games:/snap/bin:/home/hdoop/hadoop-3.2.1/sbin:/home/
hdoop/hadoop-3.2.1/bin
$:cd $HADOOP_HOME
$:cd sbin
$:./start-dfs.sh

<<this is what you will see when you execute the above command>>


Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [drdvenkat-VirtualBox]
drdvenkat-VirtualBox: Warning: Permanently added 'drdvenkat-virtualbox'
(ECDSA) to the list of known hosts.
2023-01-24 23:46:38,318 WARN util.NativeCodeLoader: Unable to load native-
hadoop library for your platform... using builtin-java classes where
applicable
$:
Step-16: Start YARN Resource Manager
Once the NameNode, DataNodes, and secondary NameNode are up
and running, start the YARN resource manager and node managers by
typing:
./start-yarn.sh

This is what you see when you execute it:


Starting resourcemanager
Starting nodemanagers

Step-17: Check that everything is fine


CHECK WHETHER the NameNode, DataNode and YARN resource/node managers are running using the
jps command:
$:jps

If you see all of the below, you are on the right track:

12882 NodeManager
12499 SecondaryNameNode
12757 ResourceManager
13225 Jps
12330 DataNode
12186 NameNode
$:

Alternatively, you can use stop-all.sh or start-all.sh, as in the sketch below.
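A minimal sketch of using those scripts (they live in $HADOOP_HOME/sbin, which is already on the PATH thanks to the .bashrc entries above; Hadoop 3.x prints a deprecation warning suggesting start-dfs.sh/start-yarn.sh instead, which is harmless here):

$:stop-all.sh     # stops the YARN and HDFS daemons
$:start-all.sh    # starts the HDFS and YARN daemons again
$:jps             # confirm NameNode, DataNode, ResourceManager, NodeManager are back up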


Step-18: Access Hadoop Environment from Web Browser/UI
Use your preferred browser and navigate to your localhost URL or IP. The default port
number 9870 gives you access to the Hadoop NameNode UI:
http://localhost:9870

The default port 9864 is used to access individual DataNodes directly from your browser:
http://localhost:9864

The YARN Resource Manager is accessible on port 8088:


http://localhost:8088
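If a browser is not handy inside the VM, a quick terminal check (a simple sketch; it only confirms that the web UIs respond on their ports) works as well:

$:wget -q -O - http://localhost:9870 | head -5
$:wget -q -O - http://localhost:8088 | head -5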

SETTING UP AND RUNNING WORD COUNT PROGRAM USING MAP REDUCE


Step-19:
$:pwd
/home/hdoop
$:mkdir data_source
$:cd data_source
$:vi data.txt
big data is the latest trend
one of the trend in 2023 is big data
big data is every where
all companies uses big data
as trend is towards big data systems people tend to move towards the trend
data data data is the key to success
big data helps to achieve the faster and distributed processing of data
distributed applications are the way to move forward
~
~
:wq!
$:cat data.txt
big data is the latest trend
one of the trend in 2023 is big data
big data is every where
all companies uses big data
as trend is towards big data systems people tend to move towards the trend
data data data is the key to success
big data helps to achieve the faster and distributed processing of data
distributed applications are the way to move forward
$:
Step-20: EXECUTION OF THE MAP REDUCE PROGRAM TO CHECK THE WORD
COUNT FOR THE GIVEN DATA SOURCE FILE
$:pwd
/home/hdoop/hadoop-3.2.1/share/hadoop/mapreduce
$:ls -lt *example*
-rw-r--r-- 1 hdoop hdoop 316534 Sep 10 2019 hadoop-mapreduce-examples-3.2.1.jar
$:hadoop dfs -mkdir -p /usr/local/hadoop/input
WARNING: Use of this script to execute dfs is deprecated.
WARNING: Attempting to execute replacement "hdfs dfs" instead.

2023-01-25 00:52:17,134 WARN util.NativeCodeLoader: Unable to load native-hadoop


library for your platform... using builtin-java classes where applicable
$:cd
$:cd data_source
$:ls -x
data.txt
$:pwd
/home/hdoop/data_source
$:hadoop dfs -copyFromLocal /home/hdoop/data_source/data.txt
/usr/local/hadoop/input
WARNING: Use of this script to execute dfs is deprecated.
WARNING: Attempting to execute replacement "hdfs dfs" instead.
2023-01-25 00:53:42,244 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
2023-01-25 00:53:43,941 INFO sasl.SaslDataTransferClient: SASL encryption trust check:
localHostTrusted = false, remoteHostTrusted = false
$:
$:cd $HADOOP_HOME/share/hadoop/mapreduce
$:hadoop jar hadoop-mapreduce-examples-3.2.1.jar wordcount
/usr/local/hadoop/input/data.txt /usr/local/hadoop/output
.........
2023-01-25 00:57:05,813 INFO mapreduce.Job: The url to track the job:
http://drdvenkat-VirtualBox:8088/proxy/application_1674584287946_0001/
2023-01-25 00:58:25,870 INFO mapreduce.Job: map 100% reduce 0%
2023-01-25 00:58:54,862 INFO mapreduce.Job: map 100% reduce 100%
2023-01-25 00:58:57,896 INFO mapreduce.Job: Job job_1674584287946_0001
completed successfully
2023-01-25 00:58:58,145 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=413
FILE: Number of bytes written=452215
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=475
HDFS: Number of bytes written=272
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=19803
Total time spent by all reduces in occupied slots (ms)=20390
Total time spent by all map tasks (ms)=19803
Total time spent by all reduce tasks (ms)=20390
Total vcore-milliseconds taken by all map tasks=19803
Total vcore-milliseconds taken by all reduce tasks=20390
Total megabyte-milliseconds taken by all map tasks=20278272
Total megabyte-milliseconds taken by all reduce tasks=20879360
Map-Reduce Framework
Map input records=8
Map output records=67
Map output bytes=623
Map output materialized bytes=413
Input split bytes=118
Combine input records=67
Combine output records=34
Reduce input groups=34
Reduce shuffle bytes=413
Reduce input records=34
Reduce output records=34
Spilled Records=68
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=225
CPU time spent (ms)=1350
Physical memory (bytes) snapshot=323899392
File Input Format Counters
Bytes Read=357
File Output Format Counters
Bytes Written=272
Here the /usr/local/hadoop/output folder will be created automatically.
Check the final result of the Word Count program for the data source we
have given. To view the output, use this:
$:hadoop dfs -cat /usr/local/hadoop/output/part-r-00000
WARNING: Use of this script to execute dfs is deprecated.
WARNING: Attempting to execute replacement "hdfs dfs" instead.

2023-01-25 01:01:43,015 WARN util.NativeCodeLoader: Unable to load native-


hadoop library for your platform... using builtin-java classes where
applicable
2023-01-25 01:01:44,508 INFO sasl.SaslDataTransferClient: SASL encryption
trust check: localHostTrusted = false, remoteHostTrusted = false
2023 1
achieve 1
all 1
and 1
applications 1
are 1
as 1
big 6
companies 1
data 10
distributed 2
every 1
faster 1
forward 1
helps 1
in 1
is 5
key 1
latest 1
move 2
of 2
one 1
people 1
processing 1
success 1
systems 1
tend 1
the 6
to 4
towards 2
trend 4
uses 1
way 1
where 1
$:
We can check the status of the submitted/executed jobs at localhost:8088, or from the command line as shown below.
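A command-line alternative (a small sketch; yarn application -list is part of the standard YARN CLI shipped with Hadoop) shows the finished applications, including the word count job above:

$:yarn application -list -appStates FINISHED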

HADOOP / BIG DATA SYSTEM MONITORING


FOLLOW THE INSTRUCTIONS….. WHICH WILL BE GIVEN
SEPARATELY…

FOR HIVE

FOR MAHOUT / APACHE ML…….LIBRARIES

FOR APACHE SPARK

FOR KAFKA…..
