Hadoop Installation

This document provides steps to install and configure Hadoop in standalone mode on a single machine. It covers:
1. Installing Java and setting it as the default version.
2. Creating a Hadoop user account and group.
3. Downloading and extracting Hadoop, moving it to the local file system, and setting permissions.
4. Configuring environment variables and files such as core-site.xml, hdfs-site.xml and yarn-site.xml to set ports, directories and other Hadoop properties.
5. Enabling passwordless SSH access between nodes for communication in distributed mode.


Minimum Requirements (Recommended)

RAM – More than 4GB


Better if you dual-boot your OS (even working on a virtual machine is fine)
Java 8 (don't go for higher versions for now, as some features may not work correctly with newer versions such as JDK 11)
Hadoop can run in one of three modes
1. Standalone mode
2. Pseudo-distributed mode
3. Fully distributed mode
Steps for installing Hadoop (including installation of Java)
Java Installation
Step 1: Update the system by using the following command
sudo apt-get update //sudo – superuser do (substitute user do)
Step 2: Use the following command to upgrade the packages that have already been installed
on your machine
sudo apt-get upgrade
Step 3: Visit the official Oracle site and download the Java 8 tar file (tar.gz file extension)
Extract the contents of the file by using the tar archive command as
tar -xzvf oracle-java8-installer.tar.gz [x-extract, v-verbose, z-gzip file,
f-file]
where oracle-java8-installer.tar.gz is the file downloaded
[OR]
Install Java by using the following command
sudo apt-get install oracle-java8-installer
If you have installed Java previously (or if you have multiple versions of Java on your
machine), use the following command to set the current Java version as the default
sudo apt-get install oracle-java8-set-default
Move the extracted Java folder to the JVM library directory
sudo mv <extracted-jdk-folder> /usr/lib/jvm/ [replace <extracted-jdk-folder> with the folder produced by the tar command above]
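Optional check (an addition, not part of the original steps; it assumes the standard Ubuntu update-alternatives tool): before moving on, confirm which Java the system will pick up.
java -version //should report a 1.8.x version
sudo update-alternatives --config java //lists the installed JVMs and lets you choose the default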
Creating user and installing HADOOP

Step 1: login as root


sudo su
To check the current user, you can type
whoami
Step 2: Add a Hadoop group
sudo addgroup hadoop //hadoop is the group name
Step 3: Create a user
sudo adduser hduser
Step 4: Add this user to the hadoop group created above
sudo adduser hduser hadoop
[OR]
sudo adduser --ingroup hadoop hduser
Step 5: Grant all privileges to the newly created user
Open a new terminal
Use the following command
sudo su root //login as root
sudo gedit /etc/sudoers //the sudoers (superuser privileges) file
A file gets opened; add the following lines
# User privilege specification
root    ALL=(ALL:ALL) ALL
hduser  ALL=(ALL:ALL) ALL
Save the file and close the newly opened terminal.
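A quick way to confirm the new account works is to switch to it and list its sudo rights (this check is an addition, not part of the original steps):
su - hduser
sudo -l //should show that hduser may run ALL commands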
Step 6: Hadoop core requires SSH for communication with slave nodes (especially in fully-distributed mode) without prompting for a password on every call, so install password-less SSH. In standalone and pseudo-distributed mode, SSH lets Hadoop core access the local host.
SSH : command used to connect to remote machines (client)
SSHD : Daemon that runs on the server and allows clients to connect to the server
sudo apt-get install ssh
To know about the existing SSH installed, one could use
which ssh
which sshd
Generate a password-less SSH key for the user created in the hadoop group
su hduser
ssh-keygen -t rsa -P ""
Note: In the above command, "P" is capital and there should be no space between the double quotes.
Add the newly created key to the list of authorized keys so that Hadoop can use SSH without
prompting for a password
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
To check the newly created SSH set-up, just type
ssh localhost
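If ssh localhost still asks for a password, the usual cause is overly open permissions on the key files, since sshd refuses keys it considers unsafe. Tightening them as below (an extra troubleshooting step, not in the original instructions) normally fixes it:
chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys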
Note: It is recommended to disable IPv6, as Hadoop does not support IPv6. To disable
IPv6, open the system control configuration file
sudo gedit /etc/sysctl.conf
In the opened file, type the following lines
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1
Reboot the system for the changes to be reflected
sudo reboot
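To confirm the change took effect (an optional check using standard Linux tools, not part of the original steps), either reload sysctl without rebooting or read the kernel flag after the reboot:
sudo sysctl -p //re-applies /etc/sysctl.conf immediately
cat /proc/sys/net/ipv6/conf/all/disable_ipv6 //prints 1 when IPv6 is disabled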
Step 7: Download Hadoop either from the official site (recommended) or by using the wget
command below
To download from the official site, visit http://hadoop.apache.org
[OR]
wget
http://mirrors.sonic.net/apache/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz
Extract the contents from the ".tar.gz" file using the command
tar -xzvf hadoop-3.2.0.tar.gz
Move Hadoop to /usr/local after switching to the hduser account
sudo su hduser
sudo mv hadoop-3.2.0 /usr/local/hadoop
Give ownership of the folder to the user
sudo chown -R hduser:hadoop /usr/local/hadoop
If you want to check the permissions, just use
ls -ltr /usr/local
The above will show the permissions for all the folders present in the local directory. You can
change the permissions if required using
sudo chmod 777 hadoop //777 grants all permissions to all users
{user, group and others}

Step 8: Before using Hadoop, set up its environment by editing the ".bashrc" file of the hduser account
Open the .bashrc file by using the command
sudo gedit /home/hduser/.bashrc
Append the following lines in the file opened
# Set JAVA_HOME and HADOOP_HOME
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=/usr/local/hadoop
# Add Hadoop bin and sbin directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
The above configures the Hadoop environment variables so that the user "hduser" can work with
Hadoop from any directory.
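To make the new variables take effect in the current terminal and confirm Hadoop is on the PATH (an optional verification, not in the original steps):
source ~/.bashrc //run as hduser
hadoop version //should print the Hadoop 3.2.0 release information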
Step 9: Move to the Hadoop configuration folder, which contains the other set-up files related to
HDFS, the general Hadoop environment, etc.
cd /usr/local/hadoop/etc/hadoop
Now, configure hadoop-env.sh file, where path for java is to be mentioned
sudo gedit hadoop-env.sh
Append the following line in the file opened, and save it
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
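If Java was installed by a different method, the exact JAVA_HOME path may differ; it can be looked up rather than guessed (this lookup is an extra hint, not part of the original steps):
readlink -f $(which java) //shows where the java binary really lives; set JAVA_HOME to the JDK directory above its bin folder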

Step 9.A. Now configure the core-site.xml file. Open core-site.xml, append the following
properties and save the file.
sudo gedit core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>
</configuration>

The above configuration specifies the default port on which the Hadoop file system listens and
the location in which Hadoop's temporary data should be stored.
Remember that XML tags must be correctly opened and closed. Default port numbers changed
between Hadoop 2.x and 3.x in order to mitigate some port-conflict issues.
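Two small points worth noting (additions to the original text): fs.default.name is a deprecated alias of fs.defaultFS in current Hadoop releases and still works, and the hadoop.tmp.dir directory must exist and be writable by hduser, which can be arranged as follows:
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp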

Step 9.B. Now configure the hdfs-site.xml file. Open hdfs-site.xml, append the following
properties and save the file.
sudo gedit hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
<property>
<name>dfs.block.size</name>
<value>104857600</value>
</property>
</configuration>

The above configuration specifies the replication factor you intend to use for your Hadoop
set-up, the locations of the NameNode and DataNode folders, and the block size.
Note: The DataNode is where the data blocks are stored and where computations run; the
NameNode keeps the file-system metadata and tracks the DataNodes through periodic
heartbeats. Specifying the block size is not mandatory: for Hadoop 1.x the default is
64 MB, whereas for 2.x and higher it is 128 MB. When specified, the value is given in
bytes, as above (104857600 bytes = 100 MB). The appropriate block size depends purely
on the application.
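The namenode and datanode directories referenced above are not created automatically under /usr/local, so they should be created and handed over to hduser before HDFS is used (these commands are an addition, based on the paths given in the file):
sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
sudo chown -R hduser:hadoop /usr/local/hadoop_store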

Step 9.C. Configure yarn-site.xml. Open the file, add the properties and save the file.
sudo gedit yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->


<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
</configuration>

Note: YARN stands for "Yet Another Resource Negotiator"; it takes care of resource
management and job scheduling. The ResourceManager has two main components, the
Scheduler and the ApplicationsManager. The Scheduler allocates resources to the various
running applications, while the ApplicationsManager accepts job submissions and
negotiates containers.

The properties specified in the above XML file set the host on which the ResourceManager runs
and the auxiliary (shuffle) service that NodeManagers must provide for MapReduce.

Step 9.D. Configure mapred-site.xml


Open mapred-site.xml file, append the following properties and save the file.
sudo gedit mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.admin.user.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
</property>

</configuration>

The above file holds the configuration for MapReduce tasks and is used to override the default
values of MapReduce parameters. The configuration includes the MapReduce environment path for
the current user and the name of the framework that handles the jobs (YARN).
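With the configuration files in place, the usual remaining steps are to format the NameNode once and then start the HDFS and YARN daemons. These commands are not part of the original text, but they are the standard Hadoop 3.x scripts shipped in $HADOOP_HOME/bin and $HADOOP_HOME/sbin (run them as hduser):
hdfs namenode -format //one-time initialisation of the namenode directory
start-dfs.sh //starts the NameNode, DataNode and SecondaryNameNode
start-yarn.sh //starts the ResourceManager and NodeManager
jps //lists the running Java daemons to verify everything is up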
