Hadoop Installation
Step 8: Before using the Hadoop environment, the required environment variables must be set up by editing the ".bashrc" file of the user "hduser".
Open the .bashrc file using the command
sudo gedit /home/hduser/.bashrc
Append the following lines to the file and save it
# Set JAVA_HOME (adjust the path to match the JDK installed on your machine) and HADOOP_HOME
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=/usr/local/hadoop
# Add Hadoop bin and sbin directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
The above lines configure the Hadoop environment so that the user "hduser" can run Hadoop commands from any directory.
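The exports take effect only in newly opened terminals; to apply them to the current session, reload the file and verify the set-up (a quick check, assuming Hadoop was extracted to /usr/local/hadoop as above)
source /home/hduser/.bashrc
echo $HADOOP_HOME    # should print /usr/local/hadoop
hadoop version       # should print the version of the installed Hadoop distribution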
Step 9: Move to the Hadoop configuration folder, which contains the configuration files for HDFS, the general Hadoop environment etc.
cd /usr/local/hadoop/etc/hadoop
Now configure the hadoop-env.sh file, where the path to the Java installation is to be specified
sudo gedit hadoop-env.sh
Append the following line to the file and save it
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
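If you are unsure of the exact JDK path on your system, the following command prints the location of the active Java binary; strip the trailing /jre/bin/java (or /bin/java) from its output to obtain the value for JAVA_HOME
readlink -f $(which java)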
Step 9.A. Now configure the core-site.xml file. Open it, append the following properties and save the file.
sudo gedit core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
  </property>
</configuration>
The above configuration file specifies the default file system and the port on which it listens (here, HDFS on localhost:54310), as well as the base location in which Hadoop stores its working data (hadoop.tmp.dir). On Hadoop 2.x and later the preferred name for the first property is fs.defaultFS; the older fs.default.name is deprecated but still accepted.
Remember that every XML tag must be correctly opened and closed. Also note that several default port numbers changed from Hadoop 2.x to 3.x in order to mitigate port conflict issues.
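The directory named in hadoop.tmp.dir must exist and be owned by the Hadoop user before HDFS is started. A minimal sketch, assuming "hduser" belongs to a group named "hadoop" (adjust the group to your own set-up):
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp    # the group "hadoop" is an assumption
sudo chmod 750 /app/hadoop/tmp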
Step 9.B. Now configure the hdfs-site.xml file. Open it, append the following properties and save the file.
sudo gedit hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>104857600</value>
  </property>
</configuration>
The above configuration file specifies the replication factor you intend to use for your Hadoop set-up, the locations of the NameNode and DataNode storage folders, and the HDFS block size.
Note: The DataNode is where the data blocks are stored and where the actual computations run; the NameNode stores only the file-system metadata and tracks the DataNodes through periodic heartbeats and block reports. Specifying the block size is not mandatory: for Hadoop 1.x the default is 64 MB, whereas for 2.x and higher it is 128 MB. When it is specified explicitly, the value is given in bytes, as in the XML file above (104857600 bytes = 100 × 1024 × 1024 bytes = 100 MB). The right block size depends purely on the application. On Hadoop 2.x and later the preferred property name is dfs.blocksize; dfs.block.size is deprecated but still accepted.
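The NameNode and DataNode folders named above must also be created and handed over to the Hadoop user before the file system is formatted. A short sketch, again assuming the group "hadoop":
sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
sudo chown -R hduser:hadoop /usr/local/hadoop_store    # the group "hadoop" is an assumption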
Step 9.C. Configure the yarn-site.xml file. Open the file, add the following properties and save it.
sudo gedit yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
</configuration>
Note: YARN stands for "Yet Another Resource Negotiator", which takes care of resource management and job scheduling. The ResourceManager has two main components, namely the Scheduler and the ApplicationsManager. The task of the Scheduler is to allocate resources to the various running applications, while the ApplicationsManager accepts job submissions and negotiates containers.
The properties specified in the above XML file give the host on which your Hadoop (the YARN ResourceManager) works and the auxiliary service, the MapReduce shuffle, that the NodeManagers are to provide.
Step 9.D. Now configure the mapred-site.xml file (in Hadoop 2.x, first create it from the template with cp mapred-site.xml.template mapred-site.xml if it does not exist yet). Open the file, add the following properties and save it.
sudo gedit mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.admin.user.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
  </property>
</configuration>
The above file holds the configuration for MapReduce tasks and can be used to override the default values of MapReduce parameters. The configuration includes the MapReduce environment path for the current user and the name of the framework that handles the jobs (YARN).
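As a final sanity check, the XML syntax of each edited file can be verified before the daemons are started. A quick sketch using xmllint (provided by the libxml2-utils package on Ubuntu; install it with apt-get if it is missing):
cd /usr/local/hadoop/etc/hadoop
xmllint --noout core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml    # prints nothing when every file is well-formed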