Data Visualization Lab


Practical: 1

Aim: Configure a Hadoop cluster in pseudo-distributed mode and run basic Hadoop
commands.

Installation of Hadoop 3.3.2 on Ubuntu 18.04 LTS

1. Installing Java

$ sudo apt update


$ sudo apt install openjdk-8-jdk openjdk-8-jre
$ java -version

Set JAVA_HOME in .bashrc

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:/usr/lib/jvm/java-8-openjdk-amd64/bin

Apply the .bashrc changes to the Ubuntu environment either by rebooting the system or
by running:

$ source ~/.bashrc
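A quick check that the variable is set in the current shell:

$ echo $JAVA_HOME
/usr/lib/jvm/java-8-openjdk-amd64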

2. Adding a dedicated Hadoop user

$ sudo addgroup hadoop


$ sudo adduser --ingroup hadoop hduser

3. Adding hduser to the sudoers file

$ sudo visudo

Add the following line to the /etc/sudoers.tmp file:

hduser ALL=(ALL:ALL) ALL

4. Now switch to hduser

$ su - hduser
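As a quick sanity check, hduser should now be able to run commands via sudo:

$ sudo whoami
root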

5. Setting up SSH

Hadoop services such as the ResourceManager and NodeManager use SSH to share the
status of nodes between slaves and the master, and between masters.

$ sudo apt-get install openssh-server openssh-client

After installing SSH, generate an SSH key pair and append the public key to
~/.ssh/authorized_keys. Generate keys for secure communication:
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
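Passwordless SSH to localhost should now work; a quick test (accept the host-key
prompt on the first connect):

$ ssh localhost
$ exit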

6. Download the Hadoop 3.3.2 tar file and extract it into the /usr/local/hadoop folder.
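If the archive is not already downloaded, it can be fetched from the Apache archive
(the URL below assumes the standard Apache dist layout):

$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.2/hadoop-3.3.2.tar.gz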

$ sudo tar xvzf hadoop-3.3.2.tar.gz

$ sudo mv hadoop-3.3.2 /usr/local/hadoop

7. Change ownership to hduser and the hadoop group, and grant them full permissions:

$ sudo chown -R hduser:hadoop /usr/local/hadoop
$ sudo chmod -R 777 /usr/local/hadoop

8. Hadoop Setup

This setup, also called pseudo-distributed mode, allows each Hadoop daemon to run as
a single Java process. A Hadoop environment is configured by editing a set of
configuration files:

bashrc, hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml

8.1 bashrc

$ sudo gedit ~/.bashrc


Add the following lines at the end:

#Hadoop Related Options


export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

$ source ~/.bashrc
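With the PATH updated, the Hadoop binaries should now be visible; a quick check:

$ hadoop version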

8.2 hadoop-env.sh

Let's change the working directory to the Hadoop configuration location:

$ cd /usr/local/hadoop/etc/hadoop/

$ sudo gedit hadoop-env.sh


Add this line:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

8.3 yarn-site.xml

$ sudo gedit yarn-site.xml


Add the following properties inside the <configuration> element:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

8.4 hdfs-site.xml

$ sudo gedit hdfs-site.xml


Add the following properties inside the <configuration> element:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop/yarn_data/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop/yarn_data/hdfs/datanode</value>
</property>

8.5 core-site.xml

$ sudo gedit core-site.xml


Add the following properties inside the <configuration> element:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/hadoop/tmp</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>

8.6 mapred-site.xml

$ sudo gedit mapred-site.xml


Add the following properties inside the <configuration> element:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>localhost:10020</value>
</property>

9. Create the temp directory and the directories for the NameNode and DataNode

$ sudo mkdir -p /home/hduser/hadoop/tmp


$ sudo chown -R hduser:hadoop /home/hduser/hadoop/tmp

$ sudo chmod -R 777 /home/hduser/hadoop/tmp

$ sudo mkdir -p /usr/local/hadoop/yarn_data/hdfs/namenode


$ sudo mkdir -p /usr/local/hadoop/yarn_data/hdfs/datanode
$ sudo chmod -R 777 /usr/local/hadoop/yarn_data/hdfs/namenode
$ sudo chmod -R 777 /usr/local/hadoop/yarn_data/hdfs/datanode
$ sudo chown -R hduser:hadoop /usr/local/hadoop/yarn_data/hdfs/namenode
$ sudo chown -R hduser:hadoop /usr/local/hadoop/yarn_data/hdfs/datanode

10. Format the Hadoop NameNode to get a fresh start


$ hdfs namenode -format

Start all Hadoop services by executing the following commands one by one:

$ start-dfs.sh
$ start-yarn.sh

or
$ start-all.sh

Type this simple command to check if all the daemons are active and running as Java
processes:
$ jps

The following output is expected if all went well:

6960 SecondaryNameNode
7380 NodeManager
6632 NameNode
11066 Jps
7244 ResourceManager
6766 DataNode

Access Hadoop UI from Browser

The default port number 9870 gives you access to the Hadoop NameNode UI:

http://localhost:9870

The NameNode user interface provides a comprehensive overview of the entire cluster.

The default port 9864 is used to access individual DataNodes directly from your
browser:
http://localhost:9864
The YARN ResourceManager is accessible on port 8088:

http://localhost:8088
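With all daemons running, the basic Hadoop commands from the Aim can be tried. The
HDFS paths below are illustrative, and the examples jar path assumes a standard
Hadoop 3.3.2 layout:

$ hdfs dfs -mkdir -p /user/hduser
$ hdfs dfs -put /etc/hosts /user/hduser/
$ hdfs dfs -ls /user/hduser
$ hdfs dfs -cat /user/hduser/hosts
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.2.jar pi 2 10

When finished, stop the daemons:

$ stop-yarn.sh
$ stop-dfs.sh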
