Docker

1. docker-compose.yaml

The docker-compose.yaml file sets up a Hadoop cluster with four key services:

1. NameNode: Manages HDFS metadata (port 9870).

2. DataNode: Stores data blocks.

3. ResourceManager: Allocates resources for distributed applications (port 8088).


4. NodeManager: Manages containers on each node.

Each service uses the apache/hadoop:3 image, with mounted configuration files for proper setup and
operation.
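For reference, a minimal sketch of such a compose file is shown below. It is reconstructed from the description above; the command entries and the mount path /opt/hadoop/etc/hadoop are assumptions based on the apache/hadoop:3 image layout and should be adapted to the actual file.

    # Sketch of docker-compose.yaml; commands and mount paths are assumptions.
    version: "3"
    services:
      namenode:
        image: apache/hadoop:3
        command: ["hdfs", "namenode"]
        ports:
          - "9870:9870"                 # NameNode web UI
        volumes:
          - ./core-site.xml:/opt/hadoop/etc/hadoop/core-site.xml
          - ./hdfs-site.xml:/opt/hadoop/etc/hadoop/hdfs-site.xml
      datanode:
        image: apache/hadoop:3
        command: ["hdfs", "datanode"]
        volumes:
          - ./core-site.xml:/opt/hadoop/etc/hadoop/core-site.xml
          - ./hdfs-site.xml:/opt/hadoop/etc/hadoop/hdfs-site.xml
      resourcemanager:
        image: apache/hadoop:3
        command: ["yarn", "resourcemanager"]
        ports:
          - "8088:8088"                 # ResourceManager web UI
        volumes:
          - ./yarn-site.xml:/opt/hadoop/etc/hadoop/yarn-site.xml
          - ./mapred-site.xml:/opt/hadoop/etc/hadoop/mapred-site.xml
      nodemanager:
        image: apache/hadoop:3
        command: ["yarn", "nodemanager"]
        volumes:
          - ./yarn-site.xml:/opt/hadoop/etc/hadoop/yarn-site.xml
          - ./mapred-site.xml:/opt/hadoop/etc/hadoop/mapred-site.xml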

2. core-site.xml

This XML configuration file specifies Hadoop's core settings.

• fs.defaultFS: Configures the default file system as HDFS with the address hdfs://namenode:9000. This points to the NameNode service running on port 9000.
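Reconstructed from that description, the property in core-site.xml looks like this:

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode:9000</value>
      </property>
    </configuration>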

3. hdfs-site.xml

This configuration file defines the HDFS storage directories.

• dfs.namenode.name.dir: Specifies the directory for the NameNode’s metadata storage (/tmp/hadoop-root/dfs/name).

• dfs.datanode.data.dir: Defines where the DataNodes will store their block data (/tmp/hadoop-root/dfs/data).
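Reconstructed from the two properties above, the file would look like:

    <configuration>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/tmp/hadoop-root/dfs/name</value>  <!-- NameNode metadata -->
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/tmp/hadoop-root/dfs/data</value>  <!-- DataNode block storage -->
      </property>
    </configuration>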
4. mapred-site.xml

This file configures the MapReduce framework.

• mapreduce.framework.name: Specifies the execution framework as YARN, which is Hadoop’s resource management and job scheduling framework.
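A sketch of the property as it would appear in mapred-site.xml:

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>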

5. yarn-site.xml

This file configures the YARN (Yet Another Resource Negotiator) settings.

• yarn.resourcemanager.hostname: Sets the hostname of the YARN ResourceManager to resourcemanager, indicating where YARN will manage resources and job execution in the cluster.
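A sketch of the corresponding yarn-site.xml entry:

    <configuration>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>resourcemanager</value>
      </property>
    </configuration>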
Command: docker-compose up

After running docker-compose up, the Hadoop cluster was successfully deployed with four services:
NameNode, DataNode, ResourceManager, and NodeManager. Configuration files were mounted, and
ports were exposed for the NameNode (9870) and ResourceManager (8088) web interfaces. The cluster
is now fully operational for distributed data storage and processing.

When running docker-compose up, Docker pulls the apache/hadoop:3 image, creates the NameNode,
DataNode, ResourceManager, and NodeManager containers, and starts them. The logs display real-
time service initialization. The web interfaces are accessible at:

• NameNode: http://localhost:9870

• ResourceManager: http://localhost:8088
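As an additional quick check (not part of the original run), both UIs can also be probed from the host with curl; each command should print HTTP status 200 once the service is up:

    curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870
    curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088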

Command: docker ps
As part of the Hadoop deployment, the docker ps command was used to check the status of the running
Docker containers. This command revealed that the Hadoop components are successfully running across
multiple containers. These components are essential for the distributed file system (HDFS) and resource
management functionalities of the Hadoop ecosystem.

Tests:

First test: container health check

To verify the health and status of the Hadoop containers, I used the following command to check if all
containers are up and running: docker-compose ps

As shown below, all four key Hadoop components are listed with the status "Up," confirming that they
are running correctly:

• hadoop-namenode-1: NameNode, responsible for managing the HDFS namespace and regulating access to files, is running and accessible on port 9870.

• hadoop-datanode-1: DataNode, which handles the storage on HDFS, is running and operating in conjunction with the NameNode.

• hadoop-resourcemanager-1: ResourceManager, responsible for resource allocation across the cluster, is running properly.

• hadoop-nodemanager-1: NodeManager, which manages resources and tasks on each node, is up and functioning.

This output confirms that all services required for the Hadoop environment (NameNode, DataNode,
ResourceManager, and NodeManager) are running as expected.
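The same check can be narrowed down with docker ps output formatting, which is useful when other, non-Hadoop containers are also running on the host:

    # Show only the name, status, and published ports of each container:
    docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"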

Second test: Hadoop functionality

Command: To interact with the Hadoop container, I executed the following command to access the
running NameNode container: docker exec -it b260b8e4e5ec bash
This command allows me to open an interactive shell inside the hadoop-namenode-1 container
(container ID b260b8e4e5ec).

Once inside the container, I used the following commands to interact with HDFS:

1. Create a new directory: hadoop fs -mkdir /test1

This command creates a new directory called /test1 in the Hadoop distributed file system
(HDFS).

2. List the contents of HDFS: hadoop fs -ls /

This command lists the directories and files in the root directory (/) of HDFS. The expected output should include the newly created /test1 directory, confirming that HDFS is functioning properly.

This output verifies that HDFS is functional: the directory /test1 was created successfully and is visible when listing the directory contents.
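For reproducibility, the whole test can be replayed as the shell session below; the container name hadoop-namenode-1 is stable across runs of this compose file, unlike the container ID:

    # From the host: open an interactive shell in the NameNode container.
    docker exec -it hadoop-namenode-1 bash

    # Inside the container: create the test directory and list the HDFS root.
    hadoop fs -mkdir /test1
    hadoop fs -ls /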

Third test: YARN Job Execution

This test was conducted to ensure that the YARN (Yet Another Resource Negotiator) system is properly
configured and able to execute distributed jobs in the Hadoop environment. To verify this, I ran the
Hadoop MapReduce example job that estimates the value of Pi.

Command: I executed the following command to run a sample MapReduce job using YARN: yarn jar
/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 16 1000 (16 map tasks, each drawing 1,000 samples). The job executed successfully and printed an estimated value of Pi at the end of its output, confirming that YARN can schedule and run distributed jobs.

Question 2: Hadoop NameNode and DataNode Directories

This section outlines where the NameNode stores its file system metadata (fsimage and edit logs) and
where the DataNode stores the blocks of data in the Hadoop Distributed File System (HDFS). These
locations are configured in Hadoop’s configuration files.
Configuration File Used: The hdfs-site.xml file (see section 3 above) specifies the storage directories for both the NameNode and the DataNode.

Based on the configuration, the metadata is stored in: /tmp/hadoop-root/dfs/name

The data blocks are stored in the following directory: /tmp/hadoop-root/dfs/data

Verification: By inspecting the NameNode and DataNode storage directories, the following was observed:

NameNode Metadata (fsimage and edit logs): The fsimage and edits files were found in the configured directory /tmp/hadoop-root/dfs/name. These files are critical for recovering the HDFS state.
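A sketch of how this can be verified from inside the NameNode container; Hadoop keeps the live metadata in a current/ subdirectory of dfs.namenode.name.dir:

    # Inside the hadoop-namenode-1 container:
    ls /tmp/hadoop-root/dfs/name/current
    # Expected entries include fsimage_*, edits_* / edits_inprogress_*, and VERSION.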
