Docker
1. docker-compose.yaml
The docker-compose.yaml file sets up a Hadoop cluster with four key services: a NameNode, a DataNode, a ResourceManager, and a NodeManager. Each service uses the apache/hadoop:3 image, with configuration files mounted for proper setup and operation.
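A minimal sketch of such a compose file is shown below. The service names, commands, and mount paths here are illustrative assumptions (only the image name and the exposed ports 9870, 9000, and 8088 are taken from this report), not the exact file used:

```yaml
services:
  namenode:
    image: apache/hadoop:3
    command: ["hdfs", "namenode"]
    ports:
      - "9870:9870"   # NameNode web UI
      - "9000:9000"   # HDFS RPC endpoint (fs.defaultFS)
    volumes:
      - ./core-site.xml:/opt/hadoop/etc/hadoop/core-site.xml
  datanode:
    image: apache/hadoop:3
    command: ["hdfs", "datanode"]
  resourcemanager:
    image: apache/hadoop:3
    command: ["yarn", "resourcemanager"]
    ports:
      - "8088:8088"   # ResourceManager web UI
  nodemanager:
    image: apache/hadoop:3
    command: ["yarn", "nodemanager"]
```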
2. core-site.xml
fs.defaultFS: Configures the default file system as HDFS with the address
hdfs://namenode:9000. This points to the NameNode service running on port 9000.
dfs.datanode.data.dir: Defines where the DataNodes store their block data (/tmp/hadoop-root/dfs/data).
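For reference, a fragment matching the two properties described above (dfs.datanode.data.dir is conventionally placed in hdfs-site.xml, but it is reproduced here as this report describes it):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/tmp/hadoop-root/dfs/data</value>
  </property>
</configuration>
```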
3. mapred-site.xml
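The report does not reproduce this file's contents. A typical mapred-site.xml for a YARN-backed cluster looks like the following (a standard configuration, assumed rather than taken from the actual deployment): it directs MapReduce jobs to run on YARN.

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```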
4. yarn-site.xml
This file configures the YARN (Yet Another Resource Negotiator) settings.
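A sketch of a typical yarn-site.xml for this kind of setup (property names are standard; the hostname value assumes a compose service called resourcemanager, which is an assumption, not a detail from the report):

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>resourcemanager</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```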
After running docker-compose up, the Hadoop cluster was successfully deployed with four services:
NameNode, DataNode, ResourceManager, and NodeManager. Configuration files were mounted, and
ports were exposed for the NameNode (9870) and ResourceManager (8088) web interfaces. The cluster
is now fully operational for distributed data storage and processing.
When running docker-compose up, Docker pulls the apache/hadoop:3 image, creates the NameNode,
DataNode, ResourceManager, and NodeManager containers, and starts them. The logs display real-
time service initialization. The web interfaces are accessible at:
NameNode: http://localhost:9870
ResourceManager: http://localhost:8088
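A quick way to confirm both web interfaces respond from the host (a sketch using curl; returns HTTP 200 when the UIs are up):

```shell
# Check the NameNode and ResourceManager web UIs
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088
```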
Command: docker ps
As part of the Hadoop deployment, the docker ps command was used to check the status of the running
Docker containers. This command revealed that the Hadoop components are successfully running across
multiple containers. These components are essential for the distributed file system (HDFS) and resource
management functionalities of the Hadoop ecosystem.
Tests:
To verify the health and status of the Hadoop containers, I used the following command to check if all
containers are up and running: docker-compose ps
As shown below, all four key Hadoop components are listed with the status "Up," confirming that they
are running correctly:
hadoop-datanode-1: the DataNode, which handles block storage in HDFS, is running and operating in conjunction with the NameNode.
This output confirms that all services required for the Hadoop environment (NameNode, DataNode,
ResourceManager, and NodeManager) are running as expected.
Command: To interact with the Hadoop container, I executed the following command to access the
running NameNode container: docker exec -it b260b8e4e5ec bash
This command allows me to open an interactive shell inside the hadoop-namenode-1 container
(container ID b260b8e4e5ec).
Once inside the container, I used the following commands to interact with HDFS:
This command creates a new directory called /test1 in the Hadoop distributed file system
(HDFS).
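A sketch of the commands in question, using the standard hdfs dfs client inside the container:

```shell
# Create a new directory in HDFS, then list the root to verify it
hdfs dfs -mkdir /test1
hdfs dfs -ls /
```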
This output verifies that the HDFS is functional, as the directory /test1 was created successfully and is
visible when listing the directory contents.
This test was conducted to ensure that YARN is properly configured and able to execute distributed jobs in the Hadoop environment. To verify this, I ran the Hadoop MapReduce example job that estimates the value of Pi.
Command: I executed the following command to run a sample MapReduce job using YARN: yarn jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 16 1000 (here 16 is the number of map tasks and 1000 the number of samples per map).
The job was executed successfully, and its output reported the estimated value of Pi.
This section outlines where the NameNode stores its file system metadata (fsimage and edit logs) and
where the DataNode stores the blocks of data in the Hadoop Distributed File System (HDFS). These
locations are configured in Hadoop’s configuration files.
Configuration File Used: The following configuration file specifies the storage directories for both the
NameNode and DataNode:
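The file itself is not reproduced in this report; a typical fragment specifying these directories uses the standard property names below, with the values taken from the paths observed in this deployment:

```xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/tmp/hadoop-root/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/tmp/hadoop-root/dfs/data</value>
  </property>
</configuration>
```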
Verification: By accessing the NameNode and DataNode configuration directories, the following was
observed:
NameNode Metadata (fsimage and edit logs): The fsimage and edit log files were found in the configured directory /tmp/hadoop-root/dfs/name. These files are critical for recovering the HDFS state.
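The inspection above can be reproduced inside the NameNode container along these lines (a sketch; the current subdirectory is where Hadoop keeps the active metadata files):

```shell
# List the NameNode metadata directory
ls /tmp/hadoop-root/dfs/name/current
# Expect files such as fsimage_*, edits_*, seen_txid and VERSION
```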