Hadoop

Hadoop is an open-source framework for storing and processing large amounts of data, primarily using HDFS for storage and YARN for resource management. It includes additional modules like Hive, Pig, and HBase for enhanced functionality and is widely used in big data applications. Key features include fault tolerance, high availability, and cost-effectiveness, although it is not suitable for small data quantities.


What is Hadoop?

Hadoop is an open-source software framework for storing large amounts of data and performing distributed computation. The framework is written primarily in Java, with some native code in C and some shell scripts.
Hadoop has two main components:
 HDFS (Hadoop Distributed File System): the storage component of Hadoop, which stores large amounts of data across multiple machines. It is designed to run on commodity hardware, which makes it cost-effective.
 YARN (Yet Another Resource Negotiator): the resource-management component of Hadoop, which manages the allocation of resources (such as CPU and memory) for processing the data stored in HDFS.
Hadoop also includes several modules that provide additional functionality, such as Hive (a SQL-like query language), Pig (a high-level platform for creating MapReduce programs), and HBase (a non-relational, distributed database).
Hadoop is commonly used in big data scenarios such as data warehousing, business intelligence, and machine learning.
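To make the processing model concrete, a classic MapReduce job such as word count can be sketched as a mapper and a reducer. The sketch below is illustrative Python in the style of a Hadoop Streaming job; the function names are our own for illustration, not part of any Hadoop API.

```python
import sys
from itertools import groupby

def map_words(lines):
    # Mapper: emit a (word, 1) pair for every word on every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_counts(pairs):
    # Reducer: Hadoop sorts map output by key between the map and
    # reduce phases; here we sort explicitly, then sum counts per word.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

if __name__ == "__main__":
    # Standalone run for illustration; under Hadoop Streaming the
    # mapper and reducer would be separate scripts reading stdin.
    for word, total in reduce_counts(map_words(sys.stdin)):
        print(f"{word}\t{total}")
```

In a real cluster, HDFS stores the input files and YARN schedules the map and reduce tasks across the machines holding the data.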

Features of Hadoop:
1. It is fault-tolerant.
2. It is highly available.
3. It is easy to program.
4. It offers huge, flexible storage.
5. It is low-cost.
Advantages of HDFS: it is inexpensive, stores data reliably and immutably, tolerates faults, scales well, is block-structured, can process large amounts of data in parallel, and more.
Disadvantages of HDFS: its biggest drawback is that it is not a good fit for small quantities of data. It also has potential stability issues and can be restrictive and rough to work with.
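The "block structured" point and the small-files weakness can be made concrete with a little arithmetic. The sketch below assumes the common defaults of a 128 MB block size and a replication factor of 3; both are cluster configuration values, not fixed constants.

```python
import math

# HDFS splits each file into fixed-size blocks (commonly 128 MB) and
# stores replicas of each block on several DataNodes according to the
# replication factor (commonly 3). Both values are assumptions here.
BLOCK_SIZE_MB = 128
REPLICATION = 3

def hdfs_footprint(file_size_mb: float):
    """Return (number of blocks, total raw storage in MB)."""
    blocks = max(1, math.ceil(file_size_mb / BLOCK_SIZE_MB))
    raw_mb = file_size_mb * REPLICATION
    return blocks, raw_mb

# A 300 MB file occupies 3 blocks and 900 MB of raw cluster storage;
# a 1 MB file still occupies a whole block's metadata on the NameNode,
# which is why many tiny files are a poor fit for HDFS.
print(hdfs_footprint(300))
print(hdfs_footprint(1))
```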
Commands:
1. ls: Lists all the files at a path. Use -ls -R (the older lsr form is deprecated) for a recursive listing, which is useful when you want the hierarchy of a folder.
Syntax:
bin/hdfs dfs -ls <path>
Example:
bin/hdfs dfs -ls /
This prints all the directories present in HDFS. The bin directory contains the executables, so bin/hdfs invokes the hdfs executable, and dfs selects the Distributed File System commands.

2. mkdir: Creates a directory. In Hadoop dfs there is no home directory by default, so let's first create it.
Syntax:
bin/hdfs dfs -mkdir <folder name>

Creating the home directory:

bin/hdfs dfs -mkdir /user


bin/hdfs dfs -mkdir /user/username -> replace username with your computer's username

3. touchz: Creates an empty file in HDFS.


Syntax:
bin/hdfs dfs -touchz <file_path>
Example:
bin/hdfs dfs -touchz /geeks/myfile.txt

4. copyFromLocal (or put): Copies files/folders from the local file system to HDFS. This is one of the most important commands. The local file system means the files present on the OS.
Syntax:
bin/hdfs dfs -copyFromLocal <local file path> <dest(present on hdfs)>
Example: Let’s suppose we have a file AI.txt on Desktop which we want to copy to
folder geeks present on hdfs.
bin/hdfs dfs -copyFromLocal ../Desktop/AI.txt /geeks

(OR)

bin/hdfs dfs -put ../Desktop/AI.txt /geeks


5. cat: Prints file contents.
Syntax:
bin/hdfs dfs -cat <path>
Example:
// print the content of AI.txt present
// inside the geeks folder
bin/hdfs dfs -cat /geeks/AI.txt
6. copyToLocal (or get): Copies files/folders from HDFS to the local file system.
Syntax:
bin/hdfs dfs -copyToLocal <src(on hdfs)> <local file dest>
Example:
bin/hdfs dfs -copyToLocal /geeks ../Desktop/hero

(OR)

bin/hdfs dfs -get /geeks/myfile.txt ../Desktop/hero


myfile.txt from the geeks folder will be copied to the folder hero present on the Desktop.

Note: Observe that we don't write bin/hdfs when checking things present on the local filesystem.
7. moveFromLocal: Moves a file from the local file system to HDFS.
Syntax:
bin/hdfs dfs -moveFromLocal <local src> <dest(on hdfs)>
Example:
bin/hdfs dfs -moveFromLocal ../Desktop/cutAndPaste.txt /geeks

8. cp: Copies files within HDFS. Let's copy the folder geeks to geeks_copied.
Syntax:
bin/hdfs dfs -cp <src(on hdfs)> <dest(on hdfs)>
Example:
bin/hdfs dfs -cp /geeks /geeks_copied

9. mv: Moves files within HDFS. Let's cut-paste the file myfile.txt from the geeks folder to geeks_copied.
Syntax:
bin/hdfs dfs -mv <src(on hdfs)> <dest(on hdfs)>
Example:
bin/hdfs dfs -mv /geeks/myfile.txt /geeks_copied

10. rmr: Deletes a file or directory from HDFS recursively. It is very useful when you want to delete a non-empty directory. (In newer Hadoop versions, rmr is deprecated in favor of -rm -r.)
Syntax:
bin/hdfs dfs -rmr <filename/directoryName>
Example:
bin/hdfs dfs -rmr /geeks_copied -> deletes all the content inside the
directory and then the directory itself.

11. du: Shows the size of each file in a directory.


Syntax:
bin/hdfs dfs -du <dirName>
Example:
bin/hdfs dfs -du /geeks

12. dus: Gives the total size of a directory/file. (Deprecated in newer versions; use -du -s instead.)


Syntax:
bin/hdfs dfs -dus <dirName>
Example:
bin/hdfs dfs -dus /geeks

13. stat: Gives the last-modified time of a directory or file. In short, it prints statistics about the path.
Syntax:
bin/hdfs dfs -stat <hdfs file>
Example:
bin/hdfs dfs -stat /geeks
14. setrep: Changes the replication factor of a file/directory in HDFS. By default it is 3 for everything stored in HDFS (set by the dfs.replication property in hdfs-site.xml).
Example 1: change the replication factor to 6 for geeks.txt stored in HDFS.
bin/hdfs dfs -setrep -R -w 6 geeks.txt
Example 2: change the replication factor to 4 for the directory geeks stored in HDFS.
bin/hdfs dfs -setrep -R 4 /geeks
Note: -w means wait until the replication is complete, and -R means recursive; we use it for directories since they may contain many files and folders.
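The storage impact of setrep is simple arithmetic: raw cluster usage is the logical file size times the replication factor. A quick sketch (the 10 MB file size is illustrative):

```python
def raw_storage_mb(file_size_mb: float, replication: int) -> float:
    # Raw cluster storage consumed = logical file size x replication factor.
    return file_size_mb * replication

# Raising a 10 MB file from the default replication of 3 to 6
# doubles the raw space it occupies across the DataNodes.
print(raw_storage_mb(10, 3))  # 30
print(raw_storage_mb(10, 6))  # 60
```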

Note: There are more commands in HDFS but we discussed the commands which are commonly
used when working with Hadoop. You can check out the list of dfs commands using the following
command:
bin/hdfs dfs
