HDFS

The Hadoop Distributed File System (HDFS) is a core component of the Apache Hadoop ecosystem, designed for reliable, scalable, and distributed storage of massive datasets across clusters of commodity hardware. Its architecture and design principles are optimized for big data workloads, focusing on high throughput rather than low latency. Here's a breakdown of its design principles, components, and functionalities:
1. Design Principles of HDFS
Scalability
• HDFS scales horizontally to accommodate petabytes of data across thousands of machines (nodes).
• The system expands seamlessly by adding more commodity hardware.
Fault Tolerance
• Data is replicated across multiple nodes; if one node fails, copies of the data remain available on other nodes.
• Automatic recovery of data is handled by re-replication mechanisms.
High Throughput
• Optimized for batch processing of large datasets rather than low-latency access.
• Large block sizes (e.g., 128 MB or 256 MB) and sequential reads improve efficiency.
Cost Efficiency
• Built to run on inexpensive commodity hardware instead of high-end, specialized servers.
Write-Once, Read-Many
• HDFS follows a write-once, read-many model: data is written once and read many times.
• This simplifies the system's design and is ideal for data-analytics workloads.
2. HDFS Architecture
HDFS follows a master-slave architecture with two main components:
NameNode (Master Node)
Role:
• Manages the file system metadata (file names, directories, blocks, and their locations).
• Does not store the actual data; instead, it keeps a metadata map of where each block resides across the cluster.
Responsibilities:
• Maintains the namespace hierarchy (file-to-block mapping).
• Tracks replication levels of data blocks.
• Coordinates file operations such as opening, closing, and renaming.
• Detects node failures and triggers data re-replication.
• High Availability: a standby NameNode can be set up to avoid a single point of failure.
DataNodes (Slave Nodes)
Role:
• Store the actual data blocks.
• Communicate with the NameNode by sending block reports and health status.
Responsibilities:
• Store, retrieve, and replicate data blocks at the NameNode's request.
• Periodically send "heartbeats" to the NameNode to report their status.
• If a DataNode fails, the NameNode initiates re-replication of its blocks to other nodes.
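The heartbeat bookkeeping described above can be sketched as a toy in-memory model. This is an illustrative simplification, not Hadoop's actual implementation: the `ToyNameNode` class name, the 10-second timeout, and the explicit `now` parameter are all assumptions chosen to keep the sketch testable (real HDFS uses much longer stale/dead windows).

```python
import time

HEARTBEAT_TIMEOUT = 10.0  # seconds; illustrative, real HDFS waits far longer

class ToyNameNode:
    """Minimal sketch of how a NameNode tracks DataNode liveness."""

    def __init__(self):
        self.last_heartbeat = {}  # datanode id -> timestamp of last heartbeat

    def receive_heartbeat(self, datanode_id, now=None):
        # A DataNode periodically reports in; record when we last heard from it.
        self.last_heartbeat[datanode_id] = now if now is not None else time.time()

    def dead_nodes(self, now=None):
        # Any DataNode silent longer than the timeout is considered failed,
        # which would trigger re-replication of its blocks.
        now = now if now is not None else time.time()
        return [dn for dn, t in self.last_heartbeat.items()
                if now - t > HEARTBEAT_TIMEOUT]

nn = ToyNameNode()
nn.receive_heartbeat("dn1", now=0.0)
nn.receive_heartbeat("dn2", now=8.0)
print(nn.dead_nodes(now=12.0))  # ['dn1'] — dn1 missed its heartbeat window
```

A real NameNode also receives full block reports from each DataNode, so it knows not just that a node died but which block replicas were lost.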
3. HDFS Blocks
• HDFS divides files into fixed-size blocks (default 128 MB, often configured to 256 MB) for storage.
• Blocks are stored across multiple DataNodes for redundancy.
Advantages of blocks:
• Simplifies storage management.
• Optimized for sequential reads of large datasets.
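The block-splitting arithmetic is just ceiling division over the configured block size. A minimal sketch, assuming the 128 MB default (the constant and function name below are illustrative, not part of any Hadoop API):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the common HDFS default

def block_count(file_size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of HDFS blocks needed to store a file (ceiling division).

    The last block may be smaller than block_size; unlike a local file
    system, HDFS does not waste the unused tail of the final block.
    """
    if file_size_bytes == 0:
        return 0
    return (file_size_bytes + block_size - 1) // block_size

# A 1 GB file splits into exactly 8 blocks of 128 MB.
print(block_count(1024 * 1024 * 1024))  # 8
```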
4. Replication Mechanism
• HDFS uses data replication to ensure fault tolerance.
• By default, each block is replicated three times across different DataNodes.
Replication policy:
• One replica is stored on the same rack as the client.
• Another replica is stored on a different rack to avoid single-rack failures.
• The replication factor can be configured based on requirements.
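The rack-level placement rule above can be sketched as follows. This is a simplified model of the commonly described default policy (first replica on the writer's rack, second on a remote rack, third on that same remote rack); the rack names, the rack->nodes map, and the tie-breaking are all illustrative assumptions.

```python
def place_replicas(client_rack: str, racks: dict) -> list:
    """Pick rack placements for 3 replicas given a rack -> nodes map.

    Sketch of the default HDFS idea: replica 1 near the writer, replica 2
    on a different rack (survives a rack failure), replica 3 on the same
    remote rack (limits inter-rack write traffic).
    """
    other_racks = [r for r in racks if r != client_rack]
    if not other_racks:           # single-rack cluster: no rack diversity possible
        return [client_rack] * 3
    remote = other_racks[0]       # real HDFS chooses a remote rack at random
    return [client_rack, remote, remote]

racks = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
print(place_replicas("rack1", racks))  # ['rack1', 'rack2', 'rack2']
```

Note how the policy trades perfect spread (three racks) for cheaper writes: only one copy of the block ever crosses a rack boundary.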
5. HDFS High Availability (HA)
To address the NameNode's single point of failure, HDFS supports High Availability:
• Active NameNode: handles client requests.
• Standby NameNode: mirrors the metadata and takes over in case of failure.
6. File Access Process in HDFS
Here's how a file is written and read in HDFS:
File Write Workflow
• Client interaction: the client contacts the NameNode to create a file.
• Block allocation: the NameNode assigns blocks and selects DataNodes for block storage.
• Data streaming: the client writes data directly to the DataNodes in a pipeline (not through the NameNode).
• Replication: DataNodes replicate the blocks to other nodes according to the replication factor.
• Acknowledgment: once blocks are written and replicated, acknowledgments are sent back to the client.
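The write pipeline above can be modeled as a toy simulation: the client hands a block to the first DataNode, each node forwards it down the chain, and acknowledgments travel back in reverse. The node names, the in-memory "storage" dict, and the function shape are illustrative assumptions, not Hadoop's wire protocol.

```python
def pipeline_write(block: bytes, pipeline: list) -> tuple:
    """Simulate streaming one block through a DataNode pipeline."""
    stored = {}
    for dn in pipeline:              # data flows dn1 -> dn2 -> dn3
        stored[dn] = block           # each node in the chain keeps a replica
    acks = list(reversed(pipeline))  # acks return dn3 -> dn2 -> dn1 -> client
    return stored, acks

stored, acks = pipeline_write(b"block-data", ["dn1", "dn2", "dn3"])
print(acks)  # ['dn3', 'dn2', 'dn1']
```

The pipeline shape is why the client never becomes a fan-out bottleneck: it uploads each block once, and the DataNodes handle the remaining copies among themselves.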
7. File Read Workflow
• Client query: the client contacts the NameNode to fetch the file metadata (block locations).
• Data retrieval: the client reads data directly from the nearest DataNodes for better efficiency.
• Sequential reads: HDFS optimizes reads for large datasets.
Rack Awareness
• HDFS is rack-aware to improve fault tolerance and data locality.
• The NameNode uses a rack topology to place data replicas strategically:
• Ensures at least one copy resides on a different rack.
• Reduces inter-rack traffic for read/write operations.
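The read path above can be sketched as follows: the client fetches per-block replica locations (the NameNode's job) and then reads each block from the replica it considers closest. The metadata layout, file path, block ids, and the preference-list stand-in for real network-topology distance are all illustrative assumptions.

```python
# Toy metadata map, playing the NameNode's role: path -> ordered blocks,
# each with the DataNodes holding a replica.
metadata = {
    "/logs/app.log": [
        {"block": "blk_1", "locations": ["dn1", "dn3"]},
        {"block": "blk_2", "locations": ["dn2", "dn3"]},
    ]
}

def read_plan(path: str, nearest_first: list) -> list:
    """For each block, pick the replica the client considers closest.

    nearest_first stands in for a topology distance function: DataNodes
    earlier in the list are "closer" to the client.
    """
    plan = []
    for entry in metadata[path]:
        best = min(entry["locations"], key=nearest_first.index)
        plan.append((entry["block"], best))
    return plan

# A client topologically closest to dn3 reads both blocks from dn3.
print(read_plan("/logs/app.log", ["dn3", "dn1", "dn2"]))
# [('blk_1', 'dn3'), ('blk_2', 'dn3')]
```

Because every replica is a full copy of its block, the choice of DataNode affects only latency and network traffic, never correctness.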
8. Key Strengths and Limitations of HDFS
Strengths:
• Handles large files efficiently.
• Built-in fault tolerance and reliability.
• Linear scalability across thousands of nodes.
• Cost-effective, using commodity hardware.
Limitations:
• Not suitable for small files: HDFS is optimized for large files; storing many small files can overload the NameNode's metadata.
• High latency: not ideal for real-time applications.
• Write-once limitation: files cannot be updated in place once written.
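The small-files limitation can be made concrete with a back-of-envelope sketch. Each file, directory, and block object costs the NameNode a roughly fixed amount of heap metadata; the ~150-byte figure used below is a commonly cited rough estimate, and should be treated as an assumption for illustration rather than a measured constant.

```python
METADATA_BYTES_PER_BLOCK = 150  # rough, commonly cited estimate per block object

def namenode_metadata_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Approximate NameNode heap consumed by the block metadata of a dataset."""
    return num_files * blocks_per_file * METADATA_BYTES_PER_BLOCK

# The same data as 10 million tiny one-block files vs. 10,000 large files:
print(namenode_metadata_bytes(10_000_000))  # 1500000000 (~1.5 GB of heap)
print(namenode_metadata_bytes(10_000))      # 1500000 (~1.5 MB of heap)
```

Since all of this metadata lives in the NameNode's memory, millions of small files exhaust the master long before the DataNodes' disks fill up, which is why small files are typically packed into larger containers (e.g., sequence files or archives) before landing in HDFS.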
