Introduction To Hadoop
Topics
• Introduction to Hadoop
• Hadoop Architecture
• Hadoop Characteristics
What is Hadoop?
Hadoop History
• Doug Cutting started working on Nutch
• Doug Cutting added DFS & MapReduce in Nutch
• 4TB of image archives converted over 100 EC2 instances
• Hive launched, SQL support for Hadoop
• Hadoop defeated a supercomputer
• Doug Cutting joined Cloudera
Hadoop Nodes
• Master node
• Slave nodes
Hadoop Daemons
• Master node: ResourceManager, NameNode
• Slave nodes: NodeManager, DataNode
Hadoop Characteristics
Open Source
• Can be redistributed
• Can be modified
• Interoperable
• Affordable
• No vendor lock-in
• Community
Centralized Processing
Distributed Processing
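To make the idea of distributed processing concrete, here is a minimal pure-Python sketch of the MapReduce pattern named in the history slide. It is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and worker threads merely stand in for cluster nodes that each process one chunk of the input.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Shuffle step: group all emitted values by key (the word)."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(chunks, workers=2):
    """Run the map step on several chunks concurrently, then shuffle and reduce."""
    # Each worker thread plays the role of one node (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    chunks = ["big data big", "data hadoop"]
    print(word_count(chunks))
```

In a centralized setup a single machine would scan every chunk itself; here each chunk is mapped independently, which is what lets the work spread across many machines.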
Certified Big Data & Hadoop Training – DataFlair
Fault Tolerance
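One way to picture fault tolerance is block replication: HDFS keeps several copies of each block (three by default) on different nodes, so losing a node does not lose the data. The sketch below is a simplified illustration under that assumption. The helper names place_replicas and readable_blocks are hypothetical, and the round-robin placement is a stand-in for HDFS's real rack-aware policy.

```python
def place_replicas(blocks, nodes, replication=3):
    """Place each block on `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(blocks):
        # Consecutive nodes, wrapping around the node list.
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    }


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    placement = place_replicas(["b1", "b2"], nodes)
    # Two nodes fail, yet every block keeps a live replica.
    print(sorted(readable_blocks(placement, ["n1", "n2"])))
```

With a single copy per block (replication=1), the same failure would make some blocks unreadable, which is the contrast the slide title points at.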
Reliability
Scalability
Economic
Open Source + Commodity Hardware = Economic
Data Locality
• Move computation to data instead of data to computation
• Data is processed on the nodes where it is stored
[Figure: the traditional approach moves data from storage servers to app servers that run the algorithm; Hadoop sends the algorithm to the servers that hold the data]
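The data-locality rule (process data on the node where it is stored) can be sketched as a small task-assignment routine. This is an illustration only, not Hadoop's scheduler: the block names, node names, and the assign_tasks helper are all hypothetical.

```python
# Hypothetical cluster layout: block id -> nodes holding a replica of it.
BLOCK_LOCATIONS = {
    "block-1": ["node-a", "node-b"],
    "block-2": ["node-b", "node-c"],
    "block-3": ["node-c", "node-a"],
}


def assign_tasks(block_locations, free_slots):
    """Assign one task per block, preferring a node that stores the block.

    free_slots maps node name -> number of tasks it can still accept.
    Returns {block: (node, is_local)}.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment = {}
    for block, holders in sorted(block_locations.items()):
        # First choice: a node that already holds the data (no network copy).
        local = [n for n in holders if slots.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fallback: any node with capacity; the block must be shipped to it.
            remote = [n for n, s in sorted(slots.items()) if s > 0]
            if not remote:
                raise RuntimeError("no free slots left in the cluster")
            node, is_local = remote[0], False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


if __name__ == "__main__":
    plan = assign_tasks(BLOCK_LOCATIONS, {"node-a": 1, "node-b": 1, "node-c": 1})
    for block, (node, is_local) in plan.items():
        print(block, "->", node, "(local)" if is_local else "(remote)")
```

A task falls back to a remote node only when no node holding the block has a free slot, so the algorithm travels to the data far more often than the data travels to the algorithm.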
Summary
• Every day we generate 2.3 trillion GB of data
• Hadoop handles huge volumes of data efficiently
• Hadoop uses the power of distributed computing
• HDFS & YARN are the two main components of Hadoop
• It is highly fault tolerant, reliable & available