Introduction To Hadoop

Uploaded by

lamaeidlm2000

Introduction to Hadoop

Certified Big Data & Hadoop Training – DataFlair

Topics

• Introduction to Hadoop
• Hadoop nodes & daemons
• Hadoop Architecture
• Hadoop Characteristics


What is Hadoop?

An open source framework that allows distributed processing of large data sets across a cluster of commodity hardware.



Open Source

• Source code is freely available
• It may be redistributed and modified


Distributed Processing

• Data is processed in a distributed manner on multiple nodes / servers
• Multiple machines process the data independently


Cluster

• Multiple machines connected together
• Nodes are connected via LAN



Commodity Hardware

• Economical / affordable machines
• Typically low-performance hardware


What is Hadoop?

• Open source framework written in Java
• Inspired by Google's MapReduce programming model
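
The MapReduce programming model mentioned above can be illustrated with a small word count. This is a minimal sketch in plain Python, not Hadoop's Java API; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)

def word_count(lines):
    # In a real cluster the map calls would run on different nodes;
    # here they run one after another in a single process.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big cluster", "data node"])
print(counts)
```

Each phase only sees its own input, which is what lets a framework run many map and reduce calls on separate machines.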


Hadoop History
Timeline, 2002 to 2009:

• Doug Cutting started working on Nutch
• GFS & MapReduce papers published
• Doug Cutting added DFS & MapReduce in Nutch
• Development of Hadoop started as a Lucene sub-project
• 4TB of image archives converted over 100 EC2 instances
• Hadoop became a top-level project
• Hadoop defeated a supercomputer
• Hive launched, adding SQL support for Hadoop
• Doug Cutting joined Cloudera



Hadoop Components
Hadoop consists of three key parts


Hadoop Nodes

• Master Node
• Slave Node


Hadoop Daemons

Master Node
• ResourceManager
• NameNode

Slave Node
• NodeManager
• DataNode



Basic Hadoop Architecture

Figure: one unit of Work divided into many Sub Work units
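
The idea of dividing one piece of work into sub-work can be sketched as follows. This is an illustrative single-machine simulation that uses threads in place of cluster nodes; `split_work`, `process_sub_work` and `run_job` are hypothetical names, and the sum-of-squares task is only a stand-in for real processing.

```python
from concurrent.futures import ThreadPoolExecutor

def split_work(items, parts):
    """Divide the full work list into roughly equal sub-work chunks."""
    size = max(1, -(-len(items) // parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def process_sub_work(chunk):
    """Stand-in for the processing one worker does on its chunk."""
    return sum(x * x for x in chunk)

def run_job(items, workers=4):
    """Split the work, process each chunk on a worker, combine the results."""
    chunks = split_work(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(process_sub_work, chunks))
    return sum(partial_results)

total = run_job(list(range(10)))
print(total)  # sum of the squares of 0..9
```

The caller only sees `run_job`; how the work is split and where each chunk runs stays hidden behind it.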


Hadoop Characteristics


Open Source

• Source code is freely available
• Can be redistributed
• Can be modified

Open source: free, transparent, interoperable, affordable, community, no vendor lock



Distributed Processing

• Data is processed in a distributed manner on the cluster
• Multiple nodes in the cluster process data independently

Figure: centralized processing compared with distributed processing

Fault Tolerance

• Node failures are recovered automatically
• The framework takes care of failures of hardware as well as tasks
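
Automatic recovery from a failed node can be pictured as rescheduling the task elsewhere. The sketch below is a simplified simulation, not Hadoop's actual scheduler; the node names, the `NodeFailure` exception and both functions are hypothetical.

```python
class NodeFailure(Exception):
    """Raised when a simulated node cannot run a task."""

def run_on_node(node, task, failed_nodes):
    """Run the task on one node, or fail if that node is down."""
    if node in failed_nodes:
        raise NodeFailure(f"{node} is down")
    return f"{task} done on {node}"

def run_with_retry(task, nodes, failed_nodes):
    """Try each node in turn until one completes the task."""
    for node in nodes:
        try:
            return run_on_node(node, task, failed_nodes)
        except NodeFailure:
            continue  # reschedule the task on the next node
    raise RuntimeError(f"no healthy node left for {task}")

result = run_with_retry("task-1", ["node-a", "node-b", "node-c"], {"node-a"})
print(result)  # task-1 done on node-b
```

The caller never handles the failure of `node-a`; the retry loop absorbs it, which mirrors the framework taking care of failures on the user's behalf.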


Reliability

• Data is reliably stored on the cluster of machines despite machine failures
• Failure of nodes doesn't cause data loss
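
One common way to keep data safe when nodes fail is to store several copies of each block on different machines. The sketch below assumes a replication factor of 3 and a simple round-robin placement; both are assumptions made for illustration, and `place_replicas` and `readable_blocks` are hypothetical helpers rather than HDFS functions.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + r) % len(nodes)]
                            for r in range(replication)]
    return placement

def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one live replica."""
    return {block for block, holders in placement.items()
            if any(node not in failed_nodes for node in holders)}

nodes = ["n1", "n2", "n3", "n4"]
placement = place_replicas(["b0", "b1", "b2"], nodes)
survivors = readable_blocks(placement, failed_nodes={"n2", "n3"})
print(sorted(survivors))  # all three blocks remain readable
```

With three copies per block, losing two of the four nodes still leaves every block with a live replica in this example.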



High Availability

• Data is highly available and accessible despite hardware failure
• There will be no downtime for end-user applications due to data


Scalability

• Vertical scalability: new hardware can be added to the nodes
• Horizontal scalability: new nodes can be added on the fly


Economic

• No need to purchase costly licenses
• No need to purchase costly hardware

Open Source + Commodity Hardware = Economic



Easy to Use

• Distributed computing challenges are handled by the framework
• The client just needs to concentrate on business logic


Data Locality


• Move computation to data instead of data to computation
• Data is processed on the nodes where it is stored

Figure: separate storage servers and app servers, where data moves to the algorithm, compared with servers where the algorithm runs alongside the stored data
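
Moving computation to the data can be sketched as a scheduling preference: run the task on a node that already stores the block, and fall back to another node only when none is free. This is a simplified illustration, not Hadoop's scheduler; `pick_node` and the block and node names are hypothetical.

```python
def pick_node(block, block_locations, free_nodes):
    """Prefer a free node that already stores the block; else any free node."""
    local = [n for n in block_locations.get(block, []) if n in free_nodes]
    if local:
        return local[0], True  # computation moves to the data
    if not free_nodes:
        raise RuntimeError("no free node available")
    return sorted(free_nodes)[0], False  # data must travel over the network

block_locations = {"b0": ["n1", "n2"], "b1": ["n3"]}
print(pick_node("b0", block_locations, {"n2", "n4"}))  # ('n2', True)
print(pick_node("b1", block_locations, {"n2", "n4"}))  # ('n2', False)
```

The boolean in the result marks whether the task runs where its data lives, which is the case that avoids copying the block across the network.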

Summary
• Every day we generate 2.3 trillion GB of data
• Hadoop handles huge volumes of data efficiently
• Hadoop uses the power of distributed computing
• HDFS & YARN are two main components of Hadoop
• It is highly fault tolerant, reliable & available
