Ch6 Architectural Design v1
14014305-3
Slides are drawn from ISE:4172 Big Data Analytics by Stephen Baek,
University of Iowa, with appreciation for their educational contribution.
Overview of Hadoop
What is Hadoop and why is it useful?
Last Time...
◉ Local Machine:
○ Uses its own computational resources
◉ Distributed System:
○ Uses resources across a network
◉ Vertical Scaling:
○ Adding resources to a single machine; expensive
◉ Horizontal Scaling:
○ Adding computers via a network; cost-effective
What is Hadoop
Hadoop Ecosystem
• Apache Mesos
  • Manages computer clusters similarly to YARN
  • Handles task scheduling and resource management within the cluster
• Apache Spark
  • Fast and widely adopted technology within the ecosystem
  • Offers significant performance improvements over MapReduce
  • Supports multiple programming languages, including Scala, Java, and Python
Motivation: Project Management
● Tricia (Project Manager)
● Team: John, Sarah, Sanjay, Bob
Motivation: Project Management
Tricia (Project Manager) keeps the “Metadata”:
● John: A
● Sarah: B
● Sanjay: C
● Bob: D
Each team member holds one project: John (Project A), Sarah (Project B), Sanjay (Project C), Bob (Project D).
Motivation: Project Management
Tricia (Project Manager) keeps the “Metadata”:
● John: A
● Sarah: B
● Sanjay: C
● Bob: D
Bob 😖 is in trouble with Project D.
Motivation: Project Management
Tricia (Project Manager) keeps the “Metadata”:
● John: A
● Sarah: B
● Sanjay: C
● Bob: D
Bob 😖 is still in trouble with Project D, but John also holds a copy of it: Project A (D).
Motivation: Project Management
Tricia (Project Manager) keeps the “Metadata”:
● John: A (D)
● Sarah: B (C)
● Sanjay: C (A)
● Bob: D (B)
Each team member holds their own project plus a copy (in parentheses) of another member's project, e.g. John: Project A (D), Sarah: Project B (C).
Hadoop Master/Slave Architecture
The Master Node keeps the “Metadata”:
● Slave 1: A (D)
● Slave 2: B (C)
● Slave 3: C (A)
● Slave 4: D (B)
Each Slave Node stores its own data plus a copy (in parentheses) of another node's data, e.g. Slave 1: A (D), Slave 2: B (C).
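The metadata on this slide can be read as a lookup table that the master consults when a slave fails. The following is a minimal Python sketch of that idea, not Hadoop's actual API: the `METADATA` dictionary and the `locate` function are hypothetical names introduced only for illustration.

```python
from typing import AbstractSet, Dict, Optional

# Hypothetical master-node "metadata": for each slave, the data it owns
# ("primary") and the copy of another node's data it holds ("backup").
METADATA: Dict[str, Dict[str, str]] = {
    "Slave 1": {"primary": "A", "backup": "D"},
    "Slave 2": {"primary": "B", "backup": "C"},
    "Slave 3": {"primary": "C", "backup": "A"},
    "Slave 4": {"primary": "D", "backup": "B"},
}


def locate(block: str, failed: AbstractSet[str] = frozenset()) -> Optional[str]:
    """Return a live slave that holds `block`, preferring the primary copy."""
    for role in ("primary", "backup"):
        for slave, held in METADATA.items():
            if slave not in failed and held[role] == block:
                return slave
    return None  # every node holding this block has failed


print(locate("D"))                         # primary holder of D
print(locate("D", failed={"Slave 4"}))     # falls back to the backup copy
```

With one backup per block, the sketch tolerates the loss of any single slave; losing both holders of a block makes `locate` return `None`.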
Course: Course ID | Title | Instructor | Room No.
Room: Room No. | Capacity | Computers (Y/N) | Multimedia (Y/N)
...
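The Course and Room tables above share the Room No. column, which is what lets a SQL database relate them. A small sketch using Python's built-in `sqlite3` module shows one way this could look; the column names, types, and the sample rows are assumptions made up for illustration and do not come from the slides.

```python
import sqlite3

# In-memory database; schema mirrors the Course and Room tables,
# with assumed column names and types.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Room (
    room_no    TEXT PRIMARY KEY,
    capacity   INTEGER,
    computers  TEXT,   -- 'Y' or 'N'
    multimedia TEXT    -- 'Y' or 'N'
);
CREATE TABLE Course (
    course_id  TEXT PRIMARY KEY,
    title      TEXT,
    instructor TEXT,
    room_no    TEXT REFERENCES Room(room_no)  -- key shared with Room
);
""")

# Made-up sample rows, for illustration only.
conn.execute("INSERT INTO Room VALUES ('R101', 40, 'Y', 'N')")
conn.execute(
    "INSERT INTO Course VALUES ('C1', 'Sample Course', 'Sample Instructor', 'R101')"
)

# Join the two tables on the shared key.
rows = conn.execute("""
    SELECT c.title, r.capacity, r.computers
    FROM Course AS c
    JOIN Room AS r ON c.room_no = r.room_no
""").fetchall()
print(rows)
```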
• SQL databases:
  • Organized in a logical form with interrelated tables and compatible keys.
• Hadoop:
  • Data stored in compressed files within the Hadoop Distributed File System (HDFS).
  • Replicated across multiple machines for fault tolerance.
  • Master node tracks replicated data locations.
  • Power of Hadoop lies in parallel distribution of tasks.
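To make "parallel distribution of tasks" concrete, here is a minimal single-machine sketch in Python: the data is split into blocks, each block is counted independently, and the partial results are merged. Threads stand in for cluster machines, and the function names (`split_into_blocks`, `count_words`, `parallel_word_count`) are hypothetical; this is not how Hadoop itself is invoked.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_blocks(lines: List[str], block_size: int) -> List[List[str]]:
    """Cut the input into fixed-size blocks, loosely like blocks of a file."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def count_words(block: List[str]) -> Counter:
    """Work done independently on one block (the 'map' side)."""
    counts: Counter = Counter()
    for line in block:
        counts.update(line.split())
    return counts


def parallel_word_count(lines: List[str], block_size: int = 2, workers: int = 4) -> Counter:
    """Process blocks in parallel, then merge the partial counts (the 'reduce' side)."""
    blocks = split_into_blocks(lines, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, blocks))
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


print(parallel_word_count(["a b", "b c", "c c"], block_size=1))
```

Because each block is processed without reference to the others, adding more workers (or, in a real cluster, more machines) is the horizontal scaling described earlier.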
Hadoop vs SQL
● Hadoop: Return whatever is currently available
● SQL: Two-phase Commit
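The contrast between the two approaches can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation used by any particular database or by Hadoop: the `Participant` class and the functions `two_phase_commit` and `read_available` are hypothetical names.

```python
from typing import List, Optional


class Participant:
    """A hypothetical node that can stage a value and later commit or abort it."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.value: Optional[str] = None
        self._staged: Optional[str] = None

    def prepare(self, value: str) -> bool:
        """Phase 1: vote yes only if this node is able to apply the change."""
        if not self.healthy:
            return False
        self._staged = value
        return True

    def commit(self) -> None:
        """Phase 2 (success): make the staged value visible."""
        self.value, self._staged = self._staged, None

    def abort(self) -> None:
        """Phase 2 (failure): discard the staged value."""
        self._staged = None


def two_phase_commit(participants: List[Participant], value: str) -> bool:
    """Commit only if every participant votes yes; otherwise abort everywhere."""
    votes = [p.prepare(value) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


def read_available(replicas: List[Participant]) -> Optional[str]:
    """Return the value from the first healthy replica that has one."""
    for r in replicas:
        if r.healthy and r.value is not None:
            return r.value
    return None


a, b = Participant("a"), Participant("b")
print(two_phase_commit([a, b], "x"))   # both healthy: commit succeeds
b.healthy = False
print(two_phase_commit([a, b], "y"))   # one node down: the whole write is refused
print(read_available([b, a]))          # a read still succeeds from the healthy node
```

The trade-off in the sketch: two-phase commit keeps all nodes in agreement but blocks progress when any node is unavailable, whereas returning whatever is available keeps answering as long as one copy can be reached.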