Unit 1 Notes
Big data is characterized by several key features, often referred to as the "3Vs" -
Volume, Velocity, and Variety. Additionally, two more Vs - Veracity and Value -
are sometimes included to provide a more comprehensive understanding. Here
are the characteristics of big data:
1. Volume: Big data refers to datasets that are extremely large in size, far beyond
the capacity of traditional data processing systems to manage, store, and analyze
efficiently. The volume of data can range from terabytes to petabytes and even
exabytes.
2. Velocity: Big data is generated and collected at an unprecedented speed. Data
streams in continuously from various sources such as social media, sensors, web
logs, and transactions. The velocity of data refers to the rate at which data is
generated, captured, and processed in real-time or near real-time.
3. Variety: Big data comes in various formats and types, including structured, semi-
structured, and unstructured data. Structured data, such as relational databases,
follows a predefined schema. Semi-structured data, like JSON or XML files, has
some organization but lacks a fixed schema. Unstructured data, such as text,
images, audio, and video, lacks any predefined structure.
4. Veracity: Veracity refers to the quality, accuracy, and reliability of the data. Big
data sources may include noisy, incomplete, inconsistent, or erroneous data.
Ensuring data veracity involves assessing data quality, detecting and correcting
errors, and maintaining data integrity throughout the data lifecycle.
5. Value: The ultimate goal of big data analysis is to derive meaningful insights,
actionable intelligence, and business value from the vast amounts of data
collected. Extracting value from big data involves applying advanced analytics
techniques, such as data mining, machine learning, and predictive modeling, to
uncover patterns, trends, correlations, and hidden knowledge that can inform
decision-making, drive innovation, and optimize processes.
The Hadoop Distributed File System (HDFS) follows a master/worker architecture with the following components:
1. NameNode:
The NameNode is the master node in the HDFS architecture.
It manages the metadata of the file system, including the namespace
hierarchy, file permissions, and file-to-block mappings.
The NameNode stores metadata in memory for faster access and
periodically persists it to the disk in the form of the fsimage and edits
log files.
The failure of the NameNode can lead to the unavailability of the entire
file system, making it a single point of failure. To mitigate this, Hadoop
provides NameNode High Availability (HA), in which an active NameNode is
paired with a hot standby, and HDFS Federation, which splits the namespace
across multiple independent NameNodes.
2. DataNode:
DataNodes are worker nodes in the HDFS architecture.
They store the actual data blocks that make up the files in HDFS.
DataNodes communicate with the NameNode to report the list of
blocks they are storing and to replicate or delete blocks based on
instructions from the NameNode.
DataNodes are responsible for serving read and write requests from
clients and other Hadoop components; a short client-side sketch of this
interaction follows this list.
3. Secondary NameNode:
Despite its name, the Secondary NameNode does not act as a standby
or backup NameNode.
Its primary role is to periodically merge the fsimage and edits log files
produced by the NameNode to prevent them from growing
indefinitely.
The Secondary NameNode generates a new combined image of the file
system, which is then sent back to the NameNode to replace the
current fsimage file.
This checkpointing keeps the edits log small, which shortens NameNode
startup and recovery time and limits how much metadata could be lost if
the NameNode fails.
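To make the division of labour concrete, below is a minimal client-side sketch using the Hadoop Java FileSystem API, assuming a cluster configured through core-site.xml/hdfs-site.xml and a hypothetical file path. Metadata calls (file status, block-to-DataNode mappings) are answered by the NameNode, while the actual bytes are streamed from the DataNodes holding the blocks.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();      // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);          // client handle backed by the NameNode

        Path file = new Path("/user/mydirectory/hdfsfile.txt");  // hypothetical path

        // Metadata operations go to the NameNode: file status and block-to-DataNode mappings.
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("Block at offset " + block.getOffset()
                    + " stored on DataNodes: " + String.join(", ", block.getHosts()));
        }

        // Actual byte reads are streamed from the DataNodes that hold the blocks.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buffer = new byte[4096];
            int read = in.read(buffer);
            System.out.println("Read " + read + " bytes from the start of the file");
        }
        fs.close();
    }
}
```

If the NameNode is unreachable, even the metadata calls fail, which is exactly the single-point-of-failure risk described above.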
YARN (Yet Another Resource Negotiator), Hadoop's resource management layer, consists of the following components:
1. ResourceManager (RM):
The ResourceManager is the master daemon in the YARN architecture.
It is responsible for managing and allocating cluster resources among
different applications.
The ResourceManager consists of two main components:
Scheduler: Allocates resources to various applications based on
their resource requirements, scheduling policies, and constraints.
ApplicationsManager: Accepts application submissions, negotiates
the first container for each application's ApplicationMaster, and
restarts the ApplicationMaster if it fails.
2. NodeManager (NM):
NodeManagers are worker nodes in the YARN architecture.
They run on each node in the Hadoop cluster and are responsible for
managing resources such as CPU, memory, and disk on that node.
NodeManagers report resource availability and health status to the
ResourceManager through periodic heartbeats and launch containers on
their node when allocations are granted; the sketch after this list
shows a client querying these reports from the ResourceManager.
NodeManagers monitor the resource usage of containers running on
the node and report back to the ResourceManager for resource
accounting and monitoring.
3. ApplicationMaster (AM):
The ApplicationMaster is a per-application component responsible for
coordinating and managing the execution of a specific application on
the cluster.
When a client submits an application to run on the cluster, YARN
launches an ApplicationMaster instance for that application.
The ApplicationMaster negotiates containers from the ResourceManager,
asks the corresponding NodeManagers to launch them, monitors the
progress of tasks, and handles failures and retries.
Each application running on the cluster has its own ApplicationMaster
instance, ensuring isolation and resource management at the
application level.
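The ResourceManager/NodeManager relationship can be seen from a client's point of view with the YarnClient API. The sketch below is only an illustration, assuming a reachable ResourceManager configured through yarn-site.xml; it does not submit an application, it simply asks the ResourceManager for the node reports that NodeManagers deliver in their heartbeats.

```java
import java.util.List;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnClusterSketch {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();   // reads yarn-site.xml

        // The client talks to the ResourceManager, which aggregates the
        // heartbeats and health reports sent by every NodeManager.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        System.out.println("NodeManagers registered with the ResourceManager: "
                + yarnClient.getYarnClusterMetrics().getNumNodeManagers());

        // Per-node capacity and usage, as reported to the ResourceManager.
        List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
        for (NodeReport node : nodes) {
            System.out.println(node.getNodeId()
                    + "  capacity=" + node.getCapability()
                    + "  used=" + node.getUsed()
                    + "  health=" + node.getHealthReport());
        }
        yarnClient.stop();
    }
}
```

Submitting an application uses the same client, after which the ApplicationsManager launches an ApplicationMaster that takes over container negotiation as described above.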
The following shell commands are commonly used to work with files in HDFS:
1. hadoop fs:
This is the main command used to interact with HDFS. It has various
subcommands to perform different operations; programmatic equivalents
using the Java FileSystem API are sketched after this list.
2. hadoop fs -ls:
Lists the contents of a directory in HDFS.
Example: hadoop fs -ls /user
3. hadoop fs -mkdir:
Creates a directory in HDFS.
Example: hadoop fs -mkdir /user/mydirectory
4. hadoop fs -put:
Copies files or directories from the local file system to HDFS.
Example: hadoop fs -put localfile.txt /user/mydirectory
5. hadoop fs -get:
Copies files or directories from HDFS to the local file system.
Example: hadoop fs -get /user/mydirectory/hdfsfile.txt localfile.txt
6. hadoop fs -rm:
Deletes files in HDFS; add -r to delete directories recursively.
Example: hadoop fs -rm /user/mydirectory/hdfsfile.txt
7. hadoop fs -cat:
Displays the contents of a file in HDFS.
Example: hadoop fs -cat /user/mydirectory/hdfsfile.txt
8. hadoop fs -copyToLocal:
Copies files or directories from HDFS to the local file system.
Example: hadoop fs -copyToLocal /user/mydirectory/hdfsfile.txt localfile.txt
9. hadoop fs -copyFromLocal:
Copies files or directories from the local file system to HDFS.
Example: hadoop fs -copyFromLocal localfile.txt /user/mydirectory/hdfsfile.txt
10. hadoop fs -du:
Displays the disk usage of files and directories in HDFS.
Example: hadoop fs -du /user/mydirectory
11. hadoop fs -chmod:
Changes the permissions of files or directories in HDFS.
Example: hadoop fs -chmod 777 /user/mydirectory/hdfsfile.txt
12. hadoop fs -chown:
Changes the owner of files or directories in HDFS.
Example: hadoop fs -chown username /user/mydirectory/hdfsfile.txt
13. hadoop fs -chgrp:
Changes the group of files or directories in HDFS.
Example: hadoop fs -chgrp groupname /user/mydirectory/hdfsfile.txt
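The shell commands above have programmatic equivalents in the Java FileSystem API. The sketch below reuses the same illustrative paths; it assumes the default configuration on the classpath, and the setOwner call would normally require superuser privileges.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class HdfsShellEquivalents {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        Path dir = new Path("/user/mydirectory");              // same illustrative paths as above
        Path local = new Path("localfile.txt");
        Path remote = new Path("/user/mydirectory/hdfsfile.txt");

        fs.mkdirs(dir);                                        // hadoop fs -mkdir
        fs.copyFromLocalFile(local, remote);                   // hadoop fs -put / -copyFromLocal

        for (FileStatus s : fs.listStatus(dir)) {              // hadoop fs -ls
            System.out.println(s.getPermission() + " " + s.getOwner()
                    + " " + s.getLen() + " " + s.getPath());
        }

        System.out.println("Disk usage: "                      // hadoop fs -du (summary)
                + fs.getContentSummary(dir).getLength() + " bytes");

        fs.setPermission(remote, new FsPermission((short) 0644)); // hadoop fs -chmod 644
        fs.setOwner(remote, "username", "groupname");           // hadoop fs -chown / -chgrp (superuser)

        fs.delete(remote, false);                               // hadoop fs -rm (non-recursive)
        fs.close();
    }
}
```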
Case Study: Big Data Analytics in Retail
• Challenge:
A leading retail chain faced challenges in optimizing its inventory management
and enhancing customer satisfaction. The company struggled with stockouts and
excess inventory, and it lacked insight into customer preferences, leading to
suboptimal stocking decisions.
• Solution:
The retail chain implemented a comprehensive big data analytics solution to
address these challenges.
• Steps Taken:
Data Collection
Customer Segmentation
Demand Forecasting
Inventory Optimization
Personalized Marketing
• Results:
Reduced stockouts and excess inventory.
Improved customer satisfaction, with increased customer loyalty and
repeat business.
Increased revenue.
Improved operational efficiency.
• Conclusion:
This case study demonstrates how big data analytics can transform retail
operations by providing actionable insights. The implemented solution not only
optimized inventory management but also enhanced the overall customer
experience, leading to increased revenue and operational efficiency.
Now, let's walk through how a Hadoop cluster built from these components runs
a MapReduce job; a complete WordCount example follows the walkthrough:
1. Job Submission:
A user submits a MapReduce job to the Hadoop cluster, specifying the
input data location, map and reduce functions, and any other job
configurations.
The job is submitted to the ResourceManager, which assigns it an
application ID and schedules it for execution.
2. Job Initialization:
The MapReduce client computes input splits, consulting the NameNode
for the locations of the input data blocks, and copies the job
resources to HDFS.
The ResourceManager launches an ApplicationMaster for the job,
which is responsible for managing the job's execution.
Map and reduce tasks are later placed on NodeManagers based on
resource availability, data locality, and scheduling policies.
3. Map Phase:
The ApplicationMaster negotiates with the ResourceManager to
allocate resources for map tasks.
NodeManagers execute map tasks in parallel across the cluster, reading
input data blocks from DataNodes and applying the user-defined map
function.
Intermediate key-value pairs are generated by the map tasks and
partitioned based on keys.
Map output is buffered in memory, spilled to local disk, and held there
until the shuffle and sort phase picks it up.
4. Shuffle and Sort:
Intermediate key-value pairs generated by map tasks are shuffled and
sorted based on keys.
The shuffle and sort process involves transferring data over the network
from map tasks to reduce tasks and grouping data by key.
This phase ensures that all values associated with the same key are sent
to the same reducer for processing.
5. Reduce Phase:
The ApplicationMaster negotiates with the ResourceManager to
allocate resources for reduce tasks.
NodeManagers execute reduce tasks in parallel across the cluster,
reading intermediate data from map tasks and applying the user-
defined reduce function.
The reduce tasks aggregate and process the intermediate key-value
pairs to generate the final output.
6. Output:
The final output of the MapReduce job is written to HDFS or another
distributed file system.
Each reducer produces its output file, which contains the final results of
the computation.
The output files can be accessed by the user for further analysis or
processing.
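To tie the walkthrough together, here is the classic WordCount job, essentially the example that ships with Hadoop. The driver in main corresponds to job submission (step 1), TokenizerMapper to the map phase (step 3), the framework's shuffle and sort sits between map and reduce (step 4), IntSumReducer performs the reduce phase (step 5), and the results land in HDFS (step 6). Input and output paths are taken from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: each mapper reads one input split and emits (word, 1) pairs.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);   // intermediate key-value pair, partitioned by key
            }
        }
    }

    // Reduce phase: after shuffle and sort, each reducer sees all counts for a word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);     // final (word, total) written to the job output
        }
    }

    // Job submission: the driver configures the job and hands it to YARN.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // optional local aggregation before the shuffle
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory (must not exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a JAR, it could be launched with something like hadoop jar wordcount.jar WordCount <input> <output>; each reducer then writes its results to a part-r-NNNNN file in the output directory.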