Big Data IA Answers

1. What is Big Data? Give an example.
Definition: Big Data refers to large, complex datasets that traditional data processing software cannot handle.
• Example 1: Social media data, sensor data, e-mails, zipped files, web pages, etc.
• Example 2: Facebook. According to Facebook, its data system processes 500+ terabytes of data daily; the site generates 2.7 billion Like actions per day, and 300 million new photos are uploaded daily. It has 2.38 billion users and supports searching and recommendation over this data.

2. Explain the design features of the Hadoop Distributed File System (HDFS).
The Hadoop Distributed File System (HDFS) was designed for Big Data processing and is capable of supporting many users simultaneously. The design assumes a write-once/read-many model for large files. HDFS restricts data writing to one user at a time; all additional writes are "append-only," and there is no random writing to HDFS files.
• HDFS is designed for data streaming, where large amounts of data are read from disk in bulk. The HDFS block size is typically 64MB or 128MB; the configured value can be checked with the command shown after this list.
• There is no local caching mechanism. The large block and file sizes make it more efficient to reread data from HDFS than to try to cache the data.
• Hadoop MapReduce moves the computation to the data rather than moving the data to the computation. That is, converged data storage and processing happen on the same servers or DataNodes.
• A reliable file system maintains multiple copies of data across the cluster. Consequently, failure of a single node will not bring down the file system.
• A specialized file system is used, which is not designed for general use.
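As a quick check of these defaults on a running cluster, the configured block size and replication factor can be read with the hdfs getconf command. This is a minimal sketch; the property names dfs.blocksize and dfs.replication are the standard Hadoop 2 keys, and the values shown are only illustrative.

$ hdfs getconf -confKey dfs.blocksize    # block size in bytes (134217728 = 128MB)
134217728
$ hdfs getconf -confKey dfs.replication  # default replication factor
3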

3. Describe the main components of HDFS.


The design of HDFS is based on two types of nodes: a NameNode and multiple DataNodes. In a basic design, the NameNode manages all the metadata needed to store and retrieve the actual data from the DataNodes. The NameNode stores all metadata in memory; no data is actually stored on the NameNode. The design is a master/slave architecture in which the master (NameNode) regulates access to files by clients. File system operations such as opening, closing, and renaming files and directories are all managed by the NameNode. The NameNode also determines the mapping of blocks to DataNodes, handles DataNode failures, and manages block creation, deletion, and replication.

HDFS uses a master/slave architecture designed for large file reading/streaming.


• The NameNode is a metadata server or “data traffic cop.”
• HDFS provides a single namespace that is managed by the NameNode.
• Data is redundantly stored on DataNodes; there is no data on the
NameNode.
• The SecondaryNameNode performs checkpoints of the NameNode file
system’s state but is not a failover node.
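A quick way to see this division of roles on a live cluster is to ask HDFS which hosts hold each role. The commands below are standard hdfs getconf options; the hostnames in the sample output are only placeholders.

$ hdfs getconf -namenodes              # host(s) running the NameNode
namenode.example.com
$ hdfs getconf -secondaryNameNodes     # host running the SecondaryNameNode
secondary.example.com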
4. MapReduce Parallel Data Flow

• HDFS distributes and replicates data over multiple data nodes.


• Apache Hadoop MapReduce will try to move the mapping tasks to the DataNodes that contain the data slice. Results from each data slice are then combined in the reducer step.
• Parallel execution of MapReduce requires other steps in addition to the
mapper and reducer processes.
• The basic steps are as follows:

1. Input Splits.
• HDFS distributes and replicates data over multiple servers.
• The default data block size is 64MB. Thus, a 500MB file would be broken into 8 blocks (seven full 64MB blocks plus one 52MB block) and written to different machines in the cluster.
• The data are also replicated on multiple machines (typically three machines).

2. Map Step.
• The user provides the specific mapping process.
• MapReduce will try to execute the mapper on the machines where the block
resides.
• Because the file is replicated in HDFS, the least busy node with the data will
be chosen.
• If all nodes holding the data are too busy, MapReduce will try to pick a node
that is closest to the node that hosts the data block.

3. Combiner Step.
• It is possible to provide an optimization or pre-reduction as part of the map
stage where key–value pairs are combined prior to the next stage.
• The combiner stage is optional.

4. Shuffle Step.
• Before the parallel reduction stage can complete, all similar keys must be
combined and counted by the same reducer process.
• Therefore, results of the map stage must be collected by key–value pairs and
shuffled to the same reducer process.
• If only a single reducer process is used, the shuffle stage is not needed.

5. Reduce Step.
• The final step is the actual reduction. In this stage, the data reduction is
performed as per the programmer’s design.
• The results are written to HDFS. Each reducer will write its own output file. For example, a MapReduce job running four reducers will create files called part-r-00000, part-r-00001, part-r-00002, and part-r-00003.

The figure shows an example of a simple Hadoop MapReduce data flow for a word-count program. The map process counts the words in each split, and the reduce process calculates the total for each word.
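As a concrete illustration of this flow, the word-count example bundled with Hadoop can be run from the command line. This is a sketch only: the input file name, the wordcount directories, and the examples-jar path are assumptions and vary by distribution.

$ hdfs dfs -mkdir -p wordcount/input
$ hdfs dfs -put war-and-peace.txt wordcount/input
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
    wordcount wordcount/input wordcount/output
$ hdfs dfs -ls wordcount/output                       # one part-r-* file per reducer, plus _SUCCESS
$ hdfs dfs -cat wordcount/output/part-r-00000 | head  # inspect the first few word counts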

5. What are some common HDFS user commands, and what are their purposes?
• The preferred way to interact with HDFS is through the hdfs command, which facilitates navigation within HDFS.
• The hdfs command supports a wide range of options; in the following, only portions of the dfs and dfsadmin options are shown.

General HDFS Commands


The version of HDFS can be found from the version option.
$ hdfs version
Hadoop 2.6.0.2.2.4.2-2
List Files in HDFS
To list the files in the root HDFS directory, enter the following command:
$ hdfs dfs -ls /
Output:
Found 2 items
drwxrwxrwx - yarn hadoop 0 2015-04-29 16:52 /app-logs
drwxr-xr-x - hdfs hdfs 0 2015-04-21 14:28 /apps

To list files in your home directory, enter the following command:


$ hdfs dfs -ls
Output:
Found 2 items
drwxr-xr-x - hdfs hdfs 0 2015-05-24 20:06 bin
drwxr-xr-x - hdfs hdfs 0 2015-04-29 16:52

Make a Directory in HDFS


To make a directory in HDFS, use the following command.
$ hdfs dfs -mkdir stuff

Copy Files to HDFS


• To copy a file from your current local directory into HDFS, use the following command.
• If a full path is not supplied, your home directory is assumed.
• In this case, the file test is placed in the directory stuff that was created previously.
$ hdfs dfs -put test stuff
• The file transfer can be confirmed by using the -ls command:
$ hdfs dfs -ls stuff
Found 1 items
-rw-r--r-- 2 hdfs hdfs 12857 2015-05-29 13:12 stuff/test

Copy Files from HDFS


• Files can be copied back to your local file system using the following command.
• In this case, the file test from HDFS will be copied back to the current local directory with the name test-local.
$ hdfs dfs -get stuff/test test-local
Copy Files within HDFS
The following command will copy a file within HDFS:
$ hdfs dfs -cp stuff/test test.hdfs

Delete a File within HDFS


The following command will delete the HDFS file test.hdfs:
$ hdfs dfs -rm test.hdfs

Delete a Directory in HDFS


The following command will delete the HDFS directory stuff and all its
contents:
$ hdfs dfs -rm -r -skipTrash stuff
Deleted stuff

Get an HDFS Status Report


An HDFS status report can be obtained using the following command. Those with HDFS administrator privileges will get a full report. Note that this command uses dfsadmin instead of dfs to invoke administrative commands.
$ hdfs dfsadmin -report
Configured Capacity: 1503409881088 (1.37 TB)
Present Capacity: 1407945981952 (1.28 TB)
DFS Remaining: 1255510564864 (1.14 TB)
DFS Used: 152435417088 (141.97 GB)
DFS Used%: 10.83%
Under replicated blocks: 54
Blocks with corrupt replicas: 0
Missing blocks: 0

6. Illustrate the process of HDFS block replication.

HDFS Block Replication


• When HDFS writes a file, it is replicated across the cluster. For Hadoop
clusters containing more than eight DataNodes, the replication value is usually
set to 3.
• The HDFS default block size is often 64MB. If a file of size 80MB is written
to HDFS, a 64MB block and a 16MB block will be created.
• The figure provides an example of how a file is broken into blocks and replicated across the cluster. In this case, a replication factor of 3 ensures that any one DataNode can fail, the replicated blocks will still be available on other nodes, and the lost blocks will then be re-replicated on other DataNodes.
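Replication can also be inspected and adjusted per file from the command line. The sketch below assumes the file stuff/test from the earlier examples exists under the hdfs user's home directory (/user/hdfs); the commands themselves are standard.

$ hdfs dfs -setrep 3 stuff/test    # set the replication factor of one file to 3
$ hdfs dfs -ls stuff               # the second column of the listing shows the replication factor
$ hdfs fsck /user/hdfs/stuff/test -files -blocks -locations   # report block placement across DataNodes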

7. Write a note on HDFS Safe Mode and Rack Awareness

HDFS Safe Mode


• When the NameNode starts, it enters a read-only safe mode where blocks cannot be replicated or deleted. Safe mode enables the NameNode to perform two important processes:
• The previous file system state is reconstructed by loading the fsimage file into memory and replaying the edit log.
• The mapping between blocks and DataNodes is created by waiting for enough of the DataNodes to register so that at least one copy of the data is available.
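Administrators can query or control safe mode directly with the dfsadmin options sketched below; the "Safe mode is OFF" line is illustrative output.

$ hdfs dfsadmin -safemode get     # report whether the NameNode is currently in safe mode
Safe mode is OFF
$ hdfs dfsadmin -safemode enter   # manually place the NameNode in safe mode
$ hdfs dfsadmin -safemode leave   # return the NameNode to normal read/write operation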

Rack Awareness
• Rack awareness is about knowing where data is stored in a Hadoop system. It deals with data locality, that is, moving computation to the node where the data resides.
• A Hadoop cluster will exhibit three levels of data locality:
• Data resides on the local machine.
• Data resides in the same rack.
• Data resides in a different rack.
• To protect against failures, the system makes copies of data and stores them
across different racks. So, if one rack fails, the data is still safe and available
from another rack, keeping the system running without losing data.
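The rack assignments the NameNode is using can be displayed with the -printTopology option; the rack name, IP addresses, and hostnames below are only placeholders.

$ hdfs dfsadmin -printTopology
Rack: /default-rack
   192.168.1.11:50010 (datanode1)
   192.168.1.12:50010 (datanode2)
   192.168.1.13:50010 (datanode3)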

8. Explain
i) NameNode High Availability
ii) HDFS NameNode Federation

iii) HDFS Checkpoints and Backup

HDFS Checkpoints
• The NameNode stores the metadata of the HDFS file system in a file called fsimage.
• File system modifications are written to an edits log file, and at startup the NameNode merges the edits into a new fsimage.
• The SecondaryNameNode or CheckpointNode periodically fetches edits from the NameNode, merges them, and returns an updated fsimage to the NameNode.

HDFS Backups
• An HDFS BackupNode maintains an up-to-date copy of the metadata both in
memory and on disk.
• The BackupNode does not need to download the fsimage and edits files from
the active NameNode because it already has an up-to-date metadata state in
memory.
• A NameNode supports one BackupNode at a time. No CheckpointNodes may be registered if a BackupNode is in use.
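A current copy of the NameNode metadata can also be captured on demand, which is a simple complement to checkpoints and backups. The sketch below assumes administrator privileges and an arbitrary local directory (/tmp/nn-backup); note that -saveNamespace requires the NameNode to be in safe mode.

$ hdfs dfsadmin -fetchImage /tmp/nn-backup   # download the most recent fsimage from the NameNode
$ hdfs dfsadmin -safemode enter
$ hdfs dfsadmin -saveNamespace               # force an immediate checkpoint of the namespace
$ hdfs dfsadmin -safemode leave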
9. Explain Apache Sqoop Import & Export Methods with a suitable diagram.
10. Explain Apache Pig with suitable examples.

The following Pig examples are run on data stored in HDFS.


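A minimal sketch of one such interactive Pig session is shown below. It assumes a small colon-delimited file named passwd has already been copied into the user's HDFS home directory with hdfs dfs -put; the alias names and field positions are illustrative.

$ pig                                             # start the Grunt shell (MapReduce mode)
grunt> A = LOAD 'passwd' USING PigStorage(':');   -- load the file, splitting fields on ':'
grunt> B = FOREACH A GENERATE $0 AS user;         -- keep only the first field (the user name)
grunt> DUMP B;                                    -- run the job and print the results
grunt> STORE B INTO 'userid.out';                 -- or write the results to a directory in HDFS
grunt> quit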

11. Explain Apache Hive with suitable examples.


12. Explain
i) Apache Sqoop version comparison

ii) Steps to be performed in Apache Sqoop
