Big Data Quiz1.1

Uploaded by

yitej21617
Copyright
© © All Rights Reserved

BIG DATA saMA

1. What class does the ApplicationMaster use to communicate with the ResourceManager? -- AMRMClient or AMRMClientAsync

2. True or false: the AppMaster is itself a container -- TRUE

3. True or false: the AppMaster asks a NodeManager whether it is not too busy to start a container for the AppMaster. Justify if it is wrong (Golden Rule) -- FALSE. The ResourceManager has a NodeManager start the AM's container; the AM communicates with NodeManagers only to create the task containers.

4. Hadoop is a fault-tolerant system. What does Hadoop do if data in HDFS is no longer available due to disk corruption or machine failure?
It re-replicates the affected blocks to another machine or rack.

5. Difference in hardware requirements for NameNode and DataNode. Is the NameNode machine the same as the DataNode machine in terms of hardware?

The NameNode needs more memory. It is a memory-based server that determines and maintains how the chunks of data are distributed across the DataNodes (namespace, metadata, block map).
The DataNode needs more storage. It stores the chunks of data, replicates chunks to other DataNodes, handles read and write requests, performs block creation, deletion and replication upon instruction from the NameNode, and sends heartbeats and block reports to the NameNode.

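6. How will the memory requirements of the NameNode change if we increase the size of the files stored in HDFS without increasing the number of files? -- Memory requirements will decrease relative to the amount of data stored: the NameNode keeps metadata per file and per block, so the same data in fewer, larger files needs fewer entries.
One 512 MB file = 4 blocks (128 MB each)
Four files of 150 + 150 + 150 + 62 = 512 MB (multiple files, same total size) = 2 + 2 + 2 + 1 = 7 blocks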
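
The block arithmetic in question 6 can be checked with a short calculation. This is only a sketch: the function name `block_count` is made up for illustration, and it assumes the 128 MB block size used in the example and that each file is split into blocks independently.

```python
import math

BLOCK_SIZE_MB = 128  # block size used in the example above


def block_count(file_sizes_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the total number of HDFS blocks needed for the given files.

    Each file is split on its own, so the last block of a file may be
    only partly filled and blocks are never shared between files.
    """
    return sum(math.ceil(size / block_size_mb) for size in file_sizes_mb)


one_large_file = [512]
four_smaller_files = [150, 150, 150, 62]

print(block_count(one_large_file))      # 4
print(block_count(four_smaller_files))  # 7
```

Both inputs hold 512 MB in total, but the four-file layout needs 7 block entries (plus 4 file entries) in NameNode memory instead of 4 block entries and 1 file entry.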

7. When a client contacts the NameNode to access a file, the NameNode responds with -- the block locations

8. Which of the following can be related to Hadoop 1.x? (Multiple choices are possible.)
A. JobTracker
B. NodeManager
C. TaskTracker
D. NameNode
E. DataNode

9. What is the JobTracker in Hadoop 1?
A. container
B. master +
C. both
D. slaves
E. none of them
10. What is the difference between Hadoop 1 and Hadoop 2?
Hadoop 1 is a single-use system: batch processing only, block size 64 MB.
Hadoop 2 is a multi-purpose platform (batch, interactive, online…), block size 128 MB.
Hadoop 1 processing: MapReduce on HDFS. It supports only the MapReduce processing model and does not support non-MR tools. Scaling is limited to 4000 nodes per cluster. A single NameNode manages the entire namespace, so a NameNode failure affects the whole stack. Does not support Microsoft Windows.
Hadoop 2 processing: MapReduce and other data processing models on YARN and HDFS. It scales better, up to 10000 nodes per cluster. Multiple NameNode services manage multiple namespaces. The Hadoop stack (Hive, Pig, HBase, etc.) is equipped to handle a NameNode failure. Supports Microsoft Windows.

11. What is the default web UI port number of the NodeManager? -- 8042

12. The default interval for heartbeats sent from a DataNode to the NameNode is -- 3 seconds

13. Hadoop ecosystem: “HIVE” is a query engine that supports the parts of SQL specific to querying data -- HIVE

14. MapReduce is a computing model that is OPTIMIZED for HIGH SCALABILITY but not for LOW LATENCY
-Ambari
-MapReduce
-Low
-High
-Optimize
-NameNode
-Scalability
-latency

15. Components of YARN
ResourceManager, Container, NodeManager, ApplicationMaster.

16. Worker node = DataNode + NodeManager

17. True or false: if a container fails to complete its task successfully, the ResourceManager starts …. on a different NodeManager -- TRUE

18. If you had 10 MapReduce jobs running on your cluster, how many ApplicationMaster instances would you have running? -- 10 AMs. Each job has its own AM.

19. What 2 types of resources can the ApplicationMaster request for a container? -- Memory and processing (RAM + CPU)

20. Data mining and analytics is a cross-disciplinary area of research that includes which disciplines?
Machine learning, statistics, artificial intelligence, signal processing, data engineering, probability models, statistical learning, database management systems, cloud computing
21. Briefly explain the main concepts of MapReduce -- It is based on functional programming and works well for big data. It is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks. It provides a flexible and scalable foundation for analytics, from traditional reporting to leading-edge machine learning algorithms.

22. Hadoop ecosystem: MAPREDUCE is a general-purpose computing model and runtime system for distributed data analytics

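23. Explain the difference between a block and a replica. What are their default values?
Block (default size 128 MB): the unit into which HDFS splits a file; each block is stored as a file on the DataNode's underlying file system, and the large size allows fast reading.
Replica (default count 3): a copy of a block of the original file, kept on another DataNode.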
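
The two defaults in question 23 combine into the storage footprint of a file. This is a rough sketch: the function name `hdfs_footprint` and the 300 MB example file are made up for illustration, and it assumes that the last block of a file occupies only as much disk space as the data it holds.

```python
import math


def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Estimate how one file is laid out in HDFS.

    Assumption: a partly filled last block uses only its real size on
    disk, so raw storage is simply file size times replication factor.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        "blocks": blocks,                           # entries tracked by the NameNode
        "block_replicas": blocks * replication,     # copies spread over DataNodes
        "raw_storage_mb": file_size_mb * replication,
    }


print(hdfs_footprint(300))
# {'blocks': 3, 'block_replicas': 9, 'raw_storage_mb': 900}
```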

24. List the daemons of Hadoop version 1 and briefly explain their roles in a Hadoop cluster
Master services
1. NameNode -- holds the metadata for HDFS
2. Secondary NameNode -- performs housekeeping functions for the NameNode and keeps a backup copy of its metadata
3. JobTracker -- manages MapReduce jobs, distributes individual tasks to the machines running the TaskTracker, and coordinates the MapReduce stages

Slave services
1. DataNode -- stores the actual HDFS data blocks
2. TaskTracker -- responsible for instantiating and monitoring the individual Map and Reduce tasks

25. What do the following commands do? (Learn all the commands in the slides.)
-cat: displays file content (uncompressed)
-text: like -cat, but also works on compressed files
-chgrp, -chmod, -chown: change a file's group, permissions, and owner
-put, -get, -copyFromLocal, -copyToLocal: copy files from the local file system to HDFS and vice versa
-ls, -ls -R: list files/directories (-R lists recursively)
-mv, -moveFromLocal, -moveToLocal: move files
-stat: statistics about a given file (block size, number of blocks, file type, etc.)

26. Give 2 use cases showing how Big Data management and analytics can help multi-sector businesses increase profit and effectiveness.
Use case 1 -- Financial use cases transformed by analytics
Customer profiling -- financial firms use parameters about customers to determine risk
Fraud detection -- credit card companies look at transaction factors to detect fraud

Use case 2 -- Retail transformed by market basket analysis

27. 5 V's of Big Data
1. High volume (data at rest): terabytes, records, files
2. High velocity (data in motion): batch, near time, real time, streams
3. High variety (data in many forms): structured, unstructured, multi-factor, probabilistic…
4. High veracity (data in doubt): trustworthiness, availability, accountability
5. High value (data in limbo): statistical, correlation, hypothetical

28. Briefly describe the primary steps of a MapReduce job in Hadoop
1. Input
2. Split
3. Map
4. Shuffle
5. Reduce
6. Result
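
The six steps in question 28 can be traced with a word count written in plain Python. This is a single-process sketch with illustrative function names, not Hadoop's API; on a real cluster the map and reduce calls run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(text):
    splits = text.splitlines()                                  # 1-2. input, split
    mapped = [p for line in splits for p in map_phase(line)]    # 3. map
    grouped = shuffle_phase(mapped)                             # 4. shuffle
    reduced = [reduce_phase(k, v) for k, v in grouped.items()]  # 5. reduce
    return dict(reduced)                                        # 6. result


print(word_count("big data\nbig cluster"))
# {'big': 2, 'data': 1, 'cluster': 1}
```

Here each input line stands in for one split, which is a simplification: Hadoop splits input by size, not by line.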

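29. What is a heartbeat in Hadoop, and why are heartbeats important for the cluster?

It is a periodic signal from a DataNode to the NameNode indicating that the DataNode is alive. If the heartbeats stop, the NameNode treats the DataNode as failed and has its blocks replicated elsewhere.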
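
The sketch below shows one way missed heartbeats can be turned into a "dead node" decision. It is a toy model, not NameNode code: the function names are hypothetical, and the timeout formula (twice the recheck interval plus ten heartbeat intervals, with a 5-minute recheck interval) is an assumption about stock HDFS defaults that should be checked against the cluster's own configuration.

```python
HEARTBEAT_INTERVAL_S = 3   # default heartbeat interval cited in this quiz
RECHECK_INTERVAL_S = 300   # assumed default recheck interval (5 minutes)


def dead_node_timeout(heartbeat_s=HEARTBEAT_INTERVAL_S,
                      recheck_s=RECHECK_INTERVAL_S):
    """Seconds of silence after which a DataNode is considered dead.

    Assumed formula: 2 * recheck interval + 10 * heartbeat interval.
    """
    return 2 * recheck_s + 10 * heartbeat_s


def dead_nodes(last_heartbeat, now, timeout_s):
    """Return the nodes whose last heartbeat is older than the timeout."""
    return sorted(node for node, seen in last_heartbeat.items()
                  if now - seen > timeout_s)


timeout = dead_node_timeout()
print(timeout)  # 630

# Hypothetical timestamps in seconds: dn1 reported 5 s ago, dn2 905 s ago.
last_seen = {"dn1": 1000, "dn2": 100}
print(dead_nodes(last_seen, now=1005, timeout_s=timeout))  # ['dn2']
```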

30.
vagrant Centos VM
Centos box Hadoop

31.
The RM instructs a NodeManager to create the AM
The AM sends a request to the RM to allocate resources
The RM allocates resources for the AM
The AM sends a request to a NodeManager to create a container
The AM communicates directly with the container
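32. all(sample(15:25, 11) > 15) -- FALSE
any(sample(15:25, 11) > 20) -- TRUE
(sample(15:25, 11) returns all 11 values of 15:25 in random order, so 15 is always included.)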

33. rep(seq(2,10,2), each=3)


[1] 2 2 2 4 4 4 6 6 6 8 8 8 10 10 10

34. library(data.table)
salesDT <- as.data.table(sales)
setkey(salesDT, total)
salesDT[nrow(salesDT), ][, 6] -- 199

35. Components that Hadoop uses for streaming
Hive -- data warehouse structure that supports ad hoc SQL queries
Kafka -- fast, scalable, durable and fault-tolerant publish-subscribe messaging system
HBase -- scalable distributed NoSQL database that supports structured data storage for large data tables

36. How to open a CSV file in R
dat = read.csv("spam.csv", header = TRUE)
37.
