Big Data Quiz1.1
1. What class does the ApplicationMaster use to communicate with the ResourceManager? --- AMRMClient
or AMRMClientAsync
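3. True or false: the AppMaster asks a NodeManager if it is not too busy to start a container for the
AppMaster. Justify if it is wrong (Golden Rule). --- FALSE. The ResourceManager instructs a NodeManager to
start the container for the AM; the AM communicates with NodeManagers only to create the containers for its tasks.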
4. Hadoop is a fault-tolerant system. What does Hadoop do if data in HDFS is no longer available due to
disk corruption or machine failure?
It re-replicates the affected blocks from the surviving replicas to another rack/machine.
NameNode needs more memory. It is a memory-based server. It determines and maintains how the
chunks of data are distributed across the DataNodes. It holds the namespace, metadata, and block map.
DataNode needs more storage. It stores the chunks of data, is responsible for replicating chunks to other
DataNodes, and handles read and write requests. It performs block creation, deletion, and replication
upon instruction from the NameNode, and sends heartbeats and block reports to the NameNode.
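6. How will the memory requirements of the NameNode change if we increase the size of the files stored in HDFS
without increasing the number of files? --- Memory requirements will increase: larger files are split into
more blocks, and the NameNode keeps an entry for every block in memory.
Example: 512 MB = 4 blocks (128 MB each)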
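The block arithmetic in the example above can be sketched in a few lines. This is only an illustration, assuming the 128 MB default block size from these notes; the function name `block_count` is made up for the sketch and is not part of any Hadoop API.

```python
import math

BLOCK_SIZE_MB = 128  # default HDFS block size used in these notes


def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks one file occupies (hypothetical helper, not a Hadoop call)."""
    if file_size_mb <= 0:
        return 0
    # A partially filled last block still counts as one block entry.
    return math.ceil(file_size_mb / block_size_mb)


if __name__ == "__main__":
    for size_mb in (100, 512, 1024):
        print(f"{size_mb} MB -> {block_count(size_mb)} block(s)")
```

The NameNode tracks each block in its block map, so the block count (together with the number of files) is the figure to watch when estimating its memory.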
7. When a client contacts the NameNode to access a file, the NameNode responds with --- the block
locations (which DataNodes hold each block of the file)
8. Which of the following relate to Hadoop 1.x? (multiple choices are possible)
A. Job Tracker
B. NodeManager
C. Task Tracker
D. NameNode
E. Datanode
12. The default interval for the heartbeat sent from a DataNode to the NameNode is --- 3
seconds
13. Hadoop Ecosystem: “HIVE” is a query engine that supports the parts of SQL specific to
querying data
--- HIVE
14. MapReduce is a computing model that is OPTIMIZED for HIGH SCALABILITY but not for LOW
LATENCY
-Ambari
-MapReduce
-Low
-High
-Optimize
-NameNode
-Scalability
-latency
17. TRUE OR FALSE: If the container fails to complete its task successfully, the ResourceManager starts
…. on a different NodeManager. ---- TRUE
18. If you had 10 MapReduce jobs running on your cluster, how many ApplicationMaster instances
would you have running? --- 10 AMs. Each job has its own AM.
19. What 2 types of resources can the ApplicationMaster request for a container? --- Memory (RAM) and
processing (CPU cores)
20. Data Mining and Analytics is a cross-disciplinary area of research which includes the following
disciplines:
Machine learning, statistics, artificial intelligence, signal processing, data engineering, probability
models, statistical learning, database management systems, cloud computing
21. Briefly explain the main concepts of MapReduce. --- It is based on functional programming and works well
with big data. It is a programming model designed for processing large volumes of data in
parallel by dividing the work into a set of independent tasks. It provides a flexible and scalable foundation
for analytics, from traditional reporting to leading-edge machine learning algorithms.
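As an illustration of the model described above, here is a minimal single-process word count written in the map / shuffle / reduce style. It is only a sketch of the idea, not Hadoop's API: the function names are made up, and a real job would run the map and reduce calls as independent tasks on different machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line is mapped independently, which is what makes the work parallelisable.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

Because every `map_phase` call depends only on its own line and every `reduce_phase` call only on its own key, the work splits naturally into independent tasks.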
22. Hadoop ecosystem: MAPREDUCE is a general-purpose computing model and runtime system for
distributed data analytics
23. Explain the difference between a block and a replica. What are their default values?
Block --- 128 MB --- a fixed-size chunk of a file, stored as a file on the underlying file system; large blocks allow fast reading
Replica --- 3 --- a replica is a copy of a block, stored on a different DataNode
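To make the block/replica distinction concrete, here is a small sketch assuming the defaults above (128 MB blocks, replication factor 3). The helper names are invented for the example, and sizes are whole megabytes to keep the arithmetic simple.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Sizes of the blocks a file is cut into; the last block may be smaller."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def replica_summary(file_size_mb, block_size_mb=128, replication=3):
    """Blocks, stored block copies, and raw disk usage for one file."""
    blocks = split_into_blocks(file_size_mb, block_size_mb)
    return {
        "blocks": len(blocks),
        "block_replicas": len(blocks) * replication,  # every block is stored `replication` times
        "raw_storage_mb": sum(blocks) * replication,
    }


if __name__ == "__main__":
    print(split_into_blocks(300))
    print(replica_summary(300))
```

The block size decides how a file is cut up; the replication factor decides how many copies of each piece exist in the cluster.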
24. List daemons of Hadoop version and briefly explain their roles in Hadoop Cluster
Master services
1. NameNode --- Holds the metadata for HDFS
2. Secondary NameNode --- Performs housekeeping functions for the NameNode (periodic checkpoints of its
metadata); the checkpoint can help recover the NameNode, but it is not a hot standby
3. JobTracker --- Manages MapReduce jobs, distributes individual tasks to the machines running the
TaskTracker. Coordinates the MapReduce stages
Slave services
1. DataNode --- Stores the actual HDFS data blocks.
2. TaskTracker --- Responsible for instantiating and monitoring the individual Map and Reduce tasks
25. What do the following commands do? (learn all the commands in the slides)
-cat: display file content (uncompressed)
-text: just like cat but works on compressed files
-chgrp,-chmod,-chown: change a file's group, permissions, and owner
-put,-get,-copyFromLocal,-copyToLocal: copy files from the local file system to HDFS and vice
versa.
-ls, -ls -R: list files/directories
-mv,-moveFromLocal,-moveToLocal: moves files
-stat: statistical info for any given file (block size, number of blocks, file type, etc.)
26. Give 2 use cases showing how Big Data Management and Analytics can help multi-sector businesses
increase profit and effectiveness.
Use Case 1: Financial use cases transformed by analytics
Customer profiling: financial firms use parameters about customers to determine risk
29. What is a heartbeat in Hadoop, and why are heartbeats important for the cluster?
It is a periodic signal from a DataNode to the NameNode. It indicates that the DataNode is alive.
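A toy liveness check shows how heartbeats can be used to spot a dead DataNode. This is a sketch only: the class and method names are hypothetical, and the 630-second timeout is an assumption for illustration, so check your own cluster's configuration for the real value.

```python
HEARTBEAT_INTERVAL_S = 3  # default heartbeat interval from these notes
DEAD_AFTER_S = 630        # assumed timeout for this sketch, not read from any config


class HeartbeatMonitor:
    """Hypothetical NameNode-side bookkeeping of the last heartbeat per DataNode."""

    def __init__(self, dead_after_s=DEAD_AFTER_S):
        self.dead_after_s = dead_after_s
        self.last_seen = {}  # node id -> time of the last heartbeat, in seconds

    def heartbeat(self, node_id, now):
        """Record that `node_id` reported in at time `now`."""
        self.last_seen[node_id] = now

    def dead_nodes(self, now):
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.dead_after_s
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor()
    monitor.heartbeat("dn1", now=0)
    monitor.heartbeat("dn2", now=0)
    monitor.heartbeat("dn1", now=600)     # dn1 keeps reporting, dn2 goes silent
    print(monitor.dead_nodes(now=700))
```

Once a node is considered dead, its blocks have fewer live replicas than required, which is what triggers re-replication elsewhere in the cluster.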
30.
vagrant Centos VM
Centos box Hadoop
31.
The RM instructs a NodeManager to create the AM
The AM sends a request to the RM to allocate resources
The RM allocates resources for the AM
The AM sends requests to the NodeManagers to create containers
The AM communicates directly with the containers
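The sequence above can be traced with a toy simulation. Everything here is hypothetical (class names, the round-robin placement, the container naming); it only mirrors the order of the messages and is not YARN's API.

```python
class NodeManager:
    """Toy NodeManager that just records the containers it was asked to start."""

    def __init__(self, name):
        self.name = name
        self.containers = []

    def start_container(self, purpose):
        container = f"{self.name}/container-{len(self.containers)}:{purpose}"
        self.containers.append(container)
        return container


class ResourceManager:
    """Toy ResourceManager; placement is simple round-robin (an assumption)."""

    def __init__(self, node_managers):
        self.node_managers = node_managers

    def launch_app_master(self, app, trace):
        # Step 1: the RM tells a NodeManager to create the AM.
        nm = self.node_managers[0]
        trace.append(f"RM -> {nm.name}: start AM for {app}")
        return nm.start_container(f"AM[{app}]")

    def allocate(self, app, count, trace):
        # Steps 2-3: the AM requests resources and the RM allocates them.
        trace.append(f"AM[{app}] -> RM: request {count} container(s)")
        grants = [self.node_managers[i % len(self.node_managers)] for i in range(count)]
        trace.append(f"RM -> AM[{app}]: granted on {[nm.name for nm in grants]}")
        return grants


def run_application(rm, app, tasks):
    trace = []
    rm.launch_app_master(app, trace)
    grants = rm.allocate(app, len(tasks), trace)
    started = []
    for nm, task in zip(grants, tasks):
        # Step 4: the AM asks the NodeManager to create the container.
        trace.append(f"AM[{app}] -> {nm.name}: start container for {task}")
        started.append(nm.start_container(task))
    # Step 5: from here on the AM talks to its containers directly.
    trace.append(f"AM[{app}] <-> containers: monitor {len(started)} task(s)")
    return started, trace


if __name__ == "__main__":
    nodes = [NodeManager("nm1"), NodeManager("nm2")]
    _, trace = run_application(ResourceManager(nodes), "job1", ["map-0", "map-1", "reduce-0"])
    print("\n".join(trace))
```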
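32. all(sample(15:25), 11 > 15) --- FALSE (11 > 15 is FALSE, so all() cannot be TRUE)
any(sample(15:25), 11 > 20) --- TRUE (11 > 20 is FALSE, but the sampled non-zero integers are coerced to TRUE)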