
Hadoop Questions and Answers Part-1

1. IBM and ________ have announced a major initiative to use Hadoop to support
university courses in distributed computer programming.
a) Google Latitude
b) Android (operating system)
c) Google Variations
d) Google

Answer: d
Explanation: In 2007, Google and IBM announced a joint university initiative to address Internet-scale computing challenges.

2. Point out the correct statement.
a) Hadoop is an ideal environment for extracting and transforming small volumes of data
b) Hadoop stores data in HDFS and supports data compression/decompression
c) The Giraph framework is less useful than a MapReduce job for solving graph and machine learning problems
d) None of the mentioned


Answer: b
Explanation: Data compression can be achieved using compression algorithms like
bzip2, gzip, LZO, etc. Different algorithms can be used in different scenarios based
on their capabilities.
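
To make this concrete, here is a minimal sketch (assuming a standard Hadoop installation with the built-in gzip codec; the class name is illustrative) of enabling compressed job output through the mapreduce API:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompressedOutputJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "compressed-output");
        // Compress the final job output with gzip; bzip2 or LZO codecs
        // can be substituted depending on what the cluster has installed.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));
    }
}
```

Note that splittable codecs such as bzip2 are often preferred for large inputs, since non-splittable formats like gzip limit map-side parallelism.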

3. What license is Hadoop distributed under?


a) Apache License 2.0
b) Mozilla Public License
c) Shareware
d) Commercial

Answer: a
Explanation: Hadoop is open source, released under the Apache License 2.0.

4. Sun also has the Hadoop Live CD ________ project, which allows running a fully
functional Hadoop cluster using a live CD.
a) OpenOffice.org
b) OpenSolaris
c) GNU
d) Linux


Answer: b
Explanation: The OpenSolaris Hadoop LiveCD project built a bootable CD-ROM
image.

5. Which of the following genres does Hadoop produce?


a) Distributed file system
b) JAX-RS
c) Java Message Service
d) Relational Database Management System


Answer: a
Explanation: The Hadoop Distributed File System (HDFS) is designed to store very
large data sets reliably, and to stream those data sets at high bandwidth to the user.

6. What was Hadoop written in?


a) Java (software platform)
b) Perl
c) Java (programming language)
d) Lua (programming language)


Answer: c
Explanation: The Hadoop framework itself is mostly written in the Java programming
language, with some native code in C and command-line utilities written as shell
scripts.
7. Which of the following platforms does Hadoop run on?
a) Bare metal
b) Debian
c) Cross-platform
d) Unix-like

Answer: c
Explanation: Hadoop is cross-platform: it runs on any operating system with a suitable Java runtime.

8. Hadoop achieves reliability by replicating the data across multiple hosts and
hence does not require ________ storage on hosts.
a) RAID
b) Standard RAID levels
c) ZFS
d) Operating system


Answer: a
Explanation: With the default replication value, 3, data is stored on three nodes: two
on the same rack, and one on a different rack.
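
As a small illustration (the file path and class name are hypothetical), the replication factor can be set cluster-wide through the configuration or per file through the FileSystem API:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cluster-wide default replication factor (normally 3).
        conf.setInt("dfs.replication", 3);
        FileSystem fs = FileSystem.get(conf);
        // Replication can also be changed for an individual file.
        fs.setReplication(new Path("/data/input.txt"), (short) 2);
    }
}
```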

9. Above the file systems comes the ________ engine, which consists of one Job
Tracker, to which client applications submit MapReduce jobs.
a) MapReduce
b) Google
c) Functional programming
d) Facebook

Answer: a
Explanation: The MapReduce engine distributes work across the cluster; client applications submit jobs to the JobTracker, which hands tasks out to TaskTrackers.

10. The Hadoop list includes the HBase database, the Apache Mahout ________
system, and matrix operations.
a) Machine learning
b) Pattern recognition
c) Statistical classification
d) Artificial intelligence


Answer: a
Explanation: The Apache Mahout project’s goal is to build a scalable machine
learning tool.

Hadoop Questions and Answers Part-2


1. As companies move past the experimental phase with Hadoop, many cite the
need for additional capabilities, including _______________
a) Improved data storage and information retrieval
b) Improved extract, transform and load features for data integration
c) Improved data warehousing functionality
d) Improved security, workload management, and SQL support

Answer: d
Explanation: Adding security to Hadoop is challenging because not all of its interactions follow the classic client-server pattern.

2. Point out the correct statement.
a) Hadoop does need specialized hardware to process the data
b) Hadoop 2.0 allows live stream processing of real-time data
c) In the Hadoop programming framework output files are divided into lines or records
d) None of the mentioned

Answer: b
Explanation: Hadoop batch-processes data distributed over clusters ranging from hundreds to thousands of computers.
3. According to analysts, for what can traditional IT systems provide a foundation
when they’re integrated with big data technologies like Hadoop?
a) Big data management and data mining
b) Data warehousing and business intelligence
c) Management of Hadoop clusters
d) Collecting and storing unstructured data


Answer: a
Explanation: Data warehousing integrated with Hadoop would give a better
understanding of data.

4. Hadoop is a framework that works with a variety of related tools. Common cohorts
include ____________
a) MapReduce, Hive and HBase
b) MapReduce, MySQL and Google Apps
c) MapReduce, Hummer and Iguana
d) MapReduce, Heron and Trumpet


Answer: a
Explanation: To use Hive with HBase you’ll typically want to launch two clusters, one
to run HBase and the other to run Hive.

5. Point out the wrong statement.


a) Hadoop's processing capabilities are huge and its real advantage lies in the ability to process terabytes & petabytes of data
b) Hadoop uses a programming model called “MapReduce”, all the programs should
conform to this model in order to work on the Hadoop platform
c) The programming model, MapReduce, used by Hadoop is difficult to write and test
d) All of the mentioned


Answer: c
Explanation: The programming model, MapReduce, used by Hadoop is simple to
write and test.
6. What was Hadoop named after?
a) Creator Doug Cutting’s favorite circus act
b) Cutting’s high school rock band
c) The toy elephant of Cutting’s son
d) A sound Cutting’s laptop made during Hadoop development


Answer: c
Explanation: Doug Cutting, Hadoop creator, named the framework after his child’s
stuffed toy elephant.

7. All of the following accurately describe Hadoop, EXCEPT ____________


a) Open-source
b) Real-time
c) Java-based
d) Distributed computing approach


Answer: b
Explanation: Apache Hadoop is an open-source software framework for distributed
storage and distributed processing of Big Data on clusters of commodity hardware.

8. __________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.
a) MapReduce
b) Mahout
c) Oozie
d) All of the mentioned


Answer: a
Explanation: MapReduce is a programming model and an associated
implementation for processing and generating large data sets with a parallel,
distributed algorithm.
9. _________ has the world’s largest Hadoop cluster.
a) Apple
b) Datamatics
c) Facebook
d) None of the mentioned

Answer: c
Explanation: Facebook has many Hadoop clusters; the largest among them is the one used for data warehousing.

10. Facebook Tackles Big Data With _______ based on Hadoop.


a) ‘Project Prism’
b) ‘Prism’
c) ‘Project Big’
d) ‘Project Data’


Answer: a
Explanation: Prism automatically replicates and moves data wherever it’s needed
across a vast network of computing facilities.

Hadoop Questions and Answers Part-3


1. ________ is a platform for constructing data flows for extract, transform, and load
(ETL) processing and analysis of large datasets.
a) Pig Latin
b) Oozie
c) Pig
d) Hive


Answer: c
Explanation: Apache Pig is a platform for analyzing large data sets that consists of a
high-level language for expressing data analysis programs.
2. Point out the correct statement.
a) Hive is not a relational database, but a query engine that supports the parts of
SQL specific to querying data
b) Hive is a relational database with SQL support
c) Pig is a relational database with SQL support
d) All of the mentioned


Answer: a
Explanation: Hive is a SQL-based data warehouse system for Hadoop that facilitates
data summarization, ad hoc queries, and the analysis of large datasets stored in
Hadoop-compatible file systems.

3. _________ hides the limitations of Java behind a powerful and concise Clojure
API for Cascading.
a) Scalding
b) HCatalog
c) Cascalog
d) All of the mentioned


Answer: c
Explanation: Cascalog also adds Logic Programming concepts inspired by Datalog.
Hence the name “Cascalog” is a contraction of Cascading and Datalog.

4. Hive also supports custom extensions written in ____________


a) C#
b) Java
c) C
d) C++


Answer: b
Explanation: Hive also supports custom extensions written in Java, including user-
defined functions (UDFs) and serializer-deserializers for reading and optionally
writing custom formats.
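
As a minimal sketch of such a Java extension (assuming the hive-exec library on the classpath; the class and function names are invented for illustration), a user-defined function extends UDF and exposes an evaluate method:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A toy UDF that lower-cases a string column; it could be registered in
// Hive with: CREATE TEMPORARY FUNCTION my_lower AS 'LowerUDF';
public class LowerUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().toLowerCase());
    }
}
```
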
5. Point out the wrong statement.
a) Elastic MapReduce (EMR) is Facebook’s packaged Hadoop offering
b) Amazon Web Service Elastic MapReduce (EMR) is Amazon’s packaged Hadoop
offering
c) Scalding is a Scala API on top of Cascading that removes most Java boilerplate
d) All of the mentioned


Answer: a
Explanation: Rather than building Hadoop deployments manually on EC2 (Elastic
Compute Cloud) clusters, users can spin up fully configured Hadoop installations
using simple invocation commands, either through the AWS Web Console or through
command-line tools.

6. ________ is the most popular high-level Java API in the Hadoop ecosystem.


a) Scalding
b) HCatalog
c) Cascalog
d) Cascading


Answer: d
Explanation: Cascading hides many of the complexities of MapReduce programming
behind more intuitive pipes and data flow abstractions.

7. ___________ is a general-purpose computing model and runtime system for distributed data analytics.
a) MapReduce
b) Drill
c) Oozie
d) None of the mentioned

Answer: a
Explanation: MapReduce provides a flexible and scalable foundation for analytics, from traditional reporting to leading-edge machine learning algorithms.
8. The Pig Latin scripting language is not only a higher-level data flow language but
also has operators similar to ____________
a) SQL
b) JSON
c) XML
d) All of the mentioned


Answer: a
Explanation: Pig Latin, in essence, is designed to fill the gap between the declarative
style of SQL and the low-level procedural style of MapReduce.

9. ______ jobs are optimized for scalability but not latency.


a) MapReduce
b) Drill
c) Oozie
d) Hive


Answer: d
Explanation: Hive Queries are translated to MapReduce jobs to exploit the scalability
of MapReduce.

10. ______ is a framework for performing remote procedure calls and data
serialization.
a) Drill
b) BigTop
c) Avro
d) Chukwa


Answer: c
Explanation: In the context of Hadoop, Avro can be used to pass data from one
program or language to another.
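
A small sketch of Avro's Java API (the record schema and field names here are invented for illustration):

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class AvroExample {
    public static void main(String[] args) {
        // An inline schema; in practice schemas usually live in .avsc files.
        String schemaJson = "{\"type\":\"record\",\"name\":\"User\","
                + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}";
        Schema schema = new Schema.Parser().parse(schemaJson);
        // A generic record conforming to the schema; Avro can serialize it
        // for RPC or for passing data between programs and languages.
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "Ada");
        System.out.println(user);
    }
}
```
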
Hadoop Questions and Answers Part-4
1. A ________ node acts as the Slave and is responsible for executing a Task
assigned to it by the JobTracker.
a) MapReduce
b) Mapper
c) TaskTracker
d) JobTracker

Answer: c
Explanation: The TaskTracker receives the information necessary to execute a Task from the JobTracker, executes the Task, and sends the results back to the JobTracker.

2. Point out the correct statement.
a) MapReduce tries to place the data and the compute as close as possible
b) Map Task in MapReduce is performed using the Mapper() function
c) Reduce Task in MapReduce is performed using the Map() function
d) All of the mentioned



Answer: a
Explanation: This feature of MapReduce is “Data Locality”.

3. ___________ part of the MapReduce is responsible for processing one or more chunks of data and producing the output results.
a) Maptask
b) Mapper
c) Task execution
d) All of the mentioned


Answer: a
Explanation: Map Task in MapReduce is performed using the Map() function.
4. _________ function is responsible for consolidating the results produced by each
of the Map() functions/tasks.
a) Reduce
b) Map
c) Reducer
d) All of the mentioned

Answer: a
Explanation: The Reduce function collates the results produced by the Map tasks into the final output.
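
A minimal sketch of such a consolidating Reduce function, using the newer org.apache.hadoop.mapreduce API (the word-count use case and class name are illustrative):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums all values emitted for a key, e.g. word counts from many map tasks.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```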

5. Point out the wrong statement.


a) A MapReduce job usually splits the input data-set into independent chunks which
are processed by the map tasks in a completely parallel manner
b) The MapReduce framework operates exclusively on <key, value> pairs
c) Applications typically implement the Mapper and Reducer interfaces to provide the
map and reduce methods
d) None of the mentioned


Answer: d
Explanation: The MapReduce framework takes care of scheduling tasks, monitoring
them and re-executes the failed tasks.

6. Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in ____________
a) Java
b) C
c) C#
d) None of the mentioned

Answer: a
Explanation: Hadoop Pipes is a SWIG-compatible C++ API for implementing MapReduce applications (not based on JNI™).
7. ________ is a utility which allows users to create and run jobs with any
executables as the mapper and/or the reducer.
a) Hadoop Strdata
b) Hadoop Streaming
c) Hadoop Stream
d) None of the mentioned


Answer: b
Explanation: Hadoop streaming is one of the most important utilities in the Apache
Hadoop distribution.

8. _________ maps input key/value pairs to a set of intermediate key/value pairs.


a) Mapper
b) Reducer
c) Both Mapper and Reducer
d) None of the mentioned


Answer: a
Explanation: Maps are the individual tasks that transform input records into
intermediate records.
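
A minimal sketch of a Mapper in the same API, transforming each input line into intermediate (word, 1) pairs that the reducer sketch above could consolidate (the class name is illustrative):

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Transforms each input record (offset, line) into intermediate (word, 1) pairs.
public class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}
```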

9. The number of maps is usually driven by the total size of ____________


a) inputs
b) outputs
c) tasks
d) None of the mentioned


Answer: a
Explanation: Total size of inputs means the total number of blocks of the input files.

10. _________ is the default Partitioner for partitioning key space.


a) HashPar
b) Partitioner
c) HashPartitioner
d) None of the mentioned

Answer: c
Explanation: The default partitioner in Hadoop is the HashPartitioner, which uses its getPartition method to assign each key to a reduce task.
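
The stock HashPartitioner computes partitions essentially as follows (a standalone sketch of the same logic; the class name is illustrative):

```java
import org.apache.hadoop.mapreduce.Partitioner;

// Essentially how Hadoop's HashPartitioner assigns a key to a reduce task:
// mask off the sign bit, then take the hash modulo the number of reducers.
public class SimpleHashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```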

Hadoop Questions and Answers Part-5


1. Streaming supports streaming command options as well as _________ command
options.
a) generic
b) tool
c) library
d) task


Answer: a
Explanation: Place the generic options before the streaming options, otherwise the
command will fail.

2. Point out the correct statement.
a) You can specify any executable as the mapper and/or the reducer
b) You cannot supply a Java class as the mapper and/or the reducer
c) The class you supply for the output format should return key/value pairs of Text class
d) All of the mentioned



Answer: a
Explanation: If you do not specify an input format class, the TextInputFormat is used
as the default.

3. Which of the following Hadoop streaming command option parameter is required?


a) output directoryname
b) mapper executable
c) input directoryname
d) all of the mentioned

Answer: d
Explanation: The input directory, output directory, and mapper executable are all required parameters for a streaming job.

4. To set an environment variable in a streaming command use ____________


a) -cmden EXAMPLE_DIR=/home/example/dictionaries/
b) -cmdev EXAMPLE_DIR=/home/example/dictionaries/
c) -cmdenv EXAMPLE_DIR=/home/example/dictionaries/
d) -cmenv EXAMPLE_DIR=/home/example/dictionaries/

Answer: c
Explanation: An environment variable is passed to streaming tasks with the -cmdenv option.

5. Point out the wrong statement.


a) Hadoop has a library package called Aggregate
b) Aggregate allows you to define a mapper plugin class that is expected to generate
“aggregatable items” for each input key/value pair of the mappers
c) To use Aggregate, simply specify “-mapper aggregate”
d) None of the mentioned

Answer: c
Explanation: To use Aggregate, specify "-reducer aggregate" rather than "-mapper aggregate".

6. The ________ option allows you to copy jars locally to the current working
directory of tasks and automatically unjar the files.
a) archives
b) files
c) task
d) none of the mentioned

Answer: a
Explanation: The -archives option is also a generic option, so it must be placed before the streaming options.

7. ______________ class allows the Map/Reduce framework to partition the map outputs based on certain key fields, not the whole keys.
a) KeyFieldPartitioner
b) KeyFieldBasedPartitioner
c) KeyFieldBased
d) None of the mentioned


Answer: b
Explanation: The primary key is used for partitioning, and the combination of the
primary and secondary keys is used for sorting.

8. Which of the following class provides a subset of features provided by the Unix/GNU Sort?
a) KeyFieldBased
b) KeyFieldComparator
c) KeyFieldBasedComparator
d) All of the mentioned


Answer: c
Explanation: Hadoop has a library class, KeyFieldBasedComparator, that is useful
for many applications.

9. Which of the following class is provided by the Aggregate package?


a) Map
b) Reducer
c) Reduce
d) None of the mentioned



Answer: b
Explanation: Aggregate provides a special reducer class and a special combiner
class, and a list of simple aggregators that perform aggregations such as “sum”,
“max”, “min” and so on over a sequence of values.

10. Hadoop has a library class, org.apache.hadoop.mapred.lib.FieldSelectionMapReduce, that effectively allows you to process text data like the unix ______ utility.
a) Copy
b) Cut
c) Paste
d) Move


Answer: b
Explanation: The map function defined in the class treats each input key/value pair
as a list of fields.

Hadoop Questions and Answers Part-6


1. Mapper implementations are passed the JobConf for the job via the ________
method.
a) JobConfigure.configure
b) JobConfigurable.configure
c) JobConfigurable.configurable
d) None of the mentioned

Answer: b
Explanation: Mapper implementations override the JobConfigurable.configure(JobConf) method to initialize themselves.

2. Point out the correct statement.


a) Applications can use the Reporter to report progress
b) The Hadoop MapReduce framework spawns one map task for each InputSplit
generated by the InputFormat for the job
c) The intermediate, sorted outputs are always stored in a simple (key-len, key,
value-len, value) format
d) All of the mentioned



Answer: d
Explanation: Reporters can be used to set application-level status messages and
update Counters.

3. Input to the _______ is the sorted output of the mappers.


a) Reducer
b) Mapper
c) Shuffle
d) All of the mentioned


Answer: a
Explanation: In the Shuffle phase the framework fetches the relevant partition of the
output of all the mappers, via HTTP.

4. The right number of reduces seems to be ____________


a) 0.90
b) 0.80
c) 0.36
d) 0.95

Answer: d
Explanation: The right number of reduces seems to be 0.95 or 1.75 multiplied by (number of nodes × mapred.tasktracker.reduce.tasks.maximum).
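
A tiny worked example of that rule of thumb (the cluster size and per-node slot count are hypothetical):

```java
public class ReduceCount {
    public static void main(String[] args) {
        int nodes = 20;              // hypothetical cluster size
        int reduceSlotsPerNode = 2;  // hypothetical mapred.tasktracker.reduce.tasks.maximum
        // 0.95: all reduces launch at once and finish in a single wave.
        int oneWave = (int) (0.95 * nodes * reduceSlotsPerNode);
        // 1.75: faster nodes run multiple waves, improving load balancing.
        int balanced = (int) (1.75 * nodes * reduceSlotsPerNode);
        System.out.println(oneWave + " or " + balanced + " reduce tasks");
    }
}
```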

5. Point out the wrong statement.


a) Reducer has 2 primary phases
b) Increasing the number of reduces increases the framework overhead, but
increases load balancing and lowers the cost of failures
c) It is legal to set the number of reduce-tasks to zero if no reduction is desired
d) The framework groups Reducer inputs by keys (since different mappers may have
output the same key) in the sort stage

Answer: a
Explanation: Reducer has 3 primary phases: shuffle, sort and reduce.

6. The output of the _______ is not sorted in the Mapreduce framework for Hadoop.
a) Mapper
b) Cascader
c) Scalding
d) None of the mentioned


Answer: d
Explanation: The output of the reduce task is typically written to the FileSystem. The
output of the Reducer is not sorted.

7. Which of the following phases occur simultaneously?


a) Shuffle and Sort
b) Reduce and Sort
c) Shuffle and Map
d) All of the mentioned


Answer: a
Explanation: The shuffle and sort phases occur simultaneously; while map-outputs
are being fetched they are merged.

8. Mapper and Reducer implementations can use the ________ to report progress or
just indicate that they are alive.
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned



Answer: c
Explanation: Reporter is a facility for MapReduce applications to report progress, set
application-level status messages and update Counters.

9. _________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer.
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned


Answer: b
Explanation: Hadoop MapReduce comes bundled with a library of generally useful
mappers, reducers, and partitioners.

10. _________ is the primary interface for a user to describe a MapReduce job to
the Hadoop framework for execution.
a) Map Parameters
b) JobConf
c) MemoryConf
d) None of the above


Answer: b
Explanation: JobConf represents a MapReduce job configuration.
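
A minimal sketch of describing a job via JobConf using the old org.apache.hadoop.mapred API (the identity mapper/reducer are placeholders and the paths come from the command line):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class JobConfExample {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(JobConfExample.class);
        conf.setJobName("identity-passthrough");
        // JobConf describes every facet of the job to the framework.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```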

Hadoop Questions and Answers Part-7


1. _______ systems are scale-out file-based (HDD) systems moving to more uses of
memory in the nodes.
a) NoSQL
b) NewSQL
c) SQL
d) All of the mentioned

Answer: a
Explanation: NoSQL systems make the most sense whenever the application is
based on data with varying data types and the data can be stored in key-value
notation.

2. Point out the correct statement.


a) Hadoop is ideal for the analytical, post-operational, data-warehouse-ish type of
workload
b) HDFS runs on a small cluster of commodity-class nodes
c) NEWSQL is frequently the collection point for big data
d) None of the mentioned

Answer: a
Explanation: Hadoop and a relational data warehouse together can form a very effective data warehouse infrastructure.

3. Hadoop data is not sequenced and is in 64MB to 256MB block sizes of delimited
record values with schema applied on read based on ____________
a) HCatalog
b) Hive
c) Hbase
d) All of the mentioned

Answer: a
Explanation: Other means of tagging the values can also be used.

4. __________ are highly resilient and eliminate the single-point-of-failure risk with
traditional Hadoop deployments.
a) EMR
b) Isilon solutions
c) AWS
d) None of the mentioned

Answer: b
Explanation: The Isilon solution also provides enterprise data protection and security options, including file system auditing and data-at-rest encryption, to address compliance requirements.

5. Point out the wrong statement.
a) EMC Isilon Scale-out Storage Solutions for Hadoop combine a powerful yet simple and highly efficient storage platform
b) Isilon native HDFS integration means you can avoid the need to invest in a separate Hadoop infrastructure
c) NoSQL systems do provide high latency access and accommodate less concurrent users
d) None of the mentioned



Answer: c
Explanation: NoSQL systems do provide low latency access and accommodate
many concurrent users.

6. HDFS and NoSQL file systems focus almost exclusively on adding nodes to
____________
a) Scale out
b) Scale up
c) Both Scale out and up
d) None of the mentioned


Answer: a
Explanation: HDFS and NoSQL file systems focus almost exclusively on adding
nodes to increase performance (scale-out) but even they require node configuration
with elements of scale up.

7. Which is the most popular NoSQL database for scalable big data store with
Hadoop?
a) Hbase
b) MongoDB
c) Cassandra
d) None of the mentioned


Answer: a
Explanation: HBase is the Hadoop database: a distributed, scalable Big Data store
that lets you host very large tables — billions of rows multiplied by millions of
columns — on clusters built with commodity hardware.

8. The ___________ can also be used to distribute both jars and native libraries for
use in the map and/or reduce tasks.
a) DataCache
b) DistributedData
c) DistributedCache
d) All of the mentioned


Answer: c
Explanation: The child-jvm always has its current working directory added to the
java.library.path and LD_LIBRARY_PATH.
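
A short sketch of the old-API DistributedCache calls (all paths and the class name are invented for illustration):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class CacheSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Ship a side file to every task's working directory.
        DistributedCache.addCacheFile(new URI("/apps/lookup.dat"), conf);
        // Add a jar to the task classpath.
        DistributedCache.addFileToClassPath(new Path("/apps/lib/helper.jar"), conf);
        // Ship a native library archive; it is unpacked on each node.
        DistributedCache.addCacheArchive(new URI("/apps/native-libs.zip"), conf);
    }
}
```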

9. HBase provides ___________ like capabilities on top of Hadoop and HDFS.


a) TopTable
b) BigTop
c) Bigtable
d) None of the mentioned


Answer: c
Explanation: Google Bigtable leverages the distributed data storage provided by the
Google File System.

10. __________ refers to incremental costs with no major impact on solution design,
performance and complexity.
a) Scale-out
b) Scale-down
c) Scale-up
d) None of the mentioned


Answer: c
Explanation: Adding more CPU/RAM/Disk capacity to Hadoop DataNode that is
already part of a cluster does not require additional network switches.

Hadoop Questions and Answers Part-8


1. A ________ serves as the master and there is only one NameNode per cluster
a) Data Node
b) NameNode
c) Data block
d) Replication


Answer: b
Explanation: All the metadata related to HDFS including the information about data
nodes, files stored on HDFS, and Replication, etc. are stored and maintained on the
NameNode.

2. Point out the correct statement.
a) DataNode is the slave/worker node and holds the user data in the form of Data Blocks
b) Each incoming file is broken into 32 MB by default
c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance
d) None of the mentioned



Answer: a
Explanation: There can be any number of DataNodes in a Hadoop Cluster.

3. HDFS works in a __________ fashion.


a) master-worker
b) master-slave
c) worker/slave
d) all of the mentioned

Answer: a
Explanation: The NameNode serves as the master and each DataNode serves as a worker/slave.

4. _______ NameNode is used when the Primary NameNode goes down


a) Rack
b) Data
c) Secondary
d) None of the mentioned

Answer: c
Explanation: The Secondary NameNode is used to provide availability and reliability.

5. Point out the wrong statement.


a) Replication Factor can be configured at a cluster level (Default is set to 3) and
also at a file level
b) Block Report from each DataNode contains a list of all the blocks that are stored
on that DataNode
c) User data is stored on the local file system of DataNodes
d) DataNode is aware of the files to which the blocks stored on it belong to

Answer: d
Explanation: It is the NameNode, not the DataNode, that is aware of the files to which the stored blocks belong.

6. Which of the following scenario may not be a good fit for HDFS?
a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the
same file
b) HDFS is suitable for storing data related to applications requiring low latency data
access
c) HDFS is suitable for storing data related to applications requiring high latency data
access
d) None of the mentioned


Answer: a
Explanation: HDFS can be used for storing archive data since it is cheaper as HDFS
allows storing the data on low cost commodity hardware while ensuring a high
degree of fault-tolerance.

7. The need for data replication can arise in various scenarios like ____________
a) Replication Factor is changed
b) DataNode goes down
c) Data Blocks get corrupted
d) All of the mentioned


Answer: d
Explanation: Data is replicated across different DataNodes to ensure a high degree
of fault-tolerance.

8. _______ is the slave/worker node and holds the user data in the form of Data
Blocks.
a) DataNode
b) NameNode
c) Data block
d) Replication

Answer: a
Explanation: A DataNode stores data in the Hadoop file system. A functional filesystem has more than one DataNode, with data replicated across them.

9. HDFS is implemented in _____________ programming language.


a) C++
b) Java
c) Scala
d) None of the mentioned


Answer: b
Explanation: HDFS is implemented in Java and any computer which can run Java
can host a NameNode/DataNode on it.

10. For YARN, the ___________ Manager UI provides host and port information.
a) Data Node
b) NameNode
c) Resource
d) Replication

Answer: c
Explanation: For YARN, the ResourceManager UI provides host and port information.

Hadoop Questions and Answers Part-9


1. For ________ the HBase Master UI provides information about the HBase Master
uptime.
a) HBase
b) Oozie
c) Kafka
d) All of the mentioned


Answer: a
Explanation: HBase Master UI provides information about the number of live, dead
and transitional servers, logs, ZooKeeper information, debug dumps, and thread
stacks.
2. During start up, the ___________ loads the file system state from the fsimage and
the edits log file.
a) DataNode
b) NameNode
c) ActionNode
d) None of the mentioned

Answer: b
Explanation: The NameNode reconstructs the file system state by loading the fsimage and replaying the edits log.

3. In order to read any file in HDFS, instance of __________ is required.


a) filesystem
b) datastream
c) outstream
d) inputstream

Answer: a
Explanation: An instance of FileSystem is required; its open() method returns an FSDataInputStream for reading the file.

4. Point out the correct statement.


a) The framework groups Reducer inputs by keys
b) The shuffle and sort phases occur simultaneously i.e. while outputs are being
fetched they are merged
c) Since JobConf.setOutputKeyComparatorClass(Class) can be used to control how
intermediate keys are grouped, these can be used in conjunction to simulate
secondary sort on values
d) All of the mentioned


Answer: d
Explanation: If equivalence rules for keys while grouping the intermediates are
different from those for grouping keys before reduction, then one may specify a
Comparator.
5. ______________ is method to copy byte from input stream to any other stream in
Hadoop.
a) IOUtils
b) Utils
c) IUtils
d) All of the mentioned

Answer: a
Explanation: The IOUtils class provides static utility methods, such as copyBytes, for copying bytes between streams.

6. _____________ is used to read data from bytes buffers.


a) write()
b) read()
c) readwrite()
d) all of the mentioned

Answer: a
Explanation: The readFully method can also be used instead of the read method.
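
Tying questions 3, 5 and 6 together, a minimal sketch of reading an HDFS file (the path and class name are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsCat {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // A FileSystem instance is required to read any file in HDFS.
        FileSystem fs = FileSystem.get(conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path("/data/sample.txt"));
            // copyBytes copies from the input stream to any other stream.
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
```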

7. Point out the wrong statement.


a) The framework calls reduce method for each <key, (list of values)> pair in the
grouped inputs
b) The output of the Reducer is re-sorted
c) reduce method reduces values for a given key
d) None of the mentioned


Answer: b
Explanation: The output of the Reducer is not re-sorted.

8. Interface ____________ reduces a set of intermediate values which share a key to a smaller set of values.
a) Mapper
b) Reducer
c) Writable
d) Readable


Answer: b
Explanation: Reducer implementations can access the JobConf for the job.

9. Reducer is input the grouped output of a ____________


a) Mapper
b) Reducer
c) Writable
d) Readable

Answer: a
Explanation: In the shuffle phase the framework fetches, for each Reducer, the relevant partition of the output of all the Mappers via HTTP.

10. The output of the reduce task is typically written to the FileSystem via
____________
a) OutputCollector
b) InputCollector
c) OutputCollect
d) All of the mentioned


Answer: a
Explanation: In reduce phase the reduce(Object, Iterator, OutputCollector, Reporter)
method is called for each pair in the grouped inputs.
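
A sketch of an old-API Reducer showing the OutputCollector in exactly that reduce signature (the summing use case and class name are illustrative):

```java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class OldApiSumReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    // Called once per <key, (list of values)> pair in the grouped inputs;
    // results reach the FileSystem via the OutputCollector.
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
```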

Hadoop Questions and Answers Part-10


1. Applications can use the _________ provided to report progress or just indicate
that they are alive.
a) Collector
b) Reporter
c) Dashboard
d) None of the mentioned


Answer: b
Explanation: In scenarios where the application takes a significant amount of time to
process individual key/value pairs, this is crucial since the framework might assume
that the task has timed-out and kill that task.

2. Which of the following parameter is to collect keys and combined values?


a) key
b) values
c) reporter
d) output


Answer: d
Explanation: The reporter parameter is for a facility to report progress.

3. ________ is the name of the archive you would like to create.


a) archive
b) archiveName
c) name
d) none of the mentioned


Answer: b
Explanation: The name should have a *.har extension.

4. Point out the correct statement.


a) A Hadoop archive maps to a file system directory
b) Hadoop archives are special format archives
c) A Hadoop archive always has a *.har extension
d) All of the mentioned


Answer: d
Explanation: A Hadoop archive directory contains metadata (in the form of _index
and _masterindex) and data (part-*) files.

5. Using Hadoop Archives in __________ is as easy as specifying a different input filesystem than the default file system.
a) Hive
b) Pig
c) MapReduce
d) All of the mentioned

Answer: c
Explanation: Because Hadoop Archives are exposed as a file system, MapReduce is able to use all the logical input files in Hadoop Archives as input.

6. The __________ guarantees that excess resources taken from a queue will be
restored to it within N minutes of its need for them.
a) capacitor
b) scheduler
c) datanode
d) none of the mentioned


Answer: b
Explanation: Free resources can be allocated to any queue beyond its guaranteed
capacity.

7. Point out the wrong statement.


a) The Hadoop archive exposes itself as a file system layer
b) Hadoop archives are immutable
c) Archive rename, deletes and creates return an error
d) None of the mentioned


Answer: d
Explanation: All the fs shell commands in the archives work but with a different URI.

8. _________ is a pluggable Map/Reduce scheduler for Hadoop which provides a way to share large clusters.
a) Flow Scheduler
b) Data Scheduler
c) Capacity Scheduler
d) None of the mentioned


Answer: c
Explanation: The Capacity Scheduler supports multiple queues, where a job is
submitted to a queue.

9. Which of the following parameters describes the destination directory that would contain the archive?
a) -archiveName <name>
b) <source>
c) <destination>
d) none of the mentioned


Answer: c
Explanation: -archiveName <name> is the name of the archive to be created.

10. _________ identifies filesystem path names which work as usual with regular
expressions.
a) -archiveName <name>
b) <source>
c) <destination>
d) none of the mentioned

Answer: d
Explanation: <destination> identifies the destination directory that would contain the archive.
