Week 0 To 8 Assignment
1 point
1 point
Hadoop
SQL
Python
Excel
1 point
1 point
Java
Python
C++
Ruby
1 point
Data storage
Data querying
Data modeling
1 point
YARN
HDFS
MapReduce
Pig
1 point
Data storage
Data processing
Configuration management
Data visualization
1 point
128 MB
256 MB
64 MB
512 MB
1 point
Data storage
Data processing
Configuration management
Data visualization
Week 1: Assignment 1
1 point
Accepted Answers:
Data that is collected from multiple sources and is of high variety, volume,
and velocity
1 point
MySQL
Hadoop
Excel
SQLite
Accepted Answers:
Hadoop
1 point
Accepted Answers:
1 point
YARN
HDFS
MapReduce
Pig
HDFS
1 point
Which Hadoop ecosystem tool is primarily used for querying and analyzing
large datasets stored in Hadoop's distributed storage?
HBase
Hive
Kafka
Sqoop
Accepted Answers:
Hive
1 point
NodeManager
ResourceManager
ApplicationMaster
DataNode
Accepted Answers:
NodeManager
1 point
Accepted Answers:
1 point
Accepted Answers:
1 point
connectedComponents
triangleCount
shortestPaths
pageRank
Accepted Answers:
triangleCount
1 point
HDFS Namenode
TaskTracker
YARN ResourceManager
DataNode
Accepted Answers:
TaskTracker
Week 2: Assignment 2
1 point
Which statement best describes the data storage model used by HBase?
Key-value pairs
Document-oriented
Encryption
Relational tables
Accepted Answers:
Key-value pairs
1 point
What is Apache Avro primarily used for in the context of Big Data?
Data serialization
Machine learning
Database management
Accepted Answers:
Data serialization
1 point
NameNode
DataNode
Secondary NameNode
ResourceManager
Accepted Answers:
DataNode
1 point
Partitioning
Compression
Replication
Encryption
Accepted Answers:
Replication
1 point
Mapper
Reducer
Partitioner
Combiner
Accepted Answers:
Partitioner
1 point
Accepted Answers:
1 point
Aggregating results
Accepted Answers:
Aggregating results
1 point
Image rendering
Accepted Answers:
1 point
PageRank algorithm
K-means clustering
Word count
Recommender system
Accepted Answers:
Word count
1 point
Accepted Answers:
1 point
DataFrame
Dataset
Spark SQL
Accepted Answers:
1 point
Spark Streaming
Spark SQL
RDDs
Accepted Answers:
Spark SQL
1 point
Which statements about Cassandra and its snitches are correct?
Statement 1: In Cassandra, during a write operation, when hinted handoff
is enabled and any replica is down, the coordinator writes to all other
replicas and keeps the write locally until the down replica comes back up.
Statement 2: In Cassandra, Ec2Snitch is an important snitch for
deployments; it is a simple snitch for Amazon EC2 deployments where all
nodes are in a single region. In Ec2Snitch, the region name refers to the
data center, and the availability zone refers to the rack in a cluster.
Accepted Answers:
1 point
GraphX
MLlib
Spark SQL
Spark R
Accepted Answers:
Spark SQL
1 point
Apache Hadoop
Apache Spark
Apache HBase
Apache Pig
Accepted Answers:
Apache HBase
1 point
The primary Machine Learning API for Spark is now the _____ based API.
DataFrame
Dataset
RDD
Accepted Answers:
DataFrame
1 point
Accepted Answers:
1 point
Which DAG action in Apache Spark triggers the execution of all previously
defined transformations in the DAG and returns the count of elements in
the resulting RDD or DataFrame?
collect()
count()
take()
first()
Accepted Answers:
count()
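The idea behind this question, that transformations are only recorded and an action such as count() is what actually runs them, can be sketched without a Spark installation. The snippet below is a toy stand-in written in plain Python, not the Spark API: the class LazyPipeline, the helper square, and the calls list are all hypothetical names introduced for illustration.

```python
class LazyPipeline:
    """Toy stand-in for an RDD: transformations are recorded, not executed."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: returns a new pipeline with one more recorded step.
        return LazyPipeline(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only recorded.
        return LazyPipeline(self._source, self._steps + (("filter", pred),))

    def _run(self):
        # Push each source element through every recorded step.
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                yield item

    def count(self):
        # Action: triggers execution and returns the number of results.
        return sum(1 for _ in self._run())


calls = []  # records every time the mapped function really runs


def square(x):
    calls.append(x)
    return x * x


pipeline = LazyPipeline(range(1, 6)).map(square).filter(lambda v: v % 2 == 1)
calls_before_action = len(calls)  # nothing has executed yet
n = pipeline.count()              # the action runs the whole pipeline
```

Here calls stays empty until count() is invoked, which mirrors how the accepted answer distinguishes an action from the transformations defined before it.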
1 point
Graph processing
Accepted Answers:
1 point
Batch
Window
Micro-batch
Record
Accepted Answers:
Micro-batch
Week 4: Assignment 4
1 point
Accepted Answers:
1 point
Accepted Answers:
1 point
Consistency
Availability
Partition tolerance
Latency tolerance
Accepted Answers:
Partition tolerance
1 point
What consistency level in Apache Cassandra ensures that a write
operation is acknowledged only after the write has been successfully
written to all replicas?
ONE
LOCAL_ONE
LOCAL_QUORUM
ALL
Accepted Answers:
ALL
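As a rough illustration of how the listed consistency levels differ, the sketch below computes how many replica acknowledgements a write waits for. It assumes a single data center (so the LOCAL_ variants behave like their global counterparts) and uses the usual quorum rule of floor(RF / 2) + 1; the function name required_acks is hypothetical and is not part of any Cassandra driver.

```python
def required_acks(level, replication_factor):
    """Number of replica acknowledgements a write waits for.

    Assumes one data center, so LOCAL_ONE and LOCAL_QUORUM are treated
    like ONE and QUORUM. This is an illustrative model only.
    """
    quorum = replication_factor // 2 + 1
    table = {
        "ONE": 1,
        "LOCAL_ONE": 1,
        "QUORUM": quorum,
        "LOCAL_QUORUM": quorum,
        "ALL": replication_factor,
    }
    if level not in table:
        raise ValueError(f"unsupported consistency level: {level}")
    return table[level]


# With a replication factor of 3, ALL waits for every replica.
acks_all = required_acks("ALL", 3)
acks_quorum = required_acks("QUORUM", 3)
```

Under these assumptions, ALL is the only level whose requirement always equals the replication factor, which is why it is the accepted answer.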
1 point
Accepted Answers:
1 point
Master
Region
Zookeeper
Accepted Answers:
Zookeeper
1 point
Data storage
Data processing
Configuration management
Data visualization
Accepted Answers:
Configuration management
1 point
CQL is a language used for creating and managing tables and querying
data in Apache Cassandra
CQL is a scripting language used for data transformation tasks in
Cassandra
Accepted Answers:
CQL is a language used for creating and managing tables and querying
data in Apache Cassandra
1 point
Consistency
Accessibility
Partition tolerance
Atomicity
Accepted Answers:
Partition tolerance
1 point
Accepted Answers:
To mark data that has been logically deleted
Week 5: Assignment 5
1 point
MLlib
GraphX
Spark streaming
ALL
Accepted Answers:
GraphX
1 point
Which of the following frameworks is best suited for fast, in-memory data
processing and supports advanced analytics such as machine learning and
graph processing?
Apache Flink
Apache Storm
Apache Spark
Accepted Answers:
Apache Spark
1 point
Apache Spark
Apache Storm
Hadoop MapReduce
Apache Flume
Accepted Answers:
Apache Spark
1 point
Accepted Answers:
1 point
log aggregation
compaction
collection
all of the mentioned
Accepted Answers:
log aggregation
1 point
In-memory computation
Lazy-evaluation
Lineage information
Accepted Answers:
Lineage information
1 point
Accepted Answers:
Pig Latin provides a procedural data flow language for ETL tasks.
Accepted Answers:
Pig Latin provides a procedural data flow language for ETL tasks.
1 point
Apache HBase for storing student data and Apache Pig for processing.
Apache Kafka for data streaming and Apache Storm for real-time
analytics.
Hadoop MapReduce for batch processing and Apache Hive for querying.
Apache Spark for data processing and Apache Hadoop for storage.
Accepted Answers:
Apache Spark for data processing and Apache Hadoop for storage.
1 point
Hadoop MapReduce
Apache Kafka
Apache Spark
Apache Hive
Accepted Answers:
Apache Spark
Week 6: Assignment 6
1 point
Block Report from each DataNode contains a list of all the blocks that are
stored on that DataNode
Accepted Answers:
1 point
Boosting
Bagging
Pruning
Neural networks
Accepted Answers:
Bagging
1 point
S2: Random Forest is used for regression whereas Gradient Boosting is used
for classification tasks
S1 and S2
S2 and S4
S3 and S4
S1 and S4
Accepted Answers:
S1 and S4
1 point
In the context of K-means clustering with MapReduce, what role does the
Map phase play in handling very large datasets?
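For context, one common way to formulate a single K-means iteration in MapReduce style is sketched below. The answer options for this question did not survive in the text, so this is a general illustration rather than the course's own wording: the map step assigns each point to its nearest centroid, and the reduce step averages the points in each cluster. The function names kmeans_map, kmeans_reduce, and kmeans_iteration are hypothetical, and everything runs in one process rather than on a cluster.

```python
from collections import defaultdict


def kmeans_map(point, centroids):
    """Map step: emit (index of nearest centroid, point)."""
    distances = [
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for centroid in centroids
    ]
    return distances.index(min(distances)), point


def kmeans_reduce(cluster_points):
    """Reduce step: the new centroid is the mean of the assigned points."""
    n = len(cluster_points)
    return tuple(sum(coords) / n for coords in zip(*cluster_points))


def kmeans_iteration(points, centroids):
    """Run one map/shuffle/reduce round in a single process."""
    groups = defaultdict(list)
    for point in points:
        key, value = kmeans_map(point, centroids)
        groups[key].append(value)  # stands in for the shuffle phase
    # Keep the old centroid if no point was assigned to it.
    return [
        kmeans_reduce(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
new_centroids = kmeans_iteration(points, [(0, 0), (10, 10)])
```

Because each point is assigned independently of the others, the map step can be spread across many machines, which is what makes this formulation attractive for very large datasets.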
1 point
Accepted Answers:
1 point
Manhattan Distance
Cosine Similarity
Jaccard Similarity
Hamming Distance
Accepted Answers:
Cosine Similarity
1 point
Minkowski distance
Cosine similarity
Manhattan distance
Euclidean distance
Accepted Answers:
Manhattan distance
1 point
Accepted Answers:
1 point
To ensure that every data point is used for training only once
Accepted Answers:
1 point
Which of the following steps is NOT typically part of the machine learning
process?
Data Collection
Model Training
Model Deployment
Data Encryption
Accepted Answers:
Data Encryption
HADOOP MAPREDUCE 2.0 (PART-II)
MAPREDUCE EXAMPLES
INTRODUCTION TO SPARK
CAP THEOREM
CONSISTENCY SOLUTIONS
DESIGN OF ZOOKEEPER
DESIGN OF HBASE
INTRODUCTION TO KAFKA
PARAMETER SERVERS