CCS334 -BDA -QB - SEC A


SEMESTER: V ‘A’ BRANCH: CSE

BIG DATA ANALYTICS

Question Bank
(ODD Semester 2023-24)

Prepared by
Mr. N. KARTHIK, [AP/CSE]
Department of Computer Science and Engineering,
Kingston Engineering College.
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
QUESTION BANK (R2021)
Year/Sem: III / V SEM Batch: 2021-25
Subject Code/Name: CCS334 – BIG DATA ANALYTICS

UNIT I UNDERSTANDING BIG DATA


Introduction to big data – convergence of key trends – unstructured data – industry examples of big
data – web analytics – big data applications – big data technologies – introduction to Hadoop – open
source technologies – cloud and big data – mobile business intelligence – Crowd sourcing analytics –
inter and trans firewall analytics.

Part- A

1. What is data science?
2. What is data? [APR/MAY 2019]
3. Define structured data.
4. What is unstructured data?
5. What is machine-generated data?
6. Define streaming data.
7. Explain the NIST definition of cloud computing. [NOV/DEC 2019]
8. How is big data used in marketing?
9. What is a web log file? [NOV/DEC 2021]
10. What are web browsers? [NOV/DEC 2022]
11. What is a web crawler?
12. Define a focused crawler.
13. Define a firewall. [APR/MAY 2020]
14. What are the characteristics of a firewall?
15. What is Hadoop? [NOV/DEC 2020] [MAY/JUNE 2015] [NOV/DEC 2021]
16. Why is Hadoop important?
17. What is Big Data? [APR/MAY 2021]
18. Difference between Data Science and Big Data.
19. What are the benefits of Big Data Processing?
20. List the Big Data challenges. [APRIL/MAY 2022]
21. Define Data Explosion.
22. Compare Cloud Computing and Big Data. [APR/MAY 2022]
23. List out the examples of machine generated unstructured data.
24. Difference between structured and Unstructured Data.
25. Give the Industry Examples of Big Data.
26. Define one-to-one marketing.
27. What is Web analytics? [NOV/DEC 2018]
28. Define software – utility.
29. Write short notes on Big Data Applications.
30. Define Cassandra.
31. Give the examples of Big Data technologies.
32. Define Apache Pig.
33. Define Apache Spark. [NOV/DEC 2018]
34. What is MongoDB?
35. Define HDFS.
36. List the key features of Hadoop.
37. What are the four main libraries in Hadoop?
38. List out the advantages of Hadoop.
39. Define open source software. [APR/MAY 2018]
40. What is meant by Proprietary software?
41. What are the successes of open source?
42. List the advantages of open source. [NOV/DEC 2021]
43. Write the Disadvantages of Open source.
44. What are the applications of Open source software?
45. Compare the Open source with close source / Proprietary software.
46. Difference between Cloud and Big Data.
47. What is meant by Out-sourcing?
48. Difference between Cloud computing and internet.
49. Difference between Mobile Analytics and Web Analytics.
50. What is Crowdsourcing?
51. What are the types of Crowdsourcing?
52. What are the characteristics of Firewall?
53. List the types of Firewall.
54. What is Big Data Analytics? [APRIL/MAY 2018]
55. Explain the 5 V's of Big Data. [NOV/DEC 2021]
PART B

1. Discuss Big Data in detail. Compare Data Science and Big Data, and explain the benefits
and challenges of Big Data. [APR/MAY 2022]
2. Explain the convergence of key trends in detail. [NOV/DEC 2022]
3. Discuss unstructured data and compare it with structured data.
4. Explain the industry examples of Big Data. [NOV/DEC 2020]
5. Elaborate on Web Analytics with examples.
6. Briefly explain the Big Data applications. [APR/MAY 2021]
7. Explain Big Data technologies.
8. Explain the Hadoop features, ecosystem, and advantages. [APR/MAY 2022]
9. Explain Open Source Technologies and the difference between Open Source and Open
Standards.
10. Explain i) Cloud and Big Data
ii) the difference between Cloud Computing and Big Data
iii) the difference between Cloud Computing and the Internet [APR/MAY 2022]
11. Explain i) mobile business intelligence
ii) the difference between Mobile Analytics and Web Analytics
12. Discuss Crowdsourcing Analytics in detail. [NOV/DEC 2021]
13. Explain Inter and Trans Firewall Analytics in detail.
14. Explain Big Data and Hadoop open source technologies. [NOV/DEC 2021]
15. Explain the characteristics of Big Data applications. [APRIL/MAY 2018]
16. Discuss industry examples of Big Data in detail. [APRIL/MAY 2019] [APRIL/MAY 2021]

UNIT II NOSQL DATA MANAGEMENT


Introduction to NoSQL – aggregate data models – key-value and document data
models – relationships – graph databases – schemaless databases – materialized views
– distribution models – master-slave replication – consistency – Cassandra –
Cassandra data model – Cassandra examples – Cassandra clients

PART A

1. What is consistency in a distributed system?
2. What is database sharding? [APR/MAY 2021]
3. Why are NoSQL databases known as schemaless databases?
4. What is the difference between sharding and replication?
5. How is Sharding different from partitioning?
6. What are write-write and read-write conflicts?
7. Define Cassandra. [NOV/DEC 2021]
8. What is the use of Bloom filters in Cassandra?
9. Define sorted strings table. [NOV/DEC 2020]
10. Define Cassandra data center.
11. List the advantages and disadvantages of graph data.
12. Define session consistency.
13. What are schemaless databases?
14. What is an SSTable? [APR/MAY 2020]
15. Define Coordinator node.
16. What is StorageProxy?
17. Define Keyspaces. [NOV/DEC 2019]
18. Define Tables.
19. Define Column families.
20. Define Node.
21. Define Data Center.
22. Define Cluster. [APR/MAY 2019]
23. What is meant by Commit log?
24. Define Memtable. [APR/MAY 2022]
25. What is Bloom filter?
26. List the features of Cassandra.
27. Define column-oriented database.
28. How are quorums used?
29. Define Optimistic. [NOV/DEC 2022]
30. Define Read-write conflict.
31. Difference between Replication and Sharding.
32. Define consistency.
33. What are the advantages of Sharding?
34. Define Data Partitioning.
35. Define Horizontal scaling. [APR/MAY 2023]
36. Define Data relationships.
37. What are the Terminologies in a document data store?
38. Define Column-based.
39. Comparison of SQL and NoSQL Databases.
40. Write the three properties of the CAP theorem.
41. What is a Graph database? [NOV/DEC 2021]
42. Explain the term Graph Analytics. [APRIL/MAY 2021]
43. What is a NoSQL database? [NOV/DEC 2018] [NOV/DEC 2021] [APRIL/MAY 2021]
44. Compare document stores and key-value stores. [NOV/DEC 2021]
45. Define tabular store. [APRIL/MAY 2018]
46. List out any three business challenges in an organization. [APRIL/MAY 2019]
47. Point out the aspects of adopting big data techniques. [NOV/DEC 2018] [NOV/DEC 2021]
48. Define object data stores. [APRIL/MAY 2018]

PART-B

1. Define NoSQL and explain it with an example and its advantages. [NOV/DEC 2022]
2. Explain Aggregate Data Models: key-value store, document-based, column-based, and
graph-based models, and the NoSQL document database MongoDB. [APR/MAY 2022]
3. Explain Schemaless Databases in detail.
4. Discuss Materialized Views. [NOV/DEC 2021]
5. Briefly explain the Distribution Models. [APR/MAY 2022]
6. Explain Consistency, Update Consistency, Read Consistency, Quorums, and
Relaxing Durability. [NOV/DEC 2019]
7. Explain Cassandra with its architecture, data model, and clients.
8. What is NoSQL? What are the advantages of NoSQL? Explain the types of NoSQL
databases. (13) [APRIL/MAY 2019] [NOV/DEC 2021]
9. With suitable examples, differentiate the applications, structure, working, and usage of
different NoSQL databases. (13) [NOV/DEC 2021]
10. What is the purpose of sharding? (6) [APRIL/MAY 2019] [APRIL/MAY 2021]
11. Explain the process of sharding in MongoDB. (7) [APRIL/MAY 2021]
12. Formulate how big data analytics helps business people increase their revenue. Discuss
it with any one real-time application. (15) [APRIL/MAY 2021]
13. Draw insights from any one visualization tool. (15) [NOV/DEC 2021]

UNIT III MAP REDUCE APPLICATIONS


MapReduce workflows – unit tests with MRUnit – test data and local tests – anatomy of
MapReduce job run – classic MapReduce – YARN – failures in classic MapReduce and
YARN – job scheduling – shuffle and sort – task execution – MapReduce types – input
formats – output formats.

PART A

1. Define MapReduce. [APRIL/MAY 2019] [APRIL/MAY 2021]
2. List the characteristics of MapReduce. [APRIL/MAY 2021]
3. What are the major responsibilities of YARN?
4. Why is YARN used?
5. What is fair scheduler?
6. List the failures of MapReduce. [APRIL/MAY 2021]
7. Explain First in First out (FIFO) scheduling.
8. Why does Hadoop work better with a small number of large files?
9. What is TextInputFormat?
10. What is Node Manager Failure in YARN?
11. Define Map.
12. Define Split.
13. Define Shuffle and Sort.
14. Define Reduce.
15. Define Combine.
16. What are the functions of Job Tracker and Task Tracker?
17. What are the limitations of MapReduce?
18. Define MRUnit.
19. How do you test Java MapReduce jobs in Hadoop?
20. What are the five independent entities involved in a MapReduce job run?
21. Define YARN.
22. What are the two major responsibilities in YARN?
23. Why is YARN used?
24. What are the components in YARN?
25. List the features of YARN.
26. Discuss the merits and demerits of YARN.
27. Difference between YARN and MapReduce.
28. What are the failures in classic MapReduce?
29. What is Fair Scheduler?
30. List the advantages of FIFO scheduler.
31. Difference between Fair and Capacity Scheduler.

PART B

1. Explain MapReduce characteristics, workflows, and dataflow, the functions of the Job Tracker
and Task Tracker, and the limitations of MapReduce. [APRIL/MAY 2021]
2. Describe Unit Tests with MRUnit.
3. Briefly discuss the Anatomy of a MapReduce Job Run.
4. Explain the merits and demerits of YARN.
5. Discuss the Failures in classic MapReduce and YARN. [APR/MAY 2019]
6. Explain Job Scheduling.
7. Discuss Shuffle and Sort.
8. Explain Task Execution.
9. Describe the MapReduce Types. [APR/MAY 2018]
10. Write notes on the MapReduce Programming Model. [APRIL/MAY 2019] [APRIL/MAY 2021]
11. Discuss industry examples of Big Data in detail. [APRIL/MAY 2019] [APRIL/MAY 2021]
12. Explain Big Data and Hadoop open source technologies. [NOV/DEC 2021]
13. Explain the characteristics of Big Data applications. [APRIL/MAY 2022]

UNIT IV BASICS OF HADOOP
Data format – analyzing data with Hadoop – scaling out – Hadoop streaming –
Hadoop pipes – design of Hadoop distributed file system (HDFS) – HDFS concepts –
Java interface – data flow – Hadoop I/O – data integrity – compression – serialization
– Avro – file-based data structures – Cassandra – Hadoop integration

PART A
1. Why do we need Hadoop streaming?
2. What is the Hadoop Distributed file system?
3. What is data locality optimization?
4. Why do map tasks write their output to the local disk, not to HDFS?
5. Why is a block in HDFS so large?
6. How do HDFS services support big data?
7. What if Writable did not exist in Hadoop?
8. Define serialization.
9. What are Writables? Explain their importance in Hadoop.
10. What happens if a client detects an error when reading a block in Hadoop?
11. What is MapFile?
12. What are Hadoop pipes?
13. Define Blocks.
14. Define Job tracker.
15. Define Task tracker.
16. What are input splits?
17. List the features of Hadoop Streaming.
18. Define DataNodes.
19. Define chunks.
20. List the design issues of HDFS.
21. List the goals of HDFS.
22. Define Checkpoint node.
23. Define Data queue.
24. Define raw file system.
25. List the benefits of compression.
26. Define Writable.
27. Define Writable Comparator.
28. List the primitive writable data types available in Hadoop.
29. Define sequence files.
30. Define lazy write-back caching.
PART B

1. Explain Data Format in detail.
2. Describe Hadoop Streaming.
3. Explain Hadoop Pipes in detail with a neat diagram.
4. Explain the design of the Hadoop Distributed File System with its architecture.
5. Describe Hadoop I/O.
6. Explain the file-based data structures.
7. Describe Cassandra-Hadoop integration.
8. Explain in detail the Hadoop Distributed File System (HDFS) and its functionality in
processing big data. [NOV/DEC 2021]

UNIT V HADOOP RELATED TOOLS


HBase – data model and implementations – HBase clients – HBase examples – praxis. Pig
– Grunt – Pig data model – Pig Latin – developing and testing Pig Latin scripts. Hive – data
types and file formats – HiveQL data definition – HiveQL data manipulation – HiveQL
queries.

PART A
1. What is HBase?
2. What is Hive?
3. What is Hive data definition?
4. Explain the services provided by ZooKeeper in HBase.
5. What is ZooKeeper?
6. What are the responsibilities of HMaster?
7. Where should HBase be used?
8. Explain the unique features of HBase.
9. Explain the data model in HBase.
10. What is the difference between Pig Latin and the Pig engine?
11. What is Pig storage?
12. What are the features of Hive?
13. List the features and applications of HBase.
14. Where should HBase be used?
15. Difference between HDFS and HBase.
16. Difference between HBase and Relational Database.
17. List the limitations of HBase.
18. What are the four components in the Pig Hadoop framework?
19. Define Parser.
20. Define Optimizer.
21. Define Compiler.
22. What are the advantages of Pig?

PART B

1. What is HBase? Draw the architecture of HBase. Explain the difference between HDFS and HBase.
[APR/MAY 2020]
2. i. Write a short note on the HBase client. [NOV/DEC 2020]
ii. What is Pig? Explain the features of Pig. Draw the architecture of Pig. [NOV/DEC 2019]
3. Explain HBase clients. [NOV/DEC 2022]
4. Explain praxis with examples. [APR/MAY 2022]
5. Describe Pig. Explain the features of Pig Hadoop. Explain the Pig Data Model.
[APR/MAY 2019]
6. Describe Hive with its architecture. List the data types and file formats. [NOV/DEC 2018]
7. Explain HiveQL Data Definition. [NOV/DEC 2021]
8. Explain HiveQL Data Manipulation in detail. [APR/MAY 2021]
9. Describe HiveQL Queries. [APR/MAY 2018]
10. Describe the system architecture and components of Hive and Hadoop. (13) [APRIL/MAY 2021]
11. What is HBase? Give a detailed note on the features of HBase. (13) [NOV/DEC 2018] [NOV/DEC 2021]
12. Explain the features and applications of HBase in detail. [APR/MAY 2022]
13. Difference between HDFS and HBase. [APR/MAY 2022]
14. Explain the limitations of HBase. [NOV/DEC 2022]
15. Compare HBase with relational databases. [APR/MAY 2019]
16. List the features of Pig Hadoop. [NOV/DEC 2021]
17. Describe the advantages and disadvantages of Pig. [APR/MAY 2018]
18. Explain the types in the Pig Data Model. [APR/MAY 2022]
19. Explain the Data types and File formats.
20. Draw the architecture of Hive and explain the Hive architecture in detail.
21. Describe developing and testing Pig Latin scripts.

Prepared by: Approved by


Verified by:
Mr. N. KARTHIK Dr. U.V ARIVAZHAGU
& Principal
Dr. U.V ARIVAZHAGU
Ms. A.MANJU
HOD/CSE
Asst.Prof/CSE
