Important Da

This document discusses various topics related to big data analytics including: 1. It defines big data and discusses the challenges of working with big data. 2. It discusses Hadoop and its ecosystem including HDFS, MapReduce, Hive, and YARN. 3. It covers data mining techniques and how they are used in various sectors like marketing, finance, manufacturing, and government. 4. It discusses analyzing data streams and algorithms for handling streaming data. 5. It provides examples of how to write MapReduce programs and queries in MongoDB.

Uploaded by

Priyadarshini


PART-A

UNIT-1

Define the term “Big Data”.

What are the two open source analytics tools?
What do you understand about Reactive-Business Intelligence?
What are the challenges with Big Data?
What do you mean by R analytics?
What is Statistica?
What is IBM SPSS Modeler?
Define Weka.

UNIT-2

What is meant by unstructured data? Give examples.

What are the major components of HDFS 2?
What are the features of HDFS 2?

What are the limitations of HDFS? What do you understand about HDFS Federation?
What are the three important classes of MapReduce?
What is the use of Hive in the Hadoop ecosystem?
What are the two MapReduce Daemons?
List the Daemons that are part of YARN Architecture.
How many blocks will be created for a file that is 300 MB? The default block size is 64 MB and
the replication factor is 3.
What is an active and passive NameNode?
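
The arithmetic behind the 300 MB block-count question above can be sketched as follows. This is a minimal illustration, assuming that the last block holds only the leftover bytes rather than a full 64 MB; the function name `hdfs_blocks` is hypothetical and not part of any Hadoop API.

```python
import math

def hdfs_blocks(file_mb: int, block_mb: int = 64, replication: int = 3):
    """Return (logical blocks, stored block replicas, size of last block in MB)."""
    blocks = math.ceil(file_mb / block_mb)            # partial last block still counts as a block
    last_block_mb = file_mb - (blocks - 1) * block_mb  # leftover data in the final block
    return blocks, blocks * replication, last_block_mb

blocks, replicas, last_mb = hdfs_blocks(300)
print(blocks, replicas, last_mb)  # 5 15 44
```

Under these assumptions the file splits into four full 64 MB blocks plus one 44 MB block, and each block is stored three times across the cluster.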

UNIT-3

1. Mention the steps involved in EDA.

2. Consider the following headline: “Although Nintendo sales were already good last year,
this year they are even better. When the game Animal Crossing was released, the game
sold almost 12 million copies in only two weeks.” Which types of data are mentioned in
this headline?
3. Two students are studying for the course Data Analytics for Engineers and have just
read the passage about interval data and ratio data. Student one says: “When students
take a test, they are graded from a scale of 0 to 10. For this score ratios make sense,
hence scores are ratio data.” Student two says: “Ratios do not make sense for IQ-
scores. Hence, scores are interval data.” Note that IQ-tests are scored relative to the
reference score of 100. Which of the two students is correct? And why?
4. What do you mean by Categorical data?
5. What is the function used to identify NaN values?
6. How do you perform random sampling without replacement?
7. Let’s say a Hive table is created as an external table. If we drop the table, will the data
be accessible?
8. Mention the types of NoSQL databases.
9. A Hive partitioned table is created, partitioned by a column, say yearofexperience. Suppose
we create a directory, say yearofexperience=3, at the HDFS path of the table and dump
a data set that matches the table structure into it. Will the data be available if we execute
a select query on the table?
10. What is Hive metastore?
11. What is a data lake?

UNIT-4

What are the examples of Stream Sources?

How is a hash function used in sampling a data stream?
What is Bloom Filtering?
What is the Alon-Matias-Szegedy Algorithm for Second Moments?
What is Flajolet-Martin Algorithm?
What is the Datar-Gionis-Indyk-Motwani (DGIM) Algorithm?
What do you mean by Count-Distinct Problem?
What are decaying windows?
What are the six rules that must be followed when representing a stream by buckets?
How do you deal with infinite streams?
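
As a companion to the Bloom filtering question above, here is a minimal sketch of the idea. The class name `BloomFilter`, the sizes, and the use of seeded SHA-256 digests as hash functions are all illustrative assumptions, not a reference implementation.

```python
import hashlib

class BloomFilter:
    """Bit array plus k hash functions; answers 'possibly present' or 'definitely absent'."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive k positions by hashing the item with k different seeds (an assumption).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means the item was never added; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))

seen = BloomFilter()
for key in ["alice@example.com", "bob@example.com"]:
    seen.add(key)
print(seen.might_contain("alice@example.com"))  # True
```

A stream element passes the filter only if every one of its bit positions is set, so members are never rejected, while a non-member is occasionally let through.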

UNIT-5

How is data mining used in sales and marketing?

How is data mining technology used to acquire new customers?
How is data mining used to personalize online retailing?
How are data mining techniques used in the Government sector?
How is data mining used in manufacturing?
What do you mean by churn or attrition?
What is the use of data mining in the finance sector?
What is credit card fraud detection?
How are fraudulent activities detected in telecommunications?
What is mining of retail transaction data?

PART-B

UNIT-1
Discuss the following in detail

i) Conventional challenges in Big Data

ii) Nature of Big Data

Define Big Data. Explain the Evolution of Big Data and their characteristics.

Define data, web data and Big Data. Also explain structured, semi-structured and unstructured
data.

Analyse how unstructured data is processed. Explain the sources of unstructured
data. What are the challenges in handling Big Data?

Write short notes on:

i) In-memory Analytics
ii) In-Database Processing
iii) Shared-nothing architecture

Discuss the 6 V’s of Big data.

Explain the terminologies used in big data environments.

What are the key questions to be answered by all organizations stepping into analytics? Justify
with an example.

Explain the process involved in data mining. Also explain the algorithms used in the data
mining process.

UNIT-2

What are the goals of the Hadoop framework? Discuss and illustrate the Hadoop ecosystem.

Explain the following with a neat diagram.

i) Hadoop Version 1.0

ii) Hadoop Version 2.0

Explain how data is processed in Hadoop.

i) Explain the issues faced with MapReduce in Hadoop 1.0.

ii) Analyse the alternative solutions to MapReduce in Hadoop 2.0.

Write in detail about the steps involved in MapReduce to achieve high throughput.

Explain HDFS operations in detail.

Explain the significance of the Hadoop Distributed File System and its applications.

Explain how BigQuery works, with a neat sketch.

Explain how matrix multiplication is carried out using the MapReduce algorithm.

UNIT-3

Explain the steps involved in Exploratory Data analysis.

Analyse the software tools available for EDA.

Analyse the usage of numerical and categorical data. Support your answer with
relevant examples.

Analyse the ways to handle missing data. Give examples to support your views.

Explain the difference between SQL and NoSQL.

A Hive table is created as an external table at a location, say hdfs://usr/data/table_name. If we
dump a data set that matches the table structure into that location, will you be able to fetch the
records from the table using a select query?

Explain the datatypes used in MongoDB. Explain with example queries.

What is NoSQL? Explain its types and advantages.

What is Hive? Explain the Hive architecture in detail.

Explain the Hive Query Language with examples.

UNIT-4

What is Sampling Data in a Stream? Explain the General Sampling Problem.

What is Filtering Streams? Explain the Analysis of Bloom Filtering.

Explain Counting Distinct Elements in a Stream.

Explain Estimating Moments. How do you deal with Infinite Streams?

Explain the concept of Counting Ones in a Window.

Explain the Flajolet-Martin Algorithm.

Explain the Alon-Matias-Szegedy Algorithm for Second Moments and Higher order moments.

Elaborate on how Google and Yahoo handle streams.

Briefly explain the concept of Combining Estimates.

UNIT-5

Explain how data mining techniques are used in Sales and Marketing.

Describe the data mining techniques used in finance and manufacturing sectors.

Explain the role of data mining in Government and healthcare.

Write short notes on data mining approaches used in Telecommunications.

Create a case study to evaluate data mining and data analytics in the healthcare industry.

Explain microRNA data analysis case study in detail.

Explain credit scoring case study in detail.

Explain data mining of non-tabular data.

Analyse how innovative insurance organizations extract value from uncertain data.

PART-C

UNIT-1

You are at the university library. You see a few students browsing through the library catalogue
on a kiosk. You observe the librarians busy at work issuing and returning books. You see a
few students filling in the feedback form on the services offered by the library. Quite a few
students are learning using the e-learning content. Think about the different types of data that are
being generated in this scenario. Support your answer with logic.

Analyse the difference between the various types of analytics.

UNIT-2

Create a MapReduce program to count the occurrences of similar words across 50 files.

Create a MapReduce program to analyse the temperature dataset.

Consider a collection of literature surveys made by a researcher in the form of a text document
with respect to cloud and big data analytics. Using Hadoop and MapReduce, develop an
application to count the occurrences of predominant words.
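
The word-count exercises above follow the usual map, shuffle, reduce pattern. The sketch below simulates that flow in a single Python process; it does not use the Hadoop API, and the function names, file names and sample texts are illustrative assumptions only.

```python
from collections import defaultdict
import re

def map_phase(doc_id, text):
    """Mapper: emit (word, 1) for every word in one document."""
    for word in re.findall(r"[a-z']+", text.lower()):
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(word, counts):
    """Reducer: sum the partial counts for one word."""
    return word, sum(counts)

def word_count(documents):
    pairs = (kv for doc_id, text in documents.items() for kv in map_phase(doc_id, text))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())

# Two hypothetical input files standing in for the 50 files in the exercise.
docs = {"a.txt": "Big data needs big tools", "b.txt": "Hadoop handles big data"}
counts = word_count(docs)
print(counts["big"], counts["data"])  # 3 2
```

In a real Hadoop job the mapper and reducer would run on separate nodes and the framework would perform the grouping step; the division of work between the three functions is the part that carries over.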

UNIT-3

1. Here are the counts (in thousands) of earned degrees in the U.S. for a recent year, classified by
degree type and sex of degree recipient.

          Bachelor's   Master's   Professional   Doctorate

Female       616         194          30            16
Male         529         171          44            26

Problems:

i) If you choose a degree recipient at random, what is the probability you pick a woman?

ii) If you choose a male degree recipient at random, what is the probability that you pick
someone who earned a professional degree?
iii) If you pick a degree recipient at random, what is the probability you pick a woman with a
doctorate?

iv) If you pick a Bachelor's degree recipient at random, what is the probability you pick a
man?
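
One way to check answers to the four probability problems above is to compute them directly from the table. The sketch below assumes the counts exactly as given (in thousands) and uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Degree counts in thousands, copied from the table in the question.
counts = {
    "Female": {"Bachelor's": 616, "Master's": 194, "Professional": 30, "Doctorate": 16},
    "Male":   {"Bachelor's": 529, "Master's": 171, "Professional": 44, "Doctorate": 26},
}

total = sum(sum(row.values()) for row in counts.values())       # all recipients
males = sum(counts["Male"].values())                            # all male recipients
bachelors = sum(row["Bachelor's"] for row in counts.values())   # all Bachelor's recipients

p_woman = Fraction(sum(counts["Female"].values()), total)           # i)   P(woman)
p_prof_given_male = Fraction(counts["Male"]["Professional"], males)  # ii)  P(professional | male)
p_woman_and_doc = Fraction(counts["Female"]["Doctorate"], total)     # iii) P(woman and doctorate)
p_man_given_bach = Fraction(counts["Male"]["Bachelor's"], bachelors) # iv)  P(man | Bachelor's)

for label, p in [("i", p_woman), ("ii", p_prof_given_male),
                 ("iii", p_woman_and_doc), ("iv", p_man_given_bach)]:
    print(f"{label}) {p} = {float(p):.4f}")
```

Parts ii) and iv) are conditional probabilities, so their denominators are the male total and the Bachelor's total respectively, not the grand total.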

2. Explain how a bank turned challenges into opportunities to serve its customers using a NoSQL
database. Demonstrate with architectural and database design.

3. Structure of 'restaurants' collection:

{
  "address": {
    "building": "1007",
    "coord": [ -73.856077, 40.848447 ],
    "street": "Morris Park Ave",
    "zipcode": "10462"
  },
  "borough": "Bronx",
  "cuisine": "Bakery",
  "grades": [
    { "date": { "$date": 1393804800000 }, "grade": "A", "score": 2 },
    { "date": { "$date": 1378857600000 }, "grade": "A", "score": 6 },
    { "date": { "$date": 1358985600000 }, "grade": "A", "score": 10 },
    { "date": { "$date": 1322006400000 }, "grade": "A", "score": 9 },
    { "date": { "$date": 1299715200000 }, "grade": "B", "score": 14 }
  ],
  "name": "Morris Park Bake Shop",
  "restaurant_id": "30075445"
}

i) Write a MongoDB query to display all the documents in the collection restaurants.
ii) Write a MongoDB query to display the fields restaurant_id, name, borough and
cuisine for all the documents in the collection restaurants.
iii) Write a MongoDB query to display the fields restaurant_id, name, borough and
cuisine, but exclude the field _id, for all the documents in the collection restaurants.
iv) Write a MongoDB query to find the restaurants that achieved a score of more than 80
but less than 100.
v) Write a MongoDB query to find the restaurant_id, name, borough and cuisine for
those restaurants whose name begins with the three letters 'Wil'.
vi) Write a MongoDB query to find the restaurant_id, name, borough and cuisine for
those restaurants whose name contains the three letters 'Reg' anywhere.

4. i) Create a MongoDB instance and create the shoes collection with the following fields: ModelNo, Brand,
Color, Price, Size (Height, Width).
ii) Update the Brand name from Adidas to Puma for the very first matching record.
iii) Insert one more brand of your choice.
iv) Print all shoes which are available in blue color and width 2 cm.
v) Print all shoes which are available in either blue or neon color using the $in expression.
vi) Delete all records for the Adidas brand from this collection.
vii) Update the height for Nike shoes to 12 cm.
viii) Drop the shoes collection.
