It 6001 Da 2 Marks With Answer PDF

This document discusses data analytics and related concepts. It covers: 1. Big data approaches and applications of big data analytics such as marketing, finance, healthcare, etc. 2. Types of data analysis including reporting, which organizes data into summaries, and analysis, which extracts insights from reports. 3. Machine learning techniques including linear regression, Bayesian inference, rule induction, and neural networks. 4. Data stream mining and architectures like Lambda architecture for processing streaming data. 5. Association rule mining to discover relationships between variables in large datasets.

Uploaded by

kumar3544

Data Analytics

IT 6006 DATA ANALYTICS 2 MARKS WITH ANSWER

UNIT-I

1.What is the big data approach?

Many IT tools are available for big data projects. Organizations whose data workloads
are constant and predictable are better served by a traditional database, whereas organizations
challenged by increasing data demands will need to take advantage of Hadoop’s scalable
infrastructure.

2.List out the applications of big data analytics.

o Marketing
o Finance
o Government
o Healthcare
o Insurance
o Retail

3.List the types of cloud environment.

o Public cloud
o Private cloud
o Hybrid cloud

4.What is reporting?

It is the process of organizing data into informational summaries in order to monitor how
different areas of a business are performing.

5.What is analysis?

It is the process of exploring data and reports in order to extract meaningful insights
which can be used to better understand and improve business performance.

6.List out the cross-validation techniques.

o Simple cross-validation
o Double cross-validation
o Multicross-validation

7.Write a short note on MapReduce.

MapReduce provides a data-parallel programming model for clusters of commodity
machines. It was pioneered by Google, which used it to process about 20 PB of data per day.
MapReduce was popularized by the Apache Hadoop project and is used by Yahoo, Facebook,
Amazon and others.
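To make the programming model concrete, here is a minimal single-process sketch of the classic word-count job. It only imitates the map, shuffle/sort and reduce phases in plain Python; it is not Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle/sort: group all emitted values by key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["big data big clusters", "data parallel data"]
    print(word_count(splits))
```

On a real cluster each split would be mapped on a different machine and the shuffle would move data over the network; here everything runs in one process so only the data flow is visible.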

8.What is cloud computing?

Cloud computing is internet-based computing. It relies on sharing computing resources
on demand rather than on local servers, PCs and other devices. It is a model for enabling
ubiquitous, convenient, on-demand network access to a shared pool of configurable computing
resources that can be rapidly provisioned and released with minimal management effort.

9.Describe the drawbacks of cloud computing.

In cloud computing, cheap nodes fail, especially when there are many of them: if the mean
time between failures (MTBF) of one node is 3 years, the MTBF of a 1000-node cluster is about
1 day. In addition, commodity networks have low bandwidth.
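The 1-day figure follows from a simple estimate, assuming node failures are independent and occur at a constant rate, so that the expected time until some node in a cluster of n nodes fails is roughly the single-node MTBF divided by n:

```latex
\mathrm{MTBF}_{1000} \approx \frac{\mathrm{MTBF}_{1}}{1000}
  = \frac{3 \times 365\ \text{days}}{1000} \approx 1.1\ \text{days}
```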

10.List out the four major types of resampling.

o Randomized exact test
o Cross-validation
o Jackknife
o Bootstrap
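Two of these resampling methods can be sketched in a few lines. The example below, an illustration rather than a reference implementation, estimates the standard error of a sample mean with the bootstrap (resampling with replacement) and with the jackknife (leave-one-out); the resample count and seed are arbitrary assumptions.

```python
import random
import statistics


def bootstrap_se(data, n_resamples=2000, seed=0):
    """Bootstrap: draw resamples of the same size WITH replacement,
    recompute the mean of each, and use the spread of those means
    as the standard-error estimate."""
    rng = random.Random(seed)
    n = len(data)
    means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(means)


def jackknife_se(data):
    """Jackknife: recompute the mean leaving out one observation at a time."""
    n = len(data)
    total = sum(data)
    loo_means = [(total - x) / (n - 1) for x in data]  # leave-one-out means
    mean_loo = statistics.fmean(loo_means)
    variance = (n - 1) / n * sum((m - mean_loo) ** 2 for m in loo_means)
    return variance ** 0.5


if __name__ == "__main__":
    sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
    print("bootstrap SE:", bootstrap_se(sample))
    print("jackknife SE:", jackknife_se(sample))
```

For the mean, the jackknife estimate coincides with the textbook formula s/√n, while the bootstrap estimate varies slightly with the random seed.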


UNIT – II

1.What are the three stages of IDA process?

o Data preparation
o Data mining and rule finding
o Result validation and interpretation

2. What is linear regression?

Linear regression is an approach for modeling the relationship between a scalar
dependent variable y and one or more explanatory variables (or independent variables)
denoted X. The case of one explanatory variable is called simple linear regression.
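For the simple (one-variable) case, the least-squares fit has a closed form. This sketch assumes ordinary least squares as the fitting criterion, which is the usual choice but is not stated in the answer above.

```python
def simple_linear_regression(xs, ys):
    """Fit y = b0 + b1 * x by ordinary least squares (one explanatory variable)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance(x, y) / variance(x), both up to the same 1/n factor.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean  # the fitted line passes through (x_mean, y_mean)
    return b0, b1


if __name__ == "__main__":
    intercept, slope = simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
    print(intercept, slope)
```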

3.Explain Bayesian inference.

Bayesian inference is a method of statistical inference in which Bayes' theorem is used
to update the probability for a hypothesis as more evidence or information becomes available.
Bayesian inference is an important technique in statistics, especially in mathematical
statistics.
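A minimal sketch of the updating step with two discrete hypotheses about a coin (fair, or landing heads with probability 0.8). The prior and the observed flips are hypothetical; each flip applies Bayes' theorem once, and the posterior becomes the prior for the next flip.

```python
def bayes_update(prior, likelihood, observation):
    """One application of Bayes' theorem:
    posterior(h) = P(obs | h) * prior(h) / P(obs)."""
    unnormalized = {h: likelihood(observation, h) * p for h, p in prior.items()}
    evidence = sum(unnormalized.values())  # P(obs), the normalizing constant
    return {h: v / evidence for h, v in unnormalized.items()}


def coin_likelihood(flip, bias):
    """P(flip | the coin lands heads with probability `bias`)."""
    return bias if flip == "H" else 1 - bias


# Hypothetical prior: equal belief in a fair coin and a heads-biased coin.
belief = {0.5: 0.5, 0.8: 0.5}
for flip in "HHTH":  # hypothetical evidence, processed one flip at a time
    belief = bayes_update(belief, coin_likelihood, flip)

if __name__ == "__main__":
    print(belief)
```

Updating flip by flip gives the same posterior as applying Bayes' theorem once to all four flips, which is what makes Bayesian inference natural for evidence that arrives over time.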

4.What is meant by rule induction?

Rule induction is an area of machine learning in which formal rules are extracted from a
set of observations. The rules extracted may represent a full scientific model of the data, or
merely represent local patterns in the data.

5.What are the two strategies in the Learn-One-Rule function?

o General to specific
o Specific to general
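The general-to-specific strategy can be sketched as a greedy search: start from the most general rule (no conditions, covering every example) and repeatedly add the one attribute = value condition that most improves the rule. Using plain accuracy on the covered examples as the quality measure is an assumption made for brevity (measures such as FOIL gain are also common), and the weather data are hypothetical.

```python
def rule_covers(rule, example):
    """A rule is a dict of attribute -> required value; {} covers everything."""
    return all(example[a] == v for a, v in rule.items())


def rule_accuracy(rule, examples, target):
    """Fraction of covered examples whose class is `target`, plus coverage size."""
    covered = [e for e in examples if rule_covers(rule, e)]
    if not covered:
        return 0.0, 0
    positives = sum(1 for e in covered if e["class"] == target)
    return positives / len(covered), len(covered)


def learn_one_rule(examples, target, attributes):
    """General-to-specific greedy specialization of a single rule."""
    rule = {}
    best_acc, _ = rule_accuracy(rule, examples, target)
    while best_acc < 1.0:
        best_candidate = None
        for attr in attributes:
            if attr in rule:
                continue
            for value in {e[attr] for e in examples}:
                candidate = {**rule, attr: value}
                acc, n_covered = rule_accuracy(candidate, examples, target)
                if n_covered > 0 and acc > best_acc:
                    best_acc, best_candidate = acc, candidate
        if best_candidate is None:  # no single condition improves the rule
            break
        rule = best_candidate
    return rule, best_acc


examples = [
    {"outlook": "sunny", "windy": "no", "class": "play"},
    {"outlook": "sunny", "windy": "yes", "class": "stay"},
    {"outlook": "rainy", "windy": "no", "class": "stay"},
    {"outlook": "overcast", "windy": "no", "class": "play"},
    {"outlook": "overcast", "windy": "yes", "class": "play"},
]

if __name__ == "__main__":
    print(learn_one_rule(examples, "play", ["outlook", "windy"]))
```

The specific-to-general strategy runs the other way: it starts from a rule built from one positive example (all its attribute values) and drops conditions to generalize it.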

6.Write down the topologies of a neural network.

o Single-layer
o Multi-layer
o Recurrent
o Self-organized


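7.What is meant by fuzzy logic?

Fuzzy logic is a form of many-valued logic in which the truth value of a statement is a degree
between 0 and 1 rather than strictly true or false. Beyond data mining tasks such as prediction
and classification, fuzzy models can give insight into the underlying system and can be
automatically derived from the system’s dataset. The technique used to achieve this is a
grid-based rule set.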
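A minimal sketch of one possible reading of a grid-based fuzzy rule set, assuming a single input whose range is partitioned into evenly spaced, overlapping triangular membership functions (the "grid"), with one rule per grid cell. The temperature range, the labels and the heater-power consequents are hypothetical, and combining rules by a membership-weighted average is one common choice, not the only one. With several inputs the grid would be the Cartesian product of such partitions.

```python
def grid_memberships(x, lo, hi, labels):
    """Partition [lo, hi] into evenly spaced triangular fuzzy sets, one per label,
    and return the degree (0..1) to which x belongs to each."""
    step = (hi - lo) / (len(labels) - 1)
    x = min(max(x, lo), hi)  # clamp x to the modelled range
    return {
        label: max(0.0, 1.0 - abs(x - (lo + i * step)) / step)
        for i, label in enumerate(labels)
    }


def fire_rules(temp, rules, labels=("low", "medium", "high")):
    """Each grid cell (linguistic term) carries one rule: IF temp IS term THEN
    output = rules[term]. Rules fire to the degree their term matches, and the
    output is the membership-weighted average of the consequents."""
    mu = grid_memberships(temp, 0.0, 40.0, labels)
    total = sum(mu.values())
    return sum(mu[t] * rules[t] for t in labels) / total


# Hypothetical rule base: heater power (%) for each temperature term.
heater_rules = {"low": 100.0, "medium": 50.0, "high": 0.0}

if __name__ == "__main__":
    print(grid_memberships(10.0, 0.0, 40.0, ("low", "medium", "high")))
    print(fire_rules(10.0, heater_rules))
```

Because each rule reads as a sentence ("IF temperature IS low THEN power IS high"), the rule set doubles as a linguistic description of the system, which is the insight mentioned above.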

8. Write a short note on fuzzy qualitative modeling.

Fuzzy modeling can be interpreted as a qualitative modeling scheme by which the
system behavior is qualitatively described using a natural language. A fuzzy qualitative model is
a generalized fuzzy model consisting of linguistic explanations about system behavior in the
framework of fuzzy logic, instead of mathematical equations with numerical values or
conventional logical formulas with logical symbols.

9.What are the steps of Bayesian data analysis?

o Setting up the prior distribution
o Obtaining the posterior distribution by conditioning on the observed data
o Evaluating the fit of the model
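The three steps can be traced on a small example, assuming a binomial model for a success probability with a conjugate Beta prior. The uniform Beta(1, 1) prior and the 7-out-of-10 data are hypothetical, and the posterior predictive p-value is one of several possible fit checks.

```python
import random


def posterior_beta(alpha, beta, successes, failures):
    """Step 2: with a Beta(alpha, beta) prior and binomial data, the posterior
    is Beta(alpha + successes, beta + failures) by conjugacy."""
    return alpha + successes, beta + failures


def posterior_predictive_check(alpha, beta, n_trials, observed, draws=5000, seed=1):
    """Step 3: simulate replicated data sets from the posterior and return the
    fraction whose success count is at least the observed count
    (a posterior predictive p-value; values near 0 or 1 signal poor fit)."""
    rng = random.Random(seed)
    at_least = 0
    for _ in range(draws):
        theta = rng.betavariate(alpha, beta)
        replicated = sum(rng.random() < theta for _ in range(n_trials))
        at_least += replicated >= observed
    return at_least / draws


# Step 1: prior. Beta(1, 1) is uniform over the success probability (an assumption).
prior = (1, 1)
successes, failures = 7, 3  # hypothetical observed data
a, b = posterior_beta(*prior, successes, failures)
posterior_mean = a / (a + b)
p_value = posterior_predictive_check(a, b, successes + failures, successes)

if __name__ == "__main__":
    print(f"posterior Beta({a}, {b}), mean {posterior_mean:.3f}, p-value {p_value:.2f}")
```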

10.Write a short note on time series models.

A time series is a sequential set of data points, typically measured at successive times. It
is mathematically defined as a set of vectors x(t), t = 0, 1, 2, …, where t represents the time
elapsed. The variable x(t) is treated as a random variable.


UNIT - III

1.What is the data stream model?

A data stream is a real-time, continuous and ordered sequence of items. It is not
possible to control the order in which the items arrive, nor is it feasible to store a stream
locally in its entirety in any memory device.

2.Define Data Stream Mining.

Data Stream Mining is the process of extracting useful knowledge from continuous,
rapid data streams. Many traditional data mining algorithms can be recast to work with larger
datasets, but they cannot address the problem of a continuous supply of data.

3.Write a short note about sensor networks.

Sensor networks are a huge source of streaming data. They are used in numerous
situations that require constant monitoring of several variables, on the basis of which
important decisions are made. In many cases, alerts and alarms may be generated in
response to the information received from a series of sensors.

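4.What is meant by one-time queries?

One-time queries are queries that are evaluated once over a point-in-time snapshot of
the data set, with the answer returned to the user.

Eg: A query asking for the current price of a particular stock. By contrast, a stock price checker
that alerts the user whenever a stock price crosses a particular price point is evaluated
continuously as new prices arrive, which makes it a continuous query.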

5.Define biased reservoir sampling.

Biased reservoir sampling uses a bias function to regulate sampling from the stream.
The bias gives data points from recent parts of the stream a higher probability of being
selected than points from the distant past.
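A sketch of biased reservoir sampling with an exponential bias function f(r, t) = e^(-λ(t - r)), following the simple scheme described by Aggarwal as we understand it: the reservoir holds at most about 1/λ points, every arriving point is inserted, and with probability equal to the reservoir's current fill fraction the new point overwrites a randomly chosen resident instead of being appended. Older points are therefore gradually evicted. The seed and λ below are illustrative.

```python
import math
import random


def biased_reservoir(stream, lam, seed=0):
    """Biased reservoir sampling with exponential bias (rate `lam`).

    Capacity is ceil(1 / lam). Each arriving point is always inserted; with
    probability len(reservoir) / capacity it replaces a random existing point,
    otherwise it is appended. Once the reservoir is full the probability is 1,
    so the size never exceeds the capacity."""
    rng = random.Random(seed)
    capacity = math.ceil(1 / lam)
    reservoir = []
    for point in stream:
        fill_fraction = len(reservoir) / capacity
        if reservoir and rng.random() < fill_fraction:
            reservoir[rng.randrange(len(reservoir))] = point
        else:
            reservoir.append(point)
    return reservoir


if __name__ == "__main__":
    sample = biased_reservoir(range(1000), lam=0.01)
    print(len(sample), sorted(sample)[-5:])
```

Unlike classic (unbiased) reservoir sampling, where the newest point's chance of entry shrinks as the stream grows, here the newest point always enters, which is what tilts the sample toward recent data.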

6.What is a Bloom filter?

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton
Howard Bloom in 1970, that is used to test whether an element is a member of a set. False
positive matches are possible but false negatives are not; thus a Bloom filter has a 100% recall
rate.
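A minimal Bloom filter sketch: a bit array of m bits and k hash positions per element. Deriving the k positions by salting SHA-256 with the hash index is an illustrative choice (real implementations often use faster non-cryptographic hashes), and the default m and k are arbitrary.

```python
import hashlib


class BloomFilter:
    """Bit array of m bits probed by k hash functions. Membership tests may give
    false positives (all k bits set by other elements) but never false negatives,
    because adding an element always sets all of its k bits."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for simplicity

    def _positions(self, item):
        # k "independent" hash functions obtained by salting one hash with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter()
    for word in ["apple", "banana", "cherry"]:
        bf.add(word)
    print(bf.might_contain("apple"), bf.might_contain("durian"))
```

A "no" answer is always correct; a "yes" answer is only probably correct, with the false-positive rate growing as more elements are added to a filter of fixed size.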

7.List out the applications of RTAP.

o Financial services
o Government
o E-Commerce sites

8.Draw a High-Level architecture for RADAR.

9.What are the three layers of the Lambda architecture?

o Batch layer: for batch processing of all data.
o Speed layer: for real-time processing of streaming data.
o Serving layer: for responding to queries.
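A toy, single-process sketch of how the three layers cooperate, using page-view counting as a hypothetical workload. It assumes the batch run is instantaneous, so the speed layer's increments can simply be cleared after each batch recomputation; a real deployment must keep events that arrive while a batch job is running.

```python
from collections import Counter


class LambdaPageViews:
    """Toy Lambda architecture for counting page views (illustrative only)."""

    def __init__(self):
        self.master_dataset = []          # immutable, append-only log of all events
        self.batch_view = Counter()       # batch layer output, recomputed from scratch
        self.realtime_view = Counter()    # speed layer output since the last batch run

    def ingest(self, page):
        """New events go to both the master dataset and the speed layer."""
        self.master_dataset.append(page)
        self.realtime_view[page] += 1

    def run_batch(self):
        """Batch layer: recompute the view over ALL data, then drop the real-time
        increments that the new batch view now covers."""
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view.clear()

    def query(self, page):
        """Serving layer: merge the batch and real-time views to answer a query."""
        return self.batch_view[page] + self.realtime_view[page]


if __name__ == "__main__":
    views = LambdaPageViews()
    views.ingest("/home")
    views.ingest("/home")
    views.run_batch()
    views.ingest("/home")  # only the speed layer has seen this one so far
    print(views.query("/home"))
```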

10.What is RTSA?

Real-time sentiment analysis (sentiment analysis is also known as opinion mining) refers to the
use of natural language processing, text analysis and computational linguistics to identify and
extract subjective information from source materials in real time.


UNIT-IV

1.What is association rule mining?

The main purpose of discovering frequent itemsets in a large dataset is to derive a set of
if-then rules called association rules. An association rule has the form I → j, where I is a set of
items (products) and j is a particular item.
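How "interesting" a rule I → j is, is usually measured with support and confidence. The sketch below computes both over a hypothetical list of shopping baskets; confidence is the fraction of baskets containing I that also contain j.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for b in baskets if itemset <= set(b)) / len(baskets)


def confidence(I, j, baskets):
    """Confidence of the rule I -> j: support(I plus j) / support(I)."""
    s_i = support(I, baskets)
    if s_i == 0:
        return 0.0  # the rule never applies
    return support(set(I) | {j}, baskets) / s_i


baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread", "butter"},
]

if __name__ == "__main__":
    print(support({"milk", "bread"}, baskets))   # 2 of 4 baskets
    print(confidence({"milk"}, "bread", baskets))  # 2 of the 3 milk baskets
```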

2.List some algorithms for finding frequent itemsets.

o Apriori Algorithm
o FP-Growth Algorithm
o SON algorithm
o PCY algorithm
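A compact sketch of the Apriori algorithm, the first one listed: it counts itemsets level by level and only builds size-k candidates whose (k-1)-subsets are all frequent, since no superset of an infrequent itemset can be frequent. The support threshold is taken as an absolute basket count here, which is an assumption for simplicity.

```python
from itertools import combinations


def apriori(baskets, min_support):
    """Return {frozenset: count} for every itemset that appears in at least
    `min_support` baskets."""
    baskets = [frozenset(b) for b in baskets]
    # Level 1: count single items.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets, then prune any
        # candidate that has an infrequent (k-1)-subset (monotonicity).
        candidates = set()
        for x, y in combinations(list(frequent), 2):
            union = x | y
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # One counting pass over the data for this level.
        counts = {c: sum(1 for basket in baskets if c <= basket) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in apriori(data, 3).items():
        print(sorted(itemset), count)
```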

3.What is meant by the curse of dimensionality?

Points in high-dimensional Euclidean spaces, as well as points in non-Euclidean spaces,
often behave unintuitively. Two unexpected properties of these spaces are that random
points are almost always at about the same distance from each other, and random vectors are
almost always close to orthogonal.
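Both properties can be observed empirically. The sketch below, with an arbitrary number of random pairs and a fixed seed, measures the relative spread (standard deviation divided by mean) of distances between random points in the unit cube, and the average absolute cosine between random vectors, in 2 versus 1000 dimensions.

```python
import math
import random


def relative_distance_spread(dim, n_pairs=200, seed=0):
    """Std/mean of Euclidean distances between random points in [0, 1]^dim.
    Small values mean almost all pairs are at about the same distance."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_pairs):
        p = [rng.random() for _ in range(dim)]
        q = [rng.random() for _ in range(dim)]
        dists.append(math.dist(p, q))
    mean = sum(dists) / n_pairs
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / n_pairs)
    return std / mean


def mean_abs_cosine(dim, n_pairs=200, seed=0):
    """Average |cosine of the angle| between random vectors in [-1, 1]^dim.
    Values near 0 mean the vectors are almost orthogonal."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        u = [rng.uniform(-1, 1) for _ in range(dim)]
        v = [rng.uniform(-1, 1) for _ in range(dim)]
        dot = sum(a * b for a, b in zip(u, v))
        total += abs(dot) / (math.hypot(*u) * math.hypot(*v))
    return total / n_pairs


if __name__ == "__main__":
    for d in (2, 1000):
        print(d, round(relative_distance_spread(d), 3), round(mean_abs_cosine(d), 3))
```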

4.Write the algorithm of Park-Chen-Yu (PCY) (first pass).

FOR (each basket):
    FOR (each item in basket):
        add 1 to item’s count;
    FOR (each pair of items in basket):
        hash the pair to a bucket;
        add 1 to the count for that bucket;
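The pseudocode above is PCY's first pass. A runnable sketch of the whole algorithm for frequent pairs follows; the bucket count is an arbitrary assumption, and the support threshold is an absolute count. Between passes the bucket counts are reduced to a bitmap of frequent buckets, and the second pass counts only pairs of frequent items that hash to a frequent bucket.

```python
from collections import Counter
from itertools import combinations


def pcy_frequent_pairs(baskets, support, n_buckets=101):
    """PCY (Park-Chen-Yu) for frequent pairs."""

    def bucket(pair):
        # Any hash works. Python salts str hashes per process, so bucket
        # assignments can differ between runs, but the result cannot.
        return hash(pair) % n_buckets

    # Pass 1: count items, and hash every pair in each basket to a bucket.
    item_counts = Counter()
    bucket_counts = [0] * n_buckets
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket(pair)] += 1

    # Between passes: frequent items, and a bitmap of frequent buckets.
    frequent_items = {i for i, c in item_counts.items() if c >= support}
    bitmap = [c >= support for c in bucket_counts]

    # Pass 2: count only candidate pairs (both items frequent, bucket frequent).
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket) & frequent_items)
        for pair in combinations(items, 2):
            if bitmap[bucket(pair)]:
                pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= support}


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c", "d"], ["a", "b", "d"]]
    print(pcy_frequent_pairs(data, 3))
```

The result is exact: a frequent pair always lands in a bucket whose count is at least the pair's own count, so it is never filtered out by the bitmap; hash collisions only let extra candidates through to be counted.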

5.Define Toivonen’s algorithm.

Toivonen’s algorithm begins by selecting a small sample of the input dataset and finding from
it the candidate frequent itemsets. It then makes only one full pass over the database and
produces the exact frequent itemsets, from which exact association rules follow. The algorithm
gives neither false negatives nor false positives, but there is a small yet non-zero probability
that it will fail to produce any answer at all, in which case it must be repeated with a new sample.
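A simplified sketch of Toivonen's algorithm restricted to itemsets of size 1 and 2, to keep it short. The sample is mined in memory with a lowered threshold; the "negative border" (itemsets not frequent in the sample but whose immediate subsets all are) is counted alongside the candidates in one full pass. If any negative-border itemset turns out frequent, the run fails and returns None. The sample fraction, the 0.8 threshold scale-down and the assumption that the item universe is known up front are all illustrative choices.

```python
import random
from itertools import combinations


def count_itemsets(baskets, itemsets):
    return {s: sum(1 for b in baskets if s <= b) for s in itemsets}


def toivonen_pairs(baskets, support_frac, sample_frac=0.5, slack=0.8, seed=0):
    """Return the exact frequent itemsets of size <= 2, or None if the sample
    'failed' and the algorithm must be rerun with a new sample."""
    rng = random.Random(seed)
    baskets = [frozenset(b) for b in baskets]
    sample = [b for b in baskets if rng.random() < sample_frac]
    if not sample:
        return None
    # Assumption: the universe of items is known (here it is read from the data).
    items = {frozenset([i]) for b in baskets for i in b}

    # 1. Mine the sample in memory with a lowered threshold to reduce misses.
    sample_threshold = slack * support_frac * len(sample)
    freq1 = {s for s, c in count_itemsets(sample, items).items() if c >= sample_threshold}
    pairs = {x | y for x, y in combinations(freq1, 2)}
    freq2 = {s for s, c in count_itemsets(sample, pairs).items() if c >= sample_threshold}
    candidates = freq1 | freq2

    # 2. Negative border: not frequent in the sample, all immediate subsets are.
    negative_border = (items - freq1) | (pairs - freq2)

    # 3. One full pass over the whole data set.
    full = count_itemsets(baskets, candidates | negative_border)
    threshold = support_frac * len(baskets)
    if any(full[s] >= threshold for s in negative_border):
        return None  # something frequent may have been missed: retry
    return {s for s in candidates if full[s] >= threshold}


if __name__ == "__main__":
    data = [{"a", "b"}] * 8 + [{"c"}] * 2
    print(toivonen_pairs(data, 0.5))
```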

6.List out some applications of clustering.

o Collaborative filtering
o Customer segmentation
o Data summarization
o Dynamic trend detection
o Multimedia data analysis
o Biological data analysis
o Social network analysis

7.What are the types of hierarchical clustering methods?

o Single-link clustering
o Complete-link clustering
o Average-link clustering
o Centroid link clustering
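The four methods differ only in how the distance between two clusters is defined. The sketch below, a naive O(n³) illustration rather than an efficient implementation, performs agglomerative clustering with a selectable linkage: start with singleton clusters and repeatedly merge the closest pair until k clusters remain. The sample points are hypothetical.

```python
import math
from itertools import combinations


def _centroid(cluster):
    return [sum(coords) / len(cluster) for coords in zip(*cluster)]


def linkage_distance(c1, c2, method):
    """Distance between two clusters under the chosen linkage."""
    pair_dists = [math.dist(p, q) for p in c1 for q in c2]
    if method == "single":    # closest pair of points
        return min(pair_dists)
    if method == "complete":  # farthest pair of points
        return max(pair_dists)
    if method == "average":   # mean over all cross-cluster pairs
        return sum(pair_dists) / len(pair_dists)
    if method == "centroid":  # distance between the cluster centroids
        return math.dist(_centroid(c1), _centroid(c2))
    raise ValueError(f"unknown linkage: {method}")


def agglomerative(points, k, method="single"):
    """Merge the closest pair of clusters until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], method),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    for m in ("single", "complete", "average", "centroid"):
        print(m, agglomerative(pts, 2, m))
```

Recording the sequence of merges (rather than stopping at k clusters) gives the dendrogram usually drawn for hierarchical clustering.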

8.Define CLIQUE.

CLIQUE is a subspace clustering algorithm that automatically finds subspaces with high-density
clusters in high-dimensional attribute spaces. It is a grid-based method for finding
density-based clusters in subspaces, and its grid-based clustering procedure is
relatively simple.
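A sketch of CLIQUE's first steps, assuming points scaled into [0, 1) in every dimension: each dimension is split into ξ equal intervals, dense one-dimensional units are found, and a two-dimensional unit is kept only if both of its one-dimensional projections are dense and it is dense itself (the Apriori-like monotonicity CLIQUE relies on). Here all 2-D cells are counted and then filtered, which yields the same dense units as counting only candidates. Connecting dense units into clusters and going beyond two dimensions are omitted; ξ, τ and the points are hypothetical.

```python
from collections import Counter
from itertools import combinations


def clique_dense_units(points, xi, tau):
    """A unit is dense if it holds more than a fraction `tau` of all points.
    1-D units are (dimension, interval) pairs; 2-D units are pairs of those."""
    n, d = len(points), len(points[0])

    def interval(value):
        return min(int(value * xi), xi - 1)  # index of the grid interval holding value

    # Dense 1-D units.
    counts_1d = Counter((dim, interval(p[dim])) for p in points for dim in range(d))
    dense_1d = {u for u, c in counts_1d.items() if c / n > tau}

    # Dense 2-D units whose 1-D projections are both dense.
    dense_2d = set()
    for d1, d2 in combinations(range(d), 2):
        counts_2d = Counter((interval(p[d1]), interval(p[d2])) for p in points)
        for (i1, i2), c in counts_2d.items():
            if (d1, i1) in dense_1d and (d2, i2) in dense_1d and c / n > tau:
                dense_2d.add(((d1, i1), (d2, i2)))
    return dense_1d, dense_2d


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.12, 0.15), (0.15, 0.1), (0.11, 0.18), (0.9, 0.5), (0.5, 0.9)]
    print(clique_dense_units(pts, xi=5, tau=0.3))
```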

9.What is meant by the k-means algorithm?

The k-means family of algorithms is of the point-assignment type and assumes a Euclidean space.
It is assumed that there are exactly k clusters for some known k. After picking k initial cluster
centroids, the points are considered one at a time; each point is assigned to the closest
centroid, and that cluster's centroid is adjusted to account for the new point.
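A sketch of the point-assignment procedure just described. Taking the first k points as initial centroids is an illustrative simplification (in practice they are often chosen far apart or at random), and the optional final pass that reassigns every point to its nearest final centroid is included to produce stable labels. The widely used batch variant (Lloyd's algorithm) instead alternates full assignment and full centroid recomputation until nothing changes.

```python
import math


def kmeans_point_assignment(points, k):
    """Incremental k-means: assign each point to its nearest centroid and
    move that centroid to the running mean of its cluster."""
    centroids = [list(p) for p in points[:k]]  # illustrative initialization
    sizes = [1] * k
    for p in points[k:]:
        c = min(range(k), key=lambda i: math.dist(p, centroids[i]))
        sizes[c] += 1
        # Running-mean update: new_mean = old_mean + (x - old_mean) / size
        centroids[c] = [m + (x - m) / sizes[c] for m, x in zip(centroids[c], p)]
    # Final pass: label every point by its nearest final centroid.
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels


if __name__ == "__main__":
    pts = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
    print(kmeans_point_assignment(pts, 2))
```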

10.Draw the diagram for Hierarchical Clustering.

UNIT-V

1.What are the main goals of Hadoop?

o Scalable
o Fault tolerant
o Economical
o Able to handle hardware failures

2.What is Hive?

Hive provides a data warehouse structure over Hadoop input sources and SQL-like
access to data in HDFS. Hive’s query language, HiveQL, compiles to MapReduce jobs and also
allows user-defined functions (UDFs).

3.What are the responsibilities of MapReduce Framework?

o Provides overall coordination of execution.
o Selects nodes for running mappers.
o Starts and monitors the mappers' execution.
o Sorts and shuffles the output of the mappers.
o Chooses locations for the reducers' execution.
o Delivers the output of the mappers to the reducer nodes.
o Starts and monitors the reducers' execution.

4.What is a Key-Value store?

A key-value store uses a key to access a value and has a schema-less format. The key can
be synthetic or auto-generated, while the value can be a string, JSON, a BLOB, etc. A key-value
store uses a hash table in which each unique key maps to a pointer to a particular item of data.
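A minimal in-memory sketch of the idea, with a Python dict playing the role of the hash table. The class and method names are hypothetical; real key-value stores add persistence, replication and distribution across machines.

```python
import json
import uuid


class KeyValueStore:
    """Hash table (dict) from unique keys to opaque values. No schema is
    enforced: a value may be a string, a JSON document, raw bytes (BLOB), etc."""

    def __init__(self):
        self._table = {}

    def put(self, value, key=None):
        """Store value under key; auto-generate a unique key if none is given."""
        if key is None:
            key = uuid.uuid4().hex
        self._table[key] = value
        return key

    def get(self, key, default=None):
        return self._table.get(key, default)

    def delete(self, key):
        self._table.pop(key, None)


if __name__ == "__main__":
    store = KeyValueStore()
    store.put("alice", key="user:1")                  # chosen key, string value
    cart_key = store.put(json.dumps({"cart": [1, 2]}))  # auto-generated key, JSON value
    store.put(b"\x89PNG...", key="img:1")             # BLOB value
    print(store.get("user:1"), json.loads(store.get(cart_key)))
```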

5.What is visualization? What are the three major goals of visualization?

Visualization is the presentation or communication of data using interactive
interfaces. It has three major goals:

o Communicating/presenting the analysis results efficiently and effectively.
o Serving as a tool for confirmatory analysis, that is, to examine a hypothesis, analyze it and
confirm it.
o Exploratory data analysis as an interactive and mostly undirected search for
structures and trends.

6.What is sharding?

Sharding is the horizontal partitioning of a large database, in which the rows of the database
are split into partitions. Each partition forms a shard, meaning a small part of the whole. Each
shard can be located on a separate database server or in any physical location.
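One common way to decide which shard a row belongs to is hash-based routing on a shard key. The sketch below assumes that strategy (range-based sharding is an equally valid alternative) and uses SHA-256 so the mapping is stable across processes; the rows and the shard count are hypothetical.

```python
import hashlib


def shard_for(key, num_shards):
    """Route a row to a shard by hashing its shard key. A stable hash keeps
    the mapping identical across processes and machines."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def partition_rows(rows, key_field, num_shards):
    """Horizontal partitioning: each whole row goes to exactly one shard."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row[key_field], num_shards)].append(row)
    return shards


if __name__ == "__main__":
    users = [{"id": i, "name": f"user{i}"} for i in range(100)]
    print([len(s) for s in partition_rows(users, "id", 4)])
```

A drawback of simple modulo hashing is that changing the number of shards remaps most keys; schemes such as consistent hashing reduce that data movement.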
