DM Endsem 2023-1

The document outlines the examination paper for the B.Tech (Computer Engineering) 7th Semester in Data Mining, detailing the structure and instructions for candidates. It includes various questions covering topics such as central tendency, data reduction strategies, model evaluation, Bayesian classification, support vector machines, and clustering algorithms. The paper emphasizes the need for students to demonstrate their understanding of data mining concepts and techniques through practical examples and theoretical explanations.

Uploaded by

Sufiyan Beg

B.Tech (Computer Engineering).

7th Semester, Examination 2023


Data Mining 20BCSO
Max. Marks: 60 Paper Code: CEN - 701
Time: 3 Hours

Instructions to the candidates:

Attempt any two parts from each question.
Each part of the question carries 6 marks.
Q.1 a) i) What do you understand by central tendency of data? How is it measured? [CO-1]
ii) Give a brief explanation for data reduction strategies.

b) In a certain distribution of 1000 data points, it is found that Q1=20, Q2=300, Q3=400 and Maximum is 450. What can you say about this distribution? Comment on the skewness of the dataset. [CO-1]

c) How do you find dissimilarity between nominal attributes and binary attributes? Given the following data, show the dissimilarity matrix. Which pair of objects is most similar? [CO-1]

Object Id   Test 1 (nominal)   Test 2 (ordinal)   Test 3 (numeric)
A
2 Excellent 45
Fair 22
A
Good 54
Excellent 28
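
For part (b) of Q.1, one possible way to comment on skewness when only quartiles are available is the quartile (Bowley) coefficient, ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1). The paper does not name a skewness measure, so this choice is an assumption, and the sketch below simply plugs in the quartile values as printed in the question. A negative result means the median sits closer to Q3 than to Q1, which suggests a left-skewed distribution.

```python
def bowley_skewness(q1: float, q2: float, q3: float) -> float:
    """Quartile (Bowley) skewness: ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1)."""
    if q3 == q1:
        raise ValueError("Q3 and Q1 must differ")
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)


# Quartiles as printed in Q.1 b); the measure itself is an assumed choice.
skew = bowley_skewness(20, 300, 400)
print(round(skew, 4))
print("left-skewed" if skew < 0 else "right-skewed or symmetric")
```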
Q.2 a) Given the following database, show all rules that one can generate from the set ABE. Also give the support and confidence of all the generated rules. [CO-2]

Tid   Itemset
T1    ACD
T2    BCE
T3    ABCE
T4    BDE
T5    ABDE
T6    ABCD
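
As a rough way to check a hand-worked answer to Q.2 a), the sketch below enumerates every rule X -> Y whose antecedent and consequent are non-empty, disjoint, and together make up ABE. Support is taken as the fraction of transactions containing all of ABE, and confidence as support_count(ABE) / support_count(X). The function names are illustrative, not part of the paper.

```python
from itertools import combinations

# Transaction database exactly as given in Q.2 a).
transactions = {
    "T1": set("ACD"),
    "T2": set("BCE"),
    "T3": set("ABCE"),
    "T4": set("BDE"),
    "T5": set("ABDE"),
    "T6": set("ABCD"),
}


def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions.values() if set(itemset) <= t)


def rules_from(itemset):
    """Return (antecedent, consequent, support, confidence) for each rule
    X -> Y with X, Y non-empty, disjoint, and X ∪ Y equal to `itemset`."""
    items = sorted(itemset)
    n = len(transactions)
    whole = support_count(items)
    found = []
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            found.append((
                "".join(antecedent),
                "".join(consequent),
                whole / n,
                whole / support_count(antecedent),
            ))
    return found


rules = rules_from("ABE")
for lhs, rhs, sup, con in rules:
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={con:.2f}")
```

A three-item set yields six such rules, all sharing the same support because each rule covers the same itemset.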

b) i) State TRUE or FALSE with proper reason/example wherever required (without reasons no marks will be awarded). [CO-2]
a) Maximal frequent itemsets are sufficient to determine all frequent itemsets with their supports.
b) The set of all maximal frequent sets is the set of longest possible frequent itemsets.
ii) Given the following lattice of frequent itemsets along with their support, list all the closed itemsets and maximal itemsets.

∅(6)

A(6)  B(5)  C(4)  D(3)

AB(5)  AC(4)  AD(3)  BC(3)  BD(2)  CD(2)

ABC(3)  ABD(2)  ACD(2)  BCD(1)

ABCD(1)

c) Given the following DNA sequences, answer the following questions using minsup=3. [CO-4]

i) Find the maximal frequent sequences.
ii) Find the closed frequent sequences.

S1: ACGTCACG
S2: TCGA
S3: GACTGCA
S4: CAGTC
Q.3 a) Discuss the different methods for model evaluation and selection. What is the .632 bootstrap method, and where does the value 0.632 come from? [CO-4]
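
The origin of the 0.632 figure can be illustrated numerically. When d tuples are sampled d times with replacement, a given tuple is missed on every draw with probability (1 - 1/d)^d, which approaches e^(-1) ≈ 0.368 as d grows, so roughly 63.2% of the distinct tuples are expected to land in the bootstrap sample. The sketch below only evaluates that expression; `prob_selected` is an illustrative name.

```python
import math


def prob_selected(d: int) -> float:
    """Probability that one specific tuple is drawn at least once
    in d draws with replacement from d tuples."""
    return 1.0 - (1.0 - 1.0 / d) ** d


for d in (10, 100, 1000, 100000):
    print(d, round(prob_selected(d), 4))

# Limiting value as d grows without bound: 1 - 1/e.
limit = 1.0 - math.exp(-1.0)
print(round(limit, 4))
```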

b) Suppose we have data of a few individuals who have been surveyed. The response to the promotional offer in the areas is listed below. Using the Bayes Classification Algorithm, classify the sex (output attribute) of a new tuple whose data is Investment = No, Travel = Yes, Reading = Yes and Health = No. [CO-3]

Investment   Travel      Reading     Health      Sex
promotion    promotion   promotion   promotion
Yes          No          Yes         No          Male
Yes          Yes         No          No          Male
No           Yes         Yes         Yes         Female
No           Yes         No          Yes         Male
Yes          Yes         Yes         Yes         Female
No           No          Yes         No          Female
Yes          No          No          No          Male
Yes          Yes         No          No          Male
No No No
Yes Yes Female
No No No Male
Explain the concept of Bayesian Classification with emphasis on Bayes' Theorem. [CO-3]

If the entropy function has a value of 0, what does this mean? Why do decision tree learning algorithms prefer choosing tests which lead to a low entropy?

Assume you apply the decision tree learning algorithm to a data set which does not contain any inconsistent examples, and you continue growing the decision tree until you have leaves which are pure. What can be said about the decision tree which you obtain following this procedure?
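
For the entropy question above, a small sketch may help fix the idea: entropy is computed from the class proportions at a node, and it is 0 exactly when every example at that node carries the same label (a pure node). The label lists below are made-up illustrations, not data from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class labels at a node."""
    n = len(labels)
    counts = Counter(labels)
    # p * log2(1/p) avoids returning -0.0 for a pure node.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


pure_node = ["Male", "Male", "Male", "Male"]       # hypothetical labels
mixed_node = ["Male", "Male", "Female", "Female"]  # hypothetical labels

print(entropy(pure_node))
print(entropy(mixed_node))
```

A test that splits the data into low-entropy subsets leaves less uncertainty about the class in each branch, which is why such tests are preferred.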
Q.4 a) Explain the working of Support Vector Machine with emphasis on the case when the data is linearly separable. [CO-4]

b) How can we effectively construct an ensemble classifier? Suppose we have a dataset of individuals, and the task is to predict whether a person will buy a product (class 1) or not (class 0) based on two features: Age and Income. Show how the AdaBoost algorithm tries to solve this problem. [CO-4]

c) Consider a dataset of students with two features: hours of study per day (Study Hours) and the number of hours of sleep per day (Sleep Hours). The dataset contains binary class labels indicating whether a student passed (Class 1) or failed (Class 0) an exam. [CO]
i) Using the given dataset and the Euclidean distance metric, apply the KNN algorithm to classify a new student with the following features: Study Hours = 4 and Sleep Hours = 6. Assume K=3.
ii) What will happen if you change K=4?

Student   Study Hours   Sleep Hours   Class
7 1
S1 5

S2 3
6 1
s3
2 9
S4
5 1
S5 7
8 0
S6 5

Q.5 a) Use single-link and complete-link agglomerative clustering to cluster the following 8 examples:
A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9).
Use Euclidean distance. Also show the dendrogram obtained in both the above methods. [CO-5]
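
A hand-drawn dendrogram for Q.5 a) can be cross-checked with a naive agglomerative procedure: start with one cluster per point, repeatedly merge the closest pair of clusters, and record the distance at each merge (these are the dendrogram heights). In the sketch below, passing `min` gives single-link and `max` gives complete-link. Ties between equally close pairs are broken by iteration order, which is an arbitrary choice, and the helper names are illustrative.

```python
import math
from itertools import combinations

# The eight examples from Q.5 a).
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}


def agglomerate(points, linkage):
    """Naive agglomerative clustering with Euclidean distance.

    `linkage` is `min` for single-link or `max` for complete-link.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    def cluster_distance(pair):
        a, b = pair
        return linkage(math.dist(points[p], points[q]) for p in a for q in b)

    clusters = [frozenset([name]) for name in points]
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=cluster_distance)
        history.append((sorted(a), sorted(b), cluster_distance((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history


def clusters_after(points, history, k):
    """Replay the merge history until only k clusters remain."""
    clusters = {frozenset([name]) for name in points}
    for a, b, _ in history[: len(points) - k]:
        clusters -= {frozenset(a), frozenset(b)}
        clusters.add(frozenset(a) | frozenset(b))
    return clusters


single = agglomerate(points, min)
complete = agglomerate(points, max)

for name, history in (("single-link", single), ("complete-link", complete)):
    print(name)
    for a, b, d in history:
        print(f"  merge {a} + {b} at distance {d:.3f}")
```

Calling `clusters_after(points, single, 3)` shows where the single-link dendrogram would be cut to obtain three clusters; the same call with `complete` does so for complete-link.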
b) Both k-means and k-medoids algorithms can perform effective clustering. Illustrate the strength and weakness of k-means in comparison with the k-medoids algorithm. Also, illustrate the strength and weakness of these schemes in comparison with a hierarchical clustering. [CO-5]
c) i) Prove that density connected and density reachable are reflexive and symmetric in the DBSCAN algorithm. [CO-5]
ii) Explain how DBSCAN and OPTICS find clusters of arbitrary shape whereas partitioning and hierarchical algorithms fail to find such clusters. [CO-5]
