DM Endsem 2023-1
Q.1.a) Give an explanation for data reduction strategies. CO-1
b) In a certain distribution of 1000 data points, it is found that Q1 = 20, Q2 = 300, Q3 = 400 and Maximum CO-1
Object Id  Test 1 (nominal)  Test 2 (ordinal)  Test 3 (numeric)
1          A                 Excellent         45
2          ?                 Fair              22
3          A                 Good              54
4          ?                 Excellent         28
(cells marked ? are not legible in the source)
Q.2.a) Given the following database, show all rules that one can generate from the set ABE. Also give the support and confidence of all the generated rules. CO-2
Tid  Itemset
T1   ACD
T2   BCE
T3   ABCE
T4   BDE
T5   ABDE
T6   ABCD
b) [Figure: an itemset lattice with support counts, of which A(6), B(5), C(4), D(3), AB(5), AC(4), BC(3), BD(2), CD(2) and ABCD(1) are legible; the question text and full lattice are not recoverable from the source.]
c) Given the following DNA sequence, answer the following questions using minsup = 3. CO-4
[DNA sequence and sub-questions not legible in the source.]
Q.3.a) Discuss the different methods for model evaluation and selection. What is the .632 bootstrap method, and where does the 0.632 come from? CO-4
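The 0.632 figure comes from the probability that a given example appears at least once in a bootstrap sample of size n drawn with replacement: 1 − (1 − 1/n)^n, which tends to 1 − e⁻¹ ≈ 0.632 as n grows. A quick numerical check:

```python
import math

# In a bootstrap sample of size n drawn with replacement from n examples,
# the chance a particular example is never picked is (1 - 1/n)^n, which
# tends to e^-1 ≈ 0.368. Hence about 63.2% of the original examples
# appear in the sample -- the 0.632 in ".632 bootstrap".
for n in (10, 100, 1000, 100000):
    p_in = 1 - (1 - 1 / n) ** n
    print(f"n={n:>6}: P(example in bootstrap sample) = {p_in:.4f}")

print(f"limit: 1 - 1/e = {1 - math.exp(-1):.4f}")
```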
b) Suppose we have data for a few individuals who have been surveyed. The response to the promotional offers in the areas is listed below. Using the Bayes Classification Algorithm, classify the sex (output attribute) of a new tuple whose data is Investment = No, Travel = Yes, Reading = Yes and Health = No. CO-3
Investment  Travel     Reading    Health     Sex
promotion   promotion  promotion  promotion
Yes         No         Yes        No         Male
Yes         Yes        No         No         Male
No          Yes        Yes        Yes        Female
No          Yes        No         Yes        Male
Yes         Yes        Yes        Yes        Female
No          No         Yes        No         Female
Yes         No         No         No         Male
Yes         Yes        No         No         Male
No          No         Yes        Yes        Female
No          No         No         No         Male
(rows 9-10 are partly garbled in the source and reconstructed here)
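A minimal naive Bayes sketch for this question. Note that rows 9–10 of the printed table are partly garbled in the source, so the values below follow the reconstruction above and should be checked against the original paper before trusting the printed result:

```python
# Naive Bayes classification of the new tuple from Q.3.b).
data = [
    # (Investment, Travel, Reading, Health, Sex)
    ("Yes", "No",  "Yes", "No",  "Male"),
    ("Yes", "Yes", "No",  "No",  "Male"),
    ("No",  "Yes", "Yes", "Yes", "Female"),
    ("No",  "Yes", "No",  "Yes", "Male"),
    ("Yes", "Yes", "Yes", "Yes", "Female"),
    ("No",  "No",  "Yes", "No",  "Female"),
    ("Yes", "No",  "No",  "No",  "Male"),
    ("Yes", "Yes", "No",  "No",  "Male"),
    ("No",  "No",  "Yes", "Yes", "Female"),  # reconstructed row
    ("No",  "No",  "No",  "No",  "Male"),    # reconstructed row
]

def naive_bayes(query):
    """Return the class maximising P(class) * prod_i P(attr_i = v_i | class)."""
    classes = {row[-1] for row in data}
    scores = {}
    for c in classes:
        rows = [r for r in data if r[-1] == c]
        score = len(rows) / len(data)          # prior P(c)
        for i, v in enumerate(query):          # likelihoods P(v | c)
            score *= sum(1 for r in rows if r[i] == v) / len(rows)
        scores[c] = score
    return max(scores, key=scores.get), scores

label, scores = naive_bayes(("No", "Yes", "Yes", "No"))
print(label, scores)
```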
Explain the concept of Bayesian Classification with emphasis on Bayes Theorem. CO-3
If the entropy function has a value of 0, what does this mean? Why do decision tree learning algorithms prefer choosing tests which lead to low entropy?
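As a sketch of the quantity the question refers to, entropy can be computed directly from the class label counts; a pure node has entropy 0 (no remaining class uncertainty), which is why tree learners favour tests that drive entropy down:

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

print(entropy([1, 1, 1, 1]))  # pure leaf -> 0.0
print(entropy([1, 0, 1, 0]))  # maximally mixed -> 1.0
print(entropy([1, 1, 1, 0]))  # mostly pure -> ~0.811
```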
Assume you apply the decision tree learning algorithm to a data set which does not contain any inconsistent examples, and you continue growing the decision tree until you have leaves which are pure. What can be said about the decision tree which you obtain following this procedure?
Q.4 a) Explain the working of the Support Vector Machine with emphasis on the case when the data is linearly separable. CO-4
b) How can we effectively construct an Ensemble classifier? Suppose we have a dataset of individuals, and the task is to predict whether a person will buy a product (class 1) or not (class 0) based on two features: Age and Income. Show how the AdaBoost algorithm tries to solve this problem. CO-4
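A compact AdaBoost sketch with decision stumps as weak learners. The (Age, Income) rows below are hypothetical illustration data, not from the question; the point is the mechanics: fit a stump on weighted data, weight it by its error, and re-weight the examples it misclassifies:

```python
import math

# Hypothetical (Age, Income) examples; labels +1 = buys, -1 = does not.
X = [(25, 30), (30, 60), (35, 40), (40, 80), (45, 20), (50, 90), (55, 50), (60, 85)]
y = [-1, -1, -1, 1, -1, 1, -1, 1]

def best_stump(w):
    """Pick (error, feature, threshold, polarity) minimising weighted error."""
    best = None
    for f in (0, 1):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                pred = [pol if x[f] >= thr else -pol for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def adaboost(rounds=5):
    w = [1 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        err, f, thr, pol = best_stump(w)
        err = max(err, 1e-10)                    # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)  # stump weight
        ensemble.append((alpha, f, thr, pol))
        # Re-weight: misclassified examples get heavier for the next round.
        w = [wi * math.exp(-alpha * yi * (pol if x[f] >= thr else -pol))
             for wi, x, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (pol if x[f] >= thr else -pol) for a, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1

ens = adaboost()
print([predict(ens, x) for x in X])
```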
c) Consider a dataset of students with two features: hours of study per day (Study Hours) and the number of hours of sleep per day (Sleep Hours). The dataset contains binary class labels indicating whether a student passed (Class 1) or failed (Class 0) an exam. CO-4
i) Using the given dataset and the Euclidean distance metric, apply the KNN algorithm to classify a new student with the following features: Study Hours = 4 and Sleep Hours = 6. Assume K = 3.
ii) What will happen if you change K = 4?
Student  Study Hours  Sleep Hours  Class
S1       ?            ?            ?
S2       3            6            1
S3       2            9            ?
S4       5            1            ?
S5       7            8            0
S6       5            ?            ?
(cells marked ? are not legible in the source)
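A minimal KNN sketch for part (i). Since several cells of the printed table are illegible, the training rows below are hypothetical stand-ins rather than the exam's actual data; the procedure is what matters:

```python
import math
from collections import Counter

# Hypothetical training set: (Study Hours, Sleep Hours) -> Class (1 pass, 0 fail).
train = [((3, 6), 1), ((2, 8), 1), ((5, 1), 0), ((7, 8), 0), ((6, 5), 0)]

def knn(query, k):
    """Majority vote over the k nearest neighbours under Euclidean distance."""
    by_dist = sorted(train, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

print(knn((4, 6), k=3))  # part (i)
# Part (ii): with k=4 the vote can tie 2-2, so an even k needs a
# tie-breaking rule (e.g. prefer the nearer neighbours, or use odd k).
print(knn((4, 6), k=4))
```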
Q.5.a) Use single-link and complete-link agglomerative clustering to cluster the following 8 examples: A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9). Also show the dendrograms obtained with both of the above methods. Use Euclidean distance. CO-5
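A naive agglomerative sketch over the given points, to check the merge sequence by hand: repeatedly merge the two closest clusters, with single-link using the minimum pairwise distance and complete-link the maximum:

```python
import math

# Points from Q.5.a)
pts = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
       "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}

def cluster_dist(c1, c2, linkage):
    d = [math.dist(pts[a], pts[b]) for a in c1 for b in c2]
    return min(d) if linkage == "single" else max(d)

def agglomerate(linkage):
    """Merge the two closest clusters until one remains; return the merge order."""
    clusters = [frozenset([name]) for name in pts]
    merges = []
    while len(clusters) > 1:
        pair = min(((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]),
                   key=lambda p: cluster_dist(*p, linkage))
        clusters = [c for c in clusters if c not in pair] + [pair[0] | pair[1]]
        merges.append((sorted(pair[0] | pair[1]), cluster_dist(*pair, linkage)))
    return merges

for step in agglomerate("single"):
    print(step)
```

The first merges happen at distance √2 (e.g. A4 with A8); ties at that distance can be broken in any order, which is worth noting when drawing the dendrograms.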
b) Both k-means and k-medoids algorithms can perform effective clustering. Illustrate the strengths and weaknesses of k-means in comparison with the k-medoids algorithm. Also, illustrate the strengths and weaknesses of these schemes in comparison with hierarchical clustering. CO-5
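A toy numeric illustration of one contrast the question asks about, using made-up points: the k-means centroid (a mean) is dragged toward an outlier, while a k-medoids representative must be an actual data point and stays inside the bulk of the cluster:

```python
# Made-up cluster with one extreme outlier.
cluster = [(1, 1), (1, 2), (2, 1), (2, 2), (50, 50)]

# k-means representative: the mean of the points.
mean = tuple(sum(c[i] for c in cluster) / len(cluster) for i in (0, 1))

def total_dist(candidate):
    """Total Manhattan distance from `candidate` to all points (PAM-style cost)."""
    return sum(abs(candidate[0] - x) + abs(candidate[1] - y) for x, y in cluster)

# k-medoids representative: the data point minimising total distance to the rest.
medoid = min(cluster, key=total_dist)

print("centroid (k-means):", mean)    # pulled far from the four core points
print("medoid (k-medoids):", medoid)  # one of the core points
```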
c) i) Prove that density-connected and density-reachable are reflexive and symmetric in the DBSCAN algorithm. CO-5
ii) Explain how DBSCAN and OPTICS find clusters of arbitrary shape, whereas partitioning and hierarchical algorithms fail to find such clusters. CO-5
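A minimal DBSCAN sketch on two made-up line-shaped point groups, illustrating how density-based expansion recovers non-spherical clusters that a centroid-based partitioning would struggle with:

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: label each point with a cluster id, or -1 for noise."""
    labels = {}
    cid = 0

    def neighbours(p):
        return [q for q in points if math.dist(p, q) <= eps]

    for p in points:
        if p in labels:
            continue
        nbrs = neighbours(p)
        if len(nbrs) < min_pts:          # not core (may become border later)
            labels[p] = -1
            continue
        cid += 1                         # start a new cluster from this core point
        labels[p] = cid
        seeds = list(nbrs)
        while seeds:                     # density-reachable expansion
            q = seeds.pop()
            if labels.get(q, -1) == -1:  # unvisited, or previously marked noise
                labels[q] = cid
                q_nbrs = neighbours(q)
                if len(q_nbrs) >= min_pts:
                    seeds.extend(q_nbrs)
    return labels

# Two dense line-shaped groups, far apart: DBSCAN separates them by density.
line1 = [(x / 2, 0.0) for x in range(10)]
line2 = [(x / 2, 5.0) for x in range(10)]
labels = dbscan(line1 + line2, eps=0.6, min_pts=3)
print(sorted(set(labels.values())))
```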