DM-I Q Paper 2024
Semester : IV
4192
Section A

1. (a) Differentiate between the unsupervised and supervised evaluation measures used for cluster validity. (3)

(b) What is the anti-monotone property of the support measure in association rule mining? Does the confidence measure follow the anti-monotone property? (2+1) (3)

(c) Consider a dataset with two class labels, News and Entertainment, and six labeled documents D1 to D6. A new document, D7, is to be classified. The similarity values of D7 with D1, D2, D3, D4, D5 and D6 are 0.75, 0.85, 0.66, 0.87, 0.70 and 0.84 respectively. Using the k-Nearest Neighbor classifier, predict the class label that should be assigned to D7 when k = 3. Will the predicted class label change with k = 5? (4)

Document   Class Label
D1         News
D2         Entertainment
D3         Entertainment
D4         News
D5         News
D6         Entertainment

(d) Consider the given dataset, which contains six objects, each with two attributes: Age and Salary. K-means clustering is used to cluster the given objects. Do you see any issue with applying K-means to the given dataset? If yes, then state the issue. Also apply the appropriate preprocessing technique to overcome it. If no, state explicitly that no preprocessing technique is required. (4)

           Age (in years)   Salary (in rupees)
Object 1   40               62000
Object 2   24               48000
Object 3   30               54000
Object 4   35               67000
Object 5   46               80000
Object 6   34               66000

(e) Define the curse of dimensionality. The Iris flower dataset comprises 150 data points and four features, namely sepal length, sepal width, petal width, and petal length. Is it high-dimensional data or low-dimensional data? Justify your answer. (4)

(f) Consider a decision tree to classify the health of an individual as Fit or Unfit, given below:
Age < 30 ?
  Yes -> Smokes/Drinks ?
    Yes -> UnFit
    No  -> Fit
  No -> Workout ?
    Yes -> Fit
    No  -> Diet Control ?
      Yes -> Fit
      No  -> UnFit

(i) Extract all classification rules from the decision tree.
(ii) Classify the following object: Age = 50, Workout = No, Smokes/Drinks = No, Diet Control = No, Health = ? (4)

(g) State whether each of the following tasks is predictive or descriptive:
(ii) Grouping the customers of a company.
(iii) Finding a group of genes such that genes in each group have related functionality.

(h) Given two objects X = (22, 1, 42, 10) and Y = (20, 0, 36, 8), compute the distance between these two objects using the following distance measures:
(i) Euclidean Distance
(ii) Manhattan Distance (4)
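The two distance measures in part (h) can be checked with a short script; a minimal sketch, with the object tuples taken from the question:

```python
import math

X = (22, 1, 42, 10)
Y = (20, 0, 36, 8)

# Euclidean distance: square root of the sum of squared coordinate differences
euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(X, Y)))

# Manhattan distance: sum of absolute coordinate differences
manhattan = sum(abs(a - b) for a, b in zip(X, Y))

print(round(euclidean, 2))  # 6.71 (square root of 45)
print(manhattan)            # 11
```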
Section B
(a) What is an outlier? Spot an outlier in the provided dataset. (3)

(b) What is the need for sampling in data mining? What problems arise if the sample size is too small or too large? (6)

(c) Enumerate all association rules generated from the largest frequent itemset found in each dataset scan. Compute the confidence of each generated rule. Assuming that the minimum confidence threshold is 70%, find all the strong association rules.
4. Consider the following transactional data of a grocery store:

Transaction ID   Items
T1               Boots, Hoodie, Gloves
T2               Boots, Hoodie
T3               Hoodie, Coat, Cardigan
T4               Cardigan, Coat
T5               Cardigan, Gloves
T6               Hoodie, Coat, Cardigan

(a) What is the maximum number of rules that can be extracted from this data (including rules that have

5. (a) A medical team develops classification models for predicting the occurrence of a "genetic disorder" using Classifier A and Classifier B. Patients having genetic disorders are considered positive instances. In contrast, negative instances are ones with the absence of genetic disorders. The classifiers were tested on data from 500 patients and obtained the result as:

                                   Actual Label
                                   Presence of        Absence of
                                   Genetic Disorder   Genetic Disorder
Classifier A predicted
"presence of genetic disorder"     131 (TP)           155 (FP)
(i) List the confusion matrix for Classifier A and Classifier B. Find the accuracy, precision, sensitivity, recall and specificity for each classifier. (8)

(ii) What problem may occur if the provided training dataset of 500 patients had only 15 positive instances and the remaining negative instances? Which performance measure would you choose to evaluate the classifiers in such a scenario? Which is the better classifier between Classifier A and Classifier B in such a scenario? (4)

(b) Consider a categorical attribute Grade with three values {A, B, C}. Convert this attribute to asymmetric binary attributes. (3)

6. Consider the given COVID-19 dataset of ten patients.

ID    Age           Fever   BD         Outcome
P1    Young         Yes     High       In ICU
P2    Young         No      High       Hospitalized
P3    Elderly       Yes     High       In ICU
P4    Middle aged   Yes     Moderate   In ICU
P5    Middle aged   No      High       Home Care
P6    Middle aged   Yes     Moderate   In ICU
P7    Elderly       No      Moderate   In ICU
P8    Elderly       No      High       Deceased
P9    Elderly       Yes     High       In ICU
P10   Young         No      High       Hospitalized
BD: Breathing Difficulty

(a) Compute the Gini Index of the Age, Fever, and BD attributes. Given that you construct a decision tree using the Gini Index as the splitting criterion, which of the three attributes would you choose at the root? Justify your choice. (9)

(b) Compute the Gini Index of ID. Why should it not be used as a splitting attribute for constructing a decision tree? (3)

(c) Given ten objects in the dataset (P1-P10), mention all train and test distributions for performing k-fold cross-validation. Assume the value of k = 5. (3)
Q1(d) 2 marks for stating the issue and the preprocessing technique; 1 mark for the normalized values of the Age attribute; 1 mark for the normalized values of the Salary attribute

Yes, there would be an issue in applying K-means clustering on the given data. The ranges of the given attributes Age and Salary are 24-46 and 48000-80000 respectively. The Salary attribute, with its larger values, will dominate the computation of the Euclidean distance. Using K-means on the given dataset as it is would be incorrect because of its bias towards the Salary attribute. We can apply min-max normalization on the given data before using it for K-means clustering.

Min(Age) = 24; Max(Age) = 46; Min(Salary) = 48000; Max(Salary) = 80000

Min-Max Normalization: x' = (x - min(x)) / (max(x) - min(x))
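The normalization can be sketched in code; the attribute values below are taken from the Q1(d) table:

```python
def min_max_normalize(values):
    """Scale each value to [0, 1] using min-max normalization."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [40, 24, 30, 35, 46, 34]
salaries = [62000, 48000, 54000, 67000, 80000, 66000]

norm_ages = min_max_normalize(ages)
norm_salaries = min_max_normalize(salaries)

# Object 1: Age 40 -> (40 - 24) / (46 - 24) = 0.727...
print(round(norm_ages[0], 3))  # 0.727
# Object 1: Salary 62000 -> (62000 - 48000) / 32000 = 0.4375
print(norm_salaries[0])        # 0.4375
```

After scaling, both attributes lie in [0, 1], so neither dominates the Euclidean distance.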
Q1(e) 2 marks for defining the curse of dimensionality and its associated issues; 1 mark for stating the dimensionality; 1 mark for justification of the dimensionality

It is low-dimensional data, as it has only 4 attributes with 150 data points. The number of observations is significantly larger than the number of features.
Q1(f) 3 marks for the classification rules, 1 mark for the correct prediction

Classification Rules:
(Age < 30 = Yes) ^ (Smokes/Drinks = Yes) -> Health = UnFit
(Age < 30 = Yes) ^ (Smokes/Drinks = No) -> Health = Fit
(Age < 30 = No) ^ (Workout = Yes) -> Health = Fit
(Age < 30 = No) ^ (Workout = No) ^ (Diet Control = Yes) -> Health = Fit
(Age < 30 = No) ^ (Workout = No) ^ (Diet Control = No) -> Health = UnFit

The object (Age = 50, Workout = No, Smokes/Drinks = No, Diet Control = No) will be classified as Health = UnFit.
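The extracted rules can be encoded directly as a function; a minimal sketch (the function name and keyword arguments are illustrative, the logic follows the tree):

```python
def classify_health(age, workout=None, smokes_drinks=None, diet_control=None):
    """Apply the decision-tree rules; returns 'Fit' or 'UnFit'."""
    if age < 30:
        return "UnFit" if smokes_drinks == "Yes" else "Fit"
    if workout == "Yes":
        return "Fit"
    return "Fit" if diet_control == "Yes" else "UnFit"

# The object from part (ii): Age = 50, Workout = No, Smokes/Drinks = No, Diet Control = No
print(classify_health(50, workout="No", smokes_drinks="No", diet_control="No"))  # UnFit
```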
Q1(g) ½ mark for the correct answer and 2 marks for the justification of each part
(i) Predictive
(ii) Descriptive
(iii) Descriptive
(iv) Predictive

P(Education Level = PG | Low) * P(Career = Management | Low) * P(Years of Exp = 3 to 10 | Low) * P(Low) = 2/3 * 2/6 * 2/6 * 3/5 ≈ 0.04/k (1 mark)
As 0.056/k > 0.04/k, the instance will be predicted with Salary = "High". (1 mark)
Q2(b) 1 mark for each part. There could be multiple applications for a particular data type.
(i) Market basket analysis
(ii) Weather data
(iii) Molecule data, data for web-page linking
Q3(a)(i) ½ mark for the correct type, ½ mark for the justification
ID: Nominal; Dept. Name: Nominal; Location: Nominal; Established On: Interval; Size: Ordinal; Annual Budget: Ratio
(ii) 1½ marks for the correct answer for the "Location" attribute, 1½ marks for the correct answer for the "Annual Budget" attribute.
Since {Hoodie, Coat, Cardigan} is the largest frequent itemset, we generate all the rules from it.

Rule                              Confidence
{Hoodie} -> {Coat, Cardigan}      2/4 = 0.5
{Coat} -> {Hoodie, Cardigan}      2/3 ≈ 0.67
{Cardigan} -> {Hoodie, Coat}      2/4 = 0.5
{Hoodie, Coat} -> {Cardigan}      2/2 = 1
{Hoodie, Cardigan} -> {Coat}      2/2 = 1
{Coat, Cardigan} -> {Hoodie}      2/3 ≈ 0.67

As the confidence threshold is 70%, the strong rules are: {Hoodie, Coat} -> {Cardigan} and {Hoodie, Cardigan} -> {Coat}.
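These confidences can be recomputed from the six transactions; a minimal sketch, with the transaction contents taken from Q4:

```python
from itertools import combinations

transactions = [
    {"Boots", "Hoodie", "Gloves"},   # T1
    {"Boots", "Hoodie"},             # T2
    {"Hoodie", "Coat", "Cardigan"},  # T3
    {"Cardigan", "Coat"},            # T4
    {"Cardigan", "Gloves"},          # T5
    {"Hoodie", "Coat", "Cardigan"},  # T6
]

def support_count(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

itemset = {"Hoodie", "Coat", "Cardigan"}
# Generate every rule A -> (itemset - A) for each non-empty proper subset A
for r in (1, 2):
    for antecedent in combinations(sorted(itemset), r):
        a = set(antecedent)
        conf = support_count(itemset) / support_count(a)
        marker = "STRONG" if conf >= 0.7 else ""
        print(f"{sorted(a)} -> {sorted(itemset - a)}: {conf:.2f} {marker}")
```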
Q5(a)(i) ½ mark each for the confusion matrix; ½ mark each for accuracy; 1 mark if both recall and sensitivity are stated; 1 mark each for precision and specificity; all of the above for both Classifier A and Classifier B

Classifier A: TP = 131, FP = 155, FN = 19, TN = 195
Accuracy = (TP + TN) / (TP + FN + FP + TN) = (131 + 195) / 500 = 65.2%
Precision = TP / (TP + FP) = 131 / 286 ≈ 45.8%
Recall / Sensitivity = TP / (TP + FN) = 131 / 150 ≈ 87.3%
Specificity = TN / (TN + FP) = 195 / 350 ≈ 55.71%

Classifier B: TP = 82, FP = 72, FN = 68, TN = 278
Accuracy = (82 + 278) / 500 = 72%
Precision = 82 / 154 ≈ 53.25%
Recall / Sensitivity = 82 / 150 ≈ 54.67%
Specificity = 278 / 350 ≈ 79.43%
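The metric formulas can be verified from the confusion-matrix counts; a minimal sketch:

```python
def metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall/sensitivity and specificity in percent."""
    total = tp + fp + fn + tn
    return {
        "accuracy": 100 * (tp + tn) / total,
        "precision": 100 * tp / (tp + fp),
        "recall": 100 * tp / (tp + fn),        # recall is also called sensitivity
        "specificity": 100 * tn / (tn + fp),
    }

a = metrics(tp=131, fp=155, fn=19, tn=195)
b = metrics(tp=82, fp=72, fn=68, tn=278)
print({k: round(v, 2) for k, v in a.items()})  # accuracy 65.2, precision 45.8, ...
print({k: round(v, 2) for k, v in b.items()})  # accuracy 72.0, specificity 79.43, ...
```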
Q6(a) 2½ marks for computing each attribute correctly; 1½ marks for the correct choice at the root

Computation of the Gini Index for the Age attribute:
It has three possible values: Young (3 examples), Middle aged (3 examples) and Elderly (4 examples).
For Age = Young, there are 2 examples with "Hospitalized" and 1 with "Admitted to ICU".
Gini = 1 - [(2/3)² + (1/3)²] = 0.444
For Age = Middle aged, there are 2 examples with "Admitted to ICU" and 1 example with "Home Care".
Gini = 1 - [(2/3)² + (1/3)²] = 0.444
For Age = Elderly, there are 3 examples with "Admitted to ICU" and 1 example with "Deceased".
Gini = 1 - [(3/4)² + (1/4)²] = 0.375
Weighted average: 0.444 * (3/10) + 0.444 * (3/10) + 0.375 * (4/10) ≈ 0.416

Computation of the Gini Index for the Fever attribute:
It has two possible values: Yes (5 examples) and No (5 examples).
For Fever = Yes, there are 5 examples, all with "Admitted to ICU".
Gini = 1 - [(5/5)²] = 0
For Fever = No, there are 2 examples with "Hospitalized" and 1 example each with "Deceased", "Home Care" and "Admitted to ICU".
Gini = 1 - [(2/5)² + (1/5)² + (1/5)² + (1/5)²] = 0.72
Weighted average: 0 * (5/10) + 0.72 * (5/10) = 0.36

Computation of the Gini Index for the Breathing Difficulty attribute:
It has two possible values: High (7 examples) and Moderate (3 examples).
For Breathing Difficulty = Moderate, there are 3 examples, all with "Admitted to ICU".
Gini = 1 - [(3/3)²] = 0
For Breathing Difficulty = High, there are 2 examples with "Hospitalized", 3 examples with "Admitted to ICU", and 1 example each with "Home Care" and "Deceased".
Gini = 1 - [(2/7)² + (3/7)² + (1/7)² + (1/7)²] ≈ 0.694
Weighted average: 0 * (3/10) + 0.694 * (7/10) ≈ 0.486

The Fever attribute is selected at the root as it has the smallest weighted Gini index.
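The three weighted Gini values can be reproduced in code; a minimal sketch, where each tuple encodes one row of the Q6 table as (Age, Fever, BD, Outcome). Note that with exact arithmetic the Age value is 0.417 (= 5/12); the 0.416 above comes from the rounded per-group value 0.444.

```python
from collections import Counter

# (Age, Fever, Breathing Difficulty, Outcome) for patients P1-P10
patients = [
    ("Young", "Yes", "High", "In ICU"),
    ("Young", "No", "High", "Hospitalized"),
    ("Elderly", "Yes", "High", "In ICU"),
    ("Middle aged", "Yes", "Moderate", "In ICU"),
    ("Middle aged", "No", "High", "Home Care"),
    ("Middle aged", "Yes", "Moderate", "In ICU"),
    ("Elderly", "No", "Moderate", "In ICU"),
    ("Elderly", "No", "High", "Deceased"),
    ("Elderly", "Yes", "High", "In ICU"),
    ("Young", "No", "High", "Hospitalized"),
]

def gini(outcomes):
    """Gini index of a list of class labels: 1 minus the sum of squared proportions."""
    n = len(outcomes)
    return 1 - sum((c / n) ** 2 for c in Counter(outcomes).values())

def weighted_gini(attr_index):
    """Size-weighted Gini of the split induced by one attribute column."""
    groups = {}
    for row in patients:
        groups.setdefault(row[attr_index], []).append(row[3])
    n = len(patients)
    return sum(len(g) / n * gini(g) for g in groups.values())

for name, idx in [("Age", 0), ("Fever", 1), ("BD", 2)]:
    print(name, round(weighted_gini(idx), 3))  # Age 0.417, Fever 0.36, BD 0.486
```

Fever has the smallest weighted Gini, matching the root choice above.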
Q6(b) Computation of the Gini Index: 2 marks; reason for not selecting ID: 1 mark
The Gini for each ID value is 0; therefore, the overall Gini for ID is 0. The ID attribute has no predictive power, since new patients will be allocated to new IDs.
Q6(c) Fold 1: (P1, P2), Fold 2: (P3, P4), Fold 3: (P5, P6), Fold 4: (P7, P8), Fold 5: (P9, P10)
Train: Fold1, Fold2, Fold3, Fold4; Test: Fold5
Train: Fold2, Fold3, Fold4, Fold5; Test: Fold1
Train: Fold1, Fold3, Fold4, Fold5; Test: Fold2
Train: Fold1, Fold2, Fold4, Fold5; Test: Fold3
Train: Fold1, Fold2, Fold3, Fold5; Test: Fold4
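The fold assignments can be generated programmatically; a minimal sketch, assuming consecutive patients are grouped into folds of equal size:

```python
patients = [f"P{i}" for i in range(1, 11)]
k = 5
size = len(patients) // k  # 2 patients per fold

# Fold 1 = (P1, P2), Fold 2 = (P3, P4), ..., Fold 5 = (P9, P10)
folds = [patients[i * size:(i + 1) * size] for i in range(k)]

# Each fold serves as the test set exactly once
for test_idx in range(k):
    train = [p for i, f in enumerate(folds) if i != test_idx for p in f]
    test = folds[test_idx]
    print(f"Train: {train}  Test: {test}")
```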
Q7 Iteration 1 and resulting clusters: 5 marks; computing new cluster centroids: 2 marks; Iteration 2 and resulting clusters: 5 marks; computing SSE: 3 marks

Given K = 2, Instance 1 -> C1, Instance 2 -> C2.

Iteration 1 (5 marks):
Instance   Number of Clients   Annual Turnover   Distance from C1   Distance from C2   Assigned Cluster
I1         185                 72                —                  —                  C1
I2         170                 56                —                  —                  C2
I3         168                 60                20.80              4.47               C2
I4         179                 68                7.21               15                 C1
I5         182                 72                3                  20                 C1
I6         188                 77                5.83               27.65              C1

Resulting clusters after the first iteration: Cluster 1: {I1, I4, I5, I6}; Cluster 2: {I2, I3}

Computation of new centroids (2 marks):
C1 = ((185 + 179 + 182 + 188)/4, (72 + 68 + 72 + 77)/4) = (183.5, 72.25)
C2 = ((170 + 168)/2, (56 + 60)/2) = (169, 58)

Iteration 2 (5 marks):
Instance   Number of Clients   Annual Turnover   Distance from C1   Distance from C2   Assigned Cluster
I1         185                 72                1.52               21.26              C1
I2         170                 56                21.12              2.23               C2
I3         168                 60                19.75              2.23               C2
I4         179                 68                6.18               14.14              C1
I5         182                 72                1.52               19.10              C1
I6         188                 77                6.54               26.87              C1

Resulting clusters after the second iteration: Cluster 1: {I1, I4, I5, I6}; Cluster 2: {I2, I3}. There is no change in the clusters, so we stop.

Computing the SSE (3 marks):
SSE of Cluster 1 = (distance of I1 from C1)² + (distance of I4 from C1)² + (distance of I5 from C1)² + (distance of I6 from C1)²
= (1.52)² + (6.18)² + (1.52)² + (6.54)² = 2.3104 + 38.1924 + 2.3104 + 42.7716 = 85.5848
SSE of Cluster 2 = (distance of I2 from C2)² + (distance of I3 from C2)² = (2.23)² + (2.23)² = 9.9458
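The two K-means iterations can be replayed in code; a minimal sketch, with the six instances from Q7. Note that with exact arithmetic the SSE values are 85.75 and 10; the figures above (85.5848 and 9.9458) come from squaring the rounded distances.

```python
import math

points = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]
c1, c2 = points[0], points[1]  # initial centroids: I1 and I2

def dist(p, q):
    """Euclidean distance between two points."""
    return math.dist(p, q)

for _ in range(2):  # two iterations are enough to converge on this data
    # Assign each point to the nearer centroid (ties go to cluster 1)
    cluster1 = [p for p in points if dist(p, c1) <= dist(p, c2)]
    cluster2 = [p for p in points if dist(p, c1) > dist(p, c2)]
    # Recompute each centroid as the mean of its cluster
    c1 = tuple(sum(v) / len(cluster1) for v in zip(*cluster1))
    c2 = tuple(sum(v) / len(cluster2) for v in zip(*cluster2))

sse1 = sum(dist(p, c1) ** 2 for p in cluster1)
sse2 = sum(dist(p, c2) ** 2 for p in cluster2)
print(c1, c2)                            # (183.5, 72.25) (169.0, 58.0)
print(round(sse1, 2), round(sse2, 2))    # 85.75 10.0
```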