Data Mining and Warehousing

This document is an examination paper on data mining and warehousing. It gives instructions to candidates and includes six questions on data warehousing, OLAP, data preprocessing, association rules, classification, and clustering.

Uploaded by

Dev Gupta


Course Code : CAT 307 WYXZ/MS – 23 / 1673

Sixth Semester B. Tech. (Computer Science and Engineering / AIML) Examination

DATA MINING AND WAREHOUSING

Time : 3 Hours ] [ Max. Marks : 60

Instructions to Candidates :—
(1) All questions carry marks as indicated against them.
(2) Assume suitable data wherever necessary and clearly state your assumptions.

1. (a) An insurance company, with branches all over the country, wants to develop
a data warehouse for effective decision-making about its insurance policies.
There are a number of different types of insurance, such as Auto insurance,
Home insurance, Industrial insurance, etc. The entire country is categorized
into four regions, namely North, South, East and West. Each region consists
of a set of states. There may be different types of customers, such as individuals,
institutions, industries, etc. The data warehouse should record an entry for
each policy issued to each customer along with the premium paid.
With respect to the above use case, answer the following questions. Necessary
assumptions can be made to support your answer :
• Design a star schema for the data warehouse, clearly identifying
the fact table(s), dimension table(s), their attributes and measures,
along with the primary key and foreign key relationships.
• Convert the star schema to a snowflake schema.
• Write an SQL query on your schema to display the region-wise,
insurance-type-wise, year-wise total premium collected. 6(CO1)

(b) Consider a dimension CUSTOMER (cust_key, cust_name, cust_code, acc_status,
marital_status, address, state, zip).
Using your knowledge of the types of dimension tables, which type of dimension table
would you suggest for each of the following scenarios ? Give an explanation for each :
• Correction in customer name.
• The address of a customer changes and the application needs to keep
track of the current and previous address.
• Acc_status values can be good, late, very late, in arrears or
suspended. The history of the account status of each customer is
to be maintained. The account status of a customer changes
frequently. 4(CO1)

2. (a) Consider the star schema of an automobile data :

Autos (ModelId, modelname, serialNo, color)
Dealers (DealerId, name, city, state, phone)
Time (TimeId, day, week, month, year)
Sales (ModelId, DealerId, TimeId, QtySold, AmountGenerated)
where the attribute AmountGenerated is intended to be the total price of all automobiles
for the given model, color, date and dealer, while QtySold is the total number
of automobiles in that category.
Answer the following OLAP queries :
• Find total sales generated for model name (Maruti, Honda) and
dealer state (Maharashtra, Gujarat) in September 2017 and October
2017 using ROLLUP across three dimensions : ModelId, DealerId
and TimeId.
• Find total sales generated for model name (Maruti, Honda) and
dealer state (Maharashtra, Gujarat) in September 2017 and October
2017 using CUBE across three dimensions : ModelId, DealerId
and TimeId.
• Comment on the difference in output between the ROLLUP and CUBE
aggregation clauses.
• Find total sales generated for model name (Maruti, Honda) and
dealer state (Maharashtra, Gujarat) in September 2017 and October
2017 using partial ROLLUP across DealerId and TimeId, with
GROUP BY on ModelId.
• Perform aggregation on the amount generated. It should be aggregated
by day first, then by all the weeks in each month, and then
across all months in the year.
• Why is the GROUP_ID( ) function used in OLAP queries ? 7(CO2)

(b) Illustrate various types of metadata used in the data warehouse. 3(CO1)

3. (a) Write the SQL command to create an Index Organized Table Employee with the attributes
cust_no, cust_name and cust_address in tablespace ts_iot as directed :
• cust_no is the primary key for the table.
• PCTTHRESHOLD is 30.
• Specify the OVERFLOW and INCLUDING clauses. Assume cust_name is to
be included in the INCLUDING clause.
Give the meaning of the PCTTHRESHOLD, INCLUDING and OVERFLOW clauses. 5(CO2)

(b) Consider the following snapshot of SALES table :

Extract of Sales Data

Address or Rowid     Date        Product     Color   Region  Sale(s)
00001BFE.0012.0111   15-Nov.-00  Dishwasher  White   East    300
00001BFE.0013.0114   15-Nov.-00  Dryer       Almond  West    450
00001BFF.0012.0115   16-Nov.-00  Dishwasher  Almond  West    350
00001BFF.0012.0138   16-Nov.-00  Washer      Black   North   550
00001BFF.0012.0145   17-Nov.-00  Washer      White   South   500
00001BFF.0012.0157   17-Nov.-00  Dryer       White   East    400
00001BFF.0014.0165   17-Nov.-00  Washer      Almond  South   575

Explain how the query : Select the rows from the Sales table where product
is "Washer" and color is "Almond" and region is "East" or "South" will
be executed if bitmap indexes are created on the Product, Color and Region
columns. Show the intermediate steps. 5(CO2)
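
The bitmap evaluation asked for above can be checked with a small sketch. This is only an illustration under stated assumptions: each bitmap is modelled as a plain Python list of 0/1 flags (one per row, in table order), and the helper name `bitmap` is hypothetical. A real DBMS stores compressed bitmaps and combines them with word-level AND/OR operations.

```python
# Rows of the Sales extract in table order: (rowid, product, color, region).
rows = [
    ("00001BFE.0012.0111", "Dishwasher", "White", "East"),
    ("00001BFE.0013.0114", "Dryer", "Almond", "West"),
    ("00001BFF.0012.0115", "Dishwasher", "Almond", "West"),
    ("00001BFF.0012.0138", "Washer", "Black", "North"),
    ("00001BFF.0012.0145", "Washer", "White", "South"),
    ("00001BFF.0012.0157", "Dryer", "White", "East"),
    ("00001BFF.0014.0165", "Washer", "Almond", "South"),
]
PRODUCT, COLOR, REGION = 1, 2, 3


def bitmap(column, value):
    """Hypothetical helper: one bit per row, 1 where the column equals value."""
    return [1 if row[column] == value else 0 for row in rows]


# Step 1: fetch the bitmap for each predicate value.
washer = bitmap(PRODUCT, "Washer")
almond = bitmap(COLOR, "Almond")
east = bitmap(REGION, "East")
south = bitmap(REGION, "South")

# Step 2: OR the two region bitmaps.
east_or_south = [e | s for e, s in zip(east, south)]

# Step 3: AND the product, color and combined region bitmaps.
result = [w & a & r for w, a, r in zip(washer, almond, east_or_south)]

# Step 4: translate the surviving bits back to rowids.
matching_rowids = [row[0] for row, bit in zip(rows, result) if bit]

for name, bits in [("Washer", washer), ("Almond", almond),
                   ("East OR South", east_or_south), ("Result", result)]:
    print(f"{name:14s}", "".join(map(str, bits)))
print(matching_rowids)
```

Only the bits that survive the final AND need to be resolved to rowids, which is why the table itself is touched last.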


4. (a) Suppose that the data for analysis include the attribute salary (in thousands).
The salary values for the data tuples are :

24, 17, 23, 20, 19, 24, 20, 26, 29, 25, 26, 34, 29, 37, 29, 39,
29, 37, 40, 39, 44, 39, 39, 74, 49, 50, 56
(i) What are the mean, median and mode of the data ?
(ii) What are the range and midrange of the data ?
(iii) Find the first quartile (Q1), third quartile (Q3) and IQR of the data.
(iv) Give the five-number summary of the data.
(v) Show a boxplot of the data. 5(CO3)
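
The descriptive statistics in parts (i) to (iv) can be cross-checked with Python's standard `statistics` module. This is a sketch, not a model answer: quartile conventions differ between textbooks and tools, and the code assumes the "exclusive" method of `statistics.quantiles`, which places Q1 and Q3 at positions (n+1)/4 and 3(n+1)/4 of the sorted data. The 1.5 × IQR fences used to flag boxplot outliers are likewise a common convention rather than something the question states.

```python
import statistics

salaries = [24, 17, 23, 20, 19, 24, 20, 26, 29, 25, 26, 34, 29, 37, 29, 39,
            29, 37, 40, 39, 44, 39, 39, 74, 49, 50, 56]
data = sorted(salaries)

# (i) central tendency; multimode returns every value tied for most frequent
mean = statistics.mean(data)
median = statistics.median(data)
modes = statistics.multimode(data)

# (ii) spread
data_range = data[-1] - data[0]
midrange = (data[0] + data[-1]) / 2

# (iii) quartiles (assumed convention: "exclusive" method)
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

# (iv) five-number summary: min, Q1, median, Q3, max
five_number = (data[0], q1, median, q3, data[-1])

# (v) values a boxplot would draw as outliers (assumed 1.5 * IQR fences)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"mean={mean:.2f} median={median} modes={modes}")
print(f"range={data_range} midrange={midrange}")
print(f"Q1={q1} Q3={q3} IQR={iqr}")
print("five-number summary:", five_number)
print("outliers:", outliers)
```

A different quartile method (for example `method="inclusive"`) may shift Q1 and Q3 slightly, so the convention used should be stated alongside the answer.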

(b) Generate all strong association rules using the Apriori algorithm for the
transaction database shown below, with minimum support s_min = 3 and
minimum confidence = 60%.

TId Items
T1 a, d, e
T2 b, c, d
T3 a, c, e
T4 a, c, d, e
T5 a, e
T6 a, c, d
T7 b, c
T8 a, c, d, e
T9 b, c, e
T10 a, d, e 5(CO3)
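
A level-wise Apriori pass over this database can be sketched as follows. It is a minimal illustration, assuming support is an absolute transaction count (s_min = 3) and confidence is support(itemset) / support(antecedent); the function names `support`, `apriori` and `strong_rules` are hypothetical, and the join step is done by brute force rather than with the prefix-based join of an optimized implementation.

```python
from itertools import combinations

transactions = [set("ade"), set("bcd"), set("ace"), set("acde"), set("ae"),
                set("acd"), set("bc"), set("acde"), set("bce"), set("ade")]
MIN_SUPPORT = 3        # absolute count
MIN_CONFIDENCE = 0.6


def support(itemset):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori():
    """Return {frequent itemset: support}, built level by level."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while level:
        kept = {c: s for c in level if (s := support(c)) >= MIN_SUPPORT}
        frequent.update(kept)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent


def strong_rules(frequent):
    """Return (antecedent, consequent, support, confidence) for strong rules."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                confidence = s / frequent[ante]
                if confidence >= MIN_CONFIDENCE:
                    rules.append((ante, itemset - ante, s, confidence))
    return rules


frequent = apriori()
rules = strong_rules(frequent)
for ante, cons, s, conf in sorted(rules, key=lambda r: -r[3]):
    print(f"{sorted(ante)} -> {sorted(cons)}  support={s}  confidence={conf:.2f}")
```

The prune step is what removes {a, c, d, e} before it is ever counted, because its subset {c, d, e} has support 2 and is not frequent.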

5. (a) Bring out the difference between supervised and unsupervised learning with
an example. 4(CO3)


(b) Apply the Naïve Bayes classification algorithm to predict whether
Rahul (Home owner : Yes, Marital status : Married, Job experience : 3) will
default on his loan.

Home Owner  Marital Status  Job Experience (yrs.)  Defaulted
Yes         Single          3                      No
No          Married         4                      No
No          Single          5                      No
Yes         Married         4                      No
No          Divorced        2                      Yes
No          Married         4                      No
Yes         Divorced        2                      No
No          Married         3                      Yes
No          Married         3                      No
Yes         Single          2                      Yes        6(CO3)
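
The Naïve Bayes computation for Rahul can be verified with exact fractions. The sketch below makes two assumptions that the question leaves open: job experience is treated as a categorical attribute (rather than modelled with a Gaussian), and no Laplace smoothing is applied. The function name `score` is hypothetical.

```python
from fractions import Fraction

# (home_owner, marital_status, job_experience_years, defaulted)
training = [
    ("Yes", "Single", 3, "No"),
    ("No", "Married", 4, "No"),
    ("No", "Single", 5, "No"),
    ("Yes", "Married", 4, "No"),
    ("No", "Divorced", 2, "Yes"),
    ("No", "Married", 4, "No"),
    ("Yes", "Divorced", 2, "No"),
    ("No", "Married", 3, "Yes"),
    ("No", "Married", 3, "No"),
    ("Yes", "Single", 2, "Yes"),
]
rahul = ("Yes", "Married", 3)


def score(label):
    """Unnormalized posterior: P(label) * product of P(attribute | label)."""
    in_class = [row for row in training if row[3] == label]
    result = Fraction(len(in_class), len(training))      # class prior
    for i, value in enumerate(rahul):
        matches = sum(1 for row in in_class if row[i] == value)
        result *= Fraction(matches, len(in_class))       # class-conditional likelihood
    return result


scores = {label: score(label) for label in ("Yes", "No")}
prediction = max(scores, key=scores.get)

for label, s in scores.items():
    print(f"P(X | {label}) * P({label}) = {s} ~ {float(s):.4f}")
print("Predicted Defaulted =", prediction)
```

Under these assumptions the "No" score is the larger of the two, so the prediction is that Rahul will not default.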

6. (a) Consider the following data points : A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8),
B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 0). Suppose initially we assign A1, B1
and C1 as the centers of the three clusters, respectively. Apply the K-means algorithm,
using Manhattan distance as the distance function, and show only the final three
clusters. 6(CO4)
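
The iterations for part (a) can be traced with a short sketch. It assumes the usual textbook variant in which points are assigned by Manhattan distance but each center is still updated to the coordinate-wise mean of its cluster (a strict L1 formulation would use the median instead); the function names are hypothetical.

```python
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4),
    "B1": (5, 8), "B2": (7, 5), "B3": (6, 4),
    "C1": (1, 2), "C2": (4, 0),
}
# Initial centers as given in the question: A1, B1 and C1.
centers = [points["A1"], points["B1"], points["C1"]]


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def assign(centers):
    """Put every point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        nearest = min(range(len(centers)), key=lambda i: manhattan(p, centers[i]))
        clusters[nearest].append(name)
    return clusters


def update(clusters):
    """Assumed update rule: coordinate-wise mean (no cluster is empty here)."""
    return [
        (sum(points[n][0] for n in c) / len(c), sum(points[n][1] for n in c) / len(c))
        for c in clusters
    ]


clusters = assign(centers)
while True:
    centers = update(clusters)
    new_clusters = assign(centers)
    if new_clusters == clusters:      # assignments stable: converged
        break
    clusters = new_clusters

for c, center in zip(clusters, centers):
    print(c, "center =", tuple(round(v, 2) for v in center))
```

With these starting centers the assignments stop changing after the first re-computation of the centers.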
(b) Present the conditions under which density-based clustering is more suitable than
partitioning-based clustering and hierarchical clustering. Give some application
examples to support your argument. 4(CO4)


