Data Mining and Warehousing
Instructions to Candidates :—
(1) All questions carry marks as indicated against them.
(2) Assume suitable data wherever necessary and clearly state your assumptions.
1. (a) An insurance company, with branches all over the country, wants to develop
a data warehouse for effective decision-making about their insurance policies.
There are a number of different types of insurance like Auto insurance,
Home insurance, Industrial insurance, etc. The entire country is categorized
into four regions, namely, North, South, East and West. Each region consists
of a set of states. There may be different types of customers, such as individuals,
institutions, industries, etc. The data warehouse should record an entry for
each policy issued to each customer along with the premium paid.
With respect to the above use case, answer the following questions. Necessary
assumptions can be made to support your answer :
- Design a star schema for the data warehouse, clearly identifying
the fact table(s), dimension table(s), their attributes and measures,
along with the primary key and foreign key relationships.
- Convert the star schema to a snowflake schema.
- Write an SQL query to display the region-wise,
insurance-type-wise, year-wise total premium collected from your
schema. 6(CO1)
- Find the total sales generated for model name (Maruti, Honda) and
dealer state (Maharashtra, Gujarat) in September 2017 and October
2017 using CUBE across three dimensions: ModelID, DealerID
and TimeId.
- Find the total sales generated for model name (Maruti, Honda) and
dealer state (Maharashtra, Gujarat) in September 2017 and October
2017 using partial ROLL-UP across DealerID and TimeId, with
group by ModelId.
(b) Illustrate various types of metadata used in the data warehouse. 3(CO1)
3. (a) Write the SQL command to create an index-organized table Employee with the attributes
cust_no, cust_name and cust_address in tablespace ts_iot, as directed :
- cust_no is the primary key of the table.
- PCTTHRESHOLD is 30.
Explain how the query "Select the rows from the Sales table where product
is 'Washer' and color is 'Almond' and division is 'East' or 'South'" will
be executed if bitmap indexes are created on the Product, Color and Region
columns. Show the intermediate steps. 5(CO2)
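The bitmap evaluation asked about above can be sketched in Python as follows. This is only an illustration under stated assumptions: the question does not give the Sales rows, so the five rows below are hypothetical, the "division" condition is assumed to refer to the Region column, and the condition is read as product AND color AND (region East OR region South). Each bitmap is stored as a Python integer whose bit i is set when row i holds that value.

```python
# Hypothetical Sales rows (product, color, region); not taken from the question paper.
rows = [
    ("Washer", "Almond", "East"),
    ("Dryer", "Almond", "West"),
    ("Washer", "White", "South"),
    ("Washer", "Almond", "South"),
    ("Washer", "Almond", "North"),
]


def build_bitmap_index(values):
    """Map each distinct value to an integer bitmap (bit i set => row i has the value)."""
    bitmaps = {}
    for row_id, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)
    return bitmaps


product_idx = build_bitmap_index(r[0] for r in rows)
color_idx = build_bitmap_index(r[1] for r in rows)
region_idx = build_bitmap_index(r[2] for r in rows)

# Step 1: OR the two region bitmaps (East OR South).
region_mask = region_idx.get("East", 0) | region_idx.get("South", 0)

# Step 2: AND the product, color and combined region bitmaps.
result_mask = product_idx.get("Washer", 0) & color_idx.get("Almond", 0) & region_mask

# Step 3: translate the set bits of the result bitmap back into row ids.
matching_rows = [i for i in range(len(rows)) if (result_mask >> i) & 1]

for name, mask in [("Washer", product_idx["Washer"]),
                   ("Almond", color_idx["Almond"]),
                   ("East|South", region_mask),
                   ("result", result_mask)]:
    # Bitmaps are printed with row 0 as the rightmost bit.
    print(f"{name:>10}: {mask:0{len(rows)}b}")
print("matching row ids:", matching_rows)
```

Only the rows whose bit survives both the OR and the AND steps are fetched from the table, which is the point of showing the intermediate bitmaps.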
24, 17, 23, 20, 19, 24, 20, 26, 29, 25, 26, 34, 29, 37, 29, 39,
29, 37, 40, 39, 44, 39, 39, 74, 49, 50, 56
(i) What is mean, median and mode of data ?
(iii) Find first quartile (Q1), third quartile (Q3), IQR of the data.
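One way to check the descriptive statistics for the data above is a short Python sketch using the standard `statistics` module. Quartile conventions differ between textbooks; this sketch assumes the module's default "exclusive" method, so a hand calculation with another convention may give slightly different Q1, Q3 and IQR values.

```python
import statistics

data = [24, 17, 23, 20, 19, 24, 20, 26, 29, 25, 26, 34, 29, 37, 29, 39,
        29, 37, 40, 39, 44, 39, 39, 74, 49, 50, 56]

mean = statistics.mean(data)
median = statistics.median(data)
# multimode returns every value that shares the highest frequency.
modes = statistics.multimode(data)

# quantiles(n=4) returns the three cut points Q1, Q2 (median) and Q3.
# Assumption: the default method="exclusive" is the intended convention.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"mean = {mean:.2f}, median = {median}, modes = {modes}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```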
(b) Generate all strong association rules using the Apriori algorithm for the
transaction database shown below and a minimum support s_min = 3 and
minimum confidence = 60%.
TId Items
T1 a, d, e
T2 b, c, d
T3 a, c, e
T4 a, c, d, e
T5 a, e
T6 a, c, d
T7 b, c
T8 a, c, d, e
T9 b, c, e
T10 a, d, e 5(CO3)
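The Apriori procedure for the transaction database above can be sketched in Python as shown here. This is a minimal illustration, not a reference implementation: the function names `support`, `apriori` and `strong_rules` are made up for this sketch, support is counted as an absolute number of transactions (s_min = 3), and confidence is support(itemset) / support(left-hand side).

```python
from itertools import combinations

transactions = {
    "T1": {"a", "d", "e"}, "T2": {"b", "c", "d"}, "T3": {"a", "c", "e"},
    "T4": {"a", "c", "d", "e"}, "T5": {"a", "e"}, "T6": {"a", "c", "d"},
    "T7": {"b", "c"}, "T8": {"a", "c", "d", "e"}, "T9": {"b", "c", "e"},
    "T10": {"a", "d", "e"},
}
MIN_SUPPORT = 3        # absolute count, as stated in the question
MIN_CONFIDENCE = 0.6   # 60%


def support(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)


def apriori():
    """Return {frequent itemset: support count} using level-wise search."""
    items = sorted(set().union(*transactions.values()))
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all (k-1)-subsets are frequent,
        # then check its actual support.
        level = [c for c in candidates
                 if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
                 and support(c) >= MIN_SUPPORT]
    return frequent


def strong_rules(frequent):
    """Return (lhs, rhs, support, confidence) for rules meeting MIN_CONFIDENCE."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= MIN_CONFIDENCE:
                    rules.append((lhs, itemset - lhs, count, confidence))
    return rules


frequent = apriori()
rules = strong_rules(frequent)
for lhs, rhs, count, confidence in rules:
    print(f"{sorted(lhs)} -> {sorted(rhs)}  support={count}  confidence={confidence:.2f}")
```

The prune step relies on the Apriori property: every subset of a frequent itemset must itself be frequent, so a candidate with an infrequent subset is discarded without counting its support.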
5. (a) Bring out the difference between supervised and unsupervised learning with
an example. 4(CO3)
6. (a) Consider the following data points : A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8),
B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 0). Suppose we initially assign A1, B1
and C1 as the centers of the three clusters, respectively. Apply the K-means
algorithm with Manhattan distance as the distance function and show only the
final three clusters. 6(CO4)
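The clustering procedure described above can be sketched in Python as follows. The sketch assumes the usual classroom variant: points are assigned to the nearest center by Manhattan distance, and each center is then recomputed as the coordinate-wise mean of its members (a strict L1 formulation would use the median instead). The function name `kmeans_manhattan` is made up for this illustration.

```python
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
    "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 0),
}


def manhattan(p, q):
    """Manhattan (L1) distance between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def kmeans_manhattan(points, initial_centers, max_iter=100):
    """Return (clusters, centers) after the assignments stop changing."""
    centers = list(initial_centers)
    assignment = {}
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_assignment = {
            name: min(range(len(centers)), key=lambda i: manhattan(p, centers[i]))
            for name, p in points.items()
        }
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each center to the mean of its members.
        for i in range(len(centers)):
            members = [points[n] for n, c in assignment.items() if c == i]
            if members:
                centers[i] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    clusters = [sorted(n for n, c in assignment.items() if c == i)
                for i in range(len(centers))]
    return clusters, centers


clusters, centers = kmeans_manhattan(
    points, [points["A1"], points["B1"], points["C1"]]
)
for members, center in zip(clusters, centers):
    print(members, "center =", center)
```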
(b) Present conditions under which density-based clustering is more suitable than
partitioning-based clustering and hierarchical clustering. Give some application
examples to support your argument. 4(CO4)
WYXZ/MS-23 / 1673 5 55