
Data Warehousing and Data Mining

The document discusses topics related to data warehousing and data mining including introduction to data warehousing, classification and prediction of data warehousing, mining time series data, mining data streams, web mining, and recent trends in distributed warehousing. Key concepts covered include data marts, data mining advantages over traditional approaches, and importance of association rules in data mining.

Uploaded by

sachin singh
Copyright
© All Rights Reserved

DATA WAREHOUSING AND DATA MINING

Introduction to Data Warehouse

Classification and Prediction of Data Warehousing

Mining Time Series Data

Mining Data Streams

Web Mining

Recent Trends in Distributed Warehousing

NOTE:

The MAKAUT course structure and 6th-semester syllabus have changed with effect from 2021. Previously,
DATA WAREHOUSING AND DATA MINING was a 7th-semester subject; it has been redesigned and moved
to the 6th semester under the present curriculum, and its organization has changed slightly. With
this in mind, we provide the relevant MAKAUT university solutions together with model questions and
answers for the newly introduced topics, so that students can get an idea of the university question
patterns.
POPULAR PUBLICATIONS

INTRODUCTION TO DATA WAREHOUSING


Multiple Choice Type Questions

1. A data warehouse is an integrated collection of data because [WBUT 2009, 2015]

a) It is a collection of data of different types

b) It is a collection of data derived from multiple sources

c) It is a relational database

d) It contains summarized data

Answer: (b)

2. A data warehouse is said to contain a 'subject oriented' collection of data because [WBUT 2009, 2013]

a) Its contents have a common theme

b) It is built for a specific application

c) It cannot support multiple subjects

d) It is a generalization of 'object-oriented'

Answer: (a)

3. A data warehouse is said to contain a time-varying collection of data because [WBUT 2010, 2013, 2015]

a) Its contents vary automatically with time

b) Its life-span is very limited

c) Every key structure of a data warehouse contains, either implicitly or explicitly, an element of time

d) Its contents have an explicit time-stamp

Answer: (c)

4. Data Warehousing is used for [WBUT 2010, 2012]

a) Decision Support System

b) OLTP applications

c) Database applications

d) Data Manipulation applications


Answer: (a)

5. Which of the following is TRUE? [WBUT 2010, 2012]

a) Data warehouse can be used for analytical processing only

b) Data warehouse can be used for information processing (query, report) and analytical
processing

c) Data warehouse can be used for data mining only

d) Data warehouse can be used for information processing (query, report), analytical processing
and data mining

Answer: (d)

Short Answer Type Questions


1. Define Data Marts. [WBUT 2009, 2010, 2011, 2015, 2018]

Define the types of Data Marts. [WBUT 2009, 2010, 2011, 2018]

Answer:

1st Part:

A data mart is a collection of subject-specific data organized to help a particular department make
specific decisions. For example, the advertising department may have its own data mart, while the
finance department has a separate one. In addition, each department has full ownership of the
software, hardware, and other components that make up its data mart.

2nd Part:

There are two types of data marts:

Independent data mart - sourced from data captured from OLTP systems, external providers, or data
generated locally within a particular department or geographic area.

Dependent data mart - sourced directly from the enterprise data warehouse.
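To make the dependent case concrete, here is a minimal Python sketch of a dependent data mart: it is built only by extracting one department's rows from the enterprise warehouse and summarizing them by month. The warehouse rows, the field names `department`, `date` and `amount`, and the function `build_dependent_mart` are hypothetical, chosen for illustration; a real warehouse would normally be queried with SQL rather than held in a Python list.

```python
from collections import defaultdict

# Hypothetical enterprise warehouse rows (illustrative data, not from the source).
WAREHOUSE = [
    {"department": "advertising", "date": "2021-01-05", "amount": 1200.0},
    {"department": "finance", "date": "2021-01-05", "amount": 560.0},
    {"department": "advertising", "date": "2021-02-11", "amount": 800.0},
    {"department": "finance", "date": "2021-02-20", "amount": 940.0},
]


def build_dependent_mart(warehouse, department):
    """Dependent data mart: its only source is the enterprise warehouse.
    Keep one department's rows and total the amounts per month."""
    monthly = defaultdict(float)
    for row in warehouse:
        if row["department"] == department:
            month = row["date"][:7]  # "YYYY-MM" prefix of the ISO date
            monthly[month] += row["amount"]
    return dict(monthly)


print(build_dependent_mart(WAREHOUSE, "advertising"))
# {'2021-01': 1200.0, '2021-02': 800.0}
```

An independent data mart would instead be loaded from its own sources (OLTP extracts, external providers, local data) without passing through the warehouse.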

2. Define data mining. What are the advantages of data mining over traditional approaches? [WBUT 2009]

Answer:

1st Part:

Data mining, which is also known as knowledge discovery, is one of the most popular topics in
information technology. It is the process of automatically extracting useful information, and it
promises to discover hidden relationships that exist in large databases. These relationships
represent valuable knowledge that is crucial for many applications. Data mining is not confined to
the analysis of data stored in data warehouses. It may analyze data at more detailed granularities
than the summarized data provided in a data warehouse. It may also analyze transactional, textual,
spatial, and multimedia data, which are difficult to model with current multidimensional database
technology.

2nd Part:

With the help of data mining, organizations are in a better position to predict business trends,
the amount of revenue that could be generated, the orders that could be expected, and the types of
customers that could be approached. Traditional approaches, which rely on simpler algorithms, are
generally less able to produce such accurate predictions. One major advantage of data mining over a
traditional statistical approach is its ability to deal directly with heterogeneous data fields.

Overall, these advantages help businesses grow, help keep customers satisfied, and help in many
other areas such as data management.

3. What is the importance of Association Rules in Data mining? [WBUT 2009]

OR,

Explain support, confidence, frequent item set and give a formal definition of association rule. [WBUT 2013]

OR,

What is an Association Rule? Define Support, Confidence, Item set and Frequent item set in
Association Rule Mining. [WBUT 2017]

Answer:

To illustrate the concepts, we use a small example from the supermarket domain. The set of items
is I = {milk, bread, butter, beer}, and a small database containing these items is shown in Table 1
below.

Transaction ID   Items
1                Milk, bread
2                Bread, butter
3                Beer
4                Milk, bread, butter
5                Bread, butter

Table 1: An example supermarket database with five transactions.


An example rule for the supermarket could be {milk, bread} → {butter}, meaning that if milk and
bread are bought, customers also buy butter. To select interesting rules from the set of all possible
rules, constraints on various measures of significance and interest can be used. The best-known
constraints are minimum thresholds on support and confidence. An itemset is simply a set of items,
such as {milk, bread}. The support $\mathrm{supp}(X)$ of an itemset $X$ is defined as the proportion
of transactions in the data set that contain the itemset. In the example database in Table 1, the
itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out
of 5 transactions, namely 1 and 4).
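The support calculation above can be checked with a short Python sketch. It encodes the five transactions of Table 1 as frozensets (a representation chosen here for illustration) and counts the transactions that contain every item of the itemset; the function name `support` is hypothetical.

```python
# The five transactions of Table 1, each stored as a frozenset of items.
TRANSACTIONS = [
    frozenset({"milk", "bread"}),
    frozenset({"bread", "butter"}),
    frozenset({"beer"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"bread", "butter"}),
]


def support(itemset, transactions):
    """supp(X): fraction of transactions that contain every item of X."""
    x = frozenset(itemset)
    containing = sum(1 for t in transactions if x <= t)  # x <= t: x is a subset of t
    return containing / len(transactions)


print(support({"milk", "bread"}, TRANSACTIONS))  # 0.4 (transactions 1 and 4)
```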

The confidence of a rule $X \rightarrow Y$ is defined as
$\mathrm{conf}(X \rightarrow Y) = \mathrm{supp}(X \cup Y) / \mathrm{supp}(X)$. For example, the itemset
{milk, bread, butter} occurs only in transaction 4, so its support is 1/5 = 0.2, and the rule
{milk, bread} → {butter} has a confidence of 0.2/0.4 = 0.5 in the database in Table 1. This means
that the rule is correct for 50% of the transactions containing milk and bread. Confidence can be
interpreted as an estimate of the conditional probability $P(Y \mid X)$, the probability of finding
the right-hand side (RHS) of the rule in transactions under the condition that these transactions
also contain the left-hand side (LHS).
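A matching Python sketch for confidence, again using the Table 1 transactions. The helper names `count` and `confidence` are hypothetical. Because both supports in the formula share the denominator (the number of transactions), the sketch divides raw counts, which gives the same value; it raises an error when the LHS never occurs, since the formula would then divide by zero.

```python
# The five transactions of Table 1, each stored as a frozenset of items.
TRANSACTIONS = [
    frozenset({"milk", "bread"}),
    frozenset({"bread", "butter"}),
    frozenset({"beer"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"bread", "butter"}),
]


def count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    x = frozenset(itemset)
    return sum(1 for t in transactions if x <= t)


def confidence(lhs, rhs, transactions):
    """conf(X -> Y) = supp(X ∪ Y) / supp(X), computed as a ratio of counts."""
    lhs_count = count(lhs, transactions)
    if lhs_count == 0:
        raise ValueError("the LHS never occurs, so the confidence is undefined")
    both = frozenset(lhs) | frozenset(rhs)  # X ∪ Y: baskets must hold all of these items
    return count(both, transactions) / lhs_count


print(confidence({"milk", "bread"}, {"butter"}, TRANSACTIONS))  # 0.5
```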

In many (but not all) situations, we only care about association rules or causalities involving sets of
items that appear frequently in baskets. For example, we cannot run a good marketing strategy
involving items that no one buys anyway. Thus, much data mining starts with the assumption that
we only care about sets of items with high support, i.e., sets that appear together in many baskets.
We then find association rules or causalities involving only high-support sets of items; such a set
must appear in at least a certain percentage of the baskets, called the support threshold. We use the
term frequent itemset for "a set S that appears in at least a fraction s of the baskets," where s is
some chosen constant, typically 0.01 (1%).
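The frequent-itemset definition can be tried out on Table 1 with a brute-force Python sketch that checks every subset of the items and keeps those whose support is at least s. Because the example has only five baskets, the sketch assumes s = 0.4 (at least 2 of the 5 baskets) instead of the typical 0.01. The function name `frequent_itemsets` is hypothetical, and enumerating all subsets is only practical for a handful of items.

```python
from itertools import combinations

# The five transactions of Table 1, each stored as a frozenset of items.
TRANSACTIONS = [
    frozenset({"milk", "bread"}),
    frozenset({"bread", "butter"}),
    frozenset({"beer"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"bread", "butter"}),
]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every non-empty itemset whose support is
    at least min_support, by checking every subset of the item universe."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            hits = sum(1 for t in transactions if candidate <= t)
            if hits / n >= min_support:
                result[candidate] = hits / n
    return result


frequent = frequent_itemsets(TRANSACTIONS, min_support=0.4)
# Print in a stable order: by size, then alphabetically.
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), frequent[itemset])
# ['bread'] 0.8
# ['butter'] 0.6
# ['milk'] 0.4
# ['bread', 'butter'] 0.6
# ['bread', 'milk'] 0.4
```

With this threshold {beer} (support 0.2) and {milk, bread, butter} (support 0.2) are not frequent.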

Association rules are statements of the form $\{X_1, X_2, \ldots, X_n\} \rightarrow Y$, meaning that if we
find all of $X_1, X_2, \ldots, X_n$ in the market basket, then we have a good chance of finding $Y$. The
probability of finding $Y$, given that the basket contains $X_1, X_2, \ldots, X_n$, is called the
confidence of the rule, and it must be high enough for us to accept the rule. We normally search only
for rules whose confidence is above a certain threshold.
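Putting both thresholds together, the following Python sketch generates association rules from Table 1 by brute force: for every itemset S that meets the support threshold, each non-empty proper subset X becomes a candidate LHS with RHS Y = S - X (so Y may hold more than one item), and the rule is kept if its confidence meets the confidence threshold. This is a common textbook rule-generation scheme, not necessarily the one the source has in mind; the thresholds (0.4 support, 0.6 confidence) and the names `count` and `association_rules` are assumptions for illustration.

```python
from itertools import combinations

# The five transactions of Table 1, each stored as a frozenset of items.
TRANSACTIONS = [
    frozenset({"milk", "bread"}),
    frozenset({"bread", "butter"}),
    frozenset({"beer"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"bread", "butter"}),
]


def count(itemset, transactions):
    """Number of transactions containing every item of the frozenset `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def association_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules X -> Y that meet both
    thresholds, where X ∪ Y is a high-support itemset."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for k in range(2, len(items) + 1):  # a rule needs at least two items
        for combo in combinations(items, k):
            s = frozenset(combo)
            s_count = count(s, transactions)
            if s_count == 0 or s_count / n < min_support:
                continue  # only high-support itemsets yield rules
            for r in range(1, k):  # every non-empty proper subset as the LHS
                for lhs in combinations(combo, r):
                    x = frozenset(lhs)
                    conf = s_count / count(x, transactions)  # count(x) >= s_count > 0
                    if conf >= min_confidence:
                        rules.append((sorted(x), sorted(s - x), s_count / n, conf))
    return rules


for lhs, rhs, supp, conf in association_rules(TRANSACTIONS, 0.4, 0.6):
    print(f"{lhs} -> {rhs}  support={supp:.2f}  confidence={conf:.2f}")
# ['bread'] -> ['butter']  support=0.60  confidence=0.75
# ['butter'] -> ['bread']  support=0.60  confidence=1.00
# ['milk'] -> ['bread']  support=0.40  confidence=1.00
```

With these thresholds the example rule {milk, bread} → {butter} is not reported, because {milk, bread, butter} has support 0.2; lowering the thresholds to 0.2 and 0.5 makes it appear with confidence 0.5.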
