Data Warehousing and Data Mining: Dr. Karunendra Verma


Assot. Professor, Dept. of CSE
BASIC CONCEPTS
• What is Data?
Data are raw, unorganized facts that need to be processed. Data can be something simple that seems random and useless until it is organized.
• What is Information?
When data is processed, organized, structured or presented in a given context so as to make it useful, it is called information.
• How are data and information related?
Example: each student's test score is one piece of data. The average score of a class or of the entire school is information that can be derived from the given data.
What is a Warehouse in general?
“A warehouse is a commercial building for storage of goods.”
Warehouses are used by manufacturers, importers, exporters, wholesalers, transport businesses, customs, etc. They are usually large plain buildings in industrial parts of towns. They come equipped with loading docks to load and unload trucks, and are sometimes loaded directly from railways, airports or seaports. They also often have cranes and forklifts for moving goods.
Decision Support System
• How are DBMS applications classified?
1. Transaction Processing System (TPS):
A transaction processing system is an information processing system for business transactions, involving the collection, modification and retrieval of all transaction data.
2. Decision Support System (DSS)
❖ Definition of DSS
A DSS aims to get high-level information out of the detailed data stored in transaction processing systems.
This is information generated after observing several modifications to tuples.
Uses of DSS
1. What products to stock?
2. When to plan for production?
3. In what quantity should a product be manufactured?
4. What items do people prefer?
5. What items are likely to be purchased in the future?
Business Intelligence
Problems in Storage and Retrieval of Data for DSS
1. Many DSS queries cannot be written using SQL.
Reasons:
a. They require extensive statistical analysis.
b. They require some preprocessing for analysis.
c. Some queries written in SQL are inefficient.
2. Large companies have various sources of data (branches spread worldwide), which
o are located at different places
o use different database schemas
o were designed and implemented at different times
o may use different DBMS software.
To avoid these problems, companies use a Data Warehouse.
Data Warehouse Definition
• A Data Warehouse gathers data from multiple sources under a unified schema at a single site, and thus provides a single uniform interface to the data. It also stores historical data.
Bill Inmon coined the term “Data Warehouse” in 1990.
Definition by the Father of Data Warehouse
Definition by Bill Inmon
• A Data Warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.
1. Subject-Oriented: the data gives information about a particular subject (like sales or production) rather than mixing various subjects about the company's ongoing operations.
There must be a separate warehouse for each subject.
Application and Subject Orientation
Application | Subject
❑ Loans | ❑ Customer
❑ Savings | ❑ Vendor
❑ Accounts | ❑ Sales
❑ Trusts | ❑ Products
❑ Checking | ❑ Purchase
A DW focuses on “Data Modeling” and “Data Design”. “Process Design” is not part of Data Warehousing.
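Definition by Bill Inmon
2. Integrated: data is gathered into the data warehouse from a variety of sources and merged into a single unified schema. For example, operational databases may encode the same fact in different ways, while the data warehouse uses one encoding:
Attribute | Encodings in operational DBMSs | Encoding in the data warehouse
Gender | Male/Female, 1/0, X/Y | M/F
Height | 134 cms, 5.8 feet, 1.6 meters | Inches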
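The conversion step can be sketched in Python. This is a minimal sketch that assumes the warehouse stores gender as M/F and height in inches, as the table above shows. The mapping tables and the names `to_dw_gender` and `to_dw_height_inches` are hypothetical.

```python
# Hypothetical source encodings for gender, mapped to the warehouse code M/F.
GENDER_TO_DW = {
    "male": "M", "female": "F",  # source stores full words
    "1": "M", "0": "F",          # source stores a 1/0 flag
    "x": "M", "y": "F",          # source stores X/Y codes
}

# Centimetres per unit for each source unit of height.
CM_PER_UNIT = {"cm": 1.0, "m": 100.0, "feet": 30.48, "inch": 2.54}


def to_dw_gender(value):
    """Convert a source-specific gender code to the warehouse code."""
    return GENDER_TO_DW[str(value).strip().lower()]


def to_dw_height_inches(value, unit):
    """Convert a height in any source unit to inches (rounded to 0.1)."""
    return round(value * CM_PER_UNIT[unit] / 2.54, 1)


print(to_dw_gender("Female"), to_dw_gender(1), to_dw_gender("X"))  # F M M
print(to_dw_height_inches(134, "cm"))    # 52.8
print(to_dw_height_inches(5.8, "feet"))  # 69.6
print(to_dw_height_inches(1.6, "m"))     # 63.0
```

Each new source only needs new rows in the mapping tables. The conversion functions themselves stay unchanged.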
Definition by Bill Inmon
3. Time-Variant: all the data in the data warehouse is identified with a particular time period, e.g.
- the balance of an account on a given date
- the total sale of a specific item on a date.
Thus a data warehouse may hold multiple tuples for the same account, recording the various transactions on it up to the present date.
Definition by Bill Inmon
4. Non-volatile: data is stable in the data warehouse. More data is added, but data is never removed.
Operations on an operational DBMS: Insert, Update, Delete, Replace
Operations on a data warehouse: Load, Access
This definition given by Bill Inmon remained reasonably accurate for almost 10 years.
Accepted Changes in Definition
Nowadays:
▪ A DW can be volatile, because it has to hold many terabytes (1 terabyte = 2^40 bytes) of data.
▪ A DW has become more general and can hold data about more than one subject. A single-subject DW is called a “Data Mart”, and a DW now covers the whole enterprise.
▪ Only a limited history (e.g. 3 years) is kept in the DW. Older tuples are automatically “rolled off”, i.e. tuples are deleted based on their dates.
What is “Data Warehousing”?
• It is a set of hardware and software components that can be used to better analyze the massive amounts of data that companies are accumulating, in order to make better business decisions.
Data warehousing is not just the data in the data warehouse. The term also covers other things, like:
- Architecture
- Data loaders
- Query and analysis tools
General Architecture of DW
[Figure: branches B1, B2, ..., Bn with schemas S1, S2, ..., Sn feed data loaders, which load one database with a unified (un-normalized) schema. Analytic tools work on this database. Bi = branches, Si = schemas.]
Issues in Building a Data Warehouse
1. When and how to gather data?
Based on this, there are two DW architectures:
a. Source-driven architecture
b. Destination-driven architecture
In a source-driven architecture, the data sources transmit new information to the DW either continually (as transaction processing takes place) or periodically (e.g. every night).
In a source-driven architecture, the sources are more active than the destination.
In a destination-driven architecture, the DW periodically requests new data from the sources.
In a destination-driven architecture, the destination is more active than the sources.
[Figure: data sources 1, 2, ..., n and the DW in a source-driven architecture (sources active, DW passive) and in a destination-driven architecture (DW active, sources passive).]
2. What schema to use?
❑ Since the various data sources have different schemas and data models (network, relational, object-oriented), the DW has to perform the following tasks:
1. Schema integration
2. Data conversion
2. What schema to use?
1. Schema Integration: finding a unified, integrated schema suitable for all the data sources.
2. Data Conversion: converting the data into the unified, integrated schema.
Thus the data stored in the DW is not just a copy of the data present in the multiple data sources.
3. Data Transformation and Cleansing
Cleansing (purification or refinement) is the task of correcting and preprocessing the data.
Why is data cleansing needed?
(1) The same attribute value may be written in different ways at various data sources, e.g. a company name:
- I.B.M
- IBM
- International Business Machines
(2) To correct spelling mistakes in
- names of cities, states and countries
- other standard attributes.
(3) To correct incorrect entries of
- ZIP codes
- telephone codes.
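Here is a minimal Python sketch of rule-based cleansing for the cases listed above. The variant tables, the misspellings and the 6-digit PIN-code rule are illustrative assumptions. Real cleansing tools rely on much larger reference data.

```python
import re

# Hypothetical lookup tables: known variant -> canonical value.
COMPANY_VARIANTS = {
    "ibm": "IBM",
    "i.b.m": "IBM",
    "international business machines": "IBM",
}
CITY_SPELLING_FIXES = {"banglore": "Bangalore", "dehli": "Delhi"}


def _key(text):
    """Normalise case and whitespace, and drop a trailing full stop."""
    return re.sub(r"\s+", " ", text.strip().lower()).rstrip(".")


def clean_company(name):
    """Map any known variant of a company name to one canonical spelling."""
    return COMPANY_VARIANTS.get(_key(name), name.strip())


def clean_city(name):
    """Correct known misspellings; leave unknown names untouched."""
    return CITY_SPELLING_FIXES.get(_key(name), name.strip())


def is_valid_pin(code):
    """Assumes 6-digit Indian PIN (ZIP) codes; other countries need other rules."""
    return re.fullmatch(r"\d{6}", code.strip()) is not None


print([clean_company(n) for n in ["I.B.M", "IBM", "International  Business Machines"]])
# ['IBM', 'IBM', 'IBM']
print(clean_city("Banglore"), is_valid_pin("452001"), is_valid_pin("45200"))
# Bangalore True False
```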
Data Transformation
Other than cleansing, data may be transformed for
- changing units of measure
- converting data to a different schema, e.g. by joining multiple columns such as First-name, Middle-name and Last-name into a single Name column.
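This is a minimal Python sketch of both transformations. It assumes hypothetical source columns `first_name`, `middle_name` and `last_name`, plus a weight that the source stores in pounds and the warehouse keeps in kilograms.

```python
LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound


def full_name(first, middle, last):
    """Join name parts into one Name value, skipping empty parts."""
    return " ".join(part for part in (first, middle, last) if part)


def transform(row):
    """Map one source row to the (hypothetical) warehouse schema."""
    return {
        # Schema change: three name columns become one.
        "name": full_name(row["first_name"], row.get("middle_name", ""), row["last_name"]),
        # Unit change: pounds in the source, kilograms in the warehouse.
        "weight_kg": round(row["weight_lb"] * LB_TO_KG, 1),
    }


print(transform({"first_name": "Ravi", "middle_name": "", "last_name": "Kumar", "weight_lb": 150}))
# {'name': 'Ravi Kumar', 'weight_kg': 68.0}
```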
4. How to propagate updates?
- Updates on relations at the data sources must be propagated to the DW.
- This is exactly like “view maintenance” in a DBMS.
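Update propagation can be sketched as incremental view maintenance. Instead of recomputing an aggregate view after every change, each source insert, delete or update adjusts the stored result. The view below (total sales per item) and its class name are hypothetical. It keeps a per-group count so that a group is dropped once its last source tuple is deleted.

```python
class SalesPerItemView:
    """Materialized view SUM(amount) GROUP BY item, maintained incrementally."""

    def __init__(self):
        self.total = {}  # item -> SUM(amount)
        self.count = {}  # item -> COUNT(*), tells us when a group becomes empty

    def insert(self, item, amount):
        """Propagate an INSERT at a source: add the new tuple's amount."""
        self.total[item] = self.total.get(item, 0) + amount
        self.count[item] = self.count.get(item, 0) + 1

    def delete(self, item, amount):
        """Propagate a DELETE at a source: subtract the old tuple's amount."""
        self.total[item] -= amount
        self.count[item] -= 1
        if self.count[item] == 0:  # no source tuples left in this group
            del self.total[item]
            del self.count[item]

    def update(self, item, old_amount, new_amount):
        """An UPDATE is propagated as a delete of the old tuple plus an insert."""
        self.delete(item, old_amount)
        self.insert(item, new_amount)


view = SalesPerItemView()
view.insert("shirt", 500)
view.insert("shirt", 300)
view.insert("jeans", 900)
view.update("shirt", 300, 350)
view.delete("jeans", 900)
print(view.total)  # {'shirt': 850}
```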
5. What data to summarize?
- Raw data generated online is too large:
  - it requires too much space to store (impracticable);
  - it requires too much time to answer queries on it.
However, many queries can be answered using a summary of the data (data obtained by aggregation on relations) rather than by maintaining big relations.
Example of aggregation / data summary
- Instead of storing data about every sale of clothing, we can store total sales by item-name and size.
ETL tasks: the steps involved in getting data into the DW are Extraction (getting data from the various data sources), Transformation and Loading.
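The clothing example above can be sketched in Python as a GROUP BY over hypothetical raw sale tuples (item_name, size, units). The data and function name are invented for illustration.

```python
from collections import defaultdict

# Hypothetical raw sales: one tuple per sale of clothing (item_name, size, units).
raw_sales = [
    ("shirt", "M", 2),
    ("shirt", "M", 1),
    ("shirt", "L", 3),
    ("jeans", "L", 1),
    ("jeans", "L", 4),
]


def summarize(rows):
    """Equivalent of SELECT item_name, size, SUM(units) ... GROUP BY item_name, size."""
    totals = defaultdict(int)
    for item_name, size, units in rows:
        totals[(item_name, size)] += units
    return dict(totals)


summary = summarize(raw_sales)
print(summary)  # {('shirt', 'M'): 3, ('shirt', 'L'): 3, ('jeans', 'L'): 5}
```

Five raw tuples shrink to three summary tuples. Questions about total units per item and size can be answered from the summary alone, but a question about an individual sale can no longer be answered from it.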
Data Warehouse Schemas
❑ For what purposes are DBMS schemas designed?
Consistency,
Avoiding redundancy,
Modification of data,
Ability to represent data…
Note the Difference
❑ For what purposes are DW schemas designed?
1. Data analysis
2. To support interactive analysis of summary data
Data Warehouse Schema
❑ Data present in a DW is modeled as “multidimensional data”.
Data that can be modeled as “dimension attributes” and “measure attributes” is called “multidimensional data”.
Measure attribute: an attribute that can be measured and aggregated, e.g. number of units sold, balance, etc.
Data Warehouse Schema
Dimension attributes: attributes that define the dimensions on which measure attributes and their summaries are viewed, e.g.
Sales=(item_name, color, size, units-sold)
Here item_name, color and size are dimension attributes, and units-sold is a measure attribute.
Interpretation of Dimension and Measure Attributes
[Figure (data visualization): a cube with the axes Item name (a, b), Color (yellow, blue) and Size (e.g. XL). Each cell holds the measure, e.g. 400, 100, 150 or 300 units.]
Fact Tables
• A DW contains a set of “fact tables”, which are the tables containing multidimensional data.
Facts about fact tables:
- They are very large tables.
- E.g. a table storing the sales information of a retail store, with one tuple for each purchase made by a customer.
EXAMPLE OF A FACT TABLE
• The Sales table includes
• Item_id, DateOfPurchase, Customer_id, Store_id (dimension attributes)
• NoOfItemsSold, priceOfItem (measure attributes)
To minimize storage requirements, the dimension attributes used in the fact table are primary keys of other tables called dimension tables. In the fact table they act as foreign keys.
Data Warehouse Star Schema
1. Star schema
2. Snowflake schema
A star schema is a data warehouse schema containing one fact table and multiple dimension tables.
STAR SCHEMA (example)
Fact table:
Sales=(Item_id, Store_id, Customer_id, Date, Number, Price)
Dimension tables:
Item_info=(Item_id, Item_name, Color, Size, Category)
Store=(Store_id, City, State, Country)
Customer=(Customer_id, Name, Street, City, State)
Date_info=(Date, Day, Month, Year)
Characteristics of star schema
1. It contains only one fact table and multiple dimension tables.
2. The primary key of the fact table is a composite key made of all the dimension attributes present in it.
3. The fact table may include a level (e.g. item 1 sold at district, regional and state level).
4. A single fact table contains detailed data, such as the sales of item k at a store on date xyz.
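Here is a sketch of the example star schema in SQLite, using Python's built-in `sqlite3` module. Table names follow the star schema example. The date key is named `date_id`, the measures are named `number` and `price`, and the sample rows are invented for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE item_info (item_id INTEGER PRIMARY KEY, item_name TEXT, color TEXT,
                        size TEXT, category TEXT);
CREATE TABLE store     (store_id INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT);
CREATE TABLE customer  (customer_id INTEGER PRIMARY KEY, name TEXT, street TEXT,
                        city TEXT, state TEXT);
CREATE TABLE date_info (date_id TEXT PRIMARY KEY, day INTEGER, month INTEGER, year INTEGER);
-- Fact table: one foreign key per dimension plus the measures; the
-- composite primary key is made of all the dimension attributes.
CREATE TABLE sales (
    item_id     INTEGER REFERENCES item_info,
    store_id    INTEGER REFERENCES store,
    customer_id INTEGER REFERENCES customer,
    date_id     TEXT    REFERENCES date_info,
    number      INTEGER,
    price       REAL,
    PRIMARY KEY (item_id, store_id, customer_id, date_id)
);
"""


def build_star():
    """Create the schema in memory and load a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO item_info VALUES (?, ?, ?, ?, ?)",
                     [(1, "shirt", "blue", "M", "clothing"),
                      (2, "kettle", "red", "1L", "kitchen")])
    conn.executemany("INSERT INTO store VALUES (?, ?, ?, ?)",
                     [(10, "Indore", "MP", "India"), (11, "Pune", "MH", "India")])
    conn.execute("INSERT INTO customer VALUES (100, 'Asha', 'MG Road', 'Indore', 'MP')")
    conn.execute("INSERT INTO date_info VALUES ('2024-01-05', 5, 1, 2024)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, 10, 100, "2024-01-05", 3, 500.0),
                      (2, 10, 100, "2024-01-05", 1, 1200.0),
                      (1, 11, 100, "2024-01-05", 2, 500.0)])
    return conn


# A typical star query: join the fact table to the dimensions it needs, then aggregate.
STAR_QUERY = """
SELECT st.state, it.category, SUM(f.number) AS units
FROM sales AS f
JOIN item_info AS it ON it.item_id = f.item_id
JOIN store     AS st ON st.store_id = f.store_id
GROUP BY st.state, it.category
ORDER BY st.state, it.category
"""

conn = build_star()
print(conn.execute(STAR_QUERY).fetchall())
# [('MH', 'clothing', 2), ('MP', 'clothing', 3), ('MP', 'kitchen', 1)]
```

Every dimension is one join away from the fact table, which keeps star queries simple.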
How to design a star schema?
1. Find the unified schema by considering the schemas at all the data sources.
2. Find the measure and dimension attributes in each table, along with their primary keys.
3. Prepare a single fact table by collecting the primary keys of all the tables plus some additional attributes (e.g. level, total sale, etc.).
4. Draw the schema diagram of the above schema.
5. Before actually loading the data, do the following:
1. Data transformation
2. Data cleansing
3. Adding a time unit as part of the key
TUTORIAL 1
Q.1. Design a star schema for Olympic events.
Consider the particular example of attendance at Olympic events. The facts are the numbers attending and the value of ticket sales. Dimensions include Olympiad (year of the Olympics), venue, sport, type (heat (ordinary match), semifinal, final) and men's/women's. Venues are classified by location and type of building into central enclosed, central open and remote. Sports are subdivided into events.
A sample report representing attendance at various events will be given on a separate page. Do the following:
a) Construct a fact table for this Olympic event.
b) What is the key of the fact table?
c) Design a star schema using the fact table designed in a) and dimension tables.
[Figure: star schema for Q.1, with a Fact table (Olympiad, Venue, Sport_event, Gender, Attendance, Ticket sale) and dimension tables Olympiad, Venue, Sport_event and Gender. Dimension attributes shown include city, organizing committee, contact address, location, region, sports, event class, subsport and sporting federation.]
Drawbacks of star schema
1. Adding or deleting levels in a hierarchy requires physical modification of the fact table, e.g.
Fact=(store_key, store, district, region, zone, total sale, units_sold)
2. Since dimension tables are un-normalized, a star schema requires more space.
Snowflake schema
❑ A snowflake is a single flake (crystal) of snow formed during snowfall.
Snowflake schema
❑ It is a variant of the star schema in which some dimension tables are normalized by splitting their data into additional tables.
❑ The resulting schema diagram looks like a “snowflake”, hence the name.
❑ The following partial schema diagram shows an example of a snowflake schema.
Fact table: references Location key
Location=(Location key, Street, City key)
City=(City key, City name, State, Country)
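Here is a sketch in SQLite of the normalized location dimension above. The key values and rows are made up. Answering “sales by country” needs one more join than it would in a star schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The location dimension is normalized: city details live in their own table.
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city_name TEXT, state TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city);
CREATE TABLE sales_fact (location_key INTEGER REFERENCES location, dollars_sold REAL);

INSERT INTO city VALUES (1, 'Indore', 'MP', 'India'), (2, 'Toronto', 'Ontario', 'Canada');
INSERT INTO location VALUES (10, 'MG Road', 1), (11, 'AB Road', 1), (12, 'King St', 2);
INSERT INTO sales_fact VALUES (10, 100.0), (11, 50.0), (12, 70.0), (12, 30.0);
""")

# Sales by country: the snowflake needs an extra join (location -> city)
# that a star schema with a denormalized location table would not.
SNOWFLAKE_QUERY = """
SELECT c.country, SUM(f.dollars_sold)
FROM sales_fact AS f
JOIN location AS l ON l.location_key = f.location_key
JOIN city     AS c ON c.city_key = l.city_key
GROUP BY c.country
ORDER BY c.country
"""
print(conn.execute(SNOWFLAKE_QUERY).fetchall())  # [('Canada', 100.0), ('India', 150.0)]
```

The state and country of Indore are stored once in `city` rather than repeated for every street. This saves space at the cost of the extra join.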
Example of Snowflake Schema
Sales Fact Table=(time_key, item_key, branch_key, location_key, units_sold, dollars_sold, avg_sales), with the measures units_sold, dollars_sold and avg_sales
time=(time_key, day, day_of_the_week, month, quarter, year)
item=(item_key, item_name, brand, type, supplier_key)
supplier=(supplier_key, supplier_type)
branch=(branch_key, branch_name, branch_type)
location=(location_key, street, city_key)
city=(city_key, city, state_or_province, country)
Difference
Star schema | Snowflake schema
1. Dimension tables are un-normalized. | 1. Some dimension tables are normalized.
2. Requires more space due to redundant data. | 2. Requires less space since there is less redundancy.
3. Query evaluation cost is lower due to fewer join operations. | 3. Query evaluation cost is higher due to more joins.
4. Simple and more commonly used. | 4. More difficult and less commonly used.
Q.2. Design a snowflake schema for the partial star schema shown above by decomposing the dimension tables into 3NF. Assume that the following functional dependencies exist on the customer and store dimensions.
Partial star schema:
Fact table keys: Customer id, Store id
Customer=(Customer id, Name, Address, Phone, Location)
Store=(Store id, Slocation id, Owner, City, State, Country)
Functional dependencies
customerid → name, address
customerid → phone
phone → location
storeid → owner, slocationid
slocationid → city, state
state → country
Fact Constellation Schema
1. Constellation means “group of stars”.
2. A star schema handles only one fact table and thus only one subject. When we have to handle multiple subjects, multiple fact tables must be used.
3. A fact constellation schema allows more than one fact table, with the fact tables sharing several dimension tables.
General example of fact constellation
[Figure: two fact tables F1 and F2 sharing the dimension tables D1, D2 and D3.]
Example of Fact Constellation
Sales Fact Table=(time_key, item_key, branch_key, location_key, units_sold, dollars_sold, avg_sales), with the measures units_sold, dollars_sold and avg_sales
Shipping Fact Table=(item_key, time_key, shipper_key, from_location, to_location, dollars_cost, units_shipped)
time=(time_key, day, day_of_the_week, month, quarter, year)
item=(item_key, item_name, brand, type, supplier_type)
branch=(branch_key, branch_name, branch_type)
location=(location_key, street, city, province_or_state, country)
shipper=(shipper_key, shipper_name, location_key, shipper_type)
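Here is a small SQLite sketch of the idea. Two fact tables (sales and shipping) share the `time_dim` and `item_dim` dimension tables, so both subjects can be compared item by item. The table names are adapted from the example above and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables shared by both fact tables.
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
-- Two fact tables, one per subject, referencing the same dimensions.
CREATE TABLE sales    (time_key INTEGER REFERENCES time_dim,
                       item_key INTEGER REFERENCES item_dim, units_sold INTEGER);
CREATE TABLE shipping (time_key INTEGER REFERENCES time_dim,
                       item_key INTEGER REFERENCES item_dim, units_shipped INTEGER);

INSERT INTO time_dim VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
INSERT INTO item_dim VALUES (1, 'TV'), (2, 'VCR');
INSERT INTO sales VALUES (1, 1, 10), (1, 2, 4), (2, 1, 5);
INSERT INTO shipping VALUES (1, 1, 8), (2, 2, 4);
""")

# Because item_dim is shared, both subjects can be reported side by side per item.
COMPARE = """
SELECT i.item_name,
       (SELECT SUM(s.units_sold)    FROM sales    AS s WHERE s.item_key = i.item_key),
       (SELECT SUM(h.units_shipped) FROM shipping AS h WHERE h.item_key = i.item_key)
FROM item_dim AS i
ORDER BY i.item_name
"""
print(conn.execute(COMPARE).fetchall())  # [('TV', 15, 8), ('VCR', 4, 4)]
```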
Q.3. Draw the fact constellation schema diagram for the following tables, identifying the fact and dimension tables.
Sales=(time_key, item_key, branch_key, location_key, dollars_sold, units_sold)
Shipping=(item_key, time_key, shipper_key, from_location, to_location, dollars_cost, units_shipped)
Time=(time_key, day_of_week, month, quarter, year)
Branch=(branch_key, branch_name, branch_type)
Location=(location_key, street, city, country)
Item=(item_key, item_name, brand, type, supplier)
Shipper=(shipper_key, shipper_name, location_key, shipper_type)
Q.3. a) How many subjects does this fact constellation schema handle? Why?
b) How many fact tables are there? Which ones?
c) Which tables are shared by all the fact tables?
Difference between OLTP and DW systems
Feature | OLTP | DW
Indexes | Few | Many
Joins | Many | Few
Duplicated data | Less | More
Derived data and aggregation | Rare | Common
Database modification | One tuple at a time | Bulk
Data Cubes and OLAP
▪ A DW is modeled by a multidimensional database structure. Each dimension corresponds to an attribute or a set of attributes, and each cell stores the value of an aggregate measure.
▪ Thus the actual structure of a DW may be a relational data store or a multidimensional data cube.
[Figure: Data cube]
[Figure: Browsing a data cube]
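A data cube can be viewed as the set of all group-by aggregates (cuboids) over its dimensions. With n dimensions there are 2^n cuboids, ranging from the base cuboid grouped by every dimension up to the apex cuboid holding the grand total. Here is a minimal Python sketch with hypothetical fact rows and names:

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("item", "location", "quarter")
# Hypothetical fact rows: (item, location, quarter, units_sold).
facts = [
    ("TV", "Toronto", "Q1", 5),
    ("TV", "Chicago", "Q1", 3),
    ("phone", "Toronto", "Q2", 7),
]


def compute_cube(rows):
    """Return every cuboid: one group-by per subset of DIMS (2**n of them)."""
    cube = {}
    for k in range(len(DIMS) + 1):
        for group in combinations(range(len(DIMS)), k):
            cells = defaultdict(int)
            for row in rows:
                # Cell coordinates are the values of the grouped dimensions only.
                cells[tuple(row[i] for i in group)] += row[-1]
            cube[tuple(DIMS[i] for i in group)] = dict(cells)
    return cube


cube = compute_cube(facts)
print(len(cube))        # 8 cuboids for 3 dimensions
print(cube[()])         # {(): 15}, the apex cuboid (grand total)
print(cube[("item",)])  # {('TV',): 8, ('phone',): 7}
```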
On-Line Analytical Processing (OLAP)
• A relational DBMS contains 2-dimensional data spread over rows and columns. Thus OLTP (On-Line Transaction Processing) is the way to use a DBMS.
• A DW is a multidimensional structure, so OLAP is the proper way to use a DW.
• OLAP is a set of operations that aggregate data along various dimensions to present the data at different levels.
Difference between OLTP and OLAP
No | Feature | OLTP | OLAP
1 | Characteristic | Operational processing | Informational processing
2 | Orientation | Transaction | Analysis
3 | Users | Clerk, DBA, DB professionals | Knowledge workers (managers, executives, analysts)
4 | Function | Day-to-day operation | Long-term decision support
5 | DB design | ER based, application oriented | Star, snowflake, subject oriented
6 | Data | Current | Historical
7 | Summarization | Highly detailed | Summarized
8 | View | Flat relational (2-D) | Multidimensional
9 | Unit of work | Simple transaction | Complex query
10 | Access | Read/Write | Mostly read
11 | Focus | Data in | Information out
12 | Operations | Index on primary key | Lots of scans
13 | Number of records accessed | Tens | Millions
14 | Number of users | Thousands | Hundreds
15 | DB size | 100 MB to GB | 100 GB to TB
16 | Metric | Transaction throughput | Response time
Concept Hierarchy
• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts.
• The next figure shows a concept hierarchy for location.
[Figure: concept hierarchy for location: Country → State 1 ... State k → District 1 ... District m → City 1 ... City n]
Concept Hierarchies
• Many concept hierarchies are implicit within the database schema, e.g.
• Location=(id, street, city, state, country)
• These attributes are related by a total order, like street < city < state < country, or by a partial order forming a lattice.
Total and Partial Concept Hierarchies
[Figure: a total order, street < city < state < country, and a partial order (lattice), day < week < year and day < month < quarter < year.]
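Climbing a concept hierarchy can be sketched as following parent links from one level to the next. The city, state and country values and the names `PARENT` and `generalize` are hypothetical.

```python
# Hypothetical location hierarchy (total order): city < state < country.
LEVELS = ("city", "state", "country")  # lowest to highest
PARENT = {
    "city": {"Indore": "Madhya Pradesh", "Bhopal": "Madhya Pradesh",
             "Pune": "Maharashtra"},
    "state": {"Madhya Pradesh": "India", "Maharashtra": "India"},
}


def generalize(value, from_level, to_level):
    """Map a low-level concept to a higher-level one by following parent links."""
    start, end = LEVELS.index(from_level), LEVELS.index(to_level)
    if end < start:
        raise ValueError("a concept hierarchy is climbed upward only")
    for level in LEVELS[start:end]:
        value = PARENT[level][value]
    return value


print(generalize("Indore", "city", "state"))  # Madhya Pradesh
print(generalize("Pune", "city", "country"))  # India
```

A partial order such as the time lattice would need several parent maps per level (e.g. day to week and day to month) instead of a single chain.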
Use of concept hierarchy
• In the multidimensional model, data is organized into multiple dimensions, and each dimension contains multiple levels of abstraction defined by concept hierarchies.
• Organizing data in concept hierarchies gives users the flexibility to view data from different perspectives.
OLAP OPERATIONS
1. Roll-Up (Drill-Up)
2. Drill-Down
3. Slice and Dice
4. Pivot (Rotate)
Roll-Up:
Some vendors call roll-up “drill-up”. It performs aggregation on a data cube either by climbing up a concept hierarchy for a dimension or by dimension reduction.
Roll-Up on Date dimension
[Figure: a sales cube with the dimensions Product (TV, VCR), Country (U.S.A., Canada, Mexico) and Date (1Qt, 2Qt, 3Qt, 4Qt). Summing over the four quarters gives, e.g., the total annual sales of TV in U.S.A.]
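Here is a Python sketch of this roll-up on hypothetical cube cells. Climbing the date hierarchy from quarter to year and summing gives the annual figures. The sales numbers are invented, and all four quarters are assumed to belong to 2024.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, country, quarter) -> sales.
cells = {
    ("TV", "U.S.A.", "Q1"): 100, ("TV", "U.S.A.", "Q2"): 120,
    ("TV", "U.S.A.", "Q3"): 90, ("TV", "U.S.A.", "Q4"): 150,
    ("VCR", "Canada", "Q1"): 40, ("VCR", "Canada", "Q2"): 60,
}
# Date hierarchy quarter < year; all quarters are assumed to belong to 2024.
QUARTER_TO_YEAR = {"Q1": 2024, "Q2": 2024, "Q3": 2024, "Q4": 2024}


def roll_up_to_year(cube):
    """Climb the date hierarchy from quarter to year, re-aggregating with SUM."""
    result = defaultdict(int)
    for (product, country, quarter), sales in cube.items():
        result[(product, country, QUARTER_TO_YEAR[quarter])] += sales
    return dict(result)


print(roll_up_to_year(cells))
# {('TV', 'U.S.A.', 2024): 460, ('VCR', 'Canada', 2024): 100}
```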
Roll-Up by dimension reduction
• When roll-up is performed by dimension reduction, one or more dimensions are removed from the given data cube.
▪ For example, a sales cube containing only the two dimensions location and time can be rolled up to show sales by location only. The time dimension is aggregated away instead of keeping both location and time.
[Figure: roll-up by dimension reduction]
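Dimension reduction can be sketched as summing out one coordinate of each cell. The cube values and the function name `remove_dimension` are hypothetical:

```python
from collections import defaultdict

# Hypothetical 2-D sales cube with dimensions (location, time).
cube = {
    ("Chicago", "Q1"): 440, ("Chicago", "Q2"): 395,
    ("Toronto", "Q1"): 1000, ("Toronto", "Q2"): 1200,
}


def remove_dimension(cube, axis):
    """Roll up by dimension reduction: sum out the dimension at position `axis`."""
    result = defaultdict(int)
    for key, value in cube.items():
        result[key[:axis] + key[axis + 1:]] += value
    return dict(result)


by_location = remove_dimension(cube, 1)  # time removed, location kept
print(by_location)  # {('Chicago',): 835, ('Toronto',): 2200}
```

Applying `remove_dimension` again to the result removes the location dimension as well, leaving a single grand total.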
Extreme Roll-Up
Aggregation along n-1 dimensions of an n-dimensional cube.
Drill-Down
• It is the reverse of roll-up. It can be realized
- by stepping down a concept hierarchy for a dimension (e.g. from country to city), or
- by introducing a new dimension.
Slice and Dice
• The slice operation performs a selection on one dimension of the given cube, resulting in a sub-cube.
[Figure: slice operation]
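Here is a minimal Python sketch of a slice on a hypothetical three-dimensional cube. The cube is stored as a dictionary from (location, time, item) to sales. The data and names are illustrative.

```python
# Hypothetical 3-D cube: (location, time, item) -> sales.
DIMS = ("location", "time", "item")
cube = {
    ("Toronto", "Q1", "TV"): 605, ("Toronto", "Q2", "TV"): 680,
    ("Vancouver", "Q1", "TV"): 825, ("Vancouver", "Q1", "phone"): 14,
}


def slice_cube(cube, dim, value):
    """Select one value on one dimension; that dimension drops out of the sub-cube."""
    axis = DIMS.index(dim)
    return {key[:axis] + key[axis + 1:]: v
            for key, v in cube.items() if key[axis] == value}


print(slice_cube(cube, "time", "Q1"))
# {('Toronto', 'TV'): 605, ('Vancouver', 'TV'): 825, ('Vancouver', 'phone'): 14}
```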
Dice
• Dicing refers to range selection in multiple dimensions.
• Dice performs a selection on two or more dimensions of a given cube and produces a new sub-cube.
[Figure: dicing operation]
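Here is a matching Python sketch of dice on a hypothetical cube. The selection restricts two dimensions, location and time, at once. The data and names are illustrative.

```python
# Hypothetical 3-D cube: (location, time, item) -> sales.
DIMS = ("location", "time", "item")
cube = {
    ("Toronto", "Q1", "TV"): 605, ("Toronto", "Q2", "TV"): 680,
    ("Toronto", "Q3", "TV"): 812, ("Vancouver", "Q1", "TV"): 825,
    ("Chicago", "Q1", "TV"): 440,
}


def dice_cube(cube, conditions):
    """Keep the cells whose values satisfy the selection on every listed dimension."""
    axes = {DIMS.index(dim): allowed for dim, allowed in conditions.items()}
    return {key: v for key, v in cube.items()
            if all(key[a] in allowed for a, allowed in axes.items())}


sub = dice_cube(cube, {"location": {"Toronto", "Vancouver"}, "time": {"Q1", "Q2"}})
print(sub)
# {('Toronto', 'Q1', 'TV'): 605, ('Toronto', 'Q2', 'TV'): 680, ('Vancouver', 'Q1', 'TV'): 825}
```

Unlike the slice, the diced sub-cube keeps all three dimensions and only narrows the ranges of two of them.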
Pivot (Rotate)
• Pivot (rotate) is a visualization operation that rotates the data axes to provide an alternative presentation of the data.
[Figure: data view before rotating]
[Figure: data view after rotating]
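Pivoting does not change any cell values. It only swaps which dimension is shown on the rows and which on the columns. Here is a minimal Python sketch on a hypothetical two-dimensional view:

```python
# Hypothetical 2-D view: rows are items, columns are locations.
view = {
    "TV":    {"Toronto": 605, "Vancouver": 825},
    "phone": {"Toronto": 14,  "Vancouver": 400},
}


def pivot(table):
    """Swap row and column axes; the cell values themselves are unchanged."""
    rotated = {}
    for row, cols in table.items():
        for col, value in cols.items():
            rotated.setdefault(col, {})[row] = value
    return rotated


print(pivot(view))
# {'Toronto': {'TV': 605, 'phone': 14}, 'Vancouver': {'TV': 825, 'phone': 400}}
```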
Three-Tier Data Warehouse Architecture
• A DW can be designed using three tiers:
1. Top tier: front-end tools for querying, reporting, data mining and analysis (client).
2. Middle tier: OLAP server.
3. Bottom tier: warehouse database server.
Multi-Tiered DW Architecture
[Figure: Bottom tier: operational DBs and other sources pass through Extract, Transform, Load and Refresh (with a monitor & integrator and metadata) into the data warehouse and data marts, which serve the middle tier. Middle tier: OLAP server (OLAP engine). Top tier (front-end tools): analysis, query, reports and data mining.]
BOTTOM tier: DW database server
• Data from operational DBs and other sources is extracted using application programs called “gateways” (DB interfaces).
• A gateway is supported by the underlying DBMS and provides DB connectivity.
• Examples are ODBC, OLE DB (Object Linking and Embedding, Database) and JDBC.
MIDDLE tier: OLAP server
• An OLAP server is typically implemented as one of the following:
1. Relational OLAP (ROLAP): an extended relational DBMS that maps operations on multidimensional data to standard relational operations.
2. Multidimensional OLAP (MOLAP): a special-purpose server that directly implements multidimensional data and operations.
3. Hybrid OLAP (HOLAP): a combination of ROLAP and MOLAP.
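The difference in storage can be sketched in Python. The same roll-up (total sales per item) is computed ROLAP-style as an SQL GROUP BY over a relational table in SQLite, and MOLAP-style over a dense array indexed by dimension positions. This only illustrates the idea and does not describe how any particular OLAP server is implemented. The data is hypothetical.

```python
import sqlite3

rows = [("TV", "Q1", 100), ("TV", "Q2", 120), ("VCR", "Q1", 40), ("VCR", "Q2", 60)]

# ROLAP-style: the cube is a relational table and a roll-up becomes GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, quarter TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
rolap = dict(conn.execute(
    "SELECT item, SUM(amount) FROM sales GROUP BY item ORDER BY item").fetchall())

# MOLAP-style: the cube is a dense array addressed by dimension positions.
items, quarters = ["TV", "VCR"], ["Q1", "Q2"]
array = [[0] * len(quarters) for _ in items]
for item, quarter, amount in rows:
    array[items.index(item)][quarters.index(quarter)] += amount
molap = {item: sum(array[i]) for i, item in enumerate(items)}

print(rolap == molap, rolap)  # True {'TV': 220, 'VCR': 100}
```

The array version answers by direct indexing, which is why MOLAP is usually faster. The relational version scales more easily to large, sparse data.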
ROLAP vs MOLAP vs HOLAP
Feature | ROLAP | MOLAP | HOLAP
Storage | Relational tables, materialized views | Data cubes | Combination of tables and data cubes
Strength | Higher scalability | Faster computation | Both features
Ease of building | Easy to build by extending a relational DB | Difficult to build since it starts from scratch | Difficult to build since it starts from scratch
Top tier: Client
• Query and reporting tools
• Analysis tools
• Data mining tools (prediction, trend analysis)
Data Warehousing - Interview Questions
• Q: Define a data warehouse.
• A : A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data that supports management's decision-making process.
• Q: What does a subject-oriented data warehouse signify?
• A : Subject-oriented signifies that the data warehouse stores information around a particular subject, such as product, customer, sales, etc.
• Q: List any five applications of a data warehouse.
• A : Some applications include financial services, banking services, consumer goods, retail sectors and controlled manufacturing.
• Q: What do OLAP and OLTP stand for?
• A : OLAP is an acronym for Online Analytical Processing and OLTP is an acronym for Online Transaction Processing.
• Q: What is the most basic difference between a data warehouse and operational databases?
• A : A data warehouse contains historical information that is made available for analysis of the business, whereas an operational database contains current information that is required to run the business.
• Q: List the schemas that a data warehouse system can implement.
• A : A data warehouse can implement a star schema, a snowflake schema and a fact constellation schema.
• Q: What is Data Warehousing?
• A : Data warehousing is the process of constructing and using the data warehouse.
• Q: List the processes involved in Data Warehousing.
• A : Data warehousing involves data cleaning, data integration and data consolidation.
• Q: List the functions of data warehouse tools and utilities.
• A : The functions performed by data warehouse tools and utilities are data extraction, data cleaning, data transformation, data loading and refreshing.
• Q: What do you mean by Data Extraction?
• A : Data extraction means gathering data from multiple heterogeneous sources.
• Q: Define metadata.
• A : Metadata is simply defined as data about data. In other words, metadata is the summarized data that leads us to the detailed data.
• Q: What does a Metadata Repository contain?
• A : A metadata repository contains the definition of the data warehouse, business metadata, operational metadata, data for mapping from the operational environment to the data warehouse, and the algorithms for summarization.
• Q: How does a Data Cube help?
• A : A data cube helps us represent the data in multiple dimensions. The data cube is defined by dimensions and facts.
• Q: Define dimension.
• A : Dimensions are the entities with respect to which an enterprise keeps its records.
• Q: Explain data mart.
• A : A data mart contains a subset of organization-wide data that is valuable to specific groups of an organization. In other words, a data mart contains data specific to a particular group.
• Q: List the phases involved in the data warehouse delivery process.
• A : The stages are IT Strategy, Education, Business Case Analysis, Technical Blueprint, Build the Version, History Load, Ad hoc Query, Requirement Evolution, Automation and Extending Scope.
• Q: Define load manager.
• A : A load manager performs the operations required for the extract and load process. The size and complexity of the load manager vary between specific solutions, from data warehouse to data warehouse.
• Q: Define the functions of a load manager.
• A : A load manager extracts data from the source system, fast-loads the extracted data into a temporary data store, and performs simple transformations into a structure similar to the one in the data warehouse.
• Q: Define a warehouse manager.
• A : The warehouse manager is responsible for the warehouse management process. It consists of third-party system software, C programs and shell scripts. The size and complexity of the warehouse manager vary between specific solutions.
• Q: Define the functions of a warehouse manager.
• A : The warehouse manager performs consistency and referential integrity checks. It creates indexes, business views and partition views against the base data. It transforms and merges the source data from the temporary store into the published data warehouse, backs up the data in the data warehouse, and archives data that has reached the end of its captured life.
• Q: What is Summary Information?
• A : Summary information is the area in a data warehouse where the predefined aggregations are kept.
• Q: What is the Query Manager responsible for?
• A : The query manager is responsible for directing queries to the suitable tables.
• Q: List the types of OLAP servers.
• A : There are four types of OLAP servers, namely Relational OLAP, Multidimensional OLAP, Hybrid OLAP and Specialized SQL Servers.
• Q: Which one is faster, Multidimensional OLAP or Relational OLAP?
• A : Multidimensional OLAP is faster than Relational OLAP.
• Q: List the functions performed by OLAP.
• A : OLAP performs functions such as roll-up, drill-down, slice, dice and pivot.
• Q: How many dimensions are selected in the slice operation?
• A : Only one dimension is selected for the slice operation.
• Q: How many dimensions are selected in the dice operation?
• A : For the dice operation, two or more dimensions of a given cube are selected.
• Q: How many fact tables are there in a star schema?
• A : There is only one fact table in a star schema.
• Q: What is Normalization?
• A : Normalization splits up the data into additional tables.
• Q: Of the star schema and the snowflake schema, which one has normalized dimension tables?
• A : The snowflake schema uses the concept of normalization.
• Q: What is the benefit of normalization?
• A : Normalization helps in reducing data redundancy.
• Q: Which language is used for schema definition?
• A : Data Mining Query Language (DMQL) is used for schema definition.
• Q: What language is the base of DMQL?
• A : DMQL is based on Structured Query Language (SQL).
• Q: What are the reasons for partitioning?
• A : Partitioning is done for various reasons, such as easy management, assisting backup and recovery, and enhancing performance.
• Q: What kinds of costs are involved in data marting?
• A : Data marting involves hardware and software costs, network access costs and time costs.
THANK YOU
• http://www.tutorialspoint.com/dwh/dwh_quick_guide.htm
