0% found this document useful (0 votes)

3 views

DWH Unit 3

Data Warehouse and Mining notes Unit 3

Uploaded by

lasaciv776

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

3 views

DWH Unit 3

Data Warehouse and Mining notes Unit 3

Uploaded by

lasaciv776

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 7

What is Data Mining?

Data Mining refers to extracting knowledge from larger amount of data.

The process of extracting information to identify patterns, trends, and useful data that would allow the business
to take the data-driven decision from huge sets of data is called Data Mining.

Data mining is the act of automatically searching for large stores of information to find trends and patterns that
go beyond simple analysis procedures. Data mining utilizes complex mathematical algorithms for data segments
and evaluates the probability of future events. Data Mining is also called Knowledge Discovery of Data (KDD).

Alternative names for Data Mining :

1. Knowledge discovery (mining) in databases (KDD)
2. Knowledge extraction
3. Data/pattern analysis
4. Data archaeology
5. Data dredging
6. Information harvesting
7. Business intelligence

Example

• So there is a Mobile network operator. They consult a data-miner to dig into the call records of the operator. No

specific targets are given to the Data Miner.

• A quantitative target of finding at least 2 new patterns in a month is given.

• As the data miner starts digging into the data, he finds a pattern that there are fewer international calls on

Wednesday than on other days.

• This information is shared with the management, and they come up with the plan to reduce the international call

rates on Wednesdays and start a campaign.

• Call rates surge, customers are happy with low call prices, more customers sign up, and they make more

money! Win-Win situation!

Data Mining Architecture

The significant components of data mining systems are a data source, data mining engine, data
warehouse server, the pattern evaluation module, graphical user interface, and knowledge base.

Data Source:
The actual source of data is the Database, data warehouse, World Wide Web (WWW), text files, and
other documents. You need a huge amount of historical data for data mining to be successful.
Organizations typically store data in databases or data warehouses. Data warehouses may comprise
one or more databases, text files spreadsheets, or other repositories of data. Sometimes, even plain text
files or spreadsheets may contain information. Another primary source of data is the World Wide Web
or the internet.

Different processes:
Before passing the data to the database or data warehouse server, the data must be cleaned, integrated,
and selected. As the information comes from various sources and in different formats, it can't be used
directly for the data mining procedure because the data may not be complete and accurate. So, the
first data requires to be cleaned and unified. More information than needed will be collected from
various data sources, and only the data of interest will have to be selected and passed to the server.
These procedures are not as easy as we think. Several methods may be performed on the data as part
of selection, integration, and cleaning.

Database or Data Warehouse Server:

The database or data warehouse server consists of the original data that is ready to be processed.
Hence, the server is cause for retrieving the relevant data that is based on data mining as per user
request.

Data Mining Engine:

The data mining engine is a major component of any data mining system. It contains several modules
for operating data mining tasks, including association, characterization, classification, clustering,
prediction, time-series analysis, etc.

In other words, we can say data mining is the root of our data mining architecture. It comprises
instruments and software used to obtain insights and knowledge from data collected from various data
sources and stored within the data warehouse.

Pattern Evaluation Module:

The Pattern evaluation module is primarily responsible for the measure of investigation of the pattern
by using a threshold value. It collaborates with the data mining engine to focus the search on exciting
patterns.

This segment commonly employs stake measures that cooperate with the data mining modules to focus
the search towards fascinating patterns. It might utilize a stake threshold to filter out discovered
patterns. On the other hand, the pattern evaluation module might be coordinated with the mining
module, depending on the implementation of the data mining techniques used. For efficient data
mining, it is abnormally suggested to push the evaluation of pattern stake as much as possible into the
mining procedure to confine the search to only fascinating patterns.

Graphical User Interface:

The graphical user interface (GUI) module communicates between the data mining system and the user.
This module helps the user to easily and efficiently use the system without knowing the complexity of
the process. This module cooperates with the data mining system when the user specifies a query or a
task and displays the results.

Knowledge Base:
The knowledge base is helpful in the entire process of data mining. It might be helpful to guide the
search or evaluate the stake of the result patterns. The knowledge base may even contain user views
and data from user experiences that might be helpful in the data mining process. The data mining
engine may receive inputs from the knowledge base to make the result more accurate and reliable. The
pattern assessment module regularly interacts with the knowledge base to get inputs, and also update
it.
Different Data Mining Methods
There are many methods used for Data Mining, but the crucial step is to select the appropriate form

from them according to the business or the problem statement. These methods help in predicting

the future and then making decisions accordingly. These also help in analyzing market trends and

increasing company revenue.

Some Methods are:

• Association

• Classification

• Clustering Analysis

• Prediction

• Sequential Patterns or Pattern Tracking

• Decision Trees

• Outlier Analysis or Anomaly Analysis

• Neural Network

Mining various Data types in Data Mining

The data from multiple sources are integrated into a common source known as Data Warehouse.

Type of data can be mined:

1. Flat Files
2. Relational Databases
3. DataWarehouse
4. Transactional Databases
5. Multimedia Databases
6. Spatial Databases
7. Time Series Databases
8. World Wide Web(WWW)
Major Issues in Data Mining
Major Issues in Data Mining: Data Mining is not very simple to understand and implement. As it is
already evident that Data Mining is a process which is very crucial for various researchers and
businesses. But in data mining, the algorithms are very complex and on top of that, the data is not
readily available at one place. Every technology has flaws or issues. But one needs to always know the
various flaws or issues that technology has.
Data Mining Techniques

Classification:
The technique used for obtaining important and relevant information about data and the metadata is
called classification. As we already know the literal meaning of classification is to categorize the given
set of information or data according to some criteria.

Clustering:
Clustering can be defined as an analytics technique that heavily depends on the visual approaches in
order to understand the data. The mechanism in Clustering displays the graphics so as to show where
the distribution of data is. In this type, the colors are used to show the distribution of data.
Using a Graphical approach is ideal for cluster analytics because with the help of graphs and clustering,
the users are easily able to see the distribution of data so that the trends are identified which are
relevant to the business objectives.

Regression:
Regression can be defined as a technique of data mining that is most often used to predict a range of
numeric values (or continuous values), that are provided to the user in a specific set of data. Regression
is a widely used concept. It is used across the multiple industries mainly for planning marketing
strategies, forecasting finance related matters and analyzing the trends.

Outer:
Outer detection is also called Outlier Analysis and Outlier mining. Outer is a type of data mining
technique in which there is a full examination of particular data objects in a set of data that do not
equate to the expected pattern or trend. This technique is very helpful as it can be used in a big variety
of domains, such as intrusion, detection, fraud, fault detection, and so on.

Sequential Patterns:
This type of data mining technique is used for discovering a series of events that has taken place in
sequence. It is notably very useful/beneficial for transactional mining of data. To give an example, we
can say that this technique is used to reveal what items of hand accessories that a customer is more
likely to buy after an initial purchase of let’s say, a clothing item. The sequential patterns help us in
understanding the general trends of a customer’s need. In this way, organization can recommend
additional items to the customers so as to exponentially increase its sales.
Prediction:
Prediction is an extremely powerful and very beneficial feature/quality of data mining that is
responsible for representing one of four branches of analytics (the other branches are descriptive,
diagnostic, and prescriptive). Let us take an example for understanding better, let’s say- a manager of a
well- respected organization has to predict how much the customers of their products or services are
likely to spend during the sale that organization has offered.

Association Rules:
Association Rules is a data mining technique that is very beneficial as we can search the association
between two or more Items, data or information. It helps in discovering the hidden pattern or trend in a
given data set. To explain in simpler yet apt language, this technique helps us in identifying or finding
engaging associations or relations within various large sets of data.

Tracking Patterns:
Tracking patterns is an essential data mining technique. In this technique, the user identifies and
monitors pattern or the trend that are usually seen in the data we are. It helps in making intelligent
inferences about outcomes of business. If a certain organization is able to figure out and then determine
that a product, they are selling is more popular than the other products sold by that very organization,
the organization will be able to use that information in order to create products or services that are
similar to the best sellers.

Apriori Algorithm
The Apriori algorithm uses frequent itemsets to generate association rules, and it is designed to work
on the databases that contain transactions. With the help of these association rule, it determines how
strongly or how weakly two objects are connected. This algorithm uses a breadth-first
search and Hash Tree to calculate the itemset associations efficiently. It is the iterative process for
finding the frequent itemsets from the large dataset.

This algorithm was given by the R. Agrawal and Srikant in the year 1994. It is mainly used for market
basket analysis and helps to find those products that can be bought together. It can also be used in the
healthcare field to find drug reactions for patients.

Data Mining Notes
100% (1)
Data Mining Notes
75 pages
KITCHEN DESIGN Aliza Case Study-Model
No ratings yet
KITCHEN DESIGN Aliza Case Study-Model
1 page
Business Plan For Teaching and Learning Center
No ratings yet
Business Plan For Teaching and Learning Center
35 pages
Unit 1 Datamining For Business Intelligence
No ratings yet
Unit 1 Datamining For Business Intelligence
101 pages
Data Mining Notes
No ratings yet
Data Mining Notes
21 pages
DM Chapter 1
No ratings yet
DM Chapter 1
10 pages
data_mining_ppt
No ratings yet
data_mining_ppt
17 pages
Module 4
No ratings yet
Module 4
54 pages
Unit 3
No ratings yet
Unit 3
34 pages
Unit-I Data Mining
No ratings yet
Unit-I Data Mining
28 pages
Data Mining Notes
No ratings yet
Data Mining Notes
82 pages
Unit-1 Introduction To Data Mining
No ratings yet
Unit-1 Introduction To Data Mining
33 pages
Data Mining Notes
No ratings yet
Data Mining Notes
9 pages
Data Mining
No ratings yet
Data Mining
43 pages
DWM Notes Class by Proff
No ratings yet
DWM Notes Class by Proff
88 pages
DMDW Lecture Notes
No ratings yet
DMDW Lecture Notes
24 pages
LECTURE NOTES ON DATA MINING and DATA WA
No ratings yet
LECTURE NOTES ON DATA MINING and DATA WA
84 pages
Module 2 Introduction To Data Mining
No ratings yet
Module 2 Introduction To Data Mining
19 pages
DM NOTES
No ratings yet
DM NOTES
91 pages
DWM_Module 2 (1) (1) (1)
No ratings yet
DWM_Module 2 (1) (1) (1)
74 pages
data mining
No ratings yet
data mining
44 pages
Data Mining U-1
No ratings yet
Data Mining U-1
10 pages
dw and dm notes (1)
No ratings yet
dw and dm notes (1)
89 pages
Data Mining Notes UNIT I
No ratings yet
Data Mining Notes UNIT I
21 pages
8 Data Mining and Warehousing
No ratings yet
8 Data Mining and Warehousing
171 pages
Data Mining - Prashant
No ratings yet
Data Mining - Prashant
10 pages
DM Notes-1
No ratings yet
DM Notes-1
71 pages
Data Mining - KTUweb PDF
No ratings yet
Data Mining - KTUweb PDF
82 pages
Data Mining - Digital Notes (Unit I To V)
No ratings yet
Data Mining - Digital Notes (Unit I To V)
85 pages
UNIT-2 BI
No ratings yet
UNIT-2 BI
26 pages
Chapter 1&2
No ratings yet
Chapter 1&2
91 pages
Data Mining Notes
No ratings yet
Data Mining Notes
25 pages
Unit 1 DMW
No ratings yet
Unit 1 DMW
41 pages
unit-1 notes onl
No ratings yet
unit-1 notes onl
25 pages
A Conceptual Overview of Data Mining: B.N. Lakshmi., G.H. Raghunandhan
No ratings yet
A Conceptual Overview of Data Mining: B.N. Lakshmi., G.H. Raghunandhan
6 pages
Data Mining-CH5
No ratings yet
Data Mining-CH5
49 pages
Notes for DMDWH -Module1
No ratings yet
Notes for DMDWH -Module1
21 pages
DMWH M1
No ratings yet
DMWH M1
25 pages
DWDM R13 Unit 1 PDF
No ratings yet
DWDM R13 Unit 1 PDF
10 pages
Unit 1 DMDW
No ratings yet
Unit 1 DMDW
57 pages
DWDM Unit3
No ratings yet
DWDM Unit3
15 pages
Unit I DM
No ratings yet
Unit I DM
27 pages
Unit - I
No ratings yet
Unit - I
22 pages
Data Mining Tutorials
No ratings yet
Data Mining Tutorials
52 pages
ware house server
No ratings yet
ware house server
89 pages
UNIT 3 DWM NOTES
No ratings yet
UNIT 3 DWM NOTES
17 pages
Lps Week 16 Iatb
No ratings yet
Lps Week 16 Iatb
5 pages
R18CSE4102-UNIT 2 Data Mining Notes
100% (1)
R18CSE4102-UNIT 2 Data Mining Notes
31 pages
DATA MINING-Knowledge Discovery in Databases
No ratings yet
DATA MINING-Knowledge Discovery in Databases
6 pages
big data analytics notes
No ratings yet
big data analytics notes
15 pages
Unit 1
No ratings yet
Unit 1
27 pages
Chapter-1 - Introduction To Data Mining
No ratings yet
Chapter-1 - Introduction To Data Mining
10 pages
Lecture 1428550844
No ratings yet
Lecture 1428550844
87 pages
Data Mining Mod 1 Notes
No ratings yet
Data Mining Mod 1 Notes
25 pages
CSC 425 Data Mining and Warehousing 2024
No ratings yet
CSC 425 Data Mining and Warehousing 2024
54 pages
Chapter 1
No ratings yet
Chapter 1
55 pages
Data Warehousing&Dat Mining
No ratings yet
Data Warehousing&Dat Mining
12 pages
DATA Mining
No ratings yet
DATA Mining
21 pages
DM Unit1 Intro
No ratings yet
DM Unit1 Intro
12 pages
Unit III Dwdm
No ratings yet
Unit III Dwdm
113 pages
A) Data Cleaning
No ratings yet
A) Data Cleaning
7 pages
Data Mining: Fundamentals and Applications
From Everand
Data Mining: Fundamentals and Applications
Fouad Sabry
No ratings yet
Q1 Journals
No ratings yet
Q1 Journals
2 pages
Ampidnhs.308126@deped - Gov.ph: Asynchronous Learning (MDL-D / MDL - P)
No ratings yet
Ampidnhs.308126@deped - Gov.ph: Asynchronous Learning (MDL-D / MDL - P)
6 pages
Crash-Course-2025-Jan-23
No ratings yet
Crash-Course-2025-Jan-23
4 pages
Grammar Task 1. Fill in The Gaps With Appropriate Question Words
100% (1)
Grammar Task 1. Fill in The Gaps With Appropriate Question Words
3 pages
Elementary GW 11a
No ratings yet
Elementary GW 11a
2 pages
Appendix E Sample Interview Guide (Revise or Delete Title As Needed)
No ratings yet
Appendix E Sample Interview Guide (Revise or Delete Title As Needed)
4 pages
DLL - Science 3 - Q1 - W1
No ratings yet
DLL - Science 3 - Q1 - W1
3 pages
Brochure Medical Content Writing.
100% (1)
Brochure Medical Content Writing.
7 pages
res-1234 (1)
No ratings yet
res-1234 (1)
18 pages
MBBS theory paper pattern
No ratings yet
MBBS theory paper pattern
1 page
ProGrad Map - International Business UEH
No ratings yet
ProGrad Map - International Business UEH
2 pages
Molecular Characterization And Analysis Of Polymers John M Chalmers Robert J Meier pdf download
No ratings yet
Molecular Characterization And Analysis Of Polymers John M Chalmers Robert J Meier pdf download
84 pages
The Cambridge Companion To The Federalist Jack N Rakove Colleen A Sheehan download
100% (2)
The Cambridge Companion To The Federalist Jack N Rakove Colleen A Sheehan download
80 pages
Imagic: Text-Based Real Image Editing With Diffusion Models
No ratings yet
Imagic: Text-Based Real Image Editing With Diffusion Models
16 pages
A-Level Animal Cell - Google Search
No ratings yet
A-Level Animal Cell - Google Search
1 page
Active Listening - Presentation
No ratings yet
Active Listening - Presentation
13 pages
Course Plan Hci
No ratings yet
Course Plan Hci
4 pages
BASIC MARKETING A Marketing Strategy Planning Approach Perreault Jr 19th Edition Test Bank - Available For Quick Download And Unlimited Reading
100% (3)
BASIC MARKETING A Marketing Strategy Planning Approach Perreault Jr 19th Edition Test Bank - Available For Quick Download And Unlimited Reading
64 pages
DCT 2 Test Format - 13 July 2015
100% (1)
DCT 2 Test Format - 13 July 2015
37 pages
CV Eng - BELINGA MBARGA Joël Mathieu
No ratings yet
CV Eng - BELINGA MBARGA Joël Mathieu
1 page
2025 to 2027 Literature in Zambian Languages Prescribed Books(1)
No ratings yet
2025 to 2027 Literature in Zambian Languages Prescribed Books(1)
2 pages
Search Algorithm Essay
No ratings yet
Search Algorithm Essay
2 pages
AMACC Dagupan Research (2012 - 2013) - Montemayor, JB - SinoCruz, MJF
No ratings yet
AMACC Dagupan Research (2012 - 2013) - Montemayor, JB - SinoCruz, MJF
19 pages
Review-of-Related-Literature 2
No ratings yet
Review-of-Related-Literature 2
3 pages
Pengaruh Supervisi Akademik Dan Motivasi Kerja Terhadap Peningkatan Kompetensi Pedagogik Guru Di SMP Swasta Parulian 2 P. Mandala Medan
No ratings yet
Pengaruh Supervisi Akademik Dan Motivasi Kerja Terhadap Peningkatan Kompetensi Pedagogik Guru Di SMP Swasta Parulian 2 P. Mandala Medan
11 pages
Eapp q1m3
No ratings yet
Eapp q1m3
14 pages
Workshop 1&2 EED 121
No ratings yet
Workshop 1&2 EED 121
7 pages
OJ 12 B Engleza 2024 Bar
No ratings yet
OJ 12 B Engleza 2024 Bar
3 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

DWH Unit 3

Uploaded by

DWH Unit 3

Uploaded by

What is Data Mining?

Data Mining refers to extracting knowledge from larger amount of data.

Alternative names for Data Mining :

specific targets are given to the Data Miner.

• A quantitative target of finding at least 2 new patterns in a month is given.

Wednesday than on other days.

rates on Wednesdays and start a campaign.

money! Win-Win situation!

Data Mining Architecture

Database or Data Warehouse Server:

Data Mining Engine:

Pattern Evaluation Module:

Graphical User Interface:

increasing company revenue.

Some Methods are:

• Sequential Patterns or Pattern Tracking

• Outlier Analysis or Anomaly Analysis

Mining various Data types in Data Mining

Type of data can be mined:

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.