Data Mining Concepts
Chapter 1. Introduction
Why Data Mining?
Evolution of Sciences
Evolution of Database Technology
1960s:
Data collection, database creation, and network DBMS
1970s:
Relational data model, relational DBMS implementation
1980s:
RDBMS, advanced data models
Application-oriented DBMS
1990s:
Data mining, data warehousing, multimedia databases, and Web
databases
2000s:
Stream data management and mining
Data mining and its applications
Web technology (XML, data integration) and global information systems
What Is Data Mining?
Alternative names
Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data
dredging, information harvesting, business intelligence, etc.
Knowledge Discovery (KDD) Process
[Figure: KDD process diagram. Surviving labels: Databases, Data Cleaning, Data Integration, Task-relevant Data]
Example: A Web Mining Framework
Data Mining in Business Intelligence
[Figure: layers ordered by increasing potential to support business decisions. Surviving labels: End User; Decision Making; Data Exploration: Statistical Summary, Querying, and Reporting]
KDD Process: A Typical View
Example: Medical Data Mining
Multi-Dimensional View of Data Mining
Data to be mined
Database data (extended-relational, object-oriented, heterogeneous,
legacy), transactional data, stream, spatiotemporal, time-series,
sequence, text and web, multi-media, graphs & social and information
networks
Knowledge to be mined
Characterization, discrimination, association, classification, clustering,
trend/deviation, outlier analysis, etc.
Descriptive vs. predictive data mining
Multiple/integrated functions and mining at multiple levels
Techniques utilized
Data-intensive, data warehouse (OLAP), machine learning, statistics,
pattern recognition, visualization, high-performance computing, etc.
Applications adapted
Retail, telecommunication, banking, fraud analysis, bio-data mining, stock
market analysis, text mining, Web mining, etc.
Data Mining: On What Kinds of Data?
Data Mining Function: (1) Generalization
(2) Association and Correlation Analysis
(4) Cluster Analysis
(5) Outlier Analysis
Outlier analysis
Outlier: A data object that does not comply with the general
behavior of the data
Noise or exception? … One person’s garbage could be another
person’s treasure
Methods: by-product of clustering or regression analysis, …
Useful in fraud detection, rare events analysis
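The regression route mentioned above can be sketched as follows: fit a straight line by least squares, then flag the points whose residual sits far from the others. This is only an illustrative sketch. The function name `regression_outliers`, the median-absolute-deviation scoring, and the cutoff of 3.5 are assumptions made for the example, not something the slides prescribe.

```python
from statistics import median


def regression_outliers(xs, ys, threshold=3.5):
    """Return indices of points whose residual from a least-squares line is unusually large.

    Hypothetical helper: the robust score and the default threshold are
    illustrative choices, not taken from the source material.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Ordinary least-squares fit of y = slope * x + intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

    # Score each residual by its distance from the median residual,
    # scaled by the median absolute deviation (MAD).
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals)
    if mad == 0:
        return []  # no spread to compare against
    return [
        i for i, r in enumerate(residuals)
        if 0.6745 * abs(r - med) / mad > threshold
    ]
```

Whether a flagged point is noise or a valuable exception is left to the analyst; the function only reports the indices.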
Time and Ordering: Sequential Pattern, Trend and Evolution Analysis
memory cards
Periodicity analysis
Similarity-based analysis
Structure and Network Analysis
Graph mining
Finding frequent subgraphs (e.g., chemical compounds), trees
family, classmates, …
Links carry a lot of semantic information: Link mining
Web mining
Web is a big information network: from PageRank to Google
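As a rough illustration of link-based ranking on an information network, here is a minimal sketch of the textbook PageRank power iteration. It is not a description of how any production search engine works. The function name `pagerank`, the damping value of 0.85, the fixed iteration count, and the toy link graph in the usage note are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by repeated redistribution of rank.

    `links` maps each page to the list of pages it links to.
    Hypothetical helper for illustration only.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's damped rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank
```

For a toy graph such as `{"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}`, the page with the most incoming links ends up with the highest score, and the scores sum to 1.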
Evaluation of Knowledge
Is all mined knowledge interesting?
One can mine a tremendous amount of “patterns” and knowledge
Some may fit only a certain dimension space (time, location, …)
Some may not be representative, may be transient, …
Evaluation of mined knowledge → directly mine only
interesting knowledge?
Descriptive vs. predictive
Coverage
Typicality vs. novelty
Accuracy
Timeliness
…
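Two of the criteria listed above, coverage and accuracy, can be made concrete for a simple if-then rule. The sketch below uses one common reading: coverage is the fraction of records the rule's condition applies to, and accuracy is the fraction of those covered records for which the rule's conclusion holds. The function name `rule_coverage_accuracy` and the record layout in the usage note are assumptions for illustration.

```python
def rule_coverage_accuracy(records, antecedent, consequent):
    """Return (coverage, accuracy) of the rule 'if antecedent then consequent'.

    `antecedent` and `consequent` are predicates taking one record.
    Hypothetical helper for illustration only.
    """
    # Records to which the rule applies at all.
    covered = [r for r in records if antecedent(r)]
    coverage = len(covered) / len(records) if records else 0.0

    # Among covered records, how often the conclusion is actually true.
    correct = sum(1 for r in covered if consequent(r))
    accuracy = correct / len(covered) if covered else 0.0
    return coverage, accuracy
```

For example, with four customer records of which two are tagged `"youth"` and one of those two bought a computer, the rule "youth implies buys" has coverage 0.5 and accuracy 0.5.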
Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining shown as the confluence of several disciplines. Surviving labels: Applications, Visualization]
Why Confluence of Multiple Disciplines?
Applications of Data Mining
Major Issues in Data Mining (1)
Mining Methodology
Mining various and new kinds of knowledge
Mining knowledge in multi-dimensional space
Data mining: An interdisciplinary effort
Boosting the power of discovery in a networked environment
Handling noise, uncertainty, and incompleteness of data
Pattern evaluation and pattern- or constraint-guided mining
User Interaction
Interactive mining
Incorporation of background knowledge
Presentation and visualization of data mining results
Major Issues in Data Mining (2)
Summary
Data mining: Discovering interesting patterns and knowledge from
massive amounts of data
A natural evolution of database technology, in great demand, with
wide applications
A KDD process includes data cleaning, data integration, data
selection, transformation, data mining, pattern evaluation, and
knowledge presentation
Mining can be performed on a variety of data
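The KDD steps named in the summary can be strung together as a small pipeline. The sketch below is a toy, with one function per step, operating on lists of dictionaries. Every function name, the field names (`item`, `amount`), the spend cutoff of 100, and the default minimum support of 0.5 are assumptions made for the example; a real process would replace each step with something far more substantial.

```python
from collections import Counter


def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(sources):
    """Data integration: combine records from several sources into one list."""
    return [r for source in sources for r in source]


def select(records, fields):
    """Data selection: keep only the task-relevant fields."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Transformation: discretize the numeric amount into a spend label."""
    return [
        {"item": r["item"], "spend": "high" if r["amount"] >= 100 else "low"}
        for r in records
    ]


def mine(records):
    """Data mining: count how often each (item, spend) combination occurs."""
    return Counter((r["item"], r["spend"]) for r in records)


def evaluate(patterns, total, min_support):
    """Pattern evaluation: keep patterns whose relative frequency reaches min_support."""
    return {p: c / total for p, c in patterns.items() if c / total >= min_support}


def present(patterns):
    """Knowledge presentation: render patterns as text, highest support first."""
    ordered = sorted(patterns.items(), key=lambda kv: -kv[1])
    return [f"{item} / {spend}: support {s:.2f}" for (item, spend), s in ordered]


def kdd(sources, min_support=0.5):
    """Run the steps in the order the summary lists them."""
    cleaned = [clean(source) for source in sources]
    data = transform(select(integrate(cleaned), ["item", "amount"]))
    return present(evaluate(mine(data), len(data), min_support))
```

Calling `kdd([store, web])` on two lists of purchase records returns one formatted line per pattern that clears the support threshold.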