1 - Introduction To DM
Data Mining
Tran Thi Oanh
Outline
➢Why Data Mining?
➢What Is Data Mining?
➢A Multi-Dimensional View of Data Mining
➢What Kind of Data Can Be Mined?
➢What Kinds of Patterns Can Be Mined?
➢What Technologies Are Used?
➢What Kind of Applications Are Targeted?
➢Major Issues in Data Mining
➢A Brief History of Data Mining and Data Mining Society
➢Summary
Why Data Mining?
What Is Data Mining?
➢Alternative names
o Knowledge discovery (mining) in databases (KDD), knowledge extraction,
data/pattern analysis, data archeology, data dredging, information harvesting,
business intelligence, etc.
Knowledge Discovery (KDD) Process
➢ This is a view from typical database systems
and data warehousing communities
➢ Data mining plays an essential role in the
knowledge discovery process
[Figure: KDD process, from bottom to top: Databases → Data Cleaning and Data Integration → Task-relevant Data → Data Mining → Pattern Evaluation]
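The stages named on this slide (data cleaning, data integration, task-relevant data, data mining, pattern evaluation) can be read as a chain of functions. The sketch below is only an illustration under stated assumptions: the function names, the toy "mining" step (counting how often a value occurs), the `min_count` threshold, and the sample records are all hypothetical and do not come from the slides.

```python
from collections import Counter

def clean(records):
    """Data cleaning: drop records that contain a missing (None) value."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: merge several sources into one list of records."""
    merged = []
    for source in sources:
        merged.extend(source)
    return merged

def select(records, fields):
    """Task-relevant data: keep only the fields needed for the task."""
    return [{f: r[f] for f in fields} for r in records]

def mine(records, field):
    """Data mining (toy stand-in): count how often each value of `field` occurs."""
    return Counter(r[field] for r in records)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep patterns seen at least `min_count` times."""
    return {value: n for value, n in patterns.items() if n >= min_count}

def kdd_pipeline(sources, fields, field, min_count):
    """Run the stages in order: clean, integrate, select, mine, evaluate."""
    records = integrate(*(clean(s) for s in sources))
    return evaluate(mine(select(records, fields), field), min_count)

# Hypothetical sample data from two sources.
store_a = [{"item": "milk", "city": "Hanoi"}, {"item": None, "city": "Hanoi"}]
store_b = [{"item": "milk", "city": "Hue"}, {"item": "bread", "city": "Hue"}]
result = kdd_pipeline([store_a, store_b], ["item"], "item", 2)
print(result)  # → {'milk': 2}
```

A real system would replace `mine` with an actual mining algorithm; the point here is only that mining is one step inside a longer process.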
Example: A Web Mining Framework
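Data Mining in Business Intelligence
[Figure: layered pyramid in which the potential to support business decisions increases toward the top. Recoverable layers: Decision Making (End User) at the top; Data Exploration (Statistical Summary, Querying, and Reporting) lower down.]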
KDD Process: A Typical View from ML and Statistics
Example: Medical Data Mining
➢Health care & medical data mining often adopts this view from
statistics and machine learning
➢Preprocessing of the data (including feature extraction and
dimension reduction)
➢Classification or/and clustering processes
➢Post-processing for presentation
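The three steps listed above (preprocessing, classification, post-processing) can be sketched as follows. This is a minimal illustration, not a method taken from the slides: min-max scaling stands in for preprocessing, a nearest-centroid rule stands in for classification, and the patient features, labels, and function names are all hypothetical.

```python
def min_max_scale(rows):
    """Preprocessing: rescale each feature column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def fit_centroids(rows, labels):
    """Classification (training): average the feature vectors of each class."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }

def predict(centroids, row):
    """Classification (prediction): choose the class with the closest centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], row))

def report(predictions):
    """Post-processing: turn raw predictions into readable lines."""
    return [f"patient {i}: {label}" for i, label in enumerate(predictions)]

# Hypothetical records: [age, systolic blood pressure].
raw = [[30, 110], [35, 115], [60, 160], [65, 170]]
labels = ["low risk", "low risk", "high risk", "high risk"]

scaled = min_max_scale(raw)
centroids = fit_centroids(scaled, labels)
lines = report([predict(centroids, row) for row in scaled])
print("\n".join(lines))
```

The example predicts on the same records it was fitted on, which is enough to show the flow of the three steps but is not how a model would be validated in practice.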
Multi-Dimensional View of Data Mining
➢ Data to be mined
o Database data (extended-relational, object-oriented, heterogeneous, legacy), data
warehouse, transactional data, stream, spatiotemporal, time-series, sequence, text
and web, multi-media, graphs & social and information networks
➢ Knowledge to be mined (or: Data mining functions)
o Characterization, discrimination, association, classification, clustering,
trend/deviation, outlier analysis, etc.
o Descriptive vs. predictive data mining
o Multiple/integrated functions and mining at multiple levels
➢ Techniques utilized
o Data-intensive, data warehouse (OLAP), machine learning, statistics, pattern
recognition, visualization, high-performance, etc.
➢ Applications adapted
o Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market
analysis, text mining, Web mining, etc.
Data Mining: On What Kinds of Data?
Data Mining Function: (3) Classification
Data Mining Function: (5) Outlier Analysis
➢Outlier analysis
o Outlier: A data object that does not comply with the general behavior
of the data
o Noise or exception? ― One person’s garbage could be another
person’s treasure
o Methods: by-product of clustering or regression analysis, …
o Useful in fraud detection, rare events analysis
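As one possible reading of "by-product of regression analysis", the sketch below fits a least-squares line and flags points whose residual is unusually large. The function names, the threshold of two standard deviations, and the sample data are assumptions made for illustration; the slides do not prescribe a specific method.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def regression_outliers(xs, ys, threshold=2.0):
    """Return indices whose residual exceeds `threshold` standard deviations."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    if spread == 0:
        return []  # every point lies exactly on the line
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]

# Hypothetical data: y is roughly 2 * x, except for the point at index 5.
xs = list(range(10))
ys = [0, 2, 4, 6, 8, 30, 12, 14, 16, 18]
print(regression_outliers(xs, ys))  # → [5]
```

Whether the flagged point is noise to discard or an exception worth investigating (for example, a fraudulent transaction) depends on the application, which is the point of the "garbage or treasure" remark above.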
Time and Ordering: Sequential Pattern, Trend and Evolution Analysis
Structure and Network Analysis
Evaluation of Knowledge
➢Is all mined knowledge interesting?
o One can mine a tremendous number of “patterns” and knowledge
o Some may fit only certain dimension space (time, location, …)
o Some may not be representative, may be transient, …
Why Confluence of Multiple Disciplines?
Major Issues in Data Mining (1)
➢Mining Methodology
o Mining various and new kinds of knowledge
o Mining knowledge in multi-dimensional space
o Data mining: An interdisciplinary effort
o Boosting the power of discovery in a networked environment
o Handling noise, uncertainty, and incompleteness of data
o Pattern evaluation and pattern- or constraint-guided mining
➢User Interaction
o Interactive mining
o Incorporation of background knowledge
o Presentation and visualization of data mining results
Major Issues in Data Mining (2)
Where to Find References? DBLP, CiteSeer, Google
A. It uses machine-learning techniques. Here, programs can learn from past experience and
adapt themselves to new situations
B. Computational procedure that takes some value as input and produces some value as
output
C. Science of making machines perform tasks that would require intelligence when
performed by humans
D. None of these
A Bayesian classifier is
A. This takes only two values. In general, these values will be 0 and 1, and they can be coded
as one bit
B. The natural environment of a certain species
C. Systems that can be used without knowledge of internal operations
D. None of these
Classification accuracy is
A. A component of a network
B. In the context of KDD and data mining, this refers to random errors in a database table.
C. One of the defining aspects of a data warehouse
D. None of these
Prediction is