Data Mining Methods
[Figure: Data, Knowledge, Technique, Application]
Data Mining Pipeline
[Figure: pipeline from Data through Data understanding, Data preprocessing, Data warehousing, Data modeling, and Pattern evaluation to Knowledge; labeled "Technique"]
Technique View
➢ Frequent pattern analysis
➢ Classification, prediction
➢ Clustering
➢ Anomaly detection
➢ Trend and evolution analysis
Frequent Pattern Analysis
➢ Frequent itemset
➢ Frequent sequence
➢ Frequent structure
➢ Association rules
➢ Correlation analysis
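As a rough illustration of association rules and correlation analysis, the sketch below computes support, confidence, and lift for a single rule. The transactions and the helper names (support, confidence, lift) are made up for illustration and do not come from the slides.

```python
# Hypothetical transactions (illustrative only, not from the slides).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Estimate of P(rhs | lhs) for the rule lhs -> rhs."""
    return support(set(lhs) | set(rhs)) / support(lhs)

def lift(lhs, rhs):
    """Confidence divided by the baseline support of rhs.

    A value near 1.0 suggests lhs and rhs occur independently;
    above 1.0 suggests positive correlation, below 1.0 negative.
    """
    return confidence(lhs, rhs) / support(rhs)

print(confidence({"bread"}, {"milk"}))
print(lift({"bread"}, {"milk"}))
```

In this toy data the rule bread -> milk has a lift below 1, so buying bread makes milk slightly less likely than its baseline rate, even though the rule's confidence looks moderately high.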
Classification
➢ Predefined classes
➢ Need training data
➢ Build model to distinguish classes
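A minimal sketch of these three points, using a nearest-centroid classifier as one simple stand-in for "a model". The training data, class labels, and function names (fit_centroids, classify) are assumptions chosen for illustration; the slides do not name a specific classifier.

```python
from math import dist

# Hypothetical labeled training data: (feature vector, predefined class).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 9.5), "high"),
]

def fit_centroids(samples):
    """Build the model: one mean vector (centroid) per class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(v / counts[label] for v in acc)
        for label, acc in sums.items()
    }

def classify(model, features):
    """Assign the class whose centroid is closest to `features`."""
    return min(model, key=lambda label: dist(model[label], features))

model = fit_centroids(training)
print(classify(model, (2.0, 1.0)))
print(classify(model, (7.5, 9.0)))
```

The classes are fixed in advance by the labels in the training data, which is what separates this setting from clustering.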
Prediction
➢ Numerical prediction (continuous value)
• E.g., weather
• E.g., stock price
• E.g., traffic
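Numerical prediction can be sketched with an ordinary least-squares line fit, one of the simplest models that outputs a continuous value. The observations below (hour versus traffic volume) are invented, and fit_line and predict are hypothetical helper names.

```python
# Hypothetical observations: hour index vs. traffic volume (made-up numbers).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [12.0, 14.0, 16.0, 18.0, 20.0]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(xs, ys)

def predict(x):
    """Predict a continuous value for an unseen input."""
    return slope * x + intercept

print(predict(6.0))
```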
Clustering
➢ No predefined classes
➢ Intra-cluster similarity
➢ Inter-cluster dissimilarity
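The two criteria above can be made concrete by measuring distances within and between groups. The sketch below only scores a given grouping; it is not a clustering algorithm. The points and function names are assumptions for illustration, with Euclidean distance standing in for dissimilarity.

```python
from itertools import combinations, product
from math import dist

# Hypothetical grouping of 2-D points into two clusters.
clusters = [
    [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)],
    [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)],
]

def mean_intra_distance(cluster):
    """Average distance between pairs of points in the same cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def mean_inter_distance(a, b):
    """Average distance between points drawn from two different clusters."""
    pairs = list(product(a, b))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

intra = [mean_intra_distance(c) for c in clusters]
inter = mean_inter_distance(clusters[0], clusters[1])
print(intra, inter)
```

A good grouping keeps the intra-cluster distances small (high similarity) and the inter-cluster distance large (high dissimilarity), as is the case for these well-separated points.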
Anomaly Detection
• E.g., error, noise
[Figure: values in Kelvin (y-axis ticks 160, 220, 280) plotted against date (yyyy-mm-dd), 1998-09-18 to 2005-07-24]
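One simple way to flag such points is a z-score test: mark readings that lie far from the mean in units of standard deviation. The readings below are invented (they are not the values from the figure), and the threshold of 2.0 is an assumption that would need tuning on real data.

```python
from statistics import mean, pstdev

# Hypothetical readings in Kelvin; the 180.0 value plays the role of an error.
readings = [250.0, 252.0, 249.0, 251.0, 250.0, 180.0, 251.0, 250.0]

def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]

print(find_anomalies(readings))
```

Because the mean and standard deviation are themselves pulled by extreme values, a median-based variant is often more robust in practice.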
Trend and Evolution Analysis
➢ Changes over time
• Overall trend
• Periodical patterns
• Anomalies
• E.g.,
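One common way to separate the overall trend from periodical patterns is a moving average whose window spans one full period. The sketch below assumes a made-up series built from a linear trend plus a repeating pattern of length 4; the series and the moving_average helper are illustrative only.

```python
def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical series: linear trend (t) plus a periodic pattern of length 4.
season = [0.0, 2.0, 0.0, -2.0]
series = [float(t) + season[t % 4] for t in range(12)]

# A window equal to the period averages the periodic component away,
# leaving only the overall trend.
trend = moving_average(series, window=4)
print(series)
print(trend)
```

Subtracting the trend from the series would expose the periodic component, and whatever remains after removing both is a candidate for anomaly analysis.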
Data Mining Methods
➢ Frequent pattern analysis
➢ Classification
➢ Clustering
➢ Outlier analysis
Market Basket Analysis
➢ List of transactions
• Each Ti contains multiple items
➢ (Frequent) itemset
• X = {x1, x2, …, xk}
➢ (Minimum) support
• Probability of Ti containing X

Tid  Items
1    A, B, C, E
2    A, D, E
3    B, C, E
4    B, C, D, E
5    B, D, E
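Support can be estimated directly as the fraction of transactions that contain the itemset. The sketch below uses the five transactions listed on this slide; the function names and the minimum support of 0.6 are assumptions for illustration.

```python
# The five transactions from the slide, keyed by Tid.
transactions = {
    1: {"A", "B", "C", "E"},
    2: {"A", "D", "E"},
    3: {"B", "C", "E"},
    4: {"B", "C", "D", "E"},
    5: {"B", "D", "E"},
}

def support(itemset):
    """Fraction of transactions Ti that contain every item of X."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def is_frequent(itemset, min_support):
    """X is frequent when its support reaches the minimum support."""
    return support(itemset) >= min_support

print(support({"B", "C"}))           # B and C appear together in Tids 1, 3, 4
print(is_frequent({"A"}, 0.6))       # A appears only in Tids 1 and 2
```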
Frequent Pattern Mining
➢ Brute force approach (e.g., 100 items)
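To see why brute force does not scale: n distinct items yield 2^n - 1 non-empty candidate itemsets, and each candidate must be checked against every transaction. The sketch below enumerates all candidates for the five-transaction market basket example and then prints the candidate count for 100 items. The function name and the minimum count of 3 are assumptions for illustration.

```python
from itertools import combinations

# The five market basket transactions (items A to E).
transactions = [
    {"A", "B", "C", "E"},
    {"A", "D", "E"},
    {"B", "C", "E"},
    {"B", "C", "D", "E"},
    {"B", "D", "E"},
]
items = sorted(set().union(*transactions))

def brute_force_frequent(min_count):
    """Test every non-empty itemset against every transaction."""
    frequent = {}
    candidates = 0
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            candidates += 1
            count = sum(set(candidate) <= t for t in transactions)
            if count >= min_count:
                frequent[candidate] = count
    return candidates, frequent

candidates, frequent = brute_force_frequent(min_count=3)
print(candidates)      # 31, i.e. 2**5 - 1 candidates for 5 items
print(sorted(frequent))
print(2 ** 100 - 1)    # number of candidates for 100 items
```

With 5 items the enumeration is trivial, but the candidate count doubles with every added item, so at 100 items it is roughly 1.27e30 and full enumeration is infeasible.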