Data Mining - Lecture 2
1960s:
Data collection, database creation, IMS and network DBMS
1970s:
Relational data model, relational DBMS implementation
1980s:
RDBMS, advanced data models (extended-relational, OO,
deductive, etc.) and application-oriented DBMS (spatial, scientific,
engineering, etc.)
1990s—2000s:
Data mining and data warehousing, multimedia databases, and
Web databases
Other Applications
Text mining (news group, email, documents) and Web analysis.
Intelligent query answering
Customer profiling
data mining can tell you what types of customers buy what
products (clustering or classification)
Identifying customer requirements
identifying the best products for different customers
use prediction to find what factors will attract new customers
Applications
widely used in health care, retail, credit card services,
telecommunications (phone card fraud), etc.
Approach
use historical data to build models of fraudulent behavior and use
data mining to help identify similar instances
Examples
auto insurance: detect a group of people who stage accidents to
collect on insurance
money laundering: detect suspicious money transactions (US
Treasury's Financial Crimes Enforcement Network)
medical insurance: detect professional patients, rings of doctors,
and rings of references
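The approach above (build a model from historical cases, then use it to flag similar new instances) can be sketched as a nearest-centroid classifier. This is only an illustration: the feature names, the claim records, and the choice of nearest-centroid are assumptions, not something the lecture prescribes.

```python
import math

# Hypothetical historical claims:
# (claim amount in thousands, number of claims filed last year), known label.
HISTORY = [
    ((2.0, 1), "legit"),
    ((3.0, 0), "legit"),
    ((2.5, 1), "legit"),
    ((9.0, 5), "fraud"),
    ((8.0, 6), "fraud"),
]

def build_model(history):
    """Average the feature vectors of each label to get one centroid per class."""
    sums, counts = {}, {}
    for features, label in history:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}

def classify(model, features):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(model[label], features))

model = build_model(HISTORY)
print(classify(model, (8.5, 4)))  # new claim resembling past fraud cases
print(classify(model, (2.2, 1)))  # new claim resembling past legitimate cases
```

A real system would use many more features and a validated model; the point here is only the "learn from history, match similar instances" loop.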
May 12, 2024 Data Mining: Concepts and Techniques 9
Fraud Detection and Management (2)
Sports
IBM Advanced Scout analyzed NBA game statistics (shots
blocked, assists, and fouls) to gain competitive advantage for
New York Knicks and Miami Heat
Astronomy
JPL and the Palomar Observatory discovered 22 quasars with the
help of data mining
Internet Web Surf-Aid
IBM Surf-Aid applies data mining algorithms to Web access logs
for market-related pages to discover customer preferences and
behavior, analyze the effectiveness of Web marketing,
improve Web site organization, etc.
Data Mining: A KDD Process
Data mining: the core of the knowledge discovery process.
[Figure: KDD process flow: Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Data Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation]
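The stages of the KDD process (data cleaning, data integration, data selection, data mining, pattern evaluation) can be sketched as a chain of small functions. The records, field names, and the "distinct buyers per item" mining step are hypothetical and chosen only to keep the example short.

```python
# Hypothetical records from two source "databases"; None marks a missing value.
sales_db = [
    {"customer": "a", "item": "milk", "amount": 3.0},
    {"customer": "b", "item": "bread", "amount": None},
    {"customer": "a", "item": "bread", "amount": 2.0},
]
web_db = [
    {"customer": "c", "item": "milk", "amount": 3.5},
    {"customer": "c", "item": "bread", "amount": 2.0},
    {"customer": "c", "item": "eggs", "amount": 1.5},
]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine the cleaned sources into one 'warehouse' list."""
    return [r for source in sources for r in source]

def select(warehouse, fields):
    """Data selection: keep only the task-relevant fields."""
    return [{f: r[f] for f in fields} for r in warehouse]

def mine(task_data):
    """Data mining: count how many distinct customers bought each item."""
    buyers = {}
    for r in task_data:
        buyers.setdefault(r["item"], set()).add(r["customer"])
    return {item: len(customers) for item, customers in buyers.items()}

def evaluate(patterns, min_support):
    """Pattern evaluation: keep only patterns supported by enough customers."""
    return {item: n for item, n in patterns.items() if n >= min_support}

warehouse = integrate(clean(sales_db), clean(web_db))
task_data = select(warehouse, ["customer", "item"])
patterns = evaluate(mine(task_data), min_support=2)
print(patterns)
```

Each function corresponds to one box in the process; in practice the steps are iterative and far more involved.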
Steps of a KDD Process
Data Exploration: Statistical Analysis, Querying and Reporting
Data Warehouses / Data Marts: OLAP, MDA (DBA)
Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Architecture of a Typical Data Mining System
[Figure: layered architecture; recoverable labels, top to bottom: graphical user interface, pattern evaluation, databases and data warehouse]
Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
Object-oriented and object-relational databases
Spatial databases
Time-series data and temporal data
Text databases and multimedia databases
Heterogeneous and legacy databases
WWW
Object-Relational Databases
Cluster analysis
Class label is unknown: Group data to form new classes, e.g.,
cluster houses to find distribution patterns
Clustering is based on the principle of maximizing intra-class
similarity and minimizing inter-class similarity
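As an illustration of grouping unlabeled data, here is a minimal k-means sketch. k-means is one common clustering technique, used here as an assumption (the lecture does not name an algorithm); the house coordinates and the initial centroids are hypothetical.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid-update steps."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return clusters, centroids

# Hypothetical house locations (x, y) in km; no class labels are given.
houses = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
clusters, centroids = kmeans(houses, centroids=[houses[0], houses[3]])
for cluster, center in zip(clusters, centroids):
    print(center, cluster)
```

Points within a cluster end up close to their own centroid (high intra-class similarity) and far from the other centroid (low inter-class similarity).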
Data Mining Functionalities (3)
Outlier analysis
Outlier: a data object that does not comply with the general behavior of
the data
It can be treated as noise or an exception, but is quite useful in fraud
detection
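One simple way to find objects that do not comply with the general behavior of the data is a z-score test: flag values that lie more than a chosen number of standard deviations from the mean. The sketch below assumes that technique; the threshold and the transaction amounts are hypothetical.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds `threshold` standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical card transaction amounts; one is far from the rest.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 500]
print(find_outliers(amounts))
```

Whether a flagged value is noise to discard or an exception worth investigating (as in fraud detection) depends on the application.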
Similarity-based analysis
[Figure: Data Mining at the center, drawing on Machine Learning, Visualization, Information Science, and Other Disciplines]
General functionality
Descriptive data mining
[Figure: layered architecture; recoverable labels: Layer 2: MDDB, Meta Data; between layers: Filtering & Integration, Database API, Filtering; Layer 1: Data Repository (Databases, Data Warehouse), Data cleaning, Data integration]
Major Issues in Data Mining (1)