Week1-1
LECTURE 1
There is an inherent meaning in everything. “Signs for people who
can see.”
Agenda
Course Introduction
Course Details
• Textbook:
• Data Mining: Concepts and Techniques (2nd Edition)
by Jiawei Han and Micheline Kamber
• Reference books:
• M. Kantardzic, Data Mining Concepts, Models,
Methods and Algorithms, IEEE Press, 2003
• M.H. Dunham, S. Sridhar, Data Mining Introductory
and Advanced Topics, Pearson Education, 2006
• Website:
• Some useful resources may be found on Jiawei
Han’s website (these lectures are based on his)
• www.cs.uiuc.edu/hanj/bk2
• www.mkp.com/datamining2e
Course Requirement
• Other Applications
• Text mining (newsgroups, email, documents) and Web mining
• Stream data mining
• Bioinformatics and bio-data analysis
Ex. 1: Market Analysis and Management
• Where does the data come from? Credit card transactions, loyalty cards,
discount coupons, customer complaint calls, plus (public) lifestyle studies
• Target marketing
• Find clusters of “model” customers who share the same characteristics: interest,
income level, spending habits, etc.
• Determine customer purchasing patterns over time
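The "clusters of model customers" idea can be sketched with a plain k-means loop. This is only an illustration, not a method prescribed by the course: the customer records, the two features (income, monthly spend), and the starting centroids are all made-up assumptions, and the features are left unscaled for brevity.

```python
# Hypothetical customer records: (annual income in $1000s, monthly spend in $).
customers = [
    (25, 200), (27, 220), (30, 210),   # lower income, lower spend
    (80, 900), (85, 950), (78, 870),   # higher income, higher spend
]

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then recompute the centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Assumed starting centroids: one customer from each apparent group.
centroids, clusters = k_means(customers, centroids=[customers[0], customers[3]])
```

Each resulting centroid can be read as a "model" customer for its group. In practice the features would usually be normalized first, since here the spend column dominates the distance.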
[Figure: Data Mining at the center of related fields: Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithm, Artificial Intelligence]
Knowledge Discovery (KDD) Process
• Data mining: the core of the knowledge discovery process
[Figure: KDD process stages, from source data to result: Databases; Data Cleaning and Data Integration; Task-relevant Data; Data Mining; Pattern Evaluation]
KDD Process: Several Key Steps
• Learning the application domain
• relevant prior knowledge and goals of application
• Creating a target data set: data selection
• Data cleaning and preprocessing: (may take 60% of effort!)
• Data reduction and transformation
• Find useful features, dimensionality/variable reduction, invariant
representation
• Choosing functions of data mining
• summarization, classification, regression, association, clustering
• Choosing the mining algorithm(s)
• Data mining: search for patterns of interest
• Pattern evaluation and knowledge presentation
• visualization, transformation, removing redundant patterns, etc.
• Use of discovered knowledge
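The key steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: the transaction records, the function names (select, clean, transform, mine, evaluate), and the 0.5 support threshold are all hypothetical, and counting co-occurring item pairs stands in for a real association mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw transaction records; None marks a missing basket.
raw_records = [
    {"customer": "c1", "items": ["bread", "milk"], "store": "A"},
    {"customer": "c2", "items": ["bread", "butter", "milk"], "store": "A"},
    {"customer": "c3", "items": None, "store": "B"},
    {"customer": "c4", "items": ["Milk", "bread "], "store": "B"},
    {"customer": "c5", "items": ["butter"], "store": "A"},
]

def select(records):
    """Data selection: keep only the task-relevant field (the basket)."""
    return [r["items"] for r in records]

def clean(baskets):
    """Data cleaning: drop missing baskets and normalize item names."""
    return [[item.strip().lower() for item in b] for b in baskets if b]

def transform(baskets):
    """Transformation: represent each basket as a sorted list of unique items."""
    return [sorted(set(b)) for b in baskets]

def mine(baskets):
    """Data mining: count how often each pair of items is bought together."""
    counts = Counter()
    for b in baskets:
        counts.update(combinations(b, 2))
    return counts

def evaluate(pair_counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    supports = {pair: c / n_baskets for pair, c in pair_counts.items()}
    return {pair: s for pair, s in supports.items() if s >= min_support}

baskets = transform(clean(select(raw_records)))
patterns = evaluate(mine(baskets), len(baskets))
```

Keeping each step as its own function mirrors the slide: the cleaning step can be reworked, or the mining function swapped out, without touching the rest of the pipeline.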