Classification & Prediction: - Shailesh Yadav Central University of Rajasthan
CONTENTS
Classification & Prediction
Methods Of Classification
Other Classification Methods
Prediction
Conclusion
Classification vs. Prediction
Classification
predicts categorical class labels (discrete or nominal)
constructs a model from the training set and the values (class labels) of a
classifying attribute, then uses that model to classify new data
Prediction
models continuous-valued functions, i.e., predicts unknown or missing
values
Typical applications:
1-Credit/loan approval
2-Medical diagnosis: whether a tumor is cancerous or benign
3-Fraud detection: whether a transaction is fraudulent
4-Web page categorization: which category a page belongs to
Classification—A Two-Step Process
[Figure: Training Data -> Classification Algorithms -> Classifier. The Classifier is then applied to Testing Data and to Unseen Data, e.g. (Jeff, Professor, 4) -> Tenured?]
Testing data:
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes
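The second step (using a classifier on testing data and then on unseen data) can be sketched in Python as follows. The rule inside `classify` is a hypothetical stand-in for whatever model the first step would learn from the training data; it is not given in these slides. The testing tuples and the unseen tuple (Jeff, Professor, 4) are the ones listed above.

```python
# Step 2 of the two-step process: estimate accuracy on testing data,
# then classify an unseen tuple.

def classify(rank, years):
    """Hypothetical learned rule (an assumption, not taken from the slides):
    tenured if the rank is Professor or the person has more than 6 years."""
    return "yes" if rank == "Professor" or years > 6 else "no"

# Testing data: (name, rank, years, tenured)
testing_data = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

def accuracy(rows):
    """Fraction of tuples whose predicted label matches the known label."""
    correct = sum(
        1 for _, rank, years, label in rows if classify(rank, years) == label
    )
    return correct / len(rows)

test_accuracy = accuracy(testing_data)

# Unseen data: (Jeff, Professor, 4) -> Tenured?
jeff_prediction = classify("Professor", 4)

if __name__ == "__main__":
    print("accuracy on testing data:", test_accuracy)
    print("Jeff tenured?", jeff_prediction)
```

Under this assumed rule, Merlisa is the one misclassified testing tuple, which is why accuracy is measured on testing data that the model did not see during training.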
Issues: Data Preparation
Data cleaning
Preprocess data in order to reduce noise and handle
missing values
Relevance analysis (feature selection)
Remove the irrelevant or redundant attributes
Data transformation
Generalize and/or normalize data
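Two of the preparation steps above, handling missing values and normalizing data, can be sketched in Python. This is a minimal illustration under stated assumptions: mean imputation and min-max scaling are only one possible choice for each step, and the `years` list is hypothetical sample data.

```python
def fill_missing(values):
    """Data cleaning: replace missing entries (None) with the mean of the
    known values. Mean imputation is one simple strategy among several."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def min_max_normalize(values):
    """Data transformation: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical attribute column with one missing value.
years = [2, 7, None, 5, 7]
cleaned = fill_missing(years)
normalized = min_max_normalize(cleaned)

if __name__ == "__main__":
    print(cleaned)
    print(normalized)
```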
Issues: Evaluating Classification Methods
Accuracy
classifier accuracy: predicting class label
predictor accuracy: guessing value of predicted attributes
Speed
time to construct the model (training time)
time to use the model (classification/prediction time)
Robustness: handling noise and missing values
Scalability: efficiency in disk-resident databases
Interpretability
understanding and insight provided by the model
Other measures, e.g., goodness of rules, such as decision tree size or
compactness of classification rules
Methods Of Classification
Training dataset (this follows an example of Quinlan's ID3, Playing Tennis):
age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no
Output: A Decision Tree for “buys_computer”
age?
  <=30   -> student?        (no -> no, yes -> yes)
  31..40 -> yes
  >40    -> credit rating?  (excellent -> no, fair -> yes)
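ID3 chooses the splitting attribute with the highest information gain, which is why `age` ends up at the root of the tree above. The Python sketch below computes the gain of each attribute on the 14 training tuples. The column names `income`, `student`, and `credit_rating`, and the helper names, are assumptions made for illustration; this covers only the attribute-selection step, not a full tree-induction algorithm.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["age", "income", "student", "credit_rating"]

# Training tuples: (age, income, student, credit_rating, buys_computer)
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def entropy(labels):
    """Expected information (in bits) needed to identify a tuple's class."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr_index):
    """Entropy of the whole set minus the weighted entropy after splitting
    the set on the attribute at attr_index."""
    labels = [r[-1] for r in rows]
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attr_index], []).append(r[-1])
    remainder = sum(
        len(part) / len(rows) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder

gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best_attribute = max(gains, key=gains.get)

if __name__ == "__main__":
    for name, gain in gains.items():
        print(f"{name}: {gain:.3f}")
    print("split on:", best_attribute)
```

Applying the same selection recursively to each partition (`<=30`, `31..40`, `>40`) would produce the remaining levels of the tree.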
Algorithm for Decision Tree Induction
Genetic Algorithms
Rough Set Approach
Fuzzy Set Approach
What Is Prediction?
(Numerical) prediction is similar to classification
construct a model
use model to predict continuous or ordered value for a given input
Prediction is different from classification
Classification predicts categorical class labels
Prediction models continuous-valued functions
Major method for prediction: regression
model the relationship between one or more independent or predictor variables
and a dependent or response variable
Regression analysis
Linear and multiple regression
Non-linear regression
Other regression methods: generalized linear model, Poisson regression, log-linear
models, regression trees
Linear Regression
Linear regression: involves a response variable y and a single predictor variable x
y = w0 + w1 x
where w0 (y-intercept) and w1 (slope) are regression coefficients
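Method of least squares: estimates the best-fitting straight line
$$w_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}, \qquad w_0 = \bar{y} - w_1 \bar{x}$$
where $|D|$ is the number of training tuples and $\bar{x}$, $\bar{y}$ are the means of the $x_i$ and $y_i$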
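The least-squares estimates of the slope w1 and the intercept w0 can be computed directly from the sample means. The Python sketch below is a minimal illustration; the function name `fit_line` and the data points are hypothetical (the points are generated from y = 1 + 2x so the result is easy to check).

```python
def fit_line(xs, ys):
    """Return (w0, w1) for the least-squares line y = w0 + w1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator: sum of (x_i - x_mean) * (y_i - y_mean)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator: sum of (x_i - x_mean)^2
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = y_mean - w1 * x_mean
    return w0, w1

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
w0, w1 = fit_line(xs, ys)

def predict(x):
    """Predict y for a new x with the fitted coefficients."""
    return w0 + w1 * x

if __name__ == "__main__":
    print("w0 =", w0, "w1 =", w1)
    print("prediction for x = 6:", predict(6))
```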
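Predictor error measures, where $y_i$ is the actual value, $y_i'$ the predicted value, $\bar{y}$ the mean of the actual values, and $d$ the number of test tuples:
Mean absolute error: $$\frac{\sum_{i=1}^{d} |y_i - y_i'|}{d}$$
Mean squared error: $$\frac{\sum_{i=1}^{d} (y_i - y_i')^2}{d}$$
Relative absolute error: $$\frac{\sum_{i=1}^{d} |y_i - y_i'|}{\sum_{i=1}^{d} |y_i - \bar{y}|}$$
Relative squared error: $$\frac{\sum_{i=1}^{d} (y_i - y_i')^2}{\sum_{i=1}^{d} (y_i - \bar{y})^2}$$
The mean squared error exaggerates the presence of outliers
The root mean squared error is therefore popularly used; similarly, the root relative squared error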
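The error measures for a predictor (mean absolute and mean squared error, their relative versions, and the root forms) can be sketched in Python as below. The function name `error_measures` and the sample values are hypothetical; the predicted list contains one deliberately large error to show how squaring magnifies an outlier.

```python
from math import sqrt

def error_measures(actual, predicted):
    """Compute common predictor error measures over d test tuples."""
    d = len(actual)
    y_mean = sum(actual) / d
    abs_err = sum(abs(y - p) for y, p in zip(actual, predicted))
    sq_err = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    # Deviations of the actual values from their own mean; the relative
    # measures compare the predictor against always predicting the mean.
    abs_dev = sum(abs(y - y_mean) for y in actual)
    sq_dev = sum((y - y_mean) ** 2 for y in actual)
    return {
        "mean_absolute_error": abs_err / d,
        "mean_squared_error": sq_err / d,
        "root_mean_squared_error": sqrt(sq_err / d),
        "relative_absolute_error": abs_err / abs_dev,
        "relative_squared_error": sq_err / sq_dev,
        "root_relative_squared_error": sqrt(sq_err / sq_dev),
    }

# Hypothetical test set: three exact predictions and one that is off by 4.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.0, 4.0, 6.0, 12.0]
measures = error_measures(actual, predicted)

if __name__ == "__main__":
    for name, value in measures.items():
        print(f"{name}: {value:.3f}")
```

In this sample the single large error dominates the mean squared error far more than the mean absolute error, and taking the square root brings the measure back to the units of the predicted attribute.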
Conclusion
Classification and prediction are two forms of data analysis that can be used to
extract models describing important data classes or to predict future data trends.
Effective and scalable methods have been developed for decision tree
induction, naive Bayesian classification, Bayesian belief networks, rule-based
classifiers, backpropagation, Support Vector Machines (SVM), associative
classification, nearest neighbor classifiers, and case-based reasoning, as well as
other classification methods such as genetic algorithms and rough set and fuzzy set
approaches.
Linear, nonlinear, and generalized linear models of regression can be used for
prediction. Many nonlinear problems can be converted to linear problems by
performing transformations on the predictor variables. Regression trees and
model trees are also used for prediction.
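As an illustration of converting a nonlinear problem to a linear one, consider a model of the form y = w0 + w1 x^2. Substituting z = x^2 turns it into the straight line y = w0 + w1 z, which ordinary least squares can fit. The Python sketch below assumes this particular model form, and its data are hypothetical (generated from y = 1 + 3x^2 so the recovered coefficients can be checked).

```python
def fit_line(zs, ys):
    """Least-squares fit of y = w0 + w1 * z; returns (w0, w1)."""
    n = len(zs)
    z_mean = sum(zs) / n
    y_mean = sum(ys) / n
    szy = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
    szz = sum((z - z_mean) ** 2 for z in zs)
    w1 = szy / szz
    w0 = y_mean - w1 * z_mean
    return w0, w1

# Hypothetical data from the nonlinear relationship y = 1 + 3 * x^2.
xs = [0, 1, 2, 3, 4]
ys = [1 + 3 * x ** 2 for x in xs]

# Transform the predictor variable: z = x^2 makes the relationship linear in z.
zs = [x ** 2 for x in xs]
w0, w1 = fit_line(zs, ys)

def predict(x):
    """Predict y for a new x by applying the same transformation first."""
    return w0 + w1 * x ** 2

if __name__ == "__main__":
    print("w0 =", w0, "w1 =", w1)
    print("prediction for x = 5:", predict(5))
```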