
CLASSIFICATION & PREDICTION

- Shailesh Yadav
Central University of Rajasthan
CONTENTS
 Classification & Prediction
 Methods Of Classification
 Other Classification Methods
 Prediction
 Conclusion
Classification vs. Prediction

 Classification
 predicts categorical class labels (discrete or nominal)
 classifies data (constructs a model) based on the training set and the
values (class labels) in a classifying attribute and uses it in classifying
new data
 Prediction
 models continuous-valued functions, i.e., predicts unknown or missing
values
 Typical applications:
 Credit/loan approval
 Medical diagnosis: if a tumor is cancerous or benign
 Fraud detection: if a transaction is fraudulent
 Web page categorization: which category it belongs to
Classification—A Two-Step Process

 Model construction: describing a set of predetermined classes


 Each tuple/sample is assumed to belong to a predefined class, as determined
by the class label attribute
 The set of tuples used for model construction is training set
 The model is represented as classification rules, decision trees, or
mathematical formulae
 Model usage: for classifying future or unknown objects
 Estimate accuracy of the model
 The known label of a test sample is compared with the classified result
from the model
 Accuracy rate is the percentage of test set samples that are correctly
classified by the model
 The test set is independent of the training set; otherwise over-fitting will occur
 If the accuracy is acceptable, use the model to classify data tuples whose class
labels are not known
Process (1): Model Construction

The training data is fed to a classification algorithm, which constructs the classifier (model):

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Learned model: IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
Process (2): Using the Model in Prediction

The classifier is first evaluated on the testing data, then applied to unseen data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4) → Tenured?
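To make the two-step process concrete, here is a minimal Python sketch (an illustration, not code from these slides) that encodes the model learned in Process (1), estimates its accuracy on the test set above, and then classifies the unseen tuple:

```python
# Minimal sketch of the two-step process, using the tables from these slides.

test_data = [                       # Process (2): independent test set
    ("Tom",     "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George",  "Professor",      5, "yes"),
    ("Joseph",  "Assistant Prof", 7, "yes"),
]

def classify(rank, years):
    """Model from Process (1): IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"

# Step 2a: compare the model's output with the known labels of the test set.
correct = sum(classify(rank, years) == label for _, rank, years, label in test_data)
print(f"accuracy = {correct}/{len(test_data)}")   # 3/4: Merlisa is misclassified

# Step 2b: if the accuracy is acceptable, classify unseen data.
print("Jeff ->", classify("Professor", 4))        # (Jeff, Professor, 4) -> 'yes'
```

On this toy test set the rule gets 3 of 4 right (Merlisa has more than six years but was not tenured); this is the kind of accuracy estimate used to decide whether the model is acceptable.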
Issues: Data Preparation

 Data cleaning
 Preprocess data in order to reduce noise and handle
missing values
 Relevance analysis (feature selection)
 Remove the irrelevant or redundant attributes
 Data transformation
 Generalize and/or normalize data
Issues: Evaluating Classification Methods

 Accuracy
 classifier accuracy: how well the model predicts the class label
 predictor accuracy: how well the model estimates the value of the predicted attribute
 Speed
 time to construct the model (training time)
 time to use the model (classification/prediction time)
 Robustness: handling noise and missing values
 Scalability: efficiency in disk-resident databases
 Interpretability
 understanding and insight provided by the model
 Other measures, e.g., goodness of rules, such as decision tree size or
compactness of classification rules
Methods Of Classification

 By Decision Tree Induction


 Bayesian Classification
 Rule Based Classification
Decision Tree Induction
 Decision tree induction is the learning of
decision trees from class-labeled training
tuples. A decision tree is a flowchart-like tree
structure, where each internal node (non-leaf
node) denotes a test on an attribute, each
branch represents an outcome of the test, and
each leaf node (or terminal node) holds a class
label. The topmost node in a tree is the root
node.
Decision Tree Induction: Training Dataset

This follows an example of Quinlan’s ID3 (Playing Tennis):

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no
Output: A Decision Tree for “buys_computer”

age?
├── <=30:   student?
│             ├── no  → no
│             └── yes → yes
├── 31..40: yes
└── >40:    credit_rating?
              ├── excellent → no
              └── fair      → yes
Algorithm for Decision Tree Induction

 Basic algorithm (a greedy algorithm)


 Tree is constructed in a top-down recursive divide-and-conquer manner
 At start, all the training examples are at the root
 Attributes are categorical (if continuous-valued, they are discretized in advance)
 Examples are partitioned recursively based on selected attributes
 Test attributes are selected on the basis of a heuristic or statistical measure (e.g.,
information gain; see the sketch after this list)
 Conditions for stopping partitioning
 All samples for a given node belong to the same class
 There are no remaining attributes for further partitioning – majority voting is
employed for classifying the leaf
 There are no samples left
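To illustrate the attribute-selection step, the sketch below computes information gain over the buys_computer training set shown earlier (a toy implementation, not the slides’ code); age has the highest gain, which is why it is the root test of the tree above:

```python
from collections import Counter
from math import log2

# buys_computer training set from the earlier slide:
# (age, income, student, credit_rating, buys_computer)
DATA = [
    ("<=30",   "high",   "no",  "fair",      "no"),
    ("<=30",   "high",   "no",  "excellent", "no"),
    ("31..40", "high",   "no",  "fair",      "yes"),
    (">40",    "medium", "no",  "fair",      "yes"),
    (">40",    "low",    "yes", "fair",      "yes"),
    (">40",    "low",    "yes", "excellent", "no"),
    ("31..40", "low",    "yes", "excellent", "yes"),
    ("<=30",   "medium", "no",  "fair",      "no"),
    ("<=30",   "low",    "yes", "fair",      "yes"),
    (">40",    "medium", "yes", "fair",      "yes"),
    ("<=30",   "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no",  "excellent", "yes"),
    ("31..40", "high",   "yes", "fair",      "yes"),
    (">40",    "medium", "no",  "excellent", "no"),
]
ATTRS = ["age", "income", "student", "credit_rating"]

def entropy(rows):
    """Expected information (in bits) needed to classify a tuple in rows."""
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())

def info_gain(rows, i):
    """Reduction in entropy obtained by partitioning rows on attribute i."""
    partitions = {}
    for r in rows:
        partitions.setdefault(r[i], []).append(r)
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy(rows) - remainder

gains = {a: info_gain(DATA, i) for i, a in enumerate(ATTRS)}
print(max(gains, key=gains.get), gains)   # 'age' wins (gain ~ 0.246 bits)
```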
Bayesian Classification: Why?

 A statistical classifier: performs probabilistic prediction, i.e.,
predicts class membership probabilities
 Foundation: Based on Bayes’ theorem.
 Performance: A simple Bayesian classifier, naïve Bayesian
classifier, has comparable performance with decision tree and
selected neural network classifiers
 Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is correct —
prior knowledge can be combined with observed data
 Standard: Even when Bayesian methods are computationally
intractable, they can provide a standard of optimal decision
making against which other methods can be measured
Bayes’ Theorem: Basics
 Let X be a data sample (“evidence”): class label is unknown
 Let H be a hypothesis that X belongs to class C
 Classification is to determine P(H|X), the probability that the hypothesis
holds given the observed data sample X
 P(H) (prior probability), the initial probability
 E.g., X will buy computer, regardless of age, income, …
 P(X): probability that sample data is observed
 P(X|H) (likelihood), the probability of observing the sample X,
given that the hypothesis holds
 E.g., Given that X will buy computer, the prob. that X is 31..40, medium
income
Bayes’ Theorem

 Given training data X, the posteriori probability of a hypothesis H, P(H|X),
follows Bayes’ theorem:

P(H|X) = P(X|H) P(H) / P(X)

 Informally, this can be written as
posterior = likelihood × prior / evidence
 Predicts that X belongs to Ci iff the probability P(Ci|X) is the highest among
all the P(Ck|X) for all the k classes
 Practical difficulty: requires initial knowledge of many probabilities, and
significant computational cost
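A naïve Bayesian classifier applies this rule with a class-conditional independence assumption, estimating P(X|Ci) as the product of per-attribute relative frequencies. Below is a minimal sketch (it reuses the DATA list from the decision-tree sketch and omits Laplace smoothing):

```python
from collections import Counter

def naive_bayes(data, x):
    """Return the class Ci maximizing P(Ci) * prod_k P(x_k | Ci)."""
    class_counts = Counter(r[-1] for r in data)
    scores = {}
    for c, n_c in class_counts.items():
        rows = [r for r in data if r[-1] == c]
        score = n_c / len(data)                      # prior P(Ci)
        for k, v in enumerate(x):                    # likelihood P(x_k | Ci)
            score *= sum(r[k] == v for r in rows) / n_c
        scores[c] = score
    return max(scores, key=scores.get), scores

x = ("<=30", "medium", "yes", "fair")
print(naive_bayes(DATA, x))   # 'yes': score ~0.028 vs ~0.007 for 'no'
```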
Using IF-THEN Rules for Classification

 Represent the knowledge in the form of IF-THEN rules
R: IF age = youth AND student = yes THEN buys_computer = yes
 Rule antecedent/precondition vs. rule consequent
 Assessment of a rule: coverage and accuracy (see the sketch after this slide)
 ncovers = # of tuples covered by R
 ncorrect = # of tuples correctly classified by R
coverage(R) = ncovers /|D| /* D: training data set */
accuracy(R) = ncorrect / ncovers
 If more than one rule is triggered, need conflict resolution
 Size ordering: assign the highest priority to the triggering rule that has the “toughest”
requirement (i.e., with the most attribute tests)
 Class-based ordering: decreasing order of prevalence or misclassification cost per class
 Rule-based ordering (decision list): rules are organized into one long priority list, according
to some measure of rule quality or by experts
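A sketch of the coverage/accuracy assessment, again over the buys_computer DATA list and assuming “youth” corresponds to the “<=30” age bracket:

```python
def rule_stats(data, antecedent, consequent):
    """coverage(R) = n_covers / |D|; accuracy(R) = n_correct / n_covers."""
    covered = [r for r in data if antecedent(r)]
    correct = [r for r in covered if r[-1] == consequent]
    return len(covered) / len(data), len(correct) / len(covered)

# R: IF age = youth AND student = yes THEN buys_computer = yes
cov, acc = rule_stats(DATA, lambda r: r[0] == "<=30" and r[2] == "yes", "yes")
print(f"coverage = {cov:.2f}, accuracy = {acc:.2f}")   # 2/14 covered, both correct
```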
Other Classification Methods

 Genetic Algorithms
 Rough Set Approach
 Fuzzy Set Approach
What Is Prediction?
 (Numerical) prediction is similar to classification
 construct a model
 use model to predict continuous or ordered value for a given input
 Prediction is different from classification
 Classification refers to predicting categorical class labels
 Prediction models continuous-valued functions
 Major method for prediction: regression
 model the relationship between one or more independent or predictor variables
and a dependent or response variable
 Regression analysis
 Linear and multiple regression
 Non-linear regression
 Other regression methods: generalized linear model, Poisson regression, log-linear
models, regression trees
Linear Regression
 Linear regression: involves a response variable y and a single predictor variable x
y = w0 + w1 x
where w0 (y-intercept) and w1 (slope) are regression coefficients
 Method of least squares: estimates the best-fitting straight line

w1 = Σ_{i=1}^{|D|} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{|D|} (x_i − x̄)²
w0 = ȳ − w1 x̄

where x̄ and ȳ are the sample means of x and y over the training set D
 Multiple linear regression: involves more than one predictor variable


 Training data is of the form (X1, y1), (X2, y2),…, (X|D|, y|D|)
 Ex. For 2-D data, we may have: y = w0 + w1 x1+ w2 x2
 Solvable by extension of the least squares method or using software such as SAS or S-Plus
 Many nonlinear functions can be transformed into the above
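A minimal sketch of the least-squares estimates above; the (years of experience, salary in $1000s) pairs are illustrative values, not data from the slides:

```python
def least_squares(xs, ys):
    """Fit y = w0 + w1*x by the method of least squares."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    w1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    w0 = y_bar - w1 * x_bar                 # intercept from slope and means
    return w0, w1

xs = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]        # hypothetical years of experience
ys = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]  # hypothetical salary ($1000s)
w0, w1 = least_squares(xs, ys)
print(f"y = {w0:.1f} + {w1:.1f} x")
```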
Nonlinear Regression

 Some nonlinear models can be modeled by a polynomial function
 A polynomial regression model can be transformed into a linear regression
model. For example,
y = w0 + w1 x + w2 x² + w3 x³
is convertible to a linear model with the new variables x2 = x², x3 = x³:
y = w0 + w1 x + w2 x2 + w3 x3
 Other functions, such as power function, can also be transformed to linear
model
 Some models are intractably nonlinear (e.g., sum of exponential terms)
 possible to obtain least-squares estimates through extensive calculation on
more complex formulae
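A sketch of that transformation: the cubic model is fit as a multiple linear regression on the derived predictors x, x², x³ (numpy and the toy coefficients are assumptions here, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = 1 + 2*x - 0.5*x**2 + 0.25*x**3 + rng.normal(0, 0.1, x.size)  # toy data

# New variables x2 = x^2, x3 = x^3 turn the cubic into a linear model.
X = np.column_stack([np.ones_like(x), x, x**2, x**3])
w, *_ = np.linalg.lstsq(X, y, rcond=None)    # ordinary least squares
print(w)                                     # close to [1, 2, -0.5, 0.25]
```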
Other Regression-Based Models

 Generalized linear model:


 Foundation on which linear regression can be applied to modeling
categorical response variables
 Variance of y is a function of the mean value of y, not a constant
 Logistic regression: models the prob. of some event occurring as a linear
function of a set of predictor variables (see the sketch after this slide)
 Poisson regression: models the data that exhibit a Poisson distribution
 Log-linear models: (for categorical data)
 Approximate discrete multidimensional prob. distributions
 Also useful for data compression and smoothing
 Regression trees and model trees
 Trees to predict continuous values rather than class labels
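As one concrete example, logistic regression can be fit by gradient ascent on the log-likelihood; the sketch below uses synthetic data with known coefficients (data, learning rate, and iteration count are all illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic data: P(event) = sigmoid(-1 + 2*x), i.e. log-odds linear in x.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=500)])   # intercept + predictor
y = (rng.random(500) < sigmoid(X @ np.array([-1.0, 2.0]))).astype(float)

w = np.zeros(2)
for _ in range(5000):
    w += 0.5 * X.T @ (y - sigmoid(X @ w)) / len(y)   # log-likelihood gradient
print(w)                                             # close to [-1, 2]
```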
Predictor Error Measures
 Measure predictor accuracy: measure how far off the predicted value is from the
actual known value
 Loss function: measures the error between y_i and the predicted value y_i'
 Absolute error: |y_i − y_i'|
 Squared error: (y_i − y_i')²
 Test error (generalization error): the average loss over the test set
 Mean absolute error: (1/d) Σ_{i=1}^{d} |y_i − y_i'|
 Mean squared error: (1/d) Σ_{i=1}^{d} (y_i − y_i')²
 Relative absolute error: Σ_{i=1}^{d} |y_i − y_i'| / Σ_{i=1}^{d} |y_i − ȳ|
 Relative squared error: Σ_{i=1}^{d} (y_i − y_i')² / Σ_{i=1}^{d} (y_i − ȳ)²
 The mean squared error exaggerates the presence of outliers
 Popularly used are the (square) root mean squared error and, similarly, the
root relative squared error
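These measures translate directly into code; a small sketch with made-up actual and predicted values:

```python
def error_measures(y, y_pred):
    """Compute the loss-based test-error measures listed above."""
    d = len(y)
    y_bar = sum(y) / d
    mae = sum(abs(a - p) for a, p in zip(y, y_pred)) / d
    mse = sum((a - p) ** 2 for a, p in zip(y, y_pred)) / d
    rae = (sum(abs(a - p) for a, p in zip(y, y_pred))
           / sum(abs(a - y_bar) for a in y))
    rse = (sum((a - p) ** 2 for a, p in zip(y, y_pred))
           / sum((a - y_bar) ** 2 for a in y))
    return {"MAE": mae, "MSE": mse, "RMSE": mse ** 0.5, "RAE": rae, "RSE": rse}

print(error_measures([3.0, 5.0, 2.0, 7.0], [2.5, 5.0, 3.0, 8.0]))  # toy values
```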
Conclusion
 Classification and prediction are two forms of data analysis that can be used to
extract models describing important data classes or to predict future data trends.
 Effective and scalable methods have been developed for decision tree
induction, naïve Bayesian classification, Bayesian belief networks, rule-based
classifiers, backpropagation, Support Vector Machines (SVM), associative
classification, nearest-neighbor classifiers, and case-based reasoning, as well
as other classification methods such as genetic algorithms, rough set and
fuzzy set approaches.
 Linear, nonlinear, and generalized linear models of regression can be used for
prediction. Many nonlinear problems can be converted to linear problems by
performing transformations on the predictor variables. Regression trees and
model trees are also used for prediction.
Conclusion (cont.)

 Stratified k-fold cross-validation is a recommended method for accuracy
estimation. Bagging and boosting can be used to increase overall accuracy
by learning and combining a series of individual models.
 There have been numerous comparisons of the different classification and
prediction methods, and the matter remains a research topic
 No single method has been found to be superior over all others for all data
sets
 Issues such as accuracy, training time, robustness, interpretability, and
scalability must be considered and can involve trade-offs, further
complicating the quest for an overall superior method
ANY QUERIES?
THANK YOU!
