DM UNIT-3
Pre pruning:
Halt tree construction early: do not split a node if doing so would cause the goodness measure
to fall below a threshold
Drawback: it is difficult to choose an appropriate threshold
Post pruning:
Remove branches from a “fully grown” tree to get a sequence of progressively pruned trees
Use a set of data different from the training data to decide which is the “best pruned tree”
1. Deciding not to divide a set of samples any further under some conditions. The stopping
criterion is usually based on some statistical tests, such as the χ² test: if there are no
significant differences in classification accuracy before and after division, then represent the
current node as a leaf. The decision is made in advance, before splitting, and therefore this
approach is called pre pruning.
2. Removing retrospectively some of the tree structure using selected accuracy criteria. The
decision in this process of post pruning is made after the tree has been built.
C4.5 follows the post pruning approach, but it uses a specific technique to estimate the predicted
error rate. This method is called pessimistic pruning. For every node in a tree, the estimation of
the upper confidence limit Ucf is computed using the statistical tables for the binomial distribution
(given in most textbooks on statistics). The parameter Ucf is a function of |Ti| (the number of
cases) and E (the number of errors) for a given node. C4.5 uses the default confidence level of
25%, and compares U25%(|Ti|, E) for a given node Ti with a weighted confidence of its leaves,
where the weights are the numbers of cases in the leaves. If the predicted error for the root node
of a subtree is less than the weighted sum of U25% for its leaves (the predicted error for the
subtree), then the subtree is replaced with its root node, which becomes a new leaf in the pruned
tree.
Let us illustrate this procedure with a simple example. A subtree of a decision tree is given in
the figure, where the root node is the test x1 on three possible values {1, 2, 3} of the attribute A.
The children of the root node are leaves denoted with their corresponding classes and (|Ti|, E)
parameters. The question is whether the subtree can be pruned and replaced with its root node as
a new, generalized leaf node.
To analyze the possibility of replacing the subtree with a leaf node, it is necessary to compute the
predicted error PE for the initial tree and for the replaced node. Using the default confidence of
25%, the upper confidence limits for all nodes are collected from statistical tables: U25%(6, 0) =
0.206, U25%(9, 0) = 0.143, U25%(1, 0) = 0.750, and U25%(16, 1) = 0.157. Using these values, the
predicted errors for the initial tree and the replaced node are
PE(subtree) = 6 × 0.206 + 9 × 0.143 + 1 × 0.750 = 3.273
PE(node) = 16 × 0.157 = 2.512
Since the existing subtree has a higher value of predicted error than the replaced node, it is
recommended that the decision tree be pruned and the subtree replaced with the new leaf node.
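As a rough sketch of this computation (assuming SciPy is available; the Clopper-Pearson upper
bound is one way to reproduce the tabulated U25% values, and the counts below are taken from
the example above):

from scipy.stats import beta

def ucf(n, e, cf=0.25):
    """Upper confidence limit U_cf(n, e): the largest binomial error
    rate still consistent with e errors in n cases at confidence cf
    (Clopper-Pearson upper bound)."""
    if e >= n:
        return 1.0
    return beta.ppf(1.0 - cf, e + 1, n - e)

# Leaves of the subtree as (|Ti|, E) pairs, and the candidate leaf
leaves = [(6, 0), (9, 0), (1, 0)]
root = (16, 1)

pe_subtree = sum(n * ucf(n, e) for n, e in leaves)  # ~ 3.273
pe_root = root[0] * ucf(*root)                      # ~ 2.512

# Replace the subtree with a single leaf if that lowers predicted error
print(f"PE(subtree)={pe_subtree:.3f}  PE(leaf)={pe_root:.3f}")
print("prune" if pe_root <= pe_subtree else "keep")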
BAYESIAN CLASSIFICATION
• Probabilistic learning: Calculate explicit probabilities for hypotheses; among the most
practical approaches to certain types of learning problems
• Incremental: Each training example can incrementally increase/decrease the probability that a
hypothesis is correct. Prior knowledge can be combined with observed data.
• Probabilistic prediction: Predict multiple hypotheses, weighted by their probabilities
• Standard: Even when Bayesian methods are computationally intractable, they can provide a
standard of optimal decision making against which other methods can be measured
BAYES THEOREM
• Given training data D, the posterior probability of a hypothesis h, P(h|D), follows from Bayes
theorem:
P(h|D) = P(D|h) P(h) / P(D)
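For instance, a minimal numeric sketch in Python (the prior and likelihood values below are
invented purely for illustration):

p_h = 0.3             # prior P(h): invented for illustration
p_d_given_h = 0.8     # likelihood P(D|h): invented
p_d_given_not_h = 0.1 # likelihood P(D|~h): invented

# Total probability: P(D) = P(D|h)P(h) + P(D|~h)P(~h)
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)

# Posterior via Bayes theorem: P(h|D) = P(D|h)P(h) / P(D)
p_h_given_d = p_d_given_h * p_h / p_d
print(f"P(h|D) = {p_h_given_d:.3f}")  # ~ 0.774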
Naive Bayesian Classifier
Naive assumption: the attributes are conditionally independent given the class:
P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)
This greatly reduces the computation cost: only the class distribution and per-attribute counts
within each class need to be estimated.
Given a training set, we can compute the probabilities
[Table: conditional probabilities P(value | class) for the attributes Outlook, Temperature,
Humidity, and Windy over the classes P and N]
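A small counting sketch of this in Python (the weather-style tuples below are invented toy data;
only two attributes are used to keep the sketch short):

from collections import Counter, defaultdict

# Tiny invented training set: (Outlook, Windy, Class), class P or N
data = [
    ("sunny", "false", "N"), ("sunny", "true", "N"),
    ("overcast", "false", "P"), ("rain", "false", "P"),
    ("rain", "true", "N"), ("overcast", "true", "P"),
]

# Class priors P(Ci): just the class distribution
class_counts = Counter(c for *_, c in data)
total = len(data)

# Per-class value counts for each attribute, for P(xk | Ci)
cond = defaultdict(Counter)
for *values, c in data:
    for i, v in enumerate(values):
        cond[(i, c)][v] += 1

def classify(x):
    # Pick the class maximizing P(Ci) * prod_k P(xk | Ci)
    best, best_p = None, -1.0
    for c, n in class_counts.items():
        p = n / total
        for i, v in enumerate(x):
            p *= cond[(i, c)][v] / n  # naive independence assumption
        if p > best_p:
            best, best_p = c, p
    return best

print(classify(("rain", "false")))  # -> "P" on this toy data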
SUPPORT VECTOR MACHINES (SVM)
Typical applications: object recognition, speaker identification
Any training tuples that fall on the hyperplanes H1 or H2 (i.e., the sides defining
the margin) are support vectors
Finding the maximum-margin hyperplane becomes a constrained (convex) quadratic
optimization problem: a quadratic objective function with linear constraints, solved by
quadratic programming (QP) with Lagrangian multipliers
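A brief scikit-learn sketch (assuming sklearn is installed; the toy points are invented) showing
that only the tuples on the margin come back as support vectors:

import numpy as np
from sklearn.svm import SVC

# Toy linearly separable points (invented)
X = np.array([[1, 1], [2, 2], [2, 0], [4, 4], [5, 5], [5, 3]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # very large C approximates a hard margin
clf.fit(X, y)

print(clf.support_vectors_)        # the tuples lying on H1/H2
print(clf.coef_, clf.intercept_)   # w and b of the separating hyperplane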
PREDICTION
(Numerical) prediction is similar to classification
construct a model
use model to predict continuous or ordered value for a given input
Prediction is different from classification
Classification predicts categorical class labels
Prediction models continuous-valued functions
Major method for prediction: regression
model the relationship between one or more independent or predictor variables
and a dependent or response variable
Regression analysis
Linear and multiple regression
Non-linear regression
Other regression methods: generalized linear model, Poisson
regression, log-linear models, regression trees
LINEAR REGRESSION
Linear regression: involves a response variable y and a single predictor variable x
y = w0 + w1 x
Where w0 (y-intercept) and w1 (slope) are regression coefficients
Method of least squares: estimates the best-fitting straight line; the coefficients are
w1 = Σi (xi - x̄)(yi - ȳ) / Σi (xi - x̄)² and w0 = ȳ - w1 x̄
Multiple linear regression: involves more than one predictor variable
Training data is of the form (X1, y1), (X2, y2),…, (X|D|, y|D|)
Ex. For 2-D data, we may have: y = w0 + w1 x1+ w2 x2
Solvable by an extension of the least squares method, or using software such as SAS or S-Plus
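A minimal NumPy least squares sketch for the 2-D case (the training values below are invented):

import numpy as np

# Invented 2-D training data: columns of X are x1 and x2
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([5.1, 4.9, 9.2, 9.0, 12.8])

# Prepend a column of ones so the intercept w0 is estimated too
A = np.column_stack([np.ones(len(X)), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

w0, w1, w2 = w
print(f"y = {w0:.2f} + {w1:.2f} x1 + {w2:.2f} x2")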
Many nonlinear functions can be transformed into the above
Nonlinear Regression
Some nonlinear models can be modeled by a polynomial function
A polynomial regression model can be transformed into a linear regression model. For
example,
y = w0 + w1 x + w2 x² + w3 x³
is convertible to linear form with the new variables x2 = x², x3 = x³:
y = w0 + w1 x + w2 x2 + w3 x3
Other functions, such as the power function, can also be transformed to a linear model
Some models are inherently nonlinear (e.g., a sum of exponential terms)
It is still possible to obtain least squares estimates through extensive calculation on more
complex formulae
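A short NumPy sketch of the polynomial-to-linear transform described above (the sample points
are invented):

import numpy as np

# Invented sample points roughly following a cubic trend
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 7.8, 26.5, 63.1, 124.0])

# Design matrix [1, x, x^2, x^3]: the cubic model is linear in w
A = np.column_stack([np.ones_like(x), x, x**2, x**3])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

print("w0..w3 =", np.round(w, 2))
print("fitted values:", np.round(A @ w, 1))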
PART-A
Q. No   Question                                                    Competence   BT Level
9.      What inference can you formulate with Bayes theorem?       Create       BTL-6
10.     Define lazy learners and eager learners with an example.   Remember     BTL-1
PART-B