DM UNIT-3
Pre pruning:
Halt tree construction early: do not split a node if doing so would cause the goodness measure
to fall below a threshold
Drawback: it is difficult to choose an appropriate threshold
Post pruning:
Remove branches from a “fully grown” tree to get a sequence of progressively pruned trees
Use a set of data different from the training data to decide which is the “best pruned tree”
1. Deciding not to divide a set of samples any further under some conditions. The stopping
criterion is usually based on some statistical tests, such as the χ² test: if there are no
significant differences in classification accuracy before and after division, then represent the
current node as a leaf. The decision is made in advance, before splitting, and therefore this
approach is called pre pruning.
2. Removing retrospectively some of the tree structure using selected accuracy criteria. The
decision in this process of post pruning is made after the tree has been built.
C4.5 follows the post pruning approach, but it uses a specific technique to estimate the predicted
error rate. This method is called pessimistic pruning. For every node in a tree, the estimation of
the upper confidence limit Ucf is computed using the statistical tables for the binomial distribution
(given in most textbooks on statistics). The parameter Ucf is a function of |Ti| (the number of
cases) and E (the number of errors) for a given node. C4.5 uses the default confidence level of
25%, and compares U25%(|Ti|, E) for a given node Ti with a weighted confidence of its leaves,
where the weights are the numbers of cases in the leaves. If the predicted error for the root node
of a subtree is less than the weighted sum of U25% for its leaves (the predicted error for the
subtree), then the subtree is replaced with its root node, which becomes a new leaf in the pruned
tree.
Let us illustrate this procedure with a simple example. A subtree of a decision tree is given in
the figure, where the root node is the test x1 on three possible values {1, 2, 3} of the attribute A.
The children of the root node are leaves denoted with their corresponding classes and (|Ti|, E)
parameters. The question is whether the subtree can be pruned and replaced with its root node as
a new, generalized leaf node.
To analyze the possibility of replacing the subtree with a leaf node, it is necessary to compute the
predicted error PE for the initial tree and for the replaced node. Using the default confidence of
25%, the upper confidence limits for all nodes are collected from statistical tables: U25%(6, 0) =
0.206, U25%(9, 0) = 0.143, U25%(1, 0) = 0.750, and U25%(16, 1) = 0.157. Using these values, the
predicted errors for the initial tree and the replaced node are
PE(subtree) = 6 × 0.206 + 9 × 0.143 + 1 × 0.750 = 3.273
PE(node) = 16 × 0.157 = 2.512
Since the existing subtree has a higher value of predicted error than the replaced node, it is
recommended that the decision tree be pruned and the subtree replaced with the new leaf node.
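As a rough sketch of this computation (assuming SciPy is available; the Clopper-Pearson upper
bound is one way to reproduce the tabulated U25% values, and the counts below are taken from
the example above):

from scipy.stats import beta

def ucf(n, e, cf=0.25):
    """Upper confidence limit U_cf(n, e): the largest binomial error
    rate still consistent with e errors in n cases at confidence cf
    (Clopper-Pearson upper bound)."""
    if e >= n:
        return 1.0
    return beta.ppf(1.0 - cf, e + 1, n - e)

# Leaves of the subtree as (|Ti|, E) pairs, and the candidate leaf
leaves = [(6, 0), (9, 0), (1, 0)]
root = (16, 1)

pe_subtree = sum(n * ucf(n, e) for n, e in leaves)  # ~ 3.273
pe_root = root[0] * ucf(*root)                      # ~ 2.512

# Replace the subtree with a single leaf if that lowers predicted error
print(f"PE(subtree)={pe_subtree:.3f}  PE(leaf)={pe_root:.3f}")
print("prune" if pe_root <= pe_subtree else "keep")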
BAYESIAN CLASSIFICATION
• Probabilistic learning: Calculate explicit probabilities for hypotheses; among the most
practical approaches to certain types of learning problems
• Incremental: Each training example can incrementally increase/decrease the probability that a
hypothesis is correct. Prior knowledge can be combined with observed data.
• Probabilistic prediction: Predict multiple hypotheses, weighted by their probabilities
• Standard: Even when Bayesian methods are computationally intractable, they can provide a
standard of optimal decision making against which other methods can be measured
BAYES THEOREM
• Given training data D, the posterior probability of a hypothesis h, P(h|D), follows from Bayes
theorem:
P(h|D) = P(D|h) P(h) / P(D)
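For instance, a minimal numeric sketch in Python (the prior and likelihood values below are
invented purely for illustration):

p_h = 0.3             # prior P(h): invented for illustration
p_d_given_h = 0.8     # likelihood P(D|h): invented
p_d_given_not_h = 0.1 # likelihood P(D|~h): invented

# Total probability: P(D) = P(D|h)P(h) + P(D|~h)P(~h)
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)

# Posterior via Bayes theorem: P(h|D) = P(D|h)P(h) / P(D)
p_h_given_d = p_d_given_h * p_h / p_d
print(f"P(h|D) = {p_h_given_d:.3f}")  # ~ 0.774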
Naive Bayesian Classifier
Naive assumption: the attributes are conditionally independent given the class:
P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)
This greatly reduces the computation cost: only the class distribution and per-attribute counts
within each class need to be estimated.
Given a training set, we can compute the probabilities
[Table: conditional probabilities P(value | class) for the attributes Outlook, Temperature,
Humidity, and Windy over the classes P and N]
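A small counting sketch of this in Python (the weather-style tuples below are invented toy data;
only two attributes are used to keep the sketch short):

from collections import Counter, defaultdict

# Tiny invented training set: (Outlook, Windy, Class), class P or N
data = [
    ("sunny", "false", "N"), ("sunny", "true", "N"),
    ("overcast", "false", "P"), ("rain", "false", "P"),
    ("rain", "true", "N"), ("overcast", "true", "P"),
]

# Class priors P(Ci): just the class distribution
class_counts = Counter(c for *_, c in data)
total = len(data)

# Per-class value counts for each attribute, for P(xk | Ci)
cond = defaultdict(Counter)
for *values, c in data:
    for i, v in enumerate(values):
        cond[(i, c)][v] += 1

def classify(x):
    # Pick the class maximizing P(Ci) * prod_k P(xk | Ci)
    best, best_p = None, -1.0
    for c, n in class_counts.items():
        p = n / total
        for i, v in enumerate(x):
            p *= cond[(i, c)][v] / n  # naive independence assumption
        if p > best_p:
            best, best_p = c, p
    return best

print(classify(("rain", "false")))  # -> "P" on this toy data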
SUPPORT VECTOR MACHINES (SVM)
Typical applications: object recognition, speaker identification
Any training tuples that fall on the hyperplanes H1 or H2 (i.e., the sides defining
the margin) are support vectors
Finding the maximum-margin hyperplane becomes a constrained (convex) quadratic
optimization problem: a quadratic objective function with linear constraints, solved by
quadratic programming (QP) with Lagrangian multipliers
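A brief scikit-learn sketch (assuming sklearn is installed; the toy points are invented) showing
that only the tuples on the margin come back as support vectors:

import numpy as np
from sklearn.svm import SVC

# Toy linearly separable points (invented)
X = np.array([[1, 1], [2, 2], [2, 0], [4, 4], [5, 5], [5, 3]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # very large C approximates a hard margin
clf.fit(X, y)

print(clf.support_vectors_)        # the tuples lying on H1/H2
print(clf.coef_, clf.intercept_)   # w and b of the separating hyperplane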
PREDICTION
(Numerical) prediction is similar to classification
construct a model
use model to predict continuous or ordered value for a given input
Prediction is different from classification
Classification predicts categorical class labels
Prediction models continuous-valued functions
Major method for prediction: regression
model the relationship between one or more independent or predictor variables
and a dependent or response variable
Regression analysis
Linear and multiple regression
Non-linear regression
Other regression methods: generalized linear model, Poisson
regression, log-linear models, regression trees
LINEAR REGRESSION
Linear regression: involves a response variable y and a single predictor variable x
y = w0 + w1 x
Where w0 (y-intercept) and w1 (slope) are regression coefficients
Method of least squares: estimates the best-fitting straight line; the coefficients are
w1 = Σi (xi - x̄)(yi - ȳ) / Σi (xi - x̄)² and w0 = ȳ - w1 x̄
Multiple linear regression: involves more than one predictor variable
Training data is of the form (X1, y1), (X2, y2),…, (X|D|, y|D|)
Ex. For 2-D data, we may have: y = w0 + w1 x1+ w2 x2
Solvable by an extension of the least squares method, or using software such as SAS or S-Plus
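A minimal NumPy least squares sketch for the 2-D case (the training values below are invented):

import numpy as np

# Invented 2-D training data: columns of X are x1 and x2
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([5.1, 4.9, 9.2, 9.0, 12.8])

# Prepend a column of ones so the intercept w0 is estimated too
A = np.column_stack([np.ones(len(X)), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

w0, w1, w2 = w
print(f"y = {w0:.2f} + {w1:.2f} x1 + {w2:.2f} x2")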
Many nonlinear functions can be transformed into the above
Nonlinear Regression
Some nonlinear models can be modeled by a polynomial function
A polynomial regression model can be transformed into a linear regression model. For
example,
y = w0 + w1 x + w2 x² + w3 x³
is convertible to linear form with the new variables x2 = x², x3 = x³:
y = w0 + w1 x + w2 x2 + w3 x3
Other functions, such as the power function, can also be transformed to a linear model
Some models are inherently nonlinear (e.g., a sum of exponential terms)
It is still possible to obtain least squares estimates through extensive calculation on more
complex formulae
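A short NumPy sketch of the polynomial-to-linear transform described above (the sample points
are invented):

import numpy as np

# Invented sample points roughly following a cubic trend
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 7.8, 26.5, 63.1, 124.0])

# Design matrix [1, x, x^2, x^3]: the cubic model is linear in w
A = np.column_stack([np.ones_like(x), x, x**2, x**3])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

print("w0..w3 =", np.round(w, 2))
print("fitted values:", np.round(A @ w, 1))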
PART-A
Q. No   Question                                                    Competence   BT Level
9.      What inference can you formulate with Bayes theorem?       Create       BTL-6
10.     Define lazy learners and eager learners with an example.   Remember     BTL-1
PART-B