dwdm
dwdm
dwdm
DATA MINING
1 List all the categorical (or nominal) attributes and the real-valued attributes 1
separately.
2 What attributes do you think might be crucial in making the credit assessment? 2
Come up with some simple rules in plain English using your selected attributes.
3 *What attributes do you think might be crucial in making the bank assessment? 5
4 One type of model that you can create is a Decision Tree -train a Decision Tree 6
using the complete dataset as the training data. Report the model obtained after
training.
5 Suppose you use your above model trained on the complete dataset, and classify 9
credit good/bad for each of the examples in the dataset. What % of examples can
you classify correctly? (This is also called testing on the training set) Why do you
think you cannot get 100 % training accuracy?
6 *Find out the correctly classified instances, root mean squared error, kappa 13
statistics, and mean absolute error for weather data set?
7 Is testing on the training set as you did above a good idea? Why or Why not? 17
8 One approach for solving the problem encountered in the previous question is 18
using cross-validation? Describe what is cross -validation briefly. Train a Decision
Tree again using cross - validation and report your results. Does your accuracy
increase/decrease? Why?
9 Check to see if the data shows a bias against "foreign workers" (attribute 20), or 21
"personal
-status" (attribute 9). One way to do this (perhaps rather simple minded) is to
remove these attributes from the dataset and see if the decision tree created in those
cases is significantly different from the full dataset case which you have already
done. To remove an attribute you can use the preprocess tab in Weka's GUI
Explorer. Did removing these attributes have any significant effect? Discuss.
10 *Load the „weather.arff‟ dataset in Weka and run the ID3 classification algorithm. 23
What problem do you have and what is the solution?
11 Another question might be, do you really need to input so many attributes to get 24
good results? Maybe only a few would do. For example, you could try just having
attributes 2, 3, 5, 7, 10, 17 (and 21, the class attribute (naturally)). Try out some
combinations. (You had removed two attributes in problem 7. Remember to reload
the arff data file to get all the attributes initially before you start selecting the ones
you want.)
12 Sometimes, the cost of rejecting an applicant who actually has a good credit (case 26
1) might be higher than accepting an applicant who has bad credit (case 2). Instead
of counting the misclassifications equally in both cases, give a higher cost to the
first case (say cost 5) and lower cost to the second case. You can do this by using a
cost matrix in Weka. Train your Decision Tree again and report the Decision Tree
and cross -validation results. Are they significantly different from results obtained
in problem 6 (using equal cost)?
13 Do you think it is a good idea to prefer simple decision trees instead of having long 28
complex decision trees? How does the complexity of a Decision Tree relate to the
bias of the model?
14 *Run the J48 and 1Bk classifiers using-the cross-validation strategy with various 29
fold levels. Compare the accuracy results.
.
15 You can make your Decision Trees simpler by pruning the nodes. one approach is 31
to use Reduced Error Pruning -Explain this idea briefly. Try reduced error pruning
for training your Decision Trees using cross -validation (you can do this in Weka)
and report the Decision Tree you obtain? Also, report your accuracy using the
pruned model. Does your accuracy increase?
16 (Extra Credit): How can you convert a Decision Trees into "if –then -else rules". 33
Make up your own small Decision Tree consisting of 2 - 3 levels and convert it into
a set of rules. There also exist different classifiers that output the model in the form
of rules -one such classifier in Weka is rules. PART, train this model and report the
set of rules obtained. Sometimes just one attribute can be good enough in making
the decision, yes, just one! Can you predict what attribute that might be in this
dataset? OneR classifier uses a single attribute to make decisions (it chooses the
attribute based on minimum error). Report the rule obtained by training a one R
classifier. Rank the performance of j48, PART and oneR.
17 *Run J48 and Naïve Bayes classifiers on the following datasets and determine the 37
accuracy:
1.vehicle.arff
2.kr-vs-kp.arff
3.glass.arff
4.wave-form-5000.arff
On which datasets does the Naïve Bayes perform better?
OBJECTIVE:
Data mining tools allow predicting future trends and behaviors, allowing businesses to make
proactive, knowledge-driven decisions. The data mining laboratory course is designed to exercise
the data mining techniques such as classification, clustering, pattern mining etc with varied datasets
and dynamic parameters. Weka data mining tool is used for the purpose of acquainting the students
with the basic environment of the data mining tools.
OUTCOMES:
Upon the completion of Data Mining practical course, the student will be able to:
1. Learn to build a data warehouse and query it using open source tools.
2. Learn to execute data mining tasks using a data mining toolkit (such as WEKA) and
visualize the results.
3. Demonstrate the working of algorithms for data mining tasks such association rule
mining, classification, clustering and regression.
DATA WAREHOUSING
AND
DATA MINING LAB
Credit Risk Assessment
Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of
crucial importance. You have to develop a system to help a loan officer decide whether the credit of a
customer is good, or bad. A bank‟s business rules regarding loans must consider two opposing factors.
On the one hand, a bank wants to make as many loans as possible. Interest on these loans is the ban‟s
profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans
could lead to the collapse of the bank. The bank‟s loan policy must involve a compromise not too strict,
and not too lenient.
To do the assignment, you first and foremost need some knowledge about the world of credit . You can
acquire such knowledge in a number of ways.
1. Knowledge Engineering. Find a loan officer who is willing to talk. Interview her and try
to represent her knowledge in the form of production rules.
2. Books. Find some training manuals for loan officers or perhaps a suitable textbook on finance.
Translate this knowledge from text form to production rule form.
3. Common sense. Imagine yourself as a loan officer and make up reasonable rules which can be
used to judge the credit worthiness of a loan applicant.
4. Case histories. Find records of actual cases where competent loan officers correctly judged
when not to, approve a loan application.
The German Credit Data :
Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one
such dataset (original) Excel spreadsheet version of the German credit data (download from web).
In spite of the fact that the data is German, you should probably make use of it for this assignment,
(Unless you really can consult a real loan officer !)
A few notes on the German dataset :
• DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but
looks and acts like a quarter).
• Owns_telephone. German phone rates are much higher than in Canada so fewer people
own telephones.
• Foreign_worker. There are millions of these in Germany (many from Turkey). It is very hard
to get German citizenship if you were not born of German parents.
• There are 20 attributes used in judging a loan applicant. The goal is the classify the applicant
into one of two categories, good or bad.
Subtasks : (Turn in your answers to the following tasks)
EXPERIMENT-1
1.1 OBJECTIVE:
List all the categorical (or nominal) attributes and the real-valued attributes separately.
1.2 RESOURCES:
1.3 PROCEDURE:
1.4 OUTPUT:
1
EXPERIMENT-2
2.1 OBJECTIVE:
Which attributes do you think might be crucial in making the credit assessment? Come up with some
simple rules in plain English using your selected attributes.
2.2 RESOURCES:
2.3 THEORY:
Association rule mining is defined as: Let be a set of n binary attributes called items. Let be a set of
transactions called the database. Each transaction in D has a unique transaction ID and contains a subset
of the items in I. A rule is defined as an implication of the form X=>Y where X,Y C I and X Π Y=Φ . The
sets of items (for short itemsets) X and Y are called antecedent (left hand side or LHS) and consequen t
(right hand side or RHS) of the rule respectively.
To illustrate the concepts, we use a small example from the supermarket domain.
The set of items is I = {milk,bread,butter,beer} and a small database containing the items (1 codes
presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for
the supermarket could be meaning that if milk and bread is bought, customers also buy butter.
Note: this example is extremely small. In practical applications, a rule needs a support of several hundred
transactions before it can be considered statistically significant, and datasets often contain thousands or
millions of transactions.
To select interesting rules from the set of all possible rules, constraints on various measures of
significance and interest can be used. The best known constraints are minimum thresholds on support and
confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set
which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2 / 5 = 0.4
since it occurs in 40% of all transactions (2 out of 5 transactions).
The confidence of a rule is defined. For example, the rule has a confidence of 0.2 / 0.4 = 0.5 in the
database, which means that for 50% of the transactions containing milk and bread the rule is correct.
Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the
RHS of the rule in transactions under the condition that these transactions also contain the LHS .
ALGORITHM:
Association rule mining is to find out association rules that satisfy the predefined minimum support and
confidence from a given database. The problem is usually decomposed into two sub problems. One is to
find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are
called frequent or large itemsets. The second problem is to generate association rules from those large
itemsets with the constraints of minimal confidence.
Suppose one of the large itemsets is Lk, Lk = {I1, I2, … , Ik}, association rules with this itemsets are
generated in the following way: the first rule is {I1, I2, … , Ik1} and {Ik}, by checking the confidence
this rule can be determined as interesting or not. Then other rule are generated by deleting the last items in
the antecedent and inserting it to the consequent, further the confidences of the new rules are checked to
determine the interestingness of them. Those processes iterated until the antecedent becomes empty.
2
Since the second subproblem is quite straight forward, most of the researches focus on the first
subproblem. The Apriori algorithm finds the frequent sets L In Database D.
· Join Step.
· Prune Step.
k) Apriori Pseudocode
Apriori (T,£)
K<2
while L(k1)≠ Φ
C(k)<Generate( Lk - 1)
for transactions t € T
C(t)Subset(Ck,t)
count[c]<count[ c]+1
K<K+ 1
return Ụ L(k) k
2.4 PROCEDURE:
3
1
4
7) Select Start button
8) Now we can see the sample rules.
2.5 OUTPUT:
5
EXPERIMENT-3
3.1 OBJECTIVE:
*What attributes do you think might be crucial in making the bank assessment?
3.2 RESOURCES:
3.3 PROCEDURE:
3.4 OUTPUT:
6
EXPERIMENT-4
4.1 OBJECTIVE:
One type of model that you can create is a decision tree. Train a decision tree using the complete dataset
as the training data. Report the model obtained after training.
4.2 RESOURCES:
4.3 THEORY:
Classification is a data mining function that assigns items in a collection to target categories or classes.
The goal of classification is to accurately predict the target class for each case in the data. For example, a
classification model could be used to identify loan applicants as low, medium, or high credit risks. A
classification task begins with a data set in which the class assignments are known. For example, a
classification model that predicts credit risk could be developed based on observed data for many loan
applicants over a period of time.
In addition to the historical credit rating, the data might track employment history, home ownership or
rental, years of residence, number and type of investments, and so on. Credit rating would be the target,
the other attributes would be the predictors, and the data for each customer would constitute a case.
Classifications are discrete and do not imply order. Continuous, floating point values would indicate a
numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression
algorithm, not a classification algorithm. The simplest type of classification problem is binary
classification. In binary classification, the target attribute has only two possible values: for example, high
credit rating or low credit rating. Multiclass targets have more than two values: for example, low,
medium, high, or unknown credit rating. In the model build (training) process, a classification algorithm
finds relationships between the values of the predictors and the values of the target. Different
classification algorithms use different techniques for finding relationships. These relationships are
summarized in a model, which can then be applied to a different data set in which the class assignments
are unknown.
Classification models are tested by comparing the predicted values to known target values in a set of test
data. The historical data for a classification project is typically divided into two data sets: one for building
the model; the other for testing the model. Scoring a classification model results in class assignments and
probabilities for each case. For example, a model that classifies customers as low, medium, or high value
would also predict the probability of each classification for each customer. Classification has many
applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and
drug response modeling.
Different Classification Algorithms: Oracle Data Mining provides the following algorithms for
classification:
Decision Tree - Decision trees automatically generate rules, which are conditional statements that reveal the
logic used to build the tree.
Naive Bayes - Naive Bayes uses Bayes' Theorem, a formula that calculates a probability by counting the
frequency of values and combinations of values in the historical data.
7
4.4 PROCEDURE:
4.5 OUTPUT:
8
The decision tree constructed by using the implemented C4.5 algorithm
8
EXPERIMENT-5
5.1 OBJECTIVE:
Suppose you use your above model trained on the complete dataset, and classify credit good/bad for each
of the examples in the dataset. What % of examples can you classify correctly? (This is also called testing
on the training set) Why do you think you cannot get 100 % training accuracy?
5.2 RESOURCES:
5.3 THEORY:
Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated
to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if
it is red, round, and about 4" in diameter. Even though these features depend on the existence of the other
features, a naive Bayes classifier considers all of these properties to independently contribute to the
probability that this fruit is an apple.
An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate
the parameters (means and variances of the variables) necessary for classification. Because independent
variables are assumed, only the variances of the variables for each class need to be determined and not the
entire covariance matrix The naive Bayes probabilistic model :
P(C|F1..................Fn) over a dependent class variable C with a small number of outcomes or classes,
conditional on several feature variables F1 through Fn. The problem is that if the number of features n is
large or when a feature can take on a large number of values, then basing such a model on probability
tables is infeasible. We therefore reformulate the model to make it more tractable.
F1...............Fn)=[{p(C)p(F1..................Fn|C)}/p(F1,...............Fn)]
In practice we are only interested in the numerator of that fraction, since the denominator does not depend
on C and the values of the features Fi are given, so that the denominator is effectively constant. The
numerator is equivalent to the joint probability model p(C,F1.......Fn) which can be rewritten as follows,
using repeated applications of the definition of conditional probability:
Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is
conditionally independent of every other feature Fj for j≠i .
9
This means that p(Fi|C,Fj)=p(Fi|C) and so the joint model can be expressed
as p(C,F1,.......Fn)=p(C)p(F1|C)p(F2|C)..........= p(C)π p(Fi|C)
This means that under the above independence assumptions, the conditional distribution over the
class variable C can be expressed like this:
where Z is a scaling factor dependent only on F1........Fn, i.e., a constant if the values of the feature
variables are known.
Models of this form are much more manageable, since they factor into a so called class prior p(C) and
independent probability distributions p(Fi|C). If there are k classes and if a model for each p(Fi|C=c) can
be expressed in terms of r parameters, then the corresponding naive Bayes model has (k - 1) + n r k
parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are
common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the
number of binary features used for prediction
• D : Set of tuples
– X : (x1,x2,x3,…. xn)
10
k=1
5.4 PROCEDURE:
5.5 OUTPUT:
=== Evaluation on training set ===
=== Summary ===
Correctly Classified Instances 554 92.3333 %
Weighted Avg.
11
=== Confusion Matrix ===
A b
24529
17309
<-- classified as
a = YES , b = NO
12
EXPERIMENT-6
6.1 OBJECTIVE:
*Find out the correctly classified instances, root mean squared error, kappa statistics, and mean absolute
error for weather data set?
6.2 RESOURCES:
6.3 THEORY:
Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated
to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if
it is red, round, and about 4" in diameter. Even though these features depend on the existence of the other
features, a naive Bayes classifier considers all of these properties to independently contribute to the
probability that this fruit is an apple.
An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate
the parameters (means and variances of the variables) necessary for classification. Because independent
variables are assumed, only the variances of the variables for each class need to be determined and not the
entire covariance matrix The naive Bayes probabilistic model :
P(C|F1..................Fn) over a dependent class variable C with a small number of outcomes or classes,
conditional on several feature variables F1 through Fn. The problem is that if the number of features n is
large or when a feature can take on a large number of values, then basing such a model on probability
tables is infeasible. We therefore reformulate the model to make it more tractable.
F1...............Fn)=[{p(C)p(F1..................Fn|C)}/p(F1,...............Fn)]
In practice we are only interested in the numerator of that fraction, since the denominator does not depend
on C and the values of the features Fi are given, so that the denominator is effectively constant. The
numerator is equivalent to the joint probability model p(C,F1.......Fn) which can be rewritten as follows,
using repeated applications of the definition of conditional probability:
Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is
conditionally independent of every other feature Fj for j≠i .
13
This means that p(Fi|C,Fj)=p(Fi|C) and so the joint model can be expressed
as p(C,F1,.......Fn)=p(C)p(F1|C)p(F2|C)..........= p(C)π p(Fi|C)
This means that under the above independence assumptions, the conditional distribution over the
class variable C can be expressed like this:
where Z is a scaling factor dependent only on F1........Fn, i.e., a constant if the values of the feature
variables are known.
Models of this form are much more manageable, since they factor into a so called class prior p(C) and
independent probability distributions p(Fi|C). If there are k classes and if a model for each p(Fi|C=c) can
be expressed in terms of r parameters, then the corresponding naive Bayes model has (k - 1) + n r k
parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are
common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the
number of binary features used for prediction
• D : Set of tuples
– X : (x1,x2,x3,…. xn)
14
k=1
6.4 PROCEDURE:
6.5 OUTPUT:
Weighted Avg.
15
=== Confusion Matrix ===
A b
24529
17309
<-- classified as
a = YES , b = NO
16
EXPERIMENT-7
7.1 OBJECTIVE:
Is testing on the training set as you did above a good idea? Why or Why not?
7.2 RESOURCES:
7.3 PROCEDURE:
1) In Test options, select the Supplied test set radio button
2) Click Set
3) Choose the file which contains records that were not in the training set we used to create the model.
4) Click Start(WEKA will run this test data set through the model we already created. )
5) Compare the output results with that of the 4th experiment
7.4 OUTPUT:
This can be experienced by the different problem solutions while doing practice.
The important numbers to focus on here are the numbers next to the "Correctly Classified Instances"
(92.3 percent) and the "Incorrectly Classified Instances" (7.6 percent). Other important numbers are in the
"ROC Area" column, in the first row (the 0.936); Finally, in the "Confusion Matrix," it shows the number
of false positives and false negatives. The false positives are 29, and the false negatives are 17 in this
matrix.
Based on our accuracy rate of 92.3 percent, we say that upon initial analysis, this is a good model.
One final step to validating our classification tree, which is to run our test set through the model and
ensure that accuracy of the model
Comparing the "Correctly Classified Instances" from this test set with the "Correctly Classified Instances"
from the training set, we see the accuracy of the model, which indicates that the model will not break
down with unknown data, or when future data is applied to it.
17
EXPERIMENT-8
8.1 OBJECTIVE:
One approach for solving the problem encountered in the previous question is using cross-validation?
Describe what is cross -validation briefly. Train a Decision Tree again using cross - validation and report
your results. Does your accuracy increase/decrease? Why?
8.2 RESOURCES:
8.3 THEORY:
Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive
model which maps observations about an item to conclusions about the item's target value In these tree
structures, leaves represent classifications and branches represent conjunctions of features that lead to
those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent
decisions and decision making. In data mining, a decision tree describes data but not decisions; rather the
resulting classification tree can be an input for decision making. This page deals with decision trees in
data mining.
Decision tree learning is a common method used in data mining. The goal is to create a model that
predicts the value of a target variable based on several input variables. Each interior node corresponds to
one of the input variables; there are edges to children for each of the possible values of that input variable.
Each leaf represents a value of the target variable given the values of the input variables represented by
the path from the root to the leaf.
A tree can be "learned" by splitting the source set into subsets based on an attribute value test. This
process is repeated on each derived subset in a recursive manner called recursive partitioning. The
recursion is completed when the subset at a node all has the same value of the target variable, or when
splitting no longer adds value to the predictions. In data mining, trees can be described also as the
combination of mathematical and computational techniques to aid the description, categorization and
generalization of a given set of data.
The dependent variable, Y, is the target variable that we are trying to understand, classify or generalize.
The vector x is comprised of the input variables, x1, x2, x3 etc., that are used for that task.
8.4 PROCEDURE:
18
6) Go to Classify tab.
7) Choose Classifier “Tree”
8) Select J48
9) Select Test options “Cross-validation”.
10) Set “Folds” Ex:10
11) if need select attribute.
12) now Start weka.
13) now we can see the output details in the Classifier output.
th
14) Compare the output results with that of the 4 experiment
15) check whether the accuracy increased or decreased?
8.5 OUTPUT:
19
Relative absolute error 33.6511 %
Weighted Avg.
a b <-- classified as
236 38 | a = YES 23
303 | b = NO
20
EXPERIMENT-9
9.1 OBJECTIVE
Check to see if the data shows a bias against "foreign workers" (attribute 20), or "personal -status"
(attribute 9). One way to do this (perhaps rather simple minded) is to remove these attributes from the
dataset and see if the decision tree created in those cases is significantly different from the full dataset
case which you have already done. To remove an attribute you can use the preprocess tab in Weka's GUI
Explorer. Did removing these attributes have any significant effect? Discuss.
9.2 RESOURCES:
9.3 PROCEDURE:
21
9.4 OUTPUT:
22
EXPERIMENT-10
10.1 OBJECTIVE:
*Load the „weather.arff‟ dataset in Weka and run the ID3 classification algorithm. What problem do
you have and what is the solution?
10.2 RESOURCES:
10.3 PROCEDURE:
10.4 OUTPUT:
This can be experienced by the different problem solutions while doing practice.
The important numbers to focus on here are the numbers next to the "Correctly Classified Instances"
(92.3 percent) and the "Incorrectly Classified Instances" (7.6 percent). Other important numbers are in the
"ROC Area" column, in the first row (the 0.936); finally, in the "Confusion Matrix," it shows the number
of false positives and false negatives. The false positives are 29, and the false negatives are 17 in this
matrix.
Based on our accuracy rate of 92.3 percent, we say that upon initial analysis, this is a good model.
One final step to validating our classification tree, which is to run our test set through the model and
ensure that accuracy of the model
Comparing the "Correctly Classified Instances" from this test set with the "Correctly Classified Instances"
from the training set, we see the accuracy of the model, which indicates that the model will not break
down with unknown data, or when future data is applied to it.
23
EXPERIMENT-11
11.1 OBJECTIVE:
Another question might be, do you really need to input so many attributes to get good results? Maybe
only a few would do. For example, you could try just having attributes 2, 3, 5, 7, 10, 17 (and 21, the class
attribute (naturally)). Try out some combinations. (You had removed two attributes in problem 7.
Remember to reload the arff data file to get all the attributes initially before you start selecting the ones
you want).
11.2 RESOURCES:
11.3 PROCEDURE:
24
11.4 OUTPUT:
25
EXPERIMENT-12
12.1 OBJECTIVE:
Sometimes, the cost of rejecting an applicant who actually has a good credit (case 1) might be higher than
accepting an applicant who has bad credit (case 2). Instead of counting the misclassifications equally in
both cases, give a higher cost to the first case (say cost 5) and lower cost to the second case. You can do
this by using a cost matrix in Weka. Train your Decision Tree again and report the Decision Tree and
cross -validation results. Are they significantly different from results obtained in problem 6 (using equal
cost)?
12.2 RESOURCES:
12.3 PROCEDURE:
26
12.4 OUTPUT:
27
EXPERIMENT-13
13.1 OBJECTIVE:
Do you think it is a good idea to prefer simple decision trees instead of having long complex decision
trees? How does the complexity of a Decision Tree relate to the bias of the model?
13.2 RESOURCES:
13.3 PROCEDURE:
This will be based on the attribute set, and the requirement of relationship among attribute we want to
study. This can be viewed based on the database and user requirement.
28
EXPERIMENT-14
14.1 OBJECTIVE:
*Run the J48 and 1Bk classifiers using-the cross-validation strategy with various fold levels. Compare the
accuracy results. Hold out strategy with three percentage levels. Compare the accuracy results.
14.2 RESOURCES:
14.3 THEORY:
Reduced-error pruning
Each node of the (over-fit) tree is examined for pruning
A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over
the validation set
Pruning a node consists of
• Removing the sub-tree rooted at the pruned node
• Making the pruned node a leaf node
• Assigning the pruned node the most common classification of the training instances attached
to that node
Pruning nodes iteratively
• Always select a node whose removal most increases the DT accuracy over the validation set
• Stop when further pruning decreases the DT accuracy over the validation
set IF (Children=yes) Λ (income=>30000)
THEN (car=Yes)
14.4 PROCEDURE:
14.5 OUTPUT:
30
EXPERIMENT-15
15.1 OBJECTIVE:
You can make your Decision Trees simpler by pruning the nodes. one approach is to use Reduced Error
Pruning -Explain this idea briefly. Try reduced error pruning for training your Decision Trees using cross
-validation (you can do this in Weka) and report the Decision Tree you obtain? Also, report your accuracy
using the pruned model. Does your accuracy increase?
15.2 RESOURCES:
15.3 THEORY:
Reduced-error pruning
Each node of the (over-fit) tree is examined for pruning
A node is pruned (removed) only if the resulting pruned tree
performs no worse than the original over the validation set
Pruning a node consists of
• Removing the sub-tree rooted at the pruned node
• Making the pruned node a leaf node
• Assigning the pruned node the most common classification of the training instances attached
to that node
Pruning nodes iteratively
• Always select a node whose removal most increases the DT accuracy over the validation set
• Stop when further pruning decreases the DT accuracy over the validation
set IF (Children=yes) Λ (income=>30000)
THEN (car=Yes)
15.4 PROCEDURE:
31
14) If need select attribute.
15) Now start weka.
16) Now we can see the output details in the Classifier output.
17) Right click on the result list and select ” visualize tree “option.
15.5 OUTPUT:
32
EXPERIMENT-16
16.1 OBJECTIVE:
(Extra Credit): How can you convert a Decision Trees into "if –then -else rules". Make up your own small
Decision Tree consisting of 2 - 3 levels and convert it into a set of rules. There also exist different
classifiers that output the model in the form of rules -one such classifier in Weka is rules. PART, train this
model and report the set of rules obtained. Sometimes just one attribute can be good enough in making
the decision, yes, just one! Can you predict what attribute that might be in this dataset? OneR classifier
uses a single attribute to make decisions (it chooses the attribute based on minimum error). Report the
rule obtained by training a one R classifier. Rank the performance of j48, PART and OneR.
16.2 RESOURCES:
16.3 PROCEDURE:
12.4 OUTPUT:
34
J48
One R
35
PART
36
EXPERIMENT-17
17.1 OBJECTIVE:
*Run J48 and Naïve Bayes classifiers on the following datasets and determine the accuracy:
1. vehicle.arff
2.kr-vs-kp.arff
3.glass.arff
4.wave-form-5000.arff
On which datasets does the Naïve Bayes perform better?
17.2 RESOURCES:
17.3 PROCEDURE:
17.4 OUTPUT:
J48
38
One R
PART
39