DWM Exp6 C49

LAB Manual

PART A
(PART A: TO BE REFERRED BY STUDENTS)

Experiment No.06
A.1 Aim:
Perform pre-processing on data and implement the Decision Tree algorithm using the R tool or WEKA.

A.2 Prerequisite:
Familiarity with the WEKA tool.

A.3 Outcome:
After successful completion of this experiment, students will be able to use classification and clustering algorithms of data mining.

A.4 Theory:

Preprocessing:

Data have quality if they satisfy the requirements of the intended use.
There are many factors comprising data quality, including accuracy,
completeness, consistency, timeliness, believability, and interpretability.
Major Tasks in Data Preprocessing:
In this section, we look at the major steps involved in data preprocessing,
namely, data cleaning, data integration, data reduction, and data
transformation.
Data cleaning routines work to “clean” the data by filling in missing
values, smoothing noisy data, identifying or removing outliers, and
resolving inconsistencies. If users believe the data are dirty, they are
unlikely to trust the results of any data mining that has been applied.
Furthermore, dirty data can cause confusion for the mining procedure,
resulting in unreliable output. Although most mining routines have some
procedures for dealing with incomplete or noisy data, they are not always
robust. Instead, they may concentrate on avoiding overfitting the data to
the function being modeled. Therefore, a useful preprocessing step is to
run your data through some data cleaning routines.
Data reduction obtains a reduced representation of the data set that is
much smaller in volume, yet produces the same (or almost the same)
analytical results. Data reduction strategies include dimensionality
reduction and numerosity reduction.
In dimensionality reduction, data encoding schemes are applied so as
to obtain a reduced or “compressed” representation of the original data.
Examples include data compression techniques (e.g., wavelet transforms
and principal components analysis), attribute subset selection (e.g.,
removing irrelevant attributes), and attribute construction (e.g., where a
small set of more useful attributes is derived from the original set).
In numerosity reduction, the data are replaced by alternative, smaller
representations using parametric models (e.g., regression or log-linear
models) or nonparametric models (e.g., histograms, clusters, sampling, or
data aggregation). Discretization and concept hierarchy generation are
powerful tools for data mining in that they allow data mining at multiple
abstraction levels. Normalization, data discretization, and concept
hierarchy generation are forms of data transformation.
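As a small illustration of one such data transformation, the sketch below applies min-max normalization to a single numeric attribute. The salary values and the target range [0, 1] are assumptions made only for this example.

// Minimal sketch of min-max normalization to [0, 1] for one numeric attribute.
// The sample salary values are illustrative, not taken from the lab data set.
public class MinMaxNormalize {
    public static void main(String[] args) {
        double[] salary = {10000, 15000, 17000, 20000, 25000, 30000, 32000, 35000};
        double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
        for (double v : salary) { min = Math.min(min, v); max = Math.max(max, v); }
        for (double v : salary) {
            // v' = (v - min) / (max - min) maps the attribute onto [0, 1]
            double normalized = (v - min) / (max - min);
            System.out.printf("%.0f -> %.3f%n", v, normalized);
        }
    }
}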
Data Cleaning:
Real-world data tend to be incomplete, noisy, and inconsistent. Data
cleaning (or data cleansing) routines attempt to fill in missing values,
smooth out noise while identifying outliers, and correct inconsistencies in
the data.
Dealing with Missing Values:
1. Ignore the tuple: This is usually done when the class label is missing
(assuming the mining task involves classification). This method is not very
effective, unless the tuple contains several attributes with missing values.
It is especially poor when the percentage of missing values per attribute
varies considerably. By ignoring the tuple, we do not make use of the
remaining attributes’ values in the tuple. Such data could have been
useful to the task at hand.
2. Fill in the missing value manually: In general, this approach is time
consuming and may not be feasible given a large data set with many
missing values.
3. Use a global constant to fill in the missing value: Replace all
missing attribute values by the same constant such as a label like
“Unknown” or -∞. If missing values are replaced by, say, “Unknown,” then
the mining program may mistakenly think that they form an interesting
concept, since they all have a value in common—that of “Unknown.”
Hence, although this method is simple, it is not foolproof.
4. Use a measure of central tendency for the attribute (e.g., the mean or median) to fill in the missing value: Measures of central tendency indicate the "middle" value of a data distribution. For normal (symmetric) data distributions the mean can be used, while skewed data distributions should employ the median (see the code sketch after this list).
5. Use the attribute mean or median for all samples belonging to
the same class as the given tuple: For example, if classifying
customers according to credit risk, we may replace the missing value with
the mean income value for customers in the same credit risk category as
that of the given tuple. If the data distribution for a given class is skewed,
the median value is a better choice.
6. Use the most probable value to fill in the missing value: This
may be determined with regression, inference-based tools using a
Bayesian formalism, or decision tree induction.
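A minimal sketch of method 4 above, assuming missing numeric values are marked with Double.NaN and using illustrative income figures (substitute the median for the mean when the distribution is skewed):

import java.util.Arrays;

// Minimal sketch of mean imputation: fill missing values of a numeric
// attribute with the mean computed over the observed values.
// Double.NaN stands in for a missing value; the numbers are illustrative.
public class MeanImputation {
    public static void main(String[] args) {
        double[] income = {25000, Double.NaN, 31000, 28000, Double.NaN, 40000};

        // Compute the mean over the non-missing values only.
        double sum = 0; int count = 0;
        for (double v : income) {
            if (!Double.isNaN(v)) { sum += v; count++; }
        }
        double mean = sum / count;

        // Replace every missing value with that mean.
        for (int i = 0; i < income.length; i++) {
            if (Double.isNaN(income[i])) income[i] = mean;
        }
        System.out.println("Mean used: " + mean);
        System.out.println(Arrays.toString(income));
    }
}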
Dealing with Noise:
Noise is a random error or variance in a measured variable.
Binning: Binning methods smooth a sorted data value by consulting its
“neighbourhood,” that is, the values around it. The sorted values are
distributed into a number of “buckets,” or bins. Because binning methods
consult the neighbourhood of values, they perform local smoothing. In
smoothing by bin means, each value in a bin is replaced by the mean
value of the bin. Similarly, smoothing by bin medians can be employed, in
which each bin value is replaced by the bin median. In smoothing by bin
boundaries, the minimum and maximum values in a given bin are
identified as the bin boundaries. Each bin value is then replaced by the
closest boundary value.
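A minimal sketch of smoothing by bin means, assuming equal-frequency bins of size 4 and illustrative price values (in the spirit of the classic textbook example):

import java.util.Arrays;

// Minimal sketch of smoothing by bin means: sort the values, split them into
// equal-frequency (equal-depth) bins, and replace each value by its bin mean.
public class BinMeanSmoothing {
    public static void main(String[] args) {
        double[] prices = {8, 16, 9, 15, 21, 21, 24, 30, 26, 27, 30, 34};
        Arrays.sort(prices);          // binning works on sorted values
        int binSize = 4;              // assumed equal-frequency bin size

        for (int start = 0; start < prices.length; start += binSize) {
            int end = Math.min(start + binSize, prices.length);
            double mean = 0;
            for (int i = start; i < end; i++) mean += prices[i];
            mean /= (end - start);
            for (int i = start; i < end; i++) prices[i] = mean;  // local smoothing
        }
        System.out.println(Arrays.toString(prices));
    }
}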
Regression: Data smoothing can also be done by regression, a technique
that conforms data values to a function. Linear regression involves finding
the “best” line to fit two attributes (or variables) so that one attribute can
be used to predict the other. Multiple linear regression is an extension of
linear regression, where more than two attributes are involved and the
data are fit to a multidimensional surface.
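A minimal sketch of smoothing by simple linear regression: fit y = a + b*x by least squares and replace each observed value with its fitted value. The x/y pairs are illustrative only.

// Minimal sketch of data smoothing by simple linear regression.
public class RegressionSmoothing {
    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6};
        double[] y = {2.1, 2.9, 4.2, 3.8, 5.1, 6.3};

        // Least-squares estimates: b = sum((x-mx)(y-my)) / sum((x-mx)^2), a = my - b*mx
        double mx = 0, my = 0;
        for (int i = 0; i < x.length; i++) { mx += x[i]; my += y[i]; }
        mx /= x.length; my /= y.length;

        double num = 0, den = 0;
        for (int i = 0; i < x.length; i++) {
            num += (x[i] - mx) * (y[i] - my);
            den += (x[i] - mx) * (x[i] - mx);
        }
        double b = num / den, a = my - b * mx;

        // Replace each y with the value predicted by the fitted line.
        for (int i = 0; i < x.length; i++) {
            System.out.printf("x=%.0f  y=%.1f  smoothed=%.2f%n", x[i], y[i], a + b * x[i]);
        }
    }
}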
Outlier analysis: Outliers may be detected by clustering, for example,
where similar values are organized into groups, or “clusters.” Intuitively,
values that fall outside of the set of clusters may be considered outliers.

Decision tree:
A decision tree is a tree in which each branch (internal) node represents a choice among a number of alternatives and each leaf node represents a decision. Decision trees are commonly used to support decision-making: a decision tree is a tree-structured plan of a set of attributes to test in order to predict the output. To decide which attribute should be tested first, we simply find the one with the highest information gain. Decision trees can produce human-readable descriptions of trends in the underlying relationships of a dataset and can be used for both classification and prediction tasks. Common decision tree algorithms include ID3, C4.5, and CART.

Advantages of decision tree:

1. Simple to understand and interpret.

2. Requires little data preparation.

3. Able to handle both numerical and categorical data.

4. Performs well on large data sets in a short time.

ID3 algorithm:

ID3 is a simple decision tree learning algorithm developed by Ross Quinlan (1983). The basic idea of the ID3 algorithm is to construct the decision tree by employing a top-down, greedy search through the given sets, testing each attribute at every tree node. In order to select the attribute that is most useful for classifying a given set, we introduce a metric: information gain. The measure from information theory that ID3 uses in decision tree construction is entropy. Informally, the entropy of a dataset can be thought of as how disordered it is. Entropy is related to information in the sense that the higher the entropy, or uncertainty, of some data, the more information is required in order to completely describe that data. In building a decision tree, we aim to decrease the entropy of the dataset until we reach leaf nodes, at which point the subset we are left with is pure (has zero entropy) and represents instances all of one class (all instances have the same value for the target attribute). We measure the entropy of a dataset S with respect to one attribute, in this case the target attribute, with the following calculation:

Entropy(S) = - Σi pi log2(pi)

where pi is the proportion of instances in the dataset that take the ith value of the target attribute.

Information gain measures the expected reduction in entropy caused by partitioning the dataset on an attribute A; the higher the information gain, the greater the expected reduction in entropy:

Gain(S, A) = Entropy(S) - Σv (|Sv| / |S|) Entropy(Sv)

where v ranges over the values of attribute A, Sv is the subset of instances of S for which A takes the value v, and |Sv| and |S| are the numbers of instances in Sv and S respectively.
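The following sketch computes both quantities for a small illustrative split: a dataset of 14 instances (9 of one class, 5 of the other) and a hypothetical attribute A that partitions it into three subsets. The counts are assumptions chosen only to make the arithmetic concrete.

import java.util.Arrays;

// Computes Entropy(S) and Gain(S, A) from class counts, following the
// formulas above. The counts are illustrative, not taken from the lab data.
public class InfoGainDemo {

    // Entropy(S) = -sum_i p_i * log2(p_i), computed from class counts.
    static double entropy(int... classCounts) {
        int total = Arrays.stream(classCounts).sum();
        double h = 0.0;
        for (int c : classCounts) {
            if (c == 0) continue;              // 0 * log2(0) is taken as 0
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    public static void main(String[] args) {
        // Whole dataset S: 9 positive and 5 negative instances.
        double entropyS = entropy(9, 5);

        // Hypothetical attribute A with three values, splitting S into subsets
        // with the class counts below (sizes 5, 4 and 5).
        int[][] subsets = {{2, 3}, {4, 0}, {3, 2}};
        int totalInstances = 14;
        double expected = 0.0;
        for (int[] s : subsets) {
            int size = s[0] + s[1];
            expected += ((double) size / totalInstances) * entropy(s);
        }

        // Gain(S, A) = Entropy(S) - sum_v (|Sv| / |S|) * Entropy(Sv)
        System.out.printf("Entropy(S) = %.3f%n", entropyS);
        System.out.printf("Gain(S, A) = %.3f%n", entropyS - expected);
    }
}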
Steps of the ID3(Examples, Target, Attributes) algorithm (a code sketch of these steps follows the list):

Create a root node.

1. If all Examples have the same Target value, give the root this label.

2. Else if Attributes is empty, label the root with the most common Target value in Examples.

3. Else begin

3.1 Calculate the information gain for each attribute, according to the average entropy formula.

3.2 Select the attribute, A, with the lowest average entropy (highest information gain) and make this the attribute tested at the root.

3.3 For each possible value, v, of this attribute:

3.3.1 Add a new branch below the root, corresponding to A = v.

3.3.2 Let Examples(v) be those examples with A = v.

3.3.3 If Examples(v) is empty, make the new branch a leaf node labelled with the most common Target value among Examples.

3.3.4 Else let the new branch be the tree created by ID3(Examples(v), Target, Attributes - {A}).

4. End.
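The sketch below follows these steps for nominal attributes only: each example is a map from attribute name to value, the attribute with the lowest average entropy (highest gain) is chosen at every node, and the recursion stops at pure or attribute-exhausted subsets. It performs no pruning, handles no numeric attributes, and only branches on values actually present in the data (so the empty-branch case of step 3.3.3 does not arise); it is an illustration of the algorithm, not a substitute for Weka's J48. The tiny data set in main is assumed for demonstration.

import java.util.*;

// Compact, illustrative ID3 recursion over nominal attributes.
public class ID3Sketch {

    static double entropy(List<Map<String, String>> examples, String target) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, String> e : examples) counts.merge(e.get(target), 1, Integer::sum);
        double h = 0.0, n = examples.size();
        for (int c : counts.values()) {
            double p = c / n;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    static String majorityValue(List<Map<String, String>> examples, String target) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, String> e : examples) counts.merge(e.get(target), 1, Integer::sum);
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    static Map<String, List<Map<String, String>>> partition(
            List<Map<String, String>> examples, String attr) {
        Map<String, List<Map<String, String>>> parts = new HashMap<>();
        for (Map<String, String> e : examples)
            parts.computeIfAbsent(e.get(attr), k -> new ArrayList<>()).add(e);
        return parts;
    }

    // Returns a textual tree: leaf labels or nested "attribute = value" branches.
    static String id3(List<Map<String, String>> examples, String target,
                      Set<String> attributes, String indent) {
        // Step 1: all examples share one target value -> leaf.
        if (entropy(examples, target) == 0.0)
            return indent + "leaf: " + examples.get(0).get(target) + "\n";
        // Step 2: no attributes left -> leaf with the most common target value.
        if (attributes.isEmpty())
            return indent + "leaf: " + majorityValue(examples, target) + "\n";

        // Steps 3.1-3.2: choose the attribute with the lowest average entropy.
        String best = null;
        double bestAvg = Double.MAX_VALUE;
        for (String a : attributes) {
            double avg = 0.0;
            for (List<Map<String, String>> sub : partition(examples, a).values())
                avg += (sub.size() / (double) examples.size()) * entropy(sub, target);
            if (avg < bestAvg) { bestAvg = avg; best = a; }
        }

        // Step 3.3: one branch per observed value of the chosen attribute, then recurse.
        Set<String> remaining = new HashSet<>(attributes);
        remaining.remove(best);
        StringBuilder tree = new StringBuilder();
        for (Map.Entry<String, List<Map<String, String>>> branch : partition(examples, best).entrySet()) {
            tree.append(indent).append(best).append(" = ").append(branch.getKey()).append("\n");
            tree.append(id3(branch.getValue(), target, remaining, indent + "  "));
        }
        return tree.toString();
    }

    public static void main(String[] args) {
        // Tiny data set in the spirit of employee.arff (values assumed for illustration).
        String[][] rows = {{"25", "10k", "poor"}, {"27", "15k", "poor"}, {"29", "20k", "avg"},
                           {"30", "25k", "avg"}, {"35", "32k", "good"}, {"48", "32k", "good"}};
        List<Map<String, String>> data = new ArrayList<>();
        for (String[] r : rows) {
            Map<String, String> e = new HashMap<>();
            e.put("age", r[0]); e.put("salary", r[1]); e.put("performance", r[2]);
            data.add(e);
        }
        System.out.print(id3(data, "performance", new HashSet<>(Arrays.asList("age", "salary")), ""));
    }
}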

Advantages of ID3:

1. Can be used to predict (classify) new data.

2. Understandable prediction rules are created from the training data.

3. Builds a short tree relatively fast.

4. Only needs to test enough attributes until all the data are classified.

5. Finding leaf nodes enables test data to be pruned, reducing the number of tests.

Disadvantages of ID3:

1. Data may be over-fitted or over-classified if a small sample is tested.

2. Only one attribute at a time is tested for making a decision.

3. Numerous trees may be needed when dealing with continuous data.

J48 classifier in Weka:

This experiment illustrates the use of the J48 classifier in Weka. The sample data set used in this experiment is the "employee" data, available in ARFF format. This document assumes that appropriate data preprocessing has been performed.

Steps involved in this experiment (a sketch of the same workflow through Weka's Java API follows the steps):

Step 1: We begin the experiment by loading the data (employee.arff) into Weka.
Step 2: Next, we select the "Classify" tab and click the "Choose" button to select the "J48" classifier.
Step 3: Now we specify the various parameters. These can be set by clicking in the text box to the right of the Choose button. In this example, we accept the default values. The default configuration does perform some pruning but does not perform reduced-error pruning.
Step 4: Under "Test options" in the main panel, we select 10-fold cross-validation as our evaluation approach. Since we do not have a separate evaluation data set, this is necessary to get a reasonable estimate of the accuracy of the generated model.
Step 5: We now click "Start" to generate the model. The text (ASCII) version of the tree as well as the evaluation statistics will appear in the right panel when the model construction is complete.
Step 6: Note that the classification accuracy of the model is about 69%. This indicates that more work may be needed (either in preprocessing or in selecting better parameters for the classification).
Step 7: Weka also lets us view a graphical version of the classification tree. This can be done by right-clicking the last entry in the result list and selecting "Visualize tree" from the pop-up menu.
Step 8: We can now use our model to classify new instances.
Step 9: In the main panel, under "Test options", click the "Supplied test set" radio button and then click the "Set..." button. This opens a window that allows you to choose the file containing the test instances.
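The same workflow can also be run programmatically through Weka's Java API (with weka.jar on the classpath). The sketch below loads the ARFF file, builds a J48 tree with its default options, and evaluates it with 10-fold cross-validation; the file name employee.arff is assumed to be in the working directory.

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch of the GUI steps above using Weka's Java API: load the ARFF file,
// build a J48 tree with default options, and run 10-fold cross-validation.
public class J48Experiment {
    public static void main(String[] args) throws Exception {
        DataSource source = new DataSource("employee.arff");    // assumed file name/path
        Instances data = source.getDataSet();
        data.setClassIndex(data.numAttributes() - 1);            // last attribute is the class

        J48 tree = new J48();                                    // default pruning settings
        tree.buildClassifier(data);

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));  // 10-fold cross-validation
        System.out.println(tree);                                // text version of the tree
        System.out.println(eval.toSummaryString());              // accuracy and error statistics
    }
}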

Data set employee.arff:


@relation employee
@attribute age {25, 27, 28, 29, 30, 35, 48}
@attribute salary {10k, 15k, 17k, 20k, 25k, 30k, 32k, 34k, 35k}
@attribute performance {good, avg, poor}
@data
%
25, 10k, poor
27, 15k, poor
27, 17k, poor
28, 17k, poor
29, 20k, avg
30, 25k, avg
29, 25k, avg
30, 20k, avg
35, 32k, good
48, 34k, good
48, 32k,good
%
The following screenshot shows the classification rules that were generated when the J48 algorithm is applied to the given dataset.
PART B
(PART B: TO BE COMPLETED BY STUDENTS)

(Students must submit the soft copy as per the following segments within two hours of the practical. The soft copy must be uploaded to Blackboard or emailed to the concerned lab in-charge faculty at the end of the practical in case there is no Blackboard access available.)

B.1 Software Code written by student:


(Paste your problem statement related to your case study completed during
the 2 hours of practical in the lab here)
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1.5,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,3.0,1.6,0.2,Iris-setosa
5.0,3.4,1.6,0.4,Iris-setosa
5.2,3.5,1.5,0.2,Iris-setosa
5.2,3.4,1.4,0.2,Iris-setosa
4.7,3.2,1.6,0.2,Iris-setosa
4.8,3.1,1.6,0.2,Iris-setosa
5.4,3.4,1.5,0.4,Iris-setosa
5.2,4.1,1.5,0.1,Iris-setosa
5.5,4.2,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.0,3.2,1.2,0.2,Iris-setosa
5.5,3.5,1.3,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
4.4,3.0,1.3,0.2,Iris-setosa
5.1,3.4,1.5,0.2,Iris-setosa
5.0,3.5,1.3,0.3,Iris-setosa
4.5,2.3,1.3,0.3,Iris-setosa
4.4,3.2,1.3,0.2,Iris-setosa
5.0,3.5,1.6,0.6,Iris-setosa
5.1,3.8,1.9,0.4,Iris-setosa
4.8,3.0,1.4,0.3,Iris-setosa
5.1,3.8,1.6,0.2,Iris-setosa
4.6,3.2,1.4,0.2,Iris-setosa
5.3,3.7,1.5,0.2,Iris-setosa
5.0,3.3,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.5,2.8,4.6,1.5,Iris-versicolor
5.7,2.8,4.5,1.3,Iris-versicolor
6.3,3.3,4.7,1.6,Iris-versicolor
4.9,2.4,3.3,1.0,Iris-versicolor
6.6,2.9,4.6,1.3,Iris-versicolor
5.2,2.7,3.9,1.4,Iris-versicolor
5.0,2.0,3.5,1.0,Iris-versicolor
5.9,3.0,4.2,1.5,Iris-versicolor
6.0,2.2,4.0,1.0,Iris-versicolor
6.1,2.9,4.7,1.4,Iris-versicolor
5.6,2.9,3.6,1.3,Iris-versicolor
6.7,3.1,4.4,1.4,Iris-versicolor
5.6,3.0,4.5,1.5,Iris-versicolor
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
5.6,2.5,3.9,1.1,Iris-versicolor
5.9,3.2,4.8,1.8,Iris-versicolor
6.1,2.8,4.0,1.3,Iris-versicolor
6.3,2.5,4.9,1.5,Iris-versicolor
6.1,2.8,4.7,1.2,Iris-versicolor
6.4,2.9,4.3,1.3,Iris-versicolor
6.6,3.0,4.4,1.4,Iris-versicolor
6.8,2.8,4.8,1.4,Iris-versicolor
6.7,3.0,5.0,1.7,Iris-versicolor
6.0,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1.0,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1.0,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor
6.0,2.7,5.1,1.6,Iris-versicolor
5.4,3.0,4.5,1.5,Iris-versicolor
6.0,3.4,4.5,1.6,Iris-versicolor
6.7,3.1,4.7,1.5,Iris-versicolor
6.3,2.3,4.4,1.3,Iris-versicolor
5.6,3.0,4.1,1.3,Iris-versicolor
5.5,2.5,4.0,1.3,Iris-versicolor
5.5,2.6,4.4,1.2,Iris-versicolor
6.1,3.0,4.6,1.4,Iris-versicolor
5.8,2.6,4.0,1.2,Iris-versicolor
5.0,2.3,3.3,1.0,Iris-versicolor
5.6,2.7,4.2,1.3,Iris-versicolor
5.7,3.0,4.2,1.2,Iris-versicolor
5.7,2.9,4.2,1.3,Iris-versicolor
6.2,2.9,4.3,1.3,Iris-versicolor
5.1,2.5,3.0,1.1,Iris-versicolor
5.7,2.8,4.1,1.3,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
6.3,2.9,5.6,1.8,Iris-virginica
6.5,3.0,5.8,2.2,Iris-virginica
7.6,3.0,6.6,2.1,Iris-virginica
4.9,2.5,4.5,1.7,Iris-virginica
7.3,2.9,6.3,1.8,Iris-virginica
6.7,2.5,5.8,1.8,Iris-virginica
7.2,3.6,6.1,2.5,Iris-virginica
6.5,3.2,5.1,2.0,Iris-virginica
6.4,2.7,5.3,1.9,Iris-virginica
6.8,3.0,5.5,2.1,Iris-virginica
5.7,2.5,5.0,2.0,Iris-virginica
5.8,2.8,5.1,2.4,Iris-virginica
6.4,3.2,5.3,2.3,Iris-virginica
6.5,3.0,5.5,1.8,Iris-virginica
7.7,3.8,6.7,2.2,Iris-virginica
7.7,2.6,6.9,2.3,Iris-virginica
6.0,2.2,5.0,1.5,Iris-virginica
6.9,3.2,5.7,2.3,Iris-virginica
5.6,2.8,4.9,2.0,Iris-virginica
7.7,2.8,6.7,2.0,Iris-virginica
6.3,2.7,4.9,1.8,Iris-virginica
6.7,3.3,5.7,2.1,Iris-virginica
7.2,3.2,6.0,1.8,Iris-virginica
6.2,2.8,4.8,1.8,Iris-virginica
6.1,3.0,4.9,1.8,Iris-virginica
6.4,2.8,5.6,2.1,Iris-virginica
7.2,3.0,5.8,1.6,Iris-virginica
7.4,2.8,6.1,1.9,Iris-virginica
7.9,3.8,6.4,2.0,Iris-virginica
6.4,2.8,5.6,2.2,Iris-virginica
6.3,2.8,5.1,1.5,Iris-virginica
6.1,2.6,5.6,1.4,Iris-virginica
7.7,3.0,6.1,2.3,Iris-virginica
6.3,3.4,5.6,2.4,Iris-virginica
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
6.9,3.1,5.4,2.1,Iris-virginica
6.7,3.1,5.6,2.4,Iris-virginica
6.9,3.1,5.1,2.3,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
6.8,3.2,5.9,2.3,Iris-virginica
6.7,3.3,5.7,2.5,Iris-virginica
6.7,3.0,5.2,2.3,Iris-virginica
6.3,2.5,5.0,1.9,Iris-virginica
6.5,3.0,5.2,2.0,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica
%
%
%
B.2 Input and Output:
(Paste the classifier output and the decision tree visualization obtained for your case study in the following format.)
B.3 Observations and learning:
(Students are expected to comment on the output obtained with clear
observations and learning for each task/ sub part assigned)
In this experiment, we used the J48 classifier in the Weka tool. The data is already available in ARFF format, and appropriate data preprocessing has been performed.
B.4 Conclusion:
(Students must write the conclusion as per the attainment of individual
outcome listed above and learning/observation noted in section B.3)
After completing this experiment, we are able to use the J48 classifier in the Weka tool to build and evaluate a decision tree.
B.5 Question of Curiosity
(To be answered by student based on the practical performed and
learning/observations)
Q1: Draw the tree according to the classifier output and answer the following questions:

1. What is the depth of the tree?
Ans: 2

2. How many leaf nodes are there in the tree?
Ans: 4

3. How many tree nodes are there?
Ans: 5
