Classification: Decision Tree, Hunt's Algorithm, ID3, Rule-Based Classifier, C4.5

The document discusses classification techniques in data mining. It begins by defining classification and describing the general classification process of building a model from a training set and testing it on a separate test set. It then covers decision tree based classification, including Hunt's algorithm, ID3, and C4.5, as well as rule-based classifiers. It describes how decision trees work, including their structure and how they are built recursively by splitting records at internal nodes based on attribute tests. Methods for splitting, determining the best split, and impurity measures such as entropy are also summarized.


Classification

■ Definition
■ Decision Tree
■ Hunt's Algorithm
■ ID3
■ Rule Based Classifier
■ C4.5

Classification: Definition
■ Given a set of records (the training set)
  ■ Each record contains a set of attributes
  ■ One of the attributes is the class
■ Find a model for the class attribute as a function of the values of the other attributes
■ Goal: previously unseen records should be assigned to a class as accurately as possible
■ Usually, the given data set is divided into a training set and a test set:
  ■ The training set is used to build the model
  ■ The test set is used to validate it
  ■ The accuracy of the model is determined on the test set
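As a concrete illustration of this process, here is a minimal sketch using scikit-learn; the file name records.csv, the column name "class", and the 70/30 split are assumptions made only for the example.

# Sketch of the general classification process: build a model on a training set,
# then measure its accuracy on a held-out test set (file and column names are assumed).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

data = pd.read_csv("records.csv")                    # each record: attribute values plus a "class" column
X = pd.get_dummies(data.drop(columns=["class"]))     # one-hot encode nominal attributes
y = data["class"]                                    # the class attribute

# Divide the given data set into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier()                     # model for the class as a function of the other attributes
model.fit(X_train, y_train)                          # the training set is used to build the model

# Accuracy is determined on the test set (previously unseen records)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))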

General Approach

Classification: Example

Name            Give Birth   Lay Eggs   Can Fly   Live in Water   Have Legs
Chinese Dragon  No           No         Yes       Yes             No

Examples of Classification Task
■ Banking: determining whether a mortgage application is a good or bad credit risk, or whether a particular credit card transaction is fraudulent
■ Education: placing a new student into a particular track with regard to special needs
■ Medicine: diagnosing whether a particular disease is present
■ Law: determining whether a will was actually written by the deceased person or fraudulently by someone else
■ Homeland Security: identifying whether or not certain financial or personal behavior indicates a possible terrorist threat

Classification Model
■ In general, a classification model can be used for the following purposes:
  ■ It can serve as an explanatory tool for distinguishing objects of different classes (descriptive)
  ■ It can be used to predict the class labels of new records (predictive)

Classification Techniques
■ Decision Tree based Methods
■ Rule-based Methods
■ Instance-based Classifiers
■ Memory-based Reasoning
■ Neural Networks
■ Naïve Bayes and Bayesian Belief Networks
■ Etc.

Decision Tree Structure
■ A decision tree is a hierarchical structure of nodes and directed edges.
■ There are three types of nodes in a decision tree (see the sketch below):
  ■ A root node, which has no incoming edges and zero or more outgoing edges
  ■ Internal nodes, each of which has exactly one incoming edge and two or more outgoing edges
  ■ Leaf nodes, each of which has exactly one incoming edge and no outgoing edges; each leaf node also has a class label attached to it
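To make the structure concrete, the following minimal sketch represents the three node types in Python; the class and the example attributes (Refund, Marital Status) are illustrative assumptions, not taken from the slides.

# Illustrative representation of decision tree nodes: a node with children is a root or
# internal node carrying an attribute test; a node without children is a leaf with a class label.
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class TreeNode:
    test_attribute: Optional[str] = None                            # attribute tested at root/internal nodes
    children: Dict[str, "TreeNode"] = field(default_factory=dict)   # outgoing edges, keyed by test outcome
    class_label: Optional[str] = None                               # set only on leaf nodes

    def is_leaf(self) -> bool:
        return not self.children                                    # leaves have no outgoing edges

# Root node tests "Refund"; one child is an internal node, the others are leaves
tree = TreeNode(test_attribute="Refund", children={
    "Yes": TreeNode(class_label="No"),
    "No": TreeNode(test_attribute="Marital Status", children={
        "Married": TreeNode(class_label="No"),
        "Single": TreeNode(class_label="Yes"),
    }),
})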

Decision Tree Based Classification
■ One of the most widely used classification techniques
■ Highly expressive in terms of capturing relationships among discrete variables
■ Relatively inexpensive to construct and extremely fast at classifying new records
■ Easy to interpret
■ Can effectively handle both missing values and noisy data
■ Comparable or better accuracy than other techniques in many applications

Example Decision Tree

Another Example Decision Tree

Decision Tree Classification Task

Hunt’s Algorithm
■ Most decision tree induction algorithms are based on the original ideas proposed in Hunt's Algorithm
■ Let Dt be the set of training records associated with a node and y = {y1, y2, …, yc} be the set of class labels (a code sketch follows below)
  ■ If Dt contains records that all belong to the same class yk, then its decision tree is a leaf node labeled yk
  ■ If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets, and recursively apply the procedure to each subset
  ■ If Dt is an empty set, then its decision tree is a leaf node whose class label is determined from other information, such as the majority class of the records at its parent node
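The three cases above translate directly into a recursive procedure. Below is a minimal, illustrative sketch; the record format (dicts with a "class" key) is assumed, and best_split stands for a helper, not defined here, that picks the attribute test to use.

# Illustrative sketch of Hunt's algorithm (not a full implementation).
from collections import Counter

def majority_class(records):
    return Counter(r["class"] for r in records).most_common(1)[0][0]

def hunt(records, attributes, parent_records=None):
    # Case 3: Dt is empty -> leaf labeled with the majority class of the parent's records
    if not records:
        return {"label": majority_class(parent_records)}
    classes = {r["class"] for r in records}
    # Case 1: all records belong to the same class yk -> leaf labeled yk
    # (also stop when no attributes are left to split on)
    if len(classes) == 1 or not attributes:
        return {"label": majority_class(records)}
    # Case 2: more than one class -> split on an attribute test and recurse on each subset
    attr = best_split(records, attributes)            # assumed helper: chooses the attribute to test
    node = {"test": attr, "children": {}}
    for value in {r[attr] for r in records}:
        subset = [r for r in records if r[attr] == value]
        remaining = [a for a in attributes if a != attr]
        node["children"][value] = hunt(subset, remaining, parent_records=records)
    return node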

Example of Hunt’s Algorithm

Decision Tree Classification Task

Apply Model to Test Data

Tree Induction
■ Determine how to split the records
  ■ Use greedy heuristics to make a series of locally optimal decisions about which attribute to use for partitioning the data
  ■ At each step of the greedy algorithm, a test condition is applied to split the data into subsets with a more homogeneous class distribution
  ■ How to specify the test condition for each attribute?
  ■ How to determine the best split?
■ Determine when to stop splitting
  ■ A stopping condition is needed to terminate the tree-growing process. Stop expanding a node:
    ■ if all the instances belong to the same class
    ■ if all the instances have similar attribute values

Methods for splitting the records
■ Depends on attribute type
  ■ Binary: true/false, yes/no, +/-, etc.
  ■ Nominal: ID number, eye color, zip codes, etc.
  ■ Ordinal: rankings (e.g., taste of potato chips on a scale from 1-10), grades, height in {tall, medium, short}, etc.
  ■ Continuous/Ratio: calendar dates, temperatures in Celsius or Fahrenheit, age, etc.
■ Depends on number of ways to split
  ■ 2-way split (binary split)
  ■ Multi-way split

Splitting based on Nominal attributes

■ Multi-way split: use as many partitions as there are distinct values
■ Binary split: divides the values into two subsets; need to find the optimal partitioning
■ Each partition has a subset of values signifying it (see the sketch below)
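As a small illustration (the attribute and its values are assumptions), the sketch below enumerates the candidate partitions for a nominal attribute with three distinct values: the multi-way split uses one partition per value, while a binary split must choose one of the 2^(k-1) - 1 ways of grouping the values into two subsets.

# Candidate splits for a nominal attribute (illustrative values).
from itertools import combinations

values = ["Single", "Married", "Divorced"]

# Multi-way split: one partition per distinct value
multi_way = [{v} for v in values]

# Binary split: fix the first value on the left side so each two-subset
# partitioning is generated exactly once (mirror images are not repeated)
first, rest = values[0], values[1:]
binary = []
for i in range(len(rest)):
    for combo in combinations(rest, i):
        left = {first, *combo}
        binary.append((left, set(values) - left))

print(multi_way)   # [{'Single'}, {'Married'}, {'Divorced'}]
print(binary)      # 3 candidate two-subset partitionings to evaluate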

Splitting based on Ordinal attributes
■ Multi-way split: use as many partitions as there are distinct values
■ Binary split:
  ■ Divides the values into two subsets
  ■ Need to find the optimal partitioning
  ■ Must preserve the order property among attribute values (e.g., {short, medium} vs {tall} preserves order, while {short, tall} vs {medium} does not)

Splitting based on Continuous attributes
■ Different ways of handling (see the sketch below)
  ■ Discretization to form an ordinal categorical attribute
    ■ Static – discretize once at the beginning
    ■ Dynamic – ranges can be found by equal-interval bucketing, equal-frequency bucketing (percentiles), clustering, etc.
  ■ Binary decision: (A < v) or (A ≥ v)
    ■ Consider all possible splits and find the best cut
    ■ Can be more compute intensive
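A minimal sketch of the binary-decision approach (all data values are made up): candidate cut points v are the midpoints between consecutive distinct attribute values, and the cut with the lowest weighted entropy of the two resulting subsets is kept.

# Exhaustive search for the best binary cut (A < v) vs (A >= v) on a continuous attribute.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) if n else 0.0

def best_cut(values, labels):
    pairs = sorted(zip(values, labels))
    best_v, best_impurity = None, float("inf")
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            continue
        v = (a + b) / 2                                   # candidate cut point between two distinct values
        left = [l for x, l in pairs if x < v]
        right = [l for x, l in pairs if x >= v]
        m = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if m < best_impurity:
            best_v, best_impurity = v, m
    return best_v, best_impurity

# Hypothetical ages with class labels
print(best_cut([23, 25, 31, 42, 51], ["No", "No", "Yes", "Yes", "Yes"]))   # (28.0, 0.0)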

How to Determine The Best Split?

How to Determine The Best Split?

■ Greedy approach:
  ■ Nodes with a purer class distribution are preferred
■ Need a measure of node impurity

Finding The Best Split
1. Compute the impurity measure (P) before splitting
2. Compute the impurity measure (M) after splitting
   ■ Compute the impurity measure of each child node
   ■ Compute the weighted average impurity of the children (M)
3. Choose the attribute test condition that produces the highest gain (a numeric illustration follows below)

   Gain = P – M

   or, equivalently, the lowest impurity measure after splitting (M)
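A quick numeric illustration with made-up values: if the impurity before splitting is P = 0.99 and a candidate split produces children whose weighted average impurity is M = 0.62, its gain is P – M = 0.37; a competing split with M = 0.80 gives a gain of only 0.19, so the first split would be preferred.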

Measure of Impurity: Entropy
■ Entropy at a given node t:

  Entropy(t) = - Σj p(j | t) log2 p(j | t)

  (NOTE: p(j | t) is the relative frequency of class j at node t)

■ Information Gain:

  GAINsplit = Entropy(p) - Σi=1..k (ni / n) Entropy(i)

  where the parent node p is split into k partitions, ni is the number of records in partition i, and n is the total number of records at the parent

■ Choose the split that achieves the greatest reduction in entropy (maximizes GAIN); a code sketch follows below
■ Used in the ID3 and C4.5 decision tree algorithms
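A minimal sketch of these two formulas in code (class labels are assumed to be given as plain Python lists; names are illustrative):

# Entropy and information gain, following the formulas above.
from collections import Counter
from math import log2

def entropy(labels):
    # Entropy(t) = - sum_j p(j|t) * log2 p(j|t)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) if n else 0.0

def information_gain(parent_labels, partitions):
    # GAIN_split = Entropy(parent) - sum_i (n_i / n) * Entropy(partition i)
    n = len(parent_labels)
    weighted = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(parent_labels) - weighted

# Hypothetical split of 10 records into two partitions
parent = ["Yes"] * 5 + ["No"] * 5
children = [["Yes"] * 4 + ["No"], ["Yes"] + ["No"] * 4]
print(information_gain(parent, children))   # about 0.278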

Entropy Example

■ A worked example compares candidate splits on the "Wind", "Sky", and "Barometer" attributes: for each attribute, the entropy of the resulting partitions is computed and subtracted from the parent entropy to obtain its information gain
■ The gains computed in the example are 0.549, 0.156, and 0.049; the attribute with the highest information gain is chosen for the split


Measures of Node Impurity

■ Entropy: Entropy(t) = - Σj p(j | t) log2 p(j | t)
■ Gini Index: GINI(t) = 1 - Σj [p(j | t)]^2
■ Misclassification error: Error(t) = 1 - maxj p(j | t)
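Alongside the entropy function sketched earlier, the other two measures can be computed the same way (a minimal illustration; function names are assumptions):

# Gini index and misclassification error for a node's class labels.
from collections import Counter

def class_probs(labels):
    n = len(labels)
    return [c / n for c in Counter(labels).values()]

def gini(labels):
    # GINI(t) = 1 - sum_j p(j|t)^2
    return 1.0 - sum(p * p for p in class_probs(labels))

def classification_error(labels):
    # Error(t) = 1 - max_j p(j|t)
    return 1.0 - max(class_probs(labels))

print(gini(["Yes", "Yes", "No", "No"]))                    # 0.5, the maximum for two classes
print(classification_error(["Yes", "Yes", "Yes", "No"]))   # 0.25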

Practical Challenges in Classification

■ Over-fitting
  ■ The model performs well on the training set but poorly on the test set
  ■ If the model is too simple, it may not fit the training and test sets well; if the model is too complex, over-fitting may occur and reduce its ability to generalize beyond the training instances
■ Missing Values
■ Data Heterogeneity

Example of over-fitting

How to Address Over-fitting
■ Pre-Pruning (Early Stopping Rule)
  ■ Stop the algorithm before it grows a fully-grown tree
  ■ Typical stopping conditions for a node:
    ■ Stop if all instances belong to the same class
    ■ Stop if all the attribute values are the same
  ■ More restrictive conditions (a library-based sketch follows below):
    ■ Stop if the number of instances is less than some user-specified threshold
    ■ Stop if the class distribution of the instances is independent of the available features
    ■ Stop if expanding the current node does not improve the impurity measure (e.g., Gini or information gain)
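In practice, several of these pre-pruning conditions appear as hyperparameters of library implementations. A hedged sketch assuming scikit-learn (the parameter values are arbitrary):

# Pre-pruning expressed as scikit-learn hyperparameters (values are illustrative).
from sklearn.tree import DecisionTreeClassifier

pre_pruned = DecisionTreeClassifier(
    max_depth=5,                 # never grow below a fixed depth
    min_samples_split=20,        # stop if the number of instances at a node is below a threshold
    min_impurity_decrease=0.01,  # stop if expanding the node does not improve impurity enough
)
# pre_pruned.fit(X_train, y_train)   # X_train, y_train as in the earlier train/test sketch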

How to Address Over-fitting
■ Post-pruning
  ■ Grow the decision tree to its entirety
  ■ Trim the nodes of the decision tree in a bottom-up fashion; if the generalization error improves after trimming, replace the sub-tree with a leaf node (a code sketch follows below)
  ■ The class label of the leaf node is determined from the majority class of instances in the sub-tree
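Cost-complexity pruning is one common way to post-prune in practice. A hedged sketch assuming scikit-learn: the tree is grown to its entirety, candidate pruning levels are enumerated, and the pruned tree that generalizes best on a validation set is kept.

# Post-pruning via cost-complexity pruning (illustrative; a separate validation set is assumed).
from sklearn.tree import DecisionTreeClassifier

def post_prune(X_train, y_train, X_valid, y_valid):
    full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)   # grow to its entirety
    path = full_tree.cost_complexity_pruning_path(X_train, y_train)            # candidate pruning levels
    best_tree, best_score = full_tree, full_tree.score(X_valid, y_valid)
    for alpha in path.ccp_alphas:
        pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
        score = pruned.score(X_valid, y_valid)
        if score >= best_score:                      # keep the trimmed tree if generalization improves
            best_tree, best_score = pruned, score
    return best_tree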

Other Issues
■ Missing values affect decision tree construction in three different ways:
  ■ How impurity measures are computed
  ■ How records with a missing value are distributed to child nodes
  ■ How a test record with a missing value is classified
■ Data Fragmentation
  ■ The number of records gets smaller as you traverse down the tree
  ■ The number of records at the leaf nodes could be too small to make any statistically significant decision
■ Difficult to interpret large-sized trees
  ■ The tree could be large because only a single attribute is used in each test condition
  ■ Oblique decision trees (test conditions that combine several attributes) address this
■ Tree Replication
  ■ The same sub-tree may appear in different parts of a decision tree
  ■ Constructive induction: create new attributes by combining existing attributes

Other Issues

Rule Based Classifier
■ Classify records by using a collection of "if…then…" rules
■ Rule: (Condition) → y
  ■ where
    ■ Condition is a conjunction of attribute tests
    ■ y is the class label
  ■ LHS: rule antecedent or condition
  ■ RHS: rule consequent
■ Examples of classification rules (a code sketch follows below):
  ■ (Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
  ■ (Taxable Income < 50K) ∧ (Refund=Yes) → Evade=No
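A minimal sketch of such a classifier in code, using the two example rules above plus a default class (the record format and the default label are assumptions):

# Illustrative rule-based classifier: each rule is (condition, class label); records are dicts.
rules = [
    (lambda r: r["Blood Type"] == "Warm" and r["Lay Eggs"] == "Yes", "Birds"),
    (lambda r: r["Taxable Income"] < 50_000 and r["Refund"] == "Yes", "No"),   # Evade = No
]

def classify(record, default="Unknown"):
    # Fire the first rule whose antecedent (condition) covers the record
    for condition, label in rules:
        if condition(record):
            return label
    return default                      # no rule covers the record

print(classify({"Blood Type": "Warm", "Lay Eggs": "Yes"}))   # Birds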

Application of Rule Based Classifier

Rule Coverage and Accuracy

Building Classification Rules
■ Direct Method:
  ■ Extract rules directly from data
  ■ e.g., RIPPER, CN2, Holte's 1R
■ Indirect Method:
  ■ Extract rules from other classification models (e.g., decision trees, neural networks, etc.)
  ■ e.g., C4.5 rules (extract rules from a decision tree; a sketch follows below)
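One way to see the indirect method in practice: scikit-learn can print a trained decision tree in if/then form, where every root-to-leaf path corresponds to one rule. This is only a hedged illustration of the idea, not the C4.5rules algorithm itself; model and feature_names are assumed to come from the earlier training sketch.

# Dump a fitted decision tree as if/then text; each path from root to leaf is one rule.
from sklearn.tree import export_text

def tree_to_rules(model, feature_names):
    return export_text(model, feature_names=feature_names)

# print(tree_to_rules(model, feature_names=list(X_train.columns)))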


From Decision Trees to Rules

Rules can be Simplified

Scalable Decision Tree Induction Methods in Data Mining Studies
■ SLIQ (EDBT'96, Mehta et al.)
  ■ builds an index for each attribute; only the class list and the current attribute list reside in memory
■ SPRINT (VLDB'96, J. Shafer et al.)
  ■ constructs an attribute list data structure
■ PUBLIC (VLDB'98, Rastogi & Shim)
  ■ integrates tree splitting and tree pruning: stop growing the tree earlier
■ RainForest (VLDB'98, Gehrke, Ramakrishnan & Ganti)
  ■ separates the scalability aspects from the criteria that determine the quality of the tree
  ■ builds an AVC-list (attribute, value, class label)

Exercise

