
Department: Data Science

Year: 2nd Year


Course: Fundamentals of Data Science
Semester: 2nd Term
Academic Year: 2023/2024
Lecture No.: 4
Fundamentals of Data Science

LECTURE 4: CLASSIFICATION
4. Classification Models
Classification
• The process in which historical records are used to make a prediction
about an uncertain future.
• At a fundamental level, most data science problems can be
categorized into either class or numeric prediction problems.
• In classification or class prediction, one should try to use the
information from the predictors or independent variables to sort the
data samples into two or more distinct classes or buckets.
• In the case of numeric prediction, one would try to predict the
numeric value of a dependent variable using the values assumed by
the independent variables.
Classification

Target variable is categorical. Predictors could be of any data type.

Algorithms
Decision Trees
Rule induction
K-NN
Naive Bayesian
Neural Networks
Support Vector Machines
Decision Trees
• A decision tree is a supervised learning algorithm used for
both classification and regression problems.
• Simply put, it takes the form of a tree with branches
representing the potential answers to a given question.
• There are several metrics used to train decision trees.
• One of them is information gain, which is based on entropy.
Decision Trees
• Decision trees (also known as classification trees) are
probably one of the most intuitive and frequently used data
science techniques.
• From an analyst’s point of view, they are easy to set up.
• From a business user’s point of view, they are easy to
interpret.
• Classification trees are used to separate a dataset into
classes belonging to the response variable. Usually the
response variable has two classes: Yes or No (1 or 0).
Decision Trees
How It Works
• A decision tree model takes the form of a decision flowchart in which
an attribute is tested at each node.
• At the end of each path through the tree is a leaf node, where a
prediction is made.
• The nodes split the dataset into subsets.
• In a decision tree, the idea is to split the dataset based on the
homogeneity of the data.
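As a minimal illustration (this specific hand-written tree is an assumption for the golf example, not the tree derived later in the lecture), a decision tree prediction in Python is just a chain of attribute tests that ends at a leaf:

# Hypothetical decision flowchart for the golf example (illustration only).
# Each "if" is an attribute test at a node; each "return" is a leaf prediction.
def predict_play(outlook, humidity, windy):
    if outlook == "overcast":
        return "yes"
    elif outlook == "sunny":
        return "no" if humidity == "high" else "yes"
    else:  # rain
        return "no" if windy else "yes"

print(predict_play("sunny", "high", False))  # -> "no"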
Decision Trees
Entropy
• Entropy is an information theory metric that measures the impurity or
uncertainty in a group of observations.
• It determines how a decision tree chooses to split data.
Decision Trees
• Entropy is defined as log2 (1/p), or -log2 (p), where p is the
probability of an event occurring.
• If the probability is not identical for all events, a weighted expression
is needed and, thus, entropy, H, is adjusted as follows:

H = - Σ p_k log2 (p_k), summed over k = 1, 2, 3, . . ., m

where k = 1, 2, 3, . . ., m represents the m classes of the target variable,
and p_k represents the proportion of samples that belong to class k.
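A minimal Python sketch of this formula (the helper name entropy is just for illustration):

import math

def entropy(proportions):
    # H = -sum(p_k * log2(p_k)) over the classes; the 0 * log2(0) term is taken as 0.
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(entropy([0.5, 0.5]))  # 1.0: two equally likely classes, maximum uncertainty
print(entropy([1.0]))       # 0.0: a pure group of observations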
Decision Trees
• The Gini index (G) is similar to the entropy measure in its
characteristics and is defined as

G = 1 - Σ p_k^2, summed over k = 1, 2, 3, . . ., m

• The value of G ranges between 0 and a maximum of 0.5, but otherwise
it has properties identical to H, and either of these formulations can be
used to create partitions in the data.
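A matching sketch for the Gini index, using the same class proportions:

def gini(proportions):
    # G = 1 - sum(p_k^2); 0 for a pure node, at most 0.5 for a two-class target.
    return 1.0 - sum(p * p for p in proportions)

print(gini([0.5, 0.5]))  # 0.5: the maximum for two classes
print(gini([1.0]))       # 0.0: a pure partition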
Decision Trees

http://archive.ics.uci.edu/ml/datasets/

All datasets used in this book are available at the companion website.
Decision Trees
(Golf dataset table: the attribute columns are the predictors / attributes; the Play column is the target / class.)
Decision Trees
• Start by partitioning the data on each of the four
regular attributes.
• Let us start with Outlook.
• There are three categories for this variable:
sunny, overcast, and rain.
• We see that when Outlook is overcast, there are four examples,
and the outcome was Play = yes in all four cases, so the
proportion (ratio) of Play = yes examples in this partition is
100% or 1.0.
• If we split the dataset here, the resulting four-sample partition
will be 100% pure for Play = yes.
Decision Trees
• Mathematically, the entropy for this partition can be calculated as:

H_overcast = -(4/4) log2 (4/4) = 0

• Similarly, the entropy for the other two Outlook values (sunny and
rain) can be calculated.
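Continuing the Python sketch from the entropy section, and assuming the usual class counts of the standard golf dataset (sunny: 2 yes / 3 no, rain: 3 yes / 2 no; only the overcast counts are stated explicitly above), the three component entropies work out to:

H_overcast = entropy([4/4])        # = 0.0, the pure partition
H_sunny    = entropy([2/5, 3/5])   # ~ 0.971 (assumed counts)
H_rain     = entropy([3/5, 2/5])   # ~ 0.971 (assumed counts)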
Decision Trees
• The total information, I_Outlook, is calculated as the weighted sum of
these component entropies.
• There are four instances of Outlook = overcast; thus, the proportion
for overcast is p_overcast = 4/14.
• The other proportions are p_sunny = 5/14 and p_rain = 5/14:

I_Outlook = (4/14) H_overcast + (5/14) H_sunny + (5/14) H_rain
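In the running sketch, the weighted sum works out as follows (still using the assumed sunny and rain counts):

I_outlook = (4/14) * H_overcast + (5/14) * H_sunny + (5/14) * H_rain
print(round(I_outlook, 3))  # ~ 0.694 with the assumed counts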
Decision Trees
• Had the data not been partitioned along the three values of Outlook,
the total information would simply have been the weighted average of
the information for the two classes, whose overall proportions are
5/14 (Play = no) and 9/14 (Play = yes):

I_Outlook,no-partition = -(9/14) log2 (9/14) - (5/14) log2 (5/14) ≈ 0.940
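This value follows directly from the stated class proportions and can be checked with the entropy helper sketched earlier:

I_no_partition = entropy([9/14, 5/14])
print(round(I_no_partition, 3))  # ~ 0.940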
Decision Trees
• By creating these splits or partitions, some entropy has been
reduced (and, thus, some information has been gained).
• This is called, aptly enough, information gain.
• In the case of Outlook, it is given simply by:

Gain(Outlook) = I_Outlook,no-partition - I_Outlook
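Putting the two quantities from the sketch together (still under the assumed sunny and rain counts):

gain_outlook = I_no_partition - I_outlook
print(round(gain_outlook, 3))  # ~ 0.247: the entropy reduced by splitting on Outlook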
Decision Trees (Assignment)
• Similar information gain values for the other three attributes can now
be computed, as shown in Table 4.2.
• It is clear that partitioning the dataset into three sets along the three
values of Outlook yields the largest information gain.
Decision Tree

When to Stop Splitting Data?


• In real-world datasets, it is very unlikely that to get terminal nodes
that are 100% homogeneous as was just seen in the golf dataset.
• In this case, the algorithm would need to be instructed when to stop.
• There are several situations where the process can be terminated:
1- No attribute satisfies a minimum information gain threshold (such
as the one computed in Table 4.2).
2- A maximal depth is reached: as the tree grows larger, not only
does interpretation get harder.
3- There are less than a certain number of examples in the current
subtree.
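These three stopping criteria correspond to common hyperparameters in library implementations. For example, scikit-learn's DecisionTreeClassifier exposes them roughly as follows (the parameter values here are arbitrary illustrations, not recommendations):

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    criterion="entropy",          # split quality measured with entropy / information gain
    min_impurity_decrease=0.01,   # 1- a minimum impurity-reduction (information-gain-style) threshold
    max_depth=4,                  # 2- stop once a maximal depth is reached
    min_samples_split=5,          # 3- do not split nodes with too few examples
)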
Decision Tree

Now the application of the decision tree algorithm can be
summarized in this simple five-step process:
1. Using Shannon entropy, sort the dataset into
homogeneous (by class) and non-homogeneous variables.
Homogeneous variables have low information entropy;
non-homogeneous variables have high information entropy.
This was done in the calculation of I_Outlook,no-partition.
Decision Tree

2. Weight the influence of each independent variable on the
target variable using entropy-weighted averages.
This was done during the calculation of I_Outlook in the
example.
Decision Tree

3. Compute the information gain, which is essentially
the reduction in the entropy of the target variable due
to its relationship with each independent variable.
This is simply the difference between the information
entropy found in step 1 and the joint (weighted) entropy
calculated in step 2.
This was done during the calculation of I_Outlook,no-partition - I_Outlook.
Decision Tree

4. The independent variable with the highest
information gain becomes the root, or the first
node on which the dataset is divided.
This was done using the calculation of the information
gain table (Table 4.2).
Decision Tree

5. Repeat this process for each variable for which the
Shannon entropy is nonzero. If the entropy of a
variable is zero, then that variable becomes a “leaf”
node.
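The five steps can be collected into one short, self-contained sketch that selects the root attribute of the golf dataset by information gain (the in-memory rows below use the standard version of this dataset and are an assumption, not copied from the lecture; step 5 would simply repeat the same selection recursively on each partition):

import math
from collections import Counter

# Golf dataset, standard version (Temperature omitted to keep the sketch short).
rows = [
    {"Outlook": "sunny",    "Humidity": "high",   "Windy": False, "Play": "no"},
    {"Outlook": "sunny",    "Humidity": "high",   "Windy": True,  "Play": "no"},
    {"Outlook": "overcast", "Humidity": "high",   "Windy": False, "Play": "yes"},
    {"Outlook": "rain",     "Humidity": "high",   "Windy": False, "Play": "yes"},
    {"Outlook": "rain",     "Humidity": "normal", "Windy": False, "Play": "yes"},
    {"Outlook": "rain",     "Humidity": "normal", "Windy": True,  "Play": "no"},
    {"Outlook": "overcast", "Humidity": "normal", "Windy": True,  "Play": "yes"},
    {"Outlook": "sunny",    "Humidity": "high",   "Windy": False, "Play": "no"},
    {"Outlook": "sunny",    "Humidity": "normal", "Windy": False, "Play": "yes"},
    {"Outlook": "rain",     "Humidity": "normal", "Windy": False, "Play": "yes"},
    {"Outlook": "sunny",    "Humidity": "normal", "Windy": True,  "Play": "yes"},
    {"Outlook": "overcast", "Humidity": "high",   "Windy": True,  "Play": "yes"},
    {"Outlook": "overcast", "Humidity": "normal", "Windy": False, "Play": "yes"},
    {"Outlook": "rain",     "Humidity": "high",   "Windy": True,  "Play": "no"},
]

def entropy(labels):
    # Step 1: Shannon entropy of a list of class labels.
    counts = Counter(labels)
    return -sum((c / len(labels)) * math.log2(c / len(labels)) for c in counts.values())

def information_gain(rows, attribute, target="Play"):
    # Step 2: entropy-weighted average over the attribute's partitions.
    # Step 3: gain = entropy before splitting minus that weighted average.
    before = entropy([r[target] for r in rows])
    weighted = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        weighted += len(subset) / len(rows) * entropy(subset)
    return before - weighted

# Step 4: the attribute with the highest information gain becomes the root node.
gains = {a: information_gain(rows, a) for a in ("Outlook", "Humidity", "Windy")}
print(gains)                      # Outlook should come out highest (~0.247)
print(max(gains, key=gains.get))  # -> "Outlook"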
Tree to Rules
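(Each root-to-leaf path of the finished tree can be read off as an IF-THEN rule. Assuming the standard golf tree with Outlook at the root, two illustrative rules would be: IF Outlook = overcast THEN Play = yes; IF Outlook = sunny AND Humidity = high THEN Play = no.)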
