
Department: Data Science

Year: 2nd Year


Course: Fundamentals of Data Science
Semester: 2nd Term
Academic Year: 2023/2024
Lecture No.: 4
Fundamentals of Data Science

LECTURE 4: CLASSIFICATION
4. Classification Models
Classification
• The process in which historical records are used to make a prediction
about an uncertain future.
• At a fundamental level, most data science problems can be
categorized into either class or numeric prediction problems.
• In classification or class prediction, one should try to use the
information from the predictors or independent variables to sort the
data samples into two or more distinct classes or buckets.
• In the case of numeric prediction, one would try to predict the
numeric value of a dependent variable using the values assumed by
the independent variables.
Classification

Target variable is categorical. Predictors could be of any data type.

Algorithms
Decision Trees
Rule induction
K-NN
Naive Bayesian
Neural Networks
Support Vector Machines
Decision Trees
• A decision tree is a supervised learning algorithm used for
both classification and regression problems.
• Simply put, it takes the form of a tree with branches
representing the potential answers to a given question.
• There are several metrics used to train decision trees.
• One of them is information gain, which is based on entropy.
Decision Trees
• Decision trees (also known as classification trees) are
probably one of the most intuitive and frequently used data
science techniques.
• From an analyst’s point of view, they are easy to set up.
• From a business user’s point of view, they are easy to
interpret.
• Classification trees are used to separate a dataset into
classes belonging to the response variable. Usually the
response variable has two classes: Yes or No (1 or 0).
Decision Trees
How It Works
• A decision tree model takes the form of a decision flowchart in which
an attribute is tested at each node.
• At the end of each path through the tree is a leaf node, where a
prediction is made.
• The nodes split the dataset into subsets.
• In a decision tree, the idea is to split the dataset based on the
homogeneity of the data.
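As a minimal illustration (this specific hand-written tree is an assumption for the golf example, not the tree derived later in the lecture), a decision tree prediction in Python is just a chain of attribute tests that ends at a leaf:

# Hypothetical decision flowchart for the golf example (illustration only).
# Each "if" is an attribute test at a node; each "return" is a leaf prediction.
def predict_play(outlook, humidity, windy):
    if outlook == "overcast":
        return "yes"
    elif outlook == "sunny":
        return "no" if humidity == "high" else "yes"
    else:  # rain
        return "no" if windy else "yes"

print(predict_play("sunny", "high", False))  # -> "no"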
Decision Trees
Entropy
• Entropy is an information theory metric that measures the impurity or
uncertainty in a group of observations.
• It determines how a decision tree chooses to split data.
Decision Trees
• Entropy is defined as log2 (1/p), or -log2 (p), where p is the
probability of an event occurring.
• If the probability is not identical for all events, a weighted expression
is needed and, thus, entropy, H, is adjusted as follows:

H = - Σ p_k log2 (p_k), summed over k = 1, 2, 3, . . ., m

where k = 1, 2, 3, . . ., m represents the m classes of the target variable,
and p_k represents the proportion of samples that belong to class k.
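A minimal Python sketch of this formula (the helper name entropy is just for illustration):

import math

def entropy(proportions):
    # H = -sum(p_k * log2(p_k)) over the classes; the 0 * log2(0) term is taken as 0.
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(entropy([0.5, 0.5]))  # 1.0: two equally likely classes, maximum uncertainty
print(entropy([1.0]))       # 0.0: a pure group of observations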
Decision Trees
• The Gini index (G) is similar to the entropy measure in its
characteristics and is defined as

G = 1 - Σ p_k^2, summed over k = 1, 2, 3, . . ., m

• The value of G ranges between 0 and a maximum of 0.5, but otherwise
it has properties identical to H, and either of these formulations can be
used to create partitions in the data.
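A matching sketch for the Gini index, using the same class proportions:

def gini(proportions):
    # G = 1 - sum(p_k^2); 0 for a pure node, at most 0.5 for a two-class target.
    return 1.0 - sum(p * p for p in proportions)

print(gini([0.5, 0.5]))  # 0.5: the maximum for two classes
print(gini([1.0]))       # 0.0: a pure partition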
Decision Trees

http://archive.ics.uci.edu/ml/datasets/

All datasets used in this book are available at the companion website.
Decision Trees
(Golf dataset table: the attribute columns are the predictors / attributes; the Play column is the target / class.)
Decision Trees
• Start by partitioning the data on each of the four
regular attributes.
• Let us start with Outlook.
• There are three categories for this variable:
sunny, overcast, and rain.
• We see that when Outlook is overcast, there are four examples,
and the outcome was Play = yes in all four cases, so the
proportion (ratio) of Play = yes examples in this partition is
100% or 1.0.
• If we split the dataset here, the resulting four-sample partition
will be 100% pure for Play = yes.
Decision Trees
• Mathematically, the entropy for this partition can be calculated as:

H_overcast = -(4/4) log2 (4/4) = 0

• Similarly, the entropy for the other two Outlook values (sunny and
rain) can be calculated.
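Continuing the Python sketch from the entropy section, and assuming the usual class counts of the standard golf dataset (sunny: 2 yes / 3 no, rain: 3 yes / 2 no; only the overcast counts are stated explicitly above), the three component entropies work out to:

H_overcast = entropy([4/4])        # = 0.0, the pure partition
H_sunny    = entropy([2/5, 3/5])   # ~ 0.971 (assumed counts)
H_rain     = entropy([3/5, 2/5])   # ~ 0.971 (assumed counts)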
Decision Trees
• The total information, I_Outlook, is calculated as the weighted sum of
these component entropies.
• There are four instances of Outlook = overcast; thus, the proportion
for overcast is p_overcast = 4/14.
• The other proportions are p_sunny = 5/14 and p_rain = 5/14:

I_Outlook = (4/14) H_overcast + (5/14) H_sunny + (5/14) H_rain
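In the running sketch, the weighted sum works out as follows (still using the assumed sunny and rain counts):

I_outlook = (4/14) * H_overcast + (5/14) * H_sunny + (5/14) * H_rain
print(round(I_outlook, 3))  # ~ 0.694 with the assumed counts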
Decision Trees
• Had the data not been partitioned along the three values of Outlook,
the total information would simply have been the weighted average of
the information for the two classes, whose overall proportions are
5/14 (Play = no) and 9/14 (Play = yes):

I_Outlook,no-partition = -(9/14) log2 (9/14) - (5/14) log2 (5/14) ≈ 0.940
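This value follows directly from the stated class proportions and can be checked with the entropy helper sketched earlier:

I_no_partition = entropy([9/14, 5/14])
print(round(I_no_partition, 3))  # ~ 0.940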
Decision Trees
• By creating these splits or partitions, some entropy has been
reduced (and, thus, some information has been gained).
• This is called, aptly enough, information gain.
• In the case of Outlook, it is given simply by:

Gain(Outlook) = I_Outlook,no-partition - I_Outlook
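Putting the two quantities from the sketch together (still under the assumed sunny and rain counts):

gain_outlook = I_no_partition - I_outlook
print(round(gain_outlook, 3))  # ~ 0.247: the entropy reduced by splitting on Outlook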
Decision Trees (Assignment)
• Similar information gain values for the other three attributes can now
be computed, as shown in Table 4.2.
• It is clear that partitioning the dataset into three sets along the three
values of Outlook yields the largest information gain.
Decision Tree

When to Stop Splitting Data?


• In real-world datasets, it is very unlikely that to get terminal nodes
that are 100% homogeneous as was just seen in the golf dataset.
• In this case, the algorithm would need to be instructed when to stop.
• There are several situations where the process can be terminated:
1- No attribute satisfies a minimum information gain threshold (such
as the one computed in Table 4.2).
2- A maximal depth is reached: as the tree grows larger, not only
does interpretation get harder.
3- There are less than a certain number of examples in the current
subtree.
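These three stopping criteria correspond to common hyperparameters in library implementations. For example, scikit-learn's DecisionTreeClassifier exposes them roughly as follows (the parameter values here are arbitrary illustrations, not recommendations):

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    criterion="entropy",          # split quality measured with entropy / information gain
    min_impurity_decrease=0.01,   # 1- a minimum impurity-reduction (information-gain-style) threshold
    max_depth=4,                  # 2- stop once a maximal depth is reached
    min_samples_split=5,          # 3- do not split nodes with too few examples
)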
Decision Tree

Now the application of the decision tree algorithm can be
summarized in this simple five-step process:
1. Using Shannon entropy, sort the dataset into
homogeneous (by class) and non-homogeneous variables.
Homogeneous variables have low information entropy;
non-homogeneous variables have high information entropy.
This was done in the calculation of I_Outlook,no-partition.
Decision Tree

2. Weight the influence of each independent variable on the
target variable using entropy-weighted averages.
This was done during the calculation of I_Outlook in the
example.
Decision Tree

3. Compute the information gain, which is essentially
the reduction in the entropy of the target variable due
to its relationship with each independent variable.
This is simply the difference between the information
entropy found in step 1 and the joint (weighted) entropy
calculated in step 2.
This was done during the calculation of I_Outlook,no-partition - I_Outlook.
Decision Tree

4. The independent variable with the highest
information gain becomes the root, or the first
node on which the dataset is divided.
This was done using the calculation of the information
gain table (Table 4.2).
Decision Tree

5. Repeat this process for each variable for which the
Shannon entropy is nonzero. If the entropy of a
variable is zero, then that variable becomes a “leaf”
node.
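The five steps can be collected into one short, self-contained sketch that selects the root attribute of the golf dataset by information gain (the in-memory rows below use the standard version of this dataset and are an assumption, not copied from the lecture; step 5 would simply repeat the same selection recursively on each partition):

import math
from collections import Counter

# Golf dataset, standard version (Temperature omitted to keep the sketch short).
rows = [
    {"Outlook": "sunny",    "Humidity": "high",   "Windy": False, "Play": "no"},
    {"Outlook": "sunny",    "Humidity": "high",   "Windy": True,  "Play": "no"},
    {"Outlook": "overcast", "Humidity": "high",   "Windy": False, "Play": "yes"},
    {"Outlook": "rain",     "Humidity": "high",   "Windy": False, "Play": "yes"},
    {"Outlook": "rain",     "Humidity": "normal", "Windy": False, "Play": "yes"},
    {"Outlook": "rain",     "Humidity": "normal", "Windy": True,  "Play": "no"},
    {"Outlook": "overcast", "Humidity": "normal", "Windy": True,  "Play": "yes"},
    {"Outlook": "sunny",    "Humidity": "high",   "Windy": False, "Play": "no"},
    {"Outlook": "sunny",    "Humidity": "normal", "Windy": False, "Play": "yes"},
    {"Outlook": "rain",     "Humidity": "normal", "Windy": False, "Play": "yes"},
    {"Outlook": "sunny",    "Humidity": "normal", "Windy": True,  "Play": "yes"},
    {"Outlook": "overcast", "Humidity": "high",   "Windy": True,  "Play": "yes"},
    {"Outlook": "overcast", "Humidity": "normal", "Windy": False, "Play": "yes"},
    {"Outlook": "rain",     "Humidity": "high",   "Windy": True,  "Play": "no"},
]

def entropy(labels):
    # Step 1: Shannon entropy of a list of class labels.
    counts = Counter(labels)
    return -sum((c / len(labels)) * math.log2(c / len(labels)) for c in counts.values())

def information_gain(rows, attribute, target="Play"):
    # Step 2: entropy-weighted average over the attribute's partitions.
    # Step 3: gain = entropy before splitting minus that weighted average.
    before = entropy([r[target] for r in rows])
    weighted = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        weighted += len(subset) / len(rows) * entropy(subset)
    return before - weighted

# Step 4: the attribute with the highest information gain becomes the root node.
gains = {a: information_gain(rows, a) for a in ("Outlook", "Humidity", "Windy")}
print(gains)                      # Outlook should come out highest (~0.247)
print(max(gains, key=gains.get))  # -> "Outlook"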
Tree to Rules
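(Each root-to-leaf path of the finished tree can be read off as an IF-THEN rule. Assuming the standard golf tree with Outlook at the root, two illustrative rules would be: IF Outlook = overcast THEN Play = yes; IF Outlook = sunny AND Humidity = high THEN Play = no.)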
