Decision Trees and Regression Techniques
Module 3
Decision Tree
• Introduction
In a decision tree, to predict the class label for a record we start from the root of the tree. We compare the value of the
root attribute with the corresponding attribute of the record and, on the basis of this comparison, follow the branch
corresponding to that value and jump to the next node.
It is called a decision tree because, like a tree, it starts with a root node that expands into further branches, forming a
tree-like structure.
In order to build a tree, we use the CART algorithm, which stands for Classification and Regression Tree algorithm.
A decision tree simply asks a question and, based on the answer (Yes/No), further splits into subtrees.
A decision tree can contain categorical data (YES/NO) as well as numeric data.
Why use Decision Trees?
There are various algorithms in machine learning, so choosing the best algorithm for the given dataset and problem is a
key step when creating a machine learning model. Below are two reasons for using a decision tree:
•Decision trees usually mimic the way humans think when making decisions, so they are easy to understand.
•The logic behind the decision tree can be easily understood because it shows a tree-like structure.
Decision Tree Terminologies
•Root Node: The root node is where the decision tree starts. It represents the entire dataset, which is further divided into two or
more homogeneous sets.
•Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split further once a leaf node is reached.
•Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given conditions.
•Branch/Sub Tree: A subtree formed by splitting a node of the tree.
•Pruning: Pruning is the process of removing the unwanted branches from the tree.
•Parent/Child node: A node that splits into sub-nodes is called the parent node, and the sub-nodes are called its child nodes.
Decision tree learners build a model in the form of a tree structure. The model itself comprises a series
of logical decisions, similar to a flowchart, with decision nodes that indicate a decision to be made on
an attribute. These split into branches that indicate the decision's choices. The tree is terminated by leaf
nodes (also known as terminal nodes) that denote the result of following a combination of decisions.
Data to be classified begins at the root node, where it is passed through the various decisions in the tree
according to the values of its features. The path that the data takes funnels each record into a leaf node,
which assigns it a predicted class.
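To make this traversal concrete, the following minimal Python sketch stores a hypothetical tree as nested dictionaries and routes a record from the root to a leaf; the feature names, thresholds, and class labels are invented purely for illustration.

# Hypothetical tree: internal nodes test a feature against a threshold;
# plain strings are leaf nodes carrying the predicted class.
tree = {
    "feature": "income",
    "threshold": 40000,
    "left":  "reject",                     # income < 40000
    "right": {                             # income >= 40000
        "feature": "debt_ratio",
        "threshold": 0.35,
        "left":  "approve",
        "right": "reject",
    },
}

def classify(record, node):
    # Follow branches until a leaf (a plain string) is reached.
    while isinstance(node, dict):
        branch = "left" if record[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node

print(classify({"income": 52000, "debt_ratio": 0.2}, tree))   # -> approve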
As the decision tree is essentially a flowchart, it is particularly appropriate for applications in which the
classification mechanism needs to be transparent for legal reasons or the results need to be shared in order
to facilitate decision making. Some potential uses include:
• Credit scoring models in which the criteria that cause an applicant to be rejected need to be well-specified
• Marketing studies of customer churn or customer satisfaction that will be shared with management or
advertising agencies
• Diagnosis of medical conditions based on laboratory measurements, symptoms, or rate of disease
progression
Divide and conquer
Decision trees are built using a heuristic called recursive partitioning. This approach is generally known as
divide and conquer because it uses the feature values to split the data into smaller and smaller subsets of
similar classes.
Beginning at the root node, which represents the entire dataset, the algorithm chooses a feature that is
the most predictive of the target class. The examples are then partitioned into groups of distinct values
of this feature; this decision forms the first set of tree branches. The algorithm continues to divide-and-
conquer the nodes, choosing the best candidate feature each time until a stopping criterion is reached.
This might occur at a node if:
• All (or nearly all) of the examples at the node have the same class
• There are no remaining features to distinguish among examples
• The tree has grown to a predefined size limit
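A minimal Python sketch of this divide-and-conquer loop is shown below. The helper choose_best_feature is a placeholder introduced here for illustration (a real learner such as C5.0 would pick the feature with the highest information gain, discussed later), and the stopping tests mirror the three criteria above.

from collections import Counter

def choose_best_feature(examples, features):
    # Placeholder selection rule for this sketch: pick the first remaining feature.
    # A real learner would pick the feature with the highest information gain.
    return features[0]

def build_tree(examples, features, max_depth=5, depth=0):
    # `examples` is a list of (feature_dict, label) pairs.
    labels = [label for _, label in examples]

    # Stopping criteria: the node is pure, no features remain, or a size limit is reached.
    if len(set(labels)) == 1 or not features or depth >= max_depth:
        return Counter(labels).most_common(1)[0][0]          # leaf: majority class

    feature = choose_best_feature(examples, features)
    node = {"feature": feature, "branches": {}}

    # Divide: partition the examples by the distinct values of the chosen feature.
    partitions = {}
    for record, label in examples:
        partitions.setdefault(record[feature], []).append((record, label))

    # Conquer: build a subtree for each partition, without reusing the chosen feature.
    remaining = [f for f in features if f != feature]
    for value, subset in partitions.items():
        node["branches"][value] = build_tree(subset, remaining, max_depth, depth + 1)
    return node

# Toy usage with made-up data:
toy = [({"outlook": "sunny"}, "yes"), ({"outlook": "rainy"}, "no"), ({"outlook": "sunny"}, "yes")]
print(build_tree(toy, ["outlook"]))   # {'feature': 'outlook', 'branches': {'sunny': 'yes', 'rainy': 'no'}}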
To illustrate the tree building process, let's consider a simple example. Imagine that you are working for a
Hollywood film studio, and your desk is piled high with screenplays. Rather than read each one cover-to-
cover, you decide to develop a decision tree algorithm to predict whether a potential movie would fall into
one of three categories: mainstream hit, critic's choice, or box office bust.
To gather data for your model, you turn to the studio archives to examine the previous ten years of
movie releases.
After reviewing the data for 30 different movie scripts, a pattern emerges.
There seems to be a relationship between the film's proposed shooting budget, the number of A-list
celebrities lined up for starring roles, and the film's category of success.
A scatter plot of this data might look something like the following diagram:
To build a simple decision tree using this data, we can apply a divide-and-conquer strategy. Let's first split
on the feature indicating the number of celebrities, partitioning the movies into groups with and without a low
number of A-list stars:
Next, among the group of movies with a larger number of celebrities, we can make another split between
movies with and without a high budget:
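Hard-coding those two manual splits yields a very small tree, sketched below in Python; the thresholds and the class assigned to each leaf are illustrative assumptions, since they would depend on the studio data.

def predict_movie(celebrity_count, budget_millions):
    # First split: number of A-list celebrities (illustrative threshold).
    if celebrity_count < 3:
        return "box office bust"
    # Second split, within the high-celebrity group: proposed shooting budget.
    if budget_millions > 50:
        return "mainstream hit"
    return "critic's choice"

print(predict_movie(celebrity_count=6, budget_millions=30))   # -> critic's choice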
Function C5.0(S)
This function applies the divide-and-conquer strategy to the example set S to create a decision tree DT.
If all the examples in S belong to the same class (or another stopping criterion is met):
    Return a leaf node labelled with the most common class in S
Else:
    i) Choose the attribute A with the highest information gain and partition S into example sets Si, one for each value Vi of A
    ii) Use C5.0() to construct a decision tree DTi for each example set Si
    iii) Return a decision node that tests A, with branch Vi leading to subtree DTi
• There are numerous implementations of decision trees, but one of the most well-known is the C5.0
algorithm.
• This algorithm was developed by computer scientist J. Ross Quinlan as an improved version of his prior
algorithm, C4.5, which itself is an improvement over his ID3 (Iterative Dichotomiser 3) algorithm.
• The C5.0 algorithm has become an industry standard for producing decision trees because it does well
on most types of problems directly out of the box. Compared to more advanced machine learning models (such as
neural networks and Support Vector Machines), the decision trees built by the C5.0 algorithm generally perform nearly as well
but are much easier to understand and deploy.
• The first challenge that a decision tree faces is to identify which feature to split upon. If a
segment of the data contains only a single class, it is considered pure.
C5.0 uses the concept of entropy for measuring purity. The entropy of a sample of data indicates how
mixed the class values are; the minimum value of 0 indicates that the sample is completely
homogeneous, while the maximum value (1 for two classes, log2(c) in general) indicates the greatest amount of disorder.
The definition of entropy can be specified as:
Entropy(S) = Σ (from i = 1 to c) −Pi × log2(Pi)
For a given segment of data (S), the term c refers to the number of class levels and Pi refers to the
proportion of values falling into class level i. For example, suppose we have a partition of data with two
classes: red (60 percent) and white (40 percent). We can calculate the entropy as follows:
Entropy(S) = −0.60 × log2(0.60) − 0.40 × log2(0.40) = 0.9709506
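As a quick check of that number, here is a short Python sketch of the entropy calculation; the 60/40 red and white proportions come from the example above.

from math import log2

def entropy(proportions):
    # Entropy of a segment: minus the sum of p_i * log2(p_i) over the class proportions.
    return -sum(p * log2(p) for p in proportions if p > 0)

print(entropy([0.60, 0.40]))   # 0.9709505944546686
print(entropy([0.50, 0.50]))   # 1.0  (maximum disorder for two classes)
print(entropy([1.00]))         # 0.0  (completely homogeneous)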
To use entropy to determine the optimal feature to split upon, the algorithm calculates the change in homogeneity that
would result from a split on each possible feature, a measure known as information gain. The information gain for a feature F is
calculated as the difference between the entropy in the segment before the split (S1) and the partitions resulting from
the split (S2):
InfoGain(F)=Entropy(S1)−Entropy(S2)
One complication is that after a split, the data is divided into more than one partition. Therefore, the
function to calculate Entropy(S2) needs to consider the total entropy across all of the partitions. It does this
by weighting each partition's entropy by the proportion of records falling into that partition.
This can be stated in a formula as:
Entropy(S2) = Σ (from i = 1 to n) wi × Entropy(Pi)
In simple terms, the total entropy resulting from a split is the sum of the entropy of each of the n partitions,
weighted by the proportion of examples falling into that partition (wi).
The higher the information gain, the better a feature is at creating homogeneous groups after a split
on this feature. If the information gain is zero, there is no reduction in entropy for splitting on this
feature.
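The weighted-entropy and information-gain calculations can be sketched in Python as follows; the ten-record split below is hypothetical and is only meant to show the arithmetic.

from math import log2

def entropy(labels):
    # Entropy computed from a list of class labels rather than precomputed proportions.
    total = len(labels)
    props = [labels.count(c) / total for c in set(labels)]
    return -sum(p * log2(p) for p in props if p > 0)

def information_gain(parent_labels, partitions):
    # Entropy(S1) minus the weighted entropy of the partitions (S2).
    total = len(parent_labels)
    weighted = sum(len(part) / total * entropy(part) for part in partitions)
    return entropy(parent_labels) - weighted

# Hypothetical split: 10 records divided into two partitions by some feature F.
parent = ["red"] * 6 + ["white"] * 4
split  = [["red"] * 5 + ["white"] * 1, ["red"] * 1 + ["white"] * 3]
print(information_gain(parent, split))   # about 0.256, so the split reduces entropy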
Pruning the Decision Tree
Decision trees can continue to grow indefinitely, choosing splitting features and dividing the data into
smaller and smaller partitions until each example is perfectly classified or the algorithm runs out of features
to split on. However, if the tree grows overly large, many of the decisions it makes will be overly specific and
the model will overfit the training data. To overcome this, we can prune a decision tree, a process of
reducing its size so that it generalizes better to unseen data.
One solution to this problem is to stop the tree from growing once it reaches a certain number of decisions,
or if the decision node contains only a small number of examples. This method is known as pre-pruning, or
early stopping the decision tree. As the tree avoids doing unnecessary work, this can quite often be an
appealing strategy.
One of the benefits of the C5.0 algorithm is that it is opinionated about pruning: it takes care of many of
the decisions automatically, using fairly reasonable defaults. Its overall strategy is to post-prune the tree. It
first grows a large tree that overfits the training data. Later, nodes and branches that have little effect on
the classification errors are removed. In some cases, entire branches are moved further up the tree or
replaced by simpler decisions. These processes of grafting branches are known as subtree raising and
subtree replacement, respectively.
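C5.0 itself ships as an R package, but the pre-pruning and post-pruning ideas can be illustrated with scikit-learn's CART-style decision tree in Python; this is an analogous learner rather than C5.0, and the parameter values below are arbitrary.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning (early stopping): cap the tree's size while it grows.
pre_pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10, random_state=0)
pre_pruned.fit(X_train, y_train)

# Post-pruning: grow a full tree, then trim branches that contribute little,
# here via cost-complexity pruning (ccp_alpha > 0).
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
post_pruned.fit(X_train, y_train)

print("pre-pruned  test accuracy:", pre_pruned.score(X_test, y_test))
print("post-pruned test accuracy:", post_pruned.score(X_test, y_test))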
Understanding classification rules
Classification rules represent knowledge in the form of logical if-else statements that assign a class to
unlabelled examples. They are specified in terms of an antecedent and a consequent; these form a
hypothesis stating that "if this happens, then that happens."
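For instance, a single hypothetical rule, with its antecedent (the "if" part) and consequent (the "then" part), might be sketched in Python as follows; the feature names and threshold are invented for illustration.

# One hypothetical classification rule: antecedent -> consequent.
# IF operating_temperature > 90 AND vibration_level == "high" THEN class = "failure risk"

def apply_rule(record):
    if record["operating_temperature"] > 90 and record["vibration_level"] == "high":
        return "failure risk"   # consequent assigned when the antecedent holds
    return None                 # rule does not fire; other rules would be tried

print(apply_rule({"operating_temperature": 95, "vibration_level": "high"}))  # -> failure risk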
Rule learners are often used in a manner similar to decision tree learners. Like decision trees, they can be
used for applications that generate knowledge for future action, such as:
• Identifying conditions that lead to a hardware failure in mechanical devices
• Describing the defining characteristics of groups of people for customer segmentation
• Finding conditions that precede large drops or increases in the prices of shares on the stock market