
Decision Trees and Regression Techniques

Module 3
Decision Tree

• Introduction

Classification is a two-step process in machine learning: a learning step and a prediction step. In the learning step, the model is developed from the given training data. In the prediction step, the model is used to predict the response for new data. The decision tree is one of the easiest and most popular classification algorithms to understand and interpret.
 A decision tree is a supervised learning technique that can be used for both classification and regression problems, though it is mostly preferred for solving classification problems. It is a tree-structured classifier in which internal nodes represent the features of a dataset, branches represent decision rules, and each leaf node represents an outcome.
 The goal of using a decision tree is to create a model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior data (training data).

 To predict a class label for a record, we start at the root of the tree and compare the value of the root attribute with the record's attribute. Based on this comparison, we follow the branch corresponding to that value and jump to the next node.

 It is called a decision tree because, like a tree, it starts with a root node that expands into further branches, constructing a tree-like structure.

 One common algorithm for building such a tree is CART, which stands for Classification and Regression Tree.

 A decision tree simply asks a question and, based on the answer (Yes/No), further splits into subtrees.

 A decision tree can handle categorical data (Yes/No) as well as numeric data.
Why use Decision Trees?
There are many algorithms in machine learning, and choosing the best one for the given dataset and problem is a key consideration when creating a machine learning model. Two main reasons for using a decision tree are:

•Decision trees usually mimic the way humans think when making a decision, so they are easy to understand.

•The logic behind a decision tree is easy to follow because it has a tree-like structure.
Decision Tree Terminologies

•Root Node: The root node is where the decision tree starts. It represents the entire dataset, which is then divided into two or more homogeneous sets.
•Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split further once a leaf node is reached.
•Splitting: Splitting is the process of dividing a decision node/root node into sub-nodes according to the given conditions.
•Branch/Sub-Tree: A subtree formed by splitting the tree.
•Pruning: Pruning is the process of removing unwanted branches from the tree.
•Parent/Child Node: A node that is divided into sub-nodes is called a parent node, and the sub-nodes are its child nodes.

Decision tree learners build a model in the form of a tree structure. The model comprises a series of logical decisions, similar to a flowchart, with decision nodes that indicate a decision to be made on an attribute. These split into branches that indicate the decision's choices. The tree is terminated by leaf nodes (also known as terminal nodes) that denote the result of following a combination of decisions.

Data to be classified begins at the root node, where it is passed through the various decisions in the tree according to the values of its features. The path that the data takes funnels each record into a leaf node, which assigns it a predicted class.

As the decision tree is essentially a flowchart, it is particularly appropriate for applications in which the
classification mechanism needs to be transparent for legal reasons or the results need to be shared in order
to facilitate decision making. Some potential uses include:
• Credit scoring models in which the criteria that cause an applicant to be rejected need to be well specified
• Marketing studies of customer churn or customer satisfaction that will be shared with management or
advertising agencies
• Diagnosis of medical conditions based on laboratory measurements, symptoms, or rate of disease
progression
Divide and conquer

Decision trees are built using a heuristic called recursive partitioning. This approach is generally known as
divide and conquer because it uses the feature values to split the data into smaller and smaller subsets of
similar classes.
Beginning at the root node, which represents the entire dataset, the algorithm chooses the feature that is the most predictive of the target class. The examples are then partitioned into groups according to the distinct values of this feature; this decision forms the first set of tree branches. The algorithm continues to divide and conquer the nodes, choosing the best candidate feature each time, until a stopping criterion is reached. This might occur at a node if:
• All (or nearly all) of the examples at the node have the same class
• There are no remaining features to distinguish among examples
• The tree has grown to a predefined size limit
To illustrate the tree building process, let's consider a simple example. Imagine that you are working for a
Hollywood film studio, and your desk is piled high with screenplays. Rather than read each one cover-to-
cover, you decide to develop a decision tree algorithm to predict whether a potential movie would fall into
one of three categories: mainstream hit, critic's choice, or box office bust.

 To gather data for your model, you turn to the studio archives to examine the previous ten years of
movie releases.
 After reviewing the data for 30 different movie scripts, a pattern emerges.
 There seems to be a relationship between the film's proposed shooting budget, the number of A-list
celebrities lined up for starring roles, and the categories of success
A scatter plot of this data might look something like the following diagram:
To build a simple decision tree using this data, we can apply a divide-and-conquer strategy. Let's first split on the feature indicating the number of celebrities, partitioning the movies into groups with a low and a high number of A-list stars:
Next, among the group of movies with a larger number of celebrities, we can make another split between
movies with and without a high budget:

At this point, we have partitioned the data into three groups. The group at the top-left corner of the diagram is composed entirely of critically acclaimed films. This group is distinguished by a high number of celebrities and a relatively low budget. At the top-right corner, the majority of movies are box office hits, with high budgets and a large number of celebrities. The final group, which has little star power but budgets ranging from small to large, contains the flops.
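The two manual splits just described can be sketched directly in R. The data frame below is hypothetical: the column names (celebrities, budget, category), the values, and the split thresholds are invented purely for illustration.

# Hypothetical movie data: celebrity count, budget in millions, and outcome
movies <- data.frame(
  celebrities = c(1, 8, 9, 2, 7),
  budget      = c(20, 95, 30, 60, 110),
  category    = c("flop", "hit", "critics choice", "flop", "hit")
)

# First split: movies with a low vs. high number of A-list celebrities
low_star  <- subset(movies, celebrities < 5)
high_star <- subset(movies, celebrities >= 5)

# Second split, applied only to the high-celebrity group: low vs. high budget
critics_group <- subset(high_star, budget < 50)    # mostly critically acclaimed films
hit_group     <- subset(high_star, budget >= 50)   # mostly box office hits

table(hit_group$category)                          # class counts within one partition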
Since real-world data contains more than two features, decision trees quickly become far more complex than this, with many more nodes, branches, and leaves. In the next section, you will learn about a popular algorithm for building decision tree models automatically.
The C5.0 decision tree algorithm

Function C5.0(S)

This function applies the divide-and-conquer strategy to the example set S to create a decision tree DT.

1. If all examples in S belong to the same class c, then:
      return a new leaf and label it with c
2. Else:
      a) Select an attribute A according to some impurity function.
      b) Generate a new node DT with A as a test.
      c) For each value vi of A:
           i.   Let Si = all examples in S with A = vi
                // Note: the remaining examples of S, with A != vi, fall into the other subsets
           ii.  Use C5.0(Si) to construct a decision tree DTi for the example set Si.
           iii. Generate an edge that connects DT and DTi.


• There are numerous implementations of decision trees, but one of the most well-known is the C5.0 algorithm.
• This algorithm was developed by computer scientist J. Ross Quinlan as an improved version of his prior algorithm, C4.5, which itself is an improvement over his ID3 (Iterative Dichotomiser 3) algorithm.
• The C5.0 algorithm has become the industry standard for producing decision trees, because it does well for most types of problems directly out of the box.
• Compared to more advanced and sophisticated machine learning models (e.g., neural networks and support vector machines), decision trees built with the C5.0 algorithm generally perform nearly as well but are much easier to understand and deploy.


Choosing the best split

• The first main challenge a decision tree faces is identifying which feature to split upon. If the segments of the data contain only a single class, they are considered pure.

C5.0 uses the concept of entropy for measuring purity. The entropy of a sample of data indicates how mixed the class values are; the minimum value of 0 indicates that the sample is completely homogeneous, while higher values indicate greater disorder (for two classes the maximum is 1; for c classes it is log2(c)). The definition of entropy can be specified as:

Entropy(S) = -Σ (i = 1 to c) pi * log2(pi)

For a given segment of data (S), the term c refers to the number of class levels and pi refers to the proportion of values falling into class level i. For example, suppose we have a partition of data with two classes: red (60 percent) and white (40 percent). We can calculate the entropy as follows:

> -0.60 * log2(0.60) - 0.40 * log2(0.40)
[1] 0.9709506
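The same calculation can be wrapped in a small helper that works for any number of classes. This is an illustrative function of our own, not part of any package:

entropy <- function(p) {
  # p is a vector of class proportions summing to 1
  p <- p[p > 0]            # drop empty classes, treating 0 * log2(0) as 0
  -sum(p * log2(p))
}

entropy(c(0.60, 0.40))     # 0.9709506, matching the two-class example above
entropy(c(1.00))           # 0, a completely pure segment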

# Examine the entropy for all the possible two-class arrangements


curve(-x * log2(x) - (1 - x) * log2(1 - x), col = "red", xlab = "x", ylab = "Entropy", lwd = 4)
•A 50-50 split results in maximum entropy; as one class increasingly dominates the other, the entropy reduces to zero.

To determine the optimal feature to split upon, C5.0 uses information gain, a measure based on entropy. The information gain for a feature F is calculated as the difference between the entropy of the segment before the split (S1) and the entropy of the partitions resulting from the split (S2):

InfoGain(F) = Entropy(S1) − Entropy(S2)

One complication is that, after a split, the data is divided into more than one partition. Therefore, the function to calculate Entropy(S2) needs to consider the total entropy across all of the partitions. It does this by weighting each partition's entropy by the proportion of records falling into that partition.
This can be stated in a formula as:

Entropy(S2) = Σ (i = 1 to n) wi * Entropy(Pi)

In simple terms, the total entropy resulting from a split is the sum of the entropy of each of the n partitions (Pi), weighted by the proportion of examples falling into that partition (wi).

The higher the information gain, the better a feature is at creating homogeneous groups after a split on this feature. If the information gain is zero, there is no reduction in entropy from splitting on this feature.
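As a worked sketch, suppose a hypothetical split sends 10 of 30 examples into one partition (all hits) and the remaining 20 into another (10 hits and 10 flops). The counts are invented purely for illustration, and the entropy() helper from above is repeated so the snippet stands alone:

entropy <- function(p) { p <- p[p > 0]; -sum(p * log2(p)) }

entropy_before <- entropy(c(20/30, 10/30))         # Entropy(S1): 20 hits, 10 flops
entropy_after  <- (10/30) * entropy(c(1.0)) +      # w1 * Entropy(P1): a pure partition
                  (20/30) * entropy(c(0.5, 0.5))   # w2 * Entropy(P2): a 50-50 partition

info_gain <- entropy_before - entropy_after        # InfoGain(F) = Entropy(S1) - Entropy(S2)
info_gain                                          # about 0.25 bits of entropy reduction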
Pruning the Decision Tree

Decision trees can continue to grow indefinitely, choosing splitting features and dividing the data into smaller and smaller partitions until each example is perfectly classified or the algorithm runs out of features to split on. However, if the tree grows overly large, many of the decisions it makes will be overly specific and the model will overfit the training data. To overcome this, we can prune a decision tree, which involves reducing its size so that it generalizes better to unobserved data.

One solution to this problem is to stop the tree from growing once it reaches a certain number of decisions, or when a decision node contains only a small number of examples. This is known as pre-pruning, or early stopping, the decision tree. Because the tree avoids doing unnecessary work, this can often be an appealing strategy.

One of the benefits of the C5.0 algorithm is that it is opinionated about pruning; it takes care of many of the decisions automatically, using fairly reasonable defaults. Its overall strategy is to post-prune the tree: it first grows a large tree that overfits the training data, and afterwards removes nodes and branches that have little effect on the classification errors. In some cases, entire branches are moved further up the tree or replaced by simpler decisions. These processes of grafting branches are known as subtree raising and subtree replacement, respectively.
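Both strategies can be explored in R through the C50 package's control options. The example below is a brief sketch, assuming the C50 package (the R implementation of C5.0) and the built-in iris data rather than any dataset from this module:

# install.packages("C50")
library(C50)

# Pre-pruning: raise minCases so that splits require more examples, stopping growth early
pre_pruned <- C5.0(iris[, -5], iris$Species,
                   control = C5.0Control(minCases = 10))

# Post-pruning (C5.0's default behaviour): a lower confidence factor CF prunes the
# fully grown tree more aggressively
post_pruned <- C5.0(iris[, -5], iris$Species,
                    control = C5.0Control(CF = 0.10))

summary(post_pruned)   # shows the pruned tree and its training error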
Understanding classification rules

Classification rules represent knowledge in the form of logical if-else statements that assign a class to unlabelled examples. They are specified in terms of an antecedent and a consequent; these form a hypothesis stating that "if this happens, then that happens."

Rule learners are often used in a manner similar to decision tree learners. Like decision trees, they can be
used for applications that generate knowledge for future action, such as:
• Identifying conditions that lead to a hardware failure in mechanical devices
• Describing the defining characteristics of groups of people for customer segmentation
• Finding conditions that precede large drops or increases in the prices of shares on the stock market
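One quick way to see classification rules in practice is C5.0's rule-based output. The snippet below is a brief sketch, again assuming the C50 package and the built-in iris data:

library(C50)

# rules = TRUE asks C5.0 to express the learned model as if-then rules instead of a tree
rule_model <- C5.0(iris[, -5], iris$Species, rules = TRUE)
summary(rule_model)    # prints rules such as: if Petal.Length <= ... then class setosa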
