CE880_Lecture7_slides
Haider Raza
Tuesday, 28th February 2023
About Myself
What we will be covering in this lecture
Decision Tree
Decision Trees (DTs) are a non-parametric supervised learning method used for
classification and regression. The goal is to create a model that predicts the value of
a target variable by learning simple decision rules inferred from the data features. A
tree can be seen as a piecewise constant approximation.
Source: https://scikit-learn.org/
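The definition above can be tried out directly in scikit-learn. This is a minimal sketch, assuming the built-in iris dataset as a stand-in for the lecture's own example data:

```python
# Minimal sketch: fit a decision tree classifier with scikit-learn.
# The iris dataset is an assumption, not the lecture's example data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Limit the depth so the learned decision rules stay simple.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print(f"Test accuracy: {tree.score(X_test, y_test):.2f}")
```

The fitted tree is a set of simple threshold rules on the features, which is what makes its predictions a piecewise constant approximation.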
Let’s take an example to understand decision tree
Let’s Create a Decision Tree
Advantages of Decision Tree
Disadvantages of Decision Tree
- Prone to over-fitting
- Instability towards small variations in the data
- Unequal sample sizes can cause bias
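The first disadvantage is easy to demonstrate. A small sketch, again assuming the iris dataset: an unconstrained tree memorises its training data, while accuracy on held-out data is typically lower.

```python
# Sketch of over-fitting: an unconstrained decision tree fits the
# training data perfectly. (Iris is assumed as the example dataset.)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1)

deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Training accuracy is perfect; test accuracy is typically lower.
# The gap between the two is the over-fitting.
print("train accuracy:", deep.score(X_train, y_train))
print("test accuracy:", deep.score(X_test, y_test))
```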
Random Forest
Decision trees perform well on the data that was used to create them, but they are not so good when working with new data samples.
Let's combine the simplicity of decision trees with randomness to gain flexibility and resolve the problem of over-fitting. Random forest is an ensemble machine learning algorithm.
Let’s create Random Forest
Step 1: Create a ‘bootstrapped’ dataset
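Bootstrapping means sampling rows from the original dataset with replacement until the new dataset has the same size. A minimal sketch with NumPy (the toy table below is hypothetical, not the lecture's example):

```python
# Sketch of bootstrapping: draw rows with replacement. Some rows
# appear more than once; rows never drawn are the "Out-Of-Bag" rows.
# The six-row toy dataset here is hypothetical.
import numpy as np

rng = np.random.default_rng(seed=0)
data = np.array([[63, 1], [37, 0], [45, 1], [52, 0], [29, 1], [60, 0]])

# Draw row indices with replacement, same size as the original.
idx = rng.integers(0, len(data), size=len(data))
bootstrapped = data[idx]
oob = np.setdiff1d(np.arange(len(data)), idx)
print("bootstrapped row indices:", idx)
print("out-of-bag row indices:", oob)
```

The left-out (Out-Of-Bag) rows become important later, when we measure how well the forest works.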
Step 2: Create a decision tree using the bootstrapped data
Use a randomly selected subset of variables, rather than all variables, at each step.
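In scikit-learn this "random subset of variables at each split" idea is exposed as the `max_features` parameter. A small sketch, assuming iris (which has 4 features, so a square-root subset is 2):

```python
# Sketch: restrict each split to a random subset of the features.
# With max_features="sqrt" and 4 features, each split considers 2.
# (Iris is assumed as the dataset.)
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
tree.fit(X, y)
print("features considered per split:", tree.max_features_)
```

This per-split randomness is what makes the trees of a forest different from one another, even beyond the bootstrapping.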
Build Random Forest
Now repeat all the steps again to create several (usually hundreds of) trees to build a Random Forest.
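The whole recipe — many trees, each on a bootstrapped dataset, each splitting on a random feature subset — is what scikit-learn's `RandomForestClassifier` implements. A sketch, again assuming iris:

```python
# Sketch of the full recipe via RandomForestClassifier:
# many bootstrapped trees, random feature subsets at each split.
# (Iris is assumed as the dataset.)
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=100,     # number of trees in the forest
    max_features="sqrt",  # random subset of variables at each split
    bootstrap=True,       # each tree sees a bootstrapped dataset
    random_state=0,
)
forest.fit(X, y)
print("number of trees:", len(forest.estimators_))
```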
How well have we created the Random Forest?
Remember bootstrapping? The samples that were left out of each bootstrapped dataset are the Out-Of-Bag (OOB) samples.
Finally, we find the predicted outcomes for the OOB samples. The proportion of the Out-Of-Bag samples predicted incorrectly is the Out-Of-Bag error.
How to use Random Forest?
We pass a new sample through each of the trees we have trained, and the forest's prediction combines the individual trees' predictions (for classification, the majority vote).
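The voting step above can be sketched explicitly by querying each tree in the trained forest. Note that scikit-learn's own `predict()` averages the trees' class probabilities rather than counting hard votes, but the two usually agree; iris and the sample's measurements below are assumptions for illustration:

```python
# Sketch: pass a new sample through every tree and take the majority
# vote. forest.predict() does a similar aggregation internally
# (averaging class probabilities). Iris is assumed as the dataset.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Measurements of a new flower (values chosen for illustration).
new_sample = np.array([[5.1, 3.5, 1.4, 0.2]])
votes = [t.predict(new_sample)[0] for t in forest.estimators_]
majority = np.bincount(np.asarray(votes, dtype=int)).argmax()
print("majority vote:", majority)
print("forest.predict:", forest.predict(new_sample)[0])
```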
Advantages of Random Forest?
Disadvantages of Random Forest?