
School of Computer Science and Electronic Engineering, University of Essex

Lecture 7: Decision Tree and Ensemble Learning

CE880: An Approachable Introduction to Data Science

Haider Raza
Tuesday, 28th February 2023
About Myself

- Name: Haider Raza
- Position: Senior Lecturer in AI
- Research interests: AI, Machine Learning, Data Science
- Contact: h.raza@essex.ac.uk
- Academic Support Hours: 12-1 on Friday via Zoom; the Zoom link is available on Moodle
- Website: www.sagihaider.com
What we will be covering in this lecture

- Introduction to statistics and statistical inference
- An introduction to Decision Trees
- The math behind Decision Trees
- The Decision Tree classifier
- Ensemble methods
- Bagging and Boosting
- Random forests
Decision Tree

Decision Trees (DTs) are a non-parametric supervised learning method used for
classification and regression. The goal is to create a model that predicts the value of
a target variable by learning simple decision rules inferred from the data features. A
tree can be seen as a piecewise-constant approximation.

Source: https://scikit-learn.org/
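Since the definition above is from scikit-learn, a minimal sketch of fitting such a tree
with that library may help. The iris dataset and the max_depth value are illustrative
choices, not taken from the slides:

```python
# Minimal sketch: fitting a decision tree with scikit-learn.
# The iris dataset and max_depth=3 are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Limiting the depth keeps the piecewise-constant model small and readable.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```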
Let's take an example to understand decision trees

[Slides 5-7: a worked example, shown as figures in the original slides.]
Let's Create a Decision Tree

[Slides 8-41: a step-by-step walkthrough of building a decision tree on example data,
shown as figures in the original slides.]
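The walkthrough repeatedly picks the question that best separates the classes. The
slides' scoring details are in the figures, but a common criterion (and scikit-learn's
default) is Gini impurity; a minimal sketch with made-up class counts:

```python
# Minimal sketch: scoring a candidate split by weighted Gini impurity.
# The class counts below are made up for illustration.

def gini(counts):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Suppose a candidate split sends 4 "yes" / 1 "no" to the left branch
# and 1 "yes" / 4 "no" to the right branch (hypothetical numbers).
left, right = [4, 1], [1, 4]
n = sum(left) + sum(right)
weighted = (sum(left) / n) * gini(left) + (sum(right) / n) * gini(right)
print(f"gini(left)={gini(left):.3f} gini(right)={gini(right):.3f} weighted={weighted:.3f}")
```

The candidate split with the lowest weighted impurity wins; an impurity of 0 means a
perfectly pure branch.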
Advantages of Decision Trees

- Simple to understand, interpret, and visualise
- Useful for variable screening and feature selection
- Handle both numerical and categorical data
- Require little effort for data preparation
- Handle non-linear data effectively
Disadvantages of Decision Trees

- Prone to over-fitting
- Unstable: small variations in the data can produce a very different tree
- Unequal sample sizes across classes can bias the tree
Random Forest

Decision trees are good at fitting the data that was used to create them, but they are
not so good when working with new data samples. This is due to over-fitting.

Let's combine the simplicity of decision trees with randomness, to provide flexibility
and address the problem of over-fitting. A Random Forest is an ensemble machine
learning algorithm that does exactly this.
Let's Create a Random Forest

Step 1: Create a "bootstrapped" dataset

[Slides 48-53: bootstrapping shown as figures; rows are drawn from the original dataset
with replacement, so some rows appear more than once and some not at all.]
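A minimal sketch of that sampling with NumPy; the dataset size here is a hypothetical
stand-in for the table in the figures:

```python
# Minimal sketch: building a bootstrapped dataset with NumPy.
# Row indices are drawn with replacement, so some rows repeat and some
# are never drawn (the latter become the "out-of-bag" samples later).
import numpy as np

rng = np.random.default_rng(0)
n_rows = 6  # hypothetical size of the original dataset

boot_idx = rng.integers(0, n_rows, size=n_rows)      # sample with replacement
oob_idx = np.setdiff1d(np.arange(n_rows), boot_idx)  # rows never drawn
print("bootstrap rows: ", boot_idx)
print("out-of-bag rows:", oob_idx)
```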
Step 2: Create a decision tree using the bootstrapped data

Use a randomly selected subset of variables, rather than all variables, at each step.

[Slides 54-58: tree construction on the bootstrapped data, shown as figures in the
original slides.]

To create each decision tree, follow these steps (a sketch combining them follows this
list):

- Use a bootstrapped dataset
- Consider a random subset of features at each step
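A minimal sketch combining the two steps for a single tree; max_features is
scikit-learn's knob for the random per-split feature subset, and the iris data is again
just illustrative:

```python
# Minimal sketch: one tree of a random forest, built by hand.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

boot = rng.integers(0, len(X), size=len(X))  # step 1: bootstrap the rows
tree = DecisionTreeClassifier(
    max_features="sqrt",  # step 2: random feature subset at each split
    random_state=0,
)
tree.fit(X[boot], y[boot])
```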
Build the Random Forest

Now repeat both steps to create several (usually hundreds of) trees; together they form
a Random Forest.
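scikit-learn wraps this repetition in a single estimator; a minimal sketch, where
n_estimators is the number of bootstrapped trees (100 is also the library default):

```python
# Minimal sketch: the whole bagging loop via scikit-learn's ensemble class.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
```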
Hurray, we have built a Random Forest!

Let's see how well we have done.
How well have we created the Random Forest?

Remember bootstrapping? Because rows are drawn with replacement, roughly one third of
the original rows never make it into a given bootstrapped dataset; these are the
Out-Of-Bag (OOB) samples.

[Slides 63-66: running the OOB samples through the trees, shown as figures in the
original slides.]

Finally, we find the predicted outcomes for the OOB samples. The proportion of
Out-Of-Bag samples predicted incorrectly is the Out-Of-Bag error.
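scikit-learn can compute this during fitting; a minimal sketch using oob_score=True.
Note that oob_score_ reports OOB accuracy, so the OOB error is one minus it:

```python
# Minimal sketch: out-of-bag evaluation with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
forest.fit(X, y)
print("OOB error:", 1 - forest.oob_score_)  # oob_score_ is OOB accuracy
```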
How to use the Random Forest?

We have created a Random Forest and tested it with the out-of-bag (OOB) data. Now
let's use it for our actual task: classifying a new sample.

[Slides 69-72: the new sample to classify, shown as figures in the original slides.]

We pass this sample through every tree we have trained; each tree votes, and after
testing on all the trees, the majority vote is the answer.
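A minimal sketch of that vote, tallied by hand over the fitted trees in
forest.estimators_; the sample values are hypothetical. (scikit-learn's own predict
actually averages the trees' predicted probabilities, which usually agrees with the
hard majority vote.)

```python
# Minimal sketch: classifying a new sample by majority vote over the trees.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

sample = [[5.0, 3.4, 1.5, 0.2]]  # hypothetical new sample
votes = [int(t.predict(sample)[0]) for t in forest.estimators_]
labels, counts = np.unique(votes, return_counts=True)
print("majority vote :", labels[np.argmax(counts)])
print("forest.predict:", forest.predict(sample)[0])
```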
Advantages of Random Forests

- Both classification and regression can be performed
- Can handle missing values
- Works with both categorical and numerical data
- Generally avoids over-fitting on large datasets
- Can handle high-dimensional data
Disadvantages of Random Forests

- Difficult to interpret because of the complex structure
- Noisy data can lead to over-fitting
- For categorical variables with many levels, random forests can become biased
- A large number of trees limits real-time prediction speed