
Predictive Modeling

Week 3
Decision Tree Algorithm
The problem

• Given a set of training cases/objects and their attribute values, try to determine the target attribute value of new examples.
  – Classification
  – Prediction
Why decision trees?

• Decision trees are powerful and popular tools for classification and prediction.
• Decision trees represent rules, which can be understood by humans and used in knowledge systems such as databases.
Key requirements

• Attribute-value description: each object or case must be expressible in terms of a fixed collection of properties or attributes (e.g., hot, mild, cold).
• Predefined classes (target values): the target function has discrete output values (boolean or multiclass).
• Sufficient data: enough training cases should be provided to learn the model.
A simple example

• You want to guess the outcome of next week's game between the MallRats and the Chinooks.

• Available knowledge / attributes:
  – Was the game at Home or Away?
  – Was the starting time 5pm, 7pm, or 9pm?
  – Did Joe play center or forward?
  – Was the opponent's center tall or not?
  – …
Basketball data
What we know

• The game will be away, at 9pm, and Joe will play center on offense…

• A classification problem
• Generalizing the learned rule to new examples
Definition

 A decision tree is a classifier in the form of a tree structure:
   Decision node: specifies a test on a single attribute
   Leaf node: indicates the value of the target attribute
   Arc/edge: one outcome of the split on an attribute
   Path: a conjunction of tests leading to the final decision

 Decision trees classify instances or examples by starting at the root of the tree and moving through it until a leaf node is reached.
Illustration

(1) Which attribute to start with? (the root)
(2) Which node to proceed to next?
(3) When to stop / come to a conclusion?

Random split

• If attributes are chosen at random, the tree can grow huge.
• Such trees are hard to understand.
• Larger trees are typically less accurate than smaller trees.
Principled criterion

• Selection of an attribute to test at each node: choose the most useful attribute for classifying examples.
• Information gain
  – measures how well a given attribute separates the training examples according to their target classification
  – is used to select among the candidate attributes at each step while growing the tree
Entropy

• A measure of the homogeneity of a set of examples.

• Given a set S of positive and negative examples of some target concept (a 2-class problem), the entropy of set S relative to this binary classification is

    E(S) = -p(P) log2 p(P) - p(N) log2 p(N)

• Suppose S has 25 examples, 15 positive and 10 negative [15+, 10-]. Then the entropy of S relative to this classification is

    E(S) = -(15/25) log2(15/25) - (10/25) log2(10/25) ≈ 0.971
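As an illustrative sketch (not part of the original slides), this calculation is a few lines of Python; the function name and the counts-based interface are my own choices:

```python
import math

def entropy(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    # Skip empty classes: the limit of p*log2(p) as p -> 0 is 0.
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

print(entropy([15, 10]))  # the slide's S = [15+, 10-]: ~0.971 bits
print(entropy([10, 0]))   # a "certain" outcome: 0 bits
print(entropy([10, 10]))  # maximal uncertainty for 2 classes: 1 bit
```

The last two calls preview the intuitions on the next slide.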


Some intuitions

• The entropy is 0 if the outcome is "certain".
• The entropy is maximal if we have no knowledge of the system (i.e., every outcome is equally likely).

[Figure: entropy of a 2-class problem as a function of the proportion of one of the two classes.]
Information Gain

• Information gain measures the expected reduction in entropy, or uncertainty:

    Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

  – Values(A) is the set of all possible values for attribute A, and S_v is the subset of S for which attribute A has value v: S_v = {s ∈ S | A(s) = v}.
  – The first term in the equation for Gain is just the entropy of the original collection S.
  – The second term is the expected value of the entropy after S is partitioned using attribute A.

• It is simply the expected reduction in entropy caused by partitioning the examples according to this attribute.
• It is the number of bits saved when encoding the target value of an arbitrary member of S, by knowing the value of attribute A.
Examples

• Before partitioning, the entropy is
  – H(10/20, 10/20) = -10/20 log(10/20) - 10/20 log(10/20) = 1
• Using the "where" attribute, divide into 2 subsets:
  – Entropy of the first set: H(home) = -6/12 log(6/12) - 6/12 log(6/12) = 1
  – Entropy of the second set: H(away) = -4/8 log(4/8) - 4/8 log(4/8) = 1
• Expected entropy after partitioning:
  – 12/20 · H(home) + 8/20 · H(away) = 1
Decision Tree Example

Expand Factory (cost = $1.5M):
  – 40% chance of a good economy → profit = $6M
  – 60% chance of a bad economy → profit = $2M

Don't Expand Factory (cost = $0):
  – 40% chance of a good economy → profit = $3M
  – 60% chance of a bad economy → profit = $1M

NPV(Expand) = (0.4 × 6 + 0.6 × 2) - 1.5 = $2.1M
NPV(No Expand) = 0.4 × 3 + 0.6 × 1 = $1.8M

$2.1M > $1.8M, therefore you should expand the factory.
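The same expected-value arithmetic as a small sketch (the variable names are mine):

```python
p_good, p_bad = 0.4, 0.6                      # economy scenarios

npv_expand = (p_good * 6 + p_bad * 2) - 1.5   # profits in $M, minus the $1.5M cost
npv_no_expand = p_good * 3 + p_bad * 1        # no expansion cost

print(npv_expand, npv_no_expand)              # 2.1 1.8 -> expand the factory
```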


S.no  Outlook   Temperature  Humidity  Wind    Play
1     Sunny     Hot          High      Weak    No
2     Sunny     Hot          High      Strong  No
3     Overcast  Hot          High      Weak    Yes
4     Rain      Mild         High      Weak    Yes
5     Rain      Cool         Normal    Weak    Yes
6     Rain      Cool         Normal    Strong  No
7     Overcast  Cool         Normal    Strong  Yes
8     Sunny     Mild         High      Weak    No
9     Sunny     Cool         Normal    Weak    Yes
10    Rain      Mild         Normal    Weak    Yes
11    Sunny     Mild         Normal    Strong  Yes
12    Overcast  Mild         High      Strong  Yes
13    Overcast  Hot          Normal    Weak    Yes
14    Rain      Mild         High      Strong  No

Data Mining and Predictive Analytics, by Daniel Larose and Chantal Larose, John Wiley & Sons, Inc., Hoboken, NJ, 2015.
• Using the "when" attribute, divide into 3 subsets:
  – Entropy of the first set: H(5pm) = -1/4 log(1/4) - 3/4 log(3/4)
  – Entropy of the second set: H(7pm) = -9/12 log(9/12) - 3/12 log(3/12)
  – Entropy of the third set: H(9pm) = -0/4 log(0/4) - 4/4 log(4/4) = 0
• Expected entropy after partitioning:
  – 4/20 · H(1/4, 3/4) + 12/20 · H(9/12, 3/12) + 4/20 · H(0/4, 4/4) = 0.65
• Information gain: 1 - 0.65 = 0.35
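As a sketch, the whole comparison can be scripted; the class counts below are read off the worked examples above (wins/losses per subset), and the helper names are my own:

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, subsets):
    """Parent entropy minus the size-weighted entropies of the subsets."""
    n = sum(parent)
    remainder = sum(sum(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - remainder

parent = [10, 10]                        # 20 games: 10 wins, 10 losses
where = [[6, 6], [4, 4]]                 # home, away
when = [[1, 3], [9, 3], [0, 4]]          # 5pm, 7pm, 9pm

print(information_gain(parent, where))   # 0.0
print(information_gain(parent, when))    # ~0.35 -> test "when" first
```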
Decision

• Knowing the "when" attribute values provides larger information gain than "where".
• Therefore the "when" attribute should be chosen for testing before the "where" attribute.
• Similarly, we can compute the information gain for other attributes.
• At each node, choose the attribute with the largest information gain.
• Stopping rule: a node becomes a leaf when
  – every attribute has already been included along this path through the tree, or
  – the training examples associated with this leaf node all have the same target attribute value (i.e., their entropy is zero).
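Putting the selection rule and the stopping rules together, here is a minimal recursive sketch of this greedy procedure (ID3-style). The dict-based example representation and the helper names are my own assumptions, not code from the course:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(examples, labels, attributes):
    """Greedy induction: split on the attribute with the largest information gain."""
    if len(set(labels)) == 1:
        return labels[0]                              # pure node: entropy is zero
    if not attributes:
        return Counter(labels).most_common(1)[0][0]   # attributes exhausted: majority vote

    def gain(attr):
        remainder = 0.0
        for v in set(ex[attr] for ex in examples):
            sub = [lab for ex, lab in zip(examples, labels) if ex[attr] == v]
            remainder += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)                  # largest information gain
    tree = {best: {}}
    rest = [a for a in attributes if a != best]
    for v in set(ex[best] for ex in examples):
        idx = [i for i, ex in enumerate(examples) if ex[best] == v]
        tree[best][v] = build_tree([examples[i] for i in idx],
                                   [labels[i] for i in idx], rest)
    return tree

# Tiny usage example with two attributes
examples = [{"where": "home", "when": "7pm"}, {"where": "away", "when": "9pm"}]
labels = ["win", "lose"]
print(build_tree(examples, labels, ["where", "when"]))
```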

Demo
Continuous attributes?

• Each non-leaf node is a test, and its edges partition the attribute's values into subsets (easy for a discrete attribute).
• For a continuous attribute:
  – Partition the continuous values of attribute A into a discrete set of intervals, or
  – Create a new boolean attribute A_c by looking for a threshold c:

      A_c = true   if A < c
            false  otherwise

How do we choose c?
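One standard answer (C4.5 does essentially this) is to sort the values, evaluate candidate thresholds at boundaries between consecutive distinct values, and keep the one with the largest information gain. A self-contained sketch, with names of my own choosing:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return the threshold c (and its gain) for the boolean test A < c."""
    pairs = sorted(zip(values, labels))
    parent = entropy([lab for _, lab in pairs])
    best_c, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal attribute values
        c = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v < c]
        right = [lab for v, lab in pairs if v >= c]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if parent - weighted > best_gain:
            best_c, best_gain = c, parent - weighted
    return best_c, best_gain

# e.g., temperatures with play / no-play labels
print(best_threshold([64, 65, 68, 69, 70, 71], ["yes", "no", "yes", "yes", "yes", "no"]))
```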
Evaluation

• Training accuracy
  – How many training instances can be correctly classified based on the available data?
  – It is high when the tree is deep/large, or when there is little conflict among the training instances.
  – However, higher training accuracy does not mean better generalization.
• Testing accuracy
  – Given a number of new instances, how many of them can we correctly classify?
  – Estimated via cross validation.
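As a hedged illustration with scikit-learn (assuming it is available; `cross_val_score` and `DecisionTreeClassifier` are its standard public APIs, and the iris data is just a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0)

# 5-fold cross validation: fit on 4 folds, score on the held-out fold each time
scores = cross_val_score(tree, X, y, cv=5)
print(scores.mean())  # an estimate of testing accuracy, not training accuracy
```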
What is forecasting?

Forecasting is a tool used for predicting future demand based on past demand information.

Why is forecasting important?

Demand for products and services is usually uncertain. Forecasting can be used for…
• Strategic planning (long range planning)
• Finance and accounting (budgets and cost controls)
• Marketing (future sales, new products)
• Production and operations
What is forecasting all about?

We try to predict the future by looking back at the past.

[Figure: demand for Mercedes E-Class over Jan–Aug; actual demand (past sales) for the six months we look back on, followed by predicted demand.]
Some general characteristics of forecasts

• Forecasts are always wrong.
• Forecasts are more accurate for groups or families of items.
• Forecasts are more accurate for shorter time periods.
• Every forecast should include an error estimate.
• Forecasts are no substitute for calculated demand.
Key issues in forecasting

1. A forecast is only as good as the information included in the forecast (past data).
2. History is not a perfect predictor of the future (i.e., there is no such thing as a perfect forecast).

REMEMBER: Forecasting is based on the assumption that the past predicts the future! When forecasting, think carefully about whether or not the past is strongly related to what you expect to see in the future…
Example: Mercedes E-class vs. M-class sales

Month  E-class Sales  M-class Sales
Jan    23,345         -
Feb    22,034         -
Mar    21,453         -
Apr    24,897         -
May    23,561         -
Jun    22,684         -
Jul    ?              ?

Question: Can we predict the new model M-class sales based on the data in the table?

Answer: Maybe... We need to consider how much the two markets have in common.
What should we consider when looking at past demand data?

• Trends
• Seasonality
• Cyclical elements
Some important questions

• What is the purpose of the forecast?
• Which systems will use the forecast?
• How important is the past in estimating the future?

Answers will help determine time horizons, techniques, and level of detail for the forecast.
Simple Linear Regression

Introduction

• In this chapter we employ regression analysis to examine the relationship among quantitative variables.
• The technique is used to predict the value of one variable (the dependent variable, y) based on the values of other variables (the independent variables x1, x2, …, xk).
The Model

• The first order linear model:

    y = β0 + β1 x + ε

  where
    y  = dependent variable
    x  = independent variable
    β0 = y-intercept
    β1 = slope of the line (rise/run)
    ε  = error variable

  β0 and β1 are unknown and therefore are estimated from the data.
Estimating the Coefficients

• The estimates are determined by
  – drawing a sample from the population of interest,
  – calculating sample statistics, and
  – producing a straight line that cuts into the data.

[Figure: a scatter of sample points. The question is: which straight line fits best?]
The best line is the one that minimizes the sum of squared vertical differences between the points and the line.

Let us compare two lines for the points (1,2), (2,4), (3,1.5), and (4,3.2):
– First line: sum of squared differences = (2 - 1)² + (4 - 2)² + (1.5 - 3)² + (3.2 - 4)² = 7.89
– Second line (horizontal at y = 2.5): sum of squared differences = (2 - 2.5)² + (4 - 2.5)² + (1.5 - 2.5)² + (3.2 - 2.5)² = 3.99

The smaller the sum of squared differences, the better the fit of the line to the data.
To calculate the estimates of the coefficients that minimize the differences between the data points and the line, use the formulas:

    b1 = cov(X, Y) / s_x²
    b0 = ȳ - b1 x̄

The regression equation that estimates the equation of the first order linear model is:

    ŷ = b0 + b1 x
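A short sketch of these two formulas with NumPy (the function and variable names are mine; `ddof=1` gives the sample covariance and variance used here):

```python
import numpy as np

def fit_line(x, y):
    """b1 = cov(X, Y) / s_x^2 and b0 = ybar - b1 * xbar."""
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y.mean() - b1 * x.mean(), b1

# Tiny usage example with the four points compared above
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 1.5, 3.2])
b0, b1 = fit_line(x, y)
print(b0, b1)  # yhat = b0 + b1 * x
```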
• Example: the relationship between odometer reading and a used car's selling price.
  – A car dealer wants to find the relationship between the odometer reading and the selling price of used cars.
  – A random sample of 100 cars is selected, and the data recorded.
  – Find the regression line.

  Independent variable: x = odometer reading
  Dependent variable: y = selling price

    Car  Odometer  Price
    1    37388     5318
    2    44758     5061
    3    45833     5008
    4    30862     5795
    5    31705     5784
    6    34010     5359
    …    …         …
• Solution
  – Solving by hand:
    To calculate b0 and b1 we first need several statistics (with n = 100):

      x̄ = 36,009.45;   s_x² = Σ(x_i - x̄)² / (n - 1) = 43,528,688
      ȳ = 5,411.41;    cov(X, Y) = Σ(x_i - x̄)(y_i - ȳ) / (n - 1) = -1,356,256

    so that

      b1 = cov(X, Y) / s_x² = -1,356,256 / 43,528,688 = -0.0312
      b0 = ȳ - b1 x̄ = 5,411.41 - (-0.0312)(36,009.45) = 6,533

      ŷ = b0 + b1 x = 6,533 - 0.0312 x
– Using the computer (see file Xm17-01.xls):
  Tools > Data analysis > Regression > [Shade the y range and the x range] > OK

  SUMMARY OUTPUT

  Regression Statistics
    Multiple R          0.8063
    R Square            0.6501
    Adjusted R Square   0.6466
    Standard Error      151.57
    Observations        100

  ANOVA
                df    SS         MS
    Regression  1     4,183,528  4,183,528
    Residual    98    2,251,362  22,973.09
    Total       99    6,434,890

                Coefficients  Standard Error  t Stat
    Intercept   6533.383      84.51232        77.30687
    Odometer    -0.03116      0.002309        -13.4947

    ŷ = 6,533 - 0.0312 x

  [Figure: scatter of Price vs. Odometer (19,000–49,000 miles) with the fitted line; the worksheet also lists the raw Odometer/Price data.]
[Figure: the fitted line ŷ = 6,533 - 0.0312x plotted over the observed odometer range (19,000–49,000 miles); there is no data near x = 0.]

The intercept is b0 = 6,533. The slope is b1 = -0.0312: for each additional mile on the odometer, the price decreases by an average of $0.0312.

Do not interpret the intercept as the "price of cars that have not been driven": the sample contains no cars with odometer readings near zero.
• Sum of squares for errors
  – This is the sum of squared differences between the points and the regression line.
  – It can serve as a measure of how well the line fits the data:

      SSE = Σ_{i=1}^{n} (y_i - ŷ_i)²  =  (n - 1) [ s_Y² - cov(X, Y)² / s_x² ]

  – This statistic plays a role in every statistical technique we employ to assess the model.
• Standard error of estimate
  – The mean error is equal to zero.
  – If s_ε is small, the errors tend to be close to zero (close to the mean error), and the model fits the data well.
  – Therefore, we can use s_ε as a measure of the suitability of using a linear model.
  – An unbiased estimator of σ_ε² is given by s_ε²:

      Standard error of estimate:  s_ε = sqrt( SSE / (n - 2) )
• Example
  – Calculate the standard error of estimate for the example above, and describe what it tells you about the model fit.
• Solution

      s_Y² = Σ(y_i - ȳ)² / (n - 1) = 6,434,890 / 99 = 64,999   (calculated before)

      SSE = (n - 1) [ s_Y² - cov(X, Y)² / s_x² ]
          = 99 × (64,999 - (-1,356,256)² / 43,528,688) = 2,251,363

      s_ε = sqrt( SSE / (n - 2) ) = sqrt( 2,251,363 / 98 ) = 151.6

  Thus, it is hard to assess the model based on s_ε alone, even when compared with the mean value of y: s_ε = 151.6, ȳ = 5,411.4.
• Testing the slope
  – When no linear relationship exists between two variables, the regression line should be horizontal.

[Figure: two scatter plots.
  Linear relationship: different inputs (x) yield different outputs (y); the slope is not equal to zero.
  No linear relationship: different inputs (x) yield the same output (y); the slope is equal to zero.]
• Coefficient of determination
  – When we want to measure the strength of the linear relationship, we use the coefficient of determination:

      R² = cov(X, Y)² / (s_x² s_y²)    or    R² = 1 - SSE / Σ(y_i - ȳ)²
– To understand the significance of this coefficient, note that the overall variability in y is explained in part by the regression model, while the rest remains unexplained (the error).
Two data points (x1, y1) and (x2, y2) of a certain sample are shown.

[Figure: two points with their deviations from ȳ decomposed by the regression line.]

Total variation in y = variation explained by the regression line + unexplained variation (error):

    (y1 - ȳ)² + (y2 - ȳ)² = (ŷ1 - ȳ)² + (ŷ2 - ȳ)² + (y1 - ŷ1)² + (y2 - ŷ2)²
Variation in y = SSR + SSE

• R² measures the proportion of the variation in y that is explained by the variation in x:

    R² = 1 - SSE / Σ(y_i - ȳ)²  =  [ Σ(y_i - ȳ)² - SSE ] / Σ(y_i - ȳ)²  =  SSR / Σ(y_i - ȳ)²

• R² takes on any value between zero and one.
  – R² = 1: perfect match between the line and the data points.
  – R² = 0: there is no linear relationship between x and y.
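The first form of R² as a sketch (the slide's numbers plugged in):

```python
s2_x, s2_y = 43_528_688, 64_999    # sample variances of x and y
cov_xy = -1_356_256                # sample covariance

r2 = cov_xy**2 / (s2_x * s2_y)
print(round(r2, 4))  # ~0.6501: about 65% of the variation in y is explained by x
```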
– Find the coefficient of determination for the example; what does this statistic tell you about the model?
• Solution
  – Solving by hand:

      R² = cov(X, Y)² / (s_x² s_y²) = (-1,356,256)² / (43,528,688 × 64,999) = 0.6501

  – Using the computer. From the regression output we have:

      Regression Statistics
      Multiple R           0.8063
      R Square             0.6501
      Adjusted R Square    0.6466
      Standard Error       151.57
      Observations         100

  65% of the variation in the selling price is explained by the variation in odometer reading. The rest (35%) remains unexplained by this model.
Coefficient of correlation

• The coefficient of correlation is used to measure the strength of association between two variables.
• The coefficient values range between -1 and 1.
  – If r = -1 (negative association) or r = +1 (positive association), every point falls on the regression line.
  – If r = 0, there is no linear pattern.
• The coefficient can be used to test for a linear relationship between two variables.
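As a one-line sketch, r is the covariance scaled by both standard deviations; note that r² equals the R² computed above:

```python
import math

cov_xy, s2_x, s2_y = -1_356_256, 43_528_688, 64_999
r = cov_xy / math.sqrt(s2_x * s2_y)
print(round(r, 4))  # ~ -0.8063: a strong negative association
```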
Regression Diagnostics - I

• The three conditions required for the validity of the regression analysis are:
  – the error variable is normally distributed,
  – the error variance is constant for all values of x,
  – the errors are independent of each other.
• How can we diagnose violations of these conditions?

• Residual Analysis
  – By examining the residuals (or standardized residuals), we can identify violations of the required conditions.
  – Example (continued): Nonnormality.
    – Use Excel to obtain the standardized residual histogram.
    – Examine the histogram and look for a bell-shaped diagram with mean close to zero.
RESIDUAL OUTPUT (a partial list of standard residuals):

    Observation   Residuals     Standard Residuals
    1             -50.4575      -0.3346
    2             -77.8250      -0.5161
    3             -97.3304      -0.6454
    4             223.2071       1.4801
    5             238.4731       1.5814

For each residual we calculate the standard deviation as follows:

    s_{r_i} = s_ε sqrt(1 - h_i),   where
    h_i = 1/n + (x_i - x̄)² / Σ_j (x_j - x̄)²

    Standardized residual_i = residual_i / standard deviation

[Figure: histogram of the standardized residuals, roughly bell shaped with mean close to zero.]

We can also apply the Lilliefors test or the χ² test of normality.
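A sketch of the standardized-residual computation with NumPy (the array and function names are mine):

```python
import numpy as np

def standardized_residuals(x, y, b0, b1, s_e):
    """residual_i / (s_e * sqrt(1 - h_i)), with leverage h_i as on the slide."""
    resid = y - (b0 + b1 * x)
    dx2 = (x - x.mean())**2
    h = 1 / len(x) + dx2 / dx2.sum()
    return resid / (s_e * np.sqrt(1 - h))

# A histogram of these values should look roughly bell shaped with mean near zero;
# observations with absolute value > 2 are suspected outliers (see below).
```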
• Nonindependence of error variables
  – A time series is constituted if data were collected over time.
  – Examining the residuals over time, no pattern should be observed if the errors are independent.
  – When a pattern is detected, the errors are said to be autocorrelated.
  – Autocorrelation can be detected by graphing the residuals against time.
• Outliers
  – An outlier is an observation that is unusually small or large.
  – Several possibilities need to be investigated when an outlier is observed:
    • There was an error in recording the value.
    • The point does not belong in the sample.
    • The observation is valid.
  – Identify outliers from the scatter diagram.
  – It is customary to suspect an observation is an outlier if its |standard residual| > 2.
[Figure: an outlier vs. an influential observation. Some outliers may be very influential: the outlier causes a shift in the regression line.]
• Procedure for regression diagnostics
  – Develop a model that has a theoretical basis.
  – Gather data for the two variables in the model.
  – Draw the scatter diagram to determine whether a linear model appears to be appropriate.
  – Check the required conditions for the errors.
  – Assess the model fit.
  – If the model fits the data, use the regression equation.
The Bayes Classifier

• Use Bayes' rule!

    P(class | features) = P(features | class) · P(class) / P(features)
                          (likelihood × prior / normalization constant)

• Why did this help? Well, we think that we might be able to specify how features are "generated" by the class label.
Another Example of the Naïve Bayes Classifier

The weather data, with counts and probabilities:

  Outlook              Temperature        Humidity            Windy              Play
            yes   no             yes  no            yes  no           yes  no    yes   no
  sunny      2     3    hot       2    2   high      3    4   false    6    2     9     5
  overcast   4     0    mild      4    2   normal    6    1   true     3    3
  rainy      3     2    cool      3    1
  sunny     2/9   3/5   hot      2/9  2/5  high     3/9  4/5  false   6/9  2/5  9/14  5/14
  overcast  4/9   0/5   mild     4/9  2/5  normal   6/9  1/5  true    3/9  3/5
  rainy     3/9   2/5   cool     3/9  1/5

A new day:

  outlook  temperature  humidity  windy  play
  sunny    cool         high      true   ?

• Likelihood of yes = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 ≈ 0.0053
• Likelihood of no = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 ≈ 0.0206
• Therefore, the prediction is No.
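The same arithmetic as a sketch (conditional probabilities copied from the table above; the dict layout is my own):

```python
# P(attribute value | class), read off the weather table
p_yes = {"sunny": 2/9, "cool": 3/9, "high": 3/9, "true": 3/9}  # outlook, temp, humidity, windy
p_no = {"sunny": 3/5, "cool": 1/5, "high": 4/5, "true": 3/5}
prior_yes, prior_no = 9/14, 5/14

new_day = ["sunny", "cool", "high", "true"]

like_yes, like_no = prior_yes, prior_no
for v in new_day:
    like_yes *= p_yes[v]   # naive assumption: attributes independent given the class
    like_no *= p_no[v]

print(like_yes, like_no)   # ~0.0053 vs ~0.0206 -> predict No
```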
The Naive Bayes Classifier for Data Sets with Numerical Attribute Values

• One common practice to handle numerical attribute values is to assume normal distributions for numerical attributes.

The numeric weather data with summary statistics:

  Outlook              Temperature         Humidity            Windy              Play
            yes   no          yes    no          yes    no           yes  no    yes   no
  sunny      2     3           83    85           86    85   false    6    2     9     5
  overcast   4     0           70    80           96    90   true     3    3
  rainy      3     2           68    65           80    70
                               64    72           65    95
                               69    71           70    91
                               75                 80
                               75                 70
                               72                 90
                               81                 75
  sunny     2/9   3/5  mean    73   74.6  mean  79.1  86.2   false  6/9  2/5  9/14  5/14
  overcast  4/9   0/5  s.d.   6.2   7.9   s.d.  10.2   9.7   true   3/9  3/5
  rainy     3/9   2/5
• Let x1, x2, …, xn be the values of a numerical attribute in the training data set.

    μ = (1/n) Σ_{i=1}^{n} x_i
    σ² = (1/(n-1)) Σ_{i=1}^{n} (x_i - μ)²
    f(w) = (1 / (√(2π) σ)) e^{-(w-μ)² / (2σ²)}

• For example,

    f(temperature = 66 | Yes) = (1 / (√(2π) · 6.2)) e^{-(66-73)² / (2 · 6.2²)} = 0.0340

• Likelihood of Yes = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 ≈ 0.000036
• Likelihood of No = 3/5 × 0.0291 × 0.038 × 3/5 × 5/14 ≈ 0.000136
