Week 10_Lecture 10
Tertulia:…
• Linear and Logistic Regression
George Box
Models are built based on evidence, not vice versa.
Careful:
• The outcome matters the most.
• There is never ONLY ONE WAY!
• It is all about the art of presentation!
• But your story should MAKE SENSE!
• 50-50 is a bad predictive performance.
• It is far from SIMPLE!
• Prescribing has consequences!
[Diagram: the dependent variable (e.g., Price, Salary) plotted against the independent variable(s) (e.g., Performance, Condition), showing the actual values, the model, and the mean.]
Advantages
• Simple, easy to understand, easy to implement
Limitations
• Cannot capture and represent non-linear relationships between input and output variables
• Cannot deal with interactions between input variables
Description vs. Prediction
Goal: to explain the relationship between the independent (explanatory) variables and the dependent variable.

How will the model perform on new data?
• Solution: Separate the data into two parts
  • Training partition to develop the model
  • Validation partition to implement the model and evaluate its performance on "new" data; this addresses the issue of overfitting.

[Diagram: Training Data → Build Model(s); Validation Data → Evaluate Model(s)]
y = b0 + b1x1 + b2x2 + ... + bkxk + e

[Diagram: an example with ten predictors, b1x1 + b2x2 + ... + b10x10]
Example variables: Automatic Gear, Engine Capacity, Number of Doors, Number of Gears, Weight

Dummy coding of a three-level categorical variable (None / Old / Recent):
Category   Dummy 1   Dummy 2   Dummy 3
None       1         0         0
Old        0         1         0
Recent     0         0         1
Categorical variables such as Fuel Type or region are encoded with one dummy column per level:
region   East   North   South   West
East     1      0       0       0
North    0      1       0       0
South    0      0       1       0
West     0      0       0       1
import pandas as pd
from sklearn.model_selection import train_test_split

# One-hot encode the categorical predictors (drop_first=True, or False to keep every dummy)
X = pd.get_dummies(dataframe[predictors], drop_first=True)
y = dataframe[outcome]

# Split into 70% training and 30% validation partitions
train_X, valid_X, train_y, valid_y = train_test_split(X, y, test_size=0.3, random_state=1)
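A minimal sketch of the next step, continuing from the snippet above: fit a linear regression on the training partition and generate predictions for the validation partition (dataframe, predictors, and outcome are placeholders carried over from above).

from sklearn.linear_model import LinearRegression

# train_X, valid_X, train_y, valid_y come from the split above
model = LinearRegression()
model.fit(train_X, train_y)        # develop the model on the training partition
pred_y = model.predict(valid_X)    # predictions on the "new" validation data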
RMSE or RASE: root mean squared (average) error, sqrt((1/n) Σ e_i²)
MAE or MAD: mean absolute error (deviation), (1/n) Σ |e_i|
Average error: systematic over- or under-prediction, (1/n) Σ e_i
MAPE: mean absolute percentage error, (1/n) Σ |e_i / y_i|
Total SSE: total sum of squared errors, Σ e_i²
y      y'     e      |e|    e²     |e/y|
33     34     -1     1      1      0.03
59     49     10     10     100    0.17
47     51     -4     4      16     0.09
65     70     -5     5      25     0.08
Total         0      20     142    0.36

Average error = 0/4 = 0
MAE = 20/4 = 5
RMSE = sqrt(142/4) ≅ 6
MAPE = 0.36/4 = 0.09
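A small sketch reproducing the numbers in the table above with numpy; the two arrays are the y and y' columns from the example.

import numpy as np

y = np.array([33, 59, 47, 65])        # actual values
y_pred = np.array([34, 49, 51, 70])   # predicted values
e = y - y_pred                        # errors: -1, 10, -4, -5

avg_error = e.mean()                  # 0.0  (no systematic over- or under-prediction)
mae = np.abs(e).mean()                # 5.0
rmse = np.sqrt((e ** 2).mean())       # ≈ 5.96
mape = np.abs(e / y).mean()           # ≈ 0.09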
Each error's share of the MAE total (20) versus the SSE total (142):

       y      y'     |e|    % of 20    e²     % of 142
e1     33     34     1      5%         1      0.7%
e2     59     49     10     50%        100    70.4%
e3     47     51     4      20%        16     11.3%
e4     65     70     5      25%        25     17.6%
Total                20                142

MAE = 20/4 = 5      RMSE = sqrt(142/4) ≅ 5.96

[Pie charts: contribution of e1–e4 to MAE and to RMSE]
TSS: represents the variation of y
ESS: represents the variation explained by a function of the predicting variables
RSS: represents the prediction error

TSS: Total Sum of Squares, TSS = Σ (y_i − ȳ)²
ESS: Explained Sum of Squares, ESS = Σ (ŷ_i − ȳ)²
RSS: Residual Sum of Squares, RSS = Σ (y_i − ŷ_i)²

R² = ESS / TSS = 1 − RSS / TSS

0 ≤ R² ≤ 1: R² = 0 means the model explains nothing; R² = 1 means the model explains everything.

[Scatter plot: observations y, predicted values ŷ, and the mean ȳ against x]
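A minimal sketch of computing TSS, RSS, and R² directly from these definitions; y and y_pred are assumed to be numpy arrays of actual and predicted values.

import numpy as np

def r_squared(y, y_pred):
    tss = np.sum((y - y.mean()) ** 2)    # total variation of y around its mean
    rss = np.sum((y - y_pred) ** 2)      # residual (unexplained) variation
    return 1 - rss / tss                 # R² = 1 − RSS/TSS = ESS/TSS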
[Scatter plots (two slides): the same observations, highlighting the deviations from the mean ȳ and the deviations from the predicted values ŷ.]
REGRESSION FORMULA

predicted_price = 631,256
  − 10,706 × age
  + 65 × size
  + 116,211 × rooms
  − 250 × area_density
  + 54,307 × school_score
  + 105,593 × remodeled_Old
  + 267,035 × remodeled_Recent
  − 16,081 × region_North
  − 78,267 × region_South
  + 16,655 × region_West
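A hedged sketch of how a formula like the one above can be read off a fitted model; the predictor names come from the slide, and train_X / train_y are assumed to be the training partition created earlier.

import pandas as pd
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(train_X, train_y)

# One coefficient per (dummy-encoded) predictor: age, size, rooms, ..., region_West
coefficients = pd.Series(model.coef_, index=train_X.columns)
print("intercept:", round(model.intercept_))
print(coefficients.round(0))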
UNDERFIT, OVERFIT, OR RIGHT FIT
Problem:
Solution:
[Plot: Happiness vs. Income]
Underfit [Plot: Happiness vs. Income, training and validation sets]
• High Bias, Low Variance
• High Training Error, High Validation Error

Overfit [Plot: Happiness vs. Income, training and validation sets]
• Low Bias, High Variance
• No Training Error, High Validation Error

Right fit [Plot: Happiness vs. Income, training and validation sets]
• Low Bias, Low Variance
• Low Training Error, Low Validation Error
[Plot: prediction error and training error versus model complexity, with the bias and variance components; the optimal model complexity is where prediction error is lowest.]
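An illustrative sketch (synthetic data, arbitrary polynomial degrees) of the pattern above: as model complexity grows, training error keeps falling while validation error eventually rises.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
income = rng.uniform(0, 10, 100).reshape(-1, 1)                  # synthetic "Income"
happiness = np.sqrt(income).ravel() + rng.normal(0, 0.3, 100)    # synthetic "Happiness"

X_tr, X_va, y_tr, y_va = train_test_split(income, happiness, test_size=0.3, random_state=1)

for degree in (1, 3, 15):    # low, moderate, and high complexity
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    tr_err = mean_squared_error(y_tr, model.predict(X_tr))
    va_err = mean_squared_error(y_va, model.predict(X_va))
    print(f"degree {degree:2d}: training MSE = {tr_err:.3f}, validation MSE = {va_err:.3f}")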
SUBSETS OF PREDICTORS
Goal:
Benefits:
• Minimizing data collection requirement
• Maximizing robustness
• Improving predictive accuracy
Forward Selection
• Addition of predictors, one at a time (a sketch using scikit-learn follows below)
Backward Elimination
• Deletion of predictors, one at a time
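A hedged sketch using scikit-learn's SequentialFeatureSelector, which supports both directions; train_X and train_y are assumed from the earlier split, and the number of predictors to keep is arbitrary here.

from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# direction="forward" adds predictors one at a time;
# direction="backward" starts from all predictors and deletes them one at a time
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=5, direction="forward", cv=5)
selector.fit(train_X, train_y)
print(train_X.columns[selector.get_support()])   # the selected predictors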
[Diagram: predictors x1–x4, plus the constant x0 = 1, with coefficients a0–a4 feeding into y; R² annotated]
y = a0x0 + a1x1 + a2x2 + a3x3 + a4x4, with x0 = 1
• Like forward selection, except that at each step we also consider dropping predictors that are not statistically significant, as in backward elimination
• The stopping rule is a p-value threshold with
  • Prob to Enter (for adding) and
  • Prob to Leave (for dropping)
[Same diagram: y = a0x0 + a1x1 + a2x2 + a3x3 + a4x4, with x0 = 1]
• Exhaustive Search
  • All possible subsets of predictors
  • 2^n − 1 combinations (15 combinations for 4 variables)
  • Computationally intensive: O(2^n)

The 15 subsets of {a, b, c, d}:
a, b, c, d, ab, ac, ad, bc, bd, cd, abc, abd, acd, bcd, abcd
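A small sketch enumerating all 2^n − 1 non-empty subsets with itertools, here for the four variables a–d from the slide.

from itertools import combinations

variables = ["a", "b", "c", "d"]
subsets = [combo
           for size in range(1, len(variables) + 1)
           for combo in combinations(variables, size)]
print(len(subsets))   # 15 == 2**4 - 1
print(subsets)        # ('a',), ('b',), ..., ('a', 'b', 'c', 'd')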
RSS: the residual sum of squares (the smaller the better)
Cp: Mallows' Cp is a measure of the error of the best subset model relative to the error of the model incorporating all variables (should be smaller)
AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion): measures of the information lost by fitting a given model (should be smaller)
• Adjusted R² is a modification of R² that adjusts for the number of explanatory terms in a model.

  R²_adj = 1 − (1 − R²) × (n − 1) / (n − p − 1) = 1 − (SS_res / SS_tot) × (df_t / df_e)

• where p is the total number of regressors in the linear model (not counting the constant term), n is the sample size, df_t = n − 1 is the degrees of freedom of the estimate of the population variance of the dependent variable, and df_e = n − p − 1 is the degrees of freedom of the estimate of the underlying population error variance.
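A minimal sketch of adjusted R² as defined above; r2 is the ordinary R², n the sample size, and p the number of regressors (excluding the constant).

def adjusted_r_squared(r2, n, p):
    # R²_adj = 1 − (1 − R²) × (n − 1) / (n − p − 1)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# e.g., hypothetical values: R² = 0.85 with n = 100 observations and p = 10 regressors
print(adjusted_r_squared(0.85, 100, 10))   # ≈ 0.833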
• Measurements for comparing models
• Both measures penalize complexity in models.
• Use them in context, in combination with other measures, and informed by domain knowledge.
• AICc (corrected for small samples)
• It could be as simple as: THE MODEL WITH THE LOWEST AIC OR BIC VALUE IS THE BEST MODEL.
• AIC and BIC are based on likelihood and MSE, which can be a bit complicated for an introductory course.
• It is challenging to find consistent formulas for AICc and BIC.
• AICc = AIC + 2k × (k + 1) / (n − k − 1)
• AICc = (2k − 2LL) + 2k × (k + 1) / (n − k − 1)
• AICc = 2k + n × ln(SSE/n) + n × ln(2π) + n + 2k × (k + 1) / (n − k − 1)
• Where
  • n = number of observations
  • k = number of parameters in the model
• BIC = −2LL + k × ln(n)
• BIC = n × ln(SSE/n) + k × ln(n) + n × ln(2π) + n
• BIC = n × ln(SSE/n) + k × ln(n)
• Where
  • n = number of observations
  • k = number of parameters in the model
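A hedged sketch of these criteria for a linear regression with Gaussian errors, computed from the residual sum of squares (sse), the sample size n, and the parameter count k, following the formulas above.

import numpy as np

def aic_aicc_bic(sse, n, k):
    # Gaussian log-likelihood, so that −2LL = n·ln(2π) + n·ln(SSE/n) + n
    ll = -0.5 * n * (np.log(2 * np.pi) + np.log(sse / n) + 1)
    aic = 2 * k - 2 * ll
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction
    bic = -2 * ll + k * np.log(n)
    return aic, aicc, bic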
REGRESSION FORMULA

predicted_price = 641,400
  − 10,477 × age
  + 71 × size
  + 118,214 × rooms
  + 55,257 × school_score
  + 169,436 × remodeled_Recent
  + 39,656 × region_West
• Algorithms Need Managers, Too. by: Porter, M. E., Davenport, T. H., Daugherty, P., & Wilson, H. J. (2018). HBR's 10 Must Reads on AI, Analytics, and the New Machine Age.
Algorithms follow literal instructions, but business goals often involve nuanced trade-offs. How can managers ensure that both short-term and long-term goals are embedded in algorithm design?
• Discuss examples where algorithms maximized short-term gains at the expense of long-term brand reputation.
• Explore how managers can set explicit multi-objective goals (e.g., balancing profitability with fairness, as in the case of targeted neighborhood inspections).
Algorithms often provide accurate predictions but don't explain why they made a particular recommendation. How can managers trust and use these predictions effectively?
• How have companies like Netflix or eBay benefited from algorithms, despite a limited understanding of the "why" behind their predictions?
• What challenges do managers face when algorithmic decisions lack transparency, and how can experimentation and data validation help?
• Does reliance on predictions without understanding causation lead to suboptimal outcomes (e.g., eBay's ineffective advertising)?
Algorithms rely heavily on the quality and diversity of data inputs. How can businesses ensure they are using the right data to train their algorithms?
• What is the importance of using wide and diverse data inputs (e.g., Yelp reviews used in Boston's restaurant inspection algorithm)?
• How can managers prevent the algorithm from being too myopic by expanding the range of data inputs (e.g., short-term sales versus long-term customer satisfaction)?
• How do companies adjust their data strategy to improve predictive power (e.g., moving from sales data to satisfaction metrics for product longevity predictions)?
ANY QUESTIONS?