Notes 12
What if the Regression Equation Contains “Wrong”
Predictors?
Before we learn about variable selection methods, we first need to understand the
consequences of a regression equation containing the “wrong” or “inappropriate”
variables.
There are four possible outcomes when formulating a regression model for a set of data:
1 The model is correctly specified: it contains all of the relevant predictors and no extraneous ones.
2 The model is underspecified: it is missing one or more important predictors, and so yields biased regression coefficients and biased predictions.
3 The model contains one or more extraneous variables.
4 The model is over-specified: it contains one or more redundant predictors.
1 A regression model contains one or more extraneous variables (outcome 3).
1. Extraneous variables are related neither to the response nor to any of the other
predictors.
2. The good news is that such a model yields unbiased regression coefficients and
an unbiased MSE.
3. The bad news is that the MSE has fewer degrees of freedom associated with it. As a
result, our confidence intervals tend to be wider and our hypothesis tests tend to
have lower power.
2 If a regression model is over-specified (outcome 4), the regression equation contains
one or more redundant predictor variables.
1. That is, part of the model is correct, but some of the predictors are redundant.
2. Regression models that are over-specified still yield unbiased regression coefficients and
an unbiased MSE.
3. However, as with including extraneous variables, we have made the model more
complicated and harder to understand than necessary.
Strategy
1 Know your goal, know your research question. Knowing how you plan to use
your regression model can assist greatly in the model building stage.
Example: let’s learn how the stepwise regression procedure works by considering a data
set that concerns the hardening of cement.
Researchers were interested in learning how the composition of the cement affected the
heat evolved during the hardening of the cement. They measured and recorded the
following data on 13 batches of cement:
[Scatterplot matrix of the response y (heat evolved) against the four predictors x1 , x2 , x3 , and x4 for the 13 batches of cement.]
Note: The number of predictors in this data set is not large. The stepwise procedure is typically
used on much larger data sets, for which it is not feasible to attempt to fit all of the possible
regression models.
Overview of Stepwise Regression using F -Tests to
Choose Variables
1 First, we start with no predictors in our “stepwise model.”
2 Then, at each step along the way we either enter or remove a predictor based on the general
(partial) F -tests – that is, the t-tests for the slope parameters – that are obtained.
3 We stop when no more predictors can be justifiably entered or removed from our stepwise
model, thereby leading us to a “final model.”
Now let’s start the procedure:
Step 1.
1 Fit each of the one-predictor models – that is, regress Y on x1 , regress Y on x2 ,. . ., and
regress Y on xp−1 .
2 Of those predictors whose t-test p-value is less than some α level, say, 0.15, the first
predictor put in the stepwise model is the predictor that has the smallest t-test p-value.
Step 2.
1 Suppose x1 had the smallest t-test p-value below α and therefore was deemed the “best”
single predictor arising from the first step.
2 Now, fit each of the two-predictor models that include x1 as a predictor – that is, regress Y
on x1 and x2 , regress Y on x1 and x3 ,...,and regress Y on x1 and xp−1 .
3 Of those predictors whose t-test p-value is less than α, the second predictor put in the
stepwise model is the predictor that has the smallest t-test p-value.
4 If no predictor has a t-test p-value less than α, stop. The model with the one predictor
obtained from the first step is your final model.
5 But, suppose instead that x2 was deemed the “best” second predictor and it is therefore
entered into the stepwise model.
6 Now, since x1 was the first predictor in the model, step back and see if entering x2 into the
stepwise model somehow affected the significance of the x1 predictor.
That is, check the t-test p-value for testing β1 = 0. If the t-test p-value for β1 = 0 has become
not significant, remove x1 from the stepwise model.
Step 3.
1 Suppose both x1 and x2 made it into the two-predictor stepwise model and remained
there.
2 Now, fit each of the three-predictor models that include x1 and x2 as predictors – that is,
regress Y on x1 , x2 , and x3 ; regress Y on x1 , x2 , and x4 ; . . .; and regress Y on x1 ,
x2 , and xp−1 .
3 Of those predictors whose t-test p-value is less than α, the third predictor put in the
stepwise model is the predictor that has the smallest t-test p-value.
4 If no predictor has a t-test p-value less than α, stop. The model containing the two predictors
obtained from the second step is your final model.
5 But, suppose instead that x3 was deemed the “best” third predictor and it is therefore
entered into the stepwise model.
6 Now, since x1 and x2 were the first predictors in the model, step back and see if entering x3
into the stepwise model somehow affected the significance of the x1 and x2 predictors.
If the t-test p-value for either β1 = 0 or β2 = 0 has become not significant, then remove that
predictor from the stepwise model.
Stopping the procedure. Continue the steps as described above until adding an additional
predictor does not yield a t-test p-value below α.
Back to the Cement Example
Note that in add1(), the general linear (partial) F -test is used. It can be checked that the p-value
for the F -test for each predictor is the same as the p-value for the t-test for each predictor
obtained by lm().
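The code for this first step does not survive in the notes, so here is a minimal sketch. The data below are the classic 13-batch Hald cement data, whose values are widely reproduced; the data frame name `cement` is chosen here for illustration, so verify both against your course materials:

```r
# Hald cement data: 13 batches; y = heat evolved during hardening,
# x1-x4 = measurements of the cement's composition
cement <- data.frame(
  x1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  x2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  x3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  x4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12),
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5,
         93.1, 115.9, 83.8, 113.3, 109.4))

# Step 1: start from the intercept-only model and try each one-predictor model
mod0 <- lm(y ~ 1, data = cement)
add1(mod0, scope = ~ x1 + x2 + x3 + x4, test = "F")
# x4 has the largest F-statistic (smallest p-value), so x4 enters first
```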
Step 2. choose the second predictor when x4 is in the model.
We regress Y on x4 and x1 , regress Y on x4 and x2 , and regress Y on x4 and x3 . Instead of
calling lm() three times, we use the add1() function:
1 The F -statistic (p-value) for x1 is the largest (smallest). As a result of the second step, we
enter x1 into our stepwise model.
2 The update() function adds predictors to or removes them from a fitted linear model. In this
example, the model is updated by including x4 ; the result is the same as regressing Y on x4
directly with lm().
Before we proceed to the next step, we need to check whether or not adding x1 in the model
affects the significance of x4 :
The p-value for x4 shows that adding x1 to the model does not affect the significance of x4 .
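The code for this step is also missing from the notes; the following is a sketch using the Hald cement data (redefined inline so the snippet runs on its own):

```r
# Hald cement data, repeated so this snippet runs standalone
cement <- data.frame(
  x1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  x2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  x3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  x4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12),
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5,
         93.1, 115.9, 83.8, 113.3, 109.4))

mod0 <- lm(y ~ 1, data = cement)
mod4 <- update(mod0, . ~ . + x4)   # same as lm(y ~ x4, data = cement)

# Step 2: try adding each remaining predictor to the model containing x4
add1(mod4, scope = ~ x1 + x2 + x3 + x4, test = "F")
# x1 has the largest F-statistic (smallest p-value), so x1 enters

# check that entering x1 did not knock out x4
summary(update(mod4, . ~ . + x1))  # x4 remains significant
```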
Step 3. choose the third predictor when x4 and x1 are in the model.
We regress Y on x4 , x1 , and x2 ; and we regress Y on x4 , x1 , and x3 , obtaining:
The predictor x2 has the smallest F -test p-value. Therefore, as a result of the third step, we enter
x2 into our stepwise model.
Recall that in a stepwise regression procedure, we don’t need to set the α level as strict as 0.05. It
could be larger.
Now, since x1 and x4 were the first two predictors in the model, we must step back and see if
entering x2 into the stepwise model affected the significance of the x1 and x4 predictors.
If we choose α = 0.15, then the p-value for x4 is larger than α. Therefore, we remove the predictor
x4 from the stepwise model, leaving us with the predictors x1 and x2 in our stepwise model.
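A sketch of this third step, again with the Hald cement data redefined inline so the snippet is self-contained:

```r
# Hald cement data, repeated so this snippet runs standalone
cement <- data.frame(
  x1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  x2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  x3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  x4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12),
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5,
         93.1, 115.9, 83.8, 113.3, 109.4))

mod41 <- lm(y ~ x4 + x1, data = cement)

# Step 3: try adding x2 or x3 to the model containing x4 and x1
add1(mod41, scope = ~ x1 + x2 + x3 + x4, test = "F")
# x2 has the smaller p-value of the two candidates, so x2 enters

mod412 <- update(mod41, . ~ . + x2)
summary(mod412)
# with alpha = 0.15, the p-value for x4 is now too large, so remove x4
mod12 <- update(mod412, . ~ . - x4)
```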
Step 4. choose the third predictor when x1 and x2 are in the model.
We regress Y on x1 , x2 , and x3 . Note that we don’t need to regress Y on x1 , x2 , and x4 , since we
just deleted x4 from the model containing x1 , x2 , and x4 .
Stop the stepwise regression procedure
According to its p-value, x3 is not eligible for entry into our stepwise model. That is, we stop our
stepwise regression procedure. Our final regression model, based on the stepwise procedure
using F -tests to choose predictors, contains only the predictors x1 and x2 .
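The stopping step and the final model can be sketched as follows (Hald cement data redefined inline so the snippet runs on its own):

```r
# Hald cement data, repeated so this snippet runs standalone
cement <- data.frame(
  x1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  x2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  x3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  x4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12),
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5,
         93.1, 115.9, 83.8, 113.3, 109.4))

mod12 <- lm(y ~ x1 + x2, data = cement)

# Step 4: the only remaining candidate is x3
add1(mod12, scope = ~ x1 + x2 + x3, test = "F")
# the p-value for x3 exceeds alpha = 0.15, so the procedure stops

summary(mod12)   # final stepwise model: y ~ x1 + x2
```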
Information Criteria (Other Criteria to Choose
Predictors)
1 For regression models, the information criteria combine information about the SSE, the number
of parameters p in the model, and the sample size n.
2 Notice that the only difference between AIC and BIC is the multiplier of p, the number of
parameters.
3 The BIC places a higher penalty ( log(n) > 2 when n > 7) on the number of parameters in
the model, so it will tend to reward more parsimonious (smaller) models.
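In R, the built-in AIC() and BIC() functions compute these criteria for fitted lm objects. A sketch with the Hald cement data (redefined inline so the snippet runs on its own):

```r
# Hald cement data, repeated so this snippet runs standalone
cement <- data.frame(
  x1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  x2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  x3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  x4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12),
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5,
         93.1, 115.9, 83.8, 113.3, 109.4))

mod12  <- lm(y ~ x1 + x2, data = cement)
mod124 <- lm(y ~ x1 + x2 + x4, data = cement)

AIC(mod12); AIC(mod124)
BIC(mod12); BIC(mod124)
# for a fixed data set, BIC - AIC = (log(n) - 2) * k for each model, where k
# counts the estimated parameters (regression coefficients plus the error variance),
# so BIC penalizes larger models more heavily whenever n > 7
```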
Cautions About Stepwise Regression
1 The final model is not guaranteed to be optimal in any specified sense.
2 The procedure yields a single final model, although there are often several equally
good models.
3 Stepwise regression does not take into account a researcher’s knowledge about the
predictors. It may be necessary to force the procedure to include important
predictors.
4 One should not over-interpret the order in which predictors are entered into the
model.
Best Subsets Regression
1 The general idea behind best subsets regression is that we select the subset of
predictors that does the best at meeting some well-defined objective criterion, such as
having the largest R2 value or the smallest MSE.
Step 1.
First, identify all of the possible regression models derived from all of the possible combinations of
the candidate predictors. Unfortunately, this can be a huge number of possible models.
1. Suppose we have one (1) candidate predictor – x1 . Then, there are two (2) possible
regression models we can consider:
1 the one (1) model with no predictors.
2 the one (1) model with the predictor x1 .
2. Suppose we have two (2) candidate predictors – x1 and x2 . Then, there are four (4) possible
regression models we can consider:
1 the one (1) model with no predictors.
2 the two (2) models with only one predictor each – the model with x1 alone; the
model with x2 alone.
3 the one (1) model with both predictors – the model with x1 and x2 .
In general, if there are p − 1 candidate predictors, then there are 2^(p−1) possible regression
models containing them. For example, 10 predictors yield 2^10 = 1024 possible regression
models.
Step 2.
1. From the possible models identified in the first step, determine the “best” model of each
size – the best one-predictor model, the best two-predictor model, and so on – according to
some criterion such as the R2 value.
2. By doing this, we cut down considerably the number of possible regression models to
consider!
Step 3.
1. Further evaluate and refine the handful of models identified in the last step. This might
entail performing residual analyses, transforming the predictors and/or response, adding
interaction terms, and so on.
2. Do this until you are satisfied that you have found a model that fits the data well and
meets the goals of your research question.
The different criteria quantify different aspects of the regression model, and therefore often
yield different choices for the best set of predictors.
We should use best subsets regression as a screening tool to reduce the large number of
possible regression models to just a handful of models that we can evaluate further before
arriving at one final model.
R2 Value
The R2 value can only stay the same or increase as predictors are added, so the model with
all of the candidate predictors always has the largest R2 value. Therefore, it makes no sense
to define the “best” model as the model with the largest R2 value.
1 We can instead use the R2 -values to find the point where adding more predictors is
not worthwhile, because it yields a very small increase in the R2 -value.
2 In other words, we look at the size of the increase in R2 from one model size to the next,
not just the R2 value itself.
2 “summary.mod$rsq” shows the R2 for each “best” model of a given size. Going from the
“best” one-predictor model to the “best” two-predictor model, the R2 value jumps from 67.5
to 97.9 (percent), which is the largest increase.
3 Based on the R2 -value criterion, the “best” model is the model with the two predictors x1 and
x2 .
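The object summary.mod in the text is presumably the summary of a leaps::regsubsets() fit. With only four candidate predictors, the same “best R2 of each size” values can be reproduced in base R by brute force, as a sketch (Hald cement data redefined inline so the snippet runs on its own):

```r
# Hald cement data, repeated so this snippet runs standalone
cement <- data.frame(
  x1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  x2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  x3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  x4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12),
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5,
         93.1, 115.9, 83.8, 113.3, 109.4))

# all 15 non-empty subsets of the four candidate predictors
subsets <- unlist(lapply(1:4, function(k)
  combn(c("x1", "x2", "x3", "x4"), k, simplify = FALSE)), recursive = FALSE)
r2   <- sapply(subsets, function(s)
  summary(lm(reformulate(s, "y"), data = cement))$r.squared)
size <- lengths(subsets)

round(100 * tapply(r2, size, max), 1)   # best R2 (percent) for each model size
# the jump from the best one-predictor model (x4, about 67.5) to the best
# two-predictor model (x1 and x2, about 97.9) is by far the largest increase
```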
Adjusted R2 Value and MSE
The MSE quantifies how far away our predicted responses are from our observed responses.
Therefore, according to the MSE criterion, the best regression model is the one with the
smallest MSE.
3 The adjusted R2 value increases only if the MSE decreases. That is, the adjusted R2 and
MSE criteria always yield the same “best” models.
Different criteria can lead us to the same “best” model. Based on the largest adjusted R2 value
and the smallest MSE criteria, the “best” model is the model with the three predictors x1 , x2 , and
x4 .
2 “summary.mod$rss” shows the SSE for each “best” model; we need to compute
MSE = SSE/(n − p) by hand.
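A sketch of both criteria in base R, computing the adjusted R2 and MSE for every subset of the four candidates (Hald cement data redefined inline so the snippet runs on its own):

```r
# Hald cement data, repeated so this snippet runs standalone
cement <- data.frame(
  x1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  x2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  x3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  x4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12),
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5,
         93.1, 115.9, 83.8, 113.3, 109.4))

subsets <- unlist(lapply(1:4, function(k)
  combn(c("x1", "x2", "x3", "x4"), k, simplify = FALSE)), recursive = FALSE)
fits  <- lapply(subsets, function(s) lm(reformulate(s, "y"), data = cement))
adjr2 <- sapply(fits, function(m) summary(m)$adj.r.squared)
mse   <- sapply(fits, function(m) deviance(m) / df.residual(m))  # MSE = SSE/(n - p)

subsets[[which.max(adjr2)]]  # x1, x2, x4: largest adjusted R2
subsets[[which.min(mse)]]    # the same model: the two criteria always agree
```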
Mallows’ Cp-statistic
1 Mallows’ Cp -statistic estimates the size of the bias that is introduced into the
predicted responses by an underspecified model.
Bias and Variation in Predicted Responses
The true value is the center of the target.
1 The predicted responses on the first target may be considered reliable, precise, or as
having negligible random error, but all of the responses miss the true value by a wide
margin (biased estimates).
2 The target on the right has more random error (large variance) in the predicted responses;
however, the results are valid, lacking systematic error (unbiased).
3 The middle target depicts our goal: predictions that are both reliable (small variance) and
valid (unbiased).
A Measure of the Total Variation in the Predicted
Responses Γp
1 To quantify the total variation in the predicted responses, we just sum the two components
σ²(Ŷi) (random sampling variation) and Bi² (prediction bias) over all n data points to obtain a
standardized measure of the total variation in the predicted responses Γp :

Γp = (1/σ²) { Σ_{i=1}^{n} σ²(Ŷi) + Σ_{i=1}^{n} Bi² }

2 If all the Bi are 0, then Γp achieves its smallest possible value, namely p, the number of
parameters, using the fact that Σ_{i=1}^{n} σ²(Ŷi) = pσ² :

Γp = (1/σ²) { Σ_{i=1}^{n} σ²(Ŷi) + 0 } = p

The best model is simply the model with the smallest value of Γp .
However, we don’t know the population quantities σ²(Ŷi) and σ², and hence we don’t know Γp .
We need a statistic to estimate Γp .
Mallows’ Cp as An Estimate of Γp
1 We estimate σ² by MSEall , the mean squared error obtained from fitting the model
containing all of the candidate predictors.
2 Estimating σ² by MSEall assumes that there are no biases in the full model with all of the
predictors, an assumption that may or may not be valid.
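Putting these two estimates together gives the standard formula for Mallows’ Cp for a subset model with p parameters (including the intercept), where SSEp is the error sum of squares of that subset model:

```latex
C_p = \frac{SSE_p}{MSE_{all}} - (n - 2p)
```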
1 Subset models with small Cp values have a small total (standardized) variance of prediction.
2 For the largest model containing all of the candidate predictors, Cp = p (always). Therefore,
you shouldn’t use Cp to evaluate the full model.
Using the Cp Criterion to Identify “Best” Models
1 Identify subsets of predictors for which the Cp value is near p (if possible).
2 The full model always yields Cp = p, so don’t select the full model based on Cp .
3 If all models, except the full model, yield a large Cp not near p, it suggests some important
predictor(s) are missing from the analysis.
In this case, we are well-advised to identify the predictors that are missing!
4 If a number of models have Cp near p, choose the model with the smallest Cp value, thereby
ensuring that the combination of the bias and the variance is at a minimum.
5 When more than one model has a small Cp value near p, in general, choose the
simpler model or the model that meets your research needs.
According to the Cp criterion, the model with x1 and x2 , and the model with x1 , x2 ,
and x4 are both valid models.
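This conclusion can be checked directly with the Cp formula above, as a sketch in base R (Hald cement data redefined inline so the snippet runs on its own):

```r
# Hald cement data, repeated so this snippet runs standalone
cement <- data.frame(
  x1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  x2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  x3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  x4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12),
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5,
         93.1, 115.9, 83.8, 113.3, 109.4))

n <- nrow(cement)
# MSE_all from the full model with all four candidate predictors (p = 5 parameters)
mse_all <- deviance(lm(y ~ x1 + x2 + x3 + x4, data = cement)) / (n - 5)

# Cp = SSE_p / MSE_all - (n - 2p)
cp <- function(form, p) deviance(lm(form, data = cement)) / mse_all - (n - 2 * p)

cp(y ~ x1 + x2, p = 3)            # about 2.7, close to p = 3
cp(y ~ x1 + x2 + x4, p = 4)       # about 3.0, close to p = 4
cp(y ~ x1 + x2 + x3 + x4, p = 5)  # exactly 5: the full model always gives Cp = p
```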
2 R examples