Module 01.1: Linear Regression
Obs    x      y
 1     1.5    1.91
 2     1.4    1.83
 3     2.7    0.86
 4     1.1    1.72
 5     0.9    1.28
 6     0.8    1.09
 7     2.9    0.79
 8     2.2    1.1
 9     3.3    0.81
10     1.8    1.67
Glimpse of Linear Regression
lm1 <- lm(mpg ~ hp, data = mtcars)   # fit a simple linear regression of mpg on hp
summary(lm1)                         # coefficient estimates, t-tests, R-squared
anova(lm1)                           # ANOVA table for the regression
The data layout for $k$ predictors and $n$ observations:

$$\begin{array}{ccccc}
x_1 & x_2 & \cdots & x_k & y \\
x_{11} & x_{12} & \cdots & x_{1k} & y_1 \\
x_{21} & x_{22} & \cdots & x_{2k} & y_2 \\
\vdots & \vdots & & \vdots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nk} & y_n
\end{array}$$

[Figure: Normal Probability Plot of Residuals (response is Flight Time); Percent vs. Residual]
Residual = observed minus fitted value: $e_i = y_i - \hat{y}_i$
Matrix Plot [figure]
Correlation [figure]
Questions to Ask
• Are any of the predictors important in predicting the response?
Ans: ANOVA (overall F-test)
• Which of the predictors are important?
Ans: All-subsets or best-subsets regression
• How well does the model fit the data?
Ans: $R^2$, $R^2_{adj}$
• Given a set of predictors, how accurate is the prediction?
Ans: $R^2_{pred}$, cross-validation, etc.
• What is the effect of an individual observation on the model?
Ans: Leverage, Cook's distance
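As a concrete illustration of the diagnostics named above, here is a minimal R sketch using the mtcars fit from the Glimpse slide (the 4/n cutoff is a common rule of thumb, not from the source):

lm1 <- lm(mpg ~ hp, data = mtcars)   # same fit as in the Glimpse slide

anova(lm1)                # overall F-test: is the predictor important?
summary(lm1)$r.squared    # how well does the model fit?

hatvalues(lm1)            # leverage of each observation
cooks.distance(lm1)       # Cook's distance: influence on the fitted model

# flag observations with Cook's distance above the 4/n rule of thumb
which(cooks.distance(lm1) > 4 / nrow(mtcars))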
Assumptions for Linear Regression
• The primary assumptions for linear regression are:
1. Linearity of the observed phenomenon
2. Constant variance of the error terms
3. Normality of the error-term distribution
4. Independence of the error terms
• Adherence to these assumptions is checked through graphical methods such as residual plots and the normal probability plot of residuals (see the sketch below).
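A minimal R sketch of these graphical checks, assuming the lm1 fit from earlier:

par(mfrow = c(2, 2))   # arrange the four diagnostic panels in a grid
plot(lm1)              # residuals vs fitted (linearity, constant variance),
                       # normal Q-Q plot (normality), scale-location,
                       # and residuals vs leverage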
Steps to do Regression
• Step 1. Create a flat file (data ready for the software).
• Step 2. Start with a first-order model (usually).
• Step 3. Fit the current model form.
• Step 4. Perform model diagnostics. If the model is defensible, stop. Otherwise, try a different form, possibly adding or removing factors, and return to Step 3.
• Step 5. (Sometimes optional) t-test the coefficients and/or make a decision. A sketch of this loop appears below.
• Comment: The process involves a degree of subjectivity and intuition about the physical system, what model form makes sense, and what helps answer the relevant questions.
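A minimal R sketch of the fit-diagnose-refit loop, again using mtcars as a stand-in data set (the variable choices here are illustrative, not from the source):

# Steps 2-3: fit a first-order model
m <- lm(mpg ~ hp + wt, data = mtcars)

# Step 4: model diagnostics
par(mfrow = c(2, 2)); plot(m)

# if the diagnostics suggest curvature, try a different form and refit
m2 <- lm(mpg ~ hp + I(hp^2) + wt, data = mtcars)

# Step 5: t-test the coefficients
summary(m2)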
Estimation of Coefficients
• Model form: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i$
• The errors $\varepsilon_i$ are assumed independent with mean 0 and constant variance $\sigma^2$.
Estimation of Coefficients
• $\text{SSE} = \sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik}\right)^2$
• Minimize the above function with respect to the $\beta$'s.
• How do we do this?
• Set $\dfrac{\partial\,\text{SSE}}{\partial \beta_j} = 0$ for $j = 0, 1, \ldots, k$.
• In matrix form: $y = X\beta + \varepsilon$
• $\text{SSE} = (y - X\beta)'(y - X\beta)$ (minimize w.r.t. $\beta$)
• $\hat{\beta} = (X'X)^{-1}X'y$ (derivative = 0)
$$X = \begin{bmatrix} 1 & -1 & -1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 0 \end{bmatrix}, \qquad y = \begin{bmatrix} 4 \\ 3 \\ 1 \\ 0 \\ 4 \end{bmatrix}$$
Design Matrix
Example #2 - Estimation

$$X'X = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 4 \end{bmatrix}, \qquad (X'X)^{-1} = \begin{bmatrix} 0.20 & 0 & 0 \\ 0 & 0.25 & 0 \\ 0 & 0 & 0.25 \end{bmatrix}$$

$$b = (X'X)^{-1}X'y = \begin{bmatrix} 2.4 \\ -0.5 \\ -1.5 \end{bmatrix}$$

$$\hat{y} = 2.4 - 0.5\,x_1 - 1.5\,x_2 \quad \text{(prediction equation)}$$
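A minimal R sketch reproducing this calculation with matrix algebra; the design and response values are taken from the slide above:

x1 <- c(-1,  1, -1, 1, 0)
x2 <- c(-1, -1,  1, 1, 0)
y  <- c( 4,  3,  1, 0, 4)

X <- cbind(1, x1, x2)                    # design matrix with intercept column
b <- solve(t(X) %*% X) %*% t(X) %*% y    # b = (X'X)^{-1} X'y
b                                        # 2.4, -0.5, -1.5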
Example #2 - New Model, Same Array

A $2^2$ design with a center point, now fit with the functional form
$$y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + b_3 x_{i2}^2 + e_i$$

x1   x2   y
-1   -1   4   (experimental design)
 1   -1   3
-1    1   1
 1    1   0
 0    0   4   (center point)

$$X = \begin{bmatrix} 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & 1 \\ 1 & -1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}, \qquad y = \begin{bmatrix} 4 \\ 3 \\ 1 \\ 0 \\ 4 \end{bmatrix}$$

Different Design Matrix
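A minimal R sketch fitting this second model. The coefficient values in the comment follow from solving the normal equations for this design; they are my computation, not quoted from the source:

x1 <- c(-1,  1, -1, 1, 0)
x2 <- c(-1, -1,  1, 1, 0)
y  <- c( 4,  3,  1, 0, 4)

X <- cbind(1, x1, x2, x2^2)              # fourth column for the quadratic term
b <- solve(t(X) %*% X) %*% t(X) %*% y
b                                        # 4.0, -0.5, -1.5, -2.0

# equivalently, via lm():
lm(y ~ x1 + x2 + I(x2^2))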
Hypothesis Testing in Multiple Regression
• Overall significance (ANOVA F-test): $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ versus $H_a$: at least one $\beta_j \neq 0$.
• Individual coefficients: $H_0: \beta_j = 0$, tested with $t = \hat{\beta}_j / \operatorname{se}(\hat{\beta}_j)$.
• $R^2 = 1 - \dfrac{SSE}{SST}$
• $R^2_{adj} = 1 - \dfrac{SSE/(n-k-1)}{SST/(n-1)}$  ($k$ = # of predictors)
• $R^2_{pred} = 1 - \dfrac{PRESS}{SST}$
• The variance of the fitted mean response $\hat{y}_0 = x_0'\hat{\beta}$ is $\sigma^2\, x_0'(X'X)^{-1}x_0$.
• The confidence interval of the mean response at $x_0$ is $\hat{y}_0 \pm t_{\alpha/2,\,n-k-1}\,\sqrt{\hat{\sigma}^2\, x_0'(X'X)^{-1} x_0}$.
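A minimal R sketch of these fit statistics; the model choice is illustrative, and $R^2_{pred}$ is computed from the PRESS statistic via the leave-one-out identity $e_i/(1-h_{ii})$:

m <- lm(mpg ~ hp + wt, data = mtcars)   # illustrative multiple regression

summary(m)$r.squared        # R^2
summary(m)$adj.r.squared    # adjusted R^2

# predicted R^2 from the PRESS statistic
press <- sum((residuals(m) / (1 - hatvalues(m)))^2)
sst   <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
1 - press / sst             # R^2_pred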
Prediction Interval of New Response
• Suppose we want to predict the actual response at a point $x_0$. The prediction interval is $\hat{y}_0 \pm t_{\alpha/2,\,n-k-1}\,\sqrt{\hat{\sigma}^2\left(1 + x_0'(X'X)^{-1}x_0\right)}$; it is wider than the confidence interval because it also accounts for the new observation's error term. A code sketch follows.
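A minimal R sketch contrasting the two intervals with predict(); the new point x0 is hypothetical:

m  <- lm(mpg ~ hp + wt, data = mtcars)
x0 <- data.frame(hp = 150, wt = 3.0)    # hypothetical new point

predict(m, newdata = x0, interval = "confidence")   # CI for the mean response
predict(m, newdata = x0, interval = "prediction")   # PI for a new observation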
Check VIFs, $R^2$, p-values, etc.
Model Selection: Example 1 [output figure omitted]
Model Selection: Example 2 [output figure omitted]
Forward Selection
• Step 1: Model m1 <- fit a null model (intercept only).
• Step 2: Add variables one at a time (each a simple linear regression model).
• Step 3: Pick the variable with the lowest RSS and add it to m1.
• Step 4: With the remaining variables, add to m1 one at a time and pick the model that gives the best RSS.
• Step 5: Continue until some stopping criterion is satisfied. (See the sketch below.)
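A minimal R sketch of forward selection using base R's step(), with AIC as the stopping criterion rather than a raw RSS threshold; the variable pool is illustrative:

null <- lm(mpg ~ 1, data = mtcars)               # Step 1: null model
full <- lm(mpg ~ hp + wt + disp, data = mtcars)  # candidate variable pool

step(null, scope = formula(full), direction = "forward")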
Backward Selection
• Step 1: Start with all the variables in the model.
• Step 2: Remove the variable that is least significant (largest p-value).
• Step 3: Refit with the remaining variables.
• Step 4: Continue dropping variables until some stopping criterion is met (e.g., a threshold on the p-value). A sketch follows below.
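The corresponding backward pass via step(), again with AIC standing in for the p-value rule described above:

full <- lm(mpg ~ hp + wt + disp, data = mtcars)  # start with all variables
step(full, direction = "backward")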
Other methods
• Mallows' Cp
• AIC
• BIC
• Cross-validation
• Adjusted $R^2$
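A minimal R sketch comparing candidate models by these criteria; Mallows' Cp here uses the leaps package, which is an assumption about available tooling:

m1 <- lm(mpg ~ hp, data = mtcars)
m2 <- lm(mpg ~ hp + wt, data = mtcars)

AIC(m1, m2)   # smaller is better
BIC(m1, m2)

# best-subsets search reporting Mallows' Cp (requires the leaps package)
library(leaps)
summary(regsubsets(mpg ~ hp + wt + disp, data = mtcars))$cp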