Simple Linear Regression Analysis
Recall the model again:

    Yi = β0 + β1 Xi + εi,   i = 1, ..., n.

The observations can be written as

    obs   1    2    ...   n
    Y     Y1   Y2   ...   Yn
    X     X1   X2   ...   Xn
The deviation of each Yi from the mean Ȳ is Yi − Ȳ.

The fitted values Ŷi = b0 + b1 Xi, i = 1, ..., n, are obtained from the regression and are determined by the Xi. Their mean is

    (1/n) Σ_{i=1}^n Ŷi = Ȳ   (why?)

Thus the deviation of Ŷi from its mean is Ŷi − Ȳ.

The residuals are ei = Yi − Ŷi, with mean ē = 0. Thus the deviation of ei from its mean is simply ei = Yi − Ŷi.
Write

    Yi − Ȳ  =  (Ŷi − Ȳ)  +  ei
    (total deviation = deviation due to the regression + deviation due to the error)

    obs   total deviation   deviation due to regression   deviation due to error
    1     Y1 − Ȳ            Ŷ1 − Ȳ                        e1 − ē = e1
    2     Y2 − Ȳ            Ŷ2 − Ȳ                        e2 − ē = e2
    ...   ...               ...                           ...
    n     Yn − Ȳ            Ŷn − Ȳ                        en − ē = en

The corresponding sums of squares are

    SST = Σ_{i=1}^n (Yi − Ȳ)²,   the total sum of squares,
    SSR = Σ_{i=1}^n (Ŷi − Ȳ)²,   the sum of squares due to regression,
    SSE = Σ_{i=1}^n ei²,         the sum of squares of error/residuals.

We have

    Σ_{i=1}^n (Yi − Ȳ)² = Σ_{i=1}^n (Ŷi − Ȳ)² + Σ_{i=1}^n ei²,   i.e.   SST = SSR + SSE.
Proof:

    Σ_{i=1}^n (Yi − Ȳ)² = Σ_{i=1}^n ((Ŷi − Ȳ) + ei)²
                        = Σ_{i=1}^n (Ŷi − Ȳ)² + Σ_{i=1}^n ei² + 2 Σ_{i=1}^n (Ŷi − Ȳ) ei
                        = SSR + SSE + 2 Σ_{i=1}^n (b0 + b1 Xi − Ȳ) ei
                        = SSR + SSE + 2 b0 Σ_{i=1}^n ei + 2 b1 Σ_{i=1}^n Xi ei − 2 Ȳ Σ_{i=1}^n ei
                        = SSR + SSE,

where the last step uses Σ_{i=1}^n ei = 0 and Σ_{i=1}^n Xi ei = 0 (the normal equations).
Moreover, since Ȳ = b0 + b1 X̄ (the fitted line passes through (X̄, Ȳ)),

    SSR = Σ_{i=1}^n (b0 + b1 Xi − b0 − b1 X̄)² = b1² Σ_{i=1}^n (Xi − X̄)².   (1)
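As a quick numerical check of the decomposition SST = SSR + SSE and of (1), here is a small R sketch; the simulated data (sample size 25, intercept 2, slope 3) are arbitrary toy values, not the data from the notes:

    # Toy data (assumed values, for illustration only)
    set.seed(1)
    X <- runif(25)
    Y <- 2 + 3 * X + rnorm(25)
    fit  <- lm(Y ~ X)
    Yhat <- fitted(fit)                  # fitted values Yhat_i = b0 + b1*Xi
    e    <- resid(fit)                   # residuals e_i = Yi - Yhat_i
    SST  <- sum((Y - mean(Y))^2)
    SSR  <- sum((Yhat - mean(Y))^2)
    SSE  <- sum(e^2)
    all.equal(SST, SSR + SSE)            # TRUE: the decomposition holds
    b1 <- unname(coef(fit)["X"])
    all.equal(SSR, b1^2 * sum((X - mean(X))^2))   # TRUE: equation (1)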
Breakdown of the degrees of freedom

The degrees of freedom for SST is n − 1: notice that Y1 − Ȳ, ..., Yn − Ȳ have one constraint,

    Σ_{i=1}^n (Yi − Ȳ) = 0.
[Figure 1: a figure showing the degrees of freedom, with two panels plotting the residuals e against X and the fitted values Ŷ against X.]

The degrees of freedom for SSE is n − 2: notice that e1, ..., en have TWO constraints,

    Σ_{i=1}^n ei = 0   and   Σ_{i=1}^n Xi ei = 0.

(Correspondingly, SSR has 1 degree of freedom, and n − 1 = 1 + (n − 2).)

Mean (of) squares:

    MSR = SSR/1        is called the regression mean square,
    MSE = SSE/(n − 2)  is called the error mean square.
Analysis of variance (ANOVA) table

Based on the breakdown, we write it as a table:

    Source of variation   SS    df      F-value        P(> F)
    Regression            SSR   1       F = MSR/MSE    p-value
    Error                 SSE   n − 2
    Total                 SST   n − 1

R command for the calculation: anova(object, ...), where object is the output of a regression.
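For instance, continuing the simulated toy fit from the sketch above:

    fit <- lm(Y ~ X)   # object: the output of a regression
    anova(fit)         # prints Df, Sum Sq, Mean Sq, F value, Pr(>F)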
Expectation of the mean squares: it holds that

    E(MSE) = σ²   and   E(MSR) = σ² + β1² Σ_{i=1}^n (Xi − X̄)².

[Proof: the first equation was proved earlier (where?). By (1), we have

    E(MSR) = E(b1²) Σ_{i=1}^n (Xi − X̄)²
           = { Var(b1) + (E(b1))² } Σ_{i=1}^n (Xi − X̄)²
           = { σ² / Σ_{i=1}^n (Xi − X̄)² + β1² } Σ_{i=1}^n (Xi − X̄)²
           = σ² + β1² Σ_{i=1}^n (Xi − X̄)². ]
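These two expectations can be illustrated by a small Monte Carlo sketch in R; the design values (n0 = 25, β1 = 3, σ = 1.5) and variable names are arbitrary choices for illustration:

    # Fixed design: x0 is held fixed; new errors are drawn in each replication
    set.seed(2)
    n0 <- 25; x0 <- runif(n0)
    beta1 <- 3; sigma <- 1.5
    ms <- replicate(5000, {
      y0 <- 2 + beta1 * x0 + rnorm(n0, sd = sigma)
      a  <- anova(lm(y0 ~ x0))
      c(MSR = a["x0", "Mean Sq"], MSE = a["Residuals", "Mean Sq"])
    })
    rowMeans(ms)                                 # Monte Carlo E(MSR), E(MSE)
    sigma^2 + beta1^2 * sum((x0 - mean(x0))^2)   # theoretical E(MSR)
    sigma^2                                      # theoretical E(MSE)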
F-test of H0: β1 = 0 versus Ha: β1 ≠ 0

Recall from (1) that SSR = b1² Σ_{i=1}^n (Xi − X̄)². If b1 = 0 then SSR = 0 (why?). Thus we can test β1 = 0 based on SSR; i.e., under H0, SSR (or MSR) should be small. We consider the F-statistic

    F* = MSR/MSE = (SSR/1) / (SSE/(n − 2)).

Under H0, F* ~ F(1, n − 2). For a given significance level α, our criterion is:
If F* ≤ F(1 − α; 1, n − 2) (i.e., F* is indeed small), accept H0.
If F* > F(1 − α; 1, n − 2) (i.e., F* is not small), reject H0,
where F(1 − α; 1, n − 2) is the (1 − α) quantile of the F(1, n − 2) distribution.

We can also do the test based on the p-value = P(F > F*), where F ~ F(1, n − 2):
If p-value ≥ α, accept H0.
If p-value < α, reject H0.

Example 2.1 For the example above (with n = 25, in Part 3), we fit a model Yi = β0 + β1 Xi + εi. By the R code, we have the following output:

    Analysis of Variance Table

    Response: Y
              Df Sum Sq Mean Sq F value    Pr(>F)
    X          1 252378  252378  105.88 4.449e-10 ***
    Residuals 23  54825    2384
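The critical value and the p-value in this output can be reproduced with R's F-distribution functions (df1 = 1 and df2 = 23 are read off the table above; the level α = 0.01 matches the test below):

    qf(0.99, df1 = 1, df2 = 23)                        # critical value F(0.99; 1, 23), about 7.88
    pf(105.88, df1 = 1, df2 = 23, lower.tail = FALSE)  # p-value, about 4.449e-10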
Suppose we need to test H0: β1 = 0 at significance level 0.01. Based on the calculation, the p-value is 4.449 × 10⁻¹⁰ < 0.01, so we should reject H0.

Equivalence of F-test and t-test

We have two methods to test H0: β1 = 0 versus Ha: β1 ≠ 0. Recall SSR = b1² Σ_{i=1}^n (Xi − X̄)². Thus

    F* = MSR/MSE = b1² Σ_{i=1}^n (Xi − X̄)² / MSE = b1² / s²(b1) = ( b1 / s(b1) )² = (t*)²,

since s²(b1) = MSE / Σ_{i=1}^n (Xi − X̄)². Thus

    F* > F(1 − α; 1, n − 2)  ⟺  (t*)² > (t(1 − α/2; n − 2))²  ⟺  |t*| > t(1 − α/2; n − 2),

and

    F* ≤ F(1 − α; 1, n − 2)  ⟺  (t*)² ≤ (t(1 − α/2; n − 2))²  ⟺  |t*| ≤ t(1 − α/2; n − 2).

(You can check in the statistical table that F(1 − α; 1, n − 2) = (t(1 − α/2; n − 2))².) Therefore, the test results based on the F and t statistics are the same. (But ONLY for the simple linear regression model.)
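The equivalence is easy to verify numerically, e.g. on the simulated toy fit from earlier:

    t_star <- summary(fit)$coefficients["X", "t value"]   # t-statistic for b1
    F_star <- anova(fit)["X", "F value"]                  # F-statistic
    all.equal(F_star, t_star^2)                           # TRUE: F* = (t*)^2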
General linear test: comparing a full and a reduced model

To test H0: β1 = 0, we can also do it by comparing two models:

    Full model:    Yi = β0 + β1 Xi + εi,
    Reduced model: Yi = β0 + εi.

Denote the SSR of the FULL and REDUCED models by SSR(F) and SSR(R) respectively (and similarly SSE(F) and SSE(R)). We have immediately

    SSR(F) ≥ SSR(R),   or equivalently   SSE(F) ≤ SSE(R).

A question: when does the equality hold?

Note that if H0: β1 = 0 holds, then (SSE(R) − SSE(F)) / SSE(F) should be small. Taking the degrees of freedom into account, define

    F* = { (SSE(R) − SSE(F)) / (dfR − dfF) } / { SSE(F) / dfF },

which should be small under H0, where dfR and dfF denote the degrees of freedom of SSE(R) and SSE(F) respectively. Under H0: β1 = 0, it can be proved that F* ~ F(dfR − dfF, dfF). Suppose we get the F-value F*; then:

If F* ≤ F(1 − α; dfR − dfF, dfF), accept H0.
If F* > F(1 − α; dfR − dfF, dfF), reject H0.

Similarly, based on the p-value = P(F > F*):
If p-value ≥ α, accept H0.
If p-value < α, reject H0.
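In R, anova applied to two nested fits carries out exactly this comparison; a sketch, again on the simulated toy data:

    full    <- lm(Y ~ X)   # full model
    reduced <- lm(Y ~ 1)   # reduced model: intercept only
    anova(reduced, full)   # F* = ((SSE(R)-SSE(F))/(dfR-dfF)) / (SSE(F)/dfF)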
R-square

Define

    R² = SSR/SST = 1 − SSE/SST.

Here SSR/SST is the proportion of the total sum of squares that is explained by the predictor X, and SSE/SST is the proportion of the total sum of squares that is caused by the random effect.
R² is called R-square, or the coefficient of determination. Some facts about R² for the simple linear regression model:

1. 0 ≤ R² ≤ 1.
2. If R² = 0, then b1 = 0 (because SSR = b1² Σ_{i=1}^n (Xi − X̄)²).
3. If R² = 1, then Yi = b0 + b1 Xi for all i (why?).
4. The correlation coefficient between X and Y satisfies r²_{X,Y} = R² (see the numerical check after this list). [Proof:

    R² = SSR/SST = b1² Σ_{i=1}^n (Xi − X̄)² / Σ_{i=1}^n (Yi − Ȳ)² = r²_{XY}. ]
5. R² only indicates the fit within the observed range/scope of X. We need to be careful if we make predictions outside that range.
6. R² only indicates linear relationships. R² = 0 does not mean X and Y have no nonlinear association.
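Fact 4 can be checked numerically on the simulated toy data from before:

    R2 <- summary(fit)$r.squared
    all.equal(R2, cor(X, Y)^2)   # TRUE: R^2 equals the squared correlation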
When we report a fitted model, we usually also report σ̂² (or MSE) = ..., R² = ..., the F-statistic = ... (and others). Other formats of writing a fitted model can be found in Part 3 of the lecture notes.