
Multiple Linear Regression Prepared by Mandefro Abere 2007

Chapter 3
3. Multivariate Linear Regression
A regression model that involves more than one regressor variable is called a multiple linear
regression model.
3.1. Description of the model
The data consist of n observations on a dependent or response variable Y and p
predictor or explanatory variables, X_1, X_2, X_3, ..., X_p. The observations are
usually arranged in a table with one row per observation, (Y_i, x_i1, x_i2, ..., x_ip).
The relationship between Y and X_1, X_2, X_3, ..., X_p is formulated as the linear model

Y = β_0 + β_1 x_1 + β_2 x_2 + ... + β_p x_p + ε,      (3.1)

where β_0, β_1, β_2, ..., β_p are constants referred to as the model partial regression
coefficients (or simply as the regression coefficients) and ε is a random disturbance
or error. It is assumed that, for any set of fixed values of X_1, X_2, X_3, ..., X_p
that fall within the range of the data, the linear equation (3.1) provides an
acceptable approximation of the true relationship between Y and the X's (Y is
approximately a linear function of the X's, and ε measures the discrepancy in that
approximation).
In particular, ε contains no systematic information for determining Y that is not
already captured by the X's.

3.2. Estimation of the model parameters


3.2.1. Least Squares Estimation


The method of least squares can be used to estimate the regression coefficients in Eq. (3.2) below.
Suppose that n > p observations are available; let y_i denote the i-th observed response and x_ij
denote the i-th observation on the j-th regressor. We assume that the error term in the model has zero
mean and constant variance, and that the errors are uncorrelated. We also assume that the
regressor variables x_1, x_2, ..., x_p are fixed (i.e., mathematical or non-random) variables, measured
without error.
Data available:
(Y_1, x_11, x_12, ..., x_1p), (Y_2, x_21, x_22, ..., x_2p), ..., (Y_n, x_n1, x_n2, ..., x_np).
The multiple linear regression model for the above data is

Y_i = β_0 + β_1 x_i1 + β_2 x_i2 + ... + β_p x_ip + ε_i,   i = 1, ..., n,      (3.2)

where the error terms are assumed to have the following properties:
1. E(ε_i) = 0;
2. Var(ε_i) = σ²;
3. Cov(ε_i, ε_j) = 0 for i ≠ j.
The above data can be represented in matrix form. Let

Y = (Y_1, Y_2, ..., Y_n)',   β = (β_0, β_1, ..., β_p)',   ε = (ε_1, ε_2, ..., ε_n)',

and let X be the n × (p + 1) matrix whose i-th row is (1, x_i1, x_i2, ..., x_ip). Then the n equations

Y_i = β_0 + β_1 x_i1 + ... + β_p x_ip + ε_i,   i = 1, ..., n,

can be written compactly as

Y = Xβ + ε,

where the error vector satisfies
1. E(ε) = 0;
2. Cov(ε) = E(εε') = σ² I.

The least squares method finds the estimate of β by minimizing the sum of squares of the
residuals,

S(β) = S(β_0, β_1, ..., β_p) = Σ_{i=1}^{n} ε_i² = ε'ε = (Y − Xβ)'(Y − Xβ),

since ε = Y − Xβ. Expanding S(β) gives

S(β) = (Y − Xβ)'(Y − Xβ) = (Y' − β'X')(Y − Xβ)
     = Y'Y − Y'Xβ − β'X'Y + β'X'Xβ
     = Y'Y − 2β'X'Y + β'X'Xβ,

since β'X'Y = (β'X'Y)' = Y'Xβ is a real number (a 1 × 1 matrix).
Note: for two matrices A and B, (AB)' = B'A' and (A⁻¹)' = (A')⁻¹.
Similar to the procedure for finding the minimum of a function in calculus, the least squares
estimate β̂ can be found by solving the equation based on the first derivative of S(β),

∂S(β)/∂β = (∂S/∂β_0, ∂S/∂β_1, ..., ∂S/∂β_p)'
         = ∂(Y'Y − 2β'X'Y + β'X'Xβ)/∂β
         = −2X'Y + 2X'Xβ = 0,

which gives

X'X β̂ = X'Y.
The fitted values (in vector form):

Ŷ = X β̂ = (Ŷ_1, Ŷ_2, ..., Ŷ_n)'.

The residuals (in vector form):

ε̂ = Y − Ŷ = Y − X β̂ = (ε̂_1, ε̂_2, ..., ε̂_n)'.


p 1  0   a1 
 (  i 1 ai )   a 
(  t a )  1
Note: (i)  i 1
 a, where   and a   2  .
       
   
  p   a p 1 
p 1 p 1
 (  i 1  j 1 aij )
 (  t A ) i 1 j 1
(ii)   2 A , where A is any ( p  1)  ( p  1)
 
symmetry matrix.
Note: (X'X)' = X'(X')' = X'X, so X'X is a symmetric matrix. Also,
((X'X)⁻¹)' = ((X'X)')⁻¹ = (X'X)⁻¹, so (X'X)⁻¹ is a symmetric matrix.
Note: X'X β̂ = X'Y is called the normal equation.
Note: ε̂'X = (Y' − β̂'X')X = Y'X − Y'X(X'X)⁻¹X'X = Y'X − Y'X = 0.
Therefore, if there is an intercept, the first column of X is the vector of ones, (1, 1, ..., 1)', and

ε̂'X = (Σ_{i=1}^{n} ε̂_i, Σ_{i=1}^{n} ε̂_i x_i1, ..., Σ_{i=1}^{n} ε̂_i x_ip) = 0,

so in particular Σ_{i=1}^{n} ε̂_i = 0.
Note: for a linear regression model without an intercept, Σ_{i=1}^{n} ε̂_i might not equal 0.

Note:

Ŷ = X β̂ = X(X'X)⁻¹X'Y = HY,

where H = X(X'X)⁻¹X' is called the "hat" matrix (or projection matrix). Thus,

ε̂ = Y − X β̂ = Y − HY = (I − H)Y.
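As a minimal computational sketch of the formulas above (assuming Python with numpy; the function name ols_fit is illustrative, not part of these notes):

```python
import numpy as np

def ols_fit(X: np.ndarray, y: np.ndarray):
    """Least squares via the normal equations X'X b = X'y.

    X is the n x (p+1) design matrix (first column of ones for an intercept),
    y is the n x 1 response vector.
    """
    XtX = X.T @ X
    Xty = X.T @ y
    beta_hat = np.linalg.solve(XtX, Xty)   # solves X'X b = X'y
    H = X @ np.linalg.inv(XtX) @ X.T       # hat (projection) matrix
    y_hat = X @ beta_hat                   # fitted values, also H @ y
    resid = y - y_hat                      # residuals, (I - H) y
    # With an intercept column, resid.sum() is (numerically) zero.
    return beta_hat, y_hat, resid, H
```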
Example 1:
Heller Company manufactures lawn mowers and related lawn equipment. The managers believe
the quantity of lawn mowers sold depends on the price of the mower and the price of a
competitor’s mower. We have the following data:
Competitor's Price (x_i1)    Heller's Price (x_i2)    Quantity Sold (y_i)
120 100 102
140 110 100
190 90 120
130 150 77
155 210 46
175 150 93
125 250 26
145 270 69
180 300 65
150 250 85

The regression model for the above data is

Y_i = β_0 + β_1 x_i1 + β_2 x_i2 + ε_i.

In matrix form, Y = (102, 100, 120, 77, 46, 93, 26, 69, 65, 85)' is the 10 × 1 response vector,
X is the 10 × 3 matrix whose i-th row is (1, x_i1, x_i2), i.e. with rows (1, 120, 100),
(1, 140, 110), ..., (1, 150, 250), and β = (β_0, β_1, β_2)'.
The least squares estimate of β is

β̂ = (β̂_0, β̂_1, β̂_2)' = (X'X)⁻¹X'Y = (66.518, 0.414, −0.269)'.

The fitted regression equation is

ŷ = β̂_0 + β̂_1 x_1 + β̂_2 x_2 = 66.518 + 0.414 x_1 − 0.269 x_2.


The fitted equation implies an increase in the competitor’s price of 1 unit is associated with an
increase of 0.414 unit in expected quantity sold and an increase in its own price of 1 unit is
associated with a decrease of 0.269 unit in expected quantity sold. Thus, the fitted values are

Ŷ = X β̂ = (89.21, 94.79, 120.88, 79.86, 74.02, 98.49, 50.81, 53.69, 60.09, 61.16)'

and the residuals are

ε̂ = Y − Ŷ = (12.79, 5.21, −0.88, −2.86, −28.02, −5.49, −24.81, 15.31, 4.91, 23.84)'.
Suppose now we want to predict the quantity sold in a city where Heller prices its mower at $160
and the competitor prices its mower at $170. The predicted quantity sold is
66.518 + 0.4139(170) − 0.26978(160) ≈ 93.7.
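As a rough check (assuming Python with numpy), the Example 1 estimates and this prediction can be reproduced as follows; the printed values should match the ones above up to rounding:

```python
import numpy as np

x1 = np.array([120, 140, 190, 130, 155, 175, 125, 145, 180, 150])  # competitor's price
x2 = np.array([100, 110,  90, 150, 210, 150, 250, 270, 300, 250])  # Heller's price
y  = np.array([102, 100, 120,  77,  46,  93,  26,  69,  65,  85])  # quantity sold

X = np.column_stack([np.ones(len(y)), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # approx (66.52, 0.414, -0.270)

x0 = np.array([1.0, 170.0, 160.0])             # competitor at 170, Heller at 160
print(beta_hat, x0 @ beta_hat)                 # prediction approx 93.7
```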
3.3. Hypothesis Testing in Multiple Linear Regression
Once we have estimated the parameters in the model, we face two immediate questions:
1. What is the overall adequacy of the model?
2. Which specific regressors seem important?
Several hypothesis testing procedures prove useful for addressing these questions. The
formal tests require that our random errors be independent and follow a normal
distribution with mean E(ε) = 0 and variance Var(ε) = σ².
3.3.1. Tests for the significance of the regression
The test for significance of regression is a test to determine whether there is a linear relationship
between the response y and any of the explanatory variables x_1, x_2, ..., x_p. This procedure is
often thought of as an overall or global test of model adequacy. The appropriate hypotheses are
• H_0: β_1 = β_2 = ... = β_p = 0 (none of the X's is associated with Y)
• H_A: β_j ≠ 0 for at least one j
Rejection of this null hypothesis implies that at least one of the regressors x_1, x_2, ..., x_p
contributes significantly to the model. The test procedure is a generalization of the analysis of
variance used in simple linear regression.
ANALYSIS OF VARIANCE (ANOVA)
The total variability in a regression analysis can be partitioned into a component explained by
the regression, SSR, and a component due to unexplained error, SSE:

Σ_{i=1}^{n} (y_i − ȳ)² = Σ_{i=1}^{n} (y_i − ŷ_i)² + Σ_{i=1}^{n} (ŷ_i − ȳ)²,

i.e., SST = SSE + SSR.


i.e., the total variation in the dependent variable is the sum of the variation in the dependent
variable due to the variation in the independent variables included in the model and the variation
that remains unexplained by the explanatory variables in the model. Analysis of variance is the
technique of decomposing the total sum of squares into its components. i.e., the technique
decomposes the total variation in the dependent variable into the explained and unexplained
variations. The degrees of freedom of the total variation are also the sum of the degrees of
freedom of the two components. By dividing the sum of squares with the corresponding degrees

6
Multiple Linear Regression Prepared by Mandefro Abere 2007

of freedom, we are getting the mean sum of squares. The results of overall significance test of a
model are summarized in the Analysis of Variance (ANOVA) table as follows.
Source of variation   Sum of squares          Df          Mean sum of squares        F
Regression            SSR = Σ(ŷ_i − ȳ)²       p           MSR = SSR / p              F = MSR / MSE
Residual              SSE = Σ(y_i − ŷ_i)²     n − p − 1   MSE = SSE / (n − p − 1)
Total                 SST = Σ(y_i − ȳ)²       n − 1

The test statistic is

F = MSR / MSE ~ F(p, n − p − 1).

The critical region is: reject H_0 if F > F_α(p, n − p − 1),

where F_α(p, n − p − 1) is the value from the F-distribution table.

3.3.2. Tests for Individual Parameters


If the global hypothesis is rejected, it is then appropriate to examine hypotheses for the
individual parameters, such as
H0: βj = 0 vs. H1: βj ≠ 0.
The appropriate test statistic is

t = β̂_j / s(β̂_j),

where s(β̂_j) is the standard error of β̂_j; under H_0 this statistic follows Student's t
distribution with n − p − 1 degrees of freedom. At significance level α, the decision rule is to
reject H_0 if

|t| > t_{α/2, n−p−1}

(for the one-sided alternatives β_j > 0 or β_j < 0, reject H_0 if t > t_{α, n−p−1} or
t < −t_{α, n−p−1}, respectively).
Example 2: Test the significance of the regression for the data given in Example 1, based on the
following Minitab output.
The regression equation is


y = 66.5 + 0.414 x1 - 0.270 x2

Predictor Coef SE Coef T P


Constant 66.52 41.88 1.59 0.156
x1 0.4139 0.2604 1.59 0.156
x2 -0.26978 0.08091 -3.33 0.013

S = 18.74 R-Sq = 65.3% R-Sq(adj) = 55.4%

Analysis of Variance

Source DF SS MS F P
Regression 2 4618.8 2309.4 6.58 0.025
Residual Error 7 2457.3 351.0
Total 9 7076.1

Source DF Seq SS
x1 1 715.5
x2 1 3903.3

Solution:
1. State the hypotheses: H_0: β_1 = β_2 = 0 (none of the X's is associated with Y)
   against H_A: β_j ≠ 0 for at least one j.
2. Critical value: F_α(p, n − p − 1) = F_0.05(2, 7) = 4.74.
3. Test statistic: F = MSR / MSE = 2309.4 / 351.0 = 6.58.
4. Since F = 6.58 > F_0.05(2, 7) = 4.74, reject H_0.
5. Conclusion: the regression is significant; at least one of the variables X_1 and X_2 contributes
   significantly to the changes in Y.

Since we have rejected the null hypothesis H_0: β_1 = β_2 = 0, we next identify which variables
are significant. We therefore test the regression parameters individually, H_0: β_j = 0 against
H_1: β_j ≠ 0.
Solution:
1. State the hypotheses: H_0: β_1 = 0 against H_1: β_1 ≠ 0.
2. Critical value: t_{α/2}(n − p − 1) = t_{0.025, 7} = 2.365.
3. Test statistic: t = (β̂_1 − 0) / s(β̂_1) = 0.4139 / 0.2604 = 1.59.


4. Since |t| = 1.59 < t_{0.025, 7} = 2.365, do not reject H_0.
5. Conclusion: variable X_1 does not make a significant contribution to the changes in Y.

To test the hypothesis H_0: β_2 = 0 against H_1: β_2 ≠ 0, we follow the same steps.
Solution:
1. State the hypotheses: H_0: β_2 = 0 against H_1: β_2 ≠ 0.
2. Critical value: t_{α/2}(n − p − 1) = t_{0.025, 7} = 2.365.
3. Test statistic: t = (β̂_2 − 0) / s(β̂_2) = −0.26978 / 0.08091 = −3.33.

4. Since |t| = 3.33 > t_{0.025, 7} = 2.365, reject H_0.
5. Conclusion: variable X_2 makes a significant contribution to the changes in Y.
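A minimal sketch of the individual t test used above (assuming Python with scipy; the function name coef_t_test is illustrative, taking a coefficient estimate and its standard error as inputs):

```python
from scipy import stats

def coef_t_test(beta_hat_j: float, se_j: float, n: int, p: int, alpha: float = 0.05):
    """Two-sided t test of H0: beta_j = 0 for one regression coefficient."""
    t_stat = beta_hat_j / se_j
    t_crit = stats.t.ppf(1 - alpha / 2, n - p - 1)   # t_{alpha/2, n-p-1}
    return t_stat, t_crit, abs(t_stat) > t_crit      # reject H0 when True

# For Example 1: coef_t_test(0.4139, 0.2604, 10, 2) and coef_t_test(-0.26978, 0.08091, 10, 2)
```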

From the above example we can estimate the variances of the estimated regression parameters,
var(β̂_0), var(β̂_1) and var(β̂_2), using the formula cov(β̂) = σ̂² (X'X)⁻¹, where

            [  4.995634866   -2.861085e-02   -3.060623e-03 ]
(X'X)⁻¹ =   [ -0.028610846    1.931437e-04   -2.946016e-06 ]
            [ -0.003060623   -2.946016e-06    1.864613e-05 ]

so that cov(β̂) is a 3 by 3 matrix whose j-th diagonal element is the variance of β̂_j and whose
ij-th off-diagonal element is the covariance between β̂_i and β̂_j.

Here σ̂² is estimated as follows: since the error terms are NID with mean zero and constant
variance, E(SSE / (n − p − 1)) = σ², so σ̂² = MSE = 351. Therefore

var(β̂_0) = 351 × 4.995635 ≈ 1753.5
var(β̂_1) = 351 × 1.931437e-04 ≈ 0.068
var(β̂_2) = 351 × 1.864613e-05 ≈ 0.0065

Exercise: find cov(β̂_i, β̂_j) for all i ≠ j.
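A minimal sketch (assuming Python with numpy) of how cov(β̂) = σ̂²(X'X)⁻¹ can be computed for the Example 1 data; the diagonal entries should be close to the variances above, and the off-diagonal entries relate to the exercise:

```python
import numpy as np

x1 = np.array([120, 140, 190, 130, 155, 175, 125, 145, 180, 150])
x2 = np.array([100, 110,  90, 150, 210, 150, 250, 270, 300, 250])
y  = np.array([102, 100, 120,  77,  46,  93,  26,  69,  65,  85])
X  = np.column_stack([np.ones(len(y)), x1, x2])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
n, k = X.shape                                   # k = p + 1 estimated parameters
sigma2_hat = resid @ resid / (n - k)             # MSE, approx 351
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)   # 3 x 3 covariance matrix of beta_hat
print(np.diag(cov_beta))                         # approx (1753.5, 0.068, 0.0065)
print(cov_beta[0, 1], cov_beta[0, 2], cov_beta[1, 2])   # the covariances in the exercise
```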

3.4. Confidence Interval for Regression Coefficients


If the population regression errors ε_i are normally distributed and the standard regression
assumptions hold, the 100(1 − α)% confidence interval for the regression coefficient β_j is
given by

β̂_j ± t_{α/2, n−p−1} s(β̂_j),

where s(β̂_j) = √(σ̂² C_jj) is the standard error of β̂_j, C_jj is the j-th diagonal element of
(X'X)⁻¹, and (β̂_j − β_j) / s(β̂_j) follows Student's t distribution with n − p − 1 degrees of
freedom.


Example: Find 95% confidence intervals on β_1 and β_2 for the data given in Example 1.

Solution: The point estimates of β_1 and β_2 are 0.4139 and −0.26978, respectively; the diagonal
elements of (X'X)⁻¹ corresponding to β_1 and β_2 are C_11 = 1.931437e-04 and
C_22 = 1.864613e-05, and σ̂² = 351. Therefore the 95% confidence intervals on β_1 and β_2 are
given by

β̂_1 ± t_{0.025, 7} √(σ̂² C_11)   and   β̂_2 ± t_{0.025, 7} √(σ̂² C_22).

From these formulas we get that the 95% CI for β_1 is (−0.202, 1.03) and the 95% CI for β_2 is
(−0.46, −0.08).
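A minimal sketch of these interval computations (assuming Python with numpy and scipy), reusing the values quoted above:

```python
import numpy as np
from scipy import stats

sigma2_hat = 351.0
C = np.array([1.931437e-04, 1.864613e-05])     # C_11 and C_22 from (X'X)^{-1}
beta_hat = np.array([0.4139, -0.26978])
t_crit = stats.t.ppf(0.975, 7)                 # approx 2.365
half_width = t_crit * np.sqrt(sigma2_hat * C)
print(np.column_stack([beta_hat - half_width, beta_hat + half_width]))
# approx (-0.20, 1.03) for beta_1 and (-0.46, -0.08) for beta_2
```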
3.4.1. Confidence Interval Estimation of the Mean Response
We may construct a confidence interval on the mean response at a particular point, such as
x_01, x_02, ..., x_0p. Define the vector

x_0 = (1, x_01, x_02, ..., x_0p)'.

The fitted value at this point is μ̂_0 = x_0' β̂. This is an unbiased estimator of E(y | x_0), since
E(μ̂_0) = E(y | x_0), and the variance of μ̂_0 is Var(μ̂_0) = σ² x_0' (X'X)⁻¹ x_0.

Therefore a 100(1 − α) percent confidence interval on the mean response at the point
x_01, x_02, ..., x_0p is

μ̂_0 ± t_{α/2, n−p−1} s.e.(μ̂_0),   where s.e.(μ̂_0) = √(σ̂² x_0' (X'X)⁻¹ x_0).

Example 3: Estimate the mean quantity of mowers sold when the competitor's price is $200 and
Heller's price is $170, and find a 95% confidence interval for this mean response.

Solution: μ̂_0 = x_0' β̂ = [1 200 170] (66.52, 0.4139, −0.26978)' = 103.44.

So the 95% CI for the mean response is μ̂_0 ± t_{0.025, 7} s.e.(μ̂_0) = 103.44 ± 2.365 √201.8
= 103.44 ± 33.6, i.e. approximately (69.8, 137.0).

3.5. Prediction of New Observations


The regression model can be used to predict future observations on y corresponding to particular
values of the regressor variables, for example x_01, x_02, ..., x_0p. If x_0' = [1, x_01, x_02, ..., x_0p],
then the point estimate of the future observation y_0 at the point x_01, x_02, ..., x_0p is
ŷ_0 = x_0' β̂. Therefore a 100(1 − α) percent prediction interval for this future observation is

ŷ_0 ± t_{α/2, n−p−1} s.e.(ŷ_0),   where s.e.(ŷ_0) = √(σ̂² (1 + x_0' (X'X)⁻¹ x_0)).

This is the generalization of the prediction interval for a future observation in simple linear
regression.
Example 4: Predict the quantity of mowers sold when the competitor's price is $200 and Heller's
price is $170, and find a 95% prediction interval for this future observation.

Solution: ŷ_0 = x_0' β̂ = [1 200 170] (66.52, 0.4139, −0.26978)' = 103.44.

The 95% prediction interval for the future observation is ŷ_0 ± t_{0.025, 7} s.e.(ŷ_0)
= 103.44 ± 2.365 √(351 (1 + 0.575)) ≈ 103.44 ± 55.6, i.e. approximately (47.8, 159.0),
since x_0' (X'X)⁻¹ x_0 = 201.8 / 351 ≈ 0.575. Note that this interval is wider than the
confidence interval for the mean response in Example 3, because it also accounts for the
variability of the new observation itself.
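A minimal sketch (assuming Python with numpy and scipy) contrasting the confidence interval for the mean response (Example 3) with the prediction interval for a new observation (Example 4) at x_0 = (1, 200, 170):

```python
import numpy as np
from scipy import stats

x1 = np.array([120, 140, 190, 130, 155, 175, 125, 145, 180, 150])
x2 = np.array([100, 110,  90, 150, 210, 150, 250, 270, 300, 250])
y  = np.array([102, 100, 120,  77,  46,  93,  26,  69,  65,  85])
X  = np.column_stack([np.ones(len(y)), x1, x2])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
n, k = X.shape
sigma2_hat = resid @ resid / (n - k)

x0 = np.array([1.0, 200.0, 170.0])                   # competitor at 200, Heller at 170
mu_hat = x0 @ beta_hat                               # approx 103.4
leverage = x0 @ np.linalg.inv(X.T @ X) @ x0          # x0'(X'X)^{-1} x0, approx 0.575
t_crit = stats.t.ppf(0.975, n - k)                   # t_{0.025, 7}

ci = mu_hat + np.array([-1, 1]) * t_crit * np.sqrt(sigma2_hat * leverage)        # mean response
pi = mu_hat + np.array([-1, 1]) * t_crit * np.sqrt(sigma2_hat * (1 + leverage))  # new observation
print(ci, pi)                                        # the prediction interval is wider
```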
3.6. Coefficient of determination R²
The total variability in a regression analysis can be partitioned into a component explained by
the regression, SSR, and a component due to unexplained error, SSE:

Σ_{i=1}^{n} (y_i − ȳ)² = Σ_{i=1}^{n} (y_i − ŷ_i)² + Σ_{i=1}^{n} (ŷ_i − ȳ)²,

i.e., SST = SSE + SSR.


We have seen that the fit of the regression equation to the data is improved as SSR increases
and SSE decreases. The ratio of the regression sum of squares, SSR, divided by the total sum
of squares, SST, provides a descriptive measure of the proportion or percent of the total
variability that is explained by the regression model. This measure is called the coefficient of
determination and is generally denoted R²:

R² = SSR / SST = 1 − SSE / SST.


The coefficient of determination is often interpreted as the percent of variability in y that is
explained by the regression equation. The quantity varies from 0 to 1; higher values indicate a
better regression. Caution should be used in making general interpretations of R², because a
high value can result from either a small SSE or a large SST or both. R² is also known as the
percent of explained variability.
R² can vary from 0 to 1, since SST is fixed and 0 ≤ SSE ≤ SST. A large R² implies a better
regression, everything else being equal.
The components of variability have associated degrees of freedom. The SST quantity has (n − 1)
degrees of freedom because the mean of Y is required for its computation. The SSR component
has p degrees of freedom because p coefficients are required for its computation. Finally, the SSE
component has (n − p − 1) degrees of freedom because the p coefficients and the mean are
required for its computation.
We use the coefficient of determination, R², routinely as a descriptive statistic to describe
the strength of the linear relationship between the independent X variables and the dependent
variable Y. It is important to emphasize that R² can only be used to compare regression
models that have the same set of sample observations y_i, where i = 1, 2, ..., n. This result is
seen from the equation form

R² = 1 − SSE / SST.

Thus we see that R² can be large either because SSE is small, indicating that the observed
points are close to the predicted points, or because SST is large. We have seen that SSE and
σ̂² indicate the closeness of the observed points to the predicted points. With the same SST
for two or more regression equations, R² provides a comparable measure of the goodness of fit
of the equations.
The coefficient of determination (R²) for the data given in Example 1 is 65.3%. This implies that
65.3% of the variability in y is explained by the regressors.
There is a potential problem with using R 2 as an overall measure of the quality of a fitted
equation. As additional independent variables are added to a multiple regression model, the
explained sum of squares SSR will increase even if the additional independent variable is not
an important predictor variable. Thus we might find that R 2 has increased spuriously after


one or more nonsignificant predictor variables have been added to the multiple regression
model. In such a case the increased value of R 2 would be misleading. To avoid this problem,
the adjusted coefficient of determination can be computed as follows.
The adjusted coefficient of determination, adjusted R², is defined as

adjusted R² = 1 − [SSE / (n − p − 1)] / [SST / (n − 1)].
We use this measure to correct for the fact that irrelevant independent variables will result
in some small reduction in the error sum of squares. Thus, the adjusted R² provides
a better comparison between multiple regression models with different numbers of independent
variables. Here the difference between R² and the adjusted R² is not very large. However, if the
regression model had contained a number of independent variables that were not important
conditional predictors, then the difference would be substantial. The adjusted R-square for the
data in Example 1 is 55.4%.
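A minimal sketch (assuming Python with numpy; the function name r_squared is illustrative) of how R² and the adjusted R² are computed from a fitted model with p regressors plus an intercept:

```python
import numpy as np

def r_squared(y: np.ndarray, y_hat: np.ndarray, p: int):
    """Return (R^2, adjusted R^2) given observed y, fitted y_hat, and p regressors."""
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))
    return r2, r2_adj   # approx (0.653, 0.554) for the Example 1 fit
```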

Another measure of relationship in multiple regression is the coefficient of multiple correlation.
The coefficient of multiple correlation is the correlation between the predicted value and the
observed value of the dependent variable,

R = r(ŷ, y) = √(R²),

and is equal to the square root of the coefficient of determination. We use R as another measure
of the strength of the relationship between the dependent variable and the independent variables.
Thus, it is comparable to the correlation between Y and X in simple regression.
3.7. Dummy variables
In the discussion of multiple regression up to this point we have assumed that the independent
variables x_j have existed over a range and contained many different values. However, in the
multiple regression assumptions the only restriction on the independent variables is that they are
fixed values. Thus, we could have an independent variable that takes on only two values:
x_j = 0 and x_j = 1. This structure is commonly defined as a dummy variable, and we will see that
it provides a valuable tool for applying multiple regression to situations involving categorical
variables. One important example is a linear function that shifts in response to some influence.
Consider first a simple regression equation:


Y = β_0 + β_1 X_1

Now suppose that we introduce a dummy variable, X_2, that has values 0 and 1, and that the
resulting equation is

Y = β_0 + β_1 X_1 + β_2 X_2.

When X_2 = 0 in this equation the constant is β_0, but when X_2 = 1 the constant is β_0 + β_2. Thus
we see that the dummy variable shifts the linear relationship between Y and X_1 by the value of the
coefficient β_2. In this way we can represent the effect of shifts in our regression equation.
Dummy variables are also called indicator variables.
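A minimal sketch (assuming Python with numpy, with small made-up arrays purely for illustration) of how a 0/1 dummy variable enters the design matrix:

```python
import numpy as np

# Hypothetical data: x1 is a quantitative regressor, group is a categorical label.
x1 = np.array([5.0, 7.0, 9.0, 10.0, 14.0, 17.0])
group = np.array(["F", "F", "M", "M", "F", "M"])
y = np.array([36.7, 40.7, 51.5, 62.3, 59.7, 72.5])   # illustrative response values

x2 = (group == "M").astype(float)                    # dummy variable: 0 = F, 1 = M
X = np.column_stack([np.ones(len(x1)), x1, x2])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # beta_hat[2] is the shift in the intercept between the two groups
```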
Example
The president of Investors Ltd. wants to determine whether there is any evidence of wage
discrimination in the salaries of male and female financial analysts.
Examining the data, he saw two subsets of salaries, and salaries for males appear to be
uniformly higher across the years of experience.
This problem can be analyzed by estimating a multiple regression model of salary, Y, versus
years of experience, X_1, with a second variable, X_2, that is coded as
0 - female employees
1 - male employees
The resulting multiple regression model

ŷ = β̂_0 + β̂_1 x_1 + β̂_2 x_2

can be analyzed using the procedures we have learned, noting that the coefficient β_1 is an
estimate of the expected annual increase in salary per year of experience and β_2 is the shift in
mean salary from female to male employees. If β_2 is positive, we have an indication that male
salaries are uniformly higher.

Annual Salary (Y, $)   Years of experience (X_1)   Gender (X_2): 0 = F, 1 = M
 36730                  5                           0
 40650                  7                           0
 46820                  9                           0
 50149                 10                           0
 59679                 14                           0
 67360                 17                           0
 51535                  5                           1
 62289                  7                           1
 72486                  9                           1
 75022                 10                           1
 93379                 14                           1
105979                 17                           1

The regression equation is


Y = 23608 + 4076.5 X_1 + 14684 X_2

We can see that β̂_1 = 4076.5, indicating that the expected annual increase in salary is $4076.5
per year of experience, and β̂_2 = 14684, indicating that male salaries are, on average, $14684
higher.
Analyses such as these have been used successfully in a number of wage discrimination lawsuits.
As a result most companies perform an analysis similar to this to determine if there is any
evidence of salary discrimination.
Examples such as the previous one have wide application to a number of problems including the
following.
1. The relationship between the number of units sold and the price is likely to shift if a new
competitor moves into the market.
2. The relationship between aggregate consumption and aggregate disposable income may
shift in time of war or other major national event.
3. The relationship between total output and number of workers may shift as the result of
the introduction of new production technology.
4. The demand function for a product may shift because of a new advertising campaign or a
news release relating to the product.

