Analysis of Residuals
One of the most important and informative parts of regression analysis is the analysis of residuals. After we compute the regression equation, we should examine the residuals y − ŷ, where ŷ is the estimate of y from the regression equation

ŷ = β̂x        (1)
The analysis of residuals may tell us whether there are any peculiarities in the data, whether the functional form chosen is wrong, whether there are omitted variables, and whether the assumptions we made about the disturbances uᵢ are valid.
If the analysis of residuals reveals any of these problems, the simple least squares method described above is modified in a number of ways. Some of the commonly used solutions are discussed below.
1. Outliers
An outlier is an observation that behaves differently from the rest of the observations (for example, data from a war period). If there are outliers, the usual procedure is to omit them and re-estimate the regression equation. But it is not always true that outliers are better discarded than retained. In fact, many economists are of the opinion that "outliers give more relevant information about the relationship between X and Y than the other observations which are considered good, because these may correspond to something like a controlled experiment in the physical sciences". So, when we have no a priori reason to discard the outliers and they are large in number, we can use the least absolute residual (LAR) method. In this method, the sum of absolute residuals

∑ᵢ₌₁ⁿ |yᵢ − ŷᵢ|        (2)

is minimized. This ensures that extreme observations do not get undue weight. That is, the LAR method minimizes

Q = ∑ᵢ₌₁ⁿ |eᵢ| = ∑ᵢ₌₁ⁿ eᵢ² / |eᵢ|.
In the ordinary least squares (OLS) method, ∑ᵢ₌₁ⁿ eᵢ² (the sum of squared residuals) is minimized, whereas in LAR a weighted sum of squared residuals is minimized. Initial estimates of the residuals are obtained by ordinary least squares. The reciprocals of the absolute values of these residuals are used as weights wᵢ.
New estimates of the parameters of the regression equation are then obtained by minimizing ∑ᵢ₌₁ⁿ wᵢeᵢ². These estimates give new residuals, and the procedure is repeated until there is little difference between the estimates obtained at successive iterations. Usually the estimates do not change appreciably after the second or third iteration.
There is no well-developed theory about the sampling distributions of the estimators obtained by the LAR method.
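A minimal sketch of this iterative reweighting scheme for the single-regressor model of equation (1), assuming numpy is available; the function name, iteration cap and tolerance are illustrative choices rather than part of the original text.

```python
import numpy as np

def lar_irls(x, y, n_iter=20, eps=1e-8):
    """Least absolute residual (LAR) fit of y = beta*x by iteratively
    reweighted least squares, with weights w_i = 1/|e_i| taken from the
    residuals of the previous fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.sum(x * y) / np.sum(x * x)          # initial OLS estimate
    for _ in range(n_iter):
        e = y - beta * x
        w = 1.0 / np.maximum(np.abs(e), eps)      # guard against zero residuals
        beta_new = np.sum(w * x * y) / np.sum(w * x * x)
        if abs(beta_new - beta) < 1e-10:          # stop when estimates stabilise
            break
        beta = beta_new
    return beta
```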
2. Omitted variables
If there is a systematic pattern in the residuals, it is possibly caused by some omitted variables. Such variables are often left out because they cannot be measured; managerial input and quality changes in labour are examples. In such situations, one has either to say something about the direction of the bias in the estimated coefficient or to use some substitute variables (which can be measured) that capture their effects. These substitute variables are called proxy variables (representative variables). Sometimes the proxy variables are just the true variables measured with error.
Suppose the true relationship is

y = βx + u,

but x is observed only through a proxy z = x + v, where v is a measurement error. In addition to the assumption Cov(u, x) = 0, it is also assumed that Cov(v, x) = Cov(v, u) = 0. Even so, regressing y on the proxy z gives a biased estimate of β, and in such cases the instrumental variable (IV) method is used in place of OLS, so that proxy variables can still be used in the analysis.
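A minimal sketch of the simple single-instrument IV estimator for the no-intercept model above, β̂_IV = ∑wᵢyᵢ / ∑wᵢzᵢ, where z is the observed proxy; the instrument w, the variable names and the function name are illustrative assumptions.

```python
import numpy as np

def iv_estimate(y, z, w):
    """Simple instrumental variable estimator of beta in y = beta*x + u when
    only a proxy z = x + v is observed.  w is an instrument assumed to be
    correlated with x but uncorrelated with u and v."""
    y, z, w = (np.asarray(a, dtype=float) for a in (y, z, w))
    return np.sum(w * y) / np.sum(w * z)          # beta_IV = (w'y)/(w'z)
```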
3. Nonlinearity
Sometimes nonlinear relationships can be converted into linear relationships by a transformation of variables, assuming the disturbances to be additive in the transformed equation. Some other nonlinearities can be handled by "search procedures". For example, if the regression equation is

yᵢ = α + β / (xᵢ + c) + uᵢ,

we can fix a value of c, define zᵢ = 1/(xᵢ + c), and regress y on z by least squares, minimizing
∑ᵢ₌₁ⁿ (yᵢ − α − βzᵢ)².
For each value of c, the residual sum of squares is obtained, and the value of c for which the residual sum of squares is minimum is chosen. The corresponding estimates of α and β are the least squares estimates of these parameters. It may not always be possible to use the simple search procedure discussed here; in such cases, we have to use a nonlinear minimization procedure.
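A minimal sketch of this search procedure, assuming the user supplies a grid of trial values for c; the function name is illustrative.

```python
import numpy as np

def grid_search_c(x, y, c_grid):
    """Search procedure for y = alpha + beta/(x + c) + u: for each trial value
    of c, regress y on z = 1/(x + c) and keep the c giving the smallest
    residual sum of squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for c in c_grid:
        z = 1.0 / (x + c)
        X = np.column_stack([np.ones_like(z), z])     # columns: intercept, z_i
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, c, coef)
    rss, c_hat, (alpha_hat, beta_hat) = best
    return c_hat, alpha_hat, beta_hat, rss
```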
4. Autocorrelation
If the disturbances uᵢ are correlated among themselves (autocorrelation), one has to study the pattern of this correlation. To do this, the Durbin–Watson statistic

DW = ∑ₜ (ûₜ − ûₜ₋₁)² / ∑ₜ ûₜ²

is usually computed. If the calculated value of DW is very low, the usual procedure is to estimate the regression equation in first differences, that is, to regress (yₜ − yₜ₋₁) on (xₜ − xₜ₋₁).
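A minimal sketch of the Durbin–Watson statistic and the first-difference regression just described; the function names are illustrative.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def first_difference_slope(x, y):
    """If DW is very low, regress (y_t - y_{t-1}) on (x_t - x_{t-1})."""
    dx = np.diff(np.asarray(x, dtype=float))
    dy = np.diff(np.asarray(y, dtype=float))
    return np.sum(dx * dy) / np.sum(dx * dx)      # slope through the origin
```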
5. Heteroscedasticity
If the analysis of residuals reveals that the disturbances do not have a common variance (heteroscedasticity), the solution is to reduce the model to one where the disturbances have a common variance, either by a transformation of variables or by deflation. For example, suppose we have data on sales sᵢ and profits pᵢ for a number of firms, large and small, and we want to estimate the regression equation

pᵢ = α + βsᵢ + uᵢ.

It is reasonable to assume that the disturbances for larger firms have a higher variance than the disturbances for smaller firms. If we hypothesize that the variance of uᵢ is proportional to the square of the sales sᵢ, that is,

Var(uᵢ) = σ² sᵢ²,
then we can convert the regression model into one where the disturbances have a constant variance σ² by dividing throughout by sᵢ:

pᵢ/sᵢ = α/sᵢ + β + vᵢ,   where vᵢ = uᵢ/sᵢ and Var(vᵢ) = σ².

If, instead, we hypothesize that the variance of uᵢ is proportional to sᵢ itself, that is,

Var(uᵢ) = σ² sᵢ,
then dividing throughout by √sᵢ gives

pᵢ/√sᵢ = α/√sᵢ + β√sᵢ + wᵢ,

where wᵢ = uᵢ/√sᵢ and Var(wᵢ) = (1/sᵢ) Var(uᵢ) = σ².
Thus in this case we would be regressing pᵢ/√sᵢ on 1/√sᵢ and √sᵢ.
Another procedure in such cases of heteroscedasticity is to run the regression in logs, that is
log 𝑝𝑖 = 𝛼 ′ + 𝛽 ′ log 𝑠𝑖 + 𝑢𝑖 .
Thus the two commonly used solutions for heteroscedasticity are deflation and log transformation.
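A minimal sketch of the two remedies just described, deflation by sᵢ (for the case Var(uᵢ) = σ²sᵢ²) and the regression in logs; it assumes positive sales, and the function names are illustrative.

```python
import numpy as np

def deflated_fit(p, s):
    """Deflation sketch for Var(u_i) = sigma^2 * s_i^2: regress p_i/s_i on
    1/s_i (coefficient alpha) and a constant (coefficient beta)."""
    p, s = np.asarray(p, dtype=float), np.asarray(s, dtype=float)
    y = p / s
    X = np.column_stack([1.0 / s, np.ones_like(s)])   # columns: 1/s_i, constant
    alpha_hat, beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    return alpha_hat, beta_hat

def log_fit(p, s):
    """Alternative: run the regression in logs, log p_i = a' + b' log s_i + u_i."""
    lp, ls = np.log(np.asarray(p, dtype=float)), np.log(np.asarray(s, dtype=float))
    X = np.column_stack([np.ones_like(ls), ls])
    a_hat, b_hat = np.linalg.lstsq(X, lp, rcond=None)[0]
    return a_hat, b_hat
```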
The procedure of transforming the original variables in such a way that the transformed variables satisfy all the assumptions of the classical linear regression model (CLRM), and then applying OLS to the transformed variables, is known as the method of generalized least squares (GLS). In short, GLS is OLS on transformed variables that satisfy the standard least squares assumptions. In OLS,

∑ᵢ₌₁ⁿ eᵢ² = e′e = (y − β̂x)′(y − β̂x)

is minimized, whereas in GLS the weighted sum e′we is minimized, with weight matrix
w = diag(1/σ₁², 1/σ₂², …, 1/σₙ²).
That is, in GLS a weighted sum of squared residuals is minimized, whereas in OLS an equally weighted sum of squared residuals is minimized. In GLS, the weight assigned to each residual is inversely proportional to its variance σᵢ²: observations coming from populations with larger σᵢ get relatively smaller weights, and those from populations with smaller σᵢ get relatively larger weights, in minimizing the residual sum of squares.
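A minimal matrix-form sketch of this weighted minimization with w = diag(1/σ₁², …, 1/σₙ²), assuming the σᵢ are known; the function name is illustrative.

```python
import numpy as np

def gls_diagonal(X, y, sigma):
    """GLS with a diagonal covariance matrix: minimize sum w_i * e_i^2 with
    w_i = 1/sigma_i^2, i.e. beta = (X'WX)^{-1} X'Wy.  `sigma` holds the
    standard deviations sigma_i of the disturbances (assumed known here)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    XtW = X.T * w                       # equivalent to X'W for diagonal W
    return np.linalg.solve(XtW @ X, XtW @ y)
```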
A point to be noted here is that heteroscedasticity in the residuals often gives the impression that the residuals are non-normal. If the residuals are not normally distributed, the least squares (OLS) estimators are still BLUE, but the usual tests of significance will not be valid. Deflation and transformation of variables may also produce disturbances that are approximately normally distributed.
Assume now that all the ideal conditions of OLS hold except that the covariance matrix of the disturbances is σ²Ω, where Ω is not the identity matrix. In particular, Ω may be non-diagonal and/or have unequal diagonal elements. We first examine the properties of the OLS estimator β̂ under this assumption.
Now plim β̂ = β + lim_{n→∞} (x′x/n)⁻¹ · plim (x′u/n).
But x′u/n has zero mean and covariance matrix σ² x′Ωx/n².
If lim_{n→∞} x′Ωx/n is finite, then lim_{n→∞} x′Ωx/n² = 0.
Hence x′u/n has zero mean and its covariance matrix vanishes asymptotically, which implies plim (x′u/n) = 0.
Therefore, plim β̂ = β; hence β̂ is consistent.
In order to find V(β̂), note that
β̂ = β + (x′x)⁻¹x′u, and hence β̂ − β = (x′x)⁻¹x′u. Therefore
V(β̂) = E((β̂ − β)(β̂ − β)′)
= E((x′x)⁻¹x′uu′x(x′x)⁻¹)
= (x′x)⁻¹x′E(uu′)x(x′x)⁻¹
= (x′x)⁻¹x′σ²Ωx(x′x)⁻¹
= σ²(x′x)⁻¹x′Ωx(x′x)⁻¹.
Note that the covariance matrix of β̂ (the OLS estimator when the disturbance covariance matrix is σ²Ω) is no longer equal to σ²(x′x)⁻¹, its variance under the standard OLS assumptions. It may be either larger or smaller, depending on whether (x′x)⁻¹x′Ωx(x′x)⁻¹ − (x′x)⁻¹ is positive semi-definite, negative semi-definite, or neither.
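A minimal sketch contrasting the "sandwich" covariance σ²(x′x)⁻¹x′Ωx(x′x)⁻¹ derived above with the naive σ²(x′x)⁻¹; the function name and the assumption that Ω and σ² are known are illustrative.

```python
import numpy as np

def ols_cov_under_omega(X, Omega, sigma2=1.0):
    """Covariance of the OLS estimator when E(uu') = sigma^2 * Omega:
    the 'sandwich' sigma^2 (X'X)^{-1} X' Omega X (X'X)^{-1}, compared with
    the naive sigma^2 (X'X)^{-1} that assumes Omega = I."""
    X = np.asarray(X, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    sandwich = sigma2 * XtX_inv @ X.T @ Omega @ X @ XtX_inv
    naive = sigma2 * XtX_inv
    return sandwich, naive
```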
The asymptotic covariance matrix of β̂ is given by lim_{n→∞} (σ²/n) (x′x/n)⁻¹ (x′Ωx/n) (x′x/n)⁻¹.
Lemma: s² = e′e/(n − k) is a biased and inconsistent estimator of σ².
Proof: E(e′e) = E(u′Au),  where A = Iₙ − x(x′x)⁻¹x′
= σ² trace(AΩ)
≠ σ²(n − k).
Hence e′e/(n − k) is biased.
Now, plim s² = σ² lim_{n→∞} trace(AΩ)/n ≠ σ²,
implying that s² is inconsistent.
Some results relating to GLS:
1. There exists a non-singular matrix V such that V′V = Ω⁻¹.
2. Suppose that the regression equation y = xβ + u satisfies all the ideal conditions of OLS except that Ω (where σ²Ω is the variance-covariance matrix of u) is not the identity matrix. Suppose also that lim_{n→∞} x′Ω⁻¹x/n is finite and non-singular. Let V be a matrix such that V′V = Ω⁻¹. Then the transformed equation

Vy = Vxβ + Vu

satisfies all the ideal conditions.
That is, E(Vu) = 0 and
E((Vu)(Vu)′) = V E(uu′) V′
= V σ²Ω V′
= σ² VΩV′.
But V′(VΩV′) = V′VΩV′ = Ω⁻¹ΩV′ = V′ (since V′V = Ω⁻¹),
and since V′ is non-singular, this implies VΩV′ = I.
Hence, E((Vu)(Vu)′) = σ²I.
Theorem:
The BLUE of β under the transformed relationship is

β_G = ((Vx)′(Vx))⁻¹ (Vx)′(Vy) = (x′Ω⁻¹x)⁻¹ x′Ω⁻¹y.

Thus β_G is the GLS estimator (the OLS estimator of the transformed equation). β_G has all the desired properties: it is best linear unbiased (BLUE), efficient, consistent and asymptotically efficient.
Proof: Considering β_G as the OLS estimator in the transformed equation, it clearly has covariance matrix
σ²((Vx)′(Vx))⁻¹ = σ²(x′V′Vx)⁻¹
= σ²(x′Ω⁻¹x)⁻¹.
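A minimal sketch of the equivalence used in this proof: β_G computed directly from (x′Ω⁻¹x)⁻¹x′Ω⁻¹y and as OLS on the transformed equation, with V taken from a Cholesky factorisation of Ω⁻¹ (one convenient choice satisfying V′V = Ω⁻¹); the function name is illustrative.

```python
import numpy as np

def gls_estimator(X, y, Omega):
    """GLS estimator beta_G = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y, computed
    both directly and as OLS on the transformed equation Vy = VX beta + Vu,
    where V'V = Omega^{-1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Omega_inv = np.linalg.inv(Omega)
    # direct formula
    beta_direct = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
    # OLS on the transformed variables
    V = np.linalg.cholesky(Omega_inv).T            # so that V'V = Omega^{-1}
    Xt, yt = V @ X, V @ y
    beta_transformed = np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)
    return beta_direct, beta_transformed           # the two agree up to rounding
```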
Proof: Since the transformed equation satisfies the ideal conditions of OLS, the desired estimator of σ² is the analogue of e′e/(n − k) computed from the transformed equation:

σ̃² = (1/(n − k)) (Vy − Vxβ_G)′(Vy − Vxβ_G)
= (1/(n − k)) (y − xβ_G)′ V′V (y − xβ_G)
= (1/(n − k)) (y − xβ_G)′ Ω⁻¹ (y − xβ_G)
= ẽ′Ω⁻¹ẽ / (n − k),   where ẽ = y − xβ_G denotes the vector of GLS residuals. A small computational sketch of σ̃² is given after the results below.
2. (n − k)σ̃²/σ² ∼ χ² with (n − k) degrees of freedom.
3. β_G and σ̃² are independent.
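A minimal sketch of the estimator σ̃² = ẽ′Ω⁻¹ẽ/(n − k) derived above, assuming β_G has already been computed (for example with the GLS sketch given earlier); the function name and the explicit inversion of Ω are illustrative simplifications.

```python
import numpy as np

def gls_sigma2(X, y, Omega, beta_g):
    """Estimator of sigma^2 after GLS: sigma_tilde^2 = e~' Omega^{-1} e~ / (n - k),
    where e~ = y - X beta_G are the GLS residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    e = y - X @ beta_g
    return float(e @ np.linalg.inv(Omega) @ e) / (n - k)
```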