Analysis of Residuals
One of the most important and informative parts of regression analysis is the analysis of residuals. After we compute the regression equation, we should examine the residuals y − ŷ, where ŷ is the estimate of y from the regression equation

ŷ = β̂x        (1)
The analysis of residuals may tell us whether there are any peculiarities in the data, whether the functional form chosen is wrong, whether there are omitted variables, and whether the assumptions we made about the disturbances uᵢ are valid.
If the analysis of residuals reveals any of these problems, the simple least squares method described above is modified in a number of ways. Some of the commonly used solutions are discussed below.
1. Outliers
An outlier is an observation that behaves differently from the rest of the observations (for example, data from a war period). If there are outliers, the usual procedure is to omit them and re-estimate the regression equation. But it is not always true that outliers are better discarded than retained. In fact, many economists are of the opinion that "outliers give more relevant information about the relationship between X and Y than the other observations which are considered good, because these may correspond to something like a controlled experiment in the physical sciences". So, when we have no a priori reason to discard the outliers and they are large in number, we can use the least absolute residual (LAR) method. In this method, the sum of absolute residuals

∑ᵢ₌₁ⁿ |yᵢ − ŷᵢ|        (2)

is minimized. This ensures that extreme observations do not get undue weight. That is, the LAR method minimizes

Q = ∑ᵢ₌₁ⁿ |eᵢ| = ∑ᵢ₌₁ⁿ eᵢ² / |eᵢ|.
In the ordinary least squares (OLS) method, ∑ᵢ₌₁ⁿ eᵢ² (the sum of squared residuals) is minimized, whereas in LAR a weighted sum of squared residuals is minimized. Initial estimates of the residuals are obtained by ordinary least squares. The reciprocals of the absolute values of these residuals are used as weights wᵢ.
New estimates of the parameters of the regression equation are then obtained by minimizing ∑ᵢ₌₁ⁿ wᵢeᵢ². These estimates give new residuals, and the procedure is repeated until there is little difference between the estimates obtained at successive iterations. Usually the estimates do not change appreciably after the second or third iteration.
There is no well-developed theory about the sampling distributions of the estimators obtained by the LAR method.
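A minimal sketch of this iterative reweighting scheme for the single-regressor model of equation (1), assuming numpy is available; the function name, iteration cap and tolerance are illustrative choices rather than part of the original text.

```python
import numpy as np

def lar_irls(x, y, n_iter=20, eps=1e-8):
    """Least absolute residual (LAR) fit of y = beta*x by iteratively
    reweighted least squares, with weights w_i = 1/|e_i| taken from the
    residuals of the previous fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.sum(x * y) / np.sum(x * x)          # initial OLS estimate
    for _ in range(n_iter):
        e = y - beta * x
        w = 1.0 / np.maximum(np.abs(e), eps)      # guard against zero residuals
        beta_new = np.sum(w * x * y) / np.sum(w * x * x)
        if abs(beta_new - beta) < 1e-10:          # stop when estimates stabilise
            break
        beta = beta_new
    return beta
```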
2. Omitted variables
If there is a systematic pattern in the residuals, it is possibly caused by some omitted variables. Such variables are often left out because they cannot be measured; managerial input and quality changes in labour are examples. In such situations, one has either to say something about the direction of the bias in the estimated coefficient or to use some substitute variables (which can be measured) that capture their effects. These substitute variables are called proxy variables (representative variables). Sometimes the proxy variables are just the true variables measured with error.
Suppose the true relationship is

y = βx + u,

but x is observed only through a proxy z = x + v, where v is a measurement error. In addition to the assumption Cov(u, x) = 0, it is also assumed that Cov(v, x) = Cov(v, u) = 0. Even so, regressing y on the proxy z gives a biased estimate of β, and in such cases the instrumental variable (IV) method is used in place of OLS, so that proxy variables can still be used in the analysis.
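A minimal sketch of the simple single-instrument IV estimator for the no-intercept model above, β̂_IV = ∑wᵢyᵢ / ∑wᵢzᵢ, where z is the observed proxy; the instrument w, the variable names and the function name are illustrative assumptions.

```python
import numpy as np

def iv_estimate(y, z, w):
    """Simple instrumental variable estimator of beta in y = beta*x + u when
    only a proxy z = x + v is observed.  w is an instrument assumed to be
    correlated with x but uncorrelated with u and v."""
    y, z, w = (np.asarray(a, dtype=float) for a in (y, z, w))
    return np.sum(w * y) / np.sum(w * z)          # beta_IV = (w'y)/(w'z)
```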
3. Nonlinearity
Sometimes nonlinear relationships can be converted into linear relationships by a transformation of variables, assuming the disturbances to be additive in the transformed equation. Some other nonlinearities can be handled by "search procedures". For example, if the regression equation is

yᵢ = α + β / (xᵢ + c) + uᵢ,

we can fix a value of c, define zᵢ = 1/(xᵢ + c), and regress y on z by least squares, minimizing
∑ᵢ₌₁ⁿ (yᵢ − α − βzᵢ)².
For each value of c, the residual sum of squares is obtained, and the value of c for which the residual sum of squares is minimum is chosen. The corresponding estimates of α and β are the least squares estimates of these parameters. It may not always be possible to use the simple search procedure discussed here; in such cases, we have to use a nonlinear minimization procedure.
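A minimal sketch of this search procedure, assuming the user supplies a grid of trial values for c; the function name is illustrative.

```python
import numpy as np

def grid_search_c(x, y, c_grid):
    """Search procedure for y = alpha + beta/(x + c) + u: for each trial value
    of c, regress y on z = 1/(x + c) and keep the c giving the smallest
    residual sum of squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for c in c_grid:
        z = 1.0 / (x + c)
        X = np.column_stack([np.ones_like(z), z])     # columns: intercept, z_i
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, c, coef)
    rss, c_hat, (alpha_hat, beta_hat) = best
    return c_hat, alpha_hat, beta_hat, rss
```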
4. Autocorrelation
If the disturbances uᵢ are correlated among themselves (autocorrelation), one has to study the pattern of this correlation. To do this, the Durbin–Watson statistic

DW = ∑ₜ (ûₜ − ûₜ₋₁)² / ∑ₜ ûₜ²

is usually computed. If the calculated value of DW is very low, the usual procedure is to estimate the regression equation in first differences, that is, to regress (yₜ − yₜ₋₁) on (xₜ − xₜ₋₁).
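A minimal sketch of the Durbin–Watson statistic and the first-difference regression just described; the function names are illustrative.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def first_difference_slope(x, y):
    """If DW is very low, regress (y_t - y_{t-1}) on (x_t - x_{t-1})."""
    dx = np.diff(np.asarray(x, dtype=float))
    dy = np.diff(np.asarray(y, dtype=float))
    return np.sum(dx * dy) / np.sum(dx * dx)      # slope through the origin
```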
5. Heteroscedasticity
If the analysis of residuals reveals that the disturbances do not have a common variance (heteroscedasticity), the solution is to reduce the model to one where the disturbances have a common variance, either by a transformation of variables or by deflation. For example, suppose we have data on sales sᵢ and profits pᵢ for a number of firms, large and small, and we want to estimate the regression equation

pᵢ = α + βsᵢ + uᵢ.

It is reasonable to assume that the disturbances for larger firms have a higher variance than the disturbances for smaller firms. If we hypothesize that the variance of uᵢ is proportional to the square of the sales sᵢ, that is,

Var(uᵢ) = σ² sᵢ²,
then we can convert the regression model into one where the disturbances have a constant variance σ² by dividing throughout by sᵢ:

pᵢ/sᵢ = α/sᵢ + β + vᵢ,   where vᵢ = uᵢ/sᵢ and Var(vᵢ) = σ².

If, instead, we hypothesize that the variance of uᵢ is proportional to sᵢ itself, that is,

Var(uᵢ) = σ² sᵢ,
then dividing throughout by √sᵢ gives

pᵢ/√sᵢ = α/√sᵢ + β√sᵢ + wᵢ,

where wᵢ = uᵢ/√sᵢ and Var(wᵢ) = (1/sᵢ) Var(uᵢ) = σ².
Thus in this case we would be regressing pᵢ/√sᵢ on 1/√sᵢ and √sᵢ.
Another procedure in such cases of heteroscedasticity is to run the regression in logs, that is
log 𝑝𝑖 = 𝛼 ′ + 𝛽 ′ log 𝑠𝑖 + 𝑢𝑖 .
Thus the two commonly used solutions for heteroscedasticity are deflation and log transformation.
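A minimal sketch of the two remedies just described, deflation by sᵢ (for the case Var(uᵢ) = σ²sᵢ²) and the regression in logs; it assumes positive sales, and the function names are illustrative.

```python
import numpy as np

def deflated_fit(p, s):
    """Deflation sketch for Var(u_i) = sigma^2 * s_i^2: regress p_i/s_i on
    1/s_i (coefficient alpha) and a constant (coefficient beta)."""
    p, s = np.asarray(p, dtype=float), np.asarray(s, dtype=float)
    y = p / s
    X = np.column_stack([1.0 / s, np.ones_like(s)])   # columns: 1/s_i, constant
    alpha_hat, beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    return alpha_hat, beta_hat

def log_fit(p, s):
    """Alternative: run the regression in logs, log p_i = a' + b' log s_i + u_i."""
    lp, ls = np.log(np.asarray(p, dtype=float)), np.log(np.asarray(s, dtype=float))
    X = np.column_stack([np.ones_like(ls), ls])
    a_hat, b_hat = np.linalg.lstsq(X, lp, rcond=None)[0]
    return a_hat, b_hat
```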
The procedure of transforming the original variables in such a way that the transformed variables satisfy all the assumptions of the classical linear regression model (CLRM), and then applying OLS to the transformed variables, is known as the method of generalized least squares (GLS). In short, GLS is OLS on transformed variables that satisfy the standard least squares assumptions. In OLS,

∑ᵢ₌₁ⁿ eᵢ² = e′e = (y − β̂x)′(y − β̂x)

is minimized, whereas in GLS the weighted sum e′we is minimized, with weight matrix
w = diag(1/σ₁², 1/σ₂², …, 1/σₙ²).
That is, in GLS a weighted sum of squared residuals is minimized, whereas in OLS an equally weighted sum of squared residuals is minimized. In GLS, the weight assigned to each residual is inversely proportional to its variance σᵢ²: observations coming from populations with larger σᵢ get relatively smaller weights, and those from populations with smaller σᵢ get relatively larger weights, in minimizing the residual sum of squares.
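A minimal matrix-form sketch of this weighted minimization with w = diag(1/σ₁², …, 1/σₙ²), assuming the σᵢ are known; the function name is illustrative.

```python
import numpy as np

def gls_diagonal(X, y, sigma):
    """GLS with a diagonal covariance matrix: minimize sum w_i * e_i^2 with
    w_i = 1/sigma_i^2, i.e. beta = (X'WX)^{-1} X'Wy.  `sigma` holds the
    standard deviations sigma_i of the disturbances (assumed known here)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    XtW = X.T * w                       # equivalent to X'W for diagonal W
    return np.linalg.solve(XtW @ X, XtW @ y)
```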
A point to be noted here is that heteroscedasticity in the residuals often gives the impression that the residuals are non-normal. If the residuals are not normally distributed, the least squares (OLS) estimators are still BLUE, but the usual tests of significance will not be valid. Deflation and transformation of variables may also produce disturbances that are approximately normally distributed.
Assume now that all the ideal conditions of OLS hold except that the covariance matrix of the disturbances is σ²Ω, where Ω is not the identity matrix. In particular, Ω may be non-diagonal and/or have unequal diagonal elements. We first examine the properties of the OLS estimator β̂ under this assumption.
Now plim β̂ = β + lim_{n→∞} (x′x/n)⁻¹ · plim (x′u/n).
But x′u/n has zero mean and covariance matrix σ² x′Ωx/n².
If lim_{n→∞} x′Ωx/n is finite, then lim_{n→∞} x′Ωx/n² = 0.
Hence x′u/n has zero mean and its covariance matrix vanishes asymptotically, which implies plim (x′u/n) = 0.
Therefore, plim β̂ = β; hence β̂ is consistent.
In order to find V(β̂), note that
β̂ = β + (x′x)⁻¹x′u, and hence β̂ − β = (x′x)⁻¹x′u. Therefore
V(β̂) = E((β̂ − β)(β̂ − β)′)
= E((x′x)⁻¹x′uu′x(x′x)⁻¹)
= (x′x)⁻¹x′E(uu′)x(x′x)⁻¹
= (x′x)⁻¹x′σ²Ωx(x′x)⁻¹
= σ²(x′x)⁻¹x′Ωx(x′x)⁻¹.
Note that the covariance matrix of β̂ (the OLS estimator when the disturbance covariance matrix is σ²Ω) is no longer equal to σ²(x′x)⁻¹, its variance under the standard OLS assumptions. It may be either larger or smaller, depending on whether (x′x)⁻¹x′Ωx(x′x)⁻¹ − (x′x)⁻¹ is positive semi-definite, negative semi-definite, or neither.
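A minimal sketch contrasting the "sandwich" covariance σ²(x′x)⁻¹x′Ωx(x′x)⁻¹ derived above with the naive σ²(x′x)⁻¹; the function name and the assumption that Ω and σ² are known are illustrative.

```python
import numpy as np

def ols_cov_under_omega(X, Omega, sigma2=1.0):
    """Covariance of the OLS estimator when E(uu') = sigma^2 * Omega:
    the 'sandwich' sigma^2 (X'X)^{-1} X' Omega X (X'X)^{-1}, compared with
    the naive sigma^2 (X'X)^{-1} that assumes Omega = I."""
    X = np.asarray(X, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    sandwich = sigma2 * XtX_inv @ X.T @ Omega @ X @ XtX_inv
    naive = sigma2 * XtX_inv
    return sandwich, naive
```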
The asymptotic covariance matrix of β̂ is given by lim_{n→∞} (σ²/n) (x′x/n)⁻¹ (x′Ωx/n) (x′x/n)⁻¹.
Lemma: s² = e′e/(n − k) is a biased and inconsistent estimator of σ².
Proof: E(e′e) = E(u′Au),  where A = Iₙ − x(x′x)⁻¹x′
= σ² trace(AΩ)
≠ σ²(n − k).
Hence e′e/(n − k) is biased.
Now, plim s² = σ² lim_{n→∞} trace(AΩ)/n ≠ σ²,
implying that s² is inconsistent.
Some results relating to GLS:
1. There exists a non-singular matrix V such that V′V = Ω⁻¹.
2. Suppose that the regression equation y = xβ + u satisfies all the ideal conditions of OLS except that Ω (where σ²Ω is the variance-covariance matrix of u) is not the identity matrix. Suppose also that lim_{n→∞} x′Ω⁻¹x/n is finite and non-singular. Let V be a matrix such that V′V = Ω⁻¹. Then the transformed equation

Vy = Vxβ + Vu

satisfies all the ideal conditions.
That is, E(Vu) = 0 and
E((Vu)(Vu)′) = V E(uu′) V′
= V σ²Ω V′
= σ² VΩV′.
But V′(VΩV′) = V′VΩV′ = Ω⁻¹ΩV′ = V′ (since V′V = Ω⁻¹),
and since V′ is non-singular, this implies VΩV′ = I.
Hence, E((Vu)(Vu)′) = σ²I.
Theorem:
The BLUE of β under the transformed relationship is

β_G = ((Vx)′(Vx))⁻¹ (Vx)′(Vy) = (x′Ω⁻¹x)⁻¹ x′Ω⁻¹y.

Thus β_G is the GLS estimator (the OLS estimator of the transformed equation). β_G has all the desired properties: it is best linear unbiased (BLUE), efficient, consistent and asymptotically efficient.
Proof: Considering β_G as the OLS estimator in the transformed equation, it clearly has covariance matrix
σ²((Vx)′(Vx))⁻¹ = σ²(x′V′Vx)⁻¹
= σ²(x′Ω⁻¹x)⁻¹.
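A minimal sketch of the equivalence used in this proof: β_G computed directly from (x′Ω⁻¹x)⁻¹x′Ω⁻¹y and as OLS on the transformed equation, with V taken from a Cholesky factorisation of Ω⁻¹ (one convenient choice satisfying V′V = Ω⁻¹); the function name is illustrative.

```python
import numpy as np

def gls_estimator(X, y, Omega):
    """GLS estimator beta_G = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y, computed
    both directly and as OLS on the transformed equation Vy = VX beta + Vu,
    where V'V = Omega^{-1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Omega_inv = np.linalg.inv(Omega)
    # direct formula
    beta_direct = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
    # OLS on the transformed variables
    V = np.linalg.cholesky(Omega_inv).T            # so that V'V = Omega^{-1}
    Xt, yt = V @ X, V @ y
    beta_transformed = np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)
    return beta_direct, beta_transformed           # the two agree up to rounding
```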
Proof: Since the transformed equation satisfies the ideal conditions of OLS, the desired estimator of σ² is the analogue of e′e/(n − k) computed from the transformed equation:

σ̃² = (1/(n − k)) (Vy − Vxβ_G)′(Vy − Vxβ_G)
= (1/(n − k)) (y − xβ_G)′ V′V (y − xβ_G)
= (1/(n − k)) (y − xβ_G)′ Ω⁻¹ (y − xβ_G)
= ẽ′Ω⁻¹ẽ / (n − k),   where ẽ = y − xβ_G denotes the vector of GLS residuals. A small computational sketch of σ̃² is given after the results below.
2. (n − k)σ̃²/σ² ∼ χ² with (n − k) degrees of freedom.
3. β_G and σ̃² are independent.
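A minimal sketch of the estimator σ̃² = ẽ′Ω⁻¹ẽ/(n − k) derived above, assuming β_G has already been computed (for example with the GLS sketch given earlier); the function name and the explicit inversion of Ω are illustrative simplifications.

```python
import numpy as np

def gls_sigma2(X, y, Omega, beta_g):
    """Estimator of sigma^2 after GLS: sigma_tilde^2 = e~' Omega^{-1} e~ / (n - k),
    where e~ = y - X beta_G are the GLS residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    e = y - X @ beta_g
    return float(e @ np.linalg.inv(Omega) @ e) / (n - k)
```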