Econometrics A
Autocorrelation
The classical linear regression model assumes that there is no autocorrelation in the
data.
Autocorrelation occurs when the error terms of any two observations are correlated.
With autocorrelation the error terms are not independent of each other.
This is more likely to occur with time series data, since such observations follow a natural ordering and are therefore likely to be correlated.
Thus, a shock (random factor) in one period is likely to have an effect in the next
period.
We assume that the residuals are generated by an autoregressive process, i.e. the errors are determined by previous error values.
A first-order autoregressive process, AR(1), can be expressed as
εt = ρ εt−1 + ut
where ρ is the autocorrelation coefficient and the new error term ut is normally distributed with mean zero and constant variance and does not itself exhibit autocorrelation.
The closer ρ is to 1 in absolute value, the stronger the autocorrelation. A value of ρ = 1 means there is perfect positive autocorrelation, ρ = −1 means there is perfect negative autocorrelation, and ρ = 0 means there is no autocorrelation.
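To make this concrete, the following sketch simulates an AR(1) error process and checks the correlation between successive errors (the sample size, seed and the value ρ = 0.8 are illustrative assumptions, not taken from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 200, 0.8            # assumed sample size and AR(1) coefficient

# Simulate eps_t = rho * eps_{t-1} + u_t with u_t ~ N(0, 1)
u = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + u[t]

# The sample correlation between eps_t and eps_{t-1} should be close to rho
print(np.corrcoef(eps[1:], eps[:-1])[0, 1])
```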
Detecting Autocorrelation
(1) Graphical Methods: Plot the residuals against time, or graph them against their lagged values, and see whether there is a systematic pattern.
(2) The Runs Test: Examine the signs of the residuals in sequence, where N1 = number of positive residuals, N2 = number of negative residuals, and K = number of runs. Too few or too many runs, relative to what would be expected if the residuals were random, suggests autocorrelation.
(3) The Durbin-Watson d Test
Assumptions:
The errors follow an AR(1) process.
The model has an intercept term.
The explanatory variables are fixed.
The regression does not include lagged values of the dependent variable.
There are no missing observations in the data.
The Durbin-Watson statistic can be derived to be the following:
d = Σ(et − et−1)² / Σ et² ≈ 2(1 − ρ̂)
−1 ≤ ρ ≤ 1 implies 0 ≤ d ≤ 4.
ρ = 0 implies d = 2 (no autocorrelation).
ρ = 1 implies d = 0 (positive autocorrelation).
ρ = −1 implies d = 4 (negative autocorrelation).
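As a minimal sketch (not part of the original notes), the d statistic can be computed directly from a vector of OLS residuals:

```python
import numpy as np

def durbin_watson(e):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2); roughly 2 * (1 - rho_hat)."""
    e = np.asarray(e, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Uncorrelated residuals give d close to 2; positively autocorrelated
# residuals push d towards 0, negatively autocorrelated ones towards 4.
rng = np.random.default_rng(1)
print(durbin_watson(rng.normal(size=500)))   # roughly 2
```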
Decision Rules: compare d with the tabulated lower and upper bounds dL and dU. If d < dL, reject the null hypothesis of no positive autocorrelation; if d > dU, do not reject it; if dL ≤ d ≤ dU, the test is inconclusive. For negative autocorrelation, apply the same bounds to 4 − d.
Remedies for autocorrelation
(A) If ρ is known:
We use a form of GLS i.e. we transform the variables such that there is no
autocorrelation.
We start off with the model
Yt = β1 + β2Xt + εt    (1)
We also assume an AR(1) process, εt = ρεt−1 + ut.
The model also holds for the previous period:
Yt−1 = β1 + β2Xt−1 + εt−1
Multiplying both sides by ρ, we get
ρYt−1 = ρβ1 + ρβ2Xt−1 + ρεt−1    (2)
Subtracting (2) from (1), we get
Yt − ρYt−1 = β1(1 − ρ) + β2(Xt − ρXt−1) + ut,
where ut = εt − ρεt−1, i.e. Yt* = β1* + β2Xt* + ut with Yt* = Yt − ρYt−1, Xt* = Xt − ρXt−1 and β1* = β1(1 − ρ).
The OLS assumptions now hold for the transformed model. Thus the GLS estimators are BLUE.
The slope coefficient is the same as in the original model.
To get back to the intercept of the original model, we divide the GLS intercept by (1 − ρ).
Since we lose the first observation when quasi-differencing, we can apply the Prais-Winsten transformation to it, Y1* = √(1 − ρ²) Y1 and X1* = √(1 − ρ²) X1, to avoid losing that observation.
(B) If ρ is not known:
In practice ρ must first be estimated, for example from the OLS residuals (by regressing et on et−1) or from the Durbin-Watson statistic via ρ̂ ≈ 1 − d/2. The transformation above is then applied with ρ̂ in place of ρ (feasible GLS, as in the Cochrane-Orcutt and Prais-Winsten procedures); a sketch is given below.
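The following is a minimal sketch of this feasible GLS idea for a single-regressor model; the function and variable names, the simulated data, and the one-step (Cochrane-Orcutt-style) estimation of ρ are illustrative assumptions, not the notes' own procedure:

```python
import numpy as np

def fit_ols(y, X):
    # Least-squares coefficients for y = X b + error
    return np.linalg.lstsq(X, y, rcond=None)[0]

def feasible_gls_ar1(y, x):
    n = len(y)
    X = np.column_stack([np.ones(n), x])

    # Step 1: OLS on the original model and its residuals
    e = y - X @ fit_ols(y, X)

    # Step 2: estimate rho by regressing e_t on e_{t-1}
    rho = float(np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2))

    # Step 3: quasi-difference; Prais-Winsten scaling keeps the first observation
    w = np.sqrt(1.0 - rho ** 2)
    y_star = np.r_[w * y[0], y[1:] - rho * y[:-1]]
    x_star = np.r_[w * x[0], x[1:] - rho * x[:-1]]
    c_star = np.r_[w, np.full(n - 1, 1.0 - rho)]   # transformed "constant"

    # Step 4: OLS on the transformed model. Because the transformed constant
    # already carries the (1 - rho) factor, its coefficient is the original
    # intercept; the slope coefficient is unchanged.
    return rho, fit_ols(y_star, np.column_stack([c_star, x_star]))

# Illustrative use with simulated AR(1)-error data
rng = np.random.default_rng(2)
x = rng.normal(size=200)
eps = np.zeros(200)
for t in range(1, 200):
    eps[t] = 0.7 * eps[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + eps
print(feasible_gls_ar1(y, x))   # rho_hat near 0.7, coefficients near (1, 2)
```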
Heteroscedasticity
Definition
Heteroscedasticity occurs when the error variance is not constant. In this case, the random disturbance for each observation is drawn from a distribution with a different variance. Stated equivalently, the variance of the observed value of the dependent variable around the regression line is not constant. A general linear regression model with heteroscedasticity can be expressed as follows:
Yi = β1 + β2X2i + … + βkXki + εi, with Var(εi) = E(εi²) = σi² for i = 1, 2, …, n
Note that there is an i subscript attached to sigma squared (σi²). This indicates that the disturbance for each of the n units is drawn from a probability distribution that has a different variance.
Consequences of heteroscedasticity
If the error term has non-constant variance, but all other assumptions of the classical linear regression model are satisfied, the consequences of using the OLS estimator to obtain estimates of the population parameters are as follows:
1. The OLS estimators are still linear and unbiased.
2. However, they are no longer efficient, i.e. they are not BLUE.
3. The estimated variances and covariances of the OLS estimators are biased and inconsistent, so the usual hypothesis tests are invalid.
Detection of heteroscedasticity
There are several ways to use the sample data to detect the existence of heteroscedasticity.
The OLS residuals ei are estimates of the unobservable disturbance terms εi. Thus, the squared residuals ei² can be used as estimates of the unknown and unobservable variances σi² = E(εi²). One can calculate the squared residuals
and then plot them against an explanatory variable that might be related to the error variance.
If the error variance is likely to be related to more than one of the explanatory variables, one
can plot the squared residuals against each one of these variables. Alternatively, one can plot
the squared residuals against the fitted value of the dependent variable obtained from the OLS
regression. However, this is not a formal test for heteroscedasticity. It would only suggest
whether heteroscedasticity may exist. One should therefore also carry out a formal statistical test.
There is a set of heteroscedasticity tests that are based on a specific assumption about the structure of the heteroscedasticity. In other words, in order to use these tests one must choose
a specific functional form for the relationship between the error variance and the variables
that are likely to influence the error variance.
Consider, for example, the simple regression model
Yi = β1 + β2Xi + εi for i = 1, 2, …, n
It is assumed that all the assumptions of the classical linear regression model are satisfied, except the assumption of constant error variance. Thus, Var(εi) = E(εi²) = σi² for i = 1, 2, …, n.
The Breusch-Pagan test assumes that the error variance is a linear function of Xi, i.e.,
σi² = a1 + a2Xi for i = 1, 2, …, n
On the other hand, the Harvey-Godfrey test assumes that the error variance is an exponential function of Xi, i.e.,
σi² = exp(a1 + a2Xi), or equivalently ln(σi²) = a1 + a2Xi, for i = 1, 2, …, n
The null hypothesis of constant error variance (no heteroscedasticity) can be expressed as the following restriction on the parameters: H0: a2 = 0 against H1: a2 ≠ 0.
In order to test the null-hypothesis of constant error variance (no heteroscedasticity), one can
use a Lagrange multiplier (LM) test. The test statistic follows a chi-square distribution with degrees of freedom equal to the number of restrictions. In the given example, only one variable, Xi, is included and thus there is only one restriction. Because the error variances σi² are unknown and unobservable, the squared residuals are used as estimates of these error variances. The
Lagrange Multiplier test requires the following steps to be followed:
Step 1: Regress Yi against a constant and Xi using the OLS method.
Step 2: Calculate the residuals from this regression, ei.
Step 3: Square these residuals, ei², for the Breusch-Pagan test; for the Harvey-Godfrey test, also take the logarithm of the squared residuals, ln(ei²).
Step 4: For the Breusch-Pagan test, regress the squared residuals, ei², on a constant and Xi, using the OLS method. For the Harvey-Godfrey test, regress the logarithm of the squared residuals, ln(ei²), on a constant and Xi, using the OLS method.
Step 5: Find the ordinary R2 of this auxiliary regression.
Step 6: Calculate the LM test statistic as follows: LM = nR2.
Step 7: Compare the value of the test statistic to the critical value for some predetermined
level of significance. If the calculated test statistic exceeds the critical value, reject the null-
hypothesis of constant error variance and conclude that there is heteroscedasticity. If not, do
not reject the null-hypothesis and conclude that there is no heteroscedasticity.
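A minimal sketch of the Breusch-Pagan version of these steps, assuming plain data arrays `y` and `x` (the names and the hard-coded 5% critical value are illustrative):

```python
import numpy as np

def breusch_pagan_lm(y, x):
    """LM = n * R^2 from regressing squared OLS residuals on a constant and x."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])

    # Steps 1-3: OLS residuals of the original model, squared
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    e2 = e ** 2

    # Step 4: auxiliary regression of e^2 on a constant and x
    fitted = X @ np.linalg.lstsq(X, e2, rcond=None)[0]
    r2 = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)

    # Steps 5-7: LM = n * R^2, compared with chi-square(1); 3.84 at the 5% level
    return n * r2

# For the Harvey-Godfrey variant, regress np.log(e2) instead of e2 in Step 4.
```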
These tests have the following limitations:
1. One must specify a model of the structure of the heteroscedasticity, if it exists. For
example, the Breusch-Pagan test assumes that the error variance is a linear function of one or
more of the explanatory variables. Thus, if heteroscedasticity exists, but the error variance is
a non-linear function of one or more explanatory variables, then this test will not be valid.
2. If the errors are not normally distributed, then these tests may not be valid.
White’s test
The White test is a general test for heteroscedasticity. It has the following advantages:
1. It does not require one to specify a model of the structure of the heteroscedasticity.
2. It does not depend on the assumption that the errors are normally distributed.
3. It specifically tests whether the presence of heteroscedasticity causes the OLS formulas for the variances and covariances of the estimates to be incorrect.
Suppose, for example, that the model is Yi = β1 + β2X2i + β3X3i + εi. The White test assumes that the error variance has the following general structure:
σi² = a1 + a2X2i + a3X3i + a4X2i² + a5X3i² + a6X2iX3i for i = 1, 2, …, n
Here, all the explanatory variables, their squares and their cross-products are included in the function that describes the error variance, so a general functional form is used to describe the structure of the heteroscedasticity.
The null-hypothesis of constant error variance (no heteroscedasticity) can be expressed as the
following restriction on the parameters of the heteroscedasticity equations:
Ho: a2 = a3 = a4 = a5 = a6 = 0
In order to test the null hypothesis, one can use a Lagrange multiplier test. The test statistic follows a chi-square distribution with degrees of freedom equal to the number of restrictions imposed. In
the given example, 5 restrictions are imposed and, therefore, we have 5 degrees of freedom.
Once again, since the error variances σi² are unknown and unobservable, the squared residuals are used as estimates of the error variances. In order to carry out the Lagrange
Multiplier test, the following steps need to be followed.
Step 1: Regress Yi against a constant, X2i, and X3i using the OLS method.
Step 2: Calculate the residuals from this regression, ei.
Step 3: Square these residuals, ei².
Step 4: Regress the squared residuals, ei², on a constant, X2i, X3i, X2i², X3i², and X2iX3i using the OLS method.
Step 5: Find the ordinary R2 for the auxiliary regression.
Step 6: Calculate the LM test statistic as follows: LM = nR2.
Step 7: Compare the value of the test statistic to the critical value for some predetermined
level of significance. If the calculated test statistic exceeds the critical value, reject the null-
hypothesis of constant error variance and conclude that there is heteroscedasticity. If not, do
not reject the null-hypothesis and conclude that there is no evidence of heteroscedasticity.
1. If one or more of the X's are dummy variables, one must be careful when specifying the auxiliary regression. For example, suppose X3 is a dummy variable. In this case, the variable X3² is identical to X3. If both of these are included in the auxiliary regression, there will be perfect multicollinearity. Therefore, one should exclude X3² from the auxiliary regression.
2. If there are a large number of explanatory variables in the model, the number of
explanatory variables in the auxiliary regression could exceed the number of observations. In
this case, one must exclude some of the variables from the auxiliary regression. One can drop the linear terms and/or the cross-product terms; however, the squared terms should always be included in the auxiliary regression.
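A minimal sketch of the White procedure described above, assuming data arrays `y`, `x2` and `x3` (names are illustrative):

```python
import numpy as np

def white_lm(y, x2, x3):
    """White's LM statistic: n * R^2 from regressing e^2 on a constant,
    X2, X3, X2^2, X3^2 and X2*X3."""
    n = len(y)
    X = np.column_stack([np.ones(n), x2, x3])

    # Steps 1-3: OLS residuals of the original model, squared
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    e2 = e ** 2

    # Step 4: auxiliary regression with levels, squares and the cross-product
    Z = np.column_stack([np.ones(n), x2, x3, x2 ** 2, x3 ** 2, x2 * x3])
    fitted = Z @ np.linalg.lstsq(Z, e2, rcond=None)[0]
    r2 = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)

    # Compare n * R^2 with the chi-square critical value, 5 degrees of freedom
    return n * r2
```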
Suppose there is evidence of heteroscedasticity. If we use the OLS method, we will get
unbiased but inefficient estimators of the parameters. Also, the estimators of the variances and covariances of the parameter estimates will be biased and inconsistent, making hypothesis testing invalid. In such cases, econometricians do one of two things.
1. Use the method of OLS to estimate the parameters of the model and correct the estimates
of variances and covariance so that they are consistent.
2. Use alternative estimators (i.e., other than the OLS estimators) such as GLS or WLS to
estimate the parameters of the model.
Many econometricians choose alternative #1. This is so because the most serious
consequence of using the OLS estimator when there is heteroscedasticity is that the estimates
of the variances and covariance of the parameter estimates are biased and inconsistent. If this
problem is corrected, then the only shortcoming of using OLS is that one loses some
precision relative to some other estimator that could have been used. However, to get more
precise estimates with an alternative estimator, one must know the approximate structure of
the heteroscedasticity. If the model of heteroscedasticity is specified wrongly, the alternative
estimator can yield estimates that are worse than the OLS estimator.
White developed a method for obtaining consistent estimates of the variance and covariance
of the OLS estimators. This is called the heteroscedasticity consistent covariance matrix
(HCCM) estimator. Most statistical packages have an option that allows you to calculate the
HCCM.
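As an illustration of what such an option computes, here is a minimal numpy sketch of the simplest (HC0) version of the heteroscedasticity-consistent covariance matrix; the design matrix `X` and residual vector `e` are assumed to come from a previous OLS fit:

```python
import numpy as np

def hccm_hc0(X, e):
    """White's HC0 covariance: (X'X)^-1 X' diag(e^2) X (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (e[:, None] ** 2 * X)        # X' diag(e^2) X
    return XtX_inv @ meat @ XtX_inv

# Robust standard errors are the square roots of the diagonal elements:
# se_robust = np.sqrt(np.diag(hccm_hc0(X, e)))
```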
The GLS can be used when there are problems of heteroscedasticity and autocorrelation.
However, it has its own weaknesses.
The major problem with the GLS estimator is that to use it one must know the true error
variance and standard deviation of the error for each observation in the sample. However, the
true error variance is always unknown and unobservable. Thus, the GLS estimator is not a
feasible estimator.
The GLS estimator requires σi to be known for each observation in the sample. To make the GLS estimator feasible, one can use the sample data to obtain an estimate of σi for each observation. Subsequently, the GLS estimator can be applied using the estimates of σi. This estimator is called the Feasible Generalized Least Squares Estimator, or
FGLS estimator.
Example: Suppose that we have the following general linear regression model:
Yi = β1 + β2X2i + β3X3i + εi for i = 1, 2, …, n
The rest of the assumptions are the same as in the classical linear regression model. It is assumed that the error variance is a linear function of X2i and X3i. Thus, we are assuming that the heteroscedasticity has the following structure:
Var(εi) = σi² = a1 + a2X2i + a3X3i for i = 1, 2, …, n
To obtain FGLS estimates of the parameters β1, β2, and β3, proceed as follows.
Step 1: Regress Yi against a constant, X2i, and X3i using the OLS estimator.
Step 2: Calculate the residuals from this regression, ei.
Step 3: Square these residuals, ei².
Step 4: Regress the squared residuals, ei², on a constant, X2i, and X3i, using OLS.
Step 5: Use the estimates of a1, a2, and a3 to calculate the predicted values σ̂i² from this auxiliary regression. Each predicted value is an estimate of the error variance for that observation. Check these predicted values: for any predicted value that is non-positive, replace it with the squared residual for that observation. This ensures that each estimated variance is a positive number.
Step 6: Find the square root of the estimated error variance, σ̂i, for each observation.
Step 7: Calculate the weight wi = 1/σ̂i for each observation.
Step 8: Multiply Yi, the constant, X2i, and X3i for each observation by its weight.
Step 9: Regress wiYi on wi, wiX2i, and wiX3i using OLS.
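A minimal sketch of these FGLS/weighted least squares steps, assuming arrays `y`, `x2` and `x3` (the names, and the use of the squared residual as the fallback for non-positive fitted variances, follow the description above; the implementation itself is illustrative):

```python
import numpy as np

def ols(y, X):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fgls(y, x2, x3):
    n = len(y)
    X = np.column_stack([np.ones(n), x2, x3])

    # Steps 1-3: OLS residuals and their squares
    e = y - X @ ols(y, X)
    e2 = e ** 2

    # Steps 4-5: auxiliary regression for the variance; fix non-positive fits
    var_hat = X @ ols(e2, X)
    var_hat = np.where(var_hat <= 0, e2, var_hat)

    # Steps 6-8: weights and weighted variables (including the constant)
    w = 1.0 / np.sqrt(var_hat)
    yw = y * w
    Xw = X * w[:, None]

    # Step 9: OLS on the weighted data gives the FGLS estimates
    return ols(yw, Xw)
```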
Multicollinearity
One of the assumptions of the classical linear regression model (CLRM) is that there are no exact linear relationships among the independent variables and that there are at least as many observations as there are parameters to be estimated (the rank condition of the regression). If either of these assumptions is violated, it is impossible to compute the OLS estimator; the estimation procedure simply breaks down.
In estimation, the number of observations should be greater than the number of parameters to be estimated, and the difference between the sample size and the number of parameters (i.e., the degrees of freedom) should be as large as possible. Further, there could be approximate linear relationships among the independent variables. Even though the estimation procedure does not entirely break down when the independent variables are highly correlated, severe estimation problems may arise.
There are two types of multicollinearity problems: perfect multicollinearity and less-than-perfect (near) multicollinearity. If multicollinearity is perfect, the regression coefficients of the independent variables are indeterminate and their standard errors are infinite. On the other hand, if multicollinearity is less than perfect, the regression coefficients, although determinate, have large standard errors, which means the coefficients cannot be estimated with great precision.
Sources of multicollinearity
1. The data collection method employed: For instance, sampling over a limited range.
3. An overdetermined model: This happens when the model has many independent variables with similar implications.
4. In time series data, the regressors may share the same trend.
Consequences of multicollinearity
1. Although still BLUE, the OLS estimators have large variances, making precise estimation difficult. (The OLS estimators remain BLUE because near multicollinearity does not violate any of the classical assumptions.) In a model with two explanatory variables, the variances of the estimated slope coefficients are:
Var(β̂1) = σ² / [Σx1i²(1 − r²)] and Var(β̂2) = σ² / [Σx2i²(1 − r²)],
where x1i and x2i are deviations of X1 and X2 from their sample means and r is the correlation coefficient between X1 and X2.
Both the denominators include the correlation coefficient between X1 and X2. When the
independent variables are uncorrelated, the correlation coefficient is zero. However, when the
correlation coefficient becomes very high (close to 1), severe multicollinearity is present with
the consequence that the estimated variances of both the parameters become very large.
Thus, even if it is believed that one or both of the variables ought to be in the model, one fails
to reject the null hypothesis because of the large standard errors. In other words, the presence
of multicollinearity makes the OLS estimators less precise.
2. The confidence intervals tend to be much wider, leading to the failure in rejecting the null
hypothesis
3. The t statistics may not be statistically significant and the coefficient of determination may
be high.
4. The OLS estimators and their standard errors may be sensitive to small changes in the data.
Detection of multicollinearity
The presence of multicollinearity makes it difficult to separate the individual effects of the
collinear variables on the dependent variable. Explanatory variables are rarely uncorrelated
with each other and multicollinearity is a matter of degree. The severity of multicollinearity
can be detected in the following ways:
1. A relatively high R² and a significant F-statistic, but very few significant t-statistics.
4. Use subsidiary or auxiliary regressions. This involves regressing each independent variable on the remaining independent variables and using an F-test to determine the significance of the resulting R²:
F = [R² / (k − 1)] / [(1 − R²) / (n − k)]
where R² is from the auxiliary regression, k is the number of parameters in that regression and n is the number of observations.
5. The Variance Inflation Factor, VIF = 1/(1 − R²), where R² is obtained from the auxiliary regression of a given independent variable on the remaining independent variables. In general, VIF > 5 is used to indicate the presence of multicollinearity among the continuous independent variables. When the variables to be investigated are discrete in nature, the Contingency Coefficient (CC) is used.
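A minimal sketch of computing VIFs from these auxiliary regressions, assuming a matrix `X` whose columns are the continuous independent variables (no constant column; names are illustrative):

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j of X
    on a constant and the remaining columns."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        fitted = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
        r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out   # values above 5 suggest problematic collinearity
```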
Remedies of multicollinearity
2. Drop a variable(s) from the model: This however could lead to specification error.
4. Rethinking of the model: Incorrect choice of functional form, specification errors, etc…
5. Prior information about some parameters of a model could also help to get rid of
multicollinearity.
Consider the model Y = β0 + β1X1 + β2X2 + εi.
The partial correlation coefficient between Y and X1 must be defined in such a way that it measures the effect of X1 on Y that is not accounted for by the other variables in the model. In the present regression equation, this is done by eliminating the linear effect of X2 on Y as well as the linear effect of X2 on X1, and then correlating what remains. The procedure can be described as follows: regress Y on X2 and X1 on X2, and form the residuals
Y* = Y − Ŷ and X1* = X1 − X̂1
The partial correlation between Y and X1 is then the simple correlation between Y* and X1*.
Notation: rYX1 denotes the simple correlation between Y and X1; rYX1.X2 denotes the partial correlation between Y and X1, holding X2 constant.
We can also establish a relationship between the partial correlation coefficient and the multiple coefficient of determination R²:
r²YX1.X2 = (R² − r²YX2) / (1 − r²YX2), or equivalently, 1 − R² = (1 − r²YX2)(1 − r²YX1.X2)
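A minimal sketch of this residual-based calculation of the partial correlation (array names are illustrative assumptions):

```python
import numpy as np

def partial_corr(y, x1, x2):
    """r_{Y,X1|X2}: correlate the residuals of Y on X2 with those of X1 on X2."""
    n = len(y)
    Z = np.column_stack([np.ones(n), x2])
    y_star = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    x1_star = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
    return float(np.corrcoef(y_star, x1_star)[0, 1])
```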
In the stepwise regression procedure, one adds variables to the model so as to maximize the adjusted coefficient of determination, adjusted R².
Specification Errors
One of the OLS assumptions is that the dependent variable can be calculated as a linear
function of a set of specific independent variables and an error term. This assumption is
crucial if the estimators are to be interpreted as decent “guesses” on effects from independent
variables on the dependent variable. Violations of this assumption lead to what is generally
known as “specification errors”. One should always approach empirical studies in the social
sciences with the question “is the regression equation specified correctly?”
One particular type of specification error is excluding relevant regressors. This is crucial
when investigating the effects of one particular independent variable, say education, on the
dependent variable, like farm productivity. If an important variable, such as extension services, is missing from the regression equation, there is a risk of omitted variable bias. The estimated effect of education can then be systematically over- or understated, because
extension services affect both education and farm productivity. The education coefficient will
incorporate some of the effects that are really due to extension services. Identifying all the
right “control variables” is a crucial task, and disputes over proper control variables can be
found everywhere in the social sciences.
Another specification error may arise from assuming a linear relationship when the true relationship is non-linear. In many instances, variables are not related in a fashion that is close to linear. Transformations of variables can, however, often be made that allow the relationship to be estimated within an OLS-based framework. If one suspects a U- or inverted U-shaped relationship between two variables, a squared term of the independent variable should be added to the regression model. If one
suspects that the effect of an increase in the independent variable is larger at its lower levels,
log-transformation of the independent variable can be considered. The effects of an
independent variable might also be dependent upon the specific values taken by other
variables. The effects may also be different in different parts of the sample. Interaction terms
and delineations of the sample are two suggested ways to investigate such matters.
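As an illustration of these transformations (the variable names and simulated data below are assumptions for the sketch, not from the notes), squared terms, log terms and interaction terms can all be added as extra columns of the design matrix and estimated by ordinary OLS:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
educ = rng.uniform(1, 12, size=n)        # illustrative continuous regressor
ext = rng.integers(0, 2, size=n)         # illustrative dummy (extension services)
y = 1 + 0.5 * educ - 0.02 * educ ** 2 + 0.8 * ext + rng.normal(size=n)

# Inverted-U shape: add educ**2; diminishing effect: use np.log(educ) instead;
# effect of educ depending on ext: add the interaction educ * ext.
X = np.column_stack([np.ones(n), educ, educ ** 2, ext, educ * ext])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta)
```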