
LECTURE NOTE ON ECONOMETRICS

CHAPTER ONE
INTRODUCTION

1.1 Definition of Econometrics


Econometrics is about how we can use economic or social science theory and data, along with tools from statistics, to answer "how much" type questions. It integrates mathematical knowledge, statistical skills and economic theory to solve the business and economic problems of, for example, agribusiness firms. For instance, economics tells us that the demand for a good is a function of the good's price, and that in most cases the price elasticity of demand is negative. But for many practical purposes one may be interested in quantifying the elasticity more accurately. For such questions, econometrics can provide the answer.
Simply stated, econometrics means economic measurement. The "metric" part of the word signifies measurement, and econometrics is concerned with the measurement of economic relationships.
It is a social science in which the tools of economic theory, mathematics and statistical
inference are applied to the analysis of economic phenomena (Arthur Goldberger).
In the words of Maddala, econometrics is “the application of statistical and mathematical
methods to the analysis of economic data, with a purpose of giving empirical content to
economic theories and verifying them or refuting them.”
Econometrics utilizes economic theory, as embodied in an econometric model; facts, as summarized by relevant data; and statistical theory, as refined into econometric techniques, to measure and to test empirically certain relationships among economic variables.
It is a special type of economic analysis and research in which the general economic theory
formulated in mathematical form (i.e. mathematical economics) is combined with empirical
measurement (i.e. statistics) of economic phenomena.

Definition of Econometrics (Greene, 2003): "Econometrics is the field of economics that concerns itself with the application of mathematical statistics, and tools of statistical inference to the empirical measurement of relationships postulated by economic theory."


1.2 Why a Separate Discipline?


As the definition suggests, econometrics is an amalgam of economic theory, mathematical statistics and economic statistics. Still, a distinction has to be made between econometrics on the one hand and economic theory, statistics and mathematics on the other.
1. Economic theory makes statements or hypotheses that are mostly of a qualitative nature.
Example: Other things remaining constant (ceteris paribus), a reduction in the price of a commodity is expected to increase the quantity demanded; that is, economic theory postulates an inverse relationship between price and quantity demanded of a commodity. But the theory does not provide a numerical value as the measure of the relationship between the two. Here comes the task of the econometrician: to provide the numerical value by which the quantity will go up or down as a result of changes in the price of the commodity.
2. Economic statistics is concerned with collecting, processing and presenting economic data
(descriptive statistics).
Example: collecting and refining data on national accounts, index numbers, employment,
prices, etc.
3. Mathematical statistics and mathematical economics do provide much of the tools used in econometrics. But econometrics needs special methods to deal with economic data, which are never experimental data.
Examples: errors of measurement, the problem of multicollinearity and the problem of serial correlation are specifically econometric problems and are not the concern of mathematical statistics. Econometrics utilizes such data to estimate quantitative economic relationships and to test hypotheses about them. The econometrician is called upon to develop special methods of analysis to deal with these kinds of econometric problems.
1.3. Aims of econometrics:
The three main aims of econometrics are as follows:
1. Formulation and specification of econometric models:
Economic models are formulated in an empirically testable form. Several econometric models can be derived from a single economic model; such models differ in their choice of functional form, the specification of the stochastic structure of the variables, etc.


2. Estimation and testing of models:


The models are estimated on the basis of the observed set of data and are tested for their suitability. This is the statistical-inference part of the modelling exercise. Various estimation procedures are used to obtain the numerical values of the unknown parameters of the model. Based on various formulations of statistical models, a suitable and appropriate model is selected.
3. Use of models:
The estimated models are used for forecasting and policy formulation, which is an essential part of any policy decision. Such forecasts help policymakers judge the goodness of the fitted model and take the necessary measures to re-adjust the relevant economic variables.

1.4. THE METHODOLOGY OF ECONOMETRICS


Econometricians basically conduct or carry out their economic analysis in the following eight steps:
i. Economic theory
Economic theory is the starting point or basis for econometric work. For example, in order to estimate a consumption function, one should study the economic theories of consumption, e.g. the Keynesian consumption function. Having understood the theory, you can then state your hypothesis.

ii. Mathematical model for the theory

The next step is to formulate a linear single-equation mathematical model for the theory, e.g. the Keynesian consumption theory: Y = f(X), such that Y = β1 + β2X

where: Y = consumption expenditure, i.e. the dependent variable, since it is determined within the model (or by the theory);

X = income of the consumer, i.e. the independent or explanatory variable, as it is determined from outside the model;

β1 = intercept coefficient (i.e. the value of Y when X = 0)


β2 = slope coefficient (i.e. the change in Y brought about by a unit change in X); thus, it is the marginal propensity to consume (MPC) in a consumption function.

Note that: i) β1 and β2 are the parameters of interest we need to estimate;

ii) the restriction 0 < MPC < 1 on consumption is obtained from economic theory.

iii. Specification of the econometric model of the theory


The mathematical model Y=β1+β2X assumes that the variables X and Y have an exact
relationship; i.e. that all points (X, Y) will lie on the line, Y=β1+β2X.

However, not all points obey such a perfect or exact relationship. Instead, there exists an inexact or stochastic relationship among most economic variables. Thus, the relevant econometric model is Y = β1 + β2X + u.

There are reasons for including an error term/disturbance term (u) in a model:

a) It captures other factors that explain Y which are not explained by X

b) It captures measurement errors- i.e., errors in measuring X and Y.

c) It captures specification errors, e.g. a wrongly specified functional form.


Due to the inexact nature of the relationship between X and Y, arising from individual variations, not all points will lie on the regression line.

iv. Obtaining data


Data is crucial for economic analysis. Thus we need to collect data on both X and Y if
we are to estimate the parameters of interest (β1 and β2).
v. Estimation of the econometric model

Once the data are available, we can then proceed to estimate β1 and β2. For example, we might estimate the econometric model and get, say, β1 = -30 and β2 = 0.5, so that the estimated model becomes Ŷt = -30 + 0.5Xt, where Ŷ is read "Y hat". The hat (^) on Yt means that it is an estimated value, not the actual value. The estimate β2 = 0.5 is thus the marginal propensity to consume (MPC). Regression analysis is the main statistical technique we use to estimate β1 and β2, since it actually gives us the line of best fit.
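As a minimal sketch of this estimation step (the income–consumption figures below are hypothetical, invented purely for illustration), the OLS estimates can be computed directly from the least-squares formulae:

```python
import numpy as np

# Hypothetical sample: disposable income (X) and consumption expenditure (Y)
X = np.array([80.0, 100.0, 120.0, 140.0, 160.0, 180.0, 200.0, 220.0])
Y = np.array([70.0, 65.0, 90.0, 95.0, 110.0, 115.0, 120.0, 140.0])

# OLS: beta2 = sum((X-Xbar)(Y-Ybar)) / sum((X-Xbar)^2), beta1 = Ybar - beta2*Xbar
x_dev = X - X.mean()
y_dev = Y - Y.mean()
beta2 = (x_dev * y_dev).sum() / (x_dev ** 2).sum()   # slope = MPC
beta1 = Y.mean() - beta2 * X.mean()                  # intercept

print(f"Y_hat = {beta1:.2f} + {beta2:.2f} X")
print("Predicted consumption at X = 100:", beta1 + beta2 * 100)
```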

vi. Hypothesis testing


The Keynesian consumption theory reminds us that the marginal propensity to consume (MPC), i.e. β2, obeys the rule 0 < MPC < 1. Now we may want to test whether the result we have obtained (MPC = 0.5) conforms to what the theory says. This is hypothesis testing.

vii. Forecasting or prediction

If hypothesis testing confirms that our results indeed conform to what economic theory says, we can comfortably use our model for forecasts or predictions. For example, given, say, X = 100, we can predict Yt as Ŷt = -30 + 0.5(100) = 20.

viii. Using the model for control or policy purposes:

Apart from forecasting or prediction, the econometric model can also be used for control or policy purposes, especially in evaluating the impact of a particular fiscal or monetary policy of government on, say, consumption. In such a case, the particular policy is the control variable and consumption is the target variable.

1.5. Economic models vs. Econometric models


A model is any representation of an actual phenomenon such as an actual system or process.
The real world system is represented by the model in order to explain it, to predict it, and to
control it. Any model represents a compromise between reality and manageability. A given representation of a real world system can be a model if it fulfills the following requirements:
(1) It must be a “reasonable” representation of the real world system and in that sense it
should be realistic.
(2) On the other hand it must be “manageable” in that it yields certain insights or
conclusions.
A good model is both realistic and manageable. A highly realistic but too complicated model is a "bad" model in the sense that it is not manageable. A model that is highly manageable but so idealized that it is unrealistic, not accounting for important components of the real world system, is a "bad" model too. In general, finding the proper balance between realism and manageability is the essence of good modeling. Thus a good model should, on the one hand,
specify the interrelationship among the parts of a system in a way that is sufficiently detailed
and explicit and, on the other hand, it should be sufficiently simplified and manageable to


ensure that the model can be readily analyzed and conclusions can be reached concerning the
real world.
A) Economic models
Any economic theory is an abstraction from the real world. For one reason, the immense complexity of the real world economy makes it impossible for us to understand all interrelationships at once. Another reason is that not all interrelationships are equally important for the understanding of the economic phenomenon under study. The sensible procedure is therefore to pick out the important factors and relationships relevant to our problem and to focus our attention on these alone.
B) Econometric models
The most important characteristic of economic relationships is that they contain a random
element which is ignored by mathematical economic models which postulate exact
relationships between economic variables.
Example: Economic theory postulates that the demand for a commodity depends on its price, on the prices of other related commodities, on consumers' income and on tastes.

This is an exact relationship which can be written mathematically as:

$$Q = b_0 + b_1 P + b_2 P_0 + b_3 Y + b_4 t$$

The above demand equation is exact. However, many more factors may affect demand. In econometrics, the influence of these "other" factors is taken into account by introducing a random variable. In our example, the demand function studied with the tools of econometrics would be of the stochastic form:

$$Q = b_0 + b_1 P + b_2 P_0 + b_3 Y + b_4 t + u,$$

where u stands for the random factors which affect the quantity demanded. The random term (also called the error term or disturbance term) is a surrogate for important variables excluded from the model, errors committed in specification and measurement errors.
Econometric models consist of the following four basic structural elements.

i) A set of variables


ii) A list of fundamental relationships

iii) A number of strategic coefficients

iv) A random term

Desirable Properties of an Econometric Model


An econometric model is a model whose parameters have been estimated with some appropriate econometric technique. The "goodness" of an econometric model is customarily judged on the basis of the following desirable properties.

a. Theoretical plausibility: The model should be compatible with the postulates of economic
theory and adequately describe the economic phenomena to which it relates.

b. Explanatory ability: The model should be able to explain the observations of the actual
world. It must be consistent with the observed behavior of the economic variables whose
relationship it determines.

c. Accuracy of the estimates of the parameters: The estimates of the coefficients should be
accurate in the sense that they should approximate as best as possible the true parameters
of the structural model. The estimates should if possible possess the desirable properties
of unbiasedness, consistency and efficiency.

d. Forecasting ability: The model should produce satisfactory predictions of future values of
the dependent (endogenous) variables.

e. Simplicity: The model should represent the economic relationships with maximum
simplicity. The fewer the equations and the simpler their mathematical form, the better
the model provided that the other desirable properties are not affected by the
simplifications of the model.

1.6. Goals of Econometrics


Basically there are three main goals of Econometrics. They are:

i) Analysis i.e. testing economic theory


ii) Policy making i.e. obtaining numerical estimates of the coefficients of economic
relationships for policy simulations.
iii) Forecasting i.e. using the numerical estimates of the coefficients in order to forecast the
future values of economic magnitudes.


1.7. Elements of Econometrics


The four elements of econometrics are explained below.
a. Data
Collecting and coding the sample data, the raw material of econometrics. Most economic
data is observational, or non-experimental, data (as distinct from experimental data generated
under controlled experimental conditions).
b. Specification
Specification of the econometric model that we think (hope) generated the sample data -- that
is, specification of the data generating process (or DGP).
An econometric model consists of two components:
i) An economic model: specifies the dependent or outcome variable to be explained and
the independent or explanatory variables that we think are related to the dependent
variable of interest. Often suggested or derived from economic theory.
ii) A statistical model: specifies the statistical elements of the relationship under
investigation, in particular the statistical properties of the random variables in the
relationship.
c. Estimation
It consists of using the assembled sample data on the observable variables in the model to
compute estimates of the numerical values of all the unknown parameters in the model.
d. Inference
It consists of using the parameter estimates computed from sample data to test hypotheses
about the numerical values of the unknown population parameters that describe the behaviour
of the population from which the sample was selected.
1.8. Types of Econometrics
There are two types of econometrics. These are theoretical and applied econometrics.
A) Theoretical Econometrics
Theoretical econometrics is concerned with the development of appropriate methods for
measuring economic relationships specified by econometric models. In this aspect,
econometrics leans heavily on mathematical statistics. For example, one of the methods is
least squares.
B) Applied Econometrics


Applied econometrics deals with the application of econometric methods developed in the
theoretical econometrics by employing economic data. In applied econometrics, we use the
tools of theoretical econometrics to study some special field(s) of economics and business,
such as the production function, investment function, demand and supply functions, portfolio
theory, etc.
In summary, an econometrician is concerned with:

i) measuring relationships among economic variables and estimating their parameters;
ii) testing the theoretical ideas represented by such relationships (i.e. hypothesis testing so as to verify or refute the theory); and
iii) using such relationships for forecasting, prediction or policy purposes.
1.9. LIMITATIONS OF RELYING ON ECONOMIC THEORY

As already noted, economic theory must always form the basis or starting point of any econometric analysis. However, relying on theory alone is not sufficient, because theory has the following limitations:

i. Theory does not tell us how to define our variables

For example, in the estimation of C = α + bYd, theory does not tell us whether C should be private consumption, public consumption or simply aggregate consumption, etc.
ii. Theory tells us nothing about the precise functional form of the equation;

i.e., whether the relationship is a simple linear function, e.g. C = α + bYd; a semi-log function, e.g. log C = β1 + β2Yd; or a log-linear function, e.g. log C = log β1 + β2 log Yd; and so on.

iii. Theory provides us only with qualitative information about the relationship, with no attempt at quantitative information. For example, theory just tells us that a rise in Yd will lead to a rise in C, i.e. that the MPC lies in the interval 0 < MPC < 1. But an econometrician would like to know a specific value for the MPC.


iv. Theory only provides information about long-run equilibrium relationships. However, an econometrician may be interested in knowing not only the long-run equilibrium but also the short-run adjustment from disequilibrium to equilibrium (the dynamics), and so on.
For this reason, econometricians combine theory, mathematics and statistics.

1.10. Structure of Economic data


Economic data can be categorized into:
a. Cross-sectional data
b. Time series data
c. Pooled cross section data
d. Panel or longitudinal data
A. Cross sectional data
A cross-sectional data set consists of a sample of individuals, households, firms, cities, states,
countries or a variety of other units, taken at a given point in time. Data on individuals,
households, firms, and cities at a given point in time are examples of cross sectional data.


B. Time series data


A time series data set consists of observations on a variable or several variables over time.
Stock prices, money supply, consumer price index, gross domestic product, automobile sales
figures, inflation rates and unemployment rates collected over a period of time, daily, weekly,
monthly or every year are examples of time series data.


C. Pooled Cross Sections


Pooled cross sections are data sets that have both cross-sectional and time series features. For
example, suppose that two cross-sectional household surveys are taken in the United States,
one in 1985 and one in 1990. In 1985, a random sample of households is surveyed for variables such as income, savings, family size, and so on. In 1990, a new random sample of households is surveyed for the same variables; combining the two samples forms a pooled cross section.


D. Panel data or Longitudinal Data


A panel data set consists of a time series for each cross-sectional member in the data set. An
example is a household survey (census) conducted every 10 years in Ethiopia. Note that it is
common to denote each observation by the letter t and the total number of observations by T
for time series data, and to denote each observation by the letter i and the total number of
observations by N for cross-sectional data.


CHAPTER TWO

CORRELATION ANALYSIS

2.1 Introduction
Economic variables have a great tendency to move together, and very often data come in pairs of observations in which a change in one variable is, on average, accompanied by a change in the other variable. This situation is known as correlation.
Correlation may be defined as the degree of relationship existing between two or more variables. The degree of relationship between two variables is called simple correlation, while the degree of relationship connecting three or more variables is called multiple correlation. In this unit we shall examine only simple correlation. A correlation is said to be partial if it studies the degree of relationship between two variables while keeping all other variables connected with these two constant.
Correlation may be linear, when all points (X, Y) on a scatter diagram seem to cluster near a straight line, or nonlinear, when all points seem to lie near a curve. In other words, correlation is said to be linear if a change in one variable brings a constant change in the other. It may be non-linear if a change in one variable brings a varying change in the other.
Correlation may also be positive or negative. Correlation is said to be positive if an increase (or a decrease) in one variable is accompanied by an increase (or a decrease) in the other, i.e. both variables change in the same direction. For example, the correlation between the price of a commodity and its quantity supplied is positive, since as price rises quantity supplied increases, and vice versa. Correlation is said to be negative if an increase (or a decrease) in one variable is accompanied by a decrease (or an increase) in the other, i.e. the variables change in opposite directions. For example, the correlation between the price of a commodity and its quantity demanded is negative, since as price rises quantity demanded decreases, and vice versa.

2.2 Methods of Measuring Correlation


In correlation analysis there are two important things to be addressed: the type of co-variation that exists between variables and its strength. The types of correlation

mentioned before do not show us the strength of co-variation between variables. There are
three methods of measuring correlation. These are:
a. The Scattered Diagram or Graphic Method
b. The Simple Linear Correlation coefficient
c. The coefficient of Rank Correlation

A. The Scattered Diagram or Graphic Method


The scatter diagram is a rectangular diagram which helps us visualize the relationship between two phenomena. It plots the paired observations as points in the X–Y plane. It is a non-mathematical method of assessing the degree of co-variation between two variables. Scatter plots usually consist of a large body of data. The closer the data points come to forming a straight line, the higher the correlation between the two variables, or the stronger the relationship.
If the data points make a straight line going from the origin out to high x- and y-values, the variables are said to have a positive correlation. If the line goes from a high value on the y-axis down to a high value on the x-axis, the variables have a negative correlation.
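As a brief illustrative sketch (assuming matplotlib is available), a scatter diagram of the price–quantity data used later in Example 2.1 can be drawn as follows:

```python
import matplotlib.pyplot as plt

# Price (X, in Birr) and quantity supplied (Y, in tons) from Example 2.1
X = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
Y = [10, 20, 50, 40, 50, 60, 80, 90, 90, 120]

plt.scatter(X, Y)                       # one point per (X, Y) pair
plt.xlabel("Price X (Birr)")
plt.ylabel("Quantity supplied Y (tons)")
plt.title("Scatter diagram of Y against X")
plt.show()
```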


Figure 1: Perfect linear correlations


A perfect positive correlation is given the value of 1. A perfect negative correlation is given the value of -1. If there is absolutely no correlation present, the value given is 0. The closer the number is to 1 or -1, the stronger the correlation, i.e. the stronger the relationship between the variables. The closer the number is to 0, the weaker the correlation. For example, there is typically a high degree of positive correlation between sales and profit.
Note: The sample correlation coefficient always lies between -1 and +1 (-1 ≤ r ≤ +1), such that:

If r = -1, there is a perfect negative correlation


If -1< r < -0.75, there is a strong negative correlation


If -0.75< r< -0.25, there is a fair negative correlation
If -0.25< r< 0, there is weak negative correlation.
If r =0, there is no correlation
If 0< r < 0.25, there is a weak positive correlation
If 0.25< r < 0.75, there is a fair positive correlation
If 0.75< r < 1, there is a strong positive correlation
If r =1, there is a perfect positive correlation

These possible ranges of sample correlation coefficients are illustrated below


[Figure: scale of the correlation coefficient from r = -1 (perfect negative) through r = 0 (no correlation) to r = +1 (perfect positive)]

2.3 The Pearson’s correlation coefficient

In the light of the above discussions it appears clear that we can determine the kind of
correlation between two variables by direct observation of the scatter diagram. In addition, the
scatter diagram indicates the strength of the relationship between the two variables. This
section is about how to determine the type and degree of correlation using a numerical result.
For a precise quantitative measurement of the degree of correlation between Y and X we use a parameter called the correlation coefficient, usually designated by the Greek letter ρ (rho), with the variables whose correlation it measures as subscripts; ρ refers to the correlation of all the values of the population of X and Y. Its estimate from any particular sample (the sample statistic for correlation) is denoted by r with the relevant subscripts. For example, if we measure the correlation between X and Y, the population correlation coefficient is represented by ρxy and its sample estimate by rxy. The simple correlation coefficient is used to measure relationships which are simple and linear only. It cannot help us in measuring non-linear or multiple correlation. The sample correlation coefficient is defined by the formula


Or x i yi
2.2
rxy 
x i
2
y i
2

Where, xi  X i  X and y i  Yi - Y

We will use a simple example from the theory of supply. Economic theory suggests that the
quantity of a commodity supplied in the market depends on its price, ceteris paribus. When
price increases the quantity supplied increases, and vice versa. When the market price falls
producers offer smaller quantities of their commodity for sale. In other words, economic
theory postulates that price (X) and quantity supplied (Y) are positively correlated.

Example 2.1: The following table shows the quantity supplied for a commodity with the
corresponding price values. Determine the type of correlation that exists between these two
variables.
Table 1: Data for computation of correlation coefficient
Time period(in days) Quantity supplied Yi (in tons) Price Xi (in Birr)
1 10 2
2 20 4
3 50 6
4 40 8
5 50 10
6 60 12
7 80 14
8 90 16
9 90 18
10 120 20

To estimate the correlation coefficient, we compute the following results.


Table 2: Computations of inputs for correlation coefficients


Y    X    xᵢ = Xᵢ − X̄    yᵢ = Yᵢ − Ȳ    xᵢ²    yᵢ²    xᵢyᵢ    XY    X²    Y²
10 2 -9 -51 81 2601 459 20 4 100
20 4 -7 -41 49 1681 287 80 16 400
50 6 -5 -11 25 121 55 300 36 2500
40 8 -3 -21 9 441 63 320 64 1600
50 10 -1 -11 1 121 11 500 100 2500
60 12 1 -1 1 1 -1 720 144 3600
80 14 3 19 9 361 57 1120 196 6400
90 16 5 29 25 841 145 1440 256 8100
90 18 7 29 49 841 203 1620 324 8100
120 20 9 59 81 3481 531 2400 400 14400
Sum=610 110 0 0 330 10490 1810 8520 1540 47700
Mean=61 11

n XY   X  Y
10(8520)  (110)(610)
r   0.975
10(1540)  (110)(110) 10(47700)  (610)(610)
n X  ( X )
2 2
n Y  ( Y )
2 2

Or using the deviation form (Equation 2.2), the correlation coefficient can be computed as:

$$r = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}} = \frac{1810}{\sqrt{330 \times 10490}} \approx 0.973$$

This result shows that there is a strong positive correlation between the quantity supplied and
the price of the commodity under consideration.
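A short sketch verifying this computation with numpy, using the data of Example 2.1:

```python
import numpy as np

# Example 2.1: quantity supplied Y and price X
Y = np.array([10, 20, 50, 40, 50, 60, 80, 90, 90, 120], dtype=float)
X = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20], dtype=float)

# Deviation form (Equation 2.2): r = sum(x*y) / sqrt(sum(x^2) * sum(y^2))
x = X - X.mean()
y = Y - Y.mean()
r = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

print(round(r, 3))                         # 0.973
print(round(np.corrcoef(X, Y)[0, 1], 3))   # numpy's built-in gives the same value
```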
The simple correlation coefficient always ranges between -1 and +1: its value cannot be less than -1 or greater than +1. If r = -1, there is perfect negative correlation between the variables. If 0 < r < 1, there is positive correlation between the two variables, and movement from zero towards one increases the degree of positive correlation. If r = +1, there is perfect positive correlation between the two variables. If the correlation coefficient is zero, there is no linear relationship between the two variables. If the two variables are independent, the value of the correlation coefficient is zero; but a zero correlation coefficient does not by itself show that the two variables are independent.
Properties of Simple Correlation Coefficient
The simple correlation coefficient has the following important properties:
1. The value of correlation coefficient always ranges between -1 and +1.
2. The correlation coefficient is symmetric: $r_{xy} = r_{yx}$, where $r_{xy}$ is the correlation coefficient of X on Y and $r_{yx}$ is the correlation coefficient of Y on X.

3. The correlation coefficient is independent of change of origin and change of scale. By change of origin we mean subtracting or adding a constant from or to every value of a variable, and by change of scale we mean multiplying or dividing every value of a variable by a constant.
4. If X and Y are independent variables, the correlation coefficient is zero. But the converse is not true.
5. The correlation coefficient has the same sign as the regression coefficients.
6. The correlation coefficient is the geometric mean of the two regression coefficients: $r = \pm\sqrt{b_{yx}\, b_{xy}}$, its sign being that of the regression coefficients.

Though the correlation coefficient is very popular in applied statistics and econometrics, it has its own limitations. The major limitations of the method are:
1. The correlation coefficient always assumes a linear relationship, regardless of whether that assumption is true or not.
2. Great care must be exercised in interpreting the value of this coefficient, as it is very often misinterpreted. For example, a high correlation between lung cancer and smoking does not by itself show that smoking causes lung cancer.
3. The value of the coefficient is unduly affected by extreme values.
4. The coefficient requires quantitative measurement of both variables. If one of the two variables is not quantitatively measured, the coefficient cannot be computed.


B. The Rank Correlation Coefficient


The formulae for the linear correlation coefficient developed in the previous section are based on the assumption that the variables involved are quantitative and that we have accurate data for their measurement. However, in many cases the variables may be qualitative (or binary) and hence cannot be measured numerically. For example, profession, education and preferences for particular brands are such categorical variables. Furthermore, in many cases precise values of the variables may not be available, so that it is impossible to calculate the correlation coefficient with the formulae developed in the preceding section. For such cases it is possible to use another statistic, the rank correlation coefficient (or Spearman's correlation coefficient). We rank the observations in a specific sequence, for example in order of size, importance, etc., using the numbers 1, 2, 3, ..., n. In other words, we assign ranks to the data and measure the relationship between their ranks instead of their actual numerical values; hence the name rank correlation coefficient. If two variables X and Y are ranked in such a way that the values are in ascending or descending order, the rank correlation coefficient may be computed by the formula

$$r' = 1 - \frac{6\sum D^2}{n(n^2 - 1)} \qquad (2.3)$$

Where,
D = difference between ranks of corresponding pairs of X and Y
n = number of observations.
The values that r' may assume range from +1 to -1.
Two points are of interest when applying the rank correlation coefficient. First, it does not matter whether we rank the observations in ascending or descending order; however, we must use the same rule of ranking for both variables. Second, if two (or more) observations have the same value, we assign them the mean rank. Let's use an example to illustrate the application of the rank correlation coefficient.
Example 2.2: A market researcher asks experts to express their preference for twelve different
brands of soap. Their replies are shown in the following table.
Table 3: Example for rank correlation coefficient
Brands of soap A B C D E F G H I J K L


Person I 9 10 4 1 8 11 3 2 5 7 12 6
Person II 7 8 3 1 10 12 2 6 5 4 11 9

The figures in this table are ranks but not quantities. We have to use the rank correlation
coefficient to determine the type of association between the preferences of the two persons.
This can be done as follows.

Table 4: Computation for rank correlation coefficient

Brands of soap A B C D E F G H I J K L Total

Person I 9 10 4 1 8 11 3 2 5 7 12 6
Person II 7 8 3 1 10 12 2 6 5 4 11 9
Di:  -2  -2  -1  0  2  1  -1  4  0  -3  -1  -2
Di²:  4   4   1  0  4  1   1  16 0   9   1   4    Total = 45

The rank correlation coefficient (using Equation 2.3) is

$$r' = 1 - \frac{6(45)}{12(12^2 - 1)} = 1 - \frac{270}{1716} \approx 0.843$$

This figure, 0.843, shows a marked similarity of the preferences of the two persons for the various brands of soap.
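A short sketch checking this result, first by the formula of Equation 2.3 and then with scipy's spearmanr (assuming scipy is available):

```python
from scipy.stats import spearmanr

# Ranks assigned by the two persons to the twelve brands of soap (Example 2.2)
person1 = [9, 10, 4, 1, 8, 11, 3, 2, 5, 7, 12, 6]
person2 = [7, 8, 3, 1, 10, 12, 2, 6, 5, 4, 11, 9]

# Equation 2.3: r' = 1 - 6*sum(D^2) / (n*(n^2 - 1))
n = len(person1)
d2 = sum((a - b) ** 2 for a, b in zip(person1, person2))
r_prime = 1 - 6 * d2 / (n * (n ** 2 - 1))

print(round(r_prime, 3))                # 0.843
rho, _ = spearmanr(person1, person2)    # agrees with the formula when there are no ties
print(round(rho, 3))
```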
2.4 Limitations of the Theory of Linear Correlation
Correlation analysis has serious limitations as a technique for the study of economic
relationships.
Firstly: The above formulae for r apply only when the relationship between the variables is linear. However, two variables may be strongly connected by a nonlinear relationship.
It should be clear that zero correlation and statistical independence of two variables (X and Y) are not the same thing. Zero correlation implies zero covariance of X and Y, so that r = 0.


Statistical independence of X and Y implies that the probability of xi and yi occurring simultaneously is the simple product of the individual probabilities:

P(x and y) = P(x) P(y)
Independent variables do have zero covariance and are uncorrelated: the linear correlation
coefficient between two independent variables is equal to zero. However, zero linear
correlation does not necessarily imply independence. In other words uncorrelated variables
may be statistically dependent. For example if X and Y are related so that the observations fall
on a circle or on a symmetrical parabola, the relationship is perfect but not linear. The
variables are statistically dependent.
Secondly: Although the correlation coefficient is a measure of the co-variability of variables, it does not necessarily imply any functional relationship between the variables concerned. Correlation theory does not establish or prove any causal relationship between the variables. It seeks to discover whether a co-variation exists, but it does not suggest that variations in, say, Y are caused by variations in X, or vice versa. Knowledge of the value of r alone will not enable us to predict the value of Y from X. A high correlation between variables Y and X may describe any one of the following situations:
(1) variation in X is the cause of variation in Y
(2) variation in Y is the cause of variation X
(3) Y and X are jointly dependent, or there is a two- way causation, that is to say Y is the
cause of (is determined by) X, but also X is the cause of (is determined by) Y. For
example in any market: q = f (p), but also p = f(q), therefore there is a two – way
causation between q and p, or in other words p and q are simultaneously determined.
(4) there is another common factor (Z) that affects X and Y in such a way as to show a close relation between them. This often occurs in time series, when two variables have strong time trends (i.e. grow over time). In this case we find a high correlation between Y and X even though they happen to be causally independent.
(5) The correlation between X and Y may be due to chance.


Chapter Three

Simple Linear Regression Analysis


1. Introduction

Economic theories are mainly concerned with the relationships among various economic variables. These relationships, when phrased in mathematical terms, can predict the effect of one variable on another. The functional relationships of these variables define the dependence of one variable upon the other variable(s) in a specific form. In this regard, the regression model is the most commonly used and appropriate technique of econometric analysis. Regression analysis refers to estimating functions showing the relationship between two or more variables and the corresponding tests. This section introduces students to the concept of simple linear regression analysis. It includes estimating a simple linear function between two variables. We will restrict our discussion in this part to two variables only and deal with more variables in the next section.

2. Ordinary Least Square Method (OLS)


There are two major ways of estimating regression functions: the ordinary least squares (OLS) method and the maximum likelihood (MLH) method. Both are similar in application to the estimation procedures you may know from statistics courses. The ordinary least squares method is the easiest and the most commonly used, as opposed to the maximum likelihood method, which is limited by its assumptions. For instance, the MLH method is valid only for large samples, whereas the OLS method can be applied to smaller samples. Owing to this merit, our discussion mainly focuses on ordinary least squares (OLS).

The ordinary least squares (OLS) method of estimating parameters, or a regression function, is about finding or estimating the values of the parameters (α and β) of the simple linear regression function given below for which the errors or residuals are minimized. Thus, it is about minimizing the residuals or the errors.

$$Y_i = \alpha + \beta X_i + U_i \qquad (2.5)$$

The above identity represents the population regression function (to be estimated from a total enumeration of data from the entire population). But most of the time it is difficult to generate population data, owing to several reasons; most of the time we use sample data and estimate a sample regression function. Thus, we use the following sample regression function for the derivation of the parameters and the related analysis.

$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i \qquad (2.6)$$

Before discussing the details of the OLS estimation technique, let's see the major conditions necessary for the validity of the analysis, interpretations and conclusions of the regression function. These conditions are known as the classical assumptions. In fact, most of these conditions can be checked and secured very easily.

3. Classical Assumptions

For the validity of a regression function and its attributes, the data we use and the terms related to our regression function should fulfill the following conditions, known as the classical assumptions.

i. The error terms Ui are randomly distributed, i.e. the disturbance terms are not correlated. This means there is no systematic variation or relation among the values of the error terms (Ui and Uj), where i = 1, 2, 3, ..., j = 1, 2, 3, ... and i ≠ j. This is represented by zero covariance among the error terms: Cov(Ui, Uj) = 0 for i ≠ j. Note that the same argument holds for the residual terms when we use sample data or a sample regression function; thus Cov(ei, ej) = 0 for i ≠ j. Otherwise the error terms do not serve an adjustment purpose; rather, an autocorrelation problem arises.


ii. The disturbance terms Ui have zero mean. This implies that the sum of the individual disturbance terms is zero: the deviations of some of the disturbance terms are negative, some are zero and some are positive, and the sum (or the average) is zero. This is given by the identity $E(U_i) = \frac{\sum U_i}{n} = 0$. Multiplying both sides by the sample size n, we obtain $\sum E(U_i) = \sum U_i = 0$. The same argument is true for the sample regression function and its residual terms: $\sum e_i = 0$.

If this condition is not met, the position of the regression function (or curve) will not be where it is supposed to be. This results in an upward shift (if the mean of the error or residual term is positive) or a downward shift (if it is negative) of the regression function. For instance, suppose we have the following regression function:

$$Y_i = \beta_0 + \beta_1 X_i + U_i$$

$$E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i + E(U_i)$$

$$E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i \quad \text{if } E(U_i) = 0$$

Otherwise the estimated model will be biased and the regression function will shift. For instance, if $E(U_i) > 0$, the estimation shifts upward from the true representative model. A similar argument holds for the residual term of the sample regression function. This is demonstrated by the following figure.


Figure 2: Regression Function/curve if the mean of error term is not zero

iii. The disturbance terms have constant variance in each period: $Var(U_i) = E\bigl[(U_i - E(U_i))^2\bigr] = \sigma_u^2$. This is known as the assumption of homoscedasticity. If this condition is not fulfilled, i.e. if the variance of the error terms varies as the sample size changes or as the value of the explanatory variable changes, this leads to a heteroscedasticity problem.

iv. The explanatory variables Xi and the disturbance terms Ui are uncorrelated or independent: $Cov(U_i, X_i) = 0$, from which it follows that $\sum e_i X_i = 0$. In addition, all covariances of successive values of the error term are equal to zero: the value the error term assumes in one period does not depend on the value it assumed in any other period. If these conditions are not met by our data or variables, our regression function and the conclusions drawn from it will be invalid. This assumption is known as the assumption of non-autocorrelation or non-serial correlation. (A numerical check of this assumption's residual counterpart, together with that of assumption (ii), is sketched after this list.)


v. The explanatory variable Xi is fixed in repeated samples. Each value of Xi does not vary, for instance, with a change in sample size. This means the explanatory variables are non-random and hence distribution-free.

vi. Linearity of the model in parameters. Simple linear regression requires linearity in parameters, but not necessarily linearity in variables. The same technique can be applied to estimate regression functions of the forms Y = f(X), Y = f(X²), Y = f(X³), Y = f(X − kX), and so on. What matters is transforming the data as required.

vii. Normality assumption. The disturbance term Ui is assumed to have a normal distribution with zero mean and constant variance: $U_i \sim N(0, \sigma_u^2)$. This assumption combines the zero-mean-of-error-term assumption and the homoscedasticity assumption. It is used in testing hypotheses about the significance of parameters, and it is also useful both in estimating parameters and in testing their significance in the maximum likelihood method.

viii. The explanatory variables should not be perfectly or highly linearly correlated. Using explanatory variables which are highly or perfectly correlated in a regression function yields a biased function or model; it results in a multicollinearity problem.

ix. The relationship between variables (or the model) is correctly specified. For
instance all the necessary variables are included in the model. The variables are in
the form that best describes the functional relationship. For instance, “Y = f (X 2)”
may better reflect the relationship between Y and X than “Y = f (X )”.

x. The explanatory variable does not take an identical value in all observations (i.e. Xi varies). This assumption is very important for improving the precision of the estimators.
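As a small numerical check of the residual counterparts of assumptions (ii) and (iv) mentioned above (OLS residuals satisfy ∑ei = 0 and ∑eiXi = 0 by construction; the data below are made up for illustration):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

x = X - X.mean()
b1 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
b0 = Y.mean() - b1 * X.mean()
e = Y - (b0 + b1 * X)           # OLS residuals

print(round(e.sum(), 10))        # 0.0  (sum of residuals is zero)
print(round((e * X).sum(), 10))  # 0.0  (residuals uncorrelated with X)
```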


Note that some of these assumptions or conditions (those which apply to more than one explanatory variable) are meant for the next chapters (along with all the other assumptions or conditions), so we may not restate them in the next chapter even though they are required there also.

4. OLS Method of Estimation

Estimating a linear regression function using the ordinary least squares (OLS) method is simply about calculating the parameters of the regression function for which the sum of squares of the error terms is minimized. The procedure is as follows. Suppose we want to estimate the following equation:

$$Y_i = \beta_0 + \beta_1 X_i + U_i$$

Since most of the time we use a sample (as it is difficult to get population data), the corresponding sample regression function is given as follows:

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$

From this identity, we solve for the residual term $e_i$, square both sides and then take the sum of both sides. These three steps are given, respectively, as follows:

$$e_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \qquad (2.7)$$

$$\sum e_i^2 = \sum \bigl(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\bigr)^2 \qquad (2.8)$$

where $\sum e_i^2$ = RSS = residual sum of squares.

The method of OLS involves finding the estimates of the intercept and the slope for which the sum of squares given by Equation 2.8 is minimized. To minimize the residual sum of squares we take the first-order partial derivatives of Equation 2.8 and equate them to zero.

That is, the partial derivative with respect to $\hat{\beta}_0$:


$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_0} = -2\sum \bigl(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\bigr) = 0 \qquad (2.9)$$

$$\Rightarrow \sum \bigl(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\bigr) = 0 \qquad (2.10)$$

$$\Rightarrow \sum Y_i - n\hat{\beta}_0 - \hat{\beta}_1 \sum X_i = 0 \qquad (2.11)$$

$$\Rightarrow \sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_i \qquad (2.12)$$

Where n is the sample size.

Partial derivative with respect to $\hat{\beta}_1$:

$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_1} = -2\sum \bigl(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\bigr)X_i = 0 \qquad (2.13)$$

$$\Rightarrow \sum \bigl(Y_i X_i - \hat{\beta}_0 X_i - \hat{\beta}_1 X_i^2\bigr) = 0 \qquad (2.14)$$

$$\Rightarrow \sum X_i Y_i = \hat{\beta}_0 \sum X_i + \hat{\beta}_1 \sum X_i^2 \qquad (2.15)$$

Note that $\sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2$ is a composite function, so we should apply the chain rule in finding the partial derivatives with respect to the parameter estimates.

Equations 2.12 and 2.15 are together called the system of normal equations. Solving the system of normal equations simultaneously, we obtain:

$$\hat{\beta}_1 = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} \quad \text{or} \quad \hat{\beta}_1 = \frac{\sum XY - n\bar{Y}\bar{X}}{\sum X_i^2 - n\bar{X}^2},$$

and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$ from above.

Now we have the formulae to estimate the simple linear regression function. Let us illustrate with an example.

Example 2.4: Given the following sample data of three pairs of Y (dependent variable) and X (independent variable):

Yi Xi
10 30
20 50
30 60

a) find a simple linear regression function; Y = f(X)

b) Interpret your result.

c) Predict the value of Y when X is 45.

Solution

a) To fit the regression equation we do the following computations.

Yi    Xi    YiXi    Xi²
10    30    300     900
20    50    1000    2500
30    60    1800    3600
Sum:  60    140     3100    7000
Mean: Ȳ = 20, X̄ = 140/3

n XY  ( X )( Y )
3(3100)  (140)(60)
ˆ1    0.64
3(7000)  (140) 2
n X 2  ( X ) 2

ˆ0  Y  ˆ1 X  20  0.64(140 / 3)  10


Thus the fitted regression function is given by $\hat{Y}_i = -10 + 0.64 X_i$.

b) Interpretation: the value of the intercept term, -10, implies that the value of the dependent variable Y is -10 when the value of the explanatory variable is zero. The value of the slope coefficient ($\hat{\beta} = 0.64$) is a measure of the marginal change in the dependent variable Y when the value of the explanatory variable increases by one. For instance, in this model, the value of Y increases on average by 0.64 units when X increases by one.

c) $\hat{Y}_i = -10 + 0.64 X_i = -10 + (0.64)(45) = 18.8$

That means when X assumes a value of 45, the value of Y is on average expected to be 18.8. The regression coefficients can also be obtained by simpler formulae, taking the deviations between the original values and their means. Now, if

$$x_i = X_i - \bar{X} \qquad \text{and} \qquad y_i = Y_i - \bar{Y},$$

then the coefficients can be represented by the alternative formulae given below:

$$\hat{\beta}_1 = \frac{\sum x_i y_i}{\sum x_i^2}, \quad \text{and} \quad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

Example 2.5: Find the regression equation for the data of Example 2.4 using the shortcut formulae. To solve this problem we proceed as follows.

Yi    Xi    yᵢ     xᵢ       xᵢyᵢ      xᵢ²       yᵢ²
10    30    -10    -16.67   166.67    277.78    100
20    50    0      3.33     0.00      11.11     0
30    60    10     13.33    133.33    177.78    100
Sum:  60    140    0        0         300.00    466.67    200
Mean: Ȳ = 20, X̄ = 46.67

Then


$$\hat{\beta}_1 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{300}{466.67} = 0.64, \quad \text{and} \quad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 20 - (0.64)(46.67) = -10,$$

with results similar to the previous case.
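A short sketch verifying Examples 2.4 and 2.5 with numpy, using the deviation-form formulae:

```python
import numpy as np

# Data of Example 2.4
Y = np.array([10.0, 20.0, 30.0])
X = np.array([30.0, 50.0, 60.0])

# Shortcut (deviation) formulae
x = X - X.mean()
y = Y - Y.mean()
b1 = (x * y).sum() / (x ** 2).sum()   # 300 / 466.67 ≈ 0.64
b0 = Y.mean() - b1 * X.mean()         # 20 - 0.64*(140/3) = -10

print(round(b1, 2), round(b0, 2))     # 0.64 -10.0
print(round(b0 + b1 * 45, 1))         # ≈ 18.9 (18.8 when the slope is rounded to 0.64)
```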

5. Mean and Variance of Parameter Estimates

We have seen how the numerical values of the parameter estimates can be obtained using the OLS estimation technique. Now let us see their distributional nature, i.e. the mean and variance of the parameter estimates. There can be several samples of the same size drawn from the same population. For each sample, the parameter estimates have their own specific numerical values; the values of the estimates differ as we go from one sample to another. The parameter estimates are therefore random in nature and have a distinct distribution with corresponding parameters. Remember, we discussed in the previous sections that both the error term and the dependent variable are assumed to be normally distributed. Thus, the parameter estimates also have a normal distribution with their associated mean and variance. Formulae for the mean and variance of the respective parameter estimates and of the error term are given below (the derivation procedure is given in Annex A).
1. The mean of $\hat{\beta}_1$: $E(\hat{\beta}_1) = \beta_1$

2. The variance of $\hat{\beta}_1$: $Var(\hat{\beta}_1) = E\bigl[(\hat{\beta}_1 - E(\hat{\beta}_1))^2\bigr] = \dfrac{\sigma_U^2}{\sum x_i^2}$

3. The mean of $\hat{\beta}_0$: $E(\hat{\beta}_0) = \beta_0$

4. The variance of $\hat{\beta}_0$: $Var(\hat{\beta}_0) = E\bigl[(\hat{\beta}_0 - E(\hat{\beta}_0))^2\bigr] = \dfrac{\sigma_U^2 \sum X_i^2}{n \sum x_i^2}$

5. The estimated variance of the error term: $\hat{\sigma}_U^2 = \dfrac{\sum e_i^2}{n - 2}$


6. Hypothesis Testing of OLS Estimates


After estimation of the parameters there are important issues to be considered by the researcher. We have to know to what extent our estimates are reliable and acceptable for further purposes. That means we have to evaluate the degree of representativeness of the estimates of the true population parameters. Simply put, a model must be tested for its significance before it can be used for any other purpose. In this subsection we will evaluate the reliability of a model estimated using the procedure explained above.

The available test criteria are divided into three groups: theoretical a priori criteria, statistical criteria and econometric criteria. A priori criteria, set by economic theory, concern the consistency of the coefficients of the econometric model with economic theory. Statistical criteria, also known as first-order tests, are set by statistical theory and are used to evaluate the statistical reliability of the model. Econometric criteria refer to whether the assumptions of the econometric model employed in estimating the parameters are fulfilled or not. There are two most commonly used tests in econometrics. These are:

i. The square of the correlation coefficient ($r^2$), used for judging the explanatory power of the linear regression of Y on X (or on the X's). The square of the correlation coefficient in simple regression is known as the coefficient of determination, denoted $R^2$. The coefficient of determination measures the goodness of fit of the regression line to the observed sample values of Y and X.

ii. The standard error test of the parameter estimates, applied for judging the statistical reliability of the estimates. This test measures the degree of confidence that we may attribute to the estimates.

i) The Coefficient of determination (R2)

The coefficient of determination measures the amount or proportion of the total variation of the dependent variable that is determined or explained by the model, i.e. by the presence of the explanatory variable in the model. The total variation of the dependent variable is split into two additive components: a part explained by the model and a part represented by the random term. The total variation of the dependent variable is measured from its arithmetic mean:

Total variation in $Y_i$ = $\sum (Y_i - \bar{Y})^2$
Total explained variation = $\sum (\hat{Y}_i - \bar{Y})^2$
Total unexplained variation = $\sum e_i^2$

The total variation of the dependent variable is given in the following form: TSS = ESS + RSS, which means the total sum of squares of the dependent variable is split into the explained sum of squares and the residual sum of squares. In deviation form:

$$e_i = y_i - \hat{y}_i \quad \Rightarrow \quad y_i = \hat{y}_i + e_i$$

$$y_i^2 = \hat{y}_i^2 + e_i^2 + 2\hat{y}_i e_i$$

$$\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2 + 2\sum \hat{y}_i e_i$$

But $\sum \hat{y}_i e_i = 0$. Therefore,

$$\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2$$

The coefficient of determination is given by the formula

$$R^2 = \frac{\text{explained variation in } Y}{\text{total variation in } Y} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \frac{\sum \hat{y}_i^2}{\sum y_i^2} \qquad (2.16)$$

Since $\sum \hat{y}_i^2 = \hat{\beta}_1 \sum x_i y_i$, the coefficient of determination can also be given as

$$R^2 = \frac{\hat{\beta}_1 \sum x_i y_i}{\sum y_i^2}$$

or


$$R^2 = 1 - \frac{\text{unexplained variation in } Y}{\text{total variation in } Y} = 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2} = 1 - \frac{\sum e_i^2}{\sum y_i^2} \qquad (2.17)$$

The higher the coefficient of determination, the better the fit; conversely, the smaller the coefficient of determination, the poorer the fit. That is why the coefficient of determination is used to compare two or more models. One minus the coefficient of determination is called the coefficient of non-determination; it gives the proportion of the variation in the dependent variable that remains undetermined or unexplained by the model.
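A brief sketch computing R² via Equation 2.17 for the fitted model of Example 2.4:

```python
import numpy as np

Y = np.array([10.0, 20.0, 30.0])
X = np.array([30.0, 50.0, 60.0])

x = X - X.mean()
b1 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
b0 = Y.mean() - b1 * X.mean()

e = Y - (b0 + b1 * X)                 # residuals
rss = (e ** 2).sum()                  # unexplained variation
tss = ((Y - Y.mean()) ** 2).sum()     # total variation
print(round(1 - rss / tss, 3))        # R^2 ≈ 0.964
```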

ii) Testing the significance of a given regression coefficient

Since the sample values of the intercept and the coefficient are estimates of the true
population parameters, we have to test them for their statistical reliability.

The significance of a model can be seen in terms of the amount of variation in the
dependent variable that it explains and the significance of the regression coefficients.

There are different tests that are available to test the statistical reliability of the
parameter estimates. The following are the common ones;

A) The standard error test

B) The standard normal test

C) The Student's t-test

Now, let us discuss them one by one.

A) The Standard Error Test

This test first establishes the two hypotheses to be tested, commonly known as the null and alternative hypotheses. The null hypothesis states that the sample comes from a population whose parameter is not significantly different from zero, while the alternative hypothesis states that the sample comes from a population whose parameter is significantly different from zero. The two hypotheses are given as follows:


H0: βi=0

H1: βi≠0

The standard error test is outlined as follows:

1. Compute the standard deviations of the parameter estimates using the formulae for the variances of the parameter estimates given above (the standard deviation is the positive square root of the variance):

$$se(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}_U^2}{\sum x_i^2}}, \qquad se(\hat{\beta}_0) = \sqrt{\frac{\hat{\sigma}_U^2 \sum X_i^2}{n \sum x_i^2}}$$

2. Compare the standard errors of the estimates with the numerical values of the estimates and make a decision.

A) If the standard error of the estimate is less than half of the numerical value of the estimate, we can conclude that the estimate is statistically significant. That is, if $se(\hat{\beta}_i) < \frac{1}{2}|\hat{\beta}_i|$, reject the null hypothesis and conclude that the estimate is statistically significant.

B) If the standard error of the estimate is greater than half of the numerical value of the estimate, the parameter estimate is not statistically reliable. That is, if $se(\hat{\beta}_i) > \frac{1}{2}|\hat{\beta}_i|$, accept the null hypothesis and conclude that the estimate is not statistically significant.

B) The Standard Normal Test

This test is based on the normal distribution. The test is applicable if:

 The standard deviation of the population is known, irrespective of the sample size; or
 The standard deviation of the population is unknown, provided that the sample size is sufficiently large (n > 30).


The standard normal test or Z-test is outlined as follows:

1. Test the null hypothesis $H_0: \beta_i = 0$ against the alternative hypothesis $H_1: \beta_i \neq 0$.

2. Determine the level of significance ($\alpha$) at which the test is carried out. It is the probability of committing a type I error, i.e. the probability of rejecting the null hypothesis while it is true. It is common in applied econometrics to use a 5% level of significance.

3. Determine the theoretical or tabulated value of Z from the table. That is, find the value of $Z_{\alpha/2}$ from the standard normal table; $Z_{0.025} = 1.96$.

4. Make a decision. The decision of statistical hypothesis testing consists of two possibilities: either accepting the null hypothesis or rejecting it. If $|Z_{cal}| \le Z_{tab}$, accept the null hypothesis, while if $|Z_{cal}| > Z_{tab}$, reject the null hypothesis. Most of the time the null and alternative hypotheses are mutually exclusive: accepting the null hypothesis means rejecting the alternative hypothesis, and rejecting the null hypothesis means accepting the alternative hypothesis.
 
Example: Suppose a regression gives $\hat{\beta}_1 = 29.48$ and the standard error of $\hat{\beta}_1$ is 36. Test the hypothesis that $\beta_1 = 25$ at the 5% level of significance using the standard normal test.

Solution: We have to follow the procedures of the test.

$$H_0: \beta_1 = 25, \qquad H_1: \beta_1 \neq 25$$

After setting up the hypotheses to be tested, the next step is to determine the level of significance at which the test is carried out. In the above example the significance level is given as 5%.


The third step is to find the theoretical value of Z at the specified level of significance. From the standard normal table we get $Z_{0.025} = 1.96$.

The fourth step in hypothesis testing is computing the observed or calculated value of the standard normal statistic:

$$Z_{cal} = \frac{\hat{\beta}_1 - \beta_1}{se(\hat{\beta}_1)} = \frac{29.48 - 25}{36} = 0.12$$

Since the calculated value of the test statistic is less than the tabulated value, the decision is to accept the null hypothesis and conclude that the value of the parameter is 25.
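For readers who want to reproduce this by machine, the following is a minimal Python sketch of the test above (the variable names are ours, not part of the text; scipy is assumed to be available):

```python
# A sketch of the standard normal test of H0: beta_1 = 25 from the example above.
from scipy.stats import norm

beta_hat = 29.48   # estimated coefficient (from the example)
beta_null = 25.0   # hypothesized value under H0
se = 36.0          # standard error of the estimate

z_cal = (beta_hat - beta_null) / se        # calculated test statistic
z_tab = norm.ppf(1 - 0.05 / 2)             # two-tailed critical value, about 1.96

print(z_cal, z_tab, abs(z_cal) > z_tab)    # 0.124..., 1.96, False -> do not reject H0
```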

C) The Student t-Test

In conditions where Z-test is not applied (in small samples), t-test can be used to test
the statistical reliability of the parameter estimates. The test depends on the degrees
of freedom that the sample has. The test procedures of t-test are similar with that of
the z-test. The procedures are outlined as follows;

1. Set up the hypotheses. The hypotheses for testing a given regression coefficient are given by:

$$H_0: \beta_i = 0, \qquad H_1: \beta_i \neq 0$$

2. Determine the level of significance for carrying out the test. We usually use a 5%
level significance in applied econometric research.
3. Determine the tabulated value of t from the table with n-k degrees of freedom,
where k is the number of parameters estimated.
4. Determine the calculated value of t. The test statistic (using the t-test) is given by:

$$t_{cal} = \frac{\hat{\beta}_i}{se(\hat{\beta}_i)}$$

The test rule or decision is given as follows: reject $H_0$ if $|t_{cal}| > t_{\alpha/2, n-k}$.

iii) Confidence Interval Estimation of the regression Coefficients


We have discussed the important tests that can be conducted to check model and parameter validity. But one thing that must be clear is that rejecting the null hypothesis does not mean that the parameter estimates are correct estimates of the true population parameters. It means that the estimate comes from a sample drawn from a population whose parameter is significantly different from zero. In order to define the range within which the true parameter lies, we must construct a confidence interval for the parameter. Just as we constructed confidence interval estimates for a population mean using the sample mean (in Introduction to Statistics), we can construct 100(1-$\alpha$)% confidence intervals for the sample regression coefficients. To do so we need the standard errors of the sample regression coefficients. The standard error of a given coefficient is the positive square root of the variance of the coefficient. The formulae for the variances of the regression coefficients are given below.

Variance of the intercept:
$$var(\hat{\beta}_0) = \hat{\sigma}_u^2 \frac{\sum X_i^2}{n \sum x_i^2} \qquad (2.18)$$

Variance of the slope $\hat{\beta}_1$:
$$var(\hat{\beta}_1) = \frac{\hat{\sigma}_u^2}{\sum x_i^2} \qquad (2.19)$$

Where
$$\hat{\sigma}_u^2 = \frac{\sum e_i^2}{n-k} \qquad (2.20)$$

is the estimate of the variance of the random term and k is the number of parameters to be estimated in the model. The standard errors are the positive square roots of the variances, and the 100(1-$\alpha$)% confidence interval for the slope is given by:

$$\hat{\beta}_1 - t_{\alpha/2}(n-k)\,se(\hat{\beta}_1) \;\le\; \beta_1 \;\le\; \hat{\beta}_1 + t_{\alpha/2}(n-k)\,se(\hat{\beta}_1)$$

$$\beta_1 = \hat{\beta}_1 \pm t_{\alpha/2,\,n-k}\,se(\hat{\beta}_1) \qquad (2.21)$$


And for the intercept:

$$\beta_0 = \hat{\beta}_0 \pm t_{\alpha/2,\,n-k}\,se(\hat{\beta}_0) \qquad (2.22)$$

Example 2.6: The following table gives the quantity supplied (Y in tons) and its price
(X pound per ton) for a commodity over a period of twelve years.
Table 5: Data on supply and price for given commodity

Y 69 76 52 56 57 77 58 55 67 53 72 64
X 9 12 6 10 9 10 7 8 12 6 11 8


Table 6: Data for computation of different parameters

Time Y X XY X² Y² x y xy x² y² Ŷ ei ei²

1 69 9 621 81 4761 0 6 0 0 36 63.00 6.00 36.00

2 76 12 912 144 5776 3 13 39 9 169 72.75 3.25 10.56

3 52 6 312 36 2704 -3 -11 33 9 121 53.25 -1.25 1.56

4 56 10 560 100 3136 1 -7 -7 1 49 66.25 -10.25 105.06

5 57 9 513 81 3249 0 -6 0 0 36 63.00 -6.00 36.00

6 77 10 770 100 5929 1 14 14 1 196 66.25 10.75 115.56

7 58 7 406 49 3364 -2 -5 10 4 25 56.50 1.50 2.25

8 55 8 440 64 3025 -1 -8 8 1 64 59.75 -4.75 22.56

9 67 12 804 144 4489 3 4 12 9 16 72.75 -5.75 33.06

10 53 6 318 36 2809 -3 -10 30 9 100 53.25 -0.25 0.06

11 72 11 792 121 5184 2 9 18 4 81 69.50 2.50 6.25

12 64 8 512 64 4096 -1 1 -1 1 1 59.75 4.25 18.06

Sum 756 108 6960 1020 48522 0 0 156 48 894 756.00 0.00 387.00

Use Tables (Table 5 and Table 6) to answer the following questions

1. Estimate the Coefficient of determination (R2)


2. Run significance test of regression coefficients using the following test methods
A) The standard error test B) The Student's t-test
3. Fit the linear regression equation and determine the 95% confidence interval for
the slope.

Solution

1. Estimate the Coefficient of determination (R2)

Refer to Example 2.6 above to determine what percentage of the variation in the quantity supplied is explained by the price of the commodity and what percentage remains unexplained.

Use data in Table 6 to estimate R2 using the formula given below.


$$R^2 = 1 - \frac{\sum e_i^2}{\sum y_i^2} = 1 - \frac{387}{894} = 1 - 0.43 = 0.57$$

This result shows that 57% of the variation in the quantity supplied of the commodity under consideration is explained by the variation in the price of the commodity, and the remaining 43% is left unexplained by the price of the commodity. In other words, there may be other important explanatory variables left out that could contribute to the variation in the quantity supplied of the commodity under consideration.
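The computations of this example can be replicated with a short numpy sketch (our code, not part of the original exercise):

```python
# A sketch reproducing the simple regression of Example 2.6 and its R^2.
import numpy as np

Y = np.array([69, 76, 52, 56, 57, 77, 58, 55, 67, 53, 72, 64], dtype=float)
X = np.array([9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8], dtype=float)

x, y = X - X.mean(), Y - Y.mean()            # deviations from the means
b1 = (x * y).sum() / (x ** 2).sum()          # slope = sum(xy) / sum(x^2) = 156/48
b0 = Y.mean() - b1 * X.mean()                # intercept
e = Y - (b0 + b1 * X)                        # residuals
R2 = 1 - (e ** 2).sum() / (y ** 2).sum()     # 1 - 387/894

print(b0, b1, round(R2, 2))                  # 33.75, 3.25, 0.57
```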

2. Run significance test of regression coefficients using the following test methods

The fitted regression line for the given data is:

$$\hat{Y}_i = 33.75 + 3.25X_i$$
$$\qquad\quad (8.3)\quad\ (0.9)$$

where the numbers in parentheses are the standard errors of the respective coefficients.

A. Standard Error test

In testing the statistical significance of the estimates using the standard error test, the following information is needed for the decision. Since there are two parameter estimates in the model, we have to test them separately.

Testing for 1
  
We have the following information about 1 i.e 1 =3.25 and se( 1 )  0.9

The following are the null and alternative hypotheses to be tested.

H 0 : 1  0
H 1 : 1  0

SET BY MEKONNEN A. 47
LECTURE NOTE ON ECONOMETRICS

 
Since the standard error of 1 is less than half of the value of 1 , we have to reject the

null hypothesis and conclude that the parameter estimate 1 is statistically

significant.

Testing for  0


Again we have the following information about  0

 
 0  33.75 and se(  0 )  8.3

The hypotheses to be tested are given as follows;

H0 : 0  0
H1 :  0  0
 
Since the standard error of  0 is less than half of the numerical value of  0 , we have

to reject the null hypothesis and conclude that  0 is statistically significant.

B. The Student's t-test

In the illustrative example, we can apply the t-test to see whether the price of the commodity is significant in determining the quantity supplied of the commodity under consideration. Use $\alpha = 0.05$.

The hypothesis to be tested is:

$$H_0: \beta_1 = 0, \qquad H_1: \beta_1 \neq 0$$

The required quantities are known: $\hat{\beta}_1 = 3.25$, $se(\hat{\beta}_1) = 0.8979$.

Then we can compute $t_{cal}$ as follows:

$$t_{cal} = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)} = \frac{3.25}{0.8979} = 3.62$$


The tabulated value of t (with 10 degrees of freedom at the 5% level) is 2.228. When we compare these two values, the calculated t is greater than the tabulated value. Hence, we reject the null hypothesis. Rejecting the null hypothesis means concluding that the price of the commodity is significant in determining the quantity supplied of the commodity.

In this part we have seen how to conduct the statistical reliability test using the t-statistic. Now let us see additional information about this test. When the degrees of freedom are large, we can conduct the t-test without consulting the t-table for the theoretical value of t. This is known as the "2t-rule", stated as follows.

The t-table shows that the value of t changes very slowly once the degrees of freedom (n-k) exceed 8. For example, the value of $t_{0.025}$ changes from 2.30 (when n-k = 8) to 1.96 (when n-k = ∞). The change from 2.30 to 1.96 is obviously very slow. Consequently, we can ignore the degrees of freedom (when they are greater than 8) and take the theoretical value of t to be approximately 2.0. Thus, a two-tailed test of a null hypothesis at the 5% level of significance reduces to the following rules:

1. If $t_{cal}$ is greater than 2 or less than -2, we reject the null hypothesis.

2. If $t_{cal}$ lies between -2 and 2, we accept the null hypothesis.

3. Fit the linear regression equation and determine the 95% confidence interval for the
slope.

The fitted regression model is $\hat{Y}_i = 33.75 + 3.25X_i$, where the standard errors of the respective coefficients are 8.3 and 0.9. To estimate the confidence interval we need the standard error of the slope, which is determined as follows:

$$\hat{\sigma}_u^2 = \frac{\sum e_i^2}{n-k} = \frac{387}{12-2} = \frac{387}{10} = 38.7$$

$$var(\hat{\beta}_1) = \frac{\hat{\sigma}_u^2}{\sum x_i^2} = 38.7 \times \frac{1}{48} = 0.80625$$

The standard error of the slope is $se(\hat{\beta}_1) = \sqrt{var(\hat{\beta}_1)} = \sqrt{0.80625} = 0.8979$.

The tabulated value of t for degrees of freedom 12-2=10 and /2=0.025 is 2.228.

Hence the 95% confidence interval for the slope is given by:

ˆ1  3.25  (2.228)(0.8979)  3.25  2  3.25  2, 3.25  2  1.25, 5.25 . The result tells us
that at the error probability 0.05, the true value of the slope coefficient lies between
1.25 and 5.25

7. Properties of OLS Estimators


The ideal or optimum properties that the OLS estimates possess may be summarized by the well-known Gauss-Markov theorem.

Statement of the theorem: "Given the assumptions of the classical linear regression model, the OLS estimators, in the class of linear and unbiased estimators, have the minimum variance, i.e. the OLS estimators are BLUE."

According to this theorem, under the basic assumptions of the classical linear regression model, the least squares estimators are linear, unbiased and have minimum variance (i.e. are the best of all linear unbiased estimators). Sometimes the theorem is referred to as the BLUE theorem, i.e. Best, Linear, Unbiased Estimator. An estimator is called BLUE if it is:

a. Linear: it is a linear function of a random variable, such as the dependent variable Y.

b. Unbiased: its average or expected value is equal to the true population parameter.

c. Minimum variance: it has minimum variance in the class of linear and unbiased estimators. An unbiased estimator with the least variance is known as an efficient estimator.


According to the Gauss-Markov theorem, the OLS estimators possess all the BLUE properties. The detailed proof of these properties is presented in Annex B.

Chapter Four

Multiple Linear Regression Models

1. Concept and Notations of Multiple Regression Models


The simple linear regression model (also called the two-variable model) was extensively discussed in the previous section. Such models assume that a dependent variable is influenced by only one explanatory variable. However, many economic variables are influenced by several factors or variables, so simple regression models are often unrealistic; their main virtue is that they are simple to understand. Demand and supply, each of which has several determinants, are good examples.

Adding more variables to the simple linear regression model leads us to the discussion of multiple regression models, i.e. models in which the dependent variable (or regressand) depends on two or more explanatory variables, or regressors. The multiple linear regression (population regression function) in which we have one dependent variable Y and k explanatory variables $X_1, X_2, \ldots, X_k$ is given by

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i \qquad (3.1)$$

Where, $\beta_0$ = the intercept = value of $Y_i$ when all X's are zero

$\beta_i$ = the partial slope coefficients

$u_i$ = the random term

In this model, for example, $\beta_1$ is the amount of change in $Y_i$ when $X_1$ changes by one unit, keeping the effect of the other variables constant. Similarly, $\beta_2$ is the amount of change in $Y_i$ when $X_2$ changes by one unit, keeping the effect of the other variables constant. The other slopes are interpreted in the same way.

Although a multiple regression equation can be fitted for any number of explanatory variables (equation 3.1), the simplest possible multiple regression model, the three-variable regression, will be presented for the sake of simplicity. It is characterized by one dependent variable (Y) and two explanatory variables (X1 and X2). The model is given by:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i \qquad (3.2)$$

$\beta_0$ = the intercept = the value of Y when both $X_1$ and $X_2$ are zero

$\beta_1$ = the change in Y when $X_1$ changes by one unit, keeping the effect of $X_2$ constant

$\beta_2$ = the change in Y when $X_2$ changes by one unit, keeping the effect of $X_1$ constant

2. Assumptions of the Multiple Linear Regression

Each econometric method used for estimation has its own assumptions. Knowing the assumptions, and the consequences when they are not maintained, is very important for the econometrician. As in the previous section, there are certain assumptions underlying the multiple regression model under the method of ordinary least squares (OLS). Let us see them one by one.

Assumption 1: Randomness of $u_i$ - the variable $u_i$ is a real random variable.

Assumption 2: Zero mean of $u_i$ - the random variable $u_i$ has a zero mean for each value of $X_i$, i.e. $E(u_i) = 0$.

Assumption 3: Homoscedasticity of the random term - the random term $u_i$ has a constant variance. In other words, the variance of each $u_i$ is the same for all $X_i$ values: $E(u_i^2) = \sigma_u^2 = \text{constant}$.

Assumption 4: Normality of $u_i$ - the values of each $u_i$ are normally distributed: $u_i \sim N(0, \sigma_u^2)$.

Assumption 5: No autocorrelation or serial independence of the random terms - the successive values of the random term are not correlated. The values of $u_i$ (corresponding to $X_i$) are independent of the values of any other $u_j$ (corresponding to $X_j$): $E(u_iu_j) = 0$ for $i \neq j$.

Assumption 6: Independence of $u_i$ and $X_i$ - every disturbance term $u_i$ is independent of the explanatory variables: $E(u_iX_{1i}) = E(u_iX_{2i}) = 0$.

Assumption 7: No errors of measurement in the X's - the explanatory variables are measured without error.

Assumption 8: No perfect multicollinearity among the X's - the explanatory variables are not perfectly linearly correlated.

Assumption 9: Correct specification of the model - the model has no specification error, in that all the important explanatory variables appear explicitly in the function and the mathematical form is correctly defined (linear or non-linear form and the number of equations in the model).

3. Estimation of Partial Regression Coefficients


The process of estimating the parameters of the multiple regression model is similar to that of the simple linear regression model. The main task is to derive the normal equations using the same procedure as in the simple regression case. As with the simple linear regression model, OLS and Maximum Likelihood (ML) methods can be used to estimate the partial regression coefficients of multiple regression models; due to its simplicity and popularity, the OLS method is used here. The OLS procedure consists of choosing the values of the unknown parameters so that the residual sum of squares is as small as possible.


Under the assumption of zero mean of the random term, the sample regression function will look like the following:

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} \qquad (3.3)$$

We call this equation the fitted equation. Subtracting (3.3) from (3.2), we obtain:

$$e_i = Y_i - \hat{Y}_i \qquad (3.4)$$

The method of ordinary least squares (OLS) or classical least squares (CLS) involves obtaining the values $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$ in such a way that $\sum e_i^2$ is minimum. These values are obtained by differentiating the sum of squares with respect to each coefficient and equating the derivatives to zero. That is,

$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_0} = \frac{\partial \sum_{i=1}^{n}(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i})^2}{\partial \hat{\beta}_0} = 0 \qquad (3.5)$$

$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_1} = \frac{\partial \sum_{i=1}^{n}(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i})^2}{\partial \hat{\beta}_1} = 0 \qquad (3.6)$$

$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_2} = \frac{\partial \sum_{i=1}^{n}(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i})^2}{\partial \hat{\beta}_2} = 0 \qquad (3.7)$$

Solving equations (3.5), (3.6) and (3.7) simultaneously, we obtain the system of normal equations given as follows:

$$\sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_{1i} + \hat{\beta}_2 \sum X_{2i} \qquad (3.8)$$

$$\sum X_{1i}Y_i = \hat{\beta}_0 \sum X_{1i} + \hat{\beta}_1 \sum X_{1i}^2 + \hat{\beta}_2 \sum X_{1i}X_{2i} \qquad (3.9)$$

$$\sum X_{2i}Y_i = \hat{\beta}_0 \sum X_{2i} + \hat{\beta}_1 \sum X_{1i}X_{2i} + \hat{\beta}_2 \sum X_{2i}^2 \qquad (3.10)$$

Then, letting

$$x_{1i} = X_{1i} - \bar{X}_1 \qquad (3.11)$$
$$x_{2i} = X_{2i} - \bar{X}_2 \qquad (3.12)$$
$$y_i = Y_i - \bar{Y} \qquad (3.13)$$

the above three equations (3.8), (3.9) and (3.10) can be solved, using matrix operations or simultaneously, to obtain the following estimates:

$$\hat{\beta}_1 = \frac{(\sum x_1 y)(\sum x_2^2) - (\sum x_2 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} \qquad (3.14)$$

$$\hat{\beta}_2 = \frac{(\sum x_2 y)(\sum x_1^2) - (\sum x_1 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} \qquad (3.15)$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}_1 - \hat{\beta}_2\bar{X}_2 \qquad (3.16)$$

4. Variance and Standard errors of OLS Estimators


Estimating the numerical values of the parameters is not enough in econometrics when the data come from samples. The standard errors are important for two main purposes: to establish confidence intervals for the parameters and to test statistical hypotheses. They allow us to examine the precision, or statistical reliability, of the estimators: an estimator is of little use if it is not a good estimator, and the precision of an estimator is measured by its standard error.

As in the case of simple linear regression, the standard errors of the coefficients are vital for statistical inference about the coefficients. We use the standard error of a coefficient to construct a confidence interval estimate for the population regression coefficient and to test the significance of the variable to which the coefficient is attached in determining the dependent variable in the model. In this section, we will see these standard errors. The standard error of a coefficient is the positive square root of the variance of the coefficient. Thus, we start by defining the variances of the coefficients.

Variance of the intercept $\hat{\beta}_0$:

$$Var(\hat{\beta}_0) = \hat{\sigma}_u^2\left[\frac{1}{n} + \frac{\bar{X}_1^2\sum x_2^2 + \bar{X}_2^2\sum x_1^2 - 2\bar{X}_1\bar{X}_2\sum x_1x_2}{\sum x_1^2 \sum x_2^2 - (\sum x_1x_2)^2}\right] \qquad (3.17)$$

Variance of $\hat{\beta}_1$:

$$Var(\hat{\beta}_1) = \hat{\sigma}_u^2\left[\frac{\sum x_2^2}{\sum x_1^2 \sum x_2^2 - (\sum x_1x_2)^2}\right] \qquad (3.18)$$

Variance of $\hat{\beta}_2$:

$$Var(\hat{\beta}_2) = \hat{\sigma}_u^2\left[\frac{\sum x_1^2}{\sum x_1^2 \sum x_2^2 - (\sum x_1x_2)^2}\right] \qquad (3.19)$$

Where,

$$\hat{\sigma}_u^2 = \frac{\sum e_i^2}{n-3} \qquad (3.20)$$

Equation 3.20 gives the estimate of the variance of the random term. The standard errors are then computed as follows:

Standard error of $\hat{\beta}_0$: $SE(\hat{\beta}_0) = \sqrt{Var(\hat{\beta}_0)}$ (3.21)

Standard error of $\hat{\beta}_1$: $SE(\hat{\beta}_1) = \sqrt{Var(\hat{\beta}_1)}$ (3.22)

Standard error of $\hat{\beta}_2$: $SE(\hat{\beta}_2) = \sqrt{Var(\hat{\beta}_2)}$ (3.23)

Note: The OLS estimators of the multiple regression model have properties which
are parallel to those of the two-variable model.

5. Coefficient of Multiple Determination


In simple regression model we have discussed about the coefficient of determination
and its interpretation. In this section, we will discuss the coefficient of multiple
determination which has an equivalent role with that of the simple model. As
coefficient of determination is the square of the simple correlation in simple model,
coefficient of multiple determination is the square of multiple correlation coefficient.

The coefficient of multiple determination ( R 2 ) is the measure of the proportion of the

variation in the dependent variable that is explained jointly by the independent

variables in the model. One minus R 2 is called the coefficient of non-determination.


It gives the proportion of the variation in the dependent variable that remains
unexplained by the independent variables in the model. As in the case of simple

linear regression, R 2 is the ratio of the explained variation to the total variation.
Mathematically:

$$R^2 = \frac{\sum \hat{y}^2}{\sum y^2} \qquad (3.24)$$

Or $R^2$ can also be given in terms of the slope coefficients $\hat{\beta}_1$ and $\hat{\beta}_2$ as:

$$R^2 = \frac{\hat{\beta}_1\sum x_1y + \hat{\beta}_2\sum x_2y}{\sum y^2} \qquad (3.25)$$

In simple linear regression, a higher $R^2$ means the model is better determined by the explanatory variable in the model. In multiple linear regression, however, every time we insert an additional explanatory variable into the model, the $R^2$ increases irrespective of any improvement in the goodness-of-fit of the model. That means a high $R^2$ may not imply that the model is good.

Thus, we adjust the $R^2$ as follows:

$$\bar{R}^2_{adj} = 1 - (1 - R^2)\frac{(n-1)}{(n-k)} \qquad (3.26)$$

Where, k = the number of parameters estimated in the model (including the intercept).

In multiple linear regression, therefore, it is better to interpret the adjusted $R^2$ than the ordinary or unadjusted $R^2$. We know that the value of $R^2$ always lies between zero and one, but the adjusted $R^2$ can lie outside this range and can even be negative.

In the case of simple linear regression, $R^2$ is the square of the linear correlation coefficient. As the correlation coefficient lies between -1 and +1, the coefficient of determination ($R^2$) lies between 0 and 1. The $R^2$ of multiple linear regression also lies between 0 and 1. The adjusted $R^2$, however, can sometimes be negative when the goodness of fit is poor. When the adjusted $R^2$ value is negative, we consider it to be zero and interpret it as saying that none of the variation in the dependent variable is explained by the regressors.
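A one-line Python sketch of equation 3.26 (our helper; the truncation at zero follows the convention just described):

```python
# A sketch of the adjusted R^2 of equation 3.26; k counts all estimated parameters.
def adjusted_r2(r2, n, k):
    raw = 1 - (1 - r2) * (n - 1) / (n - k)
    return raw if raw > 0 else 0.0   # a negative value is interpreted as zero

print(adjusted_r2(0.087894, n=12, k=3))  # 0.0, since the raw value is about -0.115
```

The numbers in the example call are taken from the illustration later in this chapter.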

6. Confidence Interval Estimation

Confidence interval estimation in multiple linear regression follows the same formulae and procedures that we followed in simple linear regression. You are, therefore, required to practice finding the confidence interval estimates of the intercept and the slopes in multiple regression with two explanatory variables.

Please recall that the 100(1-$\alpha$)% confidence interval for $\beta_i$ is given as $\hat{\beta}_i \pm t_{\alpha/2,\,n-k}\,se(\hat{\beta}_i)$, where k is the number of parameters to be estimated (equivalently, the number of variables, both dependent and explanatory).

Interpretation of the confidence interval: values of the parameter lying in the interval are plausible with 100(1-$\alpha$)% confidence.

7. Hypothesis Testing in Multiple Regression


Hypothesis testing is important in order to draw inferences about the estimates and to know how representative the estimates are of the true population parameters. Once we go beyond the simple world of the two-variable linear regression model, hypothesis testing assumes several interesting forms, such as the following.

a) Testing hypothesis about an individual partial regression coefficient;


b) Testing the overall significance of the estimated multiple regression model
(finding out if all the partial slope coefficients are simultaneously equal to
zero);
c) Testing if two or more coefficients are equal to one another;
d) Testing that the partial regression coefficients satisfy certain restrictions
e) Testing the stability of the estimated regression model over time or in different
cross-sectional units
f) Testing the functional form of regression models.

These and other types of hypothesis tests can be found in different econometrics books. For our purposes, we will confine ourselves to the major ones.

Testing individual regression coefficients


The tests concerning the individual coefficients can be done using the standard error
test or the t-test. In all the cases the hypothesis is stated as:

H 0 : ˆ1  0 H 0 : ˆ 2  0 H 0 : ˆ K  0
a) b)
H : ˆ  0
1 1 H 1 : ˆ 2  0 H 1 : ˆ K  0

In a) we would like to test the hypothesis that X1 has no linear influence on Y holding the other variables constant. In b) we test the hypothesis that X2 has no linear relationship with Y holding the other factors constant. The above hypotheses lead to a two-tailed test; however, a one-tailed test might also be appropriate. There are two methods for testing the significance of individual regression coefficients.

a) Standard Error Test: Using the standard error test we can test the above hypotheses. The decision rule is based on the relationship between the numerical value of the parameter estimate and its standard error.

(i) If $SE(\hat{\beta}_i) > \frac{1}{2}|\hat{\beta}_i|$, we accept the null hypothesis, i.e. the estimate of $\beta_i$ is not statistically significant.

Conclusion: the coefficient $\hat{\beta}_i$ is not statistically significant. In other words, it does not have a significant influence on the dependent variable.

(ii) If $SE(\hat{\beta}_i) < \frac{1}{2}|\hat{\beta}_i|$, we fail to accept H0, i.e. we reject the null hypothesis in favour of the alternative hypothesis, meaning the estimate of $\beta_i$ has a significant influence on the dependent variable.

Generalisation: the smaller the standard error relative to the estimate, the stronger the evidence that the estimate is statistically significant.

(b) t-test

The more appropriate and formal way to test the above hypothesis is to use the t-test. As usual we compute the t-ratios and compare them with the tabulated t-values to make our decision.

$$t_{cal} = \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)} \sim t(n-k)$$

Decision rule: accept $H_0$ if $-t_{\alpha/2} \le t_{cal} \le t_{\alpha/2}$.


Otherwise, reject the null hypothesis. Rejecting $H_0$ means the coefficient being tested is significantly different from 0. Not rejecting $H_0$, on the other hand, means we don't have sufficient evidence to conclude that the coefficient is different from 0.

8. Testing the Overall Significance of Regression Model


Here, we are interested to test the overall significance of the observed or estimated
regression line, that is, whether the dependent variable is linearly related to all of the
explanatory variables. Hypotheses of such type are often called joint hypotheses.
Testing the overall significance of the model means testing the null hypothesis that
none of the explanatory variables in the model significantly determine the changes in
the dependent variable. Put in other words, it means testing the null hypothesis that
none of the explanatory variables significantly explain the dependent variable in the
model. This can be stated as:
H 0 : 1   2  0
H 1 :  i  0, at least for one i.

The test statistic for this test is given by:

 yˆ 2

Fcal  k  1
 e2
nk

Where, k is the number of explanatory variables in the model.


The results of the overall significance test of a model are summarized in the analysis of variance (ANOVA) table as follows.

Source of variation | Sum of squares | Degrees of freedom | Mean sum of squares | Fcal
Regression | SSR = Σŷ² | k-1 | MSR = Σŷ²/(k-1) | F = MSR/MSE
Residual | SSE = Σe² | n-k | MSE = Σe²/(n-k) |
Total | SST = Σy² | n-1 | |

The values in this table are explained as follows:

$$SSR = \sum \hat{y}^2 = \sum(\hat{Y}_i - \bar{Y})^2 = \text{Explained (Regression) Sum of Squares}$$
$$SSE = \sum e_i^2 = \sum(Y_i - \hat{Y}_i)^2 = \text{Unexplained (Residual) Sum of Squares}$$
$$SST = \sum y^2 = \sum(Y_i - \bar{Y})^2 = \text{Total Sum of Squares}$$

These three sums of squares are related in such a way that

$$SST = SSR + SSE$$

This implies that the total sum of squares is the sum of the explained (regression)
sum of squares and the residual (unexplained) sum of squares. In other words, the
total variation in the dependent variable is the sum of the variation in the dependent
variable due to the variation in the independent variables included in the model and
the variation that remained unexplained by the explanatory variables in the model.
Analysis of variance (ANOVA) is the technique of decomposing the total sum of
squares into its components. As we can see here, the technique decomposes the total
variation in the dependent variable into the explained and the unexplained
variations. The degrees of freedom of the total variation are also the sum of the
degrees of freedom of the two components. By dividing the sum of squares by the
corresponding degrees of freedom, we obtain what is called the Mean Sum of
Squares (MSS).

The mean sums of squares due to regression and residual are calculated as the sums of squares divided by their corresponding degrees of freedom (see the ANOVA table above). The last column shows the test statistic, which is computed as follows:

$$F_{cal} = \frac{MSR}{MSE} \sim F(k-1,\ n-k) \quad \text{[The F statistic follows the F distribution]}$$


The test rule: Reject H 0 if Fcal F (k  1, n  k ) where F (k  1, n  k ) is the value to be

read from the F- distribution table at a given  level.

Relationship between F and R2

You may recall that $R^2$ is given by $R^2 = \frac{\sum\hat{y}^2}{\sum y^2}$, so that $\sum\hat{y}^2 = R^2\sum y^2$.

We also know that

$$R^2 = 1 - \frac{\sum e^2}{\sum y^2}. \quad \text{Hence, } \frac{\sum e^2}{\sum y^2} = 1 - R^2, \text{ which means } \sum e^2 = (1-R^2)\sum y^2$$

The formula for F is:

$$F_{cal} = \frac{\sum\hat{y}^2/(k-1)}{\sum e^2/(n-k)} = \frac{R^2\sum y^2/(k-1)}{(1-R^2)\sum y^2/(n-k)}$$

$$F_{cal} = \frac{(n-k)}{(k-1)}\cdot\frac{R^2}{(1-R^2)}$$

That means the calculated F can also be expressed in terms of the coefficient of
determination.
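A minimal sketch of this relationship in Python, with the critical value from scipy (the function name is ours):

```python
# A sketch of the overall F test computed from R^2 alone.
from scipy.stats import f

def overall_f(r2, n, k, alpha=0.05):
    f_cal = ((n - k) / (k - 1)) * (r2 / (1 - r2))    # F from R^2
    f_tab = f.ppf(1 - alpha, dfn=k - 1, dfd=n - k)   # critical value
    return f_cal, f_tab, f_cal > f_tab               # True means reject H0

print(overall_f(0.087894, n=12, k=3))   # f_cal is about 0.43
```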

Testing the Equality of two Regression Coefficients


Given the multiple regression equation:

$$Y_i = \beta_0 + \beta_1X_{1i} + \beta_2X_{2i} + \beta_3X_{3i} + \cdots + \beta_kX_{ki} + U_i$$

We would like to test the hypothesis:

$$H_0: \beta_1 = \beta_2 \ (\text{or } \beta_1 - \beta_2 = 0) \quad \text{vs.} \quad H_1: H_0 \text{ is not true}$$

The null hypothesis says that the two slope coefficients are equal.


Example: If Y is quantity demanded of a commodity, X1 is the price of the

commodity and X2 is income of the consumer. The hypothesis suggests that the price

and income elasticity of demand are the same.


We can test the null hypothesis using the statistic

$$t = \frac{\hat{\beta}_2 - \hat{\beta}_1}{SE(\hat{\beta}_2 - \hat{\beta}_1)} \sim t \text{ distribution with } n-k \text{ degrees of freedom}$$

Where k = the total number of parameters estimated.

The $SE(\hat{\beta}_2 - \hat{\beta}_1)$ is given as $SE(\hat{\beta}_2 - \hat{\beta}_1) = \sqrt{Var(\hat{\beta}_2) + Var(\hat{\beta}_1) - 2Cov(\hat{\beta}_1, \hat{\beta}_2)}$

Thus the t-statistic is:

$$t = \frac{\hat{\beta}_2 - \hat{\beta}_1}{\sqrt{Var(\hat{\beta}_2) + Var(\hat{\beta}_1) - 2Cov(\hat{\beta}_1, \hat{\beta}_2)}}$$

Decision: reject $H_0$ if $|t_{cal}| > t_{tab}$.

Note: Using similar procedures one can also test linear equality restrictions, for example $\beta_1 + \beta_2 = 1$, and other restrictions.

Illustration: The following table shows a particular country's value of imports (Y), the level of Gross National Product (X1) measured in arbitrary units, and the price index of imported goods (X2), over a 12-year period.
Table 7: Data for multiple regression examples

Year 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971
Y 57 43 73 37 64 48 56 50 39 43 69 60
X1 220 215 250 241 305 258 354 321 370 375 385 385
X2 125 147 118 160 128 149 145 150 140 115 155 152

a) Estimate the coefficients of the economic relationship and fit the model.

To estimate the coefficients of the economic relationship, we compute the entries


given in Table 8


Table 8: Computations of the summary statistics for coefficients for data of Table 7

Year Y X1 X2 x1 x2 y x1² x2² x1y x2y x1x2 y²


1960 57 220 125 -86.5833 -15.3333 3.75 7496.668 235.1101 -324.687 -57.4999 1327.608 14.0625
1961 43 215 147 -91.5833 6.6667 -10.25 8387.501 44.44489 938.7288 -68.3337 -610.558 105.0625
1962 73 250 118 -56.5833 -22.3333 19.75 3201.67 498.7763 -1117.52 -441.083 1263.692 390.0625
1963 37 241 160 -65.5833 19.6667 -16.25 4301.169 386.7791 1065.729 -319.584 -1289.81 264.0625
1964 64 305 128 -1.5833 -12.3333 10.75 2.506839 152.1103 -17.0205 -132.583 19.52731 115.5625
1965 48 258 149 -48.5833 8.6667 -5.25 2360.337 75.11169 255.0623 -45.5002 -421.057 27.5625
1966 56 354 145 47.4167 4.6667 2.75 2248.343 21.77809 130.3959 12.83343 221.2795 7.5625
1967 50 321 150 14.4167 9.6667 -3.25 207.8412 93.44509 -46.8543 -31.4168 139.3619 10.5625
1968 39 370 140 63.4167 -0.3333 -14.25 4021.678 0.111089 -903.688 4.749525 -21.1368 203.0625
1969 43 375 115 68.4167 -25.3333 -10.25 4680.845 641.7761 -701.271 259.6663 -1733.22 105.0625
1970 69 385 155 78.4167 14.6667 15.75 6149.179 215.1121 1235.063 231.0005 1150.114 248.0625
1971 60 385 152 78.4167 11.6667 6.75 6149.179 136.1119 529.3127 78.75022 914.8641 45.5625
Sum 639 3679 1684 0.0004 0.0004 0 49206.92 2500.667 1043.25 -509 960.6667 1536.25
Mean 53.25 306.5833 140.3333 0 0 0


From Table 8, we can take the following summary results:

$$\sum Y = 639, \quad \sum X_1 = 3679, \quad \sum X_2 = 1684, \quad n = 12$$

$$\bar{Y} = \frac{\sum Y}{n} = \frac{639}{12} = 53.25, \quad \bar{X}_1 = \frac{3679}{12} = 306.5833, \quad \bar{X}_2 = \frac{1684}{12} = 140.3333$$

The summary results in deviation form are then given by:

$$\sum x_1^2 = 49206.92, \quad \sum x_2^2 = 2500.667, \quad \sum x_1y = 1043.25$$
$$\sum x_2y = -509, \quad \sum x_1x_2 = 960.6667, \quad \sum y^2 = 1536.25$$

The coefficients are then obtained as follows:

$$\hat{\beta}_1 = \frac{(\sum x_1y)(\sum x_2^2) - (\sum x_2y)(\sum x_1x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1x_2)^2} = \frac{(1043.25)(2500.667) - (-509)(960.6667)}{(49206.92)(2500.667) - (960.6667)^2}$$

$$= \frac{2608820.8 + 488979.4}{123050121 - 922880.5} = \frac{3097800.2}{122127241} = 0.025365$$

$$\hat{\beta}_2 = \frac{(\sum x_2y)(\sum x_1^2) - (\sum x_1y)(\sum x_1x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1x_2)^2} = \frac{(-509)(49206.92) - (1043.25)(960.6667)}{122127241}$$

$$= \frac{-25046322 - 1002216}{122127241} = \frac{-26048538}{122127241} = -0.21329$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}_1 - \hat{\beta}_2\bar{X}_2 = 53.25 - (0.025365)(306.5833) - (-0.21329)(140.3333) = 75.40512$$

The fitted model is then written as: $\hat{Y}_i = 75.40512 + 0.025365X_1 - 0.21329X_2$
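The hand computation can be verified with numpy's least-squares solver (a sketch; the data are those of Table 7):

```python
# A sketch verifying the fitted import equation with numpy.
import numpy as np

Y  = np.array([57, 43, 73, 37, 64, 48, 56, 50, 39, 43, 69, 60], dtype=float)
X1 = np.array([220, 215, 250, 241, 305, 258, 354, 321, 370, 375, 385, 385], dtype=float)
X2 = np.array([125, 147, 118, 160, 128, 149, 145, 150, 140, 115, 155, 152], dtype=float)

X = np.column_stack([np.ones_like(Y), X1, X2])   # design matrix with an intercept column
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)     # OLS coefficients
print(beta)                                      # roughly [75.405, 0.0254, -0.2133]
```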

b) Compute the variance and standard errors of the slopes.


First, you need to compute the estimate of the variance of the random term as follows:

$$\hat{\sigma}_u^2 = \frac{\sum e_i^2}{n-3} = \frac{1401.223}{12-3} = \frac{1401.223}{9} = 155.69143$$

Variance of $\hat{\beta}_1$:

$$Var(\hat{\beta}_1) = \hat{\sigma}_u^2\left[\frac{\sum x_2^2}{\sum x_1^2\sum x_2^2 - (\sum x_1x_2)^2}\right] = 155.69143 \times \frac{2500.667}{122127241} = 0.003188$$

Standard error of $\hat{\beta}_1$: $SE(\hat{\beta}_1) = \sqrt{Var(\hat{\beta}_1)} = \sqrt{0.003188} = 0.056462$

Variance of $\hat{\beta}_2$:

$$Var(\hat{\beta}_2) = \hat{\sigma}_u^2\left[\frac{\sum x_1^2}{\sum x_1^2\sum x_2^2 - (\sum x_1x_2)^2}\right] = 155.69143 \times \frac{49206.92}{122127241} = 0.0627$$

Standard error of $\hat{\beta}_2$: $SE(\hat{\beta}_2) = \sqrt{Var(\hat{\beta}_2)} = \sqrt{0.0627} = 0.25046$

Similarly, the standard error of the intercept is found to be 37.98177. The detail is left for
you as an exercise.
c) Calculate and interpret the coefficient of determination.
We can use the following summary results to obtain the R2.

$$\sum \hat{y}^2 = 135.0262, \quad \sum e^2 = 1401.223, \quad \sum y^2 = 1536.25 \ \text{(the sum of the two)}$$

Then,

$$R^2 = \frac{\hat{\beta}_1\sum x_1y + \hat{\beta}_2\sum x_2y}{\sum y^2} = \frac{(0.025365)(1043.25) + (-0.21329)(-509)}{1536.25} = 0.087894$$

$$\text{or} \quad R^2 = 1 - \frac{\sum e^2}{\sum y^2} = 1 - \frac{1401.223}{1536.25} = 0.087894$$

That is, only about 8.8% of the variation in imports is explained jointly by X1 and X2; the rest remains unexplained.

d) Compute the adjusted R².

$$\bar{R}^2_{adj} = 1 - (1-R^2)\frac{(n-1)}{(n-k)} = 1 - (1-0.087894)\frac{12-1}{12-3} = -0.114796$$

Since the adjusted $R^2$ is negative, it is treated as zero, as discussed earlier.

e) Construct 95% confidence interval for the true population parameters (partial regression
coefficients).[Exercise: Base your work on Simple Linear Regression]
f) Test the significance of X1 and X2 in determining the changes in Y using t-test.
The hypotheses are summarized in the following table.

Coefficient | Hypothesis | Estimate | Std. error | Calculated t | Conclusion
$\beta_1$ | $H_0: \beta_1 = 0$; $H_1: \beta_1 \neq 0$ | 0.025365 | 0.056462 | $t_{cal} = \frac{0.025365}{0.056462} = 0.449249$ | Do not reject $H_0$ since $|t_{cal}| < t_{tab}$
$\beta_2$ | $H_0: \beta_2 = 0$; $H_1: \beta_2 \neq 0$ | -0.21329 | 0.25046 | $t_{cal} = \frac{-0.21329}{0.25046} = -0.85159$ | Do not reject $H_0$ since $|t_{cal}| < t_{tab}$

The critical value $t_{0.025,9}$ to be used here is 2.262. Like the standard error test, the t-test reveals that both X1 and X2 are insignificant in determining the change in Y, since the calculated t values are both less than the critical value in absolute terms.

Exercise: Test the significance of X1 and X2 in determining the changes in Y using the
standard error test.
g) Test the overall significance of the model. (Hint: use α = 0.05)
This involves testing whether at least one of the two variables X 1 and X2 determine the
changes in Y. The hypothesis to be tested is given by:


H 0 : 1   2  0
H 1 :  i  0, at least for one i.

The ANOVA table for the test is given as follows:

Source of variation | Sum of squares | Degrees of freedom | Mean sum of squares | Fcal
Regression | $SSR = \sum\hat{y}^2 = 135.0262$ | $k-1 = 3-1 = 2$ | $MSR = \frac{135.0262}{2} = 67.51309$ | $F = \frac{MSR}{MSE} = 0.433634$
Residual | $SSE = \sum e^2 = 1401.223$ | $n-k = 12-3 = 9$ | $MSE = \frac{1401.223}{9} = 155.6914$ |
Total | $SST = \sum y^2 = 1536.25$ | $n-1 = 12-1 = 11$ | |

The tabulated F value (critical value) is F(2, 11) = 3.98

In this case, the calculated F value (0.4336) is less than the tabulated value (3.98). Hence,
we do not reject the null hypothesis and conclude that there is no significant
contribution of the variables X1 and X2 to the changes in Y.

h) Compute the F value using the R².

$$F_{cal} = \frac{(n-k)}{(k-1)}\cdot\frac{R^2}{(1-R^2)} = \frac{12-3}{3-1}\times\frac{0.087894}{1-0.087894} = 0.433632$$


Chapter Five
Econometric Problems
1. Assumptions Revisited
In many practical cases, two major problems arise in applying the classical linear
regression model.
1) those due to assumptions about the specification of the model and about the
disturbances and
2) those due to assumptions about the data

The following assumptions fall into one or the other of these categories.

 The regression model is linear in parameters.
 The values of the explanatory variables are fixed in repeated sampling (non-stochastic).
 The mean of the disturbance (ui) is zero for any given value of X, i.e. E(ui) = 0.
 The variance of ui is constant, i.e. homoscedastic.
 There is no autocorrelation in the disturbance terms.
 The explanatory variables are distributed independently of the ui.
 The number of observations must be greater than the number of explanatory variables.
 There must be sufficient variability in the values taken by the explanatory variables.
 There is no linear relationship (multicollinearity) among the explanatory variables.
 The stochastic (disturbance) terms ui are normally distributed, i.e. ui ~ N(0, σ²).
 The regression model is correctly specified, i.e. there is no specification error.


With these assumptions we can show that the OLS estimators are BLUE and normally distributed, and hence it is possible to test hypotheses about the parameters. However, if any of these assumptions is relaxed, OLS might not work. We shall now examine in detail the violation of some of the assumptions.

2. Violations of Assumptions

The Zero Mean Assumption i.e. E(ui)=0

If this assumption is violated, we obtain a biased estimate of the intercept term. But since the intercept term is often of little importance and usually has no direct physical interpretation, we can live with this. The slope coefficients remain unaffected even if the assumption is violated.

The Normality Assumption


This assumption is not very essential if the objective is estimation only. The OLS
estimators are BLUE regardless of whether the ui are normally distributed or not. In

addition, because of the central limit theorem, we can argue that the test procedures –
the t-tests and F-tests - are still valid asymptotically, i.e. in large sample.

Heteroscedasticity: The Error Variance is not constant


The error terms in the regression equation are assumed to have a common variance, i.e., to be homoscedastic. If they do not have a common variance, we say they are heteroscedastic. The basic questions to be addressed are:
 What is the nature of the problem?
 What are the consequences of the problem?
 How do we detect (diagnose) the problem?
 What remedies are available for the problem?

The Nature of the Problem


In the case of homoscedastic disturbance terms, the spread around the mean is constant, i.e. $var(u_i) = \sigma^2$. In the case of heteroscedastic disturbance terms, the variance changes with the explanatory variable. The problem of heteroscedasticity is likely to be more common in cross-sectional than in time-series data.

Causes of Heteroscedasticity
There are several reasons why the variance of the error term may be variable, some of
which are as follows.
 Following the error-learning models, as people learn, their errors of behaviour become smaller over time, so the standard error of the regression model decreases.
 As income grows people have discretionary income and hence more scope for
choice about the disposition of their income. Hence, the variance (standard
error) of the regression is more likely to increase with income.
 Improvement in data collection techniques will reduce errors (variance).
 Existence of outliers might also cause heteroscedasticity.
 Misspecification of a model can also be a cause for heteroscedasticity.
 Skewness in the distribution of one or more explanatory variables included in
the model is another source of heteroscedasticity.
 Incorrect data transformation and incorrect functional form are also other
sources
Note: Heteroscedasticity is likely to be more common in cross-sectional data than in
time series data. In cross-sectional data, individuals usually deal with samples (such as
consumers, producers, etc) taken from a population at a given point in time. Such
members might be of different size. In time series data, however, the variables tend to
be of similar orders of magnitude since data is collected over a period of time.

Consequences of Heteroscedasticity
If the error terms of an equation are heteroscedastic, there are three major consequences.
a) The ordinary least squares estimators are still linear and unbiased, since heteroscedasticity does not cause bias in the coefficient estimates.
b) Heteroscedasticity increases the variance of the partial regression coefficients: the minimum variance property is lost and the OLS estimators are inefficient.
c) The test statistics - the t-test and F-test - cannot be relied on in the face of uncorrected heteroscedasticity.

Detection of Heteroscedasticity
There are no hard and fast rules (universally agreed-upon methods) for detecting the presence of heteroscedasticity, but some rules of thumb can be suggested. Most of these methods are based on examination of the OLS residuals, ei, since these are the ones we observe, not the disturbances ui. There are informal and formal methods of detecting heteroscedasticity.

a) Nature of the problem


In cross-sectional studies involving heterogeneous units, heteroscedasticity is the rule
rather than the exception.
Example: In small, medium and large sized agribusiness firms in a study of input
expenditure in relation to sales, the rate of interest, etc. heteroscedasticity is expected.

b) Graphical method
If there is no a priori or empirical information about the nature of heteroscedasticity,

one could do an examination of the estimated residual squared, ei² to see if they exhibit

any systematic pattern. The squared residuals can be plotted either against Y or against
one of the explanatory variables. If there appears any systematic pattern,
heteroscedasticity might exist. These two methods are informal methods.

c) Park Test


Park suggested a statistical test for heteroscedasticity based on the assumption that the variance of the disturbance term ($\sigma_i^2$) is some function of the explanatory variable $X_i$. Park suggested the functional form

$$\sigma_i^2 = \sigma^2 X_i^{\beta} e^{v_i}$$

which can be transformed into a linear function using the log transformation:

$$\ln \sigma_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i$$

where $v_i$ is a stochastic disturbance term. Since $\sigma_i^2$ is not known, $e_i^2$ is used in its place:

$$\ln e_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i$$

The Park test is thus a two-stage procedure: run the OLS regression disregarding the heteroscedasticity question and obtain the residuals $e_i$, and then run the above regression. If $\beta$ turns out to be statistically significant, this suggests that heteroscedasticity is present in the data.
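A sketch of the two stages in Python (our helper; it assumes the OLS residuals e and the suspect regressor X are already available as positive-valued numpy arrays, with no residual exactly zero):

```python
# A sketch of the Park test: regress ln(e^2) on ln(X) and inspect the slope.
import numpy as np
from scipy import stats

def park_test(e, X):
    result = stats.linregress(np.log(X), np.log(e ** 2))
    return result.slope, result.pvalue   # a small p-value suggests heteroscedasticity
```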

d) Spearman's Rank Correlation Test

Recall that:

$$r_S = 1 - 6\left[\frac{\sum d_i^2}{N(N^2-1)}\right], \quad d = \text{difference between ranks}$$

Step 1: Regress Y on X and obtain the residuals, $e_i$.

Step 2: Taking $|e_i|$ (i.e., ignoring the sign of $e_i$), rank both $|e_i|$ and $X_i$ and compute the rank correlation coefficient.

Step 3: Test the significance of the correlation statistic by the t-test:

$$t = \frac{r_S\sqrt{N-2}}{\sqrt{1-r_S^2}} \sim t(n-2)$$

A high rank correlation suggests the presence of heteroscedasticity. If there is more than one explanatory variable, compute the rank correlation coefficient between $|e_i|$ and each explanatory variable separately.

e) Goldfeld and Quandt Test


This is the most popular test and is usually suitable for large samples. The test can be used if it is assumed that the variance ($\sigma_i^2$) is positively related to one of the explanatory variables in the regression model, and if the number of observations is at least twice as many as the parameters to be estimated.

Given the model

$$Y_i = \beta_0 + \beta_1 X_i + U_i$$

suppose $\sigma_i^2$ is positively related to $X_i$ as

$$\sigma_i^2 = \sigma^2 X_i^2$$

Goldfeld and Quandt suggest the following steps:

1. Rank the observations according to the values of $X_i$ in ascending order.

2. Omit the central c observations (usually the middle third of the recorded observations), where c is specified a priori, and divide the remaining (n-c) observations into two groups, each with $\frac{(n-c)}{2}$ observations.

3. Fit separate regressions for the two sub-samples and obtain the respective residual sums of squares, $RSS_1$ and $RSS_2$, each with $\frac{(n-c)}{2} - k$ degrees of freedom.

4. Compute the ratio:

$$F = \frac{RSS_2/df}{RSS_1/df} \sim F(v_1, v_2), \quad v_1 = v_2 = \frac{n-c-2k}{2}$$

If the two variances tend to be the same, then F approaches unity. If the variances differ, we will have values of F different from one. The higher the F-ratio, the stronger the evidence of heteroscedasticity.
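A rough Python sketch of these four steps follows (our code, for a simple regression of Y on X; for routine work, statsmodels also provides a ready-made Goldfeld-Quandt test):

```python
# A sketch of the Goldfeld-Quandt test for a simple regression of Y on X.
import numpy as np
from scipy import stats

def goldfeld_quandt(Y, X, c, alpha=0.05):
    order = np.argsort(X)                    # step 1: sort by the regressor
    Y, X = Y[order], X[order]
    n = len(Y)
    m = (n - c) // 2                         # step 2: drop the central c observations
    rss = []
    for Ys, Xs in [(Y[:m], X[:m]), (Y[n - m:], X[n - m:])]:   # step 3: two sub-fits
        b1 = ((Xs - Xs.mean()) * (Ys - Ys.mean())).sum() / ((Xs - Xs.mean()) ** 2).sum()
        e = Ys - Ys.mean() - b1 * (Xs - Xs.mean())            # residuals of each fit
        rss.append((e ** 2).sum())
    df = m - 2                               # (n - c)/2 - k, with k = 2 parameters
    F = (rss[1] / df) / (rss[0] / df)        # step 4: ratio of the two RSS
    return F, stats.f.ppf(1 - alpha, df, df)
```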
Note: There are also other methods of testing for the existence of heteroscedasticity in your data: the Glejser test, the Breusch-Pagan-Godfrey test, White's general test and the Koenker-Bassett test, the details of which you are expected to refer to.


Remedial Measures
OLS estimators are still unbiased even in the presence of heteroscedasticity. But they are
not efficient, not even asymptotically. This lack of efficiency makes the usual hypothesis
testing procedure a dubious exercise. Remedial measures are, therefore, necessary.
Generally the solution is based on some form of transformation.

a) The Weighted Least Squares (WLS)


Given a regression model of the form

$$Y_i = \beta_0 + \beta_1 X_i + U_i$$

the weighted least squares method requires running the OLS regression on transformed data. The transformation is based on the assumed form of the heteroscedasticity.

Assumption One: Given the model $Y_i = \beta_0 + \beta_1 X_{1i} + U_i$,

if $var(U_i) = \sigma_i^2 = \sigma^2 X_{1i}^2$, then $E(U_i^2) = \sigma^2 X_{1i}^2$,

where $\sigma^2$ is the constant variance of a classical error term. So if speculation or other tests indicate that the variance is proportional to the square of the explanatory variable X, we may transform the original model as follows:

$$\frac{Y_i}{X_{1i}} = \frac{\beta_0}{X_{1i}} + \beta_1\frac{X_{1i}}{X_{1i}} + \frac{U_i}{X_{1i}} = \beta_0\left(\frac{1}{X_{1i}}\right) + \beta_1 + V_i$$

Now

$$E(V_i^2) = E\left(\frac{U_i}{X_{1i}}\right)^2 = \frac{1}{X_{1i}^2}E(U_i^2) = \frac{\sigma^2 X_{1i}^2}{X_{1i}^2} = \sigma^2$$

Hence the variance of the transformed disturbance is now homoscedastic, and we regress $\frac{Y_i}{X_{1i}}$ on $\frac{1}{X_{1i}}$.
Assumption Two: Again given the model

$$Y_i = \beta_0 + \beta_1 X_{1i} + U_i$$

suppose now that $Var(U_i) = E(U_i^2) = \sigma_{U_i}^2 = \sigma^2 X_{1i}$, i.e. it is believed that the variance of $U_i$ is proportional to $X_{1i}$ instead of being proportional to the square of $X_{1i}$. The original model can be transformed as follows:

$$\frac{Y_i}{\sqrt{X_{1i}}} = \frac{\beta_0}{\sqrt{X_{1i}}} + \beta_1\frac{X_{1i}}{\sqrt{X_{1i}}} + \frac{U_i}{\sqrt{X_{1i}}} = \beta_0\frac{1}{\sqrt{X_{1i}}} + \beta_1\sqrt{X_{1i}} + V_i$$

where $V_i = U_i/\sqrt{X_{1i}}$ and $X_{1i} > 0$.

Thus,

$$E(V_i^2) = E\left(\frac{U_i}{\sqrt{X_{1i}}}\right)^2 = \frac{1}{X_{1i}}E(U_i^2) = \frac{1}{X_{1i}}\sigma^2 X_{1i} = \sigma^2$$

Now, since the variance of $V_i$ is constant (homoscedastic), one can apply the OLS technique to regress $\frac{Y_i}{\sqrt{X_{1i}}}$ on $\frac{1}{\sqrt{X_{1i}}}$ and $\sqrt{X_{1i}}$.

To go back to the original model, one can simply multiply the transformed model by $\sqrt{X_{1i}}$.

Assumption Three: Given the model, let us assume that

$$E(U_i^2) = \sigma^2 [E(Y_i)]^2$$

i.e. the variance is proportional to the square of the expected value of Y, where $E(Y_i) = \beta_0 + \beta_1 X_{1i}$.

We can transform the original model as

$$\frac{Y_i}{E(Y_i)} = \frac{\beta_0}{E(Y_i)} + \beta_1\frac{X_{1i}}{E(Y_i)} + \frac{U_i}{E(Y_i)} = \beta_0\frac{1}{E(Y_i)} + \beta_1\frac{X_{1i}}{E(Y_i)} + V_i$$

Again it can be verified that $V_i = \frac{U_i}{E(Y_i)}$ gives us a constant variance $\sigma^2$:

$$E(V_i^2) = E\left(\frac{U_i}{E(Y_i)}\right)^2 = \frac{1}{[E(Y_i)]^2}E(U_i^2) = \frac{\sigma^2[E(Y_i)]^2}{[E(Y_i)]^2} = \sigma^2$$

The disturbance $V_i$ is homoscedastic and the regression can be run.

Assumption Four: If, instead of running the regression $Y_i = \beta_0 + \beta_1 X_{1i} + U_i$, one runs

$$\ln Y_i = \beta_0 + \beta_1 \ln X_{1i} + U_i$$

then the log transformation often reduces heteroscedasticity.
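The first transformation above can be illustrated with a few lines of Python (a sketch under Assumption One; the names are ours):

```python
# A sketch of WLS when var(U_i) is proportional to X_i^2: divide everything by X_i.
import numpy as np

def wls_var_prop_x_squared(Y, X):
    Yt, Z = Y / X, 1.0 / X                 # transformed model: Y/X = b0*(1/X) + b1 + V
    z, yt = Z - Z.mean(), Yt - Yt.mean()
    b0 = (z * yt).sum() / (z ** 2).sum()   # slope on 1/X estimates the original intercept
    b1 = Yt.mean() - b0 * Z.mean()         # intercept here estimates the original slope
    return b0, b1
```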
b) Other Remedies for Heteroscedasticity
Two other approaches could be adopted to remove the effect of heteroscedasticity.
 Include a previously omitted variable(s) if heteroscedasticity is suspected due to
omission of variables.
 Redefine the variables in such a way that avoids heteroscedasticity. For example,
instead of total income, we can use Income per capita.

Autocorrelation: Error Terms are correlated


Another assumption of the regression model was the non-existence of serial correlation
(autocorrelation) between the disturbance terms, Ui.

Cov(U i ,V j )  0 i  j

Serial correlation implies that the error term from one time period depends in some
systematic way on error terms from other time periods. Autocorrelation is more a
problem of time series data than cross-sectional data. If by chance, such a correlation is
observed in cross-sectional units, it is called spatial autocorrelation. So, it is important to
understand serial correlation and its consequences of the OLS estimators.

Nature of Autocorrelation
The classical model assumes that the disturbance term relating to any one observation is not influenced by the disturbance term relating to any other observation:

$$E(U_iU_j) = 0, \quad i \neq j$$

But if there is any interdependence between the disturbance terms, then we have autocorrelation:

$$E(U_iU_j) \neq 0, \quad i \neq j$$

Causes of Autocorrelation
Serial correlation may occur because of a number of reasons.
 Inertia (built in momentum) – a salient feature of most economic variables time
series (such as GDP, GNP, price indices, production, employment etc) is inertia
or sluggishness. Such variables exhibit (business) cycles.
 Specification bias – exclusion of important variables or incorrect functional forms
 Lags – in a time series regression, the value of a variable for a certain period depends on the variable's previous period value.
 Manipulation of data – if the raw data is manipulated (extrapolated or
interpolated), autocorrelation might result.

Autocorrelation can be negative as well as positive. The most common kind of serial correlation is first-order serial correlation, in which the current period's error term is a function of the previous period's error term:

$$\varepsilon_t = \rho\varepsilon_{t-1} + U_t, \qquad -1 < \rho < 1$$

This is also called the first-order autoregressive scheme. The disturbance term $U_t$ satisfies all the basic assumptions of the classical linear model:

$$E(U_t) = 0, \quad E(U_tU_{t-s}) = 0 \text{ for } s \neq 0, \quad U_t \sim N(0, \sigma^2)$$

Consequences of serial correlation


When the disturbance term exhibits serial correlation, the values as well as the standard errors of the parameters are affected.
1) The estimates of the parameters remain unbiased even in the presence of autocorrelation, provided the X's and the u's are uncorrelated.


2) Serial correlation increases the variance of the OLS estimators. The minimum
variance property of the OLS parameter estimates is violated. That means the
OLS are no longer efficient.

[Figure 3: The distribution of $\hat{\beta}$ with and without serial correlation; the sampling distribution is more spread out under serial correlation, i.e. $Var(\hat{\beta})_{\text{with}} > Var(\hat{\beta})_{\text{without}}$.]

3) Due to serial correlation the variance of the disturbance term, Ui may be

underestimated. This problem is particularly pronounced when there is positive


autocorrelation.
4) If the Uis are autocorrelated, then prediction based on the ordinary least squares

estimates will be inefficient. This is because of larger variance of the parameters.


Since the variances of the OLS estimators are not minimal as compared with
other estimators, the standard error of the forecast from the OLS will not have
the least value.
Detecting Autocorrelation
Some rough idea about the existence of autocorrelation may be gained by plotting the
residuals either against their own lagged values or against time.


[Figure 4: Graphical detection of autocorrelation; plots of $e_t$ against $e_{t-1}$. Points concentrated where $e_t$ and $e_{t-1}$ share the same sign indicate positive autocorrelation; points concentrated where they have opposite signs indicate negative autocorrelation.]

There are more accurate tests for the incidence of autocorrelation. The most common
test of autocorrelation is the Durbin-Watson Test.

The Durbin-Watson d Test


The test for serial correlation that is most widely used is the Durbin-Watson d test. This test is appropriate only for the first-order autoregressive scheme

$$\varepsilon_t = \rho\varepsilon_{t-1} + U_t$$

The test may be outlined as:

$$H_0: \rho = 0, \qquad H_1: \rho \neq 0$$

This test is, however, applicable where the underlying assumptions are met:
 The regression model includes an intercept term
 The serial correlation is first order in nature
 The regression does not include the lagged dependent variable as an explanatory
variable
 There are no missing observations in the data
The equation for the Durbin-Watson d statistic is

d = Σt=2..N (et − et−1)² / Σt=1..N et²

which is simply the ratio of the sum of squared differences in successive residuals to
the residual sum of squares (RSS).
Note that the numerator has one fewer observation than the denominator, because an
observation must be used to calculate et−1. A great advantage of the d-statistic is that it
is based on the estimated residuals; thus it is often reported together with R², t, etc.
The d-statistic equals zero if there is extreme positive serial correlation, two if there is
no serial correlation, and four if there is extreme negative correlation.
1. Extreme positive serial correlation: d ≈ 0
   et ≈ et−1, so (et − et−1) ≈ 0 and d ≈ 0.

2. Extreme negative correlation: d ≈ 4
   et ≈ −et−1, so (et − et−1) ≈ 2et, and thus

   d ≈ Σ(2et)² / Σet² = 4

3. No serial correlation: d ≈ 2

   d = [ Σet² + Σet−1² − 2Σ et et−1 ] / Σet² ≈ 2

   since Σ et et−1 ≈ 0 (the residuals are uncorrelated), and since Σet² and Σet−1² differ in
   only one observation, they are approximately equal.

The exact sampling or probability distribution of the d-statistic is not known and,
therefore, unlike the t, χ² or F tests, there are no unique critical values which will lead to
the acceptance or rejection of the null hypothesis.


But Durbin and Watson have derived lower and upper bounds (dL and dU) such that if
the computed value d lies outside these critical values, a decision can be made
regarding the presence of positive or negative serial autocorrelation.

Thus

d = [ Σet² + Σet−1² − 2Σ et et−1 ] / Σet²
  ≈ 2 ( 1 − Σ et et−1 / Σ et−1² )
  = 2 (1 − ρ̂),   since ρ̂ = Σ et et−1 / Σ et−1²

But, since −1 ≤ ρ ≤ 1, the above identity can be written as 0 ≤ d ≤ 4.


Therefore, the bounds of d must lie within these limits.

[Figure: the decision regions of the d statistic from 0 to 4 – reject H0 (positive
autocorrelation) for d below dL; zone of indecision between dL and dU; accept H0
(no serial correlation) between dU and 4−dU; zone of indecision between 4−dU and
4−dL; reject H0 (negative autocorrelation) for d above 4−dL.]

Thus:
if ρ̂ = 0, then d = 2: no serial autocorrelation;
if ρ̂ = 1, then d = 0: evidence of positive autocorrelation;
if ρ̂ = −1, then d = 4: evidence of negative autocorrelation.


Decision Rules for the Durbin-Watson d-test

Null hypothesis                 Decision         If
No positive autocorrelation     Reject           0 < d < dL
No positive autocorrelation     No decision      dL ≤ d ≤ dU
No negative autocorrelation     Reject           4−dL < d < 4
No negative autocorrelation     No decision      4−dU ≤ d ≤ 4−dL
No autocorrelation              Do not reject    dU < d < 4−dU

Note: Other tests for autocorrelation include the Runs test and the Breusch-Godfrey
(BG) test. There are many tests of autocorrelation because no particular test has been
judged to be unequivocally best or most powerful in the statistical sense.
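As a minimal sketch, the d statistic can be computed directly from the OLS residuals; statsmodels also provides a durbin_watson helper that implements the same formula. The data-generating process below is hypothetical, constructed with AR(1) errors so that the test has something to find.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
n = 200
u = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + u[t]            # AR(1) disturbances, rho = 0.7 (illustrative)

x = rng.normal(size=n)
y = 1.0 + 2.0 * x + e                       # hypothetical data-generating process
res = sm.OLS(y, sm.add_constant(x)).fit()

d = np.sum(np.diff(res.resid) ** 2) / np.sum(res.resid ** 2)
print(d, durbin_watson(res.resid))          # identical values, well below 2 => positive autocorrelation
# Compare d with the tabulated bounds dL and dU for the given N and number of regressors.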

Remedial Measures for Autocorrelation


Since in the presence of serial correlation the OLS estimators are inefficient, it is
essential to seek remedial measures.
1) The solution depends on the source of the problem.
 If the source is omitted variables, the appropriate solution is to include these
variables in the set of explanatory variables.
 If the source is misspecification of the mathematical form the relevant
approach will be to change the form.
2) If these sources are ruled out then the appropriate procedure will be to transform
the original data so as to produce a model whose random variable satisfies the
assumptions of non-autocorrelation. But the transformation depends on the pattern
of autoregressive structure. Here we deal with first order autoregressive scheme.
εt = ρ εt−1 + Ut


For such a scheme the appropriate transformation is to subtract from the original
observations of each period the product of ρ̂ times the value of the variables in the
previous period:

Yt* = b0* + b1 X1t* + ... + bK XKt* + Vt

where:
Yt* = Yt − ρ̂ Yt−1
Xit* = Xit − ρ̂ Xi,t−1
Vt = Ut − ρ Ut−1
b0* = b0 (1 − ρ)

Thus, if the structure of the autocorrelation is known, it is possible to make the above
transformation. But often the structure of the autocorrelation is not known, so we
need to estimate ρ in order to make the transformation.
When ρ is not known
There are different ways of estimating the autocorrelation coefficient ρ if it is unknown.
1) Estimation of ρ from the d-statistic

Recall that d ≈ 2(1 − ρ̂), or ρ̂ ≈ 1 − d/2

which suggests a simple way of obtaining an estimate of ρ from the estimated d statistic.
Once an estimate of ρ is available, one can proceed with the OLS estimation of the
parameters after making the necessary transformation.
2) Durbin's two step method

Step 1: start from the transformed (generalized difference) model

Yt − ρYt−1 = β0(1 − ρ) + β1(X1t − ρX1,t−1) + ... + βK(XKt − ρXK,t−1) + Ut*

where Ut* = Ut − ρUt−1 = Vt. Rearranging and setting

β0(1 − ρ) = a0
β1 = a1
−ρβ1 = a2
etc.

the above equation may be written as

Yt = a0 + ρYt−1 + a1X1t + a2X1,t−1 + ... + Vt

Applying OLS to this equation, we obtain an estimate ρ̂, which is the coefficient of the
lagged variable Yt−1.

Step 2: We use this estimate ρ̂ to obtain the transformed variables

Yt − ρ̂Yt−1 = Yt*
X1t − ρ̂X1,t−1 = X1t*
...
XKt − ρ̂XK,t−1 = XKt*

and use the transformed model to estimate the parameters of the original relationship:

Yt* = b0* + b1 X1t* + ... + bK XKt* + Vt

The methods discussed above for solving the problem of serial autocorrelation are
basically two step methods. In step 1, we obtain an estimate of the unknown ρ, and in
step 2, we use that estimate to transform the variables and estimate the generalized
difference equation.
Note: The Cochrane-Orcutt Iterative Method is also another method.
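A minimal sketch of the two-step idea, continuing the Durbin-Watson example above (so y, x, res, np and sm are already defined): ρ is estimated from the OLS residuals and OLS is then applied to the quasi-differenced data. This is a single pass, not the full iterative Cochrane-Orcutt procedure.

# Step 1: estimate rho by regressing e_t on e_{t-1} (through the origin)
ehat = res.resid
rho_hat = np.sum(ehat[1:] * ehat[:-1]) / np.sum(ehat[:-1] ** 2)

# Step 2: quasi-difference the data and re-estimate by OLS
y_star = y[1:] - rho_hat * y[:-1]
x_star = x[1:] - rho_hat * x[:-1]
res_star = sm.OLS(y_star, sm.add_constant(x_star)).fit()
print(rho_hat, res_star.params)   # the intercept estimates b0*(1 - rho); the slope estimates b1 directly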

Multicollinearity: Exact linear correlation between Regressors


One of the classical assumptions of the regression model is that no explanatory
variable is a perfect linear function of one or more of the other explanatory variables.
If this assumption is violated we have the problem of multicollinearity. If the
explanatory variables are perfectly linearly correlated, the parameters become
indeterminate: it is impossible to find numerical values for each parameter, and the
method of estimation breaks down.


If the correlation coefficient is 0, the variables are called orthogonal; there is no problem
of multicollinearity. Neither of the above two extreme cases is often met. But some
degree of inter-correlation is expected among the explanatory variables, due to the
interdependence of economic variables.

Multicollinearity is not a condition that either exists or does not exist in economic
functions, but rather a phenomenon inherent in most relationships due to the nature of
economic magnitudes. But there is no conclusive evidence which suggests that a certain
degree of multicollinearity will seriously affect the parameter estimates.

Reasons for Existence of Multicollinearity


There is a tendency of economic variables to move together over time. For example,
Income, consumption, savings, investment, prices, employment tend to rise in the
period of economic expansion and decrease in a period of recession. The use of lagged
values of some explanatory variables as separate independent factors in the relationship
also causes multicollinearity problems.
Example: Consumption = f(Yt, Yt-1, ...)

Thus, it can be concluded that multicollinearity is expected in economic variables.


Although multicollinearity is present also in cross-sectional data it is more a problem of
time series data.

Consequences of Multicollinearity
Recall that, if the assumptions of the classical linear regression model are satisfied, the
OLS estimators of the regression estimators are BLUE. As stated above if there is perfect
multicollinearity between the explanatory variables, then it is not possible to determine
the regression coefficients and their standard errors. But if collinearity among the X-
variables is high, but not perfect, then the following might be expected.
Nevertheless, the effect of collinearity is controversial and by no means conclusive.


1) The estimates of the coefficients are statistically unbiased. Even if an equation
has significant multicollinearity, the estimates of the parameters will still be
centered around the true population parameters.
2) When multicollinearity is present in a function, the variances and therefore the
standard errors of the estimates will increase, although some econometricians
argue that this is not always the case.

[Figure: the sampling distribution of β̂ is more spread out with severe multicollinearity
than without it.]

3) The computed t-ratios will fall, i.e., insignificant t-ratios will be observed in the
presence of multicollinearity. Since t = β̂ / SE(β̂), as SE(β̂) increases, t falls. Thus
one may increasingly accept the null hypothesis that the relevant true
population value is zero,

H0: βi = 0

Because of the high variances of the estimates, the null hypothesis would too
often be accepted.

4) A high R² but few significant t-ratios are expected in the presence of
multicollinearity: one or more of the partial slope coefficients are individually
statistically insignificant on the basis of the t-test, yet the R² may be very high.
Indeed, this is one of the signals of multicollinearity – insignificant t-values
but high overall R² and F-values. Thus, because multicollinearity has


little effect on the overall fit of the equation, it will also have little effect on the
use of that equation for prediction or forecasting.

Detecting Multicollinearity
Having studied the nature of multicollinearity and the consequences of
multicollinearity, the next question is how to detect multicollinearity. The main purpose
in doing so is to decide how much multicollinearity exists in an equation, not whether
any multicollinearity exists. So the important question is the degree of multicollinearity.
But there is no one unique test that is universally accepted. Instead, we have some rules
of thumb for assessing the severity and importance of multicollinearity in an equation.
Some of the most commonly used approaches are the following:

1) High R² but few significant t-ratios

This is the classical test or symptom of multicollinearity. Often, if R² is high (R² > 0.8),
the F-test in most cases will reject the hypothesis that the partial slope coefficients are
simultaneously equal to zero, but the individual t-tests will show that none or very few
partial slope coefficients are statistically different from zero. In other words,
multicollinearity that is severe enough to substantially lower t-scores does very little to
decrease R² or the F-statistic.

So the combination of a high R² with low calculated t-values for the individual regression
coefficients is an indicator of the possible presence of severe multicollinearity.
Drawback: a non-multicollinear explanatory variable may still have a significant
coefficient even if there is multicollinearity between two or more other explanatory
variables. Thus, equations with high levels of multicollinearity will often have one or
two regression coefficients significantly different from zero, making the "high R²,
low t" rule a poor indicator in such cases.
2) High pair-wise (simple) correlation coefficients among the regressors (explanatory
variables)


If the pair-wise correlation coefficients are high in absolute value, then it is highly
probable that the X's are highly correlated and that multicollinearity is a potential
problem. The question is how high r should be to suggest multicollinearity. Some
suggest that if r is in excess of 0.80, then multicollinearity could be suspected.
Another rule of thumb is that multicollinearity is a potential problem when the squared
simple correlation coefficient is greater than the unadjusted R²: two X's are severely
multicollinear if

(rXiXj)² > R²

A major problem of this approach is that although high zero-order correlations may
suggest collinearity, it is not necessary that they be high to have collinearity in any
specific case.
3) VIF and Tolerance
The variance inflation factor (VIF) shows the speed with which the variances and
covariances increase; it shows how the variance of an estimator is inflated by the
presence of multicollinearity. The VIF is defined as follows:

VIF = 1 / (1 − r²23)

where r23 is the correlation between the two explanatory variables. As r²23 approaches 1,
the VIF approaches infinity. If there is no collinearity, the VIF will be 1. As a rule of
thumb, a VIF value of 10 or more shows that multicollinearity is a severe problem.
Tolerance is defined as the inverse of the VIF.
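A short Python sketch computing VIFs with the variance_inflation_factor helper in statsmodels (the data are made up, with x2 constructed to be nearly collinear with x1; all variable names are hypothetical):

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)     # nearly collinear with x1 by construction
x3 = rng.normal(size=n)                     # essentially orthogonal regressor
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

for i, name in enumerate(X.columns):
    print(name, variance_inflation_factor(X.values, i))
# x1 and x2 show VIFs far above 10 (severe); x3's VIF is close to 1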
4) Other more formal tests for multicollinearity
The use of formal tests to give any indications of the severity of the multicollinearity in
a particular sample is controversial. Some econometricians reject even the simple
indicators developed above, mainly because of the limitations cited. Some people tend
to use a number of more formal tests. But none of these is accepted as the best.

Remedies for Multicollinearity


There is no automatic answer to the question "what can be done to minimize the
problem of multicollinearity?" The possible solutions which might be adopted if
multicollinearity exists in a function vary depending on the severity of the
multicollinearity, on the availability of other data sources, on the importance of the
factors which are multicollinear, and on the purpose for which the function is used.
However, some alternative remedies could be suggested for reducing the effect of
multicollinearity.
1) Do Nothing
Some writers have suggested that if multicollinearity does not seriously affect the
estimates of the coefficients one may tolerate its presence in the function. In a sense,
multicollinearity is similar to a non-life threatening human disease that requires an
operation only if the disease is causing a significant problem. A remedy for
multicollinearity should only be considered if and when the consequences cause
insignificant t-scores or wildly unreliable estimated coefficients.
2) Dropping one or more of the multicollinear variables
When faced with severe multicollinearity, one of the simplest remedies is to get rid of
(drop) one or more of the collinear variables. Since multicollinearity is caused by
correlation between the explanatory variables, if the multicollinear variables are
dropped, the correlation no longer exists.
Some people argue that dropping a variable from the model may introduce
specification error or specification bias. According to them, since the OLS estimators are
still BLUE despite near collinearity, omitting a variable may seriously mislead us as to
the true values of the parameters.
Example: If economic theory says that income and wealth should both be included in
the model explaining the consumption expenditure, dropping the wealth variable
would constitute specification bias.
3) Transformation of the variables
If the variables involved are all extremely important on theoretical grounds, neither
doing nothing nor dropping a variable could be helpful. But it is sometimes possible to


transform the variables in the equation to get rid of at least some of the
multicollinearity.
Two common such transformations are:
(i) to form a linear combination of the multicollinear variables
(ii) to transform the equation into first differences (or logs)
The technique of forming a linear combination of two or more of the multicollinear
variables consists of:

 creating a new variable that is a function of the multicollinear variables

 using the new variable to replace the old ones in the regression equation (if X1
and X2 are highly multicollinear, a new variable X3 = X1 + X2, or X3 = K1X1 + K2X2,
might be substituted for both of the multicollinear variables in a re-estimation of
the model)
The second kind of transformation to consider as possible remedy for severe
multicollinearity is to change the functional form of the equation.
A first difference is nothing more than the change in a variable from the previous time
period:

ΔXt = Xt − Xt−1

If an equation (or some of the variables in an equation) is switched from its normal
specification to a first difference specification, it is quite likely that the degree of
multicollinearity will be significantly reduced for two reasons.
 Since multicollinearity is a sample phenomenon, any change in the definitions of
the variables in that sample will change the degree of multicollinearity.
 Multicollinearity takes place most frequently in time-series data, in which first
differences are far less likely to move steadily upward than are the aggregates
from which they are calculated.
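A minimal sketch of the first-difference transformation (the series is hypothetical, chosen so that the trend in the levels disappears in the differences):

import pandas as pd

x = pd.Series([100.0, 104.0, 107.0, 111.0, 114.0, 118.0])  # hypothetical trending series
dx = x.diff()                    # first differences: X_t - X_{t-1}
print(dx.tolist())               # [nan, 4.0, 3.0, 4.0, 3.0, 4.0] -- the steady upward trend is gone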
(4) Increase the sample size


Another solution to reduce the degree of multicollinearity is to attempt to increase the
size of the sample. A larger data set (often requiring new data collection) will allow more
accurate estimates than a small one, since the larger sample normally reduces somewhat
the variance of the estimated coefficients, thereby reducing the impact of
multicollinearity. But for most economic and business applications this solution is not
feasible, as new data are generally impossible or quite expensive to find. One way to
increase the sample is to pool cross-sectional and time series data.

5) Other Remedies
There are several other methods suggested to reduce the degree of multicollinearity.
Often multivariate statistical techniques such as Factor analysis and Principal
component analysis or other techniques such as ridge regression are often employed to
solve the problem of multicollinearity.

Summary
The Ordinary Least Squares method works when the assumptions of the classical linear
regression model hold. One of the critical assumptions of the classical linear regression
model is that the disturbances all have the same variance; its violation leads to
heteroscedasticity. Heteroscedasticity does not destroy the unbiasedness and
consistency properties of the OLS estimators, but it does destroy the efficiency property.
There are several diagnostic tests available for detecting it, but one cannot tell for sure
which will work in a given situation. Even though it is detected, it is not easy to correct;
transforming the data might be a possible way out. The other assumption is that there is
no multicollinearity (exact or approximately exact linear relationship) among the
explanatory variables. If there is perfect collinearity, the regression coefficients are
indeterminate. Although there are no sure methods of detecting collinearity, there are
several indicators of it; the clearest sign is when R² is very high but none of the
regression coefficients is statistically significant. Detection of multicollinearity is half the
battle; the other half is concerned with how to get rid of it. Although there are no sure
methods, there are a few rules of thumb, such as the use of extraneous or a priori
information, omitting a highly collinear variable, transforming the data and increasing
the sample size. Serial autocorrelation is an econometric problem which arises when the
disturbance terms are correlated with each other. It might be caused by the sluggishness
of economic time series, specification bias, or data massaging and transformation. Even
when it exists, the OLS estimators may remain unbiased, consistent and normally
distributed, but they are not efficient. There are formal and informal methods of
detecting autocorrelation, of which the Durbin-Watson d test is the most popular. The
remedy depends on the nature of the interdependence among the disturbance terms.


Chapter Six

Non-Linear Regression Analysis


The relationship between the dependent variable Y and the independent variable X can
be non-linear rather than linear.

Some Commonly Used Non- Linear Models

a) Log-linear, double Log or constant elasticity model

The most common functional form that is non-linear in the variables (but still linear in
the coefficients) is the log-linear form. A log-linear form is often used because the
elasticities, and not the slopes, are constant, i.e.

η = %Δ output / %Δ input = constant

Thus, given the assumption of a constant elasticity, the proper form is the exponential
(log-linear) form.

Given: Yi = β0 Xi^β1 e^Ui

The log-linear functional form for the above equation can be obtained by a logarithmic
transformation of the equation:

ln Yi = ln β0 + β1 ln Xi + Ui
The model can be estimated by OLS if the basic assumptions are fulfilled.


[Figure: the demand curve Yi = β0 Xi^β1 plotted against price in levels, and the same
relationship ln Yi = ln β0 + β1 ln Xi, which is a straight line in the logs.]

The model is also called a constant elasticity model because the coefficient of elasticity
between Y and X (β1) remains constant:

(ΔY/ΔX) · (X/Y) = d lnY / d lnX = β1
This functional form is used in the estimation of demand and production functions.

Note: We should make sure that there are no negative or zero observations in the data
set before we decide to use the log-linear model. Thus log-linear models should be run
only if all the variables take on positive values.
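A minimal sketch of estimating a constant-elasticity (log-log) model in Python (the price and quantity figures are hypothetical; the slope estimate is the price elasticity):

import numpy as np
import statsmodels.api as sm

price = np.array([2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])            # hypothetical prices
quantity = np.array([95.0, 80.0, 70.0, 62.0, 56.0, 51.0, 47.0])  # hypothetical quantities

fit = sm.OLS(np.log(quantity), sm.add_constant(np.log(price))).fit()
print(fit.params)   # [ln b0, b1]; b1 is the constant price elasticity (expected negative)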

b) Semi-log Form

The semi-log functional form is a variant of the log-linear equation in which some but
not all of the variables (dependent and independent) are expressed in terms of their
logs. Models such as

(i) Yi = β0 + β1 ln X1i + Ui   (lin-log model), and
(ii) ln Yi = β0 + β1 X1i + Ui   (log-lin model)

are called semi-log models. The semi-log functional form, in the case of taking the log of
one of the independent variables, can be used to depict a situation in which the impact
of X on Y is expected to 'tail off' as X gets bigger, as long as β1 is greater than zero.


1<0
Y=0+1Xi

1>0

Example: The Engel curve tends to flatten out: as incomes get higher, a smaller
percentage of income goes to consumption and a greater percentage goes to
saving.

 Consumption thus increases at a decreasing rate.

 Growth models are examples of semi-log forms
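As a sketch of the growth-model case just mentioned, a log-lin model regresses ln Y on time, and the slope is approximately the per-period growth rate (the GDP series below is hypothetical):

import numpy as np
import statsmodels.api as sm

gdp = np.array([100.0, 103.0, 106.1, 109.3, 112.6, 116.0])  # hypothetical series, ~3% growth
t = np.arange(len(gdp))
fit = sm.OLS(np.log(gdp), sm.add_constant(t)).fit()
print(fit.params[1])   # slope of ln(GDP) on time: roughly 0.03, i.e. about 3% per period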

c) Polynomial Form

Polynomial functional forms express Y as a function of independent variables some of
which are raised to powers other than one. For example, in a second degree
polynomial (quadratic) equation, at least one independent variable is squared:

Y = β0 + β1X1i + β2X1i² + β3X2i + Ui

Such models produce slopes that change as the independent variables change. Thus the
slopes of Y with respect to the X's are

∂Y/∂X1 = β1 + 2β2X1,  and  ∂Y/∂X2 = β3

In most cost functions, the slope of the cost curve changes as output changes.


[Figure: (A) a typical cost curve; (B) the impact of age on earnings – both exhibit slopes
that change as X changes.]

A simple transformation of the polynomial enables us to use the OLS method to
estimate the parameters of the model. Setting X1² = X3,

Y = β0 + β1X1i + β2X3 + β3X2i + Ui
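A minimal sketch of estimating a quadratic cost-type relationship by OLS after the X² transformation (the output and cost figures are hypothetical):

import numpy as np
import statsmodels.api as sm

output = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])        # hypothetical output levels
cost = np.array([12.0, 15.0, 20.0, 28.0, 39.0, 54.0, 72.0])   # hypothetical total cost

X = sm.add_constant(np.column_stack([output, output ** 2]))   # add X3 = X1^2 as a regressor
fit = sm.OLS(cost, X).fit()
b0, b1, b2 = fit.params
print(b1 + 2 * b2 * output)   # the slope dY/dX1 = b1 + 2*b2*X1 changes with the level of output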

d) Reciprocal Transformation (Inverse Functional Forms)

The inverse functional form expresses Y as a function of the reciprocal (or inverse) of
one or more of the independent variables (in this case X1):

Yi = β0 + β1(1/X1i) + β2X2i + Ui

The reciprocal form should be used when the impact of a particular independent
variable is expected to approach zero as that independent variable increases and
eventually approaches infinity. Thus as X1 gets larger, its impact on Y decreases.


1 0  0
Y  0 
X 1i 1  0

0

1 0  0
Y  0 
X 1i 1  0

An asymptote or limit value is set that the dependent variable will take if the value of
the X-variable increases indefinitely i.e. 0 provides the value in the above case. The

function approaches the asymptote from the top or bottom depending on the sign of 1.

Example: the Phillips curve, a non-linear relationship between the rate of unemployment
and the percentage wage change:

Wt = β0 + β1(1/Ut)
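A minimal sketch of fitting a Phillips-curve-type reciprocal form by OLS (the unemployment and wage-change figures are hypothetical):

import numpy as np
import statsmodels.api as sm

unemp = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0])   # hypothetical unemployment rates (%)
wage = np.array([9.5, 6.2, 4.8, 3.9, 3.4, 2.8, 2.5])     # hypothetical % wage changes

fit = sm.OLS(wage, sm.add_constant(1.0 / unemp)).fit()
print(fit.params)   # [b0, b1]; b0 is the asymptotic wage change as unemployment grows large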

References
Greene, W. H. (2002). Econometric Analysis. 5th Edition. Macmillan, New York.
Maddala, G. S. (1992). Introduction to Econometrics. 2nd Edition.
Gujarati, D. (2004). Basic Econometrics. 4th Edition. Tata McGraw-Hill.
Koutsoyiannis, A. (2001). Theory of Econometrics. 2nd Edition. Replika Press Pvt. Ltd., New Delhi.
Wooldridge, J. (2005). Introductory Econometrics. 3rd Edition.

