Handout - Basic Regression - Analysis
Contents
1 What is Regression?
3 Regression vs Correlation
8 References
1 What is Regression?
“The term regression was introduced by Francis Galton. In a famous paper, Galton found
that, although there was a tendency for tall parents to have tall children and for short parents
to have short children, the average height of children born of parents of a given height tended
to move or “regress” toward the average height in the population as a whole. In other words,
the height of the children of unusually tall or unusually short parents tends to move toward
the average height of the population. Galton’s law of universal regression was confirmed by
his friend Karl Pearson, who collected more than a thousand records of heights of members of
family groups. He found that the average height of sons of a group of tall fathers was less than
their fathers’ height and the average height of sons of a group of short fathers was greater than
their fathers’ height, thus “regressing” tall and short sons alike toward the average height of
all men. In the words of Galton, this was regression to mediocrity.” (Gujarati, 2004)
In simple words, regression analysis is concerned with how one variable (the dependent
variable) is influenced by one or more other (independent) variables. That is, if Y is the
dependent variable and X is an independent variable, then the regression of Y on X simply
means “explaining Y in terms of X”, or “examining how Y varies with changes in X”.
3 Regression vs Correlation
When we talk about correlation, the primary concern is to measure how strongly two
variables are linearly associated. In contrast, regression analysis allows us to estimate
the effect of the independent (explanatory) variable(s) on a dependent variable. Thus, the
fundamental difference between correlation and regression is that the latter distinguishes
between the dependent and independent variables, while the former treats the two
variables symmetrically.
1 “Stochastic Variables are variables that have probability distributions. The word stochastic comes from the Greek
word stokhos meaning “a bull’s eye” wherein the outcome of throwing darts on a dart board is a stochastic process,
that is, it is affected by misses.” (Gujarati, 2004)
4 An Example of a Simple Regression Model
In this section, we take a simple example from Gujarati and Porter’s (2009) book and
modify it. The example is as follows: we assume that the total population (footnote 2)
consists of the 60 students in a batch, and we observe their weekly pocket money (read:
income), X, and their weekly consumption expenditure, Y. The 60 students are divided into
10 income groups ranging from Rs. 800 to Rs. 2600, and the weekly expenditure of each
student is reported under the respective income group (see Table 1). As can be seen from
the table, we have 10 fixed values of X and the corresponding Y values against each of
these X values.
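Table 1: Weekly income (X) and weekly consumption expenditure (Y) of the 60 students, in Rs.

X (weekly income)           800   1000   1200   1400   1600   1800   2000   2200   2400   2600
Y (weekly expenditure)      550    650    790    800   1020   1100   1200   1350   1370   1500
                            600    700    840    930   1070   1150   1360   1370   1450   1520
                            650    740    900    950   1100   1200   1400   1400   1550   1750
                            700    800    940   1030   1160   1300   1440   1520   1650   1780
                            750    850    980   1080   1180   1350   1450   1570   1750   1800
                              -    880      -   1130   1250   1400      -   1600   1890   1850
                              -      -      -   1150      -      -      -   1620      -   1910
Total                      3250   4620   4450   7070   6780   7500   6850  10430   9660  12110
Conditional mean, E(Y|X)    650    770    890   1010   1130   1250   1370   1490   1610   1730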
The following points can be noted from Table 1:
1. Weekly consumption expenditure of students varies even within the same income group.
2. Despite these marked variations, on average the consumption expenditure of a student
increases with a rise in income.
3. Conditional means are the means calculated within a specific income group. Conditional
mean values are also called conditional expected values, as they are conditional on
(depend on) the given values of the variable X.
4. The unconditional mean, on the other hand, is the simple arithmetic mean of the
consumption expenditures of all 60 students, Rs. 72720/60 = Rs. 1212. It is unconditional
in the sense that we do not consider or care about the income levels.

2 The word “population” comes from the fact that we are dealing in this example with the
entire population of 60 students.
The dark circled points in Fig. 1 represent the conditional means of Y for the given
values of X. Joining these points gives us the Population Regression Line (curve). In
simple words, it is the regression of Y on X.
Definition: A population regression curve is simply the locus of the conditional means of the
dependent variable for the fixed values of the explanatory variable(s). (Gujarati, 2004)
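The conditional and unconditional means discussed in this section can be reproduced with a short Python sketch. The name `expenditure_by_income` is chosen here purely for illustration. The income groups are assumed to run from Rs. 800 to Rs. 2600 in steps of Rs. 200, and the first expenditure value of each group is an assumption backed out from the column totals of Table 1.

```python
# Weekly expenditure (Rs.) of the 60 students, keyed by weekly income X (Rs.).
# Assumption: the first value in each list is backed out from the column totals.
expenditure_by_income = {
    800:  [550, 600, 650, 700, 750],
    1000: [650, 700, 740, 800, 850, 880],
    1200: [790, 840, 900, 940, 980],
    1400: [800, 930, 950, 1030, 1080, 1130, 1150],
    1600: [1020, 1070, 1100, 1160, 1180, 1250],
    1800: [1100, 1150, 1200, 1300, 1350, 1400],
    2000: [1200, 1360, 1400, 1440, 1450],
    2200: [1350, 1370, 1400, 1520, 1570, 1600, 1620],
    2400: [1370, 1450, 1550, 1650, 1750, 1890],
    2600: [1500, 1520, 1750, 1780, 1800, 1850, 1910],
}

# Conditional mean E(Y | X): average expenditure within each income group.
conditional_means = {
    x: sum(ys) / len(ys) for x, ys in expenditure_by_income.items()
}

# Unconditional mean: average over all students, ignoring income.
all_y = [y for ys in expenditure_by_income.values() for y in ys]
unconditional_mean = sum(all_y) / len(all_y)

for x, mean_y in conditional_means.items():
    print(f"E(Y | X = {x}) = {mean_y:.0f}")
print(f"Unconditional mean = {unconditional_mean:.0f} (n = {len(all_y)})")
```

The ten conditional means printed by the loop are the points whose locus forms the population regression line.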
5 Population Regression Function
As discussed in the previous section, each conditional mean $E(Y \mid X_i)$ is a function
of $X_i$, where $X_i$ is a given value of $X$:

$$E(Y \mid X_i) = f(X_i) \tag{1}$$

Eq. (1) above is called the Population Regression Function (PRF). In simple words, the
PRF tells us how, on average, Y responds to variations in X.
Now the question arises: what form does the function $f(X_i)$ take? We assume that the
PRF, $E(Y \mid X_i)$, is a linear function of $X_i$ (we will discuss shortly what we mean
by linearity here):

$$E(Y \mid X_i) = \alpha + \beta X_i \tag{2}$$

Eq. (2) above is the linear PRF. Our interest lies in estimating the unknown parameters
$\alpha$ and $\beta$ on the basis of observations on Y and X.
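As a rough check of the linearity assumption for this example, the sketch below fits a straight line through the ten conditional means reported in Table 1. It assumes the income groups are Rs. 800 to Rs. 2600 in steps of Rs. 200 and uses NumPy's `polyfit` only as a convenient line-fitting tool.

```python
import numpy as np

# Income groups and conditional means E(Y | X) from Table 1 (Rs.).
# Assumption: the groups are spaced Rs. 200 apart.
x = np.arange(800, 2800, 200)
cond_mean = np.array([650, 770, 890, 1010, 1130, 1250, 1370, 1490, 1610, 1730])

# Fit a straight line through the ten (X, E(Y|X)) points.
# polyfit returns the coefficients from the highest degree down.
beta, alpha = np.polyfit(x, cond_mean, deg=1)

# If the PRF is exactly linear, the line reproduces every conditional mean.
fitted = alpha + beta * x
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
print("largest gap between line and conditional means:",
      np.max(np.abs(fitted - cond_mean)))
```

For these values the conditional means rise by Rs. 120 for every Rs. 200 of income, so the line has slope 0.6 and intercept 170, and all ten points lie on it.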
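For instance, the functions

$$Y = \alpha + \beta X$$
$$Y = \alpha + \beta X^2$$
$$Y = \alpha + \beta_1 X + \beta_2 X^2$$
$$Y = e^{\alpha + \beta X}$$
$$Y = \alpha + \beta_1 X + \beta_2 X^2 + \beta_3 X^3$$

are all linear in the parameters (the exponential form after taking logarithms,
$\ln Y = \alpha + \beta X$) and hence are linear according to our assumption.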
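To illustrate why linearity in the parameters is what matters for estimation, here is a minimal sketch that fits $Y = \alpha + \beta X^2$ by ordinary least squares. The data are hypothetical and generated without noise from $\alpha = 2$, $\beta = 0.5$; they are not taken from the handout.

```python
import numpy as np

# Hypothetical data generated from Y = 2 + 0.5 * X^2 (no noise), for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 0.5 * x**2

# Design matrix with a constant and the transformed regressor X^2.
# The model is non-linear in X but linear in (alpha, beta).
design = np.column_stack([np.ones_like(x), x**2])

# Least-squares solution of design @ [alpha, beta] = y.
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(design, y, rcond=None)
print(alpha_hat, beta_hat)
```

Because the regressor $X^2$ can be computed before estimation, the problem stays a linear least-squares problem in the two unknowns.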
5.2 Stochastic Specification of the PRF
As depicted in Fig. 1, a student’s weekly consumption expenditure is, on average,
positively related to their income. It is interesting to note, however, that the
expenditure of an individual student does not always increase with income: some students
in a higher income group spend less than some students in a lower income group (see
Table 1). One important observation from Fig. 1 is that, for a given income group $X_i$,
the students’ expenditures are clustered around the conditional mean of expenditure of
that income group. Hence, the deviation of a particular student’s $Y_i$ from its
conditional mean is:

$$u_i = Y_i - E(Y \mid X_i), \quad \text{or equivalently} \quad Y_i = E(Y \mid X_i) + u_i \tag{3}$$

Here, the deviation $u_i$ is an unobservable random variable that can take positive or
negative values. $u_i$ is also known as the stochastic disturbance or stochastic error term.
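From eq. (3), we can conclude that the expenditure of an individual student, given their
income level, is the sum of two components:
• The mean expenditure of all students with the same income level (deterministic component)
• A random component, $u_i$.
Assuming that $E(Y \mid X_i)$ is linear in $X_i$, as in eq. (2), we can write

$$Y_i = E(Y \mid X_i) + u_i = \alpha + \beta X_i + u_i \tag{4}$$

For example, for the five students with $X = 1200$, the individual expenditures can be
expressed as

$$\begin{aligned}
Y_1 = 790 &= \alpha + \beta(1200) + u_1 \\
Y_2 = 840 &= \alpha + \beta(1200) + u_2 \\
Y_3 = 900 &= \alpha + \beta(1200) + u_3 \\
Y_4 = 940 &= \alpha + \beta(1200) + u_4 \\
Y_5 = 980 &= \alpha + \beta(1200) + u_5
\end{aligned} \tag{5}$$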
Now, if we take the expected value of both sides of eq. (4), conditional on $X_i$, we obtain

$$E(Y_i \mid X_i) = E[E(Y \mid X_i) \mid X_i] + E(u_i \mid X_i) = E(Y \mid X_i) + E(u_i \mid X_i) \tag{6}$$

Note that $E(Y \mid X_i)$ is a constant once the value of $X_i$ is fixed, and the expected
value of a constant is the constant itself. Since $E(Y_i \mid X_i)$ is the same thing as
$E(Y \mid X_i)$, eq. (6) implies

$$E(u_i \mid X_i) = 0$$

Hence, the assumption that the regression line is the locus of the conditional means of Y
implies that the conditional mean value of $u_i$ is zero.
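As a worked illustration, take the five students in the X = 1200 income group, whose conditional mean in Table 1 is 890. Their disturbances, computed as deviations from that conditional mean, are:

```latex
\begin{aligned}
u_1 &= 790 - 890 = -100 \\
u_2 &= 840 - 890 = -50 \\
u_3 &= 900 - 890 = 10 \\
u_4 &= 940 - 890 = 50 \\
u_5 &= 980 - 890 = 90 \\
\frac{1}{5}\sum_{i=1}^{5} u_i &= \frac{-100 - 50 + 10 + 50 + 90}{5} = 0
\end{aligned}
```

The deviations average to zero within the group, which is what $E(u_i \mid X_i) = 0$ states for this income level.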
Here, SRF1 is the regression line based on the first sample (Table 2), while SRF2 is
based on the second sample (Table 3). The regression lines in the figure are called
sample regression lines. So far, we know from eq. (2) that the PRF is
$E(Y \mid X_i) = \alpha + \beta X_i$.
Table 2: Sample 1        Table 3: Sample 2
     Y       X                Y       X
   700     800              550     800
   650    1000              880    1000
   900    1200              900    1200
   950    1400              800    1400
  1100    1600             1180    1600
  1150    1800             1200    1800
  1200    2000             1450    2000
  1400    2200             1350    2200
  1550    2400             1450    2400
  1600    2600             1750    2600
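The sample counterpart of the PRF is the sample regression function (SRF):

$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i \tag{7}$$

where
$\hat{Y}_i$ = estimator of $E(Y \mid X_i)$
$\hat{\alpha}$ = estimator of $\alpha$
$\hat{\beta}$ = estimator of $\beta$

In its stochastic form, the SRF can be written as

$$Y_i = \hat{\alpha} + \hat{\beta} X_i + \hat{u}_i$$

where $\hat{u}_i$ is the residual, that is, the sample estimate of the disturbance $u_i$.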
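An estimator is a rule or method that tells us how to estimate a population parameter
from the sample, while the particular numerical values obtained, such as those of
$\hat{\alpha}$ and $\hat{\beta}$, are called estimates. In simple words, our objective is
to estimate the PRF

$$Y_i = \alpha + \beta X_i + u_i$$

on the basis of the SRF. Rewriting the SRF,

$$Y_i = \hat{Y}_i + \hat{u}_i$$

where, from eq. (7),

$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$$

so that $\hat{u}_i = Y_i - \hat{Y}_i$, and the SRF is

$$Y_i = \hat{\alpha} + \hat{\beta} X_i + \hat{u}_i \tag{9}$$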
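To see how two samples from the same population lead to two different SRFs, the sketch below fits a line to each of the samples in Tables 2 and 3. It uses NumPy's `polyfit` as a stand-in for the least squares method derived in Section 7.2; the helper name `fit_srf` is hypothetical, and the sixth X value of Sample 1 is assumed to be 1800, in line with the other income groups.

```python
import numpy as np

# Weekly income X (Rs.), common to both samples.
# Assumption: the sixth X value of Sample 1 is 1800, matching the income groups.
x = np.arange(800, 2800, 200)
y_sample1 = np.array([700, 650, 900, 950, 1100, 1150, 1200, 1400, 1550, 1600])
y_sample2 = np.array([550, 880, 900, 800, 1180, 1200, 1450, 1350, 1450, 1750])


def fit_srf(x, y):
    """Return (alpha_hat, beta_hat) of the least-squares line y = alpha + beta * x."""
    beta_hat, alpha_hat = np.polyfit(x, y, deg=1)
    return alpha_hat, beta_hat


a1, b1 = fit_srf(x, y_sample1)
a2, b2 = fit_srf(x, y_sample2)

# Fitted values and residuals for Sample 1: Y_i = Y_hat_i + u_hat_i.
y_hat1 = a1 + b1 * x
u_hat1 = y_sample1 - y_hat1

print(f"SRF1: Y_hat = {a1:.2f} + {b1:.4f} X")
print(f"SRF2: Y_hat = {a2:.2f} + {b2:.4f} X")
```

The two fitted lines have different intercepts and slopes, which is the sampling fluctuation the text describes: each SRF is only an approximation of the single underlying PRF.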
Figure 3: PRF and SRF as in Gujarati (2004)
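The residuals are given by

$$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - \hat{\alpha} - \hat{\beta} X_i \tag{11}$$

which shows that the residuals $\hat{u}_i$ are the differences between the actual values
$Y_i$ and the estimated values $\hat{Y}_i$. Our main objective is to determine the SRF,
$\hat{Y}_i$, such that it is as close as possible to the actual $Y_i$. In other words, we
might want to choose the SRF for which the sum of the residuals,
$\sum \hat{u}_i = \sum (Y_i - \hat{Y}_i)$, is as small as possible.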
Figure 4: Plot for Deviation of residuals from SRF

However, if we adopt the criterion of minimising $\sum \hat{u}_i$, all the residuals in
Fig. 4 receive equal weight, regardless of how far the individual observations lie from
the SRF. The problem is that even if the residuals are widely scattered, for example
$\hat{u}_1 = 20$, $\hat{u}_2 = -5$, $\hat{u}_3 = 5$ and $\hat{u}_4 = -20$, they add up to
zero, although $\hat{u}_1$ and $\hat{u}_4$ lie much farther from the SRF than $\hat{u}_2$
and $\hat{u}_3$. To overcome this limitation, we employ the least squares criterion, under
which the SRF is chosen in such a way that

$$\sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2 \tag{12}$$

is as small as possible. Squaring gives more weight to large residuals such as
$\hat{u}_1$ and $\hat{u}_4$, and it prevents large positive and negative residuals from
cancelling each other out in the sum.
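The difference between the two criteria can be seen numerically. The sketch below compares the residuals from the text with a second, hypothetical set of residuals that lie much closer to the line; the function name is illustrative only.

```python
def sum_and_sum_of_squares(residuals):
    """Return (sum of residuals, sum of squared residuals)."""
    return sum(residuals), sum(u**2 for u in residuals)


wide = [20, -5, 5, -20]   # residuals used in the text
tight = [2, -1, 1, -2]    # hypothetical residuals that stay close to the line

print(sum_and_sum_of_squares(wide))   # (0, 850)
print(sum_and_sum_of_squares(tight))  # (0, 10)
```

Both sets sum to zero, so the plain sum cannot tell them apart, while the sum of squares clearly separates the poorly fitting line from the well fitting one.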
7.2 Derivation of the least squares estimates
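From equation (12) we know that

$$\sum \hat{u}_i^2 = \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2 \tag{13}$$

Differentiating the above expression partially with respect to $\hat{\alpha}$ and
$\hat{\beta}$ and setting the derivatives equal to zero, we get the first-order conditions

$$\frac{\partial \left(\sum \hat{u}_i^2\right)}{\partial \hat{\alpha}} = 0 \;\Rightarrow\; -2 \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i) = -2 \sum \hat{u}_i = 0 \tag{14}$$

$$\frac{\partial \left(\sum \hat{u}_i^2\right)}{\partial \hat{\beta}} = 0 \;\Rightarrow\; -2 \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i) X_i = -2 \sum \hat{u}_i X_i = 0 \tag{15}$$

Eq. (15) gives

$$\sum (Y_i - \hat{\alpha} - \hat{\beta} X_i) X_i = \sum \hat{u}_i X_i = 0$$

which, together with $\sum \hat{u}_i = 0$ from eq. (14), implies that the sample covariance
between the residuals and the regressor is zero: $\widehat{\mathrm{cov}}(\hat{u}_i, X_i) = 0$.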
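The two first-order conditions can be checked numerically on a fitted line. The sketch below uses the Sample 2 data (Table 3) and NumPy's `polyfit` as the least squares fitter; the variable names are illustrative, and the checks hold only up to floating-point rounding.

```python
import numpy as np

# Sample 2 from Table 3: weekly income X and weekly expenditure Y (Rs.).
x = np.arange(800, 2800, 200)
y = np.array([550, 880, 900, 800, 1180, 1200, 1450, 1350, 1450, 1750])

# Least-squares line and its residuals.
beta_hat, alpha_hat = np.polyfit(x, y, deg=1)
u_hat = y - (alpha_hat + beta_hat * x)

# First-order conditions (14) and (15): both sums should be zero up to rounding.
sum_u = u_hat.sum()
sum_ux = (u_hat * x).sum()

# Sample covariance between residuals and regressor, which should also vanish.
cov_ux = np.mean((u_hat - u_hat.mean()) * (x - x.mean()))

print(sum_u, sum_ux, cov_ux)
```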
We also know that

$$\bar{X} = \frac{\sum X_i}{n} \;\Rightarrow\; \sum X_i = n\bar{X}, \qquad \text{and similarly} \qquad \sum Y_i = n\bar{Y}$$
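Solution 1:

$$\begin{aligned}
\sum (Y_i - \bar{Y})(X_i - \bar{X}) &= \sum (Y_i X_i - Y_i \bar{X} - \bar{Y} X_i + \bar{X}\bar{Y}) \\
&= \sum Y_i X_i - \bar{X} \sum Y_i - \bar{Y} \sum X_i + \sum \bar{X}\bar{Y} \\
&= \sum Y_i X_i - n\bar{Y}\bar{X} - \bar{Y} \sum X_i + n\bar{X}\bar{Y} \quad \left(\text{since } \bar{Y} = \frac{\sum Y_i}{n}\right) \\
&= \sum Y_i X_i - n\bar{Y}\bar{X} - n\bar{X}\bar{Y} + n\bar{X}\bar{Y} \\
&= \sum Y_i X_i - n\bar{Y}\bar{X}
\end{aligned}$$

$$\sum (Y_i - \bar{Y})(X_i - \bar{X}) = \sum Y_i X_i - n\bar{Y}\bar{X} \tag{16}$$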
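Solution 2:

$$\sum (Y_i - \bar{Y}) X_i = \sum (Y_i X_i - \bar{Y} X_i) = \sum Y_i X_i - n\bar{X}\bar{Y}$$

Since $\sum (Y_i - \bar{Y}) = 0$, the term $\bar{X} \sum (Y_i - \bar{Y})$ vanishes, so

$$\sum (Y_i - \bar{Y})(X_i - \bar{X}) = \sum (Y_i - \bar{Y}) X_i = \sum Y_i X_i - n\bar{X}\bar{Y}$$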
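Solution 3:

$$\begin{aligned}
\sum (X_i - \bar{X})^2 &= \sum (X_i - \bar{X})(X_i - \bar{X}) \\
&= \sum (X_i - \bar{X}) X_i \\
&= \sum X_i^2 - n\bar{X}^2
\end{aligned}$$

$$\sum (X_i - \bar{X})^2 = \sum X_i^2 - n\bar{X}^2 \tag{17}$$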
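Identities (16) and (17) are easy to confirm numerically. The following sketch evaluates both sides on the Sample 2 data from Table 3; the variable names are chosen here for illustration.

```python
import numpy as np

# Sample 2 from Table 3: weekly income X and weekly expenditure Y (Rs.).
x = np.arange(800, 2800, 200, dtype=float)
y = np.array([550, 880, 900, 800, 1180, 1200, 1450, 1350, 1450, 1750], dtype=float)
n = len(x)

# Identity (16): sum of cross-deviations versus the raw-sum form.
lhs_16 = np.sum((y - y.mean()) * (x - x.mean()))
rhs_16 = np.sum(y * x) - n * y.mean() * x.mean()

# Identity (17): sum of squared deviations versus the raw-sum form.
lhs_17 = np.sum((x - x.mean()) ** 2)
rhs_17 = np.sum(x**2) - n * x.mean() ** 2

print(lhs_16, rhs_16)
print(lhs_17, rhs_17)
```

The deviation form and the raw-sum form agree in each case, so either can be used when computing the estimates by hand.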
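Solving the first-order condition (14) for $\hat{\alpha}$:
$\sum (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0$ gives
$n\bar{Y} - n\hat{\alpha} - n\hat{\beta}\bar{X} = 0$, and therefore

$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} \tag{18}$$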
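Similarly, from the first-order condition in eq. (15):

$$\begin{aligned}
-2 \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i) X_i &= 0 \\
\Rightarrow \sum (Y_i X_i - \hat{\alpha} X_i - \hat{\beta} X_i^2) &= 0 \\
\Rightarrow \sum Y_i X_i - \hat{\alpha} \sum X_i - \hat{\beta} \sum X_i^2 &= 0 \\
\Rightarrow \sum Y_i X_i - (\bar{Y} - \hat{\beta}\bar{X})\, n\bar{X} - \hat{\beta} \sum X_i^2 &= 0 \\
\Rightarrow \sum Y_i X_i - n\bar{X}\bar{Y} + n\hat{\beta}\bar{X}^2 - \hat{\beta} \sum X_i^2 &= 0
\end{aligned}$$

Collecting the terms in $\hat{\beta}$ and using eqs. (16) and (17),

$$\hat{\beta} = \frac{\sum Y_i X_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2} = \frac{\sum (Y_i - \bar{Y})(X_i - \bar{X})}{\sum (X_i - \bar{X})^2} = \frac{\widehat{\mathrm{cov}}(Y_i, X_i)}{\widehat{\mathrm{var}}(X_i)}$$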
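The closed-form estimators can be turned into a few lines of Python. This is a minimal sketch, not a full regression routine: the function name `ols_simple` is hypothetical, the data are Sample 2 from Table 3, and NumPy's `polyfit` is used only as an independent cross-check.

```python
import numpy as np


def ols_simple(x, y):
    """Closed-form OLS for y = alpha + beta * x.

    beta_hat  = sum((y - ybar) * (x - xbar)) / sum((x - xbar)^2)
    alpha_hat = ybar - beta_hat * xbar
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    beta_hat = np.sum(y_dev * x_dev) / np.sum(x_dev**2)
    alpha_hat = y.mean() - beta_hat * x.mean()
    return alpha_hat, beta_hat


# Sample 2 from Table 3: weekly income X and weekly expenditure Y (Rs.).
x = np.arange(800, 2800, 200)
y = np.array([550, 880, 900, 800, 1180, 1200, 1450, 1350, 1450, 1750])

alpha_hat, beta_hat = ols_simple(x, y)

# Independent cross-check with NumPy's polynomial fit (slope first, then intercept).
slope, intercept = np.polyfit(x, y, deg=1)

print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.4f}")
```

For Sample 2 this gives a slope of about 0.576 and an intercept of about 171.7, so each additional rupee of weekly income is associated with roughly 58 paise of additional weekly expenditure in this sample.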
8 References
• Gujarati, D. (2004). Basic Econometrics (4th ed.). McGraw-Hill, New York, 109.