Dougherty, Introduction to Econometrics, 5th edition
Chapter 1: Simple Regression Analysis
[Figure: scatter diagram of hourly earnings ($), axis from 0 to 120, against years of schooling (highest grade completed), axis from 0 to 20]
The scatter diagram shows hourly earnings in 2011 plotted against years of schooling,
defined as highest grade completed, for a sample of 500 respondents from the National
Longitudinal Survey of Youth 1997–.
INTERPRETATION OF A REGRESSION EQUATION
Highest grade completed means just that for elementary and high school. Grades 13, 14,
and 15 mean completion of one, two and three years of college.
. reg EARNINGS S

      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  1,   498) =   46.57
       Model |  6014.04474     1  6014.04474           Prob > F      =  0.0000
    Residual |  64314.9215   498  129.146429           R-squared     =  0.0855
-------------+------------------------------           Adj R-squared =  0.0837
       Total |  70328.9662   499  140.939812           Root MSE      =  11.364

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   1.265712   .1854782     6.82   0.000     .9012959    1.630128
       _cons |   .7646844   2.803765     0.27   0.785    -4.743982    6.273351
------------------------------------------------------------------------------
This is the output from a regression of earnings on years of schooling, using Stata.
For the time being, we will be concerned only with the estimates of the parameters. The
variables in the regression are listed in the first column and the second column gives the
estimates of their coefficients.
In this case there is only one variable, S, and its coefficient is 1.27. _cons, in Stata, refers
to the constant. The estimate of the intercept is 0.76.
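The rounded figures quoted here can be read straight off the Coef. column. A minimal Python sketch, assuming nothing beyond the two estimates in the Stata output; the helper name `fitted_earnings` is hypothetical and chosen only for illustration:

```python
# Estimates copied from the Coef. column of the Stata output.
B_CONS = 0.7646844   # _cons: estimate of the intercept
B_S = 1.265712       # S: estimate of the slope coefficient


def fitted_earnings(s: float) -> float:
    """Fitted hourly earnings ($) for s years of schooling (hypothetical helper)."""
    return B_CONS + B_S * s


# The values quoted in the text are these estimates rounded to two decimals.
equation = f"EARNINGS-hat = {B_CONS:.2f} + {B_S:.2f} S"
print(equation)  # EARNINGS-hat = 0.76 + 1.27 S
```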
[Figure: scatter diagram of hourly earnings ($) against years of schooling (highest grade completed), with the fitted regression line $\widehat{\text{EARNINGS}} = 0.76 + 1.27\,S$]
Here is the scatter diagram again, with the regression line shown.
To interpret the regression coefficients, you must refer to the units in which the variables are measured.
[Figure: close-up of the regression line between 11 and 12 years of schooling (highest grade completed). Over that one year, fitted hourly earnings rise by $1.27, from $14.69 to $15.95.]
The regression line indicates that completing 12th grade instead of 11th grade would
increase earnings by $1.27, from $14.69 to $15.95, as a general tendency. (Yes, there is a
discrepancy of 0.01, caused by rounding error.)
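The rounding discrepancy can be reproduced from the unrounded estimates in the Stata output. The following Python sketch is only an arithmetic check; the variable names are chosen for illustration:

```python
b_cons, b_s = 0.7646844, 1.265712  # unrounded estimates from the Stata output

e11 = b_cons + b_s * 11  # fitted hourly earnings with 11th grade completed
e12 = b_cons + b_s * 12  # fitted hourly earnings with 12th grade completed

levels = f"{e11:.2f} {e12:.2f}"                       # each level rounded separately
exact_diff = f"{e12 - e11:.2f}"                       # exact difference, then rounded
rounded_diff = f"{round(e12, 2) - round(e11, 2):.2f}"  # difference of the rounded levels

print(levels)        # 14.69 15.95
print(exact_diff)    # 1.27
print(rounded_diff)  # 1.26
```

The slope is 1.27 when rounded on its own, while the two rounded levels differ by only 1.26, which is the 0.01 discrepancy mentioned above.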
You should ask yourself whether this is a plausible figure. If it is implausible, this could be
a sign that your model is misspecified in some way.
For low levels of education it might be plausible. But for high levels it would seem to be an
underestimate.
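One way to see what the linear specification implies is to compare the fitted increment at a low and a high grade. This Python sketch uses the estimates from the Stata output; the grades 9 and 17 are picked purely for illustration and are not singled out in the source:

```python
b_cons, b_s = 0.7646844, 1.265712  # estimates copied from the Stata output


def fitted(s):
    """Fitted hourly earnings ($) on the regression line."""
    return b_cons + b_s * s


results = {}
for grade in (9, 17):  # illustrative grades only
    step = fitted(grade) - fitted(grade - 1)   # dollar gain from one more year
    pct = 100 * step / fitted(grade - 1)       # the same gain in percentage terms
    results[grade] = (step, pct)
    print(f"grade {grade - 1} -> {grade}: +${step:.2f} ({pct:.1f}% of fitted earnings)")
```

By construction, a straight line gives the same dollar increment for every additional year, so the proportional gain shrinks as schooling rises.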
What about the constant term? (Try to answer this question yourself before continuing with
this sequence.)
Literally, the constant indicates that an individual with no years of education would earn
$0.76 per hour.
A safe solution to the problem is to limit the interpretation to the range of the sample data,
and to refuse to extrapolate on the ground that we have no evidence outside the data range.
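The idea of refusing to extrapolate can be sketched as a guard around the fitted equation. This is a hedged illustration, not a method from the source: the function name `fitted_earnings_in_range` is hypothetical, and the bounds 8 and 20 in the usage lines are placeholders, since the source does not report the range of S in the sample.

```python
def fitted_earnings_in_range(s, s_min, s_max, b_cons=0.7646844, b_s=1.265712):
    """Return fitted hourly earnings, refusing to extrapolate beyond [s_min, s_max]."""
    if not s_min <= s <= s_max:
        raise ValueError(f"S = {s} lies outside the sample range [{s_min}, {s_max}]")
    return b_cons + b_s * s


# Placeholder bounds for illustration only; the source does not state the sample range.
inside = round(fitted_earnings_in_range(12, s_min=8, s_max=20), 2)
print(inside)  # 15.95

try:
    fitted_earnings_in_range(0, s_min=8, s_max=20)
    refused = False
except ValueError as err:
    refused = True
    print("refused:", err)
```

Under this convention the constant is never evaluated as a prediction at S = 0 unless zero years of schooling actually falls within the sample range.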
With this explanation, the only function of the constant term is to enable you to draw the
regression line at the correct height on the scatter diagram. It has no meaning of its own.
[Figure: scatter diagram of hourly earnings ($) against years of schooling (highest grade completed), with the fitted regression line $\widehat{\text{EARNINGS}} = 0.76 + 1.27\,S$ and a red curve showing a possible nonlinear relationship]
Another solution is to explore the possibility that the true relationship is nonlinear (see the
red curve) and that we are approximating it with a linear regression. We will soon extend
the regression technique to fit nonlinear models.
Copyright Christopher Dougherty 2016.
2016.04.16