
I Today's Agenda

• Introduction
• Discussion of Syllabus
• Review of linear regressions

My expectation is that you've seen most of this before, but it is helpful to review the key ideas that are useful in practice (without all the math).

Despite trying to do much of it without math, today's lecture is likely to be long and tedious ... (sorry)
I Linear Regression - Outline
• The CEF and causality (very brief)
• Linear OLS model
• Multivariate estimation
• Hypothesis testing
• Miscellaneous issues

We will cover the latter two in the next lecture
I Background readings
• Angrist and Pischke
o Sections 3.1-3.2, 3.4.1

• Wooldridge
o Sections 4.1 & 4.2

• Greene
o Chapter 3 and Sections 4.1-4.4, 5.7-5.9, 6.1-6.2

• Cohn, Liu, Wardlaw (JFE 2022)


I Motivation
• Linear regression is arguably the most popular modeling approach in corporate finance
o Transparent and intuitive
o Very robust technique; easy to build on
o Even if not interested in causality, it is useful for describing the data

Given its importance, we will spend today & the next lecture reviewing the key ideas
Motivation continued ...
• As researchers, we are interested in explaining how the world works
o E.g., how are firms' choices regarding leverage
explained by their investment opportunities
• I.e., if investment opportunities suddenly jumped
for some random reason, how would we expect
firms' leverage to respond on average?

o More broadly, how is y explained by x, where both y and x are random variables?
I Linear Regression - Outline
• The CEF and causality (very brief)
o Random variables & the CEF
o Using OLS to learn about the CEF
o Briefly describe "causality"

• Linear OLS model
• Multivariate estimation
• Hypothesis testing
• Miscellaneous issues
I A bit about random variables
• It is useful to know that any random variable y can be written as

y = E(y|x) + ε

where (y, x, ε) are random variables and E(ε|x) = 0

o E(y|x) is the expected value of y given x
o In words, y can be broken down into a part 'explained' by x, E(y|x), and a piece that is mean independent of x, ε
Conditional expectation function (CEF)

• E(y|x) is what we call the CEF, and it has very desirable properties
o Natural way to think about the relationship between x and y
o And it is the best predictor of y given x in a minimum mean-squared error sense

• I.e., E(y|x) minimizes E[(y - m(x))²], where m(x) can be any function of x
I CEF visually ...
• E(y|x) is fixed, but unobservable

[Figure: at each value of x, the distribution of y is centered on the CEF; our goal is to learn about the CEF, E(y|x)]

• Intuition: for any value of x, the distribution of y is centered about E(y|x)
I Linear Regression - Outline
• The CEF and causality (very brief)
o Random variables & the CEF
o Using OLS to learn about the CEF
o Briefly describe "causality"

• Linear OLS model
• Multivariate estimation
• Hypothesis testing
• Miscellaneous issues
I Linear regression and the CEF
• If done correctly, a linear regression can
help us uncover what the CEF is
• Consider linear regression model, y = βx + u
o y = dependent variable
o x = independent variable
o u = error term (or disturbance)
o β = slope parameter
Some additional terminology
• Other terms for y ...    • Other terms for x ...
o Outcome variable o Covariate
o Response variable o Control variable
o Explained variable o Explanatory variable
o Predicted variable o Predictor variable
o Regressand o Regressor
I Details about y = βx + u
• (y, x, u) are random variables
• (y, x) are observable
• (u, β) are unobservable
o u captures everything that determines y after accounting for x [This might be a lot of stuff!]
o We want to estimate β
I Ordinary Least Squares (OLS)
• Simply put, OLS finds the β that minimizes the mean-squared error

β = argmin_b E[(y - bx)²]

• Using the first-order condition, E[x(y - βx)] = 0, we have β = E(xy)/E(x²)
• Note: by definition, the residual from this regression, y - βx, is uncorrelated with x
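A minimal numerical sketch of these two facts (simulated data and made-up parameter values in Python/numpy; not from the lecture itself):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    x = rng.normal(size=n)
    u = rng.normal(size=n)            # disturbance: mean zero, independent of x
    y = 2.0 * x + u                   # true beta = 2, no intercept

    beta_hat = np.mean(x * y) / np.mean(x ** 2)   # sample analog of E(xy)/E(x^2)
    resid = y - beta_hat * x

    print(beta_hat)                   # close to 2.0
    print(np.mean(x * resid))         # ~0: the first-order condition holds in sample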
I What's great about this linear regression?
• It can be proved that ...
o βx is the best* linear prediction of y given x
o βx is the best* linear approximation of E(y|x)
* 'best' in terms of minimum mean-squared error

• This is quite useful. I.e., even if E(y|x) is nonlinear, the regression gives us the best linear approximation of it
I Linear Regression - Outline
• The CEF and causality (very brief)
o Random variables & the CEF
o Using OLS to learn about the CEF
o Briefly describe "causality"

• Linear OLS model
• Multivariate estimation
• Hypothesis testing
• Miscellaneous issues
I What about causality?
• Need to be careful here ...
o How x explains y, which this regression helps us understand, is not the same as learning the causal effect of x on y
o For that, we need more assumptions ...
I The basic assumptions [Part 1]
• Assumption #1: E(u) = 0
o With an intercept, this is totally innocuous
o Just change the regression to y = α + βx + u, where α is the intercept term
o Now suppose E(u) = k ≠ 0
• We could rewrite u = k + w, where E(w) = 0
• Then, the model becomes y = (α + k) + βx + w
• Intercept is now just α + k, and the error, w, is mean zero
• I.e., any non-zero mean is absorbed by the intercept
I The basic assumptions [Part 2]
• Assumption #2: E(u|x) = E(u)    [Intuition?]
o In words, the average of u (i.e., the unexplained portion of y) does not depend on the value of x
o This is "conditional mean independence" (CMI)
• True if x and u are independent of each other
• Implies u and x are uncorrelated

This is the key assumption being made when people make causal inferences
CMI Assumption
• Basically, the assumption says you've got the correct CEF model for the causal effect of x on y
o The CEF is causal if it describes differences in average outcomes for a change in x
• i.e., the change in y if x increases from value a to b is equal to E(y|x=b) - E(y|x=a) [In words?]
o Easy to see that this is only true if E(u|x) = E(u) [This is done on the next slide ...]
I Example of why CMI is needed
• With the model y = α + βx + u,
o E(y|x=a) = α + βa + E(u|x=a)
o E(y|x=b) = α + βb + E(u|x=b)
o Thus, E(y|x=b) - E(y|x=a) = β(b - a) + E(u|x=b) - E(u|x=a)
o This only equals what we think of as the 'causal' effect of x changing from a to b if E(u|x=b) = E(u|x=a) ... i.e., the CMI assumption holds
Tangent - CMI versus correlation
• CMI (which implies x and u are uncorrelated) is needed for no bias [which is a finite sample property]
• However, we only need to assume a zero correlation between x and u for consistency [which is a large sample property]
o More about bias vs. consistency later; but we
typically care about consistency, which is why
I'll often refer to correlations rather than CMI
I Is it plausible?
• Admittedly, there are many reasons why this assumption might be violated
o Recall, u captures all the factors that affect y other than x ... It will contain a lot!
o Let's just do a couple of examples ...
I Ex. #1 - Capital structure regression
• Consider the following firm-level regression:

Leverageᵢ = α + β·Profitabilityᵢ + uᵢ

o CMI implies the average u is the same for each level of profitability


o Easy to find a few stories why this isn't true ...
• #1 - unprofitable firms tend to have higher bankruptcy risk, which, by the tradeoff theory, should mean lower leverage
• #2 - unprofitable firms have accumulated less cash, which
by pecking order means they should have more leverage
Ex. #2 - Investment
• Consider the following firm-level regression:

Investmentᵢ = α + βQᵢ + uᵢ     [Q is a measure of investment opportunities]

o CMI implies the average u is the same for each value of Tobin's Q
o Easy to find a few stories why this isn't true ...
• #1 - Firms with low Q might be in distress & invest less
• #2 - Firms with high Q might be smaller, younger firms
that have a harder time raising capital to fund investments
I Is there a way to test for CMI?
• Let ŷ be the predicted value of y, i.e., ŷ = α̂ + β̂x, where α̂ and β̂ are the OLS estimates
• And let û be the residual, i.e., û = y - ŷ
• Can we prove CMI holds if E(û) = 0 and û is uncorrelated with x?
o Answer: No! By construction, these residuals are mean zero and uncorrelated with x. See the earlier derivation of the OLS estimates
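A small simulation of this point (illustrative Python/numpy, variable names hypothetical): even when an omitted variable biases the OLS estimate, the fitted residuals are still mean zero and uncorrelated with x by construction, so checking them tells you nothing about CMI.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    z = rng.normal(size=n)                         # omitted variable
    x = 0.8 * z + rng.normal(size=n)               # x is correlated with z
    y = 1.0 * x + 2.0 * z + rng.normal(size=n)     # true effect of x is 1.0

    X = np.column_stack([np.ones(n), x])
    alpha_hat, beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - (alpha_hat + beta_hat * x)

    print(beta_hat)                                  # ~2.0, badly biased (true value is 1.0)
    print(resid.mean(), np.corrcoef(x, resid)[0, 1]) # both ~0 by construction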
Identification police

• What people call the "identification police" are those that look for violations of CMI
o I.e., the "police" look for a reason why the model's disturbance is correlated with x
• Unfortunately, it's not that hard ...
• Trying to find ways to ensure the CMI
assumption holds and causal inferences can be
made will be a key focus of this course
I A side note about "endogeneity"
• Many "police" will criticize a model by saying it has an "endogeneity problem" but then don't say anything further ...

• But what does it mean to say there is an "endogeneity problem"?
I A side note about "endogeneity"
• My view: such vague "endogeneity" critics suspect something is potentially wrong, but don't really know why or how
o Don't let this be you! Be specific about what the problem is!

• Violations of CMI can be roughly categorized into three bins ... which are?
I Three reasons why CMI is violated
• Omitted variable bias
• Measurement error bias
• Simultaneity bias
o We will look at each of these in much
more detail in the "Causality'' lecture
I What "endogenous" means to me
• An "endogenous" x is one whose value depends on y (i.e., it is determined jointly with y such that there is simultaneity bias).
o However, some use a broader definition to
mean any correlation between x and u
[e.g., Roberts & Whited (2011)]
o Because of the confusion, I avoid using
"endogeneity''; I'd recommend the same for you
• I.e., Be specific about CMI violation; e.g., just say
omitted variable, measurement error, or simultaneity bias
I A note about presentations ...
• Think about "causality" when presenting
papers in the next two classes
o I haven't yet formalized the various reasons for
why "causal" inferences shouldn't be made; but
I'd like you to take a stab at thinking about it
I Linear Regression - Outline
• The CEF and causality (very brief)
• Linear OLS model
o Basic interpretation
o Rescaling & shifting of variables
o Incorporating non-linearities

• Multivariate estimation
• Hypothesis testing
• Miscellaneous issues
I Interpreting the estimates
• Suppose I estimate the following model of CEO compensation

salaryᵢ = α + βROEᵢ + uᵢ

o Salary for CEO i is in $000s; ROE is in %

• If you get ...   α̂ = 963.2,  β̂ = 18.50
o What do these coefficients tell us?
o Is CMI likely satisfied?
I Interpreting the estimates - Answers
salaryᵢ = 963.2 + 18.5·ROEᵢ + ûᵢ

• What do these coefficients tell us?


o A 1 percentage point increase in ROE is associated with an $18,500 increase in salary
o Average salary for a CEO with ROE = 0 was equal to $963,200

• Is CMI likely satisfied? Probably not


I Linear Regression - Outline
• The CEF and causality (very brief)
• Linear OLS model
o Basic interpretation
o Rescaling & shifting of variables
o Incorporating non-linearities

• Multivariate estimation
• Hypothesis testing
• Miscellaneous issues
I Scaling the dependent variable
• What if I change the measurement of salary from $000s to $s by multiplying it by 1,000?

o Estimates were ...   α̂ = 963.2,  β̂ = 18.50

o Now, they will be ...   α̂ = 963,200,  β̂ = 18,500
I Scalingy continued ...
• Scaling y by an amount c just causes all the estimates to be scaled by the same amount
o Mathematically, easy to see why ...

y = α + βx + u
cy = (cα) + (cβ)x + cu

[New intercept: cα;  new slope: cβ]
I Scalingy continued ...
• Notice, the scaling has no effect on the relationship between ROE and salary
o I.e., because y is expressed in $s now, β̂ = 18,500 means that a one percentage point increase in ROE is still associated with an $18,500 increase in salary
I Scaling the independent variable
• What if I instead change the measurement of ROE from percentage to decimal? (i.e., multiply ROE by 1/100)

o Estimates were ...   α̂ = 963.2,  β̂ = 18.50

o Now, they will be ...   α̂ = 963.2,  β̂ = 1,850
Scaling x continued ...
• Scaling x by an amount k just causes the slope on x to be scaled by 1/k
o Mathematically, easy to see why ...

y = α + βx + u
y = α + (β/k)(kx) + u

[New slope: β/k]

Will the interpretation of the estimates change? Answer: Again, no!
I Scaling both x and y
• If we scale y by an amount c and x by an amount k, then we get ...
o Intercept scaled by c
o Slope scaled by c/k

y = α + βx + u
cy = (cα) + (cβ/k)(kx) + cu
• When is scaling useful?


I Practical application of scaling #1
• No one wants to see a coefficient of
0.000000456 or 1,234,567,890

• Just scale the variables for cosmetic purposes!
o It will affect coefficients & SEs
o However, it won't affect t-stats or inference
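A quick check of this claim with simulated data (a sketch using Python with statsmodels; the numbers are made up): rescaling y multiplies the coefficient and its standard error by the same factor, leaving the t-statistic unchanged.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = rng.normal(size=1_000)
    y = 0.5 + 0.3 * x + rng.normal(size=1_000)

    X = sm.add_constant(x)
    fit1 = sm.OLS(y, X).fit()             # y in original units
    fit2 = sm.OLS(1_000 * y, X).fit()     # y rescaled by 1,000

    print(fit1.params[1], fit1.bse[1], fit1.tvalues[1])
    print(fit2.params[1], fit2.bse[1], fit2.tvalues[1])   # coef & SE scale by 1,000; t-stat identical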
I Practical application of scaling #2 [P1]
• To improve interpretation, in terms of estimated magnitudes, it's helpful to scale the variables by their sample standard deviations
o Let σx and σy be the sample standard deviations of x and y respectively
o Let c, the scalar for y, be equal to 1/σy
o Let k, the scalar for x, be equal to 1/σx
o I.e., units of x and y are now standard deviations
I Practical application of scaling #2 [P2]
• With the prior rescaling, how would we interpret a slope coefficient of 0.25?
o Answer = a 1 s.d. increase in x is associated with a 1/4 s.d. increase in y
o The slope tells us how many standard deviations y changes, on average, for a one standard deviation change in x
o Is 0.25 large in magnitude? What about 0.01?
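One way to obtain these standardized slopes in practice (a Python/numpy sketch with simulated data; equivalently, multiply the raw slope by σx/σy):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(loc=5, scale=2, size=10_000)
    y = 1.0 + 0.4 * x + rng.normal(scale=3, size=10_000)

    x_std = (x - x.mean()) / x.std()
    y_std = (y - y.mean()) / y.std()

    beta_std = np.polyfit(x_std, y_std, 1)[0]            # slope in standard-deviation units
    print(beta_std)                                      # s.d. change in y per 1 s.d. change in x
    print(np.polyfit(x, y, 1)[0] * x.std() / y.std())    # same number from the raw slope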


I Shifting the variables
• Suppose we instead add c to y and k to x (i.e., we shift y and x up by c and k respectively)

• Will the estimated slope change?


I Shifting continued ...
• No! Only the estimated intercept will change
o Mathematically, easy to see why ...

y = α + βx + u
y + c = α + c + βx + u
y + c = α + c + β(x + k) - βk + u
y + c = (α + c - βk) + β(x + k) + u

[New intercept: α + c - βk;  slope the same]
I Practical application of shifting
• To improve interpretation, sometimes helpful to demean x by its sample mean
o Let µx be the sample mean of x; regress y on x - µx
o The intercept now reflects the expected value of y for x = µx

y = (α + βµx) + β(x - µx) + u
E(y | x = µx) = α + βµx

o This will be very useful when we get to diff-in-diffs
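A minimal sketch of the demeaning trick (Python/numpy, simulated data and made-up coefficients): after regressing y on x minus its mean, the intercept equals the predicted value of y at the sample mean of x, while the slope is unchanged.

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.normal(loc=10, scale=3, size=5_000)
    y = 2.0 + 0.5 * x + rng.normal(size=5_000)

    slope, intercept = np.polyfit(x - x.mean(), y, 1)
    print(intercept)      # ~7.0, i.e., the expected y at x = mean(x) = 10
    print(slope)          # ~0.5, unchanged by the shift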
I Break Time
• Let's take a 10-minute break
I Linear Regression - Outline
• The CEF and causality (very brief)
• Linear OLS model
o Basic interpretation
o Rescaling & shifting of variables
o Incorporating non-linearities

• Multivariate estimation
• Hypothesis testing
• Miscellaneous issues
I Incorporating nonlinearities [Part 1]
• Assuming that the causal CEF is linear may not always be that realistic

o E.g., consider the following regression

wage = α + β·education + u

o Why might a linear relationship between #


of years of education and level of wages be
unrealistic? How can we fix it?
I Incorporating nonlinearities [Part 2]
• A better assumption might be that each year of education leads to a constant proportionate (i.e., percentage) increase in wages

o An approximation of this intuition is captured by ...

ln(wage) = α + β·education + u

o I.e., the linear specification is very flexible


because it can capture linear relationships
between non-linear variables
I Common nonlinear function forms
• Regressing Levels on Logs
• Regressing Logs on Levels
• Regressing Logs on Logs

Let's discuss how to interpret each of these


I The usefulness of log
• Log variables are useful because 100×Δln(y) ≈ %Δy
o Note: when I (and others) say "log", we really mean the natural logarithm, "ln". E.g., if you use the "log" function in Stata, it assumes you meant "ln"
I Interpreting log-level regressions
• If you estimate the ln(wage) equation, 100β will tell you the %Δwage for an additional year of education. To see this ...

ln(wage) = α + β·education + u
Δln(wage) = β·Δeducation
100 × Δln(wage) = (100β)·Δeducation
%Δwage ≈ (100β)·Δeducation
I Log-level interpretation continued ...
• The proportionate change in y for a given change in x is assumed constant
o The change in y is not assumed to be constant ... it gets larger as x increases
o Specifically, ln(y) is assumed to be linear in x, but y is not a linear function of x ...

ln(y) = α + βx + u
y = exp(α + βx + u)
I Example interpretation
• Suppose you estimated the wage equation (where wages are $/hour) and got ...

ln(wage) = 0.584 + 0.083·education

o What does an additional year of education get you?
Answer = an 8.3% increase in wages
o Any potential problems with the specification?
o Should we interpret the intercept?
I Interpreting log-log regressions
• If you alternatively estimate the following ...

ln(y) = α + β ln(x) + u

• β is the elasticity of y w.r.t. x!
o i.e., β is the percentage change in y for a percentage change in x
o Note: the regression assumes a constant elasticity between y and x regardless of the level of x
I Example interpretation of log-log
• Suppose you estimated the CEO salary model using logs and got the following:

ln(salary) = 4.822 + 0.257 ln(sales)

• What is the interpretation of 0.257?

Answer = For each 1% increase in sales, salary increases by 0.257%
I Interpreting level-log regressions
• If estimating the following ...

y = α + β ln(x) + u

• β/100 is the change in y for a 1% change in x


I Example interpretation of level-log
• Suppose you estimated the CEO salary model using logs and got the following, where salary is expressed in $000s:

salary = 4.822 + 1,812.5 ln(sales)

• What is the interpretation of 1,812.5?

Answer = For each 1% increase in sales, salary increases by $18,125
I Summary of log functional forms
Model        Dependent variable   Independent variable   Interpretation of β
Level-Level  y                    x                      Δy = βΔx
Level-Log    y                    ln(x)                  Δy = (β/100)%Δx
Log-Level    ln(y)                x                      %Δy = (100β)Δx
Log-Log      ln(y)                ln(x)                  %Δy = β%Δx

• Now, let's talk about what happens if you change units (i.e., scale) for either y or x in these regressions ...
I Rescaling logs doesn't matter [Part 1]
• What happens to the intercept & slope if we rescale (i.e., change units of) y when it's in log form?

• Answer = Only the intercept changes; the slope is unaffected because it measures the proportional change in y in the Log-Level model

log(y) = α + βx + u
log(c) + log(y) = log(c) + α + βx + u
log(cy) = (log(c) + α) + βx + u
I Rescaling logs doesn't matter [Part 2]
• The same logic applies to changing the scale of x in level-log models ... only the intercept changes

y = α + β log(x) + u
y + β log(c) = α + β log(x) + β log(c) + u
y = (α - β log(c)) + β log(cx) + u
I Rescaling logs doesn't matter [Part 3]
• Basic message - If you rescale a logged variable,
it will not affect the slope coefficient because you
are only looking at proportionate changes
I Log approximation problems
• I once discussed a paper where the author argued that allowing capital inflows into a country caused a -120% change in stock prices during crisis periods ...

o Do you see a problem with this?

• Of course! A 120% drop in stock prices isn't possible. The true percentage change was -70%. Here is where that author went wrong ...
I Log approximation problems [Part 1]
• Approximation error occurs because as the true %Δy becomes larger, 100Δln(y) ≈ %Δy becomes a worse approximation

• To see this, consider a change from y to y' ...
o Ex. #1: (y' - y)/y = 5%, and 100Δln(y) = 4.9%
o Ex. #2: (y' - y)/y = 75%, but 100Δln(y) = 56%
I Log approximation problems [Part 2]
[Figure: exact % change vs. the log approximation (100Δln(y)) plotted against Δx from 0 to 5; the dashed approximation line falls increasingly below the exact change as Δx grows]
I Log approximation problems [Part 3]
• The problem also occurs for negative changes
o Ex. #1: (y' - y)/y = -5%, and 100Δln(y) = -5.1%
o Ex. #2: (y' - y)/y = -75%, but 100Δln(y) = -139%
I Log approximation problems [Part 4]
o So, if the implied percent change is large, it's better to convert it to the true % change before interpreting the estimate

ln(y) = α + βx + u
ln(y') - ln(y) = β(x' - x)
ln(y'/y) = β(x' - x)
y'/y = exp(β(x' - x))
[(y' - y)/y]% = 100[exp(β(x' - x)) - 1]
I Log approximation problems [Part 5]
• We can now use this formula to see what the true % change in y is for x' - x = 1

[(y' - y)/y]% = 100[exp(β(x' - x)) - 1]
[(y' - y)/y]% = 100[exp(β) - 1]

o If β = 0.56, the percent change isn't 56%; it is 100[exp(0.56) - 1] = 75%
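A tiny helper for doing this conversion (a Python sketch; the 0.56 and -1.20 values come from the examples above):

    import numpy as np

    def exact_pct_change(beta, dx=1.0):
        """Exact % change in y implied by a log-level coefficient beta for a change dx in x."""
        return 100 * (np.exp(beta * dx) - 1)

    print(exact_pct_change(0.56))     # ~75%, not the naive 56%
    print(exact_pct_change(-1.20))    # ~-70%: the stock-price example from earlier, not -120%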
I Recap of last two points on logs
• Two things to keep in mind about using logs
o Rescaling a logged variable doesn't affect slope coefficients; it will only affect the intercept
o The log difference is only an approximation of the % change; it can be a very bad approximation for large changes
I Usefulness of logs - Summary
• Using logs gives coefficients with appealing interpretations

• Can be ignorant about the units of measurement of logged variables since the coefficients reflect proportionate changes

• Taking logs of y or x can mitigate the influence of outliers
I "Rules of thumb" on when to use logs
• Helpful to take logs for variables with ...
o Positive currency amounts
o Large integer values (e.g., population)

• Don't take logs for variables measured in years or for variables that can equal zero ...
I What about using ln(1+y)?
• Because ln(0) doesn't exist, some use ln(1+y) for non-negative variables, i.e., y ∈ [0, ∞)
o However, you should not do this! The nice interpretation is no longer true, especially if there are a lot of zeros or many small values in y [Why?]
• Ex. #1: What does it mean to go from ln(0) to ln(x>0)?
• Ex. #2: And ln(x'+1) - ln(x+1) is not the percent change of x

o See Cohn, Liu, Wardlaw (JFE 2022) for solutions & more details on why using ln(1+y) is problematic
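A tiny numerical illustration of the second problem (made-up values, Python/numpy): when y is small, changes in ln(1+y) are nowhere near the true percent change in y.

    import numpy as np

    y_old, y_new = 0.5, 1.0                                   # a 100% increase in y
    true_pct = 100 * (y_new - y_old) / y_old                  # 100%
    log_diff = 100 * (np.log(y_new) - np.log(y_old))          # ~69%: the usual approximation error
    log1p_diff = 100 * (np.log1p(y_new) - np.log1p(y_old))    # ~29%: not a % change in y at all

    print(true_pct, log_diff, log1p_diff)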
Tangent- Percentage Change
• What is the percent change in unemployment if it goes from 10% to 9%?
o This is a 10 percent drop
o It is a 1 percentage point drop
• Percentage change is [(x₁ - x₀)/x₀] × 100
• Percentage point change is the raw change in percentages

Please take care to get this right in the description of your empirical results
I Models with quadratic terms [Part 1]
• Consider y = β₀ + β₁x + β₂x² + u
• The partial effect of x is given by ...
Δy = (β₁ + 2β₂x)Δx
o What is different about this partial effect relative to everything we've seen thus far?
• Answer = It depends on the value of x. So, we will need to pick a value of x to evaluate it (e.g., the mean x̄)
I Models with quadratic terms [Part 2]
• If β̂₁ > 0 and β̂₂ < 0, then the relation is parabolic
o Turning point = maximum = β̂₁ / (2|β̂₂|)
o Know where this turning point is! Don't claim a parabolic relation if it lies outside the range of x!
o Odd values might imply misspecification or simply mean the quadratic terms are irrelevant and should be excluded from the regression
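A small helper for checking this before claiming a parabolic relation (a Python sketch; the coefficient values are hypothetical):

    import numpy as np

    b1, b2 = 0.45, -0.03                      # hypothetical estimates with b1 > 0, b2 < 0

    def partial_effect(x):
        """Marginal effect of x in y = b0 + b1*x + b2*x**2 + u."""
        return b1 + 2 * b2 * x

    turning_point = b1 / (2 * abs(b2))        # equals -b1/(2*b2) when b2 < 0
    print(turning_point)                      # 7.5: check that this lies inside the observed range of x
    print(partial_effect(np.array([0.0, 5.0, 10.0])))   # effect shrinks and flips sign past the turning point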
I Linear Regression - Outline
• The CEF and causality (very brief)
• Linear OLS model
• Multivariate estimation
o Properties & Interpretation
o Partial regression interpretation
o R², bias, and consistency

• Hypothesis testing
• Miscellaneous issues
I Motivation
• Rather uncommon that we have just one independent variable
o So, now we will look at multivariate
OLS models and their properties ...
I Basic multivariable model
• Example with a constant and k regressors:

y = β₀ + β₁x₁ + ... + βₖxₖ + u

• Similar identifying assumptions as before
o No collinearity among covariates [why?]
o E(u | x₁, ..., xₖ) = 0
• Implies no correlation between any x and u, which means we have the correct model of the true causal relationship between y and (x₁, ..., xₖ)
I Interpretation of estimates
• The estimated intercept, β̂₀, is the predicted value of y when all x = 0; sometimes this makes sense, sometimes it doesn't

• The estimated slopes, (β̂₁, ..., β̂ₖ), have a more subtle interpretation now ...

y = β̂₀ + β̂₁x₁ + ... + β̂ₖxₖ + û

o How would you interpret β̂₁?
I Interpretation - Answer
• The estimated slopes, (β̂₁, ..., β̂ₖ), have partial effect interpretations

• Typically, we think about a change in just one variable, e.g., Δx₁, holding constant all other variables, i.e., (Δx₂, ..., Δxₖ all equal 0)
o This is given by Δŷ = β̂₁Δx₁
o I.e., β̂₁ is the coefficient holding all else fixed (ceteris paribus)
I Interpretation continued ...
• However, we can also look at how changes in multiple variables at once affect the predicted value of y
o I.e., given changes in x₁ through xₖ, we obtain the predicted change in y, Δŷ
I Example interpretation - College GPA
• Suppose we regress college GPA onto high
school GPA (4-point scale) and ACT scores
for N = 141 university students

colGPA = 1.29 + 0.453·hsGPA + 0.0094·ACT


o What does the intercept tell us?
o What does the slope on hsGPA tell us?
I Example - Answers
• Intercept meaningless ... a person with zero high school GPA and ACT doesn't exist

• Example interpretation of slope ...
o Consider two students, Ann and Bob, with identical ACT scores, but Ann's GPA is 1 point higher than Bob's. The best prediction of Ann's college GPA is that it will be 0.453 higher than Bob's
I Example continued ...
• Now, what is effect of increasing high school
GPA by 1 point and ACT by 1 point?

ΔcolGPA = 0.453 × ΔhsGPA + 0.0094 × ΔACT
ΔcolGPA = 0.453 + 0.0094
ΔcolGPA = 0.4624
I Example continued ...
• Lastly, what is effect of increasing high school
GPA by 2 points and ACT by 10 points?

ΔcolGPA = 0.453 × ΔhsGPA + 0.0094 × ΔACT
ΔcolGPA = 0.453 × 2 + 0.0094 × 10
ΔcolGPA = 1.0
I Fitted values and residuals
• Definition of the residual for observation i: ûᵢ = yᵢ - ŷᵢ

• Properties of residuals and fitted values
o Sample average of the residuals = 0; this implies that the sample average of ŷ equals the sample average of y
o Sample covariance between each independent variable and the residuals = 0
o The point of means (ȳ, x̄₁, ..., x̄ₖ) lies on the regression line
Tangent about residuals
• Again, it bears repeating ...
o Looking at whether the residuals are correlated with the x's is NOT a test for causality
o By construction, they are uncorrelated with x
o There is no "test" of whether the CEF is the causal CEF; that justification will need to rely on economic arguments
I Linear Regression - Outline
• The CEF and causality (very brief)
• Linear OLS model
• Multivariate estimation
o Properties & Interpretation
o Partial regression interpretation
o R², bias, and consistency

• Hypothesis testing
• Miscellaneous issues
I Question to motivate the topic ...
• What is wrong with the following? And why?
o Researcher wants to know the effect of x on y after controlling for z
o So, the researcher removes the variation in y that is driven by z by regressing y on z & saving the residuals
o Then, the researcher regresses these residuals on x and claims to have identified the effect of x on y controlling for z using this regression

We'll answer why it's wrong in a second ...
I Partial regression [Part 1]
• The following is quite useful to know ...
• Suppose you want to estimate the following:

y = β₀ + β₁x₁ + β₂x₂ + u

o Is there another way to get β̂₁ that doesn't involve estimating this directly?
• Answer: Yes! You can estimate it by regressing the residuals from a regression of y on x₂ onto the residuals from a regression of x₁ onto x₂
I Partial regression [Part 2]
"
• To be clear, you get /31 , by ...
#1 - Regress y on x 2 ; save residuals (call them y )
#2 - Regress x 1 on x 2 ; save residuals (call them x)
#3 - Regress y onto x;the estimated coefficient
will be the same as if you'd just run the original
multivariate regression!!!
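A quick numerical check of this partial-regression (Frisch-Waugh-Lovell) result with simulated data (a Python/statsmodels sketch; names and numbers are illustrative):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    n = 50_000
    x2 = rng.normal(size=n)
    x1 = 0.6 * x2 + rng.normal(size=n)
    y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

    # Direct multivariate regression of y on x1 and x2
    full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

    # Steps #1-#3: residualize y and x1 on x2, then regress residual on residual
    y_tilde = sm.OLS(y, sm.add_constant(x2)).fit().resid
    x1_tilde = sm.OLS(x1, sm.add_constant(x2)).fit().resid
    partial = sm.OLS(y_tilde, x1_tilde).fit()   # residuals are mean zero, so no constant needed

    print(full.params[1], partial.params[0])    # identical coefficient on x1 (~2.0)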
I Partial regression - Interpretation
• Multivariate estimation is basically finding
effect of each independent variable after
partialing out effects of other variables
o I.e., the effect of x₁ on y after controlling for x₂ (i.e., what you'd get from regressing y on both x₁ and x₂) is the same as what you get after you partial out the effect of x₂ from both x₁ and y and then run a regression using the residuals
I Partial regression - Generalized
• This property holds more generally ...
o Suppose X₁ is a vector of independent variables
o X₂ is a vector of more independent variables
o And, you want to know the coefficients on X₁ that you would get from a multivariate regression of y onto all the variables in X₁ and X₂ ...
I Partial regression - Generalized [Part 2]
• You can get the coefficients for each variable in X₁ by ...
o Regress y and each variable in X₁ onto all the variables in X₂ (at once); save the residuals from each regression
o Do a regression of residuals; i.e., regress y onto the variables of X₁, but replace y and X₁ with the residuals from the corresponding regression in step #1
I Practical application of partial regression
• Now, what is wrong with the following?
o Researcher wants to know the effect of x on y after controlling for z
o So, the researcher removes the variation in y that is driven by z by regressing y on z & saving the residuals
o Then, the researcher regresses these residuals on x and claims to have identified the effect of x on y controlling for z using this regression
I Practical application - Answer
• It's wrong because it didn't partial the effect of z out of x! Therefore, it is NOT the same as regressing y onto both x and z

• Unfortunately, this was commonly done by researchers in finance [e.g., industry-adjusting]
o We will see how badly this can mess up things in
a later lecture where we look at my paper with
David Matsa on unobserved heterogeneity
I Linear Regression - Outline
• The CEF and causality (very brief)
• Linear OLS model
• Multivariate estimation

• Hypothesis testing
• Miscellaneous issues
I Goodness-of-Fit (R²)
• A lot is made of R-squared; so, let's quickly review exactly what it is
• Start by defining the following:
o Sum of squares total (SST)
o Sum of squares explained (SSE)
o Sum of squares residual (SSR)
I Definition of SST, SSE, SSR
If N is the number of observations and the regression has a constant, then

SST = Σᵢ₌₁ᴺ (yᵢ - ȳ)²      [SST is the total variation in y]
SSE = Σᵢ₌₁ᴺ (ŷᵢ - ȳ)²      [SSE is the total variation in predicted y; mean of predicted y = mean of y]
SSR = Σᵢ₌₁ᴺ ûᵢ²            [SSR is the total variation in the residuals; mean of residuals = 0]
I SSR, SST, and SSE continued ...
• The total variation, SST, can be broken into two pieces ... the explained part, SSE, and the unexplained part, SSR

SST = SSE + SSR

• R² is just the share of total variation that is explained! In other words,

R² = SSE/SST = 1 - SSR/SST
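Computing these pieces by hand for a simple fitted model (a Python/numpy sketch with simulated data):

    import numpy as np

    rng = np.random.default_rng(6)
    x = rng.normal(size=2_000)
    y = 1.0 + 0.5 * x + rng.normal(size=2_000)

    slope, intercept = np.polyfit(x, y, 1)
    y_hat = intercept + slope * x
    resid = y - y_hat

    SST = np.sum((y - y.mean()) ** 2)
    SSE = np.sum((y_hat - y.mean()) ** 2)   # mean of y_hat equals mean of y when there's a constant
    SSR = np.sum(resid ** 2)

    print(SST, SSE + SSR)                   # SST = SSE + SSR
    print(SSE / SST, 1 - SSR / SST)         # both equal R^2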
I More about R²
• As seen on the last slide, R² must be between 0 and 1
• It can also be shown that R² is equal to the square of the correlation between y and predicted y
• If you add an independent variable, R² will never go down
I Adjusted R²
• Because R² never goes down when you add a regressor, we often use what is called adjusted R²

Adj. R² = 1 - (1 - R²) × (N - 1)/(N - 1 - k)

o k = # of regressors, excluding the constant
o Basically, you get penalized for each additional regressor, such that adjusted R² won't go up after you add another variable if it doesn't improve the fit much [it can go down!]
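The penalty is easy to see directly from the formula (a purely arithmetic Python sketch; the R² values and N are made up): a tiny improvement in R² from an extra regressor can lower adjusted R².

    def adj_r2(r2, n, k):
        """Adjusted R^2 with n observations and k regressors (excluding the constant)."""
        return 1 - (1 - r2) * (n - 1) / (n - 1 - k)

    print(adj_r2(0.0140, n=100, k=1))   # ~0.0039
    print(adj_r2(0.0141, n=100, k=2))   # ~-0.0062: R^2 rose slightly, adjusted R^2 fell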
I Interpreting R²
• If I tell you the R² is 0.014 from a regression, what does that mean? Is it bad?
o Answer #1 = It means I'm only explaining about 1.4% of the variation in y with the regressors that I'm including in the regression
o Answer #2 = Not necessarily! It doesn't mean the model is wrong; you might still be getting a consistent estimate of the β you care about!
I Unbiasedness versus Consistency
• When we say an estimate is unbiased
or consistent, it means we think it has
a causal interpretation ...
o I.e., the CMI assumption holds and the x's are all uncorrelated with the disturbance, u

• Bias refers to a finite-sample property; consistency refers to an asymptotic property
I More formally ...
• An estimate, β̂, is unbiased if E(β̂) = β
o I.e., on average, the estimate is centered around the true, unobserved value of β
o Doesn't say whether you get a more precise estimate as the sample size increases

• An estimate is consistent if plim(N→∞) β̂ = β
o I.e., as the sample size increases, the estimate converges (in probability limit) to the true coefficient
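A simulation sketch of the distinction (Python/numpy, illustrative numbers): across repeated samples the OLS slope is centered on the truth at every sample size, and its sampling variation shrinks as N grows, which is what consistency delivers.

    import numpy as np

    rng = np.random.default_rng(7)
    true_beta = 1.5

    def one_estimate(n):
        x = rng.normal(size=n)
        y = true_beta * x + rng.normal(size=n)
        return np.mean(x * y) / np.mean(x ** 2)    # OLS slope (no intercept)

    for n in (50, 500, 5_000, 50_000):
        draws = np.array([one_estimate(n) for _ in range(500)])
        print(n, draws.mean().round(3), draws.std().round(3))   # mean ~1.5; spread shrinks with N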
I Unbiasedness of OLS
• OLS will be unbiased when ...
o The model is linear in parameters
o We have a random sample of x
o No perfect collinearity between the x's
o E(u | x₁, ..., xₖ) = 0
[Earlier CMI assumptions #1 and #2 give us this]

• Unbiasedness is a nice feature of OLS; but in practice, we care more about consistency
Consistency of OLS
• OLS will be consistent when ...
o The model is linear in parameters
o u is not correlated with any of the x's
[CMI assumptions #1 and #2 give us this; a lack of correlation is a weaker assumption than CMI ... CMI precludes both linear and non-linear relationships, while correlations only measure linear relationships]
• Again, this is good
• See textbooks for more information
