
AE 2023 Lecture7

1) Dummy variables can represent qualitative information quantitatively by assigning binary values like 0 and 1. 2) In a regression with dummy variables, the coefficient estimates the intercept shift when the dummy variable equals 1 compared to when it equals the base/reference group. 3) Interaction terms using dummy variables allow the effect of one variable to vary across categories of another variable, like estimating different returns to education for men and women.

Uploaded by

Mỹ Phụng
Copyright
© © All Rights Reserved

Lecture 7: Multiple Regression Analysis with Qualitative Information


Applied Econometrics
Dr. Le Anh Tuan

1
Dummy variables or binary variables

► Qualitative information can be turned into quantitative information in a straightforward way, using binary coding for “yes” and “no”.
► “Is a certain person in the sample female?”
► Yes (female) → D = 1; no (not female) → D = 0.
► Examples: employed vs. unemployed, financial crisis vs. non-financial crisis…

► D can be called a dummy variable (or indicator variable / binary variable / zero-one variable).

► They may appear as the dependent or as independent variables

2
Dummy variables or binary variables
► In general, a dummy variable is a variable that equals 0 or 1.
► The coefficient on a dummy variable measures an intercept shift of size δ when the dummy variable equals 1. All slope parameters remain unchanged.
► Example:

Dummy variable: = 1 if the person is a woman, = 0 if the person is a man
δ = the wage gain/loss if the person is a woman rather than a man (holding other things fixed)
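The equation on this slide did not survive extraction. A minimal sketch of the kind of model the annotations describe, assuming a wage equation with one dummy (female) and one control (educ); the symbols $\beta_0$, $\beta_1$ and $u$ are assumptions, not taken from the source:

```latex
\begin{align*}
wage &= \beta_0 + \delta\, female + \beta_1\, educ + u \\
E(wage \mid female = 1, educ) - E(wage \mid female = 0, educ) &= \delta
\end{align*}
```

Under this sketch, the two groups share the slope $\beta_1$ and differ only in the intercept ($\beta_0$ for men, $\beta_0 + \delta$ for women).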

3
Dummy variables

Alternative interpretation of the coefficient: it is the difference in mean wage between men and women with the same level of education.

[Figure: intercept shift]

► In order to interpret the coefficients of dummy variables, one has to know the reference/base group. The reference/base group is the group for which the dummy equals zero.
4
Dummy variables
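► Dummy variable trap (perfect collinearity): a model that includes an intercept together with a dummy for every category cannot be estimated.

► When using dummy variables, one category always has to be omitted:
► If only the female dummy is included, the base category is men.
► If only the male dummy is included, the base category is women.

► Alternatively, one could omit the intercept and include both dummies.

Disadvantages:
1) It is more difficult to test for differences between the parameters
2) The R-squared formula is only valid if the regression contains an intercept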
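The perfect collinearity behind the dummy trap can be checked numerically. The sketch below uses a hypothetical six-person sample (not data from the lecture) and plain Python: with an intercept, a male dummy and a female dummy, the matrix X'X is singular, so the OLS normal equations have no unique solution; dropping one dummy restores a nonzero determinant.

```python
# Hypothetical toy sample: 1 = woman, 0 = man (not data from the lecture).
female = [1, 0, 0, 1, 1, 0]
male = [1 - f for f in female]   # male + female = 1 for every observation
const = [1] * len(female)        # intercept column


def cross_product(columns):
    """Return X'X for a design matrix given as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Intercept + both dummies: singular X'X (the dummy trap).
det_trap = det3(cross_product([const, male, female]))
# Intercept + female only (men are the base group): invertible X'X.
det_ok = det2(cross_product([const, female]))

print(det_trap, det_ok)
```

A zero determinant means X'X cannot be inverted, which is why one category has to be omitted.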
5
Dummy variables
►Gender and Wage

Holding education, experience, and tenure fixed, women earn $1.81 less per hour than men.

► Does that mean that women are discriminated against?
► Not necessarily. Being female may be correlated with other productivity characteristics that have not been controlled for.

6
Dummy variables
►Effects of training grants on hours of training
Dependent variable: hours of training per employee
Dummy variable: indicates whether the firm received a training grant

► Holding sales and the number of employees fixed, firms with grants provide 26.25 more hours of training per employee.
► This is an example of program evaluation.
► Treatment group (= grant receivers) vs. control group (= no grant)
► Is the effect of treatment on the outcome of interest causal?

7
Dummy variables
►A new diet program and weight
$$\widehat{weight} = \underset{(2.12)}{12.82} - \underset{(0.60)}{1.85}\,treated$$
n = 330, R² = 15%
►treated is a dummy variable that equals 1 for a person who joined a new diet program, 0 otherwise.

Is the new diet effective?
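One way to approach the question is a t-test on the treated coefficient. The sketch below uses the coefficient (-1.85) and standard error (0.60) reported on the slide; the 1.96 cutoff is the usual two-sided 5% critical value under a large-sample normal approximation, which is an assumption about the test the lecture has in mind.

```python
# Coefficient and standard error of `treated`, as reported on the slide.
coef_treated = -1.85
se_treated = 0.60

# t-statistic for H0: the coefficient on treated equals zero.
t_stat = coef_treated / se_treated

# Two-sided 5% critical value (large-sample normal approximation; n = 330).
critical_5pct = 1.96
significant_5pct = abs(t_stat) > critical_5pct

print(round(t_stat, 2), significant_5pct)
```

A significant negative coefficient says that participants weigh less on average; whether the effect is causal still depends on how people were assigned to the program.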

8
Dummy variables
►Using dummy explanatory variables in equations for log(y)

Dummy variable: = 1 if a house is of colonial style, = 0 otherwise.
As the dummy for colonial style changes from 0 to 1, the house price increases by approximately 5.4 percent.
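With log(y) as the dependent variable, 100 times the dummy coefficient is only an approximation of the percentage change. The sketch below compares the approximation with the exact figure, assuming the colonial coefficient is 0.054, as the slide's 5.4 figure implies.

```python
import math

# Assumed coefficient on the colonial dummy (implied by the 5.4 on the slide).
coef_colonial = 0.054

# Approximate percentage effect: 100 * coefficient.
approx_pct = 100 * coef_colonial

# Exact percentage effect in a log-level model: 100 * (exp(coefficient) - 1).
exact_pct = 100 * (math.exp(coef_colonial) - 1)

print(round(approx_pct, 2), round(exact_pct, 2))
```

For small coefficients the two numbers are close; the gap widens as the coefficient grows in absolute value.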

9
Using dummy variables for multiple groups

10
Several subgroups
► Example: A worker is female or male and married or unmarried
⇒ 4 subgroups:
► female and not married
► female and married
► male and not married
► male and married

► How to proceed:
► (1) Define dummy variables for all subgroups
► (2) Leave out one group, which becomes the base/reference group.
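The two steps above can be sketched in plain Python. The dummy names (`singfem`, `marrfem`, `singmale`, `marrmale`) and the choice of single men as the base group are illustrative assumptions.

```python
# Hypothetical names for the four subgroup dummies; single men are the base.
BASE_GROUP = "singmale"


def subgroup_dummies(female, married):
    """Return the subgroup dummies for one worker, omitting the base group."""
    # Step 1: define a dummy for every subgroup.
    all_groups = {
        "singfem": int(female == 1 and married == 0),
        "marrfem": int(female == 1 and married == 1),
        "singmale": int(female == 0 and married == 0),
        "marrmale": int(female == 0 and married == 1),
    }
    # Step 2: leave out the base group to avoid the dummy trap.
    return {name: value for name, value in all_groups.items()
            if name != BASE_GROUP}


married_woman = subgroup_dummies(female=1, married=1)
single_man = subgroup_dummies(female=0, married=0)
print(married_woman)
print(single_man)
```

A worker in the base group has all remaining dummies equal to zero, so every estimated dummy coefficient is read relative to that group.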

11
Several subgroups

Holding other things fixed, married women earn approximately 19.8% less than single men (= the base group).
12
Incorporating ordinal information using dummy variables
Dependent variable: municipal bond rate. Explanatory variable: credit rating CR from 0 to 4 (0 = worst, 4 = best).

Entering the credit rating as a single regressor would probably not be appropriate, as the credit rating only contains ordinal information. A better way to incorporate this information is to define dummies.

The dummies indicate whether the particular rating applies, e.g. CR1 = 1 if CR = 1, and CR1 = 0 otherwise. All effects are measured in comparison to the worst rating (= base group).
13
University ranking
► University ranking from 1-100
► The quality difference between ranks 1 and 2 may be dramatically different from the quality difference between ranks 11 and 12.
→ Hence, the rank should not be used directly as an independent variable.
► Instead, we assign a dummy variable Dj to every university except one (the “reference group”), which introduces many new parameters that have to be estimated.
► Note: The coefficient on a dummy variable Dj then denotes the intercept shift between university j and the reference university.
► Sometimes there are too many ranks and hence too many parameters to be estimated. Then it proves useful to group the data, e.g., ranks 1-10, 11-20, etc.
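The grouping idea can be sketched as follows. The function name, the bin width of 10, and the choice of ranks 1-10 as the reference group are assumptions made for illustration.

```python
def rank_group_dummies(rank, width=10, max_rank=100):
    """Map a rank to group dummies (ranks 1-10 are the omitted reference)."""
    if not 1 <= rank <= max_rank:
        raise ValueError("rank must lie between 1 and max_rank")
    n_groups = (max_rank + width - 1) // width   # 10 groups for ranks 1-100
    group = (rank - 1) // width                  # 0 for 1-10, 1 for 11-20, ...
    # Group 0 is the reference group, so it gets no dummy.
    return [int(group == g) for g in range(1, n_groups)]


top_ten = rank_group_dummies(5)
second_bin = rank_group_dummies(11)
last_bin = rank_group_dummies(100)
print(top_ten, second_bin, last_bin, sep="\n")
```

With this grouping, 9 dummy coefficients are estimated instead of 99.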

14
Interactions involving dummy variables

15
Interaction terms
► Example: Do returns to schooling differ for men and women?
► Or: is the effect of education on the wage moderated by gender?

[Diagram: educ → wage, with female moderating the effect]

► How do we formulate a model that allows the effect of education to vary with gender?
► We use a model with interaction terms:
$$\log(wage) = \beta_0 + \beta_1\,educ + \beta_2\,female + \beta_3\,female \cdot educ + u$$

16
Notes
► Consider two models:

(1) $$\log(wage) = \beta_0 + \beta_1\,educ + \beta_2\,female + u$$

H0: $\beta_2 = 0$, indicating that the whole wage equation is the same for men and women.

(2) $$\log(wage) = \beta_0 + \beta_1\,educ + \beta_2\,female + \beta_3\,female \cdot educ + u$$

H0: $\beta_3 = 0$, indicating that the return to education is the same for men and women. Or: the impact of education on wage is similar for men and women.

17
Interaction terms
$$\log(wage) = \beta_0 + \beta_1\,educ + \beta_2\,female + \beta_3\,female \cdot educ + u$$

► female is a dummy variable that equals 1 for a female person, 0 otherwise.
► Expected log(wage) for:

► Male: $\beta_0 + \beta_1\,educ$

► Female: $\beta_0 + \beta_1\,educ + \beta_2 + \beta_3\,educ = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,educ$
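A short derivation makes the role of the interaction coefficient explicit. It uses only the model stated on this slide; differentiating with respect to educ gives the return to education for each group:

```latex
\begin{align*}
\frac{\partial \log(wage)}{\partial educ} &= \beta_1 + \beta_3\, female \\
\text{men } (female = 0):&\quad \beta_1 \\
\text{women } (female = 1):&\quad \beta_1 + \beta_3 \\
\text{difference (women minus men)}:&\quad \beta_3
\end{align*}
```

Testing $\beta_3 = 0$ is therefore a test of equal returns to education, while $\beta_2$ is the intercept difference at educ = 0.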

18
Interaction terms

19
Interaction terms

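Estimated equation (standard errors in parentheses; the digits of the intercept, the educ coefficient, and the interaction coefficient are illegible in the source):

$$\widehat{\log(wage)} = \underset{(0.12)}{\text{[illegible]}} + \underset{(0.008)}{\text{[illegible]}}\,educ - \underset{(0.17)}{0.216}\,female - \underset{(0.013)}{\text{[illegible]}}\,female \cdot educ + \underset{(0.002)}{0.005}\,exper + \underset{(0.003)}{0.017}\,tenure$$

► Wage is positively associated with education.
► The t-ratio on female · educ is -0.53 → the coefficient is statistically insignificant → the impact of education on wage is not statistically different between men and women.
► The coefficient on female measures the impact of gender on wage when educ = 0. Overall, women earn less than men.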
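Since the interaction coefficient itself is not legible, its rough size can be backed out from the reported t-ratio (-0.53) and standard error (.013). This is an inference, not a value stated in the source. The sketch also computes the t-ratio of the female coefficient, assuming (.17) is its standard error.

```python
# Reported on the slide: t-ratio and standard error of female * educ.
t_interaction = -0.53
se_interaction = 0.013

# Implied coefficient (an inference): coefficient = t-ratio * standard error.
implied_coef = t_interaction * se_interaction

# Female coefficient and the standard error assumed to belong to it.
coef_female = -0.216
se_female = 0.17
t_female = coef_female / se_female

# Two-sided 10% critical value, large-sample normal approximation.
critical_10pct = 1.645
interaction_significant = abs(t_interaction) > critical_10pct

print(round(implied_coef, 4), round(t_female, 2), interaction_significant)
```

The implied interaction coefficient is small relative to its standard error, in line with the slide's conclusion that the return to education does not differ significantly by gender.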

20
Example – Investment Inefficiency and Firm Performance
► We have a proposal with two hypotheses:
H1: Investment inefficiency has a negative impact on firm performance.
H2: The negative impact of investment inefficiency on firm performance is stronger for small firms.

[Diagram: Investment inefficiency → Firm performance (H1); Small firms moderate this effect (H2)]
21
Example – Investment Inefficiency and Firm Performance
► First, to test the first hypothesis, we estimate the model:
$$roa = \beta_0 + \beta_1\,\mathit{abnormal\_investment} + \beta_2\,leverage + Year\ FEs + u$$
$$\widehat{roa} = 0.033 - 0.317\,\mathit{abnormal\_investment} - 0.042\,leverage$$
(0.002) (.013) (.002) (.004)
n = 76,604, R² = 1.38%

The coefficient on abnormal_investment is -0.317 and statistically significant at the 1% level.
→ Investment inefficiency negatively affects firm performance.

This supports the hypothesis: firm performance is negatively associated with investment inefficiency.
→ A one-unit increase in abnormal investment is associated with a decrease in roa of 0.317 units (for example, a 1 percentage point increase goes with a 0.317 percentage point decrease if both variables are measured on the same scale).
22
Example – Investment Inefficiency and Firm Performance
To test the second hypothesis, we build an interaction-terms model:
Baseline model:
$$roa = \beta_0 + \beta_1\,\mathit{abnormal\_investment} + \beta_2\,leverage + Year\ FEs + u$$
Interaction-terms model:
$$roa = \beta_0 + \beta_1\,\mathit{abnormal\_investment} + \beta_2\,\mathit{small\_firms} + \beta_3\,\mathit{abnormal\_investment} \cdot \mathit{small\_firms} + \beta_4\,leverage + Year\ FEs + u$$

► small_firms is a dummy variable that equals 1 if a firm's size is below the sample median, 0 otherwise.

► The coefficient on the interaction term indicates the difference in the impact of investment inefficiency on firm performance between small firms and non-small firms.
23
Example – Investment Inefficiency and Firm Performance
$$\widehat{roa} = \underset{(0.002)}{0.043} + \underset{(0.020)}{0.090}\,\mathit{abnormal\_investment} - \underset{(0.002)}{0.020}\,\mathit{small\_firms} - \underset{(0.027)}{0.610}\,\mathit{abnormal\_investment} \cdot \mathit{small\_firms} - \underset{(0.004)}{0.074}\,leverage$$
n = 76,604, R² = 4.18%
The coefficient on abnormal_investment · small_firms is negative (-0.610) and statistically significant at the 1% level.

→ The negative impact of investment inefficiency on firm performance is stronger/more pronounced for small firms.
→ In other words, compared to large firms (non-small firms), the negative impact of investment inefficiency on firm performance is stronger/more pronounced for small firms.

In terms of economic significance, for a one-unit increase in abnormal investment, the change in roa is 0.610 units lower for small firms than for non-small firms.
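The group-specific slopes implied by these estimates can be computed directly. The sketch below uses the coefficients as they appear on the slide (0.090 on abnormal_investment and -0.610 on the interaction); the function name is hypothetical.

```python
# Coefficients as reported on the slide.
b_abnormal = 0.090       # abnormal_investment
b_interaction = -0.610   # abnormal_investment * small_firms


def marginal_effect(small_firm):
    """Effect of a one-unit increase in abnormal investment on roa."""
    return b_abnormal + b_interaction * small_firm


effect_non_small = marginal_effect(0)
effect_small = marginal_effect(1)
difference = effect_small - effect_non_small

print(round(effect_non_small, 3), round(effect_small, 3), round(difference, 3))
```

The difference between the two slopes equals the interaction coefficient, which is what the economic-significance statement refers to; the slope for non-small firms is the coefficient on abnormal_investment alone.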

24
Example – Investment Inefficiency and Firm Performance

25
Example – Corruption and Firm Performance
► We have a proposal with three hypotheses:
H1: Firm performance is negatively associated with corruption.
H2: The negative impact of corruption on firm performance is stronger for young firms.
H3: The negative impact of corruption on firm performance is stronger during a financial crisis.

[Diagram: Corruption → Firm performance (H1); Young firms (H2) and Financial crisis (H3) moderate this effect]
26
Example – Corruption and Firm Performance
► First, to test the first hypothesis, we estimate the model:
$$roe = \beta_0 + \beta_1\,corruption + \beta_2\,size + \beta_3\,tangible + u$$
$$\widehat{roe} = \underset{(0.024)}{0.156} - \underset{(0.006)}{0.033}\,corruption + \underset{(0.002)}{0.005}\,size - \underset{(0.017)}{0.057}\,tangible$$
n = 2,597, R² = 2.61%

The coefficient on corruption is -0.033 and statistically significant at the 1% level.
→ Corruption negatively affects firm performance.

This supports the hypothesis: firm performance is negatively associated with corruption.
→ A one-unit increase in corruption is associated with a decrease in roe of 0.033 units (3.3 percentage points).
27
Example – Corruption and Firm Performance

To test the second hypothesis, we build an interaction-terms model:
Baseline model:
$$roe = \beta_0 + \beta_1\,corruption + \beta_2\,size + \beta_3\,tangible + u$$
Interaction-terms model:
$$roe = \beta_0 + \beta_1\,corruption + \beta_2\,\mathit{young\_firms} + \beta_3\,corruption \cdot \mathit{young\_firms} + \beta_4\,size + \beta_5\,tangible + u$$

► young_firms is a dummy variable that equals 1 if a firm's age is below the sample median, 0 otherwise.

► The coefficient on the interaction term indicates the difference in the impact of corruption on firm performance between young firms and non-young firms.
28
Example – Corruption and Firm Performance

$$\widehat{roe} = \underset{(0.032)}{0.113} - \underset{(0.014)}{0.002}\,corruption + \underset{(0.027)}{0.062}\,\mathit{young\_firms} - \underset{(0.016)}{0.039}\,corruption \cdot \mathit{young\_firms} - \underset{(0.002)}{0.004}\,size - \underset{(0.018)}{0.056}\,tangible$$

The coefficient on corruption · young_firms is negative (-0.039) and statistically significant at the 5% level.

→ The negative impact of corruption on firm performance is stronger/more pronounced for young firms.
→ In other words, compared to mature firms (non-young firms), the negative impact of corruption on firm performance is stronger/more pronounced for young firms.

In terms of economic significance, for a one-unit increase in corruption, the change in roe is 0.039 units lower for young firms than for non-young firms.
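The 5% significance claim can be checked from the reported interaction coefficient (-0.039) and its standard error (.016). The sketch below uses a large-sample normal approximation for the two-sided p-value, which is an assumption; the lecture does not say which reference distribution it uses.

```python
import math


def two_sided_p_normal(t_stat):
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|t|))."""
    return math.erfc(abs(t_stat) / math.sqrt(2))


# Interaction coefficient and standard error from the slide.
coef_interaction = -0.039
se_interaction = 0.016

t_stat = coef_interaction / se_interaction
p_value = two_sided_p_normal(t_stat)

print(round(t_stat, 3), round(p_value, 4))
```

A p-value between 0.01 and 0.05 corresponds to significance at the 5% level but not at the 1% level.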

29
Example – Corruption and Firm Performance

To test the third hypothesis, we build an interaction-terms model:
Baseline model:
$$roe = \beta_0 + \beta_1\,corruption + \beta_2\,size + \beta_3\,tangible + u$$
Interaction-terms model:
$$roe = \beta_0 + \beta_1\,corruption + \beta_2\,crisis + \beta_3\,corruption \cdot crisis + \beta_4\,size + \beta_5\,tangible + u$$

► crisis is a dummy variable that equals 1 for the years 2007, 2008, and 2009, 0 otherwise.

30
Example – Corruption and Firm Performance

$$\widehat{roe} = \underset{(0.026)}{0.153} - \underset{(0.008)}{0.029}\,corruption + \underset{(0.027)}{0.004}\,crisis - \underset{(0.011)}{0.011}\,corruption \cdot crisis + \underset{(0.002)}{0.005}\,size - \underset{(0.018)}{0.057}\,tangible$$

The coefficient on corruption · crisis is -0.011 and statistically insignificant, even at the 10% level.

→ The negative impact of corruption on firm performance is not statistically different between the financial crisis and non-crisis periods.
→ In other words, compared to before and after the crisis, the negative impact of corruption on firm performance is similar in the financial crisis period.

31
Summary
Consider two models:
$$y = \beta_0 + \beta_1 x_1 + Controls + u \quad \text{(Model 1)}$$
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 \cdot x_2 + Controls + u \quad \text{(Model 2)}$$
where x2 is a dummy variable.

Main effect in Model (1): β1    Interaction term in Model (2): β3    Hypothesis
> 0                             > 0                                  More pronounced
> 0                             < 0                                  Less pronounced
< 0                             < 0                                  More pronounced
< 0                             > 0                                  Less pronounced
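The table reduces to one rule: when the main effect and the interaction coefficient share a sign, the effect is more pronounced in the x2 = 1 group; when the signs differ, it is less pronounced. A minimal sketch of that rule, with a hypothetical function name:

```python
def moderation_label(beta1, beta3):
    """Classify the moderating effect from the signs of beta1 and beta3."""
    if beta1 == 0 or beta3 == 0:
        raise ValueError("the rule applies only to nonzero coefficients")
    same_sign = (beta1 > 0) == (beta3 > 0)
    return "more pronounced" if same_sign else "less pronounced"


# The four rows of the table, using arbitrary illustrative magnitudes.
rows = [(0.5, 0.2), (0.5, -0.2), (-0.5, -0.2), (-0.5, 0.2)]
labels = [moderation_label(b1, b3) for b1, b3 in rows]
print(labels)
```

The rule only reads signs; whether the difference matters still depends on the statistical significance of the interaction coefficient.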

32
Summary
Consider a model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 \cdot x_2 + Controls + u$$
where x2 is a dummy variable.

Interpreting the coefficient β3:

→ The positive/negative impact of x1 on y is more/less pronounced for the main group (x2 = 1).

→ In other words, compared to the base group (x2 = 0), the positive/negative impact of x1 on y is more/less pronounced for the main group (x2 = 1).

β3 measures by how many units the effect of a one-unit increase in x1 on y is lower/higher in the main group than in the base group.
33
Example
Board tenure diversity and Investment Efficiency

Dep. Var.: INVESTMENT EFFICIENCY
                          (1)
BTD                       0.001
                          (0.501)
BTD × SHORT TENURE        -0.006*
                          (-1.907)
SHORT TENURE              0.006***
                          (2.738)
Control variables         INCLUDED
Fixed effects             Country, Industry, Year
Observations              81,750
Adjusted R2               0.314

SHORT TENURE is a dummy variable that equals 1 if a firm's board tenure is lower than the sample median, 0 otherwise.

The coefficient on BTD × SHORT TENURE is negative (-0.006) and statistically significant at the 10% level.
→ The positive impact of board tenure diversity on firm investment efficiency is less pronounced for firms with short board tenure.
In terms of economic significance, for a one-unit increase in BTD, the change in investment efficiency is 0.006 units lower for short-tenured firms than for long-tenured firms.
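The stars in the table can be reproduced from the numbers in parentheses, assuming these are t-statistics and that the stars follow the common 10% / 5% / 1% convention with large-sample normal critical values. Both are assumptions; the slide does not state them.

```python
def significance_stars(t_stat):
    """Return stars for two-sided significance at the 10%, 5% and 1% levels."""
    abs_t = abs(t_stat)
    if abs_t >= 2.576:   # 1% level
        return "***"
    if abs_t >= 1.960:   # 5% level
        return "**"
    if abs_t >= 1.645:   # 10% level
        return "*"
    return ""


# Values in parentheses from the table, assumed to be t-statistics.
t_stats = {"BTD": 0.501, "BTD x SHORT TENURE": -1.907, "SHORT TENURE": 2.738}
stars = {name: significance_stars(t) for name, t in t_stats.items()}
print(stars)
```

Under these assumptions the stars match the table: none for BTD, one for the interaction term, three for SHORT TENURE.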

34
Example
Corruption and GDP Growth: The moderating Role of FDI

[Diagram: Corruption → GDP Growth, with High FDI moderating the effect]

Sample: 48 Asian countries

HIGH FDI is a dummy variable that equals 1 if a country's FDI is higher than the sample median, 0 otherwise.

35
Example
Corruption and GDP Growth: The moderating Role of FDI
Dep. Var.: GDP GROWTH
                          (1)
Corruption                -0.089***
                          (-3.501)
Corruption × HIGH FDI     0.050***
                          (4.102)
HIGH FDI                  0.016***
                          (2.738)
Control variables         INCLUDED
Fixed effects             Country, Year
Observations              570
Adjusted R2               0.112

HIGH FDI is a dummy variable that equals 1 if a country's FDI is higher than the sample median, 0 otherwise.

The coefficient on Corruption × HIGH FDI is positive (0.050) and statistically significant at the 1% level.
→ The negative impact of corruption on GDP growth is less pronounced for countries with high FDI.
In terms of economic significance, for a one-unit increase in corruption, the change in GDP growth is 0.050 units higher for high-FDI countries than for low-FDI countries.

36
A binary dependent variable: The linear probability model

37
The linear probability model

Linear regression when the dependent variable is binary

If the dependent variable only takes on the values 1 and 0, the linear regression model is called the linear probability model (LPM).

In the linear probability model, the coefficients describe the effect of the explanatory variables on the probability that y = 1.
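The equations on this slide were lost in extraction. A sketch of the standard argument, with generic regressors $x_1, \dots, x_k$ as assumed notation: because y only takes the values 0 and 1, its conditional mean equals the probability that y = 1.

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u \\
E(y \mid x) &= 1 \cdot P(y = 1 \mid x) + 0 \cdot P(y = 0 \mid x) = P(y = 1 \mid x) \\
P(y = 1 \mid x) &= \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \\
\beta_j &= \frac{\partial P(y = 1 \mid x)}{\partial x_j}
\end{align*}
```

Each coefficient is therefore read as a change in probability (in percentage points after multiplying by 100) for a one-unit change in the regressor.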
38
The linear probability model
Dependent variable: = 1 if a married woman is in the labor force, = 0 otherwise
One of the explanatory variables is the husband's income.

kidslt6 = number of children less than six years old
kidsge6 = number of kids between 6 and 18 years of age

If the number of kids under six years increases by one, the probability that the woman works falls by 26.2 percentage points.
The linear probability model
► Disadvantages of the linear probability model
► Predicted probabilities may be larger than one or smaller than zero
► Marginal probability effects are sometimes logically impossible
► The linear probability model is necessarily heteroskedastic, because the variance of a Bernoulli variable depends on the probability that y = 1: Var(y|x) = P(y=1|x)[1 − P(y=1|x)]
► Heteroskedasticity-consistent standard errors need to be computed

► Advantages of the linear probability model
► Easy estimation and interpretation
► Estimated effects and predictions are often reasonably good in practice
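Two of the disadvantages can be illustrated with a few lines of Python. The slope of -0.262 on kidslt6 comes from the labor force example; the intercept of 0.6 is a hypothetical value chosen only to make the point, and the other regressors are ignored.

```python
def lpm_prediction(intercept, slope, x):
    """Fitted probability from a one-regressor linear probability model."""
    return intercept + slope * x


def bernoulli_variance(p):
    """Variance of a 0/1 variable with P(y = 1) = p."""
    return p * (1 - p)


intercept = 0.6          # hypothetical, not from the lecture
slope_kidslt6 = -0.262   # reported effect of one more child under six

# Fitted probabilities for 0, 1, 2 and 3 children under six.
fitted = [lpm_prediction(intercept, slope_kidslt6, kids) for kids in range(4)]

# Error variance implied at the first two fitted probabilities.
variances = [bernoulli_variance(p) for p in fitted[:2]]

print([round(p, 3) for p in fitted])
print([round(v, 4) for v in variances])
```

With three young children the fitted "probability" falls below zero, and the implied variance changes with the fitted probability, which is the source of the heteroskedasticity.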
