CH 07 Specification and Data Issues TQT
2. Interactions among the X_j
reg lnwage i.married##i.gender age   // interaction between two dummy variables
reg lnwage c.edu##i.gender age       // interaction between a continuous variable and a dummy
Population regression
Estimated regression
E.g., suppose we have a sample that includes only those older than 40. This causes no problem for the regression, because the regression function E(saving | edu, age, hhsize) is the same for any subgroup of the population defined by edu, age, or hhsize.
❑ This causes bias and inconsistency, since these respondents may systematically differ from those who participate in the survey.
❑ For instance, suppose only those with income above $1,000 are included in the sample. Then the population regression E(income | edu, age, hhsize) ≠ E(income | edu, age, hhsize, income > $1,000).
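A minimal sketch of this selection problem using simulated data (the model, coefficients, and $4,000 cutoff below are hypothetical, not from the lecture). Selecting the sample on the dependent variable attenuates the OLS slope:

```python
# Sketch: sample selection on the dependent variable biases OLS.
# True model: income = 500 + 300*edu + u (made-up numbers).
import random

random.seed(1)

def ols_slope(xs, ys):
    # Simple-regression OLS slope: sample cov(x, y) / sample var(x)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

edu = [random.uniform(0, 16) for _ in range(20000)]
income = [500 + 300 * e + random.gauss(0, 1000) for e in edu]

full_slope = ols_slope(edu, income)  # close to the true slope 300

# Keep only respondents with income above 4000 (selection on y)
kept = [(e, y) for e, y in zip(edu, income) if y > 4000]
trunc_slope = ols_slope([e for e, _ in kept], [y for _, y in kept])
```

In the truncated sample, low-education respondents survive only when their error term is large and positive, so the estimated slope is pulled well below the full-sample estimate.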
❑ The model specifies the q-th quantile (0 < q < 1) of the conditional distribution of the dependent variable.
Advanced econometrics
4. Zero conditional mean assumption
What can we do?
❑ Functional form is correctly specified
❑ What are the consequences of non-constant variance (heteroskedasticity)? (See more in 5.5)
1. The usual formulas for Var(β̂_j) are invalid.
2. The F-test and t-test are invalid.
3. OLS is no longer BLUE (the best linear unbiased estimator), because it no longer has the smallest variance.
However, heteroskedasticity does not affect unbiasedness, consistency, or R-squared.
❑ How to deal with non-constant variance (see more in 5.5)
- Log-transformation.
- Use the "robust" option provided by Stata:
reg wage edu exper female married, vce(robust)
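As a rough illustration of the idea behind vce(robust), here is a sketch of the HC0 robust standard error for the slope of a simple regression, computed on simulated heteroskedastic data (all numbers are made up for illustration):

```python
# Conventional vs. heteroskedasticity-robust (HC0) standard errors for the
# slope of a simple regression; data are simulated, not the lecture's.
import math
import random

random.seed(2)

n = 5000
x = [random.uniform(0, 10) for _ in range(n)]
# Error standard deviation grows with x -> heteroskedasticity
y = [1.0 + 2.0 * xi + random.gauss(0, 0.5 + 0.5 * xi) for xi in x]

mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
b0 = my - b1 * mx
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Conventional SE: assumes one common error variance sigma^2
sigma2 = sum(e ** 2 for e in resid) / (n - 2)
se_conventional = math.sqrt(sigma2 / sxx)

# HC0 robust SE: lets each observation keep its own squared residual
se_robust = math.sqrt(sum(((xi - mx) ** 2) * (e ** 2)
                          for xi, e in zip(x, resid)) / sxx ** 2)
```

When the error variance rises with x, the robust standard error comes out larger than the conventional one, while the slope estimate itself remains unbiased.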
6. Normality of the errors
b1: the wage gap between men and women (holding other things constant)
b1 is statistically significant at the 1% level => the intercept differs between the two groups
Interpretation: interactions between two dummy variables
Log_wage = 8.6 + 0.11 married + 0.14 men + 0.03 married*men
1. What is the intercept for single women?
2. What is the difference in intercepts between married
women and single women?
3. What is the difference in intercepts between married
men and single women?
4. What is the difference in intercepts between married
men and single men?
5. What is the difference in intercepts between married
men and married women?
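One way to check your answers is to plug the four dummy combinations into the estimated equation; a small sketch using the coefficients from the slide:

```python
# Group intercepts from log_wage = 8.6 + 0.11*married + 0.14*men + 0.03*married*men
def intercept(married, men):
    return 8.6 + 0.11 * married + 0.14 * men + 0.03 * married * men

single_women = intercept(0, 0)   # base group: 8.6
married_women = intercept(1, 0)
single_men = intercept(0, 1)
married_men = intercept(1, 1)    # all three extra terms switch on
```

Each difference in intercepts is a sum of the coefficients whose dummies differ between the two groups.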
wage_hat = 2145.787 − 666.4691 Gender + 349.1673 Edu + 141.9522 Gender*Edu
For every one-year increase in education, a woman earns an additional 349.1673
For every one-year increase in education, a man earns an additional 491.1195 = 349.1673 + 141.9522
The difference in the slopes (the differential effect of edu on wage): 141.9522
Note: This kind of interaction allows for different slopes
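The two slopes can be read off the fitted equation directly; a quick sketch (assuming Gender = 1 for men and 0 for women, as the interpretation above implies):

```python
# Education slope by gender from
# wage_hat = 2145.787 - 666.4691*Gender + 349.1673*Edu + 141.9522*Gender*Edu
# (assumes Gender = 1 for men, 0 for women)
def edu_slope(gender):
    return 349.1673 + 141.9522 * gender

women_slope = edu_slope(0)           # 349.1673
men_slope = edu_slope(1)             # 349.1673 + 141.9522 = 491.1195
slope_gap = men_slope - women_slope  # equals the interaction coefficient
```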
Quadratic Functional Forms: interpretation
Quadratic Functional Forms: y = b0 + b1x + b2x² + u
❑ This functional form captures decreasing or increasing marginal effects
❑ The effect depends on b1, b2 and the value of X:
Δŷ ≈ (β̂1 + 2β̂2·x)·Δx,  so  Δŷ/Δx ≈ β̂1 + 2β̂2·x
An increase in work experience from 10 to 11 years increases wages by about 100 thousand VND/month:
Δwage_hat ≈ (200 − 2·5·10)·1 = 200 − 100 = 100
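With the slide's coefficients (b1 = 200, b2 = −5), the marginal effect at any experience level follows from the same formula:

```python
# Marginal effect in a quadratic model: dy/dx ≈ b1 + 2*b2*x
# Coefficients from the slide's example: b1 = 200, b2 = -5 (wage in thousand VND/month)
def marginal_effect(b1, b2, x):
    return b1 + 2 * b2 * x

effect_at_10 = marginal_effect(200, -5, 10)  # moving from 10 to 11 years of experience
turning_point = -200 / (2 * -5)              # experience level where the effect flips sign
```

Because b2 is negative, the effect of experience shrinks as experience rises and turns negative past the turning point, here 20 years.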
❑ Consistency: β̂_j → β_j as n → ∞
The OLS estimator β̂_j converges to the true population parameter β_j as the sample size grows toward infinity.
❑ Efficiency: smaller variances or more precision
❑ Under assumptions 1-5: OLS estimator is unbiased, consistent and efficient.
❑ BLUE: Best Linear Unbiased Estimator
❑ Best: OLS has the smallest variance among all linear unbiased estimators
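A quick Monte Carlo sketch of unbiasedness, using simulated data with a known slope (all numbers are hypothetical):

```python
# Monte Carlo check: the average OLS slope across many samples sits close to
# the true parameter (here 2.0), illustrating unbiasedness. Simulated data.
import random

random.seed(3)

def ols_slope(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))

estimates = []
for _ in range(500):
    x = [random.uniform(0, 10) for _ in range(200)]
    y = [1.0 + 2.0 * xi + random.gauss(0, 1.0) for xi in x]
    estimates.append(ols_slope(x, y))

mean_b1 = sum(estimates) / len(estimates)  # should sit near the true slope 2.0
```

Any single estimate misses 2.0 by sampling error, but the average over repeated samples does not systematically over- or under-shoot.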
Three Components of OLS Variances
Var(β̂_j) = σ² / (SST_j(1 − R²_j));  Se(β̂_j) = √Var(β̂_j);  j = 1, 2, …, k
1. Error variance: 𝝈𝟐
▪ Larger 𝝈𝟐 increases the sampling variances
▪ Larger 𝝈𝟐 reduces the precision of estimates.
▪ σ² does not decrease with sample size, because σ² is a feature of the population
▪ Adding more explanatory variables reduces 𝝈𝟐
2. The total sample variation in explanatory variable j: SST_j
• Higher sample variations increase the precision of estimates.
• Larger sample sizes automatically increase sample variation.
• Increasing sample size is a method to obtain more precise estimates.
3. Linear relationships among the explanatory variables: R²_j
• As R²_j increases to 1, Var(β̂_j) gets larger and larger: Var(β̂_j) → ∞ as R²_j → 1
• Multicollinearity leads to higher variances for OLS slope estimators.
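The three components can be read directly off the variance formula; a small numeric sketch (the σ², SST_j, and R²_j values are made up):

```python
# Var(beta_j_hat) = sigma^2 / (SST_j * (1 - R2_j))
def var_beta(sigma2, sst_j, r2_j):
    return sigma2 / (sst_j * (1.0 - r2_j))

base = var_beta(sigma2=4.0, sst_j=100.0, r2_j=0.0)  # 0.04
more_collinear = var_beta(4.0, 100.0, 0.9)          # 10x larger: 0.4
bigger_sample = var_beta(4.0, 1000.0, 0.0)          # 10x smaller: 0.004
```

Pushing R²_j from 0 to 0.9 inflates the variance tenfold, while multiplying SST_j by ten shrinks it tenfold: exactly the precision trade-offs listed above.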
The precision of the slope coefficient will increase if …
(Recall: Var(β̂_j) = σ² / (SST_j(1 − R²_j)), j = 1, 2, …, k)
A. True or false?
1. The error variance (𝝈𝟐 ) can be reduced if we add more explanatory
variables.
2. Reducing the error variance makes the estimates less precise.
3. Measurement errors in the dependent variable make the estimates less
precise.
4. The estimates are more precise if the variance of the dependent variable is
a function of the independent variable: Var(Y|x)=F(x).
5. The estimates are less precise if the variance of the errors is a function of
the independent variable: Var(U|x)=F(x).
B. Omitted variable bias occurs if
1. A model omitted a variable that has no partial effect on Y and is highly
correlated with other explanatory variables.
2. A model omitted a variable that has a non-zero partial effect on Y and is
uncorrelated with other explanatory variables.
3. A model omitted a variable that has a positive effect on Y and is uncorrelated
with other explanatory variables.
4. A model omitted a variable that has a positive effect on Y and is negatively
correlated with other explanatory variables.
5. A model omitted a variable that has a negative effect on Y and is uncorrelated
with other explanatory variables.
6. A model omitted a variable that has a negative effect on Y and is positively
correlated with other explanatory variables.
7. A model omitted a variable that has a positive effect on Y and is positively
correlated with other explanatory variables.
8. A model omitted the squared term of X, while both the X and X squared terms
were statistically significant.
C.
1. You are given this regression model: 𝑦 = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝑢. Suppose
there is a positive and highly correlated relationship between X1 and X2,
and X1 has a positive effect on Y. What happens if we drop X1 from the
model?
2. You are given this regression model: 𝑦 = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝑢. Suppose
there is a negative and highly correlated relationship between X1 and X2,
and X1 has a positive effect on Y. What happens if we drop X1 from the
model?
3. You are given this regression model: 𝑦 = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝑢. Suppose
there is a positive and highly correlated relationship between X1 and X2,
and X2 has a negative effect on Y. What happens if we drop X2 from the
model?
4. You are given this regression model: 𝑦 = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝑢. Suppose
there is a positive relationship between X1 and X2, and X2 has a zero
partial effect on Y . What happens if we include X2 in the model?
5. If X2 has a non-zero partial effect on Y but is omitted, does this
increase the bias?
6. You are given this regression model: 𝑦 = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝑢. You find
that var (Y|x1) increases with the value of x1. Does this affect the
unbiasedness?
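A simulation can make the direction of omitted-variable bias in these questions concrete; a sketch with made-up coefficients (β2 = 3 > 0 and corr(x1, x2) > 0, so the bias on the x1 slope is upward):

```python
# Omitted-variable bias: true model y = 1 + 2*x1 + 3*x2, with x2 = 0.5*x1 + v.
# Regressing y on x1 alone gives roughly 2 + 3*0.5 = 3.5 (upward bias).
# All coefficients are made up for illustration.
import random

random.seed(4)

def ols_slope(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))

x1 = [random.gauss(0, 1) for _ in range(20000)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]
y = [1 + 2 * a + 3 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

short_slope = ols_slope(x1, y)  # absorbs x2's effect through the correlation
```

Flipping the sign of either the x1–x2 correlation or x2's effect flips the sign of the bias, which is the pattern the true/false items above are probing.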
Midterm exam
8 Nov 2023
One-hour written open-book exam
The first group includes students from 1 to 20.
The test time is 7:30–8:30.
The second group includes students from 21 to 43.
The test time is 9:00–10:00.
Midterm exam
9 Nov 2023
One-hour written open-book exam
The first group includes students from 1 to 20.
The test time is 7:30–8:30.
The second group includes students from 21 to 44.
The test time is 9:00–10:00.