Problem Set 2 Quantitative Methods UNIGE

1) The respondent ran a univariate regression with CEO salary as the dependent variable and company net income as the independent variable, using cross-sectional data from 1990. 2) The analysis found that each additional $1 million in company net income is associated with an average increase of about $389 in CEO salary. This relationship was statistically significant at the 5% level. 3) As larger companies tend to pay CEOs more, these results were expected. However, omitted variable bias is still possible since other factors like company size were not controlled for.

Quantitative Methods

Université de Genève
Fall 2023
Michalis Nikiforos & Simon Grothe

Problem Set 2
Santiago Hernández Carrasco

1. Imagine you are conducting your first empirical research. You have gathered data and
now you run a first OLS regression. Explain in your own words:

(a) Why would you do hypothesis testing? Give an example of a regression and a re-
lated test of a hypothesis.

A hypothesis test can be used to determine (among many other things) whether each of the
independent variables, considered by itself, has any capability to explain the variability
in the dependent variable. Consider a regression where the dependent variable is the wage
received, and the independent variables are the years of education, the years of tenure and
the gender of the person. In this model hypothesis testing can be used to assess the
statistical significance of the effect of each of these variables on wages, or of all of
them as a whole. A specific example of such a test is the F test whose p-value statistical
software reports automatically. In this situation the null hypothesis, H0, is that all
slope coefficients are equal to 0 (equivalently, that the model's R2 is equal to 0), so that
the model has no capability to explain the variability of the dependent variable.

(b) What are the general steps in hypothesis testing?

The first step of hypothesis testing is the formulation of both the null, H0, and the
alternative, H1, hypotheses. For example, in a multivariate model with 3 independent
variables, the null can be that both β1 and β2 are equal to 0, or in other words, that they
have no impact on the ability of the model to explain the variation of the dependent
variable. The alternative is then that at least one of them is different from 0.

H0: β1 = β2 = 0
H1: β1 ≠ 0 or β2 ≠ 0

The next step is to compute a test statistic from the estimates and the null hypothesis,
followed by choosing a significance level and identifying the corresponding critical value.
We then compare the test statistic with the critical value. If the test statistic is larger
than the critical value (in absolute value, for a two-sided t-test), we reject the null
hypothesis; otherwise we fail to reject it.
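
The steps above can be sketched in Python. This is only an illustration under stated assumptions: the estimate 0.389 and standard error 0.177 are the net income values reported in Table 2 further down, the helper name `t_test` is hypothetical, and the critical value is taken from the standard normal distribution as a large-sample approximation to the t distribution.

```python
from statistics import NormalDist

def t_test(estimate, std_error, null_value=0.0, alpha=0.05):
    """Two-sided test of H0: beta = null_value (normal approximation, an assumption)."""
    # Step 2: test statistic built from the estimate and the null hypothesis
    t_stat = (estimate - null_value) / std_error
    # Step 3: critical value for the chosen significance level
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: reject H0 when the statistic is more extreme than the critical value
    reject = abs(t_stat) > critical
    return t_stat, critical, reject

t_stat, critical, reject = t_test(0.389, 0.177)
print(round(t_stat, 3), round(critical, 3), reject)
```

With these inputs the null is rejected at the 5% level but not at the 1% level (`alpha=0.01`), which matches the two stars shown in Table 2.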

(c) Stata gives you a p-value and a confidence interval for your parameter estimates.
What do they actually mean? How are they related to the estimates for the coefficients
and the standard error?

When running a regression in Stata, the output includes a p-value for each of the
independent variables. The p-value is the probability, assuming the null hypothesis (that
the coefficient is equal to 0) is true, of obtaining a test statistic at least as extreme as
the one actually observed. It is not the probability that the null hypothesis is true.
Smaller p-values therefore indicate stronger evidence against the variable having no effect,
while high p-values mean the data are compatible with the variable having no explanatory
value in the model. The generally accepted cut-off values are 0.10, 0.05 and 0.01: if the
calculated p-value is lower, the null hypothesis is rejected at the 10%, 5% or 1%
significance level respectively. The p-value is derived from the t-statistic, which is the
coefficient estimate divided by its standard error.
There are also the 95% confidence intervals, which are computed as the coefficient estimate
plus or minus a critical value (approximately 1.96 in large samples) times the standard
error. They can be understood as follows: if the estimation were repeated with 100 new
samples, about 95 of the intervals constructed in this way would contain the true value of
the coefficient.
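
As a rough illustration of how these quantities follow from the estimate and its standard error, the sketch below recomputes them for the net income coefficient of Table 2 (0.389 with standard error 0.177). It assumes a large-sample normal approximation, so the numbers may differ slightly from the t-based values that Stata or R would print.

```python
from statistics import NormalDist

estimate, std_error = 0.389, 0.177  # net income coefficient and s.e. from Table 2

z = NormalDist()  # standard normal, used as an approximation to the t distribution

# t-statistic for H0: beta = 0
t_stat = estimate / std_error

# Two-sided p-value: probability of a statistic at least this extreme under H0
p_value = 2 * (1 - z.cdf(abs(t_stat)))

# 95% confidence interval: estimate +/- critical value * standard error
half_width = z.inv_cdf(0.975) * std_error
ci_95 = (estimate - half_width, estimate + half_width)

print(round(t_stat, 3), round(p_value, 3), [round(b, 3) for b in ci_95])
```

The p-value falls between 0.01 and 0.05 and the interval excludes 0, which are two ways of stating the same conclusion at the 5% level.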

(d) What is the logic behind the F-test? Why can it be useful to do an F-test for
two variables that are each individually non significant?

An F-test is used to compare two versions of the same regression, called the restricted
and the unrestricted model. The restricted model is a version of the original regression
in which some of the independent variables are left out; for example, in a model with three
variables, the restricted model can be one where X1 and X2 are excluded, while the
unrestricted one is simply the original model with the 3 independent variables included.
The unrestricted model is always at least as capable as the restricted one of accounting
for the variability of Y .
The next phase is to compare how good each of these models is. If they fit equally well,
the excluded independent variables had no impact on the fit of the regression, and thus
their exclusion simplifies the model without losing explanatory capacity. In this situation
we fail to reject the null hypothesis. On the contrary, if the unrestricted model is much
more capable of capturing the variation in the dependent variable, the null hypothesis is
rejected in favour of the alternative, meaning the addition of X1 and X2 substantially
increases the ability of the model to capture how Y behaves.

Unrestricted model:

Y = β0 + β1 X1 + β2 X2 + β3 X3 + u

Restricted model:

Y = β0 + β3 X3 + u
The comparison is made with the F-statistic. This statistic takes into account a few
parameters: the number of observations (N), the degrees of freedom of the unrestricted
model (N − k − 1, where k is its number of independent variables), and the number of
restrictions imposed (the difference in the number of variables between the unrestricted
and restricted models). The F-statistic is never negative, and a lower value corresponds to
a smaller loss of explanatory ability in the restricted model.
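
One common way to write the statistic, valid when both models have the same dependent variable, is F = ((R2_ur − R2_r) / q) / ((1 − R2_ur) / (N − k − 1)), where q is the number of restrictions. The sketch below applies it to numbers reported later in this document: Table 2 as the restricted model (R2 = 0.033) and Table 4, which adds the squared term, as the unrestricted one (R2 = 0.051). The function name is hypothetical and the inputs are rounded, so the result is approximate.

```python
def f_statistic(r2_unrestricted, r2_restricted, n_obs, k_unrestricted, n_restrictions):
    """R-squared form of the F-statistic for comparing nested models."""
    df_resid = n_obs - k_unrestricted - 1           # degrees of freedom, N - k - 1
    numerator = (r2_unrestricted - r2_restricted) / n_restrictions
    denominator = (1 - r2_unrestricted) / df_resid
    return numerator / denominator

# Does adding netinc squared (Table 4) improve on the linear model (Table 2)?
f_squared_term = f_statistic(0.051, 0.033, n_obs=142, k_unrestricted=2, n_restrictions=1)
print(round(f_squared_term, 2))
```

The value would then be compared with the critical value of the F distribution with (1, 139) degrees of freedom at the chosen significance level.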
An additional advantage of the F-test is the possibility of checking the joint relevance of
groups of variables that, taken individually, appear to be irrelevant according to their
t-tests. There can be a case where two or more variables each seem irrelevant for explaining
the variation of the dependent variable, but they are correlated with each other and also
with Y . High correlation between them inflates their individual standard errors, so when
they are considered as a whole, their inclusion in the model can still result in a
substantially higher explanatory capability.

2. You would like to analyse the factors that contribute to spending on Christmas gifts
Y . You include the number of visits at the shopping mall X1 as an explanatory variable
and your model becomes:
Y = β0 + β1 X1 + u
However, you find that disposable income X2 is correlated with both visits at the shop-
ping mall X1 and spending on Christmas gifts Y . Regarding your estimator for β1 what
problem would occur when you exclude disposable income from your model? Explain if
you expect your estimator -β̂1 - to be too high, or too low, compared to β1 ?

Leaving out a variable, in this case disposable income, that is correlated with both the
dependent and the independent variable of the model is a typical omitted variable bias
situation. In the first place it violates the zero conditional mean assumption: disposable
income ends up in the error term, which is then correlated with X1 , so the expected value
of the error term conditional on X1 is not 0.
As a result, a model without disposable income as an independent variable can be expected
to overestimate the effect of visits at the shopping mall X1 , provided that, as seems
plausible, disposable income is positively correlated with both visits and spending. The
estimator attributes to X1 part of the variability of spending on Christmas gifts Y that, in
a more complete model, would be the result of disposable income X2 . In a few words, the
value of our estimator β̂1 would be too high compared to the real value of β1 .
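
The direction of the bias can be illustrated with a small simulation. Everything in it is hypothetical: the "true" coefficients, the strength of the correlation between visits and income, and the sample size are made-up values chosen only to show the mechanism, which is that the slope of the short regression is approximately β1 + β2 · δ, where δ is the slope from regressing X2 on X1.

```python
import random

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x

rng = random.Random(0)               # fixed seed so the sketch is reproducible
n = 5000
true_beta1, true_beta2 = 1.0, 2.0    # hypothetical true effects of X1 and X2

income = [rng.gauss(0, 1) for _ in range(n)]                   # X2
visits = [0.5 * inc + rng.gauss(0, 1) for inc in income]       # X1, correlated with X2
spending = [true_beta1 * v + true_beta2 * inc + rng.gauss(0, 1)
            for v, inc in zip(visits, income)]                 # Y

short_slope = ols_slope(visits, spending)   # regression of Y on X1 that omits X2
delta = ols_slope(visits, income)           # slope from regressing X2 on X1
print(round(short_slope, 2), round(true_beta1 + true_beta2 * delta, 2))
```

Because both β2 and δ are positive in this setup, the short regression overstates the effect of visits; if either sign were reversed, the bias would go the other way.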

3. Use the dataset return.dta which you can find on moodle. It contains cross sec-
tional data on CEO salaries in 1990.

(a) The variable salary displays CEO salary in 1990 (thousands USD). What is the me-
dian salary in this dataset?

The following table shows the minimum, maximum, and median among other descrip-
tive statistics for the salaries in thousands of USD variable from the dataset.

Statistic Value
Minimum 267
1st Quartile 733.5
Median 1040.5
Mean 1325.1
3rd Quartile 1364.2
Maximum 14336

Table 1: Descriptive statistics of CEO’s salaries in 1990 (thousands USD).

The median salary of CEOs in 1990 was 1040.5 thousand USD, or $1,040,500.

(b) Run the following univariate model, where netinc is the net income (millions of USD)
of the CEO’s company in 1990:

Salary = β0 + β1 ∗ netinc + u

Interpret your results. Were they to be expected? Is your coefficient statistically signifi-
cant? Is it economically significant? Do you expect your estimator to be unbiased?

Table 2: Company’s Net Income on CEO’s Salary (Thousands of USD)

Dependent variable:
CEO Salary in Thousands of USD
Company’s Net Income - β1 0.389∗∗
(0.177)

Intercept - β0 1,126.057∗∗∗
(156.413)

Observations 142
R2 0.033
Adjusted R2 0.026
Residual Std. Error 1,519.410 (df = 140)
F Statistic 4.827∗∗ (df = 1; 140)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

These results show that the model predicts a CEO to earn an extra $389 in salary for every
extra million USD in net income of their company, and that a CEO of a company with $0 in
net income would receive a salary of $1,126,057.
Logically it makes little sense for a struggling company, with little to no income, to spend
over a million dollars paying a CEO. This is a consequence of the linear form of the
regression. In terms of statistical significance, the estimate of the effect of the
company's net income on the CEO's salary is significant at the 5% level. In economic terms,
however, its significance is much less clear, and judging it is to some extent subjective.
In my opinion an increase of $389 for every additional million dollars of net income does
not appear to be an enticing enough motivation when the base salary is over a million
dollars. In this sense, even though the two are correlated, it seems a CEO has little
motivation for their company to reach a larger net income, at least in terms of the effect
on their salary.
On the question of unbiasedness, the first four assumptions of the OLS model have to be
met: first, a linear relation between the dependent and independent variables; second,
random sampling; third, as this is a model with a single independent variable, sample
variation in the independent variable; and fourth, the zero conditional mean. The first
three assumptions seem to be met without much difficulty. There appears to be a linear
relation between salary and the net income of the company; as the dataset is given, its
quality is assumed, and so is the randomness of the sample; and net income takes diverse
values across companies, which satisfies the third assumption. The really hard question is
whether the zero conditional mean assumption holds. In the real world it is quite
complicated to determine if this condition is actually met, given the possible existence of
other variables not included in the model that affect salary and are correlated with net
income. In this sense it is quite possible that the estimator in this model is biased.

(c) Generate a new variable salaryEUR = 1.05USD that converts hourly earning to Euros.
Now run the following model:
salaryEUR = β0 + β1 ∗ netinc + u
How do your estimates and t-statistics compare to those of the model in (b)? Why?

Table 3: Company’s Net Income on CEO Salary (Thousands of EUR)

Dependent variable:
CEO Salary in thousands of EUR
Company’s Net Income - β1 0.370∗∗
(0.168)

Intercept - β0 1,072.435∗∗∗
(148.965)

Observations 142
R2 0.033
Adjusted R2 0.026
Residual Std. Error 1,447.057 (df = 140)
F Statistic 4.827∗∗ (df = 1; 140)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

The comparison between the original model (thousands of USD) and the second one (thousands
of EUR) shows in the first place the same R2 and adjusted R2 , meaning both have the same
capacity to explain the variability of the CEO’s salary. Looking at the coefficients, the
values in the second model are equal to those of the first one divided by the USD per EUR
exchange rate. Equivalently, the coefficients of the EUR model multiplied by a factor of
1.05 give the USD ones. The standard errors are rescaled by exactly the same factor, so the
t-statistics (and the F statistic) are identical in the two models.

β1 (EUR) * 1.05 = β1 (USD)
0.370 * 1.05 = β1 (USD)
0.3885 = β1 (USD)
0.3885 ≈ 0.389 (see footnote 1)
The reason behind this is the use of different units to express the relation between the
independent and dependent variables. Even though the models' ability to explain the
variability of a CEO's salary is the same, the numeric value of the coefficients does
depend on the specific units used for measuring the dependent variable Y , in these models
the salary, either in USD or EUR.
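
The effect of changing units can be checked directly on the reported numbers. The sketch below divides the Table 2 estimates and standard errors by the exchange rate of 1.05 and compares the t-ratios; the function name `rescale` is hypothetical, and the inputs are the rounded values printed in the tables.

```python
USD_PER_EUR = 1.05  # exchange rate used in the exercise

def rescale(coef, std_error, factor=USD_PER_EUR):
    """Dividing the dependent variable by a constant divides coefficients and s.e. alike."""
    return coef / factor, std_error / factor

# (coefficient, standard error) from Table 2, salary in thousands of USD
usd_model = {"netinc": (0.389, 0.177), "intercept": (1126.057, 156.413)}

for name, (coef, se) in usd_model.items():
    coef_eur, se_eur = rescale(coef, se)
    # The t-ratio is unchanged because numerator and denominator scale together
    print(name, round(coef_eur, 3), round(se_eur, 3),
          round(coef / se, 3), round(coef_eur / se_eur, 3))
```

The rescaled intercept and its standard error reproduce the EUR values of Table 3 to three decimals, while the slope differs only in the last digit because the inputs are already rounded.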

1. The slight difference is due to the number of decimals shown in the R output.

(d) Run the following specifications and interpret the estimated effects of netinc on salary.
Which specification has the best fit?

salary = β0 + β1 · netinc + β2 · netinc² + u (1)

log(salary) = β0 + β1 · netinc + u (2)

salary = β0 + β1 · log(netinc) + u (3)

log(salary) = β0 + β1 · log(netinc) + u (4)

1. salary = β0 + β1 ∗ netinc + β2 ∗ netinc² + u

Table 4: Regression - Company’s Net Income + Company’s Net Income squared on CEO
Salary

Dependent variable:
CEO Salary in thousands of USD
Company’s Net Income (squared) - β2 -0.0002
(0.0001)

Company’s Net Income - β1 1.091∗∗
(0.471)

Intercept - β0 926.636∗∗∗
(198.840)

Observations 142
R2 0.051
Adjusted R2 0.037
Residual Std. Error 1,510.848 (df = 139)
F Statistic 3.737∗∗ (df = 2; 139)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

2. log(salary) = β0 + β1 ∗ netinc + u

Table 5: Regression - Company’s Net Income on log of CEO Salary

Dependent variable:
log of CEO salary in thousands of USD
Company’s Net Income - β1 0.0003∗∗∗
(0.0001)

Intercept - β0 6.794∗∗∗
(0.056)

Observations 142
R2 0.153
Adjusted R2 0.147
Residual Std. Error 0.548 (df = 140)
F Statistic 25.283∗∗∗ (df = 1; 140)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

3. salary = β0 + β1 ∗ log(netinc) + u

Table 6: Regression - log of Net Income on CEO Salary

Dependent variable:
CEO Salary in thousands of USD
log of Net Income - β1 297.605∗∗∗
(111.166)

Intercept - β0 -346.041
(636.920)

Observations 142
R2 0.049
Adjusted R2 0.042
Residual Std. Error 1,507.284 (df = 140)
F Statistic 7.167∗∗∗ (df = 1; 140)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

4. log(salary) = β0 + β1 ∗ log(netinc) + u

Table 7: Regression - log of Net Income on log of CEO Salary

Dependent variable:
log of CEO Salary in thousands of USD
log of Net Income - β1 0.243∗∗∗
(0.039)

Intercept - β0 5.593∗∗∗
(0.222)

Observations 142
R2 0.219
Adjusted R2 0.213
Residual Std. Error 0.527 (df = 140)
F Statistic 39.183∗∗∗ (df = 1; 140)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

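Among these models, the one with the best fit is the last one, which has the highest R2
and adjusted R2 values. Strictly speaking, R2 values are only directly comparable between
models with the same dependent variable (models 1 and 3 use salary, models 2 and 4 use
log(salary)), but the last model also has the highest value within its own pair. The
following table makes the comparison easy: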

Regression Model Adjusted R2 Value
1. Company’s Net Income + Company’s Net Income squared on CEO Salary 0.037
2. Company’s Net Income on log of CEO Salary 0.147
3. log of Net Income on CEO Salary 0.042
4. log of Net Income on log of CEO Salary 0.213

Table 8: Comparative Regression Adjusted R2 values.

Finally, the interpretation of these results for each of the models is as follows:
1. Company’s Net Income + Company’s Net Income squared on CEO Salary - Net income and
its square cannot change separately, so the two coefficients have to be read together:
the effect of an additional million dollars of net income on salary is approximately
β1 + 2 ∗ β2 ∗ netinc = 1.091 − 0.0004 ∗ netinc thousand dollars. For a company with
net income close to 0 this is about 1.091 thousand dollars ($1,091), and the negative
coefficient on the squared term means the effect shrinks as net income grows. The
squared term is not individually significant at the 10% level.

2. Company’s Net Income on log of CEO Salary - An increase of 1 million dollars in
the company’s net income is associated with a change of approximately
100 ∗ 0.0003 = 0.03% in the CEO’s salary, while all else remains equal.

3. log of Net Income on CEO Salary - An increase of 1% in the company’s net income
is associated with a change of approximately 297.605 / 100 = 2.976 thousand dollars
(about $2,976) in the CEO’s salary, while all else remains equal.

4. log of Net Income on log of CEO Salary - An increase of 1% in the company’s net
income is associated with an increase of 0.243% in the CEO’s salary, while all else
remains equal.
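
The usual approximations behind the logarithmic specifications can be written out as a short calculation. The coefficients are the rounded values from Tables 5 to 7; the variable names are hypothetical, and the percentage readings are approximations that work well for small changes.

```python
import math

beta_log_level = 0.0003   # Table 5: log(salary) on netinc
beta_level_log = 297.605  # Table 6: salary on log(netinc)
beta_log_log = 0.243      # Table 7: log(salary) on log(netinc)

# log-level: +1 unit of x (1 million USD) changes y by about 100 * beta percent
pct_change_per_million = 100 * beta_log_level
exact_pct_change_per_million = 100 * (math.exp(beta_log_level) - 1)

# level-log: +1% of x changes y by about beta / 100 units (thousands of USD)
thousand_usd_per_percent = beta_level_log / 100

# log-log: beta is an elasticity, +1% of x changes y by about beta percent
elasticity_pct = beta_log_log

print(pct_change_per_million, round(exact_pct_change_per_million, 5),
      thousand_usd_per_percent, elasticity_pct)
```

For a coefficient as small as 0.0003 the approximate and exact percentage changes are practically identical.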

(e) Take the model that includes a quadratic term. What is the meaning of the results?
Do they make sense? What salary do you predict for a CEO whose company had a net
income of 1.5 billion USD?

The model including a quadratic term shown above implies that a CEO of a company with
0 net income would receive a salary of 926.636 thousand dollars ($926,636). Starting from
0, an increase of 1 million dollars in net income translates into an increase of about
1.091 thousand dollars ($1,091). The coefficient on the squared net income is negative,
−0.0002 thousand dollars (0.20 USD, or 20 cents of a dollar) per unit of squared net
income, so every additional million adds slightly less to the salary than the previous one.

At first sight the two coefficients point in opposite directions, but they cannot be read
separately, because there cannot be an increase in net income that does not also change
the squared net income. For example, an increase in net income from 0 to 3 million dollars
translates, according to this model, into an increase of the CEO’s salary of $3,273 through
the linear term, while the squared net income increases by 9, causing a reduction of $1.8;
the net effect is an increase of about $3,271. Taken together, the coefficients describe a
positive effect of net income on salary that becomes weaker as net income grows, which is
a plausible pattern, although the squared term is not individually statistically
significant.
Additionally, the inclusion of the second independent variable (squared net income) does
not help in a significant manner to increase the model’s capability to explain the
variation of the CEO’s salary. This can be determined by comparing the adjusted R2 value of
the model with and without this particular variable. The corresponding values are 0.037
and 0.026. This slight difference is not enough to make the inclusion of the squared term
in the model valuable.

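Using the coefficients of the quadratic model as reported in Table 4, with net income
measured in millions of USD (1.5 billion USD = 1,500):

Salary = 926.636 + 1.091 ∗ netinc − 0.0002 ∗ netinc²
Salary = 926.636 + 1.091 ∗ 1500 − 0.0002 ∗ 1500²
Salary = 926.636 + 1,636.5 − 450
Salary = 2,113.136

With the rounded coefficients of Table 4, the predicted salary is therefore about 2,113
thousand dollars. The figure obtained for this answer, 2,142.105 thousand dollars
($2,142,105), is somewhat higher; the difference is presumably due to the rounding of the
coefficient on the squared term shown in the table.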
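
Since the question refers to the model with a quadratic term, its implications can be sketched directly from the rounded coefficients of Table 4. This is only an approximation: the coefficient on the squared term is printed with a single significant digit, so the prediction, the marginal effect and especially the turning point are rough. The function names are hypothetical.

```python
B0, B1, B2 = 926.636, 1.091, -0.0002  # rounded coefficients from Table 4

def predicted_salary(netinc):
    """Predicted salary (thousands of USD) for net income in millions of USD."""
    return B0 + B1 * netinc + B2 * netinc ** 2

def marginal_effect(netinc):
    """Change in salary for one more million of net income: b1 + 2 * b2 * netinc."""
    return B1 + 2 * B2 * netinc

# Net income at which the estimated effect changes sign: -b1 / (2 * b2)
turning_point = -B1 / (2 * B2)

print(round(predicted_salary(1500), 3),
      round(marginal_effect(1500), 3),
      round(turning_point, 1))
```

With these rounded inputs the effect of an extra million is still positive at a net income of 1.5 billion USD, and it would only turn negative well above that level.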

(f) Add the variable return - which indicates the % change in stock price between 1990
and 1994 of the CEO’s company - to the model with the best fit from (e). Can you reject
the null hypothesis of both variables not having an effect on price at the 5% and 10%
significance level?

Including the return in the regression model with the best fit from the previous section
yields the following results:

Table 9: Regression - log of Company’s Net Income + Return on log of CEO salary

Dependent variable:
log of CEO Salary
Return - β2 0.001
(0.001)

log of Company’s Net Income - β1 0.245∗∗∗
(0.039)

Intercept - β0 5.585∗∗∗
(0.222)

Observations 142
R2 0.226
Adjusted R2 0.215
Residual Std. Error 0.526 (df = 139)
F Statistic 20.293∗∗∗ (df = 2; 139)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

The results show an F-statistic of 20.293, significant at the 1% level, meaning that the
null hypothesis (both independent variables having no effect on the dependent one) can be
rejected at both the 5% and the 10% significance level. When both variables are analysed as
a whole, they have a statistically significant effect, and thus cannot jointly be taken out
of the model without it losing an important amount of its capacity to explain the variation
in Y .
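
As a consistency check, the reported F statistic can be recovered from the R2 of Table 9. For the test that all slope coefficients are zero, the restricted model contains only the intercept, so the statistic reduces to F = (R2 / k) / ((1 − R2) / (N − k − 1)). The function name below is hypothetical and the inputs are the rounded values from the table.

```python
def overall_f(r_squared, n_obs, k):
    """F-statistic for H0: all k slope coefficients are zero."""
    return (r_squared / k) / ((1 - r_squared) / (n_obs - k - 1))

# Table 9: R2 = 0.226 with 142 observations and 2 independent variables
f_table9 = overall_f(0.226, n_obs=142, k=2)
print(round(f_table9, 2))
```

The result agrees with the value printed in Table 9 up to rounding of R2.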
