CU ASwR Lab12 Sol


JTMS-03 Applied Statistics with R

Spring Semester 2023

Lab 12 – Multiple linear regression (Extensions) – Solution


May 03, 2023

As part of your work for the DIW (German Institute for Economic Research), you investigate the socio-
economic factors driving the subjective well-being of German citizens. In line with OECD Guidelines on
Measuring Subjective Well-being¹, you conceptualize subjective well-being in terms of citizens’ experienced
satisfaction with life. Prior to collecting your own, primary representative data, you gain first impressions by
drawing on the population representative sample from Nordrhein-Westfalen in the 2019 Vielfaltsbarometer
study. The latter operationalizes life satisfaction with the item: “Taking all things together, how satisfied are
you at present with your life?”. Your research focuses on biological sex, education and income. Controlling
for age, you test the following hypotheses:
H1: In comparison to men, women are more satisfied with life.
H2: The higher the level of education of the respondent, the higher the reported life satisfaction.
H3: Economically better-off respondents are more satisfied with life.

Data VielfaltNRW.Rdata
Source Robert-Bosch-Stiftung (2019)²
Variables (only the relevant ones)
wb01 Life satisfaction (0= completely unsatisfied to 10= completely satisfied; 99 = don’t know)
female Biological sex of respondent (0= male, 1= female)
age Age of respondent (in years)
edu Respondent’s level of education (1= lower or no formal, 2= vocational training, 3= tertiary)
eqhhinc Respondent’s equivalized household income (in Euro)

Reading the data in R

setwd("Type/your/directory/here")
library(foreign)
# read.spss() has no 'header' argument (that belongs to read.table())
data.lab12 <- read.spss("VielfaltNRW.sav", to.data.frame = TRUE,
                        use.value.labels = FALSE, use.missings = TRUE)

Tasks
1. Data preparation.
a) Inspect the involved variables for (undeclared) missing values. Resolve the issues, if any, and
produce a missing-free dataset.

Only the variable capturing respondents’ life satisfaction is affected by missing values. In particular, one
respondent has a score of 99. This value codes the answering option “don’t know”.

table(data.lab12$wb01, useNA= "always")


##
## 0 1 2 3 4 5 6 7 8 9 10 99 <NA>
## 6 1 7 9 5 30 22 51 140 40 67 1 0

¹ OECD (2013). OECD Guidelines on Measuring Subjective Well-being. Paris: OECD Publishing.
² https://www.bosch-stiftung.de/en/publication/cohesion-diversity-diversity-barometer-robert-bosch-stiftung

According to the output, 99 is treated by the software as a valid score. If it were kept as such, the estimates
for life satisfaction (e.g. its mean) would be distorted. This is why 99 has to be declared as a missing value (NA).

data.lab12$wb01[data.lab12$wb01==99] <- NA

table(data.lab12$wb01, useNA= "always")


##
## 0 1 2 3 4 5 6 7 8 9 10 <NA>
## 6 1 7 9 5 30 22 51 140 40 67 1

None of the other variables needed for the analyses has any (undeclared) missing values. The evidence is
not presented here for space reasons.

# drop only respondents with a missing wb01 (na.omit() has no 'cols' argument)
datanoNA.lab12 <- data.lab12[!is.na(data.lab12$wb01), ]


attach(datanoNA.lab12)
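A note on na.omit(): for data frames it has no cols argument, so an extra argument such as cols = "wb01" is silently ignored and every row with a missing value in any column is dropped. A minimal sketch on a made-up data frame (toy and its column other are hypothetical) shows the difference to dropping rows with a missing wb01 only:

```r
# Hypothetical toy data: wb01 has one NA, an unrelated column has another
toy <- data.frame(wb01  = c(7, NA, 9, 5),
                  other = c(1, 2, NA, 4))

# na.omit() ignores the unknown 'cols' argument and drops every incomplete row
all_cols <- na.omit(toy, cols = "wb01")

# Dropping rows only where wb01 is missing keeps row 3
wb01_only <- toy[!is.na(toy$wb01), ]

nrow(all_cols)   # 2
nrow(wb01_only)  # 3
```

In the lab data the two approaches happen to coincide, since the relevant variables have no other missing values; the explicit subset simply states the intent.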

b) Consider the level of measurement of education. Take the steps necessary to analyze its effect.

library(summarytools)
freq(edu)
## Frequencies
## edu
## Type: Numeric
##
## Freq % Valid % Valid Cum. % Total % Total Cum.
## ----------- ------ --------- -------------- --------- --------------
## 1 76 20.11 20.11 20.11 20.11
## 2 188 49.74 69.84 49.74 69.84
## 3 114 30.16 100.00 30.16 100.00
## <NA> 0 0.00 100.00
## Total 378 100.00 100.00 100.00 100.00

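The variable on education has three ordered categories; hence, it is measured at the ordinal level. In order
to analyze the effect of education in the framework of linear regression, each category is expressed as a
dummy variable (1 if the respondent belongs to the category, 0 otherwise). Only two of the three dummies
later enter the model; the omitted one serves as the reference category.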

eduUni <- ifelse(edu==3, 1, 0)


eduVoc <- ifelse(edu==2, 1, 0)
eduLow <- ifelse(edu==1, 1, 0)
datanoNA.lab12 <- data.frame(datanoNA.lab12, eduUni, eduVoc, eduLow)
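Why only two of the three dummies can enter the model: the three indicators always sum to 1, which duplicates the intercept (perfect collinearity). A minimal sketch with a made-up education vector and outcome (edu_toy and y are hypothetical) shows that lm() cannot estimate all coefficients when all three dummies are included and reports one of them as NA:

```r
# Hypothetical education codes (1 = lower, 2 = vocational, 3 = tertiary) and outcome
edu_toy <- c(1, 2, 2, 3, 3, 1, 2, 3)
y       <- c(6, 8, 7, 9, 8, 7, 8, 6)

uni <- ifelse(edu_toy == 3, 1, 0)
voc <- ifelse(edu_toy == 2, 1, 0)
low <- ifelse(edu_toy == 1, 1, 0)

# Each respondent belongs to exactly one category
all(uni + voc + low == 1)  # TRUE

# Intercept plus all three dummies: one coefficient is aliased and returned as NA
fit_all <- lm(y ~ uni + voc + low)
coef(fit_all)

# Omitting one dummy (here uni, i.e. tertiary as reference) resolves the problem
fit_ref <- lm(y ~ voc + low)
coef(fit_ref)
```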

c) Transform the raw values of household income by taking their natural logarithm.

lnIncome <- log(eqhhinc)


datanoNA.lab12 <- data.frame(datanoNA.lab12, lnIncome)
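The logarithm is defined only for positive values: log(0) returns -Inf and negative values give NaN, either of which would break the regression. A minimal sketch of a guard before transforming, using made-up incomes (inc is hypothetical); on the lab data the same check would be applied to eqhhinc:

```r
# Hypothetical equivalized incomes in Euro
inc <- c(520, 1200, 2150.5, 3800)

# Stop early if any income is zero or negative
stopifnot(all(inc > 0, na.rm = TRUE))

# log() is the natural logarithm (base e) by default
ln_inc <- log(inc)
round(ln_inc, 3)   # log(520) is about 6.254

# What a zero income would produce
log(0)  # -Inf
```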

2. Data description: Describe life satisfaction in terms of central tendency (mean) and dispersion (range
and standard deviation).

library(psych)
describe(wb01)
## vars n mean sd min max range
## X1 1 378 7.54 2.08 0 10 10

The individual responses to the item measuring life satisfaction span the entire measurement scale, from
one extreme (completely unsatisfied, coded 0) to the other (completely satisfied, coded 10), i.e. a range of 10.
The average level of life satisfaction in the sample is 7.54. Given that the midpoint of the scale is 5,
respondents can be characterized as being, on average, quite satisfied with their life. The individual
responses vary around the mean with a standard deviation of 2.08.
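As a cross-check that does not require the psych package, the same statistics can be recovered in base R. A minimal sketch that rebuilds the 378 valid responses from the frequency table of wb01 shown in task 1a (after recoding 99 to NA); counts and wb01_rebuilt are illustrative names:

```r
# Frequencies of the scores 0..10 from the table in task 1a (99 already set to NA)
counts <- c(6, 1, 7, 9, 5, 30, 22, 51, 140, 40, 67)
wb01_rebuilt <- rep(0:10, times = counts)

length(wb01_rebuilt)          # 378
round(mean(wb01_rebuilt), 2)  # 7.54
round(sd(wb01_rebuilt), 2)    # 2.08
range(wb01_rebuilt)           # 0 and 10
diff(range(wb01_rebuilt))     # 10
```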

3. Data analysis. Assess the significance of the estimates at the 5 % level.


a) Specify a model to test the hypotheses.

In order to address the hypotheses, the regression model takes into account the effects of biological sex,
education and income. The model additionally controls for the effect of age. Regarding biological sex, as
captured by the dummy variable for women (female), the group of men serves as the reference category.
As to education, the dummy variable for respondents with a tertiary degree (eduUni) will not be included in
the model. For this reason, respondents with a tertiary degree will serve as the reference category. Taken
together, men with a tertiary degree represent the overarching reference group in the model.

m1 <- lm(wb01 ~ female + age + eduVoc + eduLow + lnIncome)


summary(m1)
##
## Call:
## lm(formula = wb01 ~ female + age + eduVoc + eduLow + lnIncome)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.7325 -0.7194 0.3596 1.1767 3.4173
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.320375 1.262279 1.838 0.0668 .
## female 0.373763 0.214768 1.740 0.0826 .
## age 0.001149 0.006303 0.182 0.8554
## eduVoc 0.100712 0.247729 0.407 0.6846
## eduLow 0.183533 0.322890 0.568 0.5701
## lnIncome 0.653540 0.156722 4.170 3.79e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.038 on 372 degrees of freedom
## Multiple R-squared: 0.05046, Adjusted R-squared: 0.0377
## F-statistic: 3.954 on 5 and 372 DF, p-value: 0.001662
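Creating the dummies by hand is one option; R can also build them from a factor. A minimal sketch on simulated data (all names and values are made up) showing that relevel(factor(...), ref = "3") makes the tertiary category the reference and reproduces the hand-made dummy coefficients. On the lab data, the analogous call would presumably be lm(wb01 ~ female + age + relevel(factor(edu), ref = "3") + lnIncome), with coefficients labelled by category code instead of eduVoc and eduLow:

```r
set.seed(12)
n <- 200
edu_sim <- sample(1:3, n, replace = TRUE)  # 1 = lower, 2 = vocational, 3 = tertiary
x_sim   <- rnorm(n)                        # some continuous covariate
y_sim   <- 7 + 0.2 * (edu_sim == 1) + 0.1 * (edu_sim == 2) + 0.5 * x_sim + rnorm(n)

# Hand-made dummies with tertiary (3) as the omitted reference category
voc_sim <- ifelse(edu_sim == 2, 1, 0)
low_sim <- ifelse(edu_sim == 1, 1, 0)
fit_dummy <- lm(y_sim ~ voc_sim + low_sim + x_sim)

# Factor coding with category "3" as reference
edu_f <- relevel(factor(edu_sim), ref = "3")
fit_factor <- lm(y_sim ~ edu_f + x_sim)

# Same estimates: edu_f2 corresponds to voc_sim, edu_f1 to low_sim
coef(fit_dummy)
coef(fit_factor)
```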

b) Report and interpret the evidence.

To start with, there is a positive, but insignificant association between respondents’ age and their life
satisfaction (b = 0.001, p = 0.86).

As to the effect of biological sex, the evidence shows that women are more satisfied with life than their male
counterparts of similar age, education and income. Women’s level of life satisfaction is, on average, 0.37
points higher than that of men (b = 0.374). The two-tailed p-value reported by R (p = 0.083) exceeds 0.05.
However, H1 is a directional hypothesis and the estimate points in the hypothesized direction, so a one-tailed
test is appropriate: halving the two-tailed p-value gives p = 0.04, and the effect of biological sex is significant
at the 5 % level. Hypothesis H1 is, thus, supported by the data.
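The one-tailed p-value can be computed directly from the t distribution rather than by halving the printed value. A small sketch using the t value and residual degrees of freedom of female from summary(m1); the variable names are illustrative:

```r
t_female <- 1.740  # t value of 'female' in m1
df_resid <- 372    # residual degrees of freedom of m1

# Two-tailed p-value, as printed by summary()
p_two <- 2 * pt(abs(t_female), df = df_resid, lower.tail = FALSE)

# One-tailed p-value for H1 (women > men); appropriate because the estimate is positive
p_one <- pt(t_female, df = df_resid, lower.tail = FALSE)

round(c(two_tailed = p_two, one_tailed = p_one), 3)
```

Because the t distribution is symmetric, the one-tailed value is exactly half the two-tailed one whenever the estimate lies in the hypothesized direction.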

Moving on to education, the results show that respondents’ life satisfaction does not differ significantly with
respect to their level of education. Moreover, the estimates point in the direction opposite to the
hypothesized association between education and life satisfaction. Holding all other predictors constant (i.e.
at similar age, income and biological sex), respondents with vocational training tend to be somewhat more
satisfied with life than respondents with a tertiary degree (b = 0.101, p = 0.68). Life satisfaction tends to be
highest among those with lower or no formally completed education (b = 0.184, p = 0.57). Taken together,
hypothesis H2 is not supported by the data.

Finally, income is positively and highly significantly related to life satisfaction (b = 0.654, p < 0.001). In other
words, respondents with higher household income are significantly more satisfied with life. Hence,
hypothesis H3 is supported by the data.
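Because income enters the model in logarithmic form, b = 0.654 is the expected change in life satisfaction per one-unit increase in ln(income), i.e. when income grows by a factor of e (about 2.72). A small sketch translating this into more intuitive income changes, using the rounded coefficient from m1 and holding the other predictors constant (income_effect is a hypothetical helper):

```r
b_income <- 0.654  # coefficient of lnIncome in m1

# Predicted change in life satisfaction when income is multiplied by 'ratio'
income_effect <- function(ratio) b_income * log(ratio)

round(income_effect(1.10), 3)  # +10 % income: about 0.062 points
round(income_effect(2), 3)     # doubled income: about 0.453 points
```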

c) Report the amount and significance of the explained variation in life satisfaction.

Taken together, the individual socio-economic characteristics explain only 5 % (R² = 0.05; adjusted R² = 0.04)
of the variation in life satisfaction. Given the sample size (n = 378), this amount, though rather small, is
highly significant: F(5, 372) = 3.954, p < 0.01.
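The overall F-test can be reconstructed from R² alone, which makes explicit why a small R² can still be significant in a sample of this size. A sketch using the values printed by summary(m1):

```r
r2 <- 0.05046  # Multiple R-squared of m1
n  <- 378      # number of observations
k  <- 5        # number of predictors

# F = (R² / k) / ((1 - R²) / (n - k - 1))
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))
p_val  <- pf(f_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

# Adjusted R² penalizes the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

round(c(F = f_stat, p = p_val, adj_R2 = adj_r2), 4)
```

With n - k - 1 = 372 residual degrees of freedom, even R² = 0.05 yields an F value well above the 5 % critical value.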

d) Based on the model estimates, write down the regression equation for the prediction of life
satisfaction.

Life satisfaction = 2.320 + 0.374*female + 0.001*age + 0.101*eduVoc + 0.184*eduLow + 0.654*lnIncome

e) Predict the level of life satisfaction for a 22-year-old woman with a tertiary degree and a monthly
income of 520 Euro³.

2.320 + 0.374*1 + 0.001*22 + 0.101*0 + 0.184*0 + 0.654*6.254


## 6.806116

The predicted life satisfaction for the person in question is about 6.81 points on the scale from 0 to 10.
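The same prediction can be scripted so that other respondent profiles are easy to evaluate. A minimal sketch built on the rounded regression equation from task 3d, using log(520) instead of the approximation 6.254 (predict_ls is a hypothetical helper); with the fitted model object, predict() would use the unrounded estimates instead, as shown in the final comment:

```r
# Rounded coefficients of m1 (regression equation in task 3d)
b <- c(intercept = 2.320, female = 0.374, age = 0.001,
       eduVoc = 0.101, eduLow = 0.184, lnIncome = 0.654)

# Hypothetical helper: prediction from the rounded equation, income in Euro
predict_ls <- function(female, age, eduVoc, eduLow, income) {
  unname(b["intercept"] + b["female"] * female + b["age"] * age +
           b["eduVoc"] * eduVoc + b["eduLow"] * eduLow +
           b["lnIncome"] * log(income))
}

# 22-year-old woman, tertiary degree (eduVoc = eduLow = 0), income 520 Euro
round(predict_ls(female = 1, age = 22, eduVoc = 0, eduLow = 0, income = 520), 2)  # 6.81

# With the fitted model:
# predict(m1, newdata = data.frame(female = 1, age = 22, eduVoc = 0,
#                                  eduLow = 0, lnIncome = log(520)))
```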

f) Extend the model from a) in order to test for a quadratic effect of age. Report and interpret the
evidence with respect to age.

The test for a quadratic effect of age is essentially concerned with the question as to whether age and life
satisfaction are linearly related or not. In order to account for a possible quadratic effect of age, an additional
variable (agesq) will be created by squaring the age of the respondents.

agesq <- age*age


datanoNA.lab12 <- data.frame(datanoNA.lab12, agesq)

m2 <- lm(wb01 ~ female + age + agesq + eduVoc + eduLow + lnIncome)


summary(m2)

³ Assume that ln(520) is approximately equal to 6.254.

##
## Call:
## lm(formula = wb01 ~ female + age + agesq + eduVoc + eduLow +
## lnIncome)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.9726 -0.7246 0.3212 1.1744 3.4331
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.3872904 1.3965738 2.425 0.0158 *
## female 0.3707671 0.2141679 1.731 0.0842 .
## age -0.0590030 0.0346852 -1.701 0.0898 .
## agesq 0.0006003 0.0003404 1.763 0.0787 .
## eduVoc 0.0945738 0.2470543 0.383 0.7021
## eduLow -0.0256507 0.3431344 -0.075 0.9405
## lnIncome 0.6930885 0.1578805 4.390 1.48e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.033 on 371 degrees of freedom
## Multiple R-squared: 0.05835, Adjusted R-squared: 0.04312
## F-statistic: 3.832 on 6 and 371 DF, p-value: 0.001014

Since the relationships between life satisfaction and the other predictors remain essentially unchanged, the
interpretation of the evidence from the extended model focuses only on the effect of age. The linear term
(regression coefficient of age) is negative at b = -0.059, whereas the quadratic term (regression coefficient
of agesq) is positive at b = 0.0006. Taken together, the two terms indicate a U-shaped relationship between
age and life satisfaction: up to a certain age, life satisfaction decreases, reaches a minimum, and from this
point onward increases again. It should be noted, though, that neither term is significant at the 5 % level;
both are only marginally significant at the 10 % level (p = 0.09 and p = 0.08). (For the curious reader: the
age at which life satisfaction is lowest can be determined precisely from the regression coefficients. It lies
at approximately 49.14 years⁴.)
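The turning point of the quadratic age effect follows from setting the derivative b_age + 2*b_agesq*age to zero. A small sketch with the unrounded estimates from m2 (in R, the squared term could also be written directly in the formula as I(age^2), which gives the same estimates as the separately created agesq):

```r
# Coefficients of the linear and quadratic age terms in m2
b_age   <- -0.0590030
b_agesq <-  0.0006003

# Vertex of b_age * age + b_agesq * age^2
age_min <- -b_age / (2 * b_agesq)
round(age_min, 2)  # 49.14

# A positive quadratic term means the parabola opens upward, so this is a minimum
b_agesq > 0
```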

g) Extend the model from a) in order to test whether the effect of income differs for men and women.
Report and interpret the evidence.

A test as to whether the effect of income differs for men and women can be performed by including the
interaction effect of income and biological sex.

femaleXlnIncome <- female*lnIncome


datanoNA.lab12 <- data.frame(datanoNA.lab12, femaleXlnIncome)

⁴ At what age is life satisfaction at its lowest? The turning point (here, the minimum) of the quadratic function can
most easily be determined using its first derivative. Regression equation for the prediction of life satisfaction:
ŷ = 3.387 + 0.371*female – 0.059*age + 0.0006*age² + 0.095*eduVoc – 0.026*eduLow + 0.693*lnIncome
First derivative with respect to age (all terms that do not contain age drop out): dŷ/d(age) = –0.059 + 2*0.0006*age.
The turning point is where dŷ/d(age) = 0, i.e. –0.059 + 2*0.0006*age = 0.
Hence: age = 0.059/(2*0.0006) ≈ 49.17 with the rounded coefficients, or 0.0590030/(2*0.0006003) ≈ 49.14 with
the unrounded coefficients from the output.

m3 <- lm(wb01 ~ female + age + eduVoc + eduLow + lnIncome + femaleXlnIncome)
summary(m3)
##

##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.507833 1.457517 0.348 0.7277
## female 6.225109 2.407826 2.585 0.0101 *
## age 0.000749 0.006264 0.120 0.9049
## eduVoc 0.107463 0.246112 0.437 0.6626
## eduLow 0.123072 0.321717 0.383 0.7023
## lnIncome 0.895690 0.184635 4.851 1.81e-06 ***
## femaleXlnIncome -0.781987 0.320521 -2.440 0.0152 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.025 on 371 degrees of freedom
## Multiple R-squared: 0.06545, Adjusted R-squared: 0.05034
## F-statistic: 4.331 on 6 and 371 DF, p-value: 0.0003071

Given that the relationships between life satisfaction and the other predictors do not change in essence,
the interpretation of the evidence from the extended model focuses only on biological sex, income and their
interaction. Note that, once the interaction term is included, the coefficient of female (b = 6.225) no longer
represents the average sex difference: it refers to respondents with lnIncome = 0 (an income of 1 Euro) and is
therefore not meaningful on its own. As discussed on the basis of the first model, women report higher life
satisfaction than men and higher income goes along with higher life satisfaction. The negative (b = -0.782)
and statistically significant (p = 0.02) interaction term shows that the effect of income does indeed differ for
men and women. More concretely, the estimated effect of income on life satisfaction is positive for both sexes;
it is, however, stronger among men (b = 0.896) than among women (b = 0.114)⁵. In other words, higher
income is associated with a much larger increase in life satisfaction among men than among women.

⁵ The interaction effect of income and biological sex can be easily interpreted after rewriting the
regression equation 0.508 + ... + 0.896*lnIncome – 0.782*female*lnIncome as
0.508 + ... + (0.896 – 0.782*female)*lnIncome. For men (female = 0), the slope of lnIncome is 0.896; for
women (female = 1), it is 0.896 – 0.782 = 0.114.
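The conditional income slopes can be computed from the m3 estimates as follows. In an lm() formula, female * lnIncome expands to female + lnIncome + female:lnIncome, so the product variable femaleXlnIncome need not be created by hand. This sketch only reproduces the point estimates; a standard error for the women's slope would additionally require the covariance of the two coefficients (slope_income is a hypothetical helper):

```r
# Estimates from m3
b_lnIncome <-  0.895690
b_interact <- -0.781987

# Income slope as a function of biological sex: b_lnIncome + b_interact * female
slope_income <- function(female) b_lnIncome + b_interact * female

round(c(men = slope_income(0), women = slope_income(1)), 3)  # men 0.896, women 0.114
```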
