
Assignment 5

This document contains an analysis of blood pressure data. Various linear regression models are fitted to predict systolic blood pressure using variables like weight, years lived in urban areas, etc. The best fitting model uses both weight and years lived in urban areas as predictors. This multiple variable model has an r-squared value of 0.47, performing better than single variable models. Diagnostic plots of the multiple variable model show it meets the assumptions of linear regression like linearity of residuals and normality.

Uploaded by

Ray Guo
Copyright
© All Rights Reserved

Assignment 5: Under (blood) pressure

Raymond Guo
2020-02-19

Exercise 1

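# Packages assumed from the functions used below (the setup chunk is not shown)
library(tidyverse)  # dplyr, tidyr, ggplot2
library(broom)      # tidy(), glance()
library(modelr)     # add_predictions(), add_residuals(), geom_ref_line()

blood_pressure %>%
  gather(Age:Pulse, key = "measurement", value = "value") %>%
  ggplot() +
  geom_point(mapping = aes(x = value, y = Systol)) +
  facet_wrap(~ measurement, scales = "free_x")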
[Figure: scatterplots of Systol (y-axis) against value (x-axis), one panel each for Age, Calf, Chin, Forearm, Height, Pulse, Weight, and Years]

Exercise 2
The Years graph shows a negative correlation with Systol.
The variables that show a positive correlation with Systol are Forearm, Weight, Calf, and Height.

Exercise 3

blood_pressure_updated <- blood_pressure %>%
  mutate(urban_frac_life = Years / Age)

Exercise 4

systol_urban_frac_model <- lm(Systol ~ urban_frac_life, data = blood_pressure_updated)

Exercise 5

systol_urban_frac_model %>%
  tidy()

term             estimate   std.error  statistic  p.value
(Intercept)      133.49572  4.038011   33.059770  0.0000000
urban_frac_life  -15.75182  9.012962   -1.747686  0.0888139
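
The two estimates above define the fitted line Systol = 133.49572 - 15.75182 * urban_frac_life. As a rough illustration, the sketch below plugs a few values into that line. The coefficients are copied from the table; the three urban_frac_life values are hypothetical inputs chosen only for illustration, not rows from the data.

```r
# Coefficients copied from the tidy() output above
intercept <- 133.49572
slope <- -15.75182

# Hypothetical fractions of life spent in an urban area (not taken from the data)
urban_frac_life <- c(0, 0.5, 1)

# Predicted systolic blood pressure on the fitted line
predicted_systol <- intercept + slope * urban_frac_life
predicted_systol
```

Each additional 0.1 in urban_frac_life lowers the predicted Systol by about 1.58. The p.value of about 0.089 for the slope is above 0.05, so this decrease is not statistically significant at the 5% level.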

systol_urban_frac_model %>%
  glance() %>%
  select(r.squared)

r.squared
0.0762564
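
To make the r.squared value more concrete, here is a minimal sketch of how it can be computed from residuals as 1 minus the ratio of the residual sum of squares to the total sum of squares. It uses R's built-in mtcars data as a stand-in, since blood_pressure is not available here; the model and variable names are illustrative assumptions, not part of the assignment.

```r
# Built-in mtcars data stands in for blood_pressure (an assumption for illustration)
model <- lm(mpg ~ wt, data = mtcars)

# Residual sum of squares: variation the model leaves unexplained
ss_resid <- sum(residuals(model)^2)

# Total sum of squares: variation of the response around its mean
ss_total <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

r_squared_by_hand <- 1 - ss_resid / ss_total
r_squared_reported <- summary(model)$r.squared

# TRUE if the hand computation agrees with the value lm() reports
all.equal(r_squared_by_hand, r_squared_reported)
```

Read this way, the 0.0762564 above means that urban_frac_life accounts for under 8% of the variation in Systol.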

Exercise 6

systol_urban_frac_df <- blood_pressure_updated %>%
  add_predictions(systol_urban_frac_model) %>%
  add_residuals(systol_urban_frac_model)

i. The column that holds the predicted response values is pred.
ii. The column that holds the residuals is resid.
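
As a rough sketch of what these two columns contain, the base R code below builds them by hand: pred is the model's prediction for each row, and resid is the observed response minus that prediction. It uses the built-in mtcars data as a stand-in for blood_pressure_updated, so the model and column choices are assumptions for illustration only.

```r
# Built-in mtcars data stands in for blood_pressure_updated (an assumption)
model <- lm(mpg ~ wt, data = mtcars)

df <- mtcars
# pred: the value the fitted line gives for each row
df$pred <- predict(model, newdata = df)
# resid: observed response minus predicted response
df$resid <- df$mpg - df$pred

# TRUE if the hand-built residuals match the ones stored in the model
all.equal(unname(df$resid), unname(residuals(model)))
```

A positive resid means the model under-predicted that observation, and a negative resid means it over-predicted.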

Exercise 7
The model is reliable only if the response variable (Systol) has an approximately linear
relationship with the predictor variable (urban_frac_life).
ggplot(systol_urban_frac_df) +
geom_point(mapping = aes(x = urban_frac_life, y = Systol)) +
geom_abline(slope = systol_urban_frac_model$coefficients[2], intercept = systol_urban_frac_mo

[Figure: scatterplot of Systol (y-axis) against urban_frac_life (x-axis) with the fitted regression line]

Exercise 8

ggplot(systol_urban_frac_df) +
  geom_point(mapping = aes(pred, Systol)) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "red",
    size = 1
  )

[Figure: scatterplot of Systol (y-axis) against pred (x-axis) with a red reference line of slope 1 and intercept 0]

ggplot(systol_urban_frac_df) +
  geom_point(aes(pred, resid)) +
  geom_ref_line(h = 0)

[Figure: scatterplot of resid (y-axis) against pred (x-axis) with a horizontal reference line at 0]

i. The plots suggest the condition was not violated. The points show no visible curve.
ii. The plots suggest the condition was not violated. The points are spread roughly evenly
above and below the line.

Exercise 9

ggplot(data = systol_urban_frac_df) +
  geom_histogram(
    mapping = aes(x = resid), binwidth = 5
  )

[Figure: histogram of resid (x-axis) with count on the y-axis]

i. The histogram is strongly right-skewed, and its center is around a resid value of 5.
ii. The skew violates the nearly normal residuals condition because there is a
disproportionate number of negative residuals compared with positive ones.

Exercise 10

ggplot(data = systol_urban_frac_df) +
  geom_qq(mapping = aes(sample = resid)) +
  geom_qq_line(mapping = aes(sample = resid))

[Figure: normal Q-Q plot of resid, with sample quantiles on the y-axis and theoretical quantiles on the x-axis]

This graph clearly shows a violation of the nearly normal residuals condition. More points
fall above the reference line than below it, which is consistent with the right-skewed
histogram.

Exercise 11

systol_weight_model <- lm(Systol ~ Weight, data = blood_pressure_updated)

systol_weight_model %>%
  glance() %>%
  select(r.squared)

r.squared
0.2718207

Yes. The r.squared of 0.272 is closer to 1 than the 0.076 from the urban_frac_life model.


# Use the weight model here, not systol_urban_frac_model
systol_weight_df <- blood_pressure_updated %>%
  add_predictions(systol_weight_model) %>%
  add_residuals(systol_weight_model)

ggplot(systol_weight_df) +
  geom_point(mapping = aes(pred, Systol)) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "red",
    size = 1
  )

[Figure: scatterplot of Systol (y-axis) against pred (x-axis) with a red reference line of slope 1 and intercept 0]

ggplot(systol_weight_df) +
  geom_point(aes(pred, resid)) +
  geom_ref_line(h = 0)

[Figure: scatterplot of resid (y-axis) against pred (x-axis) with a horizontal reference line at 0]

For the first condition, there is a linear relationship between pred and Systol. For the second
condition, there is a single outlier on the graph, which might distort the bell-shaped curve
slightly, but this is not much of a violation. For the third condition, the points are balanced
around h = 0. All three conditions are met, so the new model is reliable.

Exercise 12

systol_combo_model <- lm(Systol ~ urban_frac_life + Weight, data = blood_pressure_updated)

# Use the combined model here, not systol_urban_frac_model
systol_combo_model_df <- blood_pressure_updated %>%
  add_predictions(systol_combo_model) %>%
  add_residuals(systol_combo_model)

ggplot(systol_combo_model_df) +
  geom_point(mapping = aes(pred, Systol)) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "red",
    size = 1
  )

[Figure: scatterplot of Systol (y-axis) against pred (x-axis) with a red reference line of slope 1]

ggplot(systol_combo_model_df) +
  geom_point(aes(pred, resid)) +
  geom_ref_line(h = 0)

[Figure: scatterplot of resid (y-axis) against pred (x-axis) with a horizontal reference line at 0]

For the first condition, there is a linear relationship between pred and Systol. For the second
condition, there is a single outlier on the graph, which might distort the bell-shaped curve
slightly, but this is not much of a violation. For the third condition, the points are balanced
around h = 0. All three conditions are met, so the new model is reliable.

systol_combo_model %>%
  glance() %>%
  select(r.squared)

r.squared
0.4731078

This multi-variable model performed better because its r.squared of 0.473 is closer to 1 than
that of either single-variable model (0.272 for Weight, 0.076 for urban_frac_life).
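
One caveat when comparing models this way is that r.squared cannot decrease when a predictor is added to a least-squares model, so a higher value alone does not prove that the extra predictor is useful. The sketch below illustrates this with the built-in mtcars data as a stand-in for blood_pressure_updated (an assumption for illustration). It also reads the adjusted r.squared from summary(), which penalizes extra predictors.

```r
# Built-in mtcars data stands in for blood_pressure_updated (an assumption)
single_model <- lm(mpg ~ wt, data = mtcars)
combo_model <- lm(mpg ~ wt + hp, data = mtcars)

# Plain r.squared: never lower for the model with the added predictor
r2 <- c(
  single = summary(single_model)$r.squared,
  combo = summary(combo_model)$r.squared
)

# Adjusted r.squared: accounts for the number of predictors
adj_r2 <- c(
  single = summary(single_model)$adj.r.squared,
  combo = summary(combo_model)$adj.r.squared
)

r2
adj_r2
```

If the adjusted value also rises after adding the predictor, that gives somewhat stronger support for the larger model than the plain r.squared comparison alone.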
