
Assignment 3

The document outlines an assignment focused on regression analysis of mercury contamination in walleyes from Island Lake Reservoir, Minnesota. It includes data analysis tasks, model fitting, and interpretation of results related to the relationship between walleye length and mercury levels. The assignment emphasizes the importance of developing consumption advisories based on the findings and discusses the appropriateness of using length versus weight as predictors.

Uploaded by

battesaron9
Copyright
© All Rights Reserved

Assignment 3 – STAT 360 – Regression Analysis

1 - Mercury Contamination of Walleyes in Island Lake Reservoir


Datafiles: Walleyes Island Lake.JMP and Walleyes Island Lake.txt

Walleyes are the state fish of Minnesota and are the most important
game fish in MN (estimated 43,000 jobs and $2.8 billion in retail
spending). The main contaminant found in MN walleyes is mercury,
which can have health consequences if ingested. Every year the MN
Dept. of Natural Resources (DNR) and the MN Dept. of Health (MDH)
publish waterway-specific fish consumption guidelines for at-risk
populations: children under age 15 and women who are or may
become pregnant. To see current consumption guidelines visit:
http://www.health.state.mn.us/divs/eh/fish/eating/sitespecific.html.

Data Source: These data come from Minnesota's Fish Contaminant
Monitoring Program (FCMP), a joint effort by the DNR, MDH,
the MN Dept. of Agriculture (MDA), and the Minnesota Pollution Control
Agency (MPCA), and were collected from 1990 to 1998.

Goal: Develop a regression model to predict/explain the mercury level
found in the tissues of a walleye (ppm) using length (in.). Of primary
interest is developing a walleye consumption advisory based on
length for walleyes in Island Lake Reservoir near Duluth, so let
Y = HGPPM and X = LGTHIN.

a) Examine a scatterplot of HGPPM vs. LGTHIN and add marginal
distribution estimates to the plot. Would you characterize the
joint distribution of (LGTHIN, HGPPM) as bivariate normal?
Explain. (3 pts.)

I wouldn't characterize the joint distribution as bivariate normal.
A bivariate normal distribution has normal marginal distributions, but the
histogram of length appears right-skewed and the distribution of HGPPM
is not normal either.

b) Take the natural log of HGPPM and use Analyze >
Distribution to examine the distribution of both HGPPM and
log(HGPPM). Find the sample means of the mercury levels in
the log scale ($\overline{\log(y)}$) and in the original scale ($\bar{y}$). Convert
$\overline{\log(y)}$ back to the original scale and compare it to $\bar{y}$. How do
they compare? (3 pts.)

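The sample mean of the mercury level in the log scale is $\overline{\log(y)} = -0.508$.
The sample mean in the original scale is $\bar{y} = 0.694$ ppm.
Back-transforming the log-scale mean: $e^{-0.508} = 0.6017$ ppm.
The mean of the original data is higher than the back-transformed mean of the log-transformed data:
$\bar{y} = 0.694 > e^{\overline{\log(y)}} = 0.6017$.
The back-transformed mean of the logs is the geometric mean, which is never larger than the arithmetic mean.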

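c) Repeat part (b), but this time consider the sample medians
instead. What do you find? (3 pts.)
Med(Y) = 0.625
Med(log(Y)) = -0.470
Back-transforming: e^(-0.470) = 0.6250
The median of the original data equals the back-transformed median of
the log data. Because the log is an increasing transformation, it preserves the ordering of the observations, so the median on the log scale back-transforms to (essentially) the median on the original scale.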
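The contrast between parts (b) and (c) can be reproduced in a few lines of R. This is a minimal sketch assuming a short hypothetical vector `y_demo` of positive mercury readings (illustrative values, not the Island Lake data): the back-transformed mean of the logs is the geometric mean, which cannot exceed the arithmetic mean, while the median passes through the log/exp transformation unchanged.

```r
# Hypothetical positive mercury readings (ppm); illustrative only, not the Island Lake data
y_demo <- c(0.21, 0.35, 0.48, 0.62, 0.80, 1.15, 2.40)

mean_orig   <- mean(y_demo)              # arithmetic mean on the original scale
mean_back   <- exp(mean(log(y_demo)))    # back-transformed mean of log(y) = geometric mean
median_orig <- median(y_demo)            # median on the original scale
median_back <- exp(median(log(y_demo)))  # back-transformed median of log(y)

# Geometric mean <= arithmetic mean (AM-GM inequality)
cat("mean:", mean_orig, " exp(mean(log)):", mean_back, "\n")
# With an odd number of observations the two medians coincide
cat("median:", median_orig, " exp(median(log)):", median_back, "\n")
```

With an even number of observations the two medians can differ slightly, because the two middle values are averaged on different scales.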

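d) Fit the model:

$E(\text{HGPPM} \mid \text{LGTHIN}) = \beta_0 + \beta_1\,\text{LGTHIN}$, estimated as $\hat{E}(\text{HGPPM} \mid \text{LGTHIN}) = -0.4108 + 0.06622\,\text{LGTHIN}$
$\text{Var}(\text{HGPPM} \mid \text{LGTHIN}) = \sigma^2$, estimated as $\hat{\sigma}^2 = (0.384)^2 = 0.147$

Examine residual plots and comment on the model assumptions. (3 pts.)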

The residual-by-predicted plot has a megaphone shape: the spread of the residuals increases
with the predicted value, so the constant variance assumption is violated.
The actual-by-predicted plot shows a roughly linear relationship between
HGPPM and LGTHIN.
In the normal quantile plot many points fall away from the line, so the residuals do not appear normally distributed.

e) Construct a nonconstant variance plot for the model fit in part
(d), $|\hat{e}_i|^{0.5}$ vs. $\hat{y}_i$, and discuss what this plot suggests regarding
the model assumptions. (3 pts.)

The square root of the absolute residuals tends to increase as the predicted value
$\hat{y}_i$ increases, so the spread of the errors grows with the fitted mercury level.
This confirms the nonconstant variance seen in the megaphone-shaped residual plot
from part (d).
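The nonconstant variance plot in parts (e) and (h) can also be drawn in R. This is a minimal sketch on simulated data whose error spread grows with x (the names `x`, `y`, and `ncv` are hypothetical stand-ins, not the Island Lake variables); it plots $\sqrt{|\hat{e}_i|}$ against $\hat{y}_i$ with a lowess smooth, where an upward trend suggests variance increasing with the mean.

```r
# Simulated data with error SD that increases with x; not the Island Lake data
set.seed(360)
x <- runif(60, 10, 25)                               # hypothetical lengths (in.)
y <- -0.4 + 0.066 * x + rnorm(60, sd = 0.02 * x)     # spread grows with x

fit <- lm(y ~ x)
ncv <- data.frame(fitted = fitted(fit),
                  sqrt_abs_resid = sqrt(abs(resid(fit))))

# Nonconstant variance plot: sqrt(|e_i|) vs fitted values, with a lowess smooth
plot(ncv$fitted, ncv$sqrt_abs_resid,
     xlab = "Fitted values", ylab = "sqrt(|residual|)")
lines(lowess(ncv$fitted, ncv$sqrt_abs_resid), col = "red")
```

A roughly flat smooth, as reported in part (h) for the log model, is what constant variance looks like in this plot.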

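f) Despite the fact this model is clearly deficient, interpret
both parameter estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ in words using proper units. (2 pts.)
$\hat{E}(\text{HGPPM} \mid \text{LGTHIN}) = -0.4108 + 0.06622\,\text{LGTHIN}$
$\hat{\beta}_0 = -0.4108$: the estimated mean mercury level for walleyes of length 0 inches is -0.4108 ppm. This is an extrapolation far outside the observed lengths (and a negative concentration is impossible), so the intercept has no practical meaning here; it only anchors the line.
$\hat{\beta}_1 = 0.06622$: each additional inch of length is associated with an increase of 0.06622 ppm in the estimated mean mercury level.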
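g) Now fit the model:

$E(\log(\text{HGPPM}) \mid \text{LGTHIN}) = \beta_0 + \beta_1\,\text{LGTHIN}$
$\text{Var}(\log(\text{HGPPM}) \mid \text{LGTHIN}) = \sigma^2$

Estimated error variance: $\hat{\sigma}^2 = \text{MSE} = (17.15 - 11.014)/56 \approx 0.110$, so $\hat{\sigma} \approx 0.331$. The value $0.5485 = \sqrt{17.15/57}$ is the sample standard deviation of log(HGPPM) itself, which ignores length, so $(0.5485)^2 = 0.30$ overstates $\sigma^2$.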
Examine residual plots and comment on the model assumptions.
(3 pts.)

The residual-by-predicted plot looks fine: the residuals are evenly scattered across the
range of predicted values, with no megaphone pattern.
The actual-by-predicted plot shows a linear trend between log(HGPPM) and
LGTHIN.
In the normal quantile plot the points stay within the confidence bands and
close to the line, so the residuals appear approximately normally distributed.

h) Construct a nonconstant variance plot for the model fit in part
(g), $|\hat{e}_i|^{0.5}$ vs. $\hat{y}_i$, and discuss what this plot suggests regarding
the model assumptions. (3 pts.)

The square root of the absolute residuals shows no trend as the predicted
value increases; the smoothed line is roughly flat.
This supports the constant variance assumption.

You should find that the model using log(HGPPM) as the
response is more appropriate for modeling the relationship
between the mercury levels found in the walleyes and their
length in inches. For the remainder of this problem you will be
working with the model from part (g) where the response is
log(HGPPM).

i) Test the hypotheses

NH: $E(\log(\text{HGPPM}) \mid \text{LGTHIN}) = \beta_0$
AH: $E(\log(\text{HGPPM}) \mid \text{LGTHIN}) = \beta_0 + \beta_1\,\text{LGTHIN}$
and summarize your findings. (3 pts.)

The p-value is less than 0.05 and the F-ratio is large (100.453), so we reject the null hypothesis in favor of the alternative. The ANOVA table
shows that the model using length explains 11.014 of the 17.15 units of total sum of squares in log(HGPPM), about 64%.
This shows that the mean log mercury level in Island Lake walleyes depends on the length of the fish.
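The ANOVA arithmetic behind parts (g), (i), and (k) can be checked directly. This is a minimal R sketch using the sums of squares quoted in part (i); the sample size n = 58 is an assumption taken from the 56 residual degrees of freedom in the R output of part (z), which uses the same data file.

```r
# Sums of squares quoted in part (i) for log(HGPPM) ~ LGTHIN
ss_model <- 11.014   # regression (model) sum of squares
ss_total <- 17.15    # total sum of squares of log(HGPPM)
n        <- 58       # assumed: 56 residual df in part (z) output => n - 2 = 56

ss_error <- ss_total - ss_model   # residual sum of squares
mse      <- ss_error / (n - 2)    # estimate of sigma^2
f_ratio  <- ss_model / mse        # model df = 1 in simple linear regression
r2       <- ss_model / ss_total   # share of variation in log(HGPPM) explained

round(c(SSE = ss_error, MSE = mse, RMSE = sqrt(mse), F = f_ratio, R2 = r2), 3)
```

The recomputed F (about 100.5) differs from 100.453 only because the reported sums of squares are rounded.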

j) Conduct the following test for the population slope parameter ($\beta_1$)

NH: $\beta_1 = 0$
AH: $\beta_1 \neq 0$

Summarize your results. Square the t-statistic for this test and
compare it to the F-statistic from the test in part (i). What do
you find? (3 pts.)

The p-value is < 0.0001, so we reject the null hypothesis that $\beta_1 = 0$. The estimated slope
for length is 0.0972 > 0, indicating a positive linear relationship between length and
log(HGPPM).
(t-ratio)^2 = 100.4004
F-ratio = 100.453
The squared t-ratio equals the F-ratio up to rounding. In simple linear regression the two tests are equivalent, so $t^2 = F$ exactly.
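The identity $t^2 = F$ in simple linear regression can be confirmed numerically. This is a minimal R sketch on simulated data (the variables `x` and `y` are hypothetical, not the walleye measurements); it pulls the slope t statistic and the overall F statistic from `summary.lm`.

```r
# Simulated data; not the Island Lake data
set.seed(1)
x <- runif(58, 10, 25)
y <- -2 + 0.1 * x + rnorm(58, sd = 0.33)

fit <- lm(y ~ x)
t_slope <- summary(fit)$coefficients["x", "t value"]  # t for H0: beta1 = 0
f_stat  <- summary(fit)$fstatistic["value"]            # overall model F

c(t_squared = t_slope^2, F = unname(f_stat))
```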

k) What is the R-Square ($R^2$) value for the regression of log(HGPPM)
on LGTHIN? In the context of this problem, carefully explain
what this value is measuring. (3 pts.)

$R^2 = 0.642$: about 64.2% of the variation in log(HGPPM) among the sampled walleyes is
explained by the linear regression on length. This indicates a fairly strong linear
relationship between LGTHIN and log(HGPPM).
Because $R^2$ is computed on the log scale, it describes how well length explains variation
in log mercury, not the proportion of variation in HGPPM itself.

l) Use the estimated slope ($\hat{\beta}_1$) and the associated CI for $\beta_1$ to
interpret the change in the response in the original scale
associated with a 1 inch increase in the length of walleyes in
Island Lake. Summarize your findings carefully and thoroughly,
both multiplicatively and in terms of a percentage increase in
the mercury level (ppm). (6 pts.)

First convert the estimate and the 95% CI limits back to the original scale:

$e^{\hat{\beta}_1} = e^{0.0972} = 1.1021$
Upper CI limit: $e^{0.1167} = 1.1238$
Lower CI limit: $e^{0.0778} = 1.0809$

Multiplicative interpretation: each one-inch increase in the length of walleyes
is associated with multiplying the median mercury level by a factor of approximately 1.1021, with a
95% confidence interval ranging from 1.0809 to 1.1238.
Percentage interpretation: each one-inch increase in length corresponds to an
approximate 10.21% increase in the median mercury level, with a 95% confidence
interval suggesting the increase could be between 8.09% and 12.38%.
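The back-transformation in part (l) is a one-liner per quantity. This minimal R sketch plugs in the slope and CI limits reported above; the variable names are illustrative.

```r
# Slope and 95% CI from the log(HGPPM) ~ LGTHIN fit (values reported in part l)
b1    <- 0.0972
ci_b1 <- c(lower = 0.0778, upper = 0.1167)

factor_est <- exp(b1)                 # multiplicative change in median HGPPM per inch
factor_ci  <- exp(ci_b1)
pct_est    <- 100 * (factor_est - 1)  # percentage increase
pct_ci     <- 100 * (factor_ci - 1)

round(c(factor = factor_est, factor_ci), 4)
round(c(pct = pct_est, pct_ci), 2)
```

Exponentiating the CI endpoints is valid because exp() is increasing, so the interval ordering is preserved.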

m) Construct a scatterplot of Y = HGPPM vs. X = LGTHIN and use
Bivariate Fit > Fit Special to fit the model
$E(\log(\text{HGPPM}) \mid \text{LGTHIN})$. Also add the shaded confidence and
prediction intervals to the plot. Include this plot below and
discuss it. (3 pts.)

The linear fit on the log-transformed data suggests that as LGTHIN
increases, log(HGPPM) increases linearly, so on the original scale the fitted
(median) HGPPM increases exponentially with length.
The shaded confidence and prediction bands show the precision
of the estimates: narrower bands mean more precise estimates. The prediction band is much wider
than the confidence band because it accounts for the scatter of individual fish around the
fitted curve in addition to the uncertainty in estimating the curve itself.
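n) Give a point estimate and CI for $E(\log(\text{HGPPM}) \mid \text{LGTHIN} = 17.9)$.
Also convert these back to the original scale and interpret. (4 pts.)

Estimated median HGPPM at LGTHIN = 17.9: 0.6763 ppm, i.e. a log-scale point estimate of
$\hat{E}(\log(\text{HGPPM}) \mid \text{LGTHIN} = 17.9) = \ln(0.6763) = -0.391$.
95% CI on the log scale: $(\ln 0.617, \ln 0.740) = (-0.483, -0.301)$, which back-transforms to

LCI = 0.617 ppm
UCI = 0.740 ppm

For a walleye of 17.9 inches in length, the estimated median mercury level is approximately 0.676 ppm.
We are 95% confident that the median HGPPM for walleyes of 17.9
inches in length lies between 0.617 ppm and 0.740 ppm.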

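o) Give a point estimate and PI for $\log(\text{HGPPM}) \mid \text{LGTHIN} = 20$. Also
convert these back to the original scale and interpret. (4 pts.)

Predicted HGPPM at LGTHIN = 20: 0.829 ppm, i.e. a log-scale point estimate of $\ln(0.829) = -0.188$.
95% prediction interval on the log scale: $(\ln 0.4236, \ln 1.6245) = (-0.859, 0.485)$, which back-transforms to

LPL = 0.4236 ppm
UPL = 1.6245 ppm

We are 95% confident that the mercury level of an individual 20-inch walleye from Island Lake
lies between about 0.424 ppm and 1.625 ppm, with a predicted (median) value of about 0.829 ppm.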
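In R, the saved-formula step used in JMP corresponds to `predict()` with `newdata`. This is a minimal sketch on simulated data (the names `len`, `hg`, `back_ci`, and `back_pi` are hypothetical stand-ins for LGTHIN and HGPPM); it computes a CI for the mean of log(y) at 17.9 inches and a PI for a new log(y) at 20 inches, then back-transforms each with a single `exp()`.

```r
# Simulated stand-ins for LGTHIN and HGPPM; not the Island Lake data
set.seed(2)
len <- runif(58, 10, 25)
hg  <- exp(-2 + 0.1 * len + rnorm(58, sd = 0.33))

fit <- lm(log(hg) ~ len)

# Log-scale estimates with intervals; columns are fit, lwr, upr
ci_log <- predict(fit, newdata = data.frame(len = 17.9), interval = "confidence")
pi_log <- predict(fit, newdata = data.frame(len = 20),   interval = "prediction")

# Back-transform once: estimated median and its interval on the ppm scale
back_ci <- exp(ci_log)
back_pi <- exp(pi_log)
back_ci
back_pi
```

The new length does not need to appear in the data, which mirrors the note that follows about adding a 20-inch row in JMP.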

Note regarding part (o): There are no walleyes in the sample that are 20
inches in length. To obtain these estimates you will need to save the
Prediction Interval Formula to the data table and then add a new row to the
spreadsheet corresponding to a walleye that is 20 inches in length.
It is recommended that humans should not consume more than one
fish per month with mercury levels in its tissues greater than 0.5 ppm.
Because the average walleye angler does not carry a gas
spectrometer in their fishing boat, actually measuring the Hg level
found in a walleye they have caught is a problem. However, it is very
easy for an angler to measure the length of their walleye in inches.

p) Using your regression model, what length of walleye would you
recommend for the “do not eat more than one walleye exceeding
_______ inches per month” advisory? (2 pts.)

$\hat{E}(\text{HGPPM} \mid \text{LGTHIN}) = -0.4108 + 0.06622\,\text{LGTHIN}$
$0.5 = -0.4108 + 0.06622\,\text{LGTHIN}$
$0.5 + 0.4108 = 0.06622\,\text{LGTHIN}$
$\text{LGTHIN} = 0.9108 / 0.06622 = 13.76$ inches

Note regarding parts (p & q): The process of finding an X value associated
with a specific value for the response (Y) is called inverse prediction. Also
keep in mind that your model is for log(Y), not Y, so you will need to take this
into account when answering this question.
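Inverse prediction just solves the fitted equation for x. This minimal R sketch defines two hypothetical helpers, `inverse_linear()` for an untransformed response and `inverse_loglinear()` for a natural-log response; the check uses the untransformed coefficients from part (d), since the intercept of the log(HGPPM) ~ LGTHIN fit is not reported above (substitute it from your own output).

```r
# Untransformed model: y0 = b0 + b1 * x  =>  x = (y0 - b0) / b1
inverse_linear <- function(y0, b0, b1) (y0 - b0) / b1

# Log-response model (natural log): log(y0) = b0 + b1 * x  =>  x = (log(y0) - b0) / b1
inverse_loglinear <- function(y0, b0, b1) (log(y0) - b0) / b1

# Untransformed fit from part (d): lengths where the fitted mean reaches 0.5 and 1 ppm
inverse_linear(c(0.5, 1), b0 = -0.4108, b1 = 0.06622)
```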

q) It is also recommended that humans should never consume fish
with mercury levels exceeding 1 ppm in their tissues. Complete
the following “we recommend that you do not eat any walleyes
exceeding __________ inches from Island Lake”. (2 pts.)

$\hat{E}(\text{HGPPM} \mid \text{LGTHIN}) = -0.4108 + 0.06622\,\text{LGTHIN}$
$1 = -0.4108 + 0.06622\,\text{LGTHIN}$
$1 + 0.4108 = 0.06622\,\text{LGTHIN}$
$\text{LGTHIN} = 1.4108 / 0.06622 = 21.31$ inches

r) Would you recommend using your model to predict the mercury
level for a walleye that is 7 inches in length? How about 29
inches? Explain your reasoning. (2 pts.)

No. I would not recommend using the model for predictions at either 7 inches or
29 inches, because these lengths do not fall within the range of lengths observed in the data
used to develop the model. Predictions there would be extrapolations, and the linear
relationship on the log scale may not hold outside the observed range.
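Checking for extrapolation amounts to comparing each requested length with the observed range. This minimal R sketch uses a hypothetical helper `outside_range()` and a hypothetical vector `lengths_obs` standing in for the LGTHIN column; with the real data you would pass `Island$LGTHIN` instead.

```r
# TRUE means the requested value lies outside the observed range (extrapolation)
outside_range <- function(x_new, x_obs) {
  rng <- range(x_obs)               # smallest and largest observed values
  x_new < rng[1] | x_new > rng[2]
}

lengths_obs <- c(11.2, 13.5, 15.0, 16.8, 18.1, 22.4, 25.6)  # hypothetical lengths (in.)
outside_range(c(7, 20, 29), lengths_obs)
```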

s) Would you recommend using this model to predict the mercury
levels and develop consumption advisories for walleyes in the
Mississippi River? Explain. (1 pt.)
No, I would not recommend using this model to predict mercury levels and
develop consumption advisories for walleyes in the Mississippi River. The
model was developed using data collected only from the Island Lake Reservoir, so inferences
extend only to walleyes from that water body; the relationship between length and mercury
in the Mississippi River may be different.

t) The Island Lake walleye data also contains the weight (lbs.) for
each of the fish sampled. Do you think using weight as opposed
to length to establish consumption advisories is a good idea?
Justify your answer by fitting models for mercury or log mercury
level using X = WTLB as the predictor and contrasting the
results with those above. (4 pts.)

Using length to establish consumption advisories is a better idea than
using weight. The length-based model has a higher R^2 value, explaining
more of the variability in mercury levels, and a lower RMSE than the
weight-based model, so it offers more accurate predictions. Additionally, length
is easier to measure consistently in the field, making it a more practical
choice for developing consumption advisories.
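The length-versus-weight comparison in part (t) rests on two summaries per fit. This minimal R sketch uses simulated stand-ins (`len`, `wt`, `hg`, and the helper `fit_summary()` are all hypothetical) and pulls $R^2$ and the residual standard error from `summary.lm`; the ranking it prints says nothing about the real fish.

```r
# Simulated stand-ins for LGTHIN, WTLB and HGPPM; not the Island Lake data
set.seed(3)
len <- runif(58, 10, 25)
wt  <- 0.0003 * len^3 * exp(rnorm(58, sd = 0.1))   # hypothetical length-weight relation
hg  <- exp(-2 + 0.1 * len + rnorm(58, sd = 0.33))

fit_summary <- function(fit) {
  s <- summary(fit)
  c(R2 = s$r.squared, RMSE = s$sigma)   # sigma is the residual standard error
}

comparison <- rbind(length = fit_summary(lm(log(hg) ~ len)),
                    weight = fit_summary(lm(log(hg) ~ wt)))
comparison
```

Comparing RMSE is only meaningful here because both models share the same response, log(hg).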
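u) Another possible model to consider is:

$E(\log(\text{HGPPM}) \mid \text{LGTHIN}) = \beta_0 + \beta_1 \log(\text{LGTHIN})$, estimated as $-5.149 + 1.6697 \log(\text{LGTHIN})$
$\text{Var}(\log(\text{HGPPM}) \mid \text{LGTHIN}) = \sigma^2$, estimated as $\hat{\sigma}^2 = (0.319)^2 \approx 0.102$ (residual standard error 0.319 from the R output in part (z))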

Fit this model, examine residual plots, and comment on the
adequacy of this model. (5 pts.)

The residual-by-predicted plot looks fine: the residuals are evenly scattered across the
predicted values with roughly constant variance.
The actual-by-predicted plot shows a linear trend between log(HGPPM) and
log(LGTHIN).
In the normal quantile plot the points stay within the confidence bands and
close to the line, so the residuals appear approximately normal.
The residual plots show no evidence against linearity, constant variance, or normality.
Independence cannot be checked from these plots; it depends on how the fish were sampled.

v) Use the estimated slope ($\hat{\beta}_1$) from the model in part (u) to
interpret the change in the response in the original scale
associated with a 1 unit increase in log(LGTHIN). (3 pts.)
A 1 unit increase in log(LGTHIN) corresponds to
multiplying LGTHIN by e ≈ 2.718.
That means a 1 unit increase in log(LGTHIN) multiplies the
median mercury level by $e^{\hat{\beta}_1} = e^{1.6697} = 5.311$.
w) Use the estimated slope ($\hat{\beta}_1$) from the model in part (u) to
interpret the change in the response in the original scale
associated with a 20% increase in the length of walleyes. (3 pts.)
A 20% increase in length multiplies LGTHIN by 1.2, so the median HGPPM is multiplied by
$1.2^{1.6697} = e^{1.6697 \ln 1.2} = e^{0.3044} \approx 1.356$, an increase of about 35.6%.
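In a log-log model, multiplying x by k multiplies the median response by $k^{\hat{\beta}_1}$. This minimal R sketch (the helper `mult_factor()` is hypothetical) applies that rule with the slope from part (u) to both part (v) and part (w).

```r
# Slope of the log-log model from part (u): log(HGPPM) = b0 + b1 * log(LGTHIN)
b1 <- 1.6697

# Multiplying length by k multiplies the estimated median HGPPM by k^b1
mult_factor <- function(k, b1) k^b1

mult_factor(exp(1), b1)            # part (v): 1-unit increase in log(LGTHIN), i.e. e^b1
mult_factor(1.2, b1)               # part (w): 20% increase in length
100 * (mult_factor(1.2, b1) - 1)   # percentage increase in median HGPPM
```

Note that $0.2 \times 1.6697 = 0.334$ is only a linear approximation; the exact factor is $1.2^{1.6697}$.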

x) Using this regression model, what length of walleye would you
recommend for the “do not eat more than one walleye exceeding
_______ inches per month” advisory? How does your
recommendation compare to the recommendation from part (p)?
(3 pts.)

Using natural logs throughout, as in the fitted model:
$\hat{E}(\log(\text{HGPPM}) \mid \text{LGTHIN}) = -5.149 + 1.6697 \log(\text{LGTHIN})$
$\log(0.5) = -5.149 + 1.6697 \log(\text{LGTHIN})$
$-0.6931 + 5.149 = 1.6697 \log(\text{LGTHIN})$
$\log(\text{LGTHIN}) = 4.4559 / 1.6697 = 2.6687$
$\text{LGTHIN} = e^{2.6687} = 14.42$ inches
This is slightly higher than the 13.76 inches obtained from the
untransformed model in part (p).

y) It is also recommended that humans should never consume fish
with mercury levels exceeding 1 ppm in their tissues. Complete
the following “we recommend that you do not eat any walleyes
exceeding __________ inches from Island Lake” using this
regression model. How does your recommendation compare to
the recommendation from part (q)? (3 pts.)

$\hat{E}(\log(\text{HGPPM}) \mid \text{LGTHIN}) = -5.149 + 1.6697 \log(\text{LGTHIN})$
$\log(1) = 0 = -5.149 + 1.6697 \log(\text{LGTHIN})$
$5.149 = 1.6697 \log(\text{LGTHIN})$
$\log(\text{LGTHIN}) = 5.149 / 1.6697 = 3.0838$
$\text{LGTHIN} = e^{3.0838} = 21.84$ inches
This value is approximately equal to the 21.31 inches obtained from the
untransformed model in part (q).
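The two log-log inverse predictions in parts (x) and (y) follow from one formula, $x = e^{(\log y_0 - \hat{\beta}_0)/\hat{\beta}_1}$. This minimal R sketch (the helper `inverse_loglog()` is hypothetical) uses R's `log()`, which is the natural log, matching the fitted model.

```r
# log(y0) = b0 + b1 * log(x)  =>  x = exp((log(y0) - b0) / b1), natural logs throughout
inverse_loglog <- function(y0, b0, b1) exp((log(y0) - b0) / b1)

# Lengths where the estimated median HGPPM reaches 0.5 ppm and 1 ppm (model from part u)
inverse_loglog(c(0.5, 1), b0 = -5.149, b1 = 1.6697)
```

Mixing a base-10 log of the threshold with a natural-log model would give the wrong length, so the same log base must be used on both sides.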

z) Use R to fit the model $E(\log(\text{HGPPM}) \mid \text{LGTHIN}) = \beta_0 + \beta_1 \log(\text{LGTHIN})$
and obtain a model summary. Include the output from R below.
To retain the appearance of R output, use Courier New (10 pt)
as the font. (BONUS 5 pts.)

Code to run in R:
> Island = read.csv(file.choose())
> names(Island)
> attach(Island)
> library(s20x)        # provides trendscatter()
> logHg = log(HGPPM)
> logX = log(LGTHIN)
> trendscatter(logHg~logX)
> lm1 = lm(logHg~logX)
> summary(lm1)

> Island = read.csv(file.choose())
> names(Island)
[1] "WTLB" "LGTHIN" "HGPPM"
> attach(Island)
> hist(WTLB)
> hist(HGPPM)
> logHGPPM = log(HGPPM)
> hist(logHGPPM)
> library(s20x)
> trendscatter(logHGPPM~LGTHIN)
> logLGTH = log(LGTHIN)
> trendscatter(logHGPPM~logLGTH)
> lm1 = lm(logHGPPM~logLGTH)
> summary(lm1)

Call:
lm(formula = logHGPPM ~ logLGTH)

Residuals:
Min 1Q Median 3Q Max
-0.4506 -0.3002 0.0038 0.2322 0.7311

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.150 0.440 -11.7 < 2e-16 ***
logLGTH 1.670 0.157 10.6 5.2e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.319 on 56 degrees of freedom
Multiple R-squared: 0.668, Adjusted R-squared: 0.662
F-statistic: 112 on 1 and 56 DF, p-value: 5.22e-1
