
Assignment 1

The assignment focuses on analyzing data related to walleyes in Island Lake, Minnesota, emphasizing mercury contamination and its implications for health. Students are tasked with statistical analyses including characterizing distributions, hypothesis testing, confidence intervals, and regression analysis. The assignment also includes creating visual representations and interpreting relationships between variables such as length and mercury levels.

Uploaded by

battesaron9
Copyright: © All Rights Reserved

STAT 360 – Regression Analysis – Assignment 1 (48 pts.) DUE 1/22/25 by midnight
Fall 2024

1 - Walleyes in Island Lake (Datafiles: Walleyes Island Lake.JMP & Walleyes Island Lake.txt)

Walleyes are the state fish of Minnesota and are the most important game fish in
MN (estimated 43,000 jobs and $2.8 billion in retail spending). The main
contaminant found in MN walleyes is mercury, which can have health
consequences if ingested. Every year the MN Dept. of Natural Resources (DNR)
and the MN Dept. of Health (MDH) publish waterway-specific fish consumption
guidelines for at-risk populations: children under age 15 and women who are or
may become pregnant. Current consumption guidelines can be found at the MN
Dept. of Health website:
http://www.health.state.mn.us/divs/eh/fish/eating/sitespecific.html
Data Source: These data come from Minnesota's Fish Contaminant Monitoring Program (FCMP), which
is a joint effort by the DNR, MDH, MN Dept. of Agriculture (MDA), and the Minnesota Pollution
Control Agency (MPCA), and cover the years 1990-1998.

The variables in this data file:


• Year – year walleyes were sampled
• WTLB – weight of walleye (lbs.)
• LGTHIN – length of walleye (in.)
• HGPPM – mercury contamination in walleye fillet (ppm)
a) Use histograms, kernel density estimates, and summary statistics to characterize the distribution of weight, length, and mercury concentration for the walleyes in Island Lake. (6 pts.)
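The assignment itself uses JMP for this part. As a rough illustration of what the summary statistics and a kernel density estimate compute, here is a minimal Python sketch using only the standard library. The function names, the bandwidth, and the `lengths` values are placeholders chosen for illustration, not the Island Lake data, and the quartile convention may differ from the one JMP reports.

```python
import math
import statistics

def summarize(values):
    """Return basic summary statistics for one numeric variable."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),   # sample SD (divisor n - 1)
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }

def gaussian_kde(values, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density

# Placeholder lengths (in.), NOT the Island Lake data.
lengths = [12.1, 14.3, 15.0, 16.8, 17.2, 18.5, 21.0]
stats = summarize(lengths)
kde = gaussian_kde(lengths, bandwidth=1.5)  # bandwidth is an arbitrary choice
```

The same two functions would be applied to each of WTLB, LGTHIN, and HGPPM in turn; a smaller bandwidth gives a bumpier density curve and a larger one a smoother curve.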
b) Is there evidence that the mean length of walleyes in Island Lake is over 15 inches? Conduct an appropriate test to determine this, checking necessary assumptions. Summarize your findings. (4 pts.)

Yes, there is enough statistical evidence to suggest that the mean length of walleyes in Island Lake is
over 15 inches. (p=0.031)
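The write-up does not say which test produced this p-value. Assuming it was an upper-tailed one-sample t-test of H0: mean = 15 against Ha: mean > 15, the calculation can be sketched as below. The function names and the `lengths` values are placeholders, not the Island Lake data, and the p-value is obtained by numerically integrating the t density rather than by calling a statistics library.

```python
import math
import statistics

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail_p(t, df, steps=2000):
    """P(T > t) via Simpson's rule on [0, |t|] and symmetry of the density."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area

def one_sample_t_upper(values, mu0):
    """Return (t statistic, degrees of freedom, upper-tailed p-value)."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    t = (statistics.mean(values) - mu0) / se
    return t, n - 1, upper_tail_p(t, n - 1)

# Placeholder lengths (in.), NOT the Island Lake data.
lengths = [12.1, 14.3, 15.0, 16.8, 17.2, 18.5, 21.0]
t_stat, df, p_value = one_sample_t_upper(lengths, mu0=15.0)
```

The t-test relies on the lengths being a random sample and on the sample mean being approximately normal, which is what the histogram or a normal quantile plot from part (a) would be used to check.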
c) Give a 95% confidence interval for the mean length of walleyes in Island Lake. Interpret this interval. (3 pts.)
We are 95% confident that the true mean length of walleyes in Island Lake lies between
15.49 and 17.97 inches. This means that if we were to repeatedly sample walleyes from Island Lake
and calculate the CI each time, approximately 95% of those intervals would contain the true mean
length of walleyes in the lake.
d) Find a 95% prediction interval for the length of a randomly “selected” walleye from Island Lake. (3 pts.)

For a single future observation (sample size of 1), the 95% prediction interval for length has a lower limit of 7.55 inches and an upper limit of 25.81 inches.
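To show how a confidence interval for the mean differs from a prediction interval for one new fish, here is a minimal sketch of the standard t-based formulas. It assumes approximately normal lengths; the function names and data are placeholders, and the critical value `t_crit` is passed in by the caller (2.447 is the usual table value for 6 degrees of freedom at 95%).

```python
import math
import statistics

def mean_ci(values, t_crit):
    """t-based confidence interval for the population mean."""
    n = len(values)
    xbar = statistics.mean(values)
    s = statistics.stdev(values)
    half = t_crit * s / math.sqrt(n)
    return xbar - half, xbar + half

def prediction_interval(values, t_crit):
    """t-based prediction interval for one new observation."""
    n = len(values)
    xbar = statistics.mean(values)
    s = statistics.stdev(values)
    half = t_crit * s * math.sqrt(1 + 1 / n)
    return xbar - half, xbar + half

# Placeholder lengths (in.), NOT the Island Lake data.
lengths = [12.1, 14.3, 15.0, 16.8, 17.2, 18.5, 21.0]
t_crit = 2.447  # t quantile for 95% two-sided, df = n - 1 = 6
ci_lo, ci_hi = mean_ci(lengths, t_crit)
pi_lo, pi_hi = prediction_interval(lengths, t_crit)
```

Both intervals are centered at the sample mean. The prediction interval is wider because it adds the fish-to-fish variability to the uncertainty in the mean, so it does not shrink toward zero width as the sample grows.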

e) Construct a scatterplot of weight (lbs.) vs. length (in.). Comment on the mean and variance functions, E(Weight|Length) and Var(Weight|Length). (4 pts.)

E(Weight|Length): The relationship between weight (WTLB) and length (LGTHIN) is nonlinear and increasing. As the length increases, the expected weight also increases.

Var(Weight|Length): The variance of weight given length seems to increase as length increases. This is shown by the spread of points, which widens slightly for larger lengths.


2 – MN Walleyes (Datafiles: MN Walleyes.JMP and MN Walleyes.txt)

The variables in these data files:


• LGTHIN – length of walleye (in.)
• LGTHCM – length of walleye (cm)
• HGPPM – mercury contamination in walleye fillet (ppm)
a) Create a binned version of length (in.) using 2.5 in. width bins. Then examine the conditional distribution of HGPPM|Length using the binned length variable. Construct a table in JMP giving $\hat{E}(\mathrm{HGPPM} \mid \mathrm{Length})$ and $\widehat{SD}(\mathrm{HGPPM} \mid \mathrm{Length})$ and discuss the results. (6 pts.)

Comparing mercury levels across the conditional distributions given binned length:

Longer walleyes tend to have higher HGPPM. For example, walleyes in the 27.5-30 in. length bin have higher HGPPM than those in the shorter length bins.
The $\widehat{SD}(\mathrm{HGPPM} \mid \mathrm{Length})$ values show that longer walleyes also have wider variation in mercury content.
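The table itself was built in JMP and is not reproduced here. As a sketch of the same computation, the code below bins length into 2.5 in. intervals and reports the sample mean and SD of HGPPM within each bin. Placing bin edges at multiples of 2.5 is an assumption (it is consistent with the 27.5-30 bin mentioned above, but JMP's binning may differ), and the data are placeholders, not the MN Walleyes file.

```python
import math
import statistics
from collections import defaultdict

def bin_lower_edge(length, width=2.5):
    """Lower edge of the bin [k*width, (k+1)*width) containing length."""
    return math.floor(length / width) * width

def conditional_table(lengths, hg, width=2.5):
    """Rows of (bin_low, bin_high, n, mean HGPPM, SD HGPPM), sorted by bin."""
    groups = defaultdict(list)
    for x, y in zip(lengths, hg):
        groups[bin_lower_edge(x, width)].append(y)
    table = []
    for lo in sorted(groups):
        ys = groups[lo]
        # The sample SD is undefined for a bin with a single fish.
        sd = statistics.stdev(ys) if len(ys) > 1 else float("nan")
        table.append((lo, lo + width, len(ys), statistics.mean(ys), sd))
    return table

# Placeholder (length, HGPPM) values, NOT the MN Walleyes data.
lengths = [10.0, 11.0, 12.5, 14.9, 27.6]
hg = [0.1, 0.3, 0.2, 0.4, 1.0]
table = conditional_table(lengths, hg)
```

Reading down the mean column gives the estimated mean function, and reading down the SD column gives the estimated standard deviation function discussed above.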
b) What percent of the variation in the response (HGPPM) is explained by the regression on binned length? Interpret this value. (4 pts.)

30.46% of the variation in mercury (HGPPM) is explained by a regression model using binned length.
The corrected SS value is 103.34 and the sum of the within-bin residual sums of squares (RSS_k) is 71.86.
R^2 = 1 - (sum of RSS_k / corrected SS)
= 1 - (71.86 / 103.34)
= 1 - 0.6954
= 0.3046 = 30.46%
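The arithmetic above can be checked directly, and the same quantity can be computed from raw grouped data. The sketch below does both; the function name is hypothetical, and only the two sums of squares (103.34 and 71.86) come from the write-up.

```python
def r_squared_from_groups(groups):
    """R^2 for a regression on a grouping variable: 1 - sum(RSS_k) / SST."""
    all_y = [y for ys in groups for y in ys]
    grand_mean = sum(all_y) / len(all_y)
    sst = sum((y - grand_mean) ** 2 for y in all_y)   # corrected total SS
    rss = 0.0
    for ys in groups:
        group_mean = sum(ys) / len(ys)
        rss += sum((y - group_mean) ** 2 for y in ys)  # within-group RSS_k
    return 1 - rss / sst

# Check using the sums of squares reported in the answer.
r2_reported = 1 - 71.86 / 103.34
```

Here `r2_reported` rounds to 0.3046, matching the 30.46% in the answer.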
c) Show that the equation $\hat{E}(\hat{E}(\mathrm{HGPPM} \mid \mathrm{Length})) = \hat{E}(\mathrm{HGPPM})$ holds based on the sample information. (6 pts.)

The sample mean of HGPPM is 0.418 and the mean of the bin means, E(E(Y|X)), is also 0.418, so they are equal:
E(Y) = E(E(Y|X))
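In a sample, this identity holds exactly when each bin mean is weighted by its share of the observations, n_k/n. The sketch below illustrates that with placeholder groups (not the MN Walleyes data) and hypothetical function names, and also shows that an unweighted average of bin means generally gives a different number.

```python
def overall_mean(groups):
    """Mean of all observations, ignoring the grouping."""
    all_y = [y for ys in groups for y in ys]
    return sum(all_y) / len(all_y)

def weighted_mean_of_group_means(groups):
    """Sum over bins of (n_k / n) * (mean of bin k)."""
    n = sum(len(ys) for ys in groups)
    return sum((len(ys) / n) * (sum(ys) / len(ys)) for ys in groups)

def unweighted_mean_of_group_means(groups):
    """Simple average of the bin means; equals the overall mean only by luck."""
    means = [sum(ys) / len(ys) for ys in groups]
    return sum(means) / len(means)

# Placeholder HGPPM values in two bins, NOT the MN Walleyes data.
groups = [[0.1, 0.3], [0.2, 0.4, 0.9]]
lhs = weighted_mean_of_group_means(groups)      # 0.4*0.2 + 0.6*0.5
rhs = overall_mean(groups)                      # 1.9 / 5
naive = unweighted_mean_of_group_means(groups)  # (0.2 + 0.5) / 2
```

With unequal bin sizes, `lhs` and `rhs` agree while `naive` does not, which is why the bin counts need to enter the calculation.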

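d) BONUS QUESTION: Show that the equation

$\widehat{\mathrm{Var}}(\mathrm{HGPPM}) = \hat{E}(\widehat{\mathrm{Var}}(\mathrm{HGPPM} \mid \mathrm{Length})) + \widehat{\mathrm{Var}}(\hat{E}(\mathrm{HGPPM} \mid \mathrm{Length}))$

holds based on the sample information. (6 pts.)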

The variance of HGPPM is 0.168, the mean of the within-bin variances, E(Var(HGPPM|Length)), is 0.25, and
Var(E(HGPPM|Length)) is 0.159.
Var(HGPPM) = E(Var(HGPPM|Length)) + Var(E(HGPPM|Length))
0.168 ≠ 0.25 + 0.159 (I don’t know why)
After fitting HGPPM by length, I took the table of means and standard deviations and squared the
standard deviations to find the variances. From there I used Distribution to get the variance of the
means and the mean of the variances. I don’t know why I got the wrong answer.
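One possible explanation for the mismatch, offered as an assumption since the underlying JMP table is not shown: the identity holds exactly in a sample only when every bin is weighted by n_k/n and all variances use divisor n. Taking a plain average of the bin variances and a plain sample variance of the bin means ignores the bin counts and generally breaks the equality. The sketch below uses placeholder groups (not the MN Walleyes data) and hypothetical function names.

```python
def mean(ys):
    return sum(ys) / len(ys)

def pvar(ys):
    """Variance with divisor n (not n - 1)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def variance_decomposition(groups):
    """Return (total variance, weighted within part, weighted between part)."""
    all_y = [y for ys in groups for y in ys]
    n = len(all_y)
    grand_mean = mean(all_y)
    total = pvar(all_y)
    # E(Var(Y|X)): bin variances weighted by bin share n_k / n
    within = sum((len(ys) / n) * pvar(ys) for ys in groups)
    # Var(E(Y|X)): squared deviations of bin means, weighted by n_k / n
    between = sum((len(ys) / n) * (mean(ys) - grand_mean) ** 2 for ys in groups)
    return total, within, between

# Placeholder HGPPM values in two bins, NOT the MN Walleyes data.
groups = [[0.1, 0.3], [0.2, 0.4, 0.9]]
total, within, between = variance_decomposition(groups)
```

Multiplying the weighted identity through by n gives the sums-of-squares form, corrected SS = sum of RSS_k + between-bin SS. With the values reported in part (b), the between-bin SS would be 103.34 - 71.86 = 31.48.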

e) Construct a scatterplot of HGPPM vs. LGTHIN and add lowess smooth based estimates of E(HGPPM|Length) and E(HGPPM|Length) ± SD(HGPPM|Length) to the plot. Discuss. (6 pts.)

The red line shows that there is a positive relationship between HGPPM and Length. The
purple and green lines show the upper and lower bounds (mean ± SD). As the length increases, the
variability in HGPPM also increases: the upper and lower bounds get farther apart as the length
increases.
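The curves in the plot came from JMP's smoother. To illustrate the idea, here is a simplified lowess-style sketch: a tricube-weighted local linear fit at each data point, without the robustness iterations of full lowess, so it will not reproduce JMP's curves exactly. The SD curve is obtained by smoothing the squared residuals and taking the square root. All names, the span `frac`, and the data are illustrative assumptions.

```python
import math

def tricube_weights(x0, xs, frac):
    """Tricube weights for the nearest frac share of points around x0."""
    n = len(xs)
    k = max(2, min(n, math.ceil(frac * n)))
    h = sorted(abs(x - x0) for x in xs)[k - 1]   # local bandwidth
    if h == 0:
        return [1.0 if x == x0 else 0.0 for x in xs]
    return [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]

def local_linear(x0, xs, ys, frac=0.5):
    """Weighted least-squares line around x0, evaluated at x0 (a data point)."""
    ws = tricube_weights(x0, xs, frac)
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

def smooth_mean_and_sd(xs, ys, frac=0.5):
    """Smoothed mean and SD of y given x, evaluated at each observed x."""
    fit = [local_linear(x, xs, ys, frac) for x in xs]
    sq_res = [(y - f) ** 2 for y, f in zip(ys, fit)]
    sd = [math.sqrt(max(local_linear(x, xs, sq_res, frac), 0.0)) for x in xs]
    return fit, sd

# Placeholder data on an exact line, NOT the MN Walleyes data.
xs = [float(i) for i in range(1, 11)]
ys = [2 * x + 1 for x in xs]
fit, sd = smooth_mean_and_sd(xs, ys)
upper = [f + s for f, s in zip(fit, sd)]
lower = [f - s for f, s in zip(fit, sd)]
```

Plotting `fit`, `upper`, and `lower` against `xs` gives the three curves described above. For part (f), the same functions would be applied with `math.log` of each HGPPM value as the response.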

f) Repeat part (e) using the logarithm of HGPPM as the response. Discuss.
(6 pts.)
The purple line shows a positive, approximately linear relationship between log HGPPM
and Length.
The red and green lines capture the variability in the transformed data. As length
increases, the variability does not change, indicating roughly constant variance in the transformed
data.
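A rough way to see why the log transformation can stabilize the spread is the first-order (delta method) approximation below. It is a general approximation added for context, not something derived in the assignment, with Y standing for HGPPM and X for Length.

```latex
\operatorname{Var}(\log Y \mid X) \;\approx\; \frac{\operatorname{Var}(Y \mid X)}{\left[ E(Y \mid X) \right]^{2}}
```

If SD(Y|X) grows roughly in proportion to E(Y|X), as the widening bounds in part (e) suggest, then the right-hand side is roughly constant in X, which matches the parallel bounds seen on the log scale.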
