0% found this document useful (0 votes)
7 views18 pages

Workbook.regression

The document discusses various statistical methods including scatterplots, regression analysis, correlation coefficients, and chi-square tests. It includes examples of data sets related to baby girls' weight and length, global sea surface temperatures, and sales data affected by temperature. Additionally, it addresses how to interpret results from these analyses to make predictions and understand trends.

Uploaded by

kart238
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views18 pages

Workbook.regression

The document discusses various statistical methods including scatterplots, regression analysis, correlation coefficients, and chi-square tests. It includes examples of data sets related to baby girls' weight and length, global sea surface temperatures, and sales data affected by temperature. Additionally, it addresses how to interpret results from these analyses to make predictions and understand trends.

Uploaded by

kart238
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 18

Regression

SCATTERPLOTS AND REGRESSION

1. The table gives weight in pounds and length in inches for 3-month-old
baby girls. Graph the points from the table in a scatterplot and describe
the trend.

Weight (lbs) Length (in)

9.7 21.6

10.2 22.1

12.4 23.6

13.6 25.1

9.8 22.4

11.2 23.9

14.1 25.8

2. The following values have been computed for a data set of 14 points.
Calculate the line of best fit.


x = 86


y = 89.7


xy = 680.46

x 2 = 654.56

1
3. For the data set given in the table, calculate each of the following
values:

x ,(
∑ )
2
2
∑ ∑ ∑ ∑
n, x, y, xy, x

Month 1 2 3 4 5 6 7 8 9 10 11 12

Temperature 73 73 75 75 77 79 79 81 81 81 77 75

4. Use the Average Global Sea Surface Temperatures data shown in the
table to create a line of best fit for the data. Consider 1910 as year 10. Use
the equation to predict the average global sea surface temperature in the
year 2050.

2
Year Temperature, F

1910 -1.11277

1920 -0.71965

1930 -0.58358

1940 -0.17977

1950 -0.55318

1960 -0.30358

1970 -0.30863

1980 0.077197

1990 0.274842

2000 0.232502

2010 0.612718

5. Compare the scatterplots. The second graph includes extra data


starting in 1880. How does this compare to the plot that only shows 1910 to
2010? Explain trends in the data, and how the regression line changes by
adding in these extra points. Which trend line would be best for predicting
the temperature in 2050?

3
Average Global Sea Surface Temperatures, 1910-2010
0.9

0.6

0.3
Temperature

-0.3

-0.6

-0.9

-1.2

-1.5
1920 1940 1960 1980 2000 2020

Year

Average Global Sea Surface Temperatures, 1880-2010


0.9

0.6

0.3
Temperature

-0.3

-0.6

-0.9

-1.2

-1.5
1880 1900 1920 1940 1960 1980 2000 2020

Year

4
6. A small coffee shop wants to know how hot chocolate sales are
affected by daily temperature. Find the rate of change of hot chocolate
sales, with respect to temperature.

Daily Temperature, F Hot Chocolate Sales

28 110

29 115

31 108

33 103

45 95

48 93

55 82

57 76

5
CORRELATION COEFFICIENT AND THE RESIDUAL

1. What does the shape of this residual plot tell us about the line of best
fit that was created for the data?

2. What does the shape of this residual plot tell us about the line of best
fit that was created for the data?

6
3. Calculate and interpret the correlation coefficient for the data set.

x y

54 0.162

57 0.127

62 0.864

77 0.895

81 0.943

93 1.206

7
4. Calculate the residuals, draw the residual plot, and interpret the
results. Compare the results to the r-value in the previous problem. The
equation of the line of best fit for the data is

ŷ = 0.0257x − 1.1142

x y

54 0.162

57 0.127

62 0.864

77 0.895

81 0.943

93 1.206

5. The table shows average global sea surface temperature by year.


Calculate and interpret the correlation coefficient for the data set. Leave
the years as they are.

8
Year Temperature, F

1880 -0.47001

1890 -0.88758

1900 -0.48331

1910 -1.11277

1920 -0.71965

1930 -0.58358

1940 -0.17977

1950 -0.55318

1960 -0.30358

1970 -0.30863

1980 0.077197

1990 0.274842

2000 0.232502

2010 0.612718

6. Calculate the residuals and create the residual plot for the data in the
table. Compare this with the r-value we calculated in the last question and
interpret the results. Use the equation for the regression line
ŷ = 0.0143x − 28.332.

9
Year Temperature, F

1880 -0.47001

1890 -0.88758

1900 -0.48331

1910 -1.11277

1920 -0.71965

1930 -0.58358

1940 -0.17977

1950 -0.55318

1960 -0.30358

1970 -0.30863

1980 0.077197

1990 0.274842

2000 0.232502

2010 0.612718

10
COEFFICIENT OF DETERMINATION AND RMSE

1. Linda read an article about the predictions of high school students and
their GPA. The article studied three factors, the number of volunteer
organizations each student participated in, the number of hours spent on
homework, and the student’s individual scores on standardized tests.

The article concluded that the number of hours spent on homework are
the best predictor of GPA, because they found 24 % of the variance in GPA
to be from hours spent on homework, 15 % from the number of volunteer
organizations, and 11.5 % from individual scores on standardized tests.

What is the coefficient of determination for the line-of-best-fit that has y


-values of high school GPA and x-values of hours spent on homework? Is
the line of best fit a good predictor of the data? Why or why not?

2. For the data in the table, calculate the sum of the squared residuals
based on the mean of the y-values.

x y

1 3.1

2 3.4

3 3.7

4 3.9

5 4.1

11
3. Use the same data as the previous question to calculate the sum of
the squared residuals based on the least squares regression line,
ŷ = 0.25x + 2.89.

4. Based on the previous two questions, in which we found the sum of


the squared residuals based on the mean of the y-values and then the line
of best fit, what percentage of error did we eliminate by using the least
squares line? What is the term for this error?

5. What is the RMSE of the data set and what does it mean?

x y

1 3.1

2 3.4

3 3.7

4 3.9

5 4.1

6. Calculate the RMSE for the data set, given that the least squares line is
ŷ = 0.0028x + 1.2208.

12
x y

5 1.25

10 1.29

12 1.17

15 1.24

17 1.32

13
CHI-SQUARE TESTS

1. We want to know whether a person’s geographic region of the United


State affects their preference of cell phone brand. We randomly sample
people across the country and ask them about their brand preference.
What can we conclude using a chi-square test at 95 % confidence?

iPhone Android Other Totals

Northeast 72 33 8 113

Southeast 48 26 7 81

Midwest 107 50 10 167

Northwest 59 33 10 102

Southwest 61 27 9 97

Totals 347 169 44 560

2. A beverage company wants to know if gender affects which of their


products people prefer. They take a random sample of fewer than 10 % of
their customers, and ask them in a blind taste test which beverage they
prefer. What can the company conclude using a chi-square test at α = 0.1?

14
Beverage

A B C Totals

Men 35 34 31 100

Women 31 33 36 100

Totals 66 67 67 200

3. A coffee company wants to know whether or not drink and pastry


choice are related among their customers. The company randomly
sampled fewer than 10 % of their customers, and recorded their drink and
pastry orders. What can the restaurant conclude using a chi-square test at
99 % confidence?

Bagel Muffin Totals

Coffee 38 34 72

Tea 25 29 54

Totals 63 63 126

4. A school district wants to know whether or not GPA is affected by


elective preference. They randomly sampled fewer than 10 % of their
students, and recorded their elective preference an GPA. What can the
school district conclude using a chi-square test at α = 0.1?

15
GPA range

<2 2 3 4+ Totals

Music 12 26 31 34 103

Theater 21 22 23 21 87

Art 36 29 29 32 126

Totals 69 77 83 87 316

5. An airline wants to know if people travel constantly throughout the


year, or if travel is more concentrated at specific times. They recorded
flights taken each quarter, and recorded them in a table (in hundreds of
thousands). What can the airline conclude using a chi-square test at 95 %
confidence?

Quarter Jan-Mar Apr-Jun Jul-Sep Oct-Dec Total

Flights 3.97 4.58 4.73 5.14 18.42

6. A sandwich company wants to know how their sales are affected by


time of day. They recorded sandwiches sold during each part of the day.
What can the sandwich company conclude using a chi-square test at
α = 0.1?

Time of day Midday Afternoon Evening Total

Sales 213 208 221 642

16
17

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy