Unit 4 Statistics Notes Scatter Plot 2023-24


Academic year 2023 – 24

Unit: Reasoning with data (Statistics)
Grade: MYP 5


Key concept: Relationships
Related concept: Representation, Validity
Global context: Globalization and sustainability
Exploration: Students will explore data-driven decision-making
SOI: Inquiring about the representation and validity of data can help to
establish the underlying relationships and trends thus enhancing our
decision-making skills.
Connection with SOI: Students will explore the validity and representation of
data using different techniques of data collection, presentation, and
analysis, which will enhance their data-driven decision-making skills.

Scatter Plots and Best fit line

A scatter plot uses points to show the relationship between two
sets of data. Each member of the data set is plotted as a point
whose x- and y-coordinates correspond to its values for the two
variables. Scatter plots provide a visual representation of the
correlation, or relationship, between the two variables.

A line of best fit (or "trend" line) is a straight line that best
represents the data on a scatter plot. This line may pass through
some of the points, none of the points, or all of the points. It is
used to study the nature of the relationship between two variables.

Find the line of best fit.
Two points that seem to be on the red line are (3, 15) and (24, 13).
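
Once two points on the line have been read off, its equation follows from the slope formula. The sketch below is an illustration only and is not part of the original notes; the helper name line_through is hypothetical, and the two points are the ones quoted above.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    slope = Fraction(y2 - y1, x2 - x1)   # rise over run
    intercept = y1 - slope * x1          # solve y1 = slope * x1 + b for b
    return slope, intercept

# The two points read from the red line in the notes.
slope, intercept = line_through((3, 15), (24, 13))
print(f"slope = {slope}, intercept = {intercept}")  # slope = -2/21, intercept = 107/7
```

Exact fractions are used so that the slope and intercept are not rounded; the negative slope shows that this particular line falls from left to right.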

Types of correlation:

Positive Correlation: Positive correlation occurs when an
increase in one variable is accompanied by an increase in the
other. The line corresponding to the scatter plot is an
increasing line.

Negative Correlation: Negative correlation occurs when an
increase in one variable is accompanied by a decrease in the
other. The line corresponding to the scatter plot is a
decreasing line.

No Correlation: No correlation occurs when there is
no linear dependency between the variables.

Perfect Correlation: Perfect correlation occurs when there
is an exact linear relationship between the variables. In this
case, all the points lie on a straight line.

Strong Correlation: A correlation is stronger the closer
the points lie to the line.

Weak Correlation: A correlation is weaker the farther
the points lie from the line.

Example: You might be familiar with calorie requirements for males, like the ones shown in
the table below. What type of correlation is exhibited by the data?

Calorie Requirements (Male), 1-59 years

If you draw a scatter plot of the data, you see that the x and y values tend to increase
together. Therefore, the data exhibits positive correlation. That is, as age increases so do
calorie requirements.

What are outliers in scatter plots?

Scatter plots often have a pattern. We call a data point an outlier if
it doesn't fit the pattern.

Consider the scatter plot above, which shows data for students on a
backpacking trip. (Each point represents a student.)

Notice how two of the points don't fit the pattern very well. These
points have been labelled Brad and Sharon, which are the names of
the students.

Sharon could be considered an outlier because she is carrying a
much heavier backpack than the pattern predicts.

Brad could be considered an outlier because he is carrying a much
lighter backpack than the pattern predicts.
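
One way to make "doesn't fit the pattern" concrete is to measure how far each point lies above or below a trend line. The sketch below is only an illustration: the data, the trend line y = 2x, the tolerance, and the function name find_outliers are all made up, and the notes do not give the actual backpack data.

```python
def find_outliers(points, slope, intercept, tolerance):
    """Return (label, side) for points farther than `tolerance` from the line."""
    flagged = []
    for label, x, y in points:
        predicted = slope * x + intercept   # what the pattern predicts
        residual = y - predicted            # positive means above the line
        if abs(residual) > tolerance:
            flagged.append((label, "above" if residual > 0 else "below"))
    return flagged

# Hypothetical data: most points sit close to the assumed line y = 2x.
points = [("A", 10, 21), ("B", 20, 39), ("C", 30, 61),
          ("D", 40, 120), ("E", 50, 30), ("F", 60, 119)]

outliers = find_outliers(points, slope=2, intercept=0, tolerance=10)
print(outliers)  # [('D', 'above'), ('E', 'below')]
```

Here point D plays the role of a value much higher than the pattern predicts, and point E the role of a value much lower.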

When to use scatter plots?


• To show the correlation that exists between two sets of data.
• To represent a large amount of information graphically.
• Scatter plots have applications in stock markets, population distribution, etc.

Limitations of a Scatter Diagram

The following are a few limitations of a scatter diagram:

• Scatter diagrams cannot give you the exact extent of correlation.
• A scatter diagram does not show you the quantitative
measurement of the relationship between the variables. It only
shows the qualitative expression of the quantitative change.
• This chart does not show you the relationship for more than two
variables.
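
The first two limitations are usually addressed by calculating a correlation coefficient alongside the plot. The sketch below computes Pearson's r, a standard measure that these notes do not cover; the function name pearson_r and the three small data sets are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data sets, not taken from the notes.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # points exactly on an increasing line
falling = [10, 8, 6, 4, 2]    # points exactly on a decreasing line
scattered = [2, 5, 3, 8, 7]   # increasing overall, but not on a line

for name, ys in [("rising", rising), ("falling", falling), ("scattered", scattered)]:
    print(name, round(pearson_r(xs, ys), 2))
```

Values of r near +1 or -1 correspond to points lying close to an increasing or decreasing line, and values near 0 correspond to no linear correlation.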

Benefits of a Scatter Diagram

The following are a few advantages of a scatter diagram:

• It shows the relationship between two variables.
• It is the best method to show you a non-linear pattern.
• The range of data flow, i.e. maximum and minimum value, can
be determined.
• Observation and reading are straightforward.
• Plotting the diagram is easy.

Solved Examples:

1) Draw a line of best fit for the scatter plot given.

Solution: Draw a line through the maximum number of
points, balancing about an equal number of points above
and below the line.

2) The following table describes data for the number of people
using a swimming pool over 8 days in summer and the
corresponding maximum temperature (in degrees Celsius) on
each day.

a. Draw a scatterplot for this set of data.
b. Draw a line of best fit through the data by eye.
c. Is the association positive or negative?
d. Is the association weak or strong?
e. Use the line of best fit to predict the swimming pool
attendance when the daily maximum temperature is:
(i) 18 ºC (ii) 30 ºC (iii) 40 ºC

Solution:

a. The scatterplot is obtained by plotting y (pool attendance)
against x (maximum temperature), as shown below.

b. A line of best fit by eye is drawn through the scatterplot so
that an equal number of points lie on either side of the line
and/or the sum of the distances of the points above the line is
roughly equal to the sum of the distances below the line.

c. It is clear that y increases as x increases. So, the
association between the variables is positive.

d. The data is spread about the line. So, the association
between the variables is weak.

e.
(i) When x = 18, y = 260.
So, about 260 people are expected to attend the swimming pool.

(ii) When x = 30, y = 400.
So, about 400 people are expected to attend the swimming pool.

(iii) When x = 40, y = 520.
So, about 520 people are expected to attend the swimming pool.
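
The notes do not give the equation of the hand-drawn line, only three values read from it. As a rough check, and assuming the line passes through the first and last readings, (18, 260) and (40, 520), the sketch below rebuilds the line and compares it with the middle reading. The function name predict_attendance is hypothetical.

```python
from fractions import Fraction

# Assumption: the line passes through the first and last readings in part (e).
x1, y1 = 18, 260
x2, y2 = 40, 520

slope = Fraction(y2 - y1, x2 - x1)   # extra people per extra degree Celsius
intercept = y1 - slope * x1

def predict_attendance(temperature):
    """Attendance predicted by the reconstructed line."""
    return slope * temperature + intercept

middle = predict_attendance(30)
print(f"Predicted attendance at 30 degrees: about {round(middle)}")  # about 402
```

The result, about 402, is close to the 400 read from the graph, which is the size of difference to expect when values are read from a line drawn by eye.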
Let's describe the type of correlation shown in the scatterplot and explain the answer.

Source: CNN
This is a negative correlation. As the years increase, the sales go down. This could be
because of the boom in online/digital and pirated music.

Now, let's find the linear equation of best fit for the data set above.
First, it can be very difficult to determine the “best” equation for a set of points. In
general, you can use these steps to help you.
Step 1: Draw the scatterplot on a graph.
Step 2: Sketch the line that appears to most closely follow the data. Try to have the
same number of points above and below the line.
Step 3: Choose two points on the line and estimate their coordinates. These points do
not have to be part of the original data set.
Step 4: Find the equation of the line that passes through the two points from Step 3.

Let’s use these steps on the graph above. We already have the scatterplot drawn, so
let’s sketch a couple lines to find the one that best fits the data.

From the lines in the graph, it looks like the purple line might be the best choice. The
red line looks good from 2006-2009, but in the beginning, all the data is above it. The
green line is well below all the early data as well. Only the purple line cuts through the
first few data points, and then splits the last few years. Remember, it is very important
to have the same number of points above and below the line.
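
The "same number of points above and below" check can be done by counting. The sketch below is an illustration with made-up data and two made-up candidate lines; it does not use the music sales values, which the notes show only in the graph. The function name count_sides is hypothetical.

```python
def count_sides(points, slope, intercept):
    """Count how many points lie above, below, and exactly on a line."""
    above = below = on = 0
    for x, y in points:
        predicted = slope * x + intercept
        if y > predicted:
            above += 1
        elif y < predicted:
            below += 1
        else:
            on += 1
    return above, below, on

# Hypothetical data and candidate lines.
points = [(0, 10), (1, 12), (2, 11), (3, 15), (4, 16), (5, 17)]

balanced = count_sides(points, slope=1.5, intercept=10)
too_low = count_sides(points, slope=1, intercept=8)
print(balanced)  # (2, 2, 2): two above, two below, two on the line
print(too_low)   # (6, 0, 0): every point is above the line
```

A candidate like the second one, with all the data on one side, corresponds to the lines rejected in the discussion above.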
Using the purple line, we need to find two points on it. The line crosses the
grid exactly at (2000, 14). Be careful! Our graph starts at 1999, so that year is
counted as zero. Therefore, (2000, 14) is actually (1, 14). The line also crosses the
grid exactly at (2007, 10), or (8, 10). Now, let’s find the slope and y-intercept.

Slope: m = (10 - 14) / (8 - 1) = -4/7
y-intercept: 14 = (-4/7)(1) + b, so b = 14 + 4/7 = 102/7 ≈ 14.57
Equation: y = (-4/7)x + 102/7

However, the equation above assumes that x starts at zero. In actuality, we started at
1999, so our final equation is y = (-4/7)(x - 1999) + 102/7.

Using the line of best fit above, what would you expect music sales to be in 2010?
In this problem, we are using the line of best fit to predict data. Plug in 2010 for x and
solve for y.

y = (-4/7)(2010 - 1999) + 102/7 = -44/7 + 102/7 = 58/7 ≈ 8.29

So the line predicts music sales of about 8.29 in 2010, in the units of the graph's
vertical axis.
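
The slope, intercept, and 2010 prediction can be checked with a short sketch. It assumes only the two points read from the purple line, (2000, 14) and (2007, 10), and the 1999 starting year; the names START_YEAR and predicted_sales are hypothetical.

```python
from fractions import Fraction

START_YEAR = 1999  # the graph's horizontal axis starts here, so 1999 counts as x = 0

# Two points read from the purple line, shifted so that 1999 becomes 0.
x1, y1 = 2000 - START_YEAR, 14   # (1, 14)
x2, y2 = 2007 - START_YEAR, 10   # (8, 10)

slope = Fraction(y2 - y1, x2 - x1)   # -4/7
intercept = y1 - slope * x1          # 102/7, about 14.57

def predicted_sales(year):
    """Value of the line of best fit for a calendar year."""
    return slope * (year - START_YEAR) + intercept

print(slope, intercept)                       # -4/7 102/7
print(round(float(predicted_sales(2010)), 2)) # 8.29
```

Shifting the year inside the function is what the "x - 1999" term in the final equation does: it lets calendar years be plugged in directly.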

References:

• “Outliers in Scatter Plots (Article).” Khan Academy,
www.khanacademy.org/math/cc-eighth-grade-math/cc-8th-data/cc-8th-interpreting-scatter-plots/a/outliers-in-scatter-plots.
• “Types of Correlation.”
www.ditutor.com/regression/types_correlation.html.
• “Line of Best Fit (Eyeball Method).”
www.varsitytutors.com/hotmath/hotmath_help/topics/line-of-best-fit-eyeball-method.
