Unit 4 Statistics Notes Scatter Plot 2023-24
Unit 4 Statistics Notes Scatter Plot 2023-24
Unit 4 Statistics Notes Scatter Plot 2023-24
A Scatter Plot has points that show the relationship between two
sets of data. Each member of the data set gets plotted as a point
whose x-y coordinates relates to its values for the two variables.
Scatter plots provide a visual representation of the correlation, or
relationship between the two variables.
A line of best fit (or "trend" line) is a straight line that best
represents the data on a scatter plot. This line may pass through
some of the points, none of the points, or all of the points. It is
used to study the nature of relation between two variables.
1
Find the line of best fit.
Two points that seem to be on the red line are (3, 15) and (24, 13).
2
3
Types of correlation:
4
No Correlation: No correlation occurs when there is
no linear dependency between the variables.
5
Weak Correlation: A correlation is weaker the farther apart
the points are located to one another on the line.
6
7
Example: You might be familiar with calorie requirements for males, like the ones shown in
the table below. What type of correlation is exhibited by the data?
If you draw a scatter plot of the data, you see that the x and y values tend to increase
together. Therefore, the data exhibits positive correlation. That is, as age increases so do
calorie requirements.
Consider the scatter plot above, which shows data for students on a
backpacking trip. (Each point represents a student.)
Notice how two of the points don't fit the pattern very well. These
points have been labelled Brad and Sharon, which are the names of
the students.
8
Sharon could be considered an outlier because she is carrying a
much heavier backpack than the pattern predicts.
variables.
It is the best method to show you a non-linear pattern.
The range of data flow, i.e. maximum and minimum value, can
be determined.
Observation and reading are
straightforward. Plotting the diagram is
easy.
Solved Examples:
9
1) Draw a line of best fit for the scatter plot given.
10
Solution: Draw a line through the maximum number of
points, balancing about an equal number of points above
and below the line.
Solution:
e.
(i) When x = 18, y = 260
So, about 260 people are expected to attend the swimming pool.
Source: CNN
This is a negative correlation. As the years get larger, the sales go down. This could be
because in the boom of online/digital and pirated music.
Now, let's find the linear equation of best fit for the data set above.
First, it can be very difficult to determine the “best” equation for a set of points. In
general, you can use these steps to help you.
Step 1: Draw the scatterplot on a graph.
Step 2: Sketch the line that appears to most closely follow the data. Try to have the
same number of points above and below the line.
Step 3: Choose two points on the line and estimate their coordinates. These points do
not have to be part of the original data set.
Step 4: Find the equation of the line that passes through the two points from Step 3.
Let’s use these steps on the graph above. We already have the scatterplot drawn, so
let’s sketch a couple lines to find the one that best fits the data.
From the lines in the graph, it looks like the purple line might be the best choice. The
red line looks good from 2006-2009, but in the beginning, all the data is above it. The
green line is well below all the early data as well. Only the purple line cuts through the
first few data points, and then splits the last few years. Remember, it is very important
to have the same number of points above and below the line.
13
Using the purple line, we need to find two points on it. The second point, crosses the
grid perfectly at (2000, 14). Be careful! Our graph starts at 1999, so that would be
considered zero. Therefore, (2000, 14) is actually (1, 14). The line also crosses perfectly
at (2007, 10) or (8, 10). Now, let’s find the slope and y−intercept.
However, the equation above assumes that x starts with zero. In actuality, we started with
Using the line of best fit above, what would you expect music sales to be in 2010?
In this problem, we are using the line of best fit to predict data. Plug in 2010 for x and
solve for y.
14
References:
15