Chapter 13
Correlation and Linear Regression
LO13-3 Apply regression analysis to estimate the linear relationship between two variables
Ch 13 : Overall Concept
Is it positive or negative?
Ch 13 : Correlation Analysis Concept
Examples
• Does the amount Healthtex spends per month on
training its sales force affect its monthly sales?
• Does the number of hours students study for an exam
influence the exam score?
Ch 13 : Scatter Diagram
North American Copier Sales sells copiers to businesses of all sizes throughout the United States and
Canada. The new national sales manager is preparing a report and would like to show the importance of
making an extra sales call each day. She takes a random sample of 15 sales representatives and gathers
information on the number of sales calls made last month and the number of copiers sold.
What observations can you make about the relationship between the number of sales calls and the number
of copiers sold? Develop a scatter diagram to display the information.
Sales representatives who make more calls tend to sell more copiers!
Ch 13 : Correlation Coefficient
We will measure the strength and direction of the relationship between two variables by determining the
correlation coefficient.
CORRELATION COEFFICIENT A measure of the strength of the linear relationship between two variables.
The following graphs summarize the strength and direction of the correlation coefficient.
A correlation coefficient r close to 0 (say, .08) shows that the linear relationship is quite weak.
Coefficients of −.91 and +.91 have equal strength; both indicate very strong correlation between the two
variables. Thus, the strength of the correlation does not depend on the direction (either − or +).
Ch 13 : Correlation Coefficient
The following scatter diagrams summarize the strength and direction of the correlation coefficient for
r = 0, a weak r (say, −.23), and a strong r (say, +.87).
Note that, if the correlation is weak, there is considerable scatter about a line drawn through the center of
the data.
For the scatter diagram representing a strong relationship, there is very little scatter about the line. This
indicates, in the example shown on the chart, that hours studied is a good predictor of exam score.
Ch 13 : Correlation Coefficient
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n-1)\, s_x s_y}$
OR $r = \frac{\sum xy - n\bar{x}\bar{y}}{(n-1)\, s_x s_y}$
To calculate the correlation coefficient for our example, find each representative's deviation from the mean
number of sales calls and from the mean number of copiers sold, then multiply the two deviations. The sum of
these products is 6,672 and is used in the formula to find r. We also need the standard deviations of x and y.
The result, r = 0.865, indicates a strong, positive relationship.
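The sum of products of deviations can also be computed from raw totals, which is why the formula for r has a second ("OR") form. A short derivation, using the fact that $\sum x = n\bar{x}$ and $\sum y = n\bar{y}$:

```latex
\begin{align*}
\sum (x - \bar{x})(y - \bar{y})
  &= \sum xy - \bar{y}\sum x - \bar{x}\sum y + n\bar{x}\bar{y} \\
  &= \sum xy - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y} \\
  &= \sum xy - n\bar{x}\bar{y}
\end{align*}
```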
Ch 13 : Correlation Coefficient
The result, r = 0.865, indicates a strong, positive relationship because 0.865 is close to 1.00.
$r = \frac{6672}{(15-1)(42.76)(12.89)} = 0.865$
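The arithmetic can be checked with a few lines of Python. This is only a sketch: it takes the summary values from the example (15 representatives, a sum of products of 6,672, and standard deviations of 42.76 for sales calls and 12.89 for copiers sold) as given, and the variable names are our own.

```python
# Summary values quoted in the North American Copier Sales example.
sum_of_products = 6672   # sum of (x - mean of x) * (y - mean of y)
n = 15                   # sales representatives in the sample
s_x = 42.76              # standard deviation of sales calls
s_y = 12.89              # standard deviation of copiers sold

# Correlation coefficient: sum of products over (n - 1) * s_x * s_y.
r = sum_of_products / ((n - 1) * s_x * s_y)
print(round(r, 3))  # 0.865
```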
The Applewood Auto Group’s marketing department believes younger buyers purchase vehicles on which
lower profits are earned and older buyers purchase vehicles on which higher profits are earned. They
would like to use this information as part of an upcoming advertising campaign to try to attract older
buyers.
Develop a scatter diagram and then determine the correlation coefficient. Would this be a useful
advertising feature?
Ch 13 : Correlation Coefficient - Example
Dan Ireland, the student body president at Toledo State University, is concerned about the cost of textbooks
to students. He believes there is a relationship between the number of pages in the text and the selling
price of the book.
To provide insight into the problem, he selects a sample of eight textbooks currently on sale in the
bookstore.
Book                          Pages   Price ($)
Introduction to History        500       84
Basic Algebra                  700       75
Introduction to Psychology     800       99
Introduction to Sociology      600       72
Business Management            400       69
Introduction to Biology        600       81
Fundamentals of Jazz           600       63
Principles of Nursing          800       93
• Draw a scatter diagram.
• Calculate the correlation coefficient r.
Ch 13 : Correlation Coefficient - Example
Answer:
To draw the scatter diagram, we assign one variable to the X-axis (number of pages = X) and the other
variable to the Y-axis (price = Y). Each book is plotted as a point (X, Y); the resulting graph is the
scatter diagram illustrated below:
We can see from the graph that there is a moderate positive association between the two variables.
Ch 13 : Correlation Coefficient - Example
For the determination of the coefficient of correlation r, we need the following table:
$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n-1)\, s_x s_y} = \frac{7800}{7(138.87)(12.21)} = 0.657$
The correlation between the number of pages and the selling price of the book is 0.657.
This indicates a moderate relationship between the two variables.
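The same result can be reproduced from the eight textbooks directly. The sketch below uses only Python's standard library; the list and variable names are our own, and `statistics.stdev` is the sample standard deviation (divisor n − 1), which is the one the formula assumes.

```python
from statistics import mean, stdev

# Sample of eight textbooks: number of pages (X) and price in dollars (Y).
pages = [500, 700, 800, 600, 400, 600, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]

n = len(pages)
mean_x, mean_y = mean(pages), mean(price)

# Sum of the products of the deviations from the two means.
sum_of_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(pages, price))

# Sample standard deviations of X and Y.
s_x, s_y = stdev(pages), stdev(price)

r = sum_of_products / ((n - 1) * s_x * s_y)
print(sum_of_products, round(s_x, 2), round(s_y, 2), round(r, 3))
```

The printed values should match the 7800, 138.87, 12.21, and 0.657 used in the hand calculation.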
Ch 13 : Correlation Coefficient - Example
A school headmaster wants to find the relationship between the age of a group of children in his school and
the money ($) they spend each day. A random sample of 6 children is selected, and the data are recorded in
the table below.
Age (X) Money (Y)
• $\bar{x} = \frac{48}{6} = 8$, $\bar{y} = \frac{66}{6} = 11$
• $s_x = \sqrt{\frac{398 - 6(8^2)}{6-1}} = \sqrt{\frac{14}{5}} = \sqrt{2.8} = 1.67332$
• $s_y = \sqrt{\frac{736 - 6(11^2)}{6-1}} = \sqrt{\frac{10}{5}} = \sqrt{2} = 1.414$
Ch 13 : Correlation Coefficient - Example
By Excel:
$r = \frac{537 - (6 \cdot 8 \cdot 11)}{5 \cdot 1.673 \cdot 1.414} = \frac{9}{11.83} = 0.76$
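As a cross-check on the hand calculation, the correlation can be computed from the column totals alone. This is a sketch under the assumption that the totals are n = 6, Σx = 48, Σy = 66, Σx² = 398, Σy² = 736, and Σxy = 537, which are the figures that appear in the formulas for this example; the individual observations are not needed.

```python
from math import sqrt

# Column totals for the six children (age = x, daily money = y).
n = 6
sum_x, sum_y = 48, 66
sum_x2, sum_y2 = 398, 736
sum_xy = 537

mean_x, mean_y = sum_x / n, sum_y / n

# Sample standard deviations from the computational formula.
s_x = sqrt((sum_x2 - n * mean_x ** 2) / (n - 1))
s_y = sqrt((sum_y2 - n * mean_y ** 2) / (n - 1))

# Correlation coefficient from raw totals.
r = (sum_xy - n * mean_x * mean_y) / ((n - 1) * s_x * s_y)
print(round(s_x, 5), round(s_y, 3), round(r, 2))
```

With these totals, r comes out at about 0.76, a fairly strong positive relationship between age and daily spending.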
We will also evaluate the ability of the equation to accurately make estimations.
Ch 13 : Regression Analysis
REGRESSION EQUATION An equation that expresses the linear relationship between two variables.
Ch 13 : Regression Analysis
• In regression analysis, our objective is to use the data to position a line that best represents the
relationship between two variables.
• The first approach is to use a scatter diagram to visually position the line.
• A line is then drawn through the scattered points using a mathematical method (least squares) that
results in a single best-fitting regression line.
Ch 13 : Regression Analysis
$b = r\,\frac{s_y}{s_x}$
OR $b = \frac{\sum xy - n\bar{x}\bar{y}}{(n-1)\, s_x^2}$
Ch 13 : Regression Analysis
A positive value for b reflects a direct relationship between the two variables and a negative value
indicates an inverse relationship.
OR $b = \frac{\sum xy - n\bar{x}\bar{y}}{(n-1)\, s_x^2}$
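The sign of b always matches the sign of r, because the least squares slope equals r scaled by the ratio of the two standard deviations (which is always positive). A short derivation connecting that form to the raw-totals formula, using the same symbols as the formula for r:

```latex
\begin{align*}
b = r\,\frac{s_y}{s_x}
  &= \frac{\sum xy - n\bar{x}\bar{y}}{(n-1)\, s_x s_y} \cdot \frac{s_y}{s_x} \\
  &= \frac{\sum xy - n\bar{x}\bar{y}}{(n-1)\, s_x^2}
\end{align*}
```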
Ch 13 : Regression Analysis - Example
Recall the example of North American Copier Sales. The sales manager gathered information on the number
of sales calls made and the number of copiers sold. Use the least squares method to determine a linear
equation to express the relationship between the two variables.
Answer:
• The first step is to find the slope of the least squares regression line, b (we already found r previously).
• Next, find the intercept, a.
• So if a salesperson makes 100 calls (x), he or she can expect to sell 46.0432 copiers (ŷ).
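The three steps can be sketched in Python as follows. The slope uses r = 0.865 and the standard deviations 42.76 and 12.89 from the correlation example. The sample means (96 sales calls and 45 copiers) are not shown in this section and are an assumption here; they are consistent with the estimate of 46.0432 quoted above. The function name `predict` is our own.

```python
# Values from the correlation example.
r = 0.865      # correlation coefficient
s_x = 42.76    # standard deviation of sales calls
s_y = 12.89    # standard deviation of copiers sold

# Assumed sample means (not stated in this section).
mean_x = 96    # mean number of sales calls
mean_y = 45    # mean number of copiers sold

# Step 1: slope, rounded to four decimals as in the hand calculation.
b = round(r * s_y / s_x, 4)

# Step 2: intercept, so that the line passes through (mean_x, mean_y).
a = mean_y - b * mean_x

# Step 3: estimate copiers sold for a given number of calls.
def predict(calls):
    return a + b * calls

print(b, round(a, 4), round(predict(100), 4))
```

Rounding b before computing a mirrors the usual hand calculation; keeping full precision would change the later decimal places slightly.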
Ch 13 : Drawing the Regression Line
For example, the fifth sales representative is Jeff Hall. He made 164 calls. His estimated number of copiers
sold is 62.7344. The point x = 164 and ŷ = 62.7344 is located by moving to 164 on the x-axis and then going
vertically to 62.7344. The other points on the regression line can be determined by substituting a
particular value of x into the regression equation and calculating ŷ.
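As a worked substitution, assume the fitted equation is $\hat{y} = 19.9632 + 0.2608x$. The intercept and slope are not printed in this section; these values are an assumption that reproduces both estimates quoted in the example (46.0432 at 100 calls and 62.7344 at 164 calls).

```latex
\begin{align*}
\hat{y} &= 19.9632 + 0.2608x \\
        &= 19.9632 + 0.2608(164) \\
        &= 19.9632 + 42.7712 \\
        &= 62.7344
\end{align*}
```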
Ch 13 : The Standard Error of Estimate
The results of the regression analysis for North American Copier Sales show a significant relationship between
number of sales calls and the number of sales made.
By substituting the names of the variables into the equation, it can be written as:
Number of copiers sold = 19.9632 + 0.2608 (Number of sales calls)
Then we need a measure that describes how inaccurate the estimate might be. This measure is called the
standard error of estimate.
It is the same concept as the standard deviation discussed in Chapter 3 where the standard deviation
measures the dispersion around the mean.
For example:
• A large electronics firm has a stock option plan for employees. Suppose there is a relationship between the number of
years employed and the number of shares owned. If we observe all employees with 20 years of service, they would most
likely own different numbers of shares.
• A real estate developer studied the relationship between the income of buyers and the size of the home they purchased.
Not all buyers with an income of $70,000 will purchase a home of exactly the same size.
Ch 13 : The Standard Error of Estimate
• The standard error of estimate measures the dispersion around the regression line:
STANDARD ERROR OF ESTIMATE A measure of the dispersion, or scatter, of the
observed values around the line of regression for a given value of x.
We need the sum of the squared differences between each observed value of y and the predicted value
of y.
If the standard error of estimate is small, this indicates that the data are relatively close to the regression
line and the regression equation can be used.
If it is large, the data are widely scattered around the regression line and the regression equation will not
provide a precise estimate of y.
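In symbols, the standard error of estimate is $s_{y \cdot x} = \sqrt{\sum (y - \hat{y})^2 / (n - 2)}$. The sketch below illustrates the calculation on the Toledo State University textbook sample from earlier in the chapter; choosing that data set is our own illustration, not a result reported in the slides, and the variable names are assumptions.

```python
from math import sqrt
from statistics import mean

# Textbook sample: number of pages (x) and price in dollars (y).
pages = [500, 700, 800, 600, 400, 600, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]

n = len(pages)
mean_x, mean_y = mean(pages), mean(price)

# Least squares slope and intercept.
sxx = sum((x - mean_x) ** 2 for x in pages)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pages, price))
b = sxy / sxx
a = mean_y - b * mean_x

# Sum of squared differences between observed and predicted prices.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(pages, price))

# Standard error of estimate: divide by n - 2, then take the square root.
std_error = sqrt(sse / (n - 2))
print(round(std_error, 2))
```

For this sample the standard error is roughly $9.94, which is large relative to prices between $63 and $99, so page count alone gives only a rough estimate of price.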
Ch 13 : The Standard Error of Estimate - Example
Coefficient of Determination: The proportion of the total variation in the dependent variable Y that is explained, or
accounted for, by the variation in the independent variable X.
Coefficient of Determination $= r^2 = \left( \frac{\sum xy - n\bar{x}\bar{y}}{(n-1)\, s_x s_y} \right)^2$
Ch 13 : Coefficient of Determination (r²) - Example
In the North American Copier Sales example, the correlation coefficient was 0.865. Squaring it gives
(0.865)² = 0.748; this is the coefficient of determination.
This means 74.8% of the variation in the number of copiers sold is explained by the variation
in sales calls.
Ch 13 : Coefficient of Determination (r²) - Example
In Example 3, at Toledo State University, the relationship between the number of pages in the text and the
selling price of the book was studied. The correlation coefficient was 0.657. Squaring it gives
(0.657)² = 0.432; this is the coefficient of determination.
This means 43% of the variation in the price is explained by the variation in the number of pages.
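To see why r² is read as "proportion of variation explained", it can be computed directly as explained variation divided by total variation. The sketch below does this for the textbook sample and compares the result with the square of r; the variable names are our own.

```python
from math import sqrt
from statistics import mean

# Textbook sample: number of pages (x) and price in dollars (y).
pages = [500, 700, 800, 600, 400, 600, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]

mean_x, mean_y = mean(pages), mean(price)
sxx = sum((x - mean_x) ** 2 for x in pages)
syy = sum((y - mean_y) ** 2 for y in price)   # total variation in price
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pages, price))

# Least squares line.
b = sxy / sxx
a = mean_y - b * mean_x

# Variation in price explained by the regression line.
explained = sum(((a + b * x) - mean_y) ** 2 for x in pages)

r_squared = explained / syy
r = sxy / sqrt(sxx * syy)
print(round(r_squared, 3), round(r ** 2, 3))
```

Both routes give about 0.432, matching the value obtained by squaring 0.657.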
Ch 13 : Coefficient of Determination (r²) - Example
Going back to our example of North American Copier Sales. How well can the regression
equation predict number of copiers sold with number of sales calls made?
Our analysis shows that only 74.8% of the variation in copiers sold is explained by the number
of sales calls. Clearly, these data do not form a perfect line. Instead, the data are scattered
around the best-fitting, least squares regression line, and there will be error in the predictions.
In the next section, the standard error of estimate is used to provide more specific
information regarding the error associated with using the regression equation to make
predictions.
Ch 13 : Coefficient of Determination (r²) - Example
A school headmaster wants to find the relationship between the age of a group of children in his school
and the money ($) they spend each day. A random sample of 6 children is selected, and the data are
recorded in the table below.
1. $b = \frac{537 - (6 \cdot 8 \cdot 11)}{5\,(1.673)^2} = \frac{9}{14} = 0.643$
$a = 11 - (0.643 \cdot 8) = 5.857$
$\hat{y} = 5.857 + 0.643X$
2. $\hat{y}_9 = 5.857 + (0.643 \cdot 9) = 11.644$
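The same regression can be sketched in Python from the column totals. It assumes n = 6, Σx = 48, Σy = 66, Σx² = 398, and Σxy = 537, the figures that appear in the formulas for this example, and uses the fact that (n − 1)s_x² equals Σx² − n x̄². The variable names are our own.

```python
# Column totals for the six children (age = x, daily money = y).
n = 6
sum_x, sum_y = 48, 66
sum_x2 = 398
sum_xy = 537

mean_x, mean_y = sum_x / n, sum_y / n

# Slope: (sum of xy - n * mean_x * mean_y) / ((n - 1) * s_x^2),
# where (n - 1) * s_x^2 = sum of x^2 - n * mean_x^2.
b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)

# Intercept: the line passes through the point of means.
a = mean_y - b * mean_x

# Estimated daily money for a 9-year-old child.
y_hat_9 = a + b * 9
print(round(b, 3), round(a, 3), round(y_hat_9, 3))
```

Carrying full precision gives an estimate of about 11.643; the hand calculation shows 11.644 because it uses the rounded values 5.857 and 0.643.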
Ch 13 : Coefficient of Determination (r²) - Example
End of Chapter 13