Chapter 13

Here is the data collected: Age (years) Daily Spent Money ($) 8 5 10 7 12 10 14 15 16 20 18 25 To analyze this data: 1) Make a scatter plot with Age on the X-axis and Daily Spent Money on the Y-axis 2) Calculate the correlation coefficient r using the formula 3) Interpret the value of r The scatter plot shows a positive linear relationship between Age and Daily Spent Money. Using the formula, r is calculated to be 0.95 A value of r close to 1 indicates a strong positive correlation. Therefore, there is a strong positive correlation

Uploaded by

Hasan Hubail

Statistical Inference

Chapter 13
Correlation and Linear Regression

By: Dr. Abdul Sattar Al-Azzawi


Ch 13 : Learning Objectives

LO13-1 Explain the purpose of correlation analysis


LO13-2 Calculate a correlation coefficient to test and interpret the relationship between two variables

LO13-3 Apply regression analysis to estimate the linear relationship between two variables
Ch 13 : Overall Concept

Suppose Gulf Air wanted to explore the relationship between ticket price and distance.

If there is a correlation, what percentage of the variation in airfare is accounted for by distance?

How much does each additional mile add to the price?

In this chapter, we explore ways to study such relationships between two variables in order to provide information on ways to increase profits, methods to decrease costs, or variables to predict demand.
Ch 13 : Overall Concept

In marketing products, many firms use price reductions through coupons and discount pricing to increase sales.

Here, the two variables whose relationship we want to study are price reductions and sales.

In economics, many fundamental relationships involve two variables, such as price and demand.

Is the relationship strong or weak?

Is it positive or negative?
Ch 13 : Correlation Analysis Concept

Correlation Analysis is used to report the relationship between two variables.

CORRELATION ANALYSIS A group of techniques to measure the relationship between two variables.

In addition to graphing techniques, we’ll develop numerical measures to describe the relationships.

Examples
• Does the amount Healthtex spends per month on
training its sales force affect its monthly sales?
• Does the number of hours students study for an exam
influence the exam score?
Ch 13 : Scatter Diagram

• A scatter diagram is a graphic tool used to portray the relationship between two variables.

• The independent variable is scaled on the X-axis and is the variable used as the predictor.

• The dependent variable is scaled on the Y-axis and is the variable being estimated.
Ch 13 : Scatter Diagram – Example 1

North American Copier Sales sells copiers to businesses of all sizes throughout the United States and
Canada. The new national sales manager is preparing a report and would like to show the importance of
making an extra sales call each day. She takes a random sample of 15 sales representatives and gathers
information on the number of sales calls made last month and the number of copiers sold.

What observations can you make about the relationship between the number of sales calls and the number
of copiers sold? Develop a scatter diagram to display the information.

Graphing the data in a scatter diagram will make the relationship between sales calls and copier sales easier to see.
Ch 13 : Scatter Diagram – Example 1 - Answer

The scatter diagram of the data.

Sales representatives who make more calls tend to sell more copiers!
Ch 13 : Correlation Coefficient

We will measure the strength and direction of this relationship between two variables by determining the correlation coefficient.

CORRELATION COEFFICIENT A measure of the strength of the linear relationship between two variables.

Characteristics of the correlation coefficient are:

• The correlation coefficient is identified as r.
• It shows the direction and strength of the linear relationship between two variables.
• It ranges from −1.00 to +1.00; a value of exactly −1.00 or +1.00 indicates a perfect correlation.
• If it’s 0, there is no linear association.
• A value near +1.00 indicates a strong direct, or positive, correlation.
• A value near −1.00 indicates a strong negative correlation.
• Negative values indicate an inverse relationship and positive values indicate a direct relationship.
Ch 13 : Correlation Coefficient

The following graphs summarize the strength and direction of the correlation coefficient

If there is absolutely no relationship between the two variables, r is zero.

A correlation coefficient r close to 0 (say, .08) shows that the linear relationship is quite weak.

Coefficients of −.91 and +.91 have equal strength; both indicate a very strong correlation between the two variables. Thus, the strength of the correlation does not depend on the direction (either − or +).
Ch 13 : Correlation Coefficient

The following graphs summarize the strength and direction of the correlation coefficient

Scatter diagrams for r = 0, a weak r (say, −.23), and a strong r (say, +.87).

Note that, if the correlation is weak, there is considerable scatter about a line drawn through the center of
the data.

For the scatter diagram representing a strong relationship, there is very little scatter about the line. This
indicates, in the example shown on the chart, that hours studied is a good predictor of exam score.
Ch 13 : Correlation Coefficient

$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$

OR

$r = \frac{\sum xy - n\,\bar{x}\,\bar{y}}{(n - 1)\, s_x s_y}$

For our example, compute each representative's deviation from the mean number of sales calls and from the mean number of copiers sold, then multiply the two deviations. The sum of these products is 6,672, which is the numerator of the formula for r. We also need the standard deviations of x and y. The result, r = 0.865, indicates a strong, positive relationship.
Ch 13 : Correlation Coefficient

The result, r = 0.865, indicates a strong, positive relationship because 0.865 is close to 1.00.

$r = \frac{6672}{(15 - 1)(42.76)(12.89)} = 0.865$

We can use Excel to calculate the Correlation Coefficient r.


Ch 13 : Correlation Coefficient - Example

The Applewood Auto Group’s marketing department believes younger buyers purchase vehicles on which
lower profits are earned and older buyers purchase vehicles on which higher profits are earned. They
would like to use this information as part of an upcoming advertising campaign to try to attract older
buyers.

Develop a scatter diagram and then determine the correlation coefficient. Would this be a useful
advertising feature?
Ch 13 : Correlation Coefficient - Example

The data are available in the Blackboard content folder for calculating the required r and plotting the scatter diagram.

The scatter diagram suggests that a positive relationship does exist between age and profit, but it is not a strong relationship.

Next, calculate r, which is 0.262. The relationship is positive but weak.

Therefore, the data do not support a business decision to create an advertising campaign to attract older buyers!
Ch 13 : Correlation Coefficient - Example

Dan Ireland, the student body president at Toledo State University, is concerned about the cost to students
of textbooks. He believes there is a relationship between the number of pages in the text and the selling
price of the book.

To provide insight into the problem he selects a sample of eight textbooks currently on sale in the
bookstore.
• Draw a scatter diagram.
• Calculate the correlation coefficient r.

Book                         Pages   Price ($)
Introduction to History      500     84
Basic Algebra                700     75
Introduction to Psychology   800     99
Introduction to Sociology    600     72
Business Management          400     69
Introduction to Biology      600     81
Fundamentals of Jazz         600     63
Principles of Nursing        800     93
Ch 13 : Correlation Coefficient - Example

Answer:

To draw the scatter diagram, we assign one variable to the X-axis (number of pages = X) and the other to the Y-axis (price = Y). Each book is then plotted on graph paper as a point (X, Y); the resulting graph is the scatter diagram illustrated below:

We can see from the graph that there is a moderate positive association between the two variables.
Ch 13 : Correlation Coefficient - Example

For the determination of the coefficient of correlation r, we need the following table:

$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)\, S_x S_y} = \frac{7800}{7(138.87)(12.21)} = 0.657$

The correlation between the number of pages and the selling price of the book is 0.657.
This indicates a moderate relationship between the two variables.
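
The calculation above can be checked with a short script. This is only a minimal sketch using the eight textbooks listed in the example; the helper names `sample_std` and `correlation` are illustrative and do not come from the slides.

```python
from math import sqrt

# Data from the Toledo State University textbook example.
pages = [500, 700, 800, 600, 400, 600, 600, 800]
prices = [84, 75, 99, 72, 69, 81, 63, 93]


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(x, y):
    """r = sum((x - x_bar)(y - y_bar)) / ((n - 1) * s_x * s_y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of the products of the deviations from the two means.
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * sample_std(x) * sample_std(y))


r = correlation(pages, prices)
print(round(r, 3))  # 0.657
```

The intermediate values agree with the slide: the sum of the cross products is 7800, and the two sample standard deviations round to 138.87 and 12.21.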
Ch 13 : Correlation Coefficient - Example

Alternatively, we can use Excel to calculate the correlation coefficient.


Ch 13 : Correlation Coefficient - Example

A school headmaster wants to find the relationship between the age and the daily money spent ($) of a group of children in his school. A random sample of 6 children is selected, and the data are recorded in the table below.

• Compute the correlation coefficient between Age and Money spent for this group of students.
• Analyze the answer.

Age (X)   Money (Y)
6         9
10        13
8         12
7         11
7         10
10        11
Ch 13 : Correlation Coefficient - Example

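Age (X)   Money (Y)   XY    X²    Y²
6         9           54    36    81
10        13          130   100   169
8         12          96    64    144
7         11          77    49    121
7         10          70    49    100
10        11          110   100   121
Total 48  66          537   398   736

• $\bar{x} = \frac{48}{6} = 8$, $\bar{y} = \frac{66}{6} = 11$

• $s_x = \sqrt{\frac{398 - (6 \times 8^2)}{6 - 1}} = \sqrt{\frac{14}{5}} = \sqrt{2.8} = 1.67332$

• $s_y = \sqrt{\frac{736 - (6 \times 11^2)}{6 - 1}} = \sqrt{\frac{10}{5}} = \sqrt{2} = 1.414$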
Ch 13 : Correlation Coefficient - Example

By Excel:

$r = \frac{537 - (6 \times 8 \times 11)}{5 \times 1.673 \times 1.414} = \frac{9}{11.83} = 0.76$

Therefore, it is a strong positive relationship.
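
As a cross-check of this example, the computational form of the formula can be sketched in a few lines of Python. The variable names are illustrative only; the data are the six children from the table.

```python
from math import sqrt

# Age (X) and daily money spent (Y) for the six sampled children.
ages = [6, 10, 8, 7, 7, 10]
money = [9, 13, 12, 11, 10, 11]
n = len(ages)

# Column totals, matching the worksheet: 537, 398 and 736.
sum_xy = sum(x * y for x, y in zip(ages, money))
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in money)

x_bar = sum(ages) / n   # 8.0
y_bar = sum(money) / n  # 11.0

# Sample standard deviations from the computational formula.
s_x = sqrt((sum_x2 - n * x_bar ** 2) / (n - 1))
s_y = sqrt((sum_y2 - n * y_bar ** 2) / (n - 1))

# r = (sum(xy) - n * x_bar * y_bar) / ((n - 1) * s_x * s_y)
r = (sum_xy - n * x_bar * y_bar) / ((n - 1) * s_x * s_y)
print(round(r, 2))  # 0.76
```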


Regression Analysis
Ch 13 : Regression Analysis

Next we will develop an equation to express the relationship between variables.

This will allow us to estimate one variable on the basis of another.

This is called Regression Analysis.

We will also evaluate the ability of the equation to accurately make estimations.
Ch 13 : Regression Analysis

• Regression analysis is another way to evaluate a linear relationship between 2 variables.


• In regression analysis, we estimate one variable based on another variable.
• The variable being estimated/predicted is the dependent variable.
• The variable used to make the estimate/predict the value is the independent variable.
• The relationship between the variables is linear.
• In Regression Analysis we use the independent variable (X) to estimate the dependent variable (Y).

REGRESSION EQUATION An equation that expresses the linear relationship between two variables.
Ch 13 : Regression Analysis

• In regression analysis, our objective is to use the data to position a line that best represents the relationship between two variables.
• The first approach is to use a scatter diagram to visually position the line.
• A line is then drawn through the scattered points using a method that yields a single best-fitting regression line.
Ch 13 : Regression Analysis

• This is the equation of a line:

$\hat{y} = a + bx$

• ŷ is the estimated value of y for a selected value of x
• a is the constant or intercept
• b is the slope of the fitted line
• x is the value of the independent variable

• The formulas for a and b are:

$b = r \left( \frac{s_y}{s_x} \right)$ OR $b = \frac{\sum xy - n\,\bar{x}\,\bar{y}}{(n - 1)\, s_x^2}$

$a = \bar{y} - b\,\bar{x}$
Ch 13 : Regression Analysis

A positive value for b reflects a direct relationship between the two variables and a negative value indicates an inverse relationship.

$b = \frac{\sum xy - n\,\bar{x}\,\bar{y}}{(n - 1)\, s_x^2}$
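
The slope formula, together with the intercept a = ȳ − b·x̄, could be coded roughly as follows. This is a sketch, not the chapter's own implementation: the function name `least_squares` and the four data points are hypothetical, chosen to lie exactly on the line y = 1 + 2x so the result is easy to verify by eye.

```python
def least_squares(x, y):
    """Return (a, b) for the fitted line y_hat = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    # (n - 1) * s_x^2 equals the sum of squared deviations of x.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = (sum_xy - n * x_bar * y_bar) / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical points lying exactly on y = 1 + 2x.
a, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(round(a, 6), round(b, 6))  # 1.0 2.0
```

A positive b, as here, corresponds to the direct relationship described above; reversing the order of the y values would give a negative slope.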
Ch 13 : Regression Analysis - Example

Recall the example of North American Copier Sales. The sales manager gathered information on the number
of sales calls made and the number of copiers sold. Use the least squares method to determine a linear
equation to express the relationship between the two variables.

Answer:
• The first step is to find the slope of the least squares regression line, b (we already found r previously): b = 0.2608

• Next, find the intercept: a = 19.9632

• Then determine the regression line: ŷ = 19.9632 + 0.2608x

• So if a salesperson makes 100 calls (x), he or she can expect to sell 46.0432 copiers (ŷ)
Ch 13 : Drawing the Regression Line

The least squares equation can be drawn on the scatter diagram.

For example, the fifth sales representative is Jeff Hall. He made 164 calls. His estimated number of copiers sold is 62.7344. The point x = 164, ŷ = 62.7344 is located by moving to 164 on the x-axis and then going vertically to 62.7344. The other points on the regression line can be determined by substituting a particular value of x into the regression equation and calculating ŷ.
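
To illustrate how points on the line are obtained, here is a small sketch that substitutes values of x into the copier regression equation. The coefficients 19.9632 and 0.2608 are the ones quoted in the chapter; the function name `estimated_copiers` is made up for this example.

```python
# Coefficients quoted in the chapter for North American Copier Sales.
INTERCEPT = 19.9632
SLOPE = 0.2608


def estimated_copiers(calls):
    """Estimated copiers sold for a given number of sales calls."""
    return INTERCEPT + SLOPE * calls


# Two points are enough to draw the line on the scatter diagram.
for calls in (100, 164):
    print(calls, round(estimated_copiers(calls), 4))
# 100 46.0432
# 164 62.7344
```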
Ch 13 : The Standard Error of Estimate

The results of the regression analysis for North American Copier Sales show a significant relationship between
number of sales calls and the number of sales made.
By substituting the names of the variables into the equation, it can be written as:

Number of copiers sold = 19.9632 + 0.2608 (Number of sales calls)

If the number of sales calls is 84, then we can predict the number of copiers sold. It is 41.8704, found by 19.9632 + 0.2608(84).

However, the data show two sales representatives with 84 sales calls and 30 and 43 copiers sold.

So, is the regression equation a good predictor of “Number of copiers sold”?
Ch 13 : The Standard Error of Estimate

Then we need a measure that describes how inaccurate the estimate might be. This measure is called the
standard error of estimate.

It is the same concept as the standard deviation discussed in Chapter 3 where the standard deviation
measures the dispersion around the mean.

For example:

• Consider a large electronics firm with a stock option plan for employees. Suppose there is a relationship between the number of years employed and the number of shares owned. If we observe all employees with 20 years of service, they would most likely own different numbers of shares.

• A real estate developer studied the relationship between the income of buyers and the size of the home they purchased.
All buyers with an income of $70,000 will not purchase a home of exactly the same size.
Ch 13 : The Standard Error of Estimate

• The standard error of estimate measures the dispersion around the regression line:
STANDARD ERROR OF ESTIMATE A measure of the dispersion, or scatter, of the observed values around the line of regression for a given value of x.

• It is in the same units as the dependent variable.
• It is based on squared deviations from the regression line.
• Small values indicate that the points cluster closely about the regression line.

• It is computed using the following formula:

$s_{y \cdot x} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$


Ch 13 : The Standard Error of Estimate - Example

We calculate the standard error of estimate in this example.

We need the sum of the squared differences between each observed value of y and the predicted value
of y.

We can use a spreadsheet to help with the calculations.
Ch 13 : The Standard Error of Estimate - Example

The standard error of estimate is 6.720.

If the standard error of estimate is small, this indicates that the data are relatively close to the regression
line and the regression equation can be used.

If it is large, the data are widely scattered around the regression line and the regression equation will not
provide a precise estimate of y.
Ch 13 : The Standard Error of Estimate - Example

Using an Excel sheet to solve the same example:


Ch 13 : Coefficient of Determination (r²)

Another statistic provides a measure of a regression equation’s ability to predict.

It is called the Coefficient of Determination, or R-square:

Coefficient of Determination : The proportion of the total variation in the dependent variable Y that is explained, or accounted for, by the variation in the independent variable X.

• It ranges from 0 to 1.0.
• It is the square of the correlation coefficient.
• It is found from the following formula:

$r^2 = \left( \frac{\sum xy - n\,\bar{x}\,\bar{y}}{(n - 1)\, s_x s_y} \right)^2$
Ch 13 : Coefficient of Determination (r²) - Example

In the North American Copier Sales example, the correlation coefficient was 0.865; just square that: (0.865)² = 0.748. This is the Coefficient of Determination.

This means 74.8% of the variation in the number of copiers sold is explained by the variation in sales calls.
Ch 13 : Coefficient of Determination (r²) - Example

In Example 3, at Toledo State University, the relationship between the number of pages in the text and the selling price of the book was studied. The correlation coefficient was 0.657; just square that: (0.657)² = 0.432. This is the Coefficient of Determination.

This means about 43% of the variation in the price is explained by the number of pages.
Ch 13 : Coefficient of Determination (r²) - Example

Going back to our example of North American Copier Sales: how well can the regression equation predict the number of copiers sold from the number of sales calls made?

If it were possible to make perfect predictions, the coefficient of determination would be 100%. That would mean that the independent variable, number of sales calls, explains or accounts for all the variation in the number of copiers sold.

Our analysis shows that only 74.8% of the variation in copiers sold is explained by the number
of sales calls. Clearly, these data do not form a perfect line. Instead, the data are scattered
around the best-fitting, least squares regression line, and there will be error in the predictions.

In the next section, the standard error of estimate is used to provide more specific
information regarding the error associated with using the regression equation to make
predictions.
Ch 13 : Coefficient of Determination (r²) - Example

A school headmaster wants to find the relationship between the age and the daily money spent ($) of a group of children in his school. A random sample of 6 children is selected, and the data are recorded in the table below.

1. Find the regression equation.
2. Estimate the money spent by a child who is 9 years old.
3. Find the Standard Error of Estimate.
4. Find the Coefficient of Determination.
Ch 13 : Coefficient of Determination (r²) - Example

Age (X)   Money (Y)   XY    X²    Y²
6         9           54    36    81
10        13          130   100   169
8         12          96    64    144
7         11          77    49    121
7         10          70    49    100
10        11          110   100   121
Total 48  66          537   398   736

1. $b = \frac{537 - (6 \times 8 \times 11)}{5 \times (1.673)^2} = \frac{9}{14} = 0.643$

   $a = 11 - (0.643 \times 8) = 5.857$

   $\hat{y} = 5.857 + 0.643X$

2. For X = 9: $\hat{y} = 5.857 + (0.643 \times 9) = 11.644$
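
Parts 1 and 2 can be reproduced with the sketch below, which assumes only the six data pairs from the table; the variable names are illustrative. Because it carries b and a at full precision, the estimate for a 9-year-old prints as 11.643 rather than the 11.644 obtained on the slide from the rounded coefficients.

```python
# Age (X) and daily money spent (Y) for the six sampled children.
ages = [6, 10, 8, 7, 7, 10]
money = [9, 13, 12, 11, 10, 11]
n = len(ages)

x_bar = sum(ages) / n   # 8.0
y_bar = sum(money) / n  # 11.0
sum_xy = sum(x * y for x, y in zip(ages, money))  # 537
sum_x2 = sum(x * x for x in ages)                 # 398

# sum(x^2) - n * x_bar^2 is the same quantity as (n - 1) * s_x^2, here 14.
b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
a = y_bar - b * x_bar

estimate_age_9 = a + b * 9
print(round(b, 3), round(a, 3), round(estimate_age_9, 3))
# 0.643 5.857 11.643
```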
Ch 13 : Coefficient of Determination (r²) - Example

3. To find the standard error, use ŷ = 5.857 + 0.643X:

Age (x)   Money (y)   Estimated ŷ   y − ŷ    (y − ŷ)²
6         9           9.715         −0.715   0.511
10        13          12.287        0.713    0.508
8         12          11.001        0.999    0.998
7         11          10.358        0.642    0.412
7         10          10.358        −0.358   0.128
10        11          12.287        −1.287   1.656
Total                                        4.213

$s_{y \cdot x} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{4.213}{6 - 2}} = 1.026$

4. $r^2 = (0.76)^2 = 0.5776$
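
Parts 3 and 4 can be checked the same way. The sketch below refits the line from the six data pairs, sums the squared residuals, and applies the standard error formula. For r² it uses the standard identity r² = 1 − SSE/SST (explained share of the total variation) rather than squaring a rounded r, so it prints 0.579 instead of 0.5776; the names `sse` and `sst` are illustrative and do not appear in the slides.

```python
from math import sqrt

ages = [6, 10, 8, 7, 7, 10]
money = [9, 13, 12, 11, 10, 11]
n = len(ages)

x_bar = sum(ages) / n
y_bar = sum(money) / n

# Least squares slope and intercept at full precision.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, money))
sxx = sum((x - x_bar) ** 2 for x in ages)
b = sxy / sxx
a = y_bar - b * x_bar

# Residuals y - y_hat and their sum of squares.
residuals = [y - (a + b * x) for x, y in zip(ages, money)]
sse = sum(e ** 2 for e in residuals)

# Standard error of estimate: sqrt(SSE / (n - 2)).
std_error = sqrt(sse / (n - 2))

# Coefficient of determination as the explained share of total variation.
sst = sum((y - y_bar) ** 2 for y in money)
r_squared = 1 - sse / sst

print(round(sse, 3), round(std_error, 3), round(r_squared, 3))
# 4.214 1.026 0.579
```

The small differences from the slide values (4.213 and 0.5776) come only from rounding the coefficients and r before the later steps.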


Questions ?

End of Chapter 13
