
Correlation N Regression

Training handout

REGRESSION AND CORRELATION ANALYSIS
REGRESSION ANALYSIS

• Regression analysis is used to investigate and model the relationship between a response variable and one or more predictors.
• Very often, when 2 (or more) variables are observed, the relationship between them can be visualized.
• Regression analysis helps to formulate these relationships and to make predictions from them.
REGRESSION ANALYSIS

• Observe and note what is happening in a systematic way
• Form some kind of theory about the observed facts
• Draw a scatter diagram to visualize the relationship
• Express the relationship as a mathematical formula
• Make use of the mathematical formula to predict
REGRESSION ANALYSIS

• Linear regression fits a line to the data, producing an equation that describes the relationship in the data, so that we can predict one variable by measuring the other variable.
Variation of Regression

Linear Regression

Y = α + βX

Where:
Y = Response
X = Predictor
α = Intercept
β = Slope
Variation of Regression

Second order Regression

Y = α + β₁X + β₂X²

Where:
Y = Response
X = Predictor
α = Intercept
β₁, β₂ = Coefficients
Variation of Regression

Third order Regression

Y = α + β₁X + β₂X² + β₃X³

Where:
Y = Response
X = Predictor
α = Intercept
β₁, β₂, β₃ = Coefficients
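
As a rough illustration of how the coefficients of a first, second or third order model might be estimated, the sketch below solves the least-squares normal equations in plain Python. The function name `fit_polynomial` and the data are hypothetical and chosen only so the result is easy to check; a statistics package would normally do this step.

```python
def fit_polynomial(xs, ys, order):
    """Least-squares fit of Y = c0 + c1*X + ... + c_order*X**order.

    Returns the coefficients [c0, c1, ..., c_order].
    """
    n = order + 1
    # Normal equations: a[i][j] = sum(x**(i+j)), b[i] = sum(y * x**i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


# Hypothetical data generated from Y = 1 + 2X + 3X^2 (no noise).
xs = [0, 1, 2, 3, 4]
ys = [1, 6, 17, 34, 57]

alpha, beta1, beta2 = fit_polynomial(xs, ys, order=2)
print(f"alpha={alpha:.3f}, beta1={beta1:.3f}, beta2={beta2:.3f}")
```

Calling the same function with `order=1` or `order=3` gives the linear and third order variants of the model.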
Least Squares Method

• From a scatter diagram, there is virtually no limit to the number of lines that can be drawn to represent a linear relationship between the 2 variables
• The objective is to create a BEST FIT line for the data concerned
• The criterion is called the method of least squares
Least Squares Method
• The sum of squares of the vertical deviations from the points to the line must be a minimum (vertical because the dependent variable is plotted on the vertical axis)
• The linear relationship between the dependent variable (Y) and the independent variable (X) can be written as Y = α + βX, where α and β are parameters describing the vertical intercept and the slope of the regression line respectively
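
The least squares criterion can be sketched in a few lines of Python. The example below uses the standard closed-form estimates for α and β, then checks that nudging either parameter increases the sum of squared vertical deviations. The function names are hypothetical, and the data are illustrative only: the points (8, 62) and (12, 70) match the foot length and height example used later in this handout, while (10, 67) is made up.

```python
def least_squares_line(xs, ys):
    """Return (alpha, beta) for the best-fit line Y = alpha + beta * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def sum_squared_deviations(xs, ys, alpha, beta):
    """Sum of squared vertical deviations of the points from the line."""
    return sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))


foot_length = [8, 10, 12]   # X, illustrative values
height = [62, 67, 70]       # Y, illustrative values

alpha, beta = least_squares_line(foot_length, height)
best = sum_squared_deviations(foot_length, height, alpha, beta)

# Any other line gives a larger sum of squared deviations.
worse_slope = sum_squared_deviations(foot_length, height, alpha, beta + 0.1)
worse_intercept = sum_squared_deviations(foot_length, height, alpha + 0.5, beta)

print(f"alpha={alpha:.3f}, beta={beta:.3f}, sum of squares={best:.3f}")
print(f"predicted height for an 11 inch foot: {alpha + beta * 11:.1f}")
```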
Least Squares Method

Vertical Deviation


The coefficient of determination (r²)

In a regression analysis, one way to measure how well a straight line fits the data is to compute the square of the correlation coefficient, r². This statistic is interpreted as the proportion of the total variation in the response variable explained by the straight-line relationship with the explanatory variable.
The coefficient of determination (r²)

An r² value “close” to 1 is often taken as evidence that the predictions made using the model are going to be adequate.

0 ≤ r² ≤ 1
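
One way to compute r² for a straight-line fit is as 1 minus the ratio of unexplained variation to total variation. The sketch below assumes a least-squares line; the function name `r_squared` and the data values are illustrative, not taken from the handout.

```python
def r_squared(xs, ys):
    """Proportion of variation in ys explained by a least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    # Variation left unexplained by the line.
    ss_residual = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of the response around its mean.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


# Illustrative data: points close to, but not exactly on, a line.
r2_noisy = r_squared([8, 10, 12], [62, 67, 70])
# Points exactly on a line give r² = 1.
r2_perfect = r_squared([1, 2, 3], [2, 4, 6])

print(f"noisy data: r2={r2_noisy:.4f}")
print(f"perfect line: r2={r2_perfect:.4f}")
```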
Correlation – Definition

Correlation analysis calculates the Pearson product-moment coefficient of correlation (also called the correlation coefficient, or simply the correlation) for pairs of variables. The correlation coefficient is a measure of the degree of linear relationship between two variables.
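
For reference, the usual textbook form of the Pearson coefficient for n pairs (xᵢ, yᵢ) is given below. The symbols x̄ and ȳ (the sample means) are notation introduced here, not used elsewhere in this handout.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```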
Variation of Correlation

No pattern. Data points are scattered randomly in the chart.

Negative correlation. Larger values of one variable (input) are associated with smaller values of the other variable (effect).

Positive correlation. Larger values of one variable (cause) are associated with larger values of the other variable (effect).

Complex pattern. This often occurs when some other factor at work interacts with one of the factors.
Suppose we wished to graph the relationship between
foot length and height of 20 subjects.

In order to create the graph, which is called a scatterplot or scattergram, we need the foot length and height for each of our subjects.

[Figure: blank scatterplot axes. x-axis: Foot Length (4 to 14); y-axis: Height (58 to 74)]
Assume our first subject had a 12 inch foot and was 70 inches tall.
1. Find 12 inches on the x-axis.
2. Find 70 inches on the y-axis.
3. Locate the intersection of 12 and 70.
4. Place a dot at the intersection of 12 and 70.

[Figure: scatterplot of Foot Length versus Height with the first point plotted at (12, 70)]
Assume that our second subject had an 8 inch foot and was 62 inches tall.
5. Find 8 inches on the x-axis.
6. Find 62 inches on the y-axis.
7. Locate the intersection of 8 and 62.
8. Place a dot at the intersection of 8 and 62.
9. Continue to plot points for each pair of scores.
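
The plotting steps above can be mimicked with a small text-only sketch, which places a `*` at the intersection of each foot length and height. The helper `text_scatter` is hypothetical and only handles points that fall exactly on the listed axis ticks; a real plot would normally be drawn with a charting tool.

```python
def text_scatter(points, x_ticks, y_ticks):
    """Return a character-grid scatterplot as a list of text rows.

    Each (x, y) pair must fall exactly on the given tick values.
    """
    marked = set(points)
    rows = []
    for y in reversed(y_ticks):  # largest y value at the top
        cells = "".join(
            ("*" if (x, y) in marked else ".").rjust(3) for x in x_ticks
        )
        rows.append(f"{y:>3} |{cells}")
    # x-axis line and tick labels.
    rows.append("    +" + "-" * (3 * len(x_ticks)))
    rows.append("     " + "".join(f"{x:>3}" for x in x_ticks))
    return rows


# The two subjects described in the text: (foot length, height).
subjects = [(12, 70), (8, 62)]
x_ticks = list(range(4, 15, 2))    # 4, 6, ..., 14
y_ticks = list(range(58, 75, 2))   # 58, 60, ..., 74

rows = text_scatter(subjects, x_ticks, y_ticks)
print("\n".join(rows))
```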

[Figure: scatterplot of Foot Length versus Height with a point plotted for each subject]
Notice how the scores cluster to form a pattern.

The more closely they cluster to a line that is drawn through them, the stronger the linear relationship between the two variables is (in this case foot length and height).

[Figure: scatterplot of Foot Length versus Height with a line drawn through the points]
Pearson's correlation coefficient ( r )

Pearson's r measures the degree of linear relationship between two variables. The correlation coefficient assumes a value between -1 and +1. If one variable tends to increase as the other decreases, the correlation coefficient is negative. Conversely, if the two variables tend to increase together, the correlation coefficient is positive.
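
A minimal sketch of the calculation, assuming the usual product-moment formula, is shown below. The function name `pearson_r` and both data sets are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Variables that increase together: positive r.
r_positive = pearson_r([8, 10, 12], [62, 67, 70])
# One variable decreases as the other increases: negative r.
r_negative = pearson_r([1, 2, 3, 4], [9, 7, 6, 2])

print(f"r_positive={r_positive:.3f}")
print(f"r_negative={r_negative:.3f}")
```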
If the points on the scatterplot have an upward movement from left to right, we say the relationship between the variables is positive.

If the points on the scatterplot have a downward movement from left to right, we say the relationship between the variables is negative.
A positive relationship means that high scores on one variable are associated with high scores on the other variable.

It also indicates that low scores on one variable are associated with low scores on the other variable.

[Figure: scatterplot showing a positive relationship]
A negative relationship means that high scores on one variable are associated with low scores on the other variable.

It also indicates that low scores on one variable are associated with high scores on the other variable.

[Figure: scatterplot showing a negative relationship]
Not only do relationships have direction (positive and
negative), they also have strength (from 0.00 to 1.00 and
from 0.00 to –1.00).

The more closely the points cluster toward a straight line, the stronger the relationship is.

[Figure: sequence of scatterplots for r from 0.00 to 1.00 in steps of 0.10]
A set of scores with r = –0.60 has the same strength as a set of scores with r = 0.60 because both sets cluster similarly.
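
To illustrate that the sign carries direction while the magnitude carries strength, the sketch below mirrors one variable and recomputes r. The data are made up, and the value here happens to be 0.8 rather than 0.60; `pearson_r` is a hypothetical helper implementing the usual product-moment formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
mirrored_ys = [-y for y in ys]  # same scatter, flipped top to bottom

r_up = pearson_r(xs, ys)             # upward trend
r_down = pearson_r(xs, mirrored_ys)  # downward trend, same clustering

print(f"r_up={r_up:.2f}, r_down={r_down:.2f}")
print(f"same strength: {math.isclose(abs(r_up), abs(r_down))}")
```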
For this unit, we use Pearson’s r. This statistical
procedure can only be used when BOTH variables are
measured on a continuous scale and you wish to measure
a linear relationship.

[Figure: two panels. Linear Relationship: Pearson r applies. Curvilinear Relationship: NO Pearson r]
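
A small sketch of why Pearson's r is unsuitable for a curvilinear relationship: in the made-up data below, Y is completely determined by X (Y = X²), yet r comes out as zero because the relationship is not linear. `pearson_r` is again a hypothetical helper implementing the usual formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# A perfect U-shaped (curvilinear) relationship, symmetric about X = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curved = pearson_r(xs, ys)
print(f"r for Y = X^2: {r_curved:.2f}")
```

A scatterplot of the data would show the strong pattern that r fails to report, which is why the plot should be inspected before relying on Pearson's r.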
