Correlation and Regression

Here are the key steps to find the partial correlation coefficient between X and Y holding Z constant in a trivariate distribution: 1) Find the correlation coefficient rxy between X and Y. 2) Find the correlation coefficients rxz and ryz between (X,Z) and (Y,Z) respectively. 3) Calculate the partial correlation coefficient rxy.z between X and Y holding Z constant as: rxy.z = (rxy - rxz * ryz) / (√(1-rxz^2) * √(1-ryz^2)). So in this problem, we would: 1) Be given the values of rxy, rx

Uploaded by

Nandhika Ravuri

Module 3

Correlation and regression


Correlation
• Correlation tells us the degree to which two variables are similar or associated
with each other. It is a measure of association.
• When two variables are related in such a way that a change in the value of one is
accompanied by either a direct or an inverse change in the value of the other,
the two variables are said to be correlated.
• Correlation also covers the case where a greater change in one variable results in a
correspondingly greater or smaller change in the other variable.

Example: A relationship exists between the price and demand of a commodity
because, other things being equal, an increase in the price of a commodity
causes a decrease in the demand for that commodity. A relationship might also exist
between the heights and weights of students, and between the amount of rainfall in
a city and the sales of raincoats in that city.
Types of Correlation
Correlation can be categorized as one of the following:
(i) Positive and Negative,
(ii) Simple and Multiple.
(iii) Partial and Total.
(iv) Linear and Non-Linear (Curvilinear)

(i) Positive and Negative Correlation: Positive or direct correlation refers to the
movement of variables in the same direction. The correlation is said to be positive when
an increase (decrease) in the value of one variable is accompanied by an increase
(decrease) in the value of the other variable.
Negative or inverse correlation refers to the movement of the variables in
opposite directions. The correlation is said to be negative if an increase (decrease) in
the value of one variable is accompanied by a decrease (increase) in the value of
the other.
(ii) Simple and Multiple Correlation: Under simple correlation, we study the relationship
between two variables only, e.g., between the yield of wheat and the amount of rainfall, or
between demand and supply of a commodity. In the case of multiple correlation, the
relationship is studied among three or more variables.
For example, the relationship of the yield of wheat may be studied with both chemical
fertilizers and pesticides.

(iii) Partial and Total Correlation: There are two categories of multiple correlation
analysis. Under partial correlation, the relationship among three or more variables is studied in
such a way that only one dependent variable and one independent variable are considered
and all others are kept constant.
For example, the coefficient of correlation between the yield of wheat and chemical fertilizers,
excluding the effects of pesticides and manures, is called partial correlation. Total
correlation is based upon all the variables.
(iv) Linear and Non-Linear Correlation: When the amount of change in one variable
tends to keep a constant ratio to the amount of change in the other variable, then the
correlation is said to be linear. But if the amount of change in one variable does not
bear a constant ratio to the amount of change in the other variable then the
correlation is said to be non-linear. The distinction between linear and non-linear is
based upon the consistency of the ratio of change between the variables.

Methods of Studying Correlation:


There are different methods which help us to find out whether the variables are
related or not.
1. Scatter Diagram Method.
2. Karl Pearson’s Coefficient of correlation.
3. Rank Method.
Scatter diagram
• Rectangular coordinate
• Two quantitative variables
• One variable is called independent (X) and the second is
called dependent (Y)
• Points are not joined
• No frequency table
[Figure: schematic scatter diagram with X on the horizontal axis and Y on the vertical axis]
Example:
Wt (kg)      67   69   85   83   74   81   97   92  114   85
SBP (mmHg)  120  125  140  160  130  180  150  140  200  130

[Figure: Scatter diagram of weight and systolic blood pressure; horizontal axis Wt (kg) from 60 to 120, vertical axis SBP (mmHg) from 80 to 220]


Scatter plots

The pattern of data is indicative of the type of relationship between your two variables:
• positive relationship
• negative relationship
• no relationship
Positive relationship
[Figure: scatter plot of Height in CM against Age in Weeks]
Negative relationship
[Figure: scatter plot of Reliability against Age of Car]
No relation
Karl Pearson’s Co-efficient of Correlation
Karl Pearson’s method, popularly known as Pearsonian co-efficient of correlation, is
most widely applied in practice to measure correlation.
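It is denoted by $r_{XY}$ or $\rho_{XY}$.
Variance and Covariance (Recall)
As the variance $\sigma_X^2 = E[(X - E(X))^2]$ measures the variation of the RV $X$ from its mean
value $E(X)$, the quantity $E[(X - E(X))(Y - E(Y))]$ measures the simultaneous
variation of two RVs $X$ and $Y$ from their respective means, and hence it is called
the covariance of $X$ and $Y$, denoted $\operatorname{Cov}(X, Y)$.

$\operatorname{Cov}(X, Y)$ is also called the product moment of $X$ and $Y$.
Though $\operatorname{Cov}(X, Y)$ is a useful measure of the degree of correlation between $X$ and $Y$, it
has to be expressed in mixed units of $X$ and $Y$. To avoid this difficulty and to express the
degree of correlation in absolute units, we divide $\operatorname{Cov}(X, Y)$ by $\sigma_X \sigma_Y$, so that
$$r_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
is a mere number, free from the units of $X$ and $Y$.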

The sign of $r_{XY}$ (equivalently, of $\operatorname{Cov}(X, Y)$) denotes the nature of the association,
while the magnitude $|r_{XY}|$ denotes the strength of the association.

Depending on the value of $r_{XY}$, we can classify correlation as follows.
• If $r_{XY} = 1$, both the variables $X$ and $Y$ increase or decrease in the same proportion. In this
case we say that there is perfect positive correlation.
• If $r_{XY} = -1$, one variable decreases in exact proportion as the other increases. In this
case we say that there is perfect negative correlation.
• If $r_{XY} = 0$, we say that $X$ and $Y$ have no linear relationship.
• If $0 < r_{XY} < 1$, there is moderate (partial) positive correlation between $X$ and $Y$.
• If $-1 < r_{XY} < 0$, there is moderate (partial) negative correlation between $X$ and $Y$.
We will mainly deal with linear correlation of discrete RVs $X$ and $Y$. $X$ will take
the values $x_1, x_2, \ldots, x_n$ with frequency 1 each and $Y$ will simultaneously take the
values $y_1, y_2, \ldots, y_n$ with frequency 1 each.
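
As a minimal sketch of this setting, the code below computes the covariance divided by the product of the standard deviations for paired values that each occur once. The function name `pearson_r` is hypothetical; the data are the weight and systolic blood pressure values from the scatter diagram example earlier in this module.

```python
import math


def pearson_r(xs, ys):
    """Return Cov(X, Y) / (sigma_X * sigma_Y) for paired values with frequency 1 each."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least two pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all using the same divisor n.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / math.sqrt(var_x * var_y)


# Weight (kg) and systolic blood pressure (mmHg) from the scatter diagram example.
weight_kg = [67, 69, 85, 83, 74, 81, 97, 92, 114, 85]
sbp_mmhg = [120, 125, 140, 160, 130, 180, 150, 140, 200, 130]

r = pearson_r(weight_kg, sbp_mmhg)
print(f"r = {r:.3f}")
```

For these ten pairs the coefficient comes out at roughly 0.74, a fairly strong positive correlation, which agrees with the upward pattern in the scatter diagram.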
2. The joint pdf of $(X, Y)$ is given by $f(x, y) = x + y$, $0 \le x, y \le 1$. Find $r_{XY}$.
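
A possible worked solution, taking the joint pdf to be $f(x, y) = x + y$ on the unit square and using $\operatorname{Cov}(X, Y) = E(XY) - E(X)E(Y)$. This is a sketch of one standard route, not necessarily the steps shown on the original slide.

```latex
\begin{align*}
E(X) &= \int_0^1\!\!\int_0^1 x\,(x + y)\,dy\,dx = \frac{1}{3} + \frac{1}{4} = \frac{7}{12},
  \qquad E(Y) = \frac{7}{12} \text{ by symmetry}, \\
E(X^2) &= \int_0^1\!\!\int_0^1 x^2 (x + y)\,dy\,dx = \frac{1}{4} + \frac{1}{6} = \frac{5}{12}, \\
\sigma_X^2 &= \sigma_Y^2 = \frac{5}{12} - \left(\frac{7}{12}\right)^2 = \frac{11}{144}, \\
E(XY) &= \int_0^1\!\!\int_0^1 xy\,(x + y)\,dy\,dx = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}, \\
\operatorname{Cov}(X, Y) &= \frac{1}{3} - \left(\frac{7}{12}\right)^2 = -\frac{1}{144}, \\
r_{XY} &= \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{-1/144}{11/144} = -\frac{1}{11}.
\end{align*}
```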

3. The independent random variables X and Y have the pdf given by

Find the correlation coefficient.


Spearman's Rank Correlation Coefficient

Rank correlation coefficient is useful for finding correlation between any two
qualitative characteristics, for example beauty, honesty and intelligence, which
cannot be measured quantitatively but can be arranged serially in order of merit or
proficiency.

Suppose we associate ranks with individuals or items in two series based on order
of merit. The Spearman's rank correlation coefficient $r$ is given by
$$r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
where $\sum d^2$ = sum of squares of the differences of ranks between paired items in the two
series, and $n$ = number of paired items.
$r = 1 - 1.1515 = -0.1515$

We say that there is a low degree of negative rank correlation between the two judges.
From the table, we have,

We say that there is a high degree of positive rank correlation between the scores of the selection
and proficiency tests.
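
The rank formula for untied data, r = 1 - 6Σd² / (n(n² - 1)), can be sketched in a few lines. The function name and the two rank lists below are hypothetical illustrations, not the judges' or test data from the worked examples above.

```python
def spearman_rank_correlation(ranks_a, ranks_b):
    """Spearman's r = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) for untied ranks."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two equal-length rank lists with at least two items")
    # d is the difference between the two ranks given to the same item.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Hypothetical ranks given to five items by two judges.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]

r_rank = spearman_rank_correlation(judge_1, judge_2)
print(f"rank correlation = {r_rank:.4f}")
```

Here Σd² = 4 and n = 5, so the coefficient is 1 - 24/120 = 0.8, a high degree of positive rank correlation.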
Spearman's Rank Correlation Coefficient for Data with Tied Observations

In any series, if two or more observations have the same value, then the observations
are said to be tied observations.
When two or more values are equal, it is customary to give each of them the average
of the ranks they would otherwise have received. In this case the formula for computing the rank
correlation coefficient takes the form
$$r = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}\left(S_1^3 - S_1\right) + \frac{1}{12}\left(S_2^3 - S_2\right) + \cdots\right]}{n(n^2 - 1)}$$
Here,
$S_1$ is the number of times the first tied observation is repeated,
$S_2$ is the number of times the second tied observation is repeated,
$S_3$ is the number of times the third tied observation is repeated, etc.
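
A minimal sketch of the tied-rank procedure follows, assuming the commonly used correction in which each group of S equal values adds (S³ - S)/12 to Σd². The helper names and the sample data are hypothetical; rank 1 is given to the largest value.

```python
from collections import Counter


def average_ranks(values):
    """Rank values in descending order; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # average of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def tie_correction(values):
    """Sum of (S^3 - S) / 12 over every group of S tied values."""
    return sum((s ** 3 - s) / 12 for s in Counter(values).values() if s > 1)


def spearman_with_ties(xs, ys):
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least two pairs")
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))


# Hypothetical scores; the value 20 occurs twice in the first series.
scores_x = [10, 20, 20, 30]
scores_y = [1, 2, 3, 4]
r_tied = spearman_with_ties(scores_x, scores_y)
print(average_ranks(scores_x), f"r = {r_tied:.2f}")
```

In this example the two tied values share rank 2.5, Σd² = 0.5, the correction term is (2³ - 2)/12 = 0.5, and the coefficient is 1 - 6(1.0)/60 = 0.9.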
Partial and Multiple Correlation

Let us say that we find a correlation between bank balance and cholesterol level. That is, as the bank
balance increases, cholesterol level also increases.

But this is not a genuine relationship, as cholesterol level can also increase as age
increases. Also, as age increases, the bank balance may increase because a person
can save from his salary over the years.

Thus the age factor influences both cholesterol level and bank balance.
Suppose we want to know only the correlation between cholesterol and bank balance
without the influence of age. We could take persons from the same age group and thus
control age, but if this is not possible we can statistically control the age factor and
thus remove its influence on both cholesterol and bank balance. This procedure is called
partial correlation.
If there are three variables $X_1$, $X_2$ and $X_3$, there will be three coefficients of partial
correlation, each studying the relationship between two variables when the third
is held constant. If we denote by $r_{12.3}$ the coefficient of partial correlation
between $X_1$ and $X_2$ keeping $X_3$ constant, it is calculated as
$$r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$
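
The partial correlation formula, (r12 - r13 r23) / (√(1 - r13²) √(1 - r23²)), can be sketched as a small function that is reused for all three coefficients by permuting its arguments. The function name and every numeric value below are hypothetical; they are not the values of the problems in this section, which did not survive in the text.

```python
import math


def partial_correlation(r_ab, r_ac, r_bc):
    """Correlation between variables a and b with variable c held constant."""
    denom = math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))
    if denom == 0:
        raise ValueError("undefined when a or b is perfectly correlated with c")
    return (r_ab - r_ac * r_bc) / denom


# Hypothetical zero-order coefficients for three variables.
r12, r13, r23 = 0.6, 0.5, 0.4

r12_3 = partial_correlation(r12, r13, r23)  # X1 and X2, X3 held constant
r13_2 = partial_correlation(r13, r12, r23)  # X1 and X3, X2 held constant
r23_1 = partial_correlation(r23, r12, r13)  # X2 and X3, X1 held constant
print(f"{r12_3:.4f} {r13_2:.4f} {r23_1:.4f}")

# A second hypothetical set that cannot come from real data:
# the resulting partial coefficient has magnitude greater than 1.
suspect = partial_correlation(0.6, -0.5, 0.8)
print("consistent" if abs(suspect) <= 1 else "inconsistent", f"{suspect:.4f}")
```

A coefficient outside the interval from -1 to 1 signals that the three given correlations are mutually inconsistent, which is the kind of check asked for in a "is it possible to get the following" question.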
Problem: In a trivariate distribution , it is found that and
. Find the partial correlation coefficients.
Answer:
2. Is it possible to get the following from a set of experimental data?
and

Since the value of the computed coefficient is greater than one, there is some inconsistency
in the given data.
Multiple Correlation

Sometimes in psychology we have certain factors which are influenced by a large
number of variables.
For instance, academic achievement will be affected by intelligence, work habits, extra
coaching, socio-economic status, etc.
The correlation between academic achievement and the various other factors
mentioned above can be found by multiple correlation.
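The coefficients of multiple correlation with three variables $X_1$, $X_2$ and $X_3$ are
$R_{1.23}$, $R_{2.13}$ and $R_{3.12}$. $R_{1.23}$ is the coefficient of multiple correlation related to $X_1$ as a
dependent variable and $X_2$, $X_3$ as two independent variables, and it can be
expressed in terms of $r_{12}$, $r_{13}$ and $r_{23}$ as
$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{23}^2}}$$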
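
As a rough sketch, the multiple correlation coefficient of the first variable on the other two can be computed from the three zero-order coefficients using the standard expression R² = (r12² + r13² - 2 r12 r13 r23) / (1 - r23²). The function name and the input values are hypothetical.

```python
import math


def multiple_correlation(r12, r13, r23):
    """Multiple correlation of variable 1 on variables 2 and 3 (R_1.23)."""
    denom = 1 - r23 ** 2
    if denom <= 0:
        raise ValueError("undefined when variables 2 and 3 are perfectly correlated")
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / denom
    if not 0 <= r_squared <= 1:
        raise ValueError("the given correlations are mutually inconsistent")
    return math.sqrt(r_squared)


# Hypothetical zero-order correlation coefficients.
R1_23 = multiple_correlation(0.6, 0.5, 0.4)
print(f"R_1.23 = {R1_23:.4f}")
```

With these inputs R² = 0.37 / 0.84, so R is roughly 0.66. The multiple correlation coefficient is never smaller in magnitude than either of the zero-order coefficients r12 and r13, which gives a quick sanity check on the result.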
Example:
1. The following zero-order correlation coefficients are given:
and Calculate multiple correlation coefficient
treating first variable as dependent and second and third variables as independent.
Now we get the total correlation coefficient and
Regression
Definition: Regression is the measure of the average relationship between two or
more variables in terms of the original units of data.

Regression Equation: The functional relationship of a dependent variable with
one or more independent variables is called a regression equation.
It is also called a prediction equation or estimating equation.

Note: The independent variable in regression analysis is called the "predictor" or
"regressor" and the dependent variable is called the "regressed" variable.
Types of Regression:
• If there are only two variables under consideration, then the regression is called
simple regression.
• If there are more than two variables under consideration, then the regression is
called multiple regression.
• If there are more than two variables under consideration, and only the relation
between two variables is established, after excluding the effect of the remaining
variables, then the regression is called partial regression.
• If the relationship between x and y is non-linear, then the regression is called
curvilinear regression.
There are certain guidelines for regression lines:
1) Use regression lines when there is a significant correlation to predict values.
2) Do not use if there is not a significant correlation.
3) Stay within the range of the data. For example, if the data is from 10 to 60, do not predict
a value for 400.

Regression Equations (Linear Fit)


• Linear regression equation of y on x
• Linear regression equation of x on y
Equation of the Regression Line of Y on X

The regression line of Y on X is the best-fitting straight line for the observed pairs of
values (x1, y1), (x2, y2), …, (xn, yn), based on the assumption that x is the
independent variable and y is the dependent variable.

Let the equation of the regression line of Y on X be assumed as

y = ax + b. (1)
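
The derivation that follows equation (1) is not preserved here, so the sketch below assumes the usual least-squares route: choose a and b to satisfy the normal equations Σy = aΣx + nb and Σxy = aΣx² + bΣx. The function name is hypothetical, and the data are the weight and systolic blood pressure values from the scatter diagram example.

```python
def fit_line_y_on_x(xs, ys):
    """Solve the two normal equations for y = a*x + b and return (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least two pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    det = n * sum_xx - sum_x ** 2
    if det == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    a = (n * sum_xy - sum_x * sum_y) / det  # slope
    b = (sum_y - a * sum_x) / n             # intercept
    return a, b


weight_kg = [67, 69, 85, 83, 74, 81, 97, 92, 114, 85]
sbp_mmhg = [120, 125, 140, 160, 130, 180, 150, 140, 200, 130]

a, b = fit_line_y_on_x(weight_kg, sbp_mmhg)
print(f"SBP = {a:.3f} * Wt + {b:.2f}")

# Predict only inside the observed weight range (67 to 114 kg).
print(f"predicted SBP at 90 kg: {a * 90 + b:.1f}")
```

The fitted line always passes through the point of means, here (84.7, 147.5), which is a convenient check on the arithmetic.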
Multiple linear Regressions
If the number of independent variables in a regression model is more than one, then
the model is called a multiple regression model. In fact, many real-world applications
demand the use of multiple regression models.

Regression Model with Two independent variables using Normal equations:
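
The model and its normal equations are not preserved in this section. As a sketch under that caveat, assume the model y = a + b1 x1 + b2 x2, whose least-squares normal equations are Σy = na + b1Σx1 + b2Σx2, Σx1y = aΣx1 + b1Σx1² + b2Σx1x2 and Σx2y = aΣx2 + b1Σx1x2 + b2Σx2². All names and data below are hypothetical; the data were generated from y = 1 + 2 x1 + 3 x2 so the fit can be checked by eye.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_two_predictors(x1, x2, y):
    """Return (a, b1, b2) for y = a + b1*x1 + b2*x2 from the normal equations."""
    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    columns = [[1.0] * len(y), x1, x2]  # constant term, x1, x2
    lhs = [[dot(ci, cj) for cj in columns] for ci in columns]
    rhs = [dot(ci, y) for ci in columns]
    return tuple(solve_linear_system(lhs, rhs))


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]

a0, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"y = {a0:.3f} + {b1:.3f} x1 + {b2:.3f} x2")
```

Because the sample data lie exactly on a plane, the solver recovers the generating coefficients up to rounding; with real data the same three equations give the best-fitting plane in the least-squares sense.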
