Correlation and Regression
(i) Positive and Negative Correlation: Positive, or direct, correlation refers to the
movement of variables in the same direction. The correlation is said to be positive when
an increase (decrease) in the value of one variable is accompanied by an increase
(decrease) in the value of the other variable.
Negative, or inverse, correlation refers to the movement of the variables in
opposite directions. The correlation is said to be negative if an increase (decrease) in
the value of one variable is accompanied by a decrease (increase) in the value of
the other.
(ii) Simple and Multiple Correlation: Under simple correlation, we study the relationship
between two variables only, for example between the yield of wheat and the amount of
rainfall, or between the demand for and supply of a commodity. In multiple correlation,
the relationship is studied among three or more variables.
For example, the yield of wheat may be studied in relation to both chemical
fertilizers and pesticides.
(iii) Partial and Total Correlation: There are two categories of multiple correlation
analysis, partial and total. Under partial correlation, the relationship among three or
more variables is studied in such a way that only one dependent variable and one
independent variable are considered, and all others are held constant.
For example, the coefficient of correlation between yield of wheat and chemical fertilizers,
excluding the effects of pesticides and manures, is called partial correlation. Total
correlation is based upon all the variables.
(iv) Linear and Non-Linear Correlation: When the amount of change in one variable
tends to keep a constant ratio to the amount of change in the other variable, then the
correlation is said to be linear. But if the amount of change in one variable does not
bear a constant ratio to the amount of change in the other variable then the
correlation is said to be non-linear. The distinction between linear and non-linear is
based upon the consistency of the ratio of change between the variables.
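The distinctions above can be illustrated with a small Python sketch. It assumes idealised, noise-free data (real observations only "tend" to keep a constant ratio), and the helper names `change_ratios` and `is_linear` and all data values are hypothetical, introduced only for illustration.

```python
def change_ratios(xs, ys):
    """Ratio of the change in y to the change in x between consecutive observations."""
    pairs = list(zip(xs, ys))
    return [(y1 - y0) / (x1 - x0) for (x0, y0), (x1, y1) in zip(pairs, pairs[1:])]


def is_linear(xs, ys, tol=1e-9):
    """True when every consecutive change ratio is the same (within tol)."""
    ratios = change_ratios(xs, ys)
    return max(ratios) - min(ratios) <= tol


# Hypothetical data, chosen only to show the three cases.
xs = [1, 2, 3, 4, 5]
ys_direct = [3, 5, 7, 9, 11]     # y rises as x rises: positive, linear
ys_inverse = [20, 17, 14, 11, 8]  # y falls as x rises: negative, linear
ys_curved = [1, 4, 9, 16, 25]     # ratio of change keeps growing: non-linear

for label, ys in [("direct", ys_direct), ("inverse", ys_inverse), ("curved", ys_curved)]:
    print(label, change_ratios(xs, ys), "linear" if is_linear(xs, ys) else "non-linear")
```

The sign of the ratios shows the direction of the relationship, and their constancy shows whether it is linear.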
[Figure: scatter plots illustrating types of correlation. Panels: two plots with weight (kg) on one axis; height (cm) against age (weeks); reliability against age of car (negative relationship); no relation.]
Karl Pearson’s Co-efficient of Correlation
Karl Pearson’s method, popularly known as Pearsonian co-efficient of correlation, is
most widely applied in practice to measure correlation.
It is denoted by $r_{XY}$ or $\rho_{XY}$.
Variance and Covariance (Recall)
As the variance $E\{[X - E(X)]^2\}$ measures the variation of the RV $X$ from its mean
value $E(X)$, the quantity $E\{[X - E(X)][Y - E(Y)]\}$ measures the simultaneous
variation of two RVs $X$ and $Y$ from their respective means, and hence it is called
the covariance of $X$ and $Y$, denoted as $\mathrm{Cov}(X, Y)$.
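As a rough illustration, the sketch below computes the covariance as the mean of the products of deviations from the two means, and then the Pearson coefficient using the standard definition $r = \mathrm{Cov}(X, Y) / (\sigma_X \sigma_Y)$. The function names and the height and weight values are hypothetical, and the data are treated as a whole population (divisor $n$), which is an assumption.

```python
import math


def covariance(xs, ys):
    """Cov(X, Y): mean of (x - mean_x) * (y - mean_y), using divisor n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def variance(xs):
    """Var(X) is the covariance of X with itself."""
    return covariance(xs, xs)


def pearson_r(xs, ys):
    """Pearson coefficient: covariance divided by the product of standard deviations."""
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))


# Hypothetical observations (not taken from the notes).
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 75, 86]

cov_hw = covariance(heights, weights)
r_hw = pearson_r(heights, weights)
print(cov_hw, round(r_hw, 4))
```

Because the same divisor appears in the numerator and denominator, the value of $r$ does not depend on whether $n$ or $n - 1$ is used.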
The rank correlation coefficient is useful for finding the correlation between two
qualitative characteristics, for example beauty, honesty and intelligence. Such
characteristics cannot be measured quantitatively, but the individuals possessing them
can be arranged serially in order of merit or proficiency.
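Suppose we assign ranks to individuals or items in two series based on order
of merit. Spearman's rank correlation coefficient $r$ is then given by
$$r = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
where $d_i$ is the difference between the two ranks of the $i$-th individual and $n$ is
the number of individuals.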
We say that there is a high degree of positive rank correlation between the scores of the
selection and proficiency tests.
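A minimal Python sketch of the untied case follows, using the standard formula $r = 1 - 6 \sum d_i^2 / (n(n^2 - 1))$, where $d_i$ is the difference between the two ranks of individual $i$. The ranks below are hypothetical and are not the data of the example referred to above.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's rank correlation for two rankings with no tied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks of eight candidates in two tests.
selection_rank = [1, 2, 3, 4, 5, 6, 7, 8]
proficiency_rank = [2, 1, 3, 5, 4, 6, 8, 7]

r_rank = spearman_from_ranks(selection_rank, proficiency_rank)
print(round(r_rank, 4))
```

A value close to +1, as here, indicates that the two rankings largely agree.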
SPEARMAN'S RANK CORRELATION COEFFICIENT FOR DATA WITH TIED OBSERVATIONS
In any series, if two or more observations have the same value, they are said to be
tied observations.
When two or more values are equal, it is customary to give each of them the average
of the ranks they would otherwise have received. In this case the formula for computing
the rank correlation coefficient takes the form
$$r = 1 - \frac{6\left[\sum d_i^2 + \frac{1}{12}(S_1^3 - S_1) + \frac{1}{12}(S_2^3 - S_2) + \cdots\right]}{n(n^2 - 1)}$$
Here,
$S_1$ is the number of times the first tied observation is repeated
$S_2$ is the number of times the second tied observation is repeated
$S_3$ is the number of times the third tied observation is repeated, etc.
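The tied case can be sketched in Python as below. The sketch assumes the usual tie-corrected formula, in which $(S_k^3 - S_k)/12$ is added to $\sum d_i^2$ for every group of tied values in either series, and it ranks the largest value first. The function names and the scores are illustrative assumptions.

```python
from collections import Counter


def average_ranks(values):
    """Rank values with the largest as rank 1; tied values share the average rank."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of (S^3 - S) / 12 over every value that occurs S > 1 times."""
    return sum((s ** 3 - s) / 12 for s in Counter(values).values() if s > 1)


def spearman_with_ties(xs, ys):
    """Tie-corrected Spearman coefficient computed from raw scores."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


# Illustrative scores; 75 and 64 repeat in xs, 68 repeats in ys.
xs = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
ys = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]

r_tied = spearman_with_ties(xs, ys)
print(round(r_tied, 4))
```

For these scores the two values of 75 share rank 2.5 and the three values of 64 share rank 6, so the correction terms are 0.5 and 2 for the first series and 0.5 for the second.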
Partial and Multiple Correlation
Suppose we find a correlation between two factors, bank balance and cholesterol
level: as the bank balance increases, the cholesterol level also increases.
This does not reflect a genuine relationship, because cholesterol level can also increase
with age, and bank balance may increase with age too, since a person can save from
his salary over the years.
Thus age is a factor that influences both cholesterol level and bank balance.
Suppose we want to know the correlation between cholesterol and bank balance
without the influence of age. We could take persons from the same age group and thus
control age, but if this is not possible we can control the age factor statistically and
thus remove its influence on both cholesterol and bank balance. Doing so is called
partial correlation.
If there are three variables $X_1$, $X_2$ and $X_3$, there will be three coefficients of
partial correlation, each studying the relationship between two variables when the third
is held constant. If we denote by $r_{12.3}$ the coefficient of partial correlation
between $X_1$ and $X_2$ keeping $X_3$ constant, it is calculated as
$$r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$
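The first-order partial correlation, $r_{12.3} = (r_{12} - r_{13} r_{23}) / \sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}$, can be sketched in Python as follows. The function name `partial_corr` and the three pairwise coefficients are hypothetical, chosen only to show how the three partial coefficients follow by permuting the arguments.

```python
import math


def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation between a and b with c held constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Hypothetical pairwise coefficients (assumed values for illustration).
r12, r13, r23 = 0.8, 0.5, 0.6

r12_3 = partial_corr(r12, r13, r23)  # X1 and X2, X3 held constant
r13_2 = partial_corr(r13, r12, r23)  # X1 and X3, X2 held constant
r23_1 = partial_corr(r23, r12, r13)  # X2 and X3, X1 held constant

print(round(r12_3, 4), round(r13_2, 4), round(r23_1, 4))
```

The formula is undefined when either coefficient involving the controlled variable equals +1 or -1, since the denominator is then zero.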
Problem: In a trivariate distribution , it is found that and
. Find the partial correlation coefficients.
Answer:
2. Is it possible to get the following from a set of experimental data?
and
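A question of this kind can be checked numerically. Three pairwise coefficients are mutually consistent only if the 3×3 correlation matrix they define has a non-negative determinant, which is equivalent to each partial correlation lying between -1 and +1. The sketch below applies that test. The function name `is_consistent` and both sets of values are hypothetical and are not the ones from the question.

```python
def is_consistent(r12, r13, r23, tol=1e-12):
    """Check whether three pairwise correlations can arise from one data set."""
    if any(abs(r) > 1 for r in (r12, r13, r23)):
        return False
    # Determinant of the 3x3 correlation matrix with unit diagonal.
    det = 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    return det >= -tol


# Hypothetical coefficient sets (assumed values for illustration).
feasible = is_consistent(0.8, 0.5, 0.6)
infeasible = is_consistent(0.6, -0.5, 0.8)
print(feasible, infeasible)
```

In the second set, two strong positive correlations leave no room for the negative third one, so the determinant is negative and the set cannot come from real data.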
The regression line of Y on X is the best-fitting straight line for the observed pairs of
values (x1, y1), (x2, y2), …, (xn, yn), based on the assumption that x is the
independent variable and y is the dependent variable.
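As a sketch, and assuming "best-fitting" means the least-squares fit (the usual convention), the line of Y on X has slope $\sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and passes through $(\bar{x}, \bar{y})$. The function name and data values below are hypothetical.

```python
def regression_y_on_x(xs, ys):
    """Least-squares line y = intercept + slope * x, with x as the independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x  # forces the line through (mean_x, mean_y)
    return slope, intercept


# Hypothetical observed pairs (x_i, y_i).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = regression_y_on_x(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} x")
```

Swapping the roles of the two variables gives the regression line of X on Y, which is in general a different line.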