Business Project 12 Content

Uploaded by

Puraskar Luitel

CORRELATION

Introduction:
In today’s business world we come across many activities that depend on each
other. Business problems often involve two or more variables, and identifying these
variables and how they depend on one another helps in resolving such problems. In
many situations two variables move in the same direction, both increasing or both
decreasing. At other times an increase in one variable is accompanied by a decline in
another. Examples include family income and expenditure, the price of a product and
its demand, and advertisement expenditure and sales volume. If two quantities vary in
such a way that movements in one are accompanied by movements in the other, the
quantities are said to be correlated.
Meaning:
Correlation is a statistical technique to ascertain the association or relationship
between two or more variables. Correlation analysis studies the degree and direction
of that relationship.
A correlation coefficient is a statistical measure of the degree to which changes
in the value of one variable predict changes in the value of another. When the
fluctuation of one variable reliably predicts a similar fluctuation in another variable,
there is often a tendency to think that the change in one causes the change in the
other. Correlation alone, however, does not establish causation.
Uses of correlations:
1. Correlation analysis helps in deriving precisely the degree and the direction of
the relationship between variables.
2. Correlation reduces the range of uncertainty of our predictions. A prediction
based on correlation analysis will be more reliable and nearer to reality.
3. Correlation analysis contributes to the understanding of economic behaviour,
aids in locating the critically important variables on which others depend, may
reveal to the economist the connections by which disturbances spread and
suggest the paths through which stabilizing forces may become effective.
4. Economic theory and business studies show relationships between variables
such as price and quantity demanded, advertising expenditure and sales promotion
measures, etc.
5. The coefficient of correlation is a relative measure of change.

Types of Correlation:
Correlation is described or classified in several different ways. Three of the
most important are:
I. Positive, Negative and Zero
II. Simple, Partial and Multiple
III. Linear and non-linear
I. Positive, Negative and Zero Correlation:
Whether correlation is positive (direct) or negative (inverse) depends
upon the direction of change of the variables.
Positive Correlation: If both the variables vary in the same direction, correlation is
said to be positive. That is, if one variable increases and the other, on average, also
increases, or if one variable decreases and the other, on average, also decreases,
the correlation is positive. For example, the correlation between the heights and
weights of a group of persons is a positive correlation.
Height (cm) : X 158 160 163 166 168 171 174 176
Weight (kg) : Y 60 62 64 65 67 69 71 72
Negative Correlation: If the variables vary in opposite directions, the correlation
is said to be negative. That is, if one variable increases while the other decreases,
or one variable decreases while the other increases, the correlation is negative.
For example, the correlation between the price of a product and its demand is a
negative correlation.
Price of Product (Rs. Per Unit) : X 6 5 4 3 2 1
Demand (In Units) : Y 75 120 175 250 215 400
Zero Correlation: Strictly speaking this is not a type of correlation, but it is still
called zero or no correlation. When we do not find any relationship between the
variables, the correlation is said to be zero. It means a change in the value of one
variable is not accompanied by any systematic change in the value of the other.
For example, the correlation between the weight of a person and intelligence is a
zero or no correlation.
II. Simple, Partial and Multiple Correlation:
The distinction between simple, partial and multiple correlation is based upon
the number of variables studied.
Simple Correlation: When only two variables are studied, it is a case of simple
correlation. For example, when one studies relationship between the marks secured
by student and the attendance of student in class, it is a problem of simple correlation.
Partial Correlation: In partial correlation one studies three or more variables
but considers the relationship between only two of them, with the effect of the other
influencing variables held constant. For example, in the study of student marks and
attendance above, other influencing variables such as the effectiveness of teaching
and the use of teaching aids like computers and smart boards are assumed to be
constant.

Multiple Correlation: When three or more variables are studied together, it is a case
of multiple correlation. For example, if the study above covers the relationship
between student marks, attendance of students, effectiveness of the teacher, use of
teaching aids etc., it is a case of multiple correlation.
III. Linear and Non-linear Correlation:
Depending upon the constancy of the ratio of change between the variables, the
correlation may be Linear or Non-linear Correlation.
Linear Correlation: If the amount of change in one variable bears a constant ratio to
the amount of change in the other variable, then correlation is said to be linear. If such
variables are plotted on graph paper, all the plotted points fall on a straight
line. For example, if 10 units of raw material are needed to produce 2 units of
finished product, then 20 units are needed to produce 4 units, and so on, as in the
following data.
Raw material : X 10 20 30 40 50 60
Finished Product : Y 2 4 6 8 10 12
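The constant-ratio condition can be checked numerically. The following is a minimal Python sketch using the raw material and finished product figures above; the function name `has_constant_ratio` and the tolerance are illustrative assumptions, not part of the original notes.

```python
def has_constant_ratio(xs, ys, tol=1e-9):
    """Return True if (change in y) / (change in x) is the same for every consecutive pair."""
    points = list(zip(xs, ys))
    ratios = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]
    return all(abs(r - ratios[0]) <= tol for r in ratios)

raw_material = [10, 20, 30, 40, 50, 60]
finished_product = [2, 4, 6, 8, 10, 12]

# Every step of 10 units in X adds 2 units in Y, so the relationship is linear.
linear = has_constant_ratio(raw_material, finished_product)
```

A series such as Y = X squared fails the same check, because the ratio of changes grows from one pair of points to the next.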
Non-linear Correlation: If the amount of change in one variable does not bear a
constant ratio to the amount of change to the other variable, then correlation is said to
be non-linear. If such variables are plotted on a graph, the points would fall on a curve
and not on a straight line. For example, if we double the amount of advertisement
expenditure, then sales volume would not necessarily be doubled.
Advertisement Expenses : X 10 20 30 40 50 60
Sales Volume : Y 2 4 6 8 10 12

Illustration 01:
State in each case whether there is
(a) Positive Correlation
(b) Negative Correlation
(c) No Correlation
Sl No Particulars Solution
1 Price of commodity and its demand Negative
2 Yield of crop and amount of rainfall Positive
3 No of fruits eaten and hunger of a person Negative
4 No of units produced and fixed cost per unit Negative
5 No of girls in the class and marks of boys No Correlation
6 Ages of husbands and wives Positive
7 Temperature and sale of woollen garments Negative
8 Number of cows and milk produced Positive
9 Weight of person and intelligence No Correlation
10 Advertisement expenditure and sales volume Positive

Methods of measurement of correlation:
Quantification of the relationship between variables is essential to benefit from
the study of correlation. Various methods of measuring correlation are available,
and they can be classified as follows:

Methods of Measurement of Correlation

Graphic Method:
1. Scatter Diagram
2. Graph Method

Algebraic Method:
1. Karl Pearson’s Coefficient of Correlation
2. Spearman’s Rank Coefficient of Correlation
3. Concurrent Deviation Method
4. Method of Least Squares

Among these methods we will discuss only the following methods:

1. Scatter Diagram
2. Karl Pearson’s Coefficient of Correlation
3. Spearman’s Rank Coefficient of Correlation

Scatter Diagram:
This is a graphic method of measuring correlation. It is a diagrammatic
representation of bivariate data to ascertain the relationship between two variables.
Under this method the given data are plotted on graph paper as dots, i.e. for each
pair of X and Y values we put a dot and thus obtain as many points as the number of
observations. Usually the independent variable is shown on the X-axis and the
dependent variable on the Y-axis. Once the values are plotted, the graph reveals the
type of correlation between variables X and Y, that is, whether the movements in one
series are associated with those in the other series.
• Perfect Positive Correlation: In this case, the points lie on a straight line
rising from the lower left hand corner to the upper right hand corner.
• Perfect Negative Correlation: In this case, the points lie on a straight line
falling from the upper left hand corner to the lower right hand corner.
• High Degree of Positive Correlation: In this case, the plotted points fall in a
narrow band and show a rising tendency from the lower left hand corner to the
upper right hand corner.
• High Degree of Negative Correlation: In this case, the plotted points fall in a
narrow band and show a declining tendency from the upper left hand corner to the
lower right hand corner.
• Low Degree of Positive Correlation: In this case, the points are widely scattered
over the diagram but rise from the lower left hand corner to the upper right hand
corner.
• Low Degree of Negative Correlation: In this case, the points are widely scattered
over the diagram but decline from the upper left hand corner to the lower right
hand corner.
• Zero (No) Correlation: When the plotted points are scattered over the graph
haphazardly, it indicates that there is no correlation, or zero correlation,
between the two variables.

[Diagrams I to VII: scatter diagrams]

Illustration 02:
Given the following pairs of values:
Capital Employed (Rs. In Crore) 1 2 3 4 5 7 8 9 11 12
Profit (Rs. In Lakhs) 3 5 4 7 9 8 10 11 12 14
(a) Draw a scatter diagram
(b) Do you think that there is any correlation between profits and capital
employed? Is it positive or negative? Is it high or low?
Solution:
From the scatter diagram we can say that the variables are positively
correlated. The points trend upward from the lower left hand corner to the upper
right hand corner, hence the correlation is positive. The plotted points lie in a
narrow band, which indicates a high degree of positive correlation.

[Scatter diagram: Capital Employed (Rs. in Crore) on the X-axis, Profit (Rs. in Lakhs) on the Y-axis]

Karl Pearson’s Coefficient of Correlation:

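Karl Pearson’s method of calculating the coefficient of correlation is based on the
covariance of the two variables in a series. This method is widely used in practice and
the coefficient of correlation is denoted by the symbol “r”. If the two variables under
study are X and Y, the following formulas suggested by Karl Pearson can be used for
measuring the degree of relationship.

$$r = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X) \cdot \operatorname{Var}(Y)}}$$

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}, \qquad x = X - \bar{X},\ y = Y - \bar{Y}$$

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$$

These formulas are equivalent; which one is used depends upon the information
given in the problem.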

Illustration 03:
From following information find the correlation coefficient between advertisement
expenses and sales volume using Karl Pearson’s coefficient of correlation method.
Firm 1 2 3 4 5 6 7 8 9 10
Advertisement Exp. (Rs. In Lakhs) 11 13 14 16 16 15 15 14 13 13
Sales Volume (Rs. In Lakhs) 50 50 55 60 65 65 65 60 60 50

Solution:
Let us assume that advertisement expenses are variable X and sales volume are
variable Y.
Calculation of Karl Pearson’s coefficient of correlation
Firm   X    Y    x = X − X̄   x²   y = Y − Ȳ   y²   xy
1      11   50   -3          9    -8          64   24
2      13   50   -1          1    -8          64   8
3      14   55   0           0    -3          9    0
4      16   60   2           4    2           4    4
5      16   65   2           4    7           49   14
6      15   65   1           1    7           49   7
7      15   65   1           1    7           49   7
8      14   60   0           0    2           4    0
9      13   60   -1          1    2           4    -2
10     13   50   -1          1    -8          64   8
Total  ∑X = 140   ∑Y = 580   ∑x² = 22   ∑y² = 360   ∑xy = 70

$$\bar{X} = \frac{\sum X}{n} = \frac{140}{10} = 14, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{580}{10} = 58$$

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{70}{\sqrt{22 \times 360}} = \frac{70}{88.9944} = 0.7866$$

Interpretation: The calculation shows a high degree of positive correlation,
r = 0.7866, between the two variables, i.e. higher advertisement expenses are
associated with higher sales volume.
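The deviation-from-mean calculation in Illustration 03 can be reproduced with a short Python sketch. The function name `pearson_r` is an illustrative choice; the data are the advertisement and sales figures given in the problem.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the arithmetic means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]          # x = X - mean of X
    dev_y = [y - mean_y for y in ys]          # y = Y - mean of Y
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / sqrt(sum_x2 * sum_y2)

advertisement = [11, 13, 14, 16, 16, 15, 15, 14, 13, 13]
sales = [50, 50, 55, 60, 65, 65, 65, 60, 60, 50]

r = pearson_r(advertisement, sales)
print(round(r, 4))  # 0.7866
```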

Illustration 04:
Find the correlation coefficient between age and playing habits of the following
students using Karl Pearson’s coefficient of correlation method.
Age 15 16 17 18 19 20
Number of students 250 200 150 120 100 80
Regular Players 200 150 90 48 30 12

Solution:
To find the correlation between age and playing habits of the students, we need to
compute the percentages of students who are having the playing habit.

Percentage of playing habits = No. of Regular Players / Total No. of Students * 100

Now, let us assume that ages of the students are variable X and percentages of playing
habits are variable Y.

Calculation of Karl Pearson’s coefficient of correlation

Age (X)   No. of Students   Regular Players   Percentage of Playing Habits (Y)   X − X̄   (X − X̄)²   Y − Ȳ   (Y − Ȳ)²   (X − X̄)(Y − Ȳ)
15        250               200               80                                 -2.5    6.25       30      900        -75
16        200               150               75                                 -1.5    2.25       25      625        -37.5
17        150               90                60                                 -0.5    0.25       10      100        -5
18        120               48                40                                 0.5     0.25       -10     100        -5
19        100               30                30                                 1.5     2.25       -20     400        -30
20        80                12                15                                 2.5     6.25       -35     1225       -87.5
Total     ∑X = 105   ∑Y = 300   ∑(X − X̄)² = 17.5   ∑(Y − Ȳ)² = 3350   ∑(X − X̄)(Y − Ȳ) = -240

$$\bar{X} = \frac{\sum X}{n} = \frac{105}{6} = 17.5, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{300}{6} = 50$$

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{-240}{\sqrt{17.5 \times 3350}} = \frac{-240}{242.126} = -0.9912$$

Interpretation: The calculation shows a high degree of negative correlation,
r = -0.9912, between age and playing habits, i.e. the proportion of regular players
among students decreases as their age increases.
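As a check on Illustration 04, the sketch below first converts the number of regular players into percentages and then applies the deviation formula. Variable names are illustrative; the figures are those given in the problem.

```python
from math import sqrt

ages = [15, 16, 17, 18, 19, 20]
students = [250, 200, 150, 120, 100, 80]
regular_players = [200, 150, 90, 48, 30, 12]

# Variable Y: percentage of regular players in each age group.
playing_pct = [100 * p / s for p, s in zip(regular_players, students)]

mean_x = sum(ages) / len(ages)
mean_y = sum(playing_pct) / len(playing_pct)
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, playing_pct))
sum_x2 = sum((x - mean_x) ** 2 for x in ages)
sum_y2 = sum((y - mean_y) ** 2 for y in playing_pct)

r = sum_xy / sqrt(sum_x2 * sum_y2)
print(round(r, 4))  # -0.9912
```

Using percentages rather than raw counts matters here because the age groups differ in size.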

Illustration 05:
Find Karl Pearson’s coefficient of correlation between capital employed and profit
obtained from the following data.
Capital Employed (Rs. In Crore) 10 20 30 40 50 60 70 80 90 100
Profit (Rs. In Crore) 2 4 8 5 10 15 14 20 22 50

Solution:
Let us assume that capital employed is variable X and profit is variable Y.

Calculation of Karl Pearson’s coefficient of correlation
X      Y     X²      Y²     XY
10     2     100     4      20
20     4     400     16     80
30     8     900     64     240
40     5     1600    25     200
50     10    2500    100    500
60     15    3600    225    900
70     14    4900    196    980
80     20    6400    400    1600
90     22    8100    484    1980
100    50    10000   2500   5000
Total  ∑X = 550   ∑Y = 150   ∑X² = 38500   ∑Y² = 4014   ∑XY = 11500

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$$

$$r = \frac{(10 \times 11500) - (550 \times 150)}{\sqrt{[(10 \times 38500) - 550^2][(10 \times 4014) - 150^2]}}$$

$$r = \frac{115000 - 82500}{\sqrt{[385000 - 302500][40140 - 22500]}} = \frac{32500}{\sqrt{82500 \times 17640}} = \frac{32500}{\sqrt{1455300000}}$$

$$r = \frac{32500}{38148.3945} = 0.8519$$
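The direct (raw sums) formula used in Illustration 05 can be sketched in Python as follows. The function name `pearson_r_raw` is an illustrative assumption; the data are the capital and profit figures from the problem.

```python
from math import sqrt

def pearson_r_raw(xs, ys):
    """Pearson's r from raw sums, without computing deviations from the means."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

capital = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
profit = [2, 4, 8, 5, 10, 15, 14, 20, 22, 50]

r = pearson_r_raw(capital, profit)
print(round(r, 4))  # 0.8519
```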

Illustration 06:
A computer while calculating the correlation coefficient between the variable X and Y
obtained the following results:
N = 30; ∑X = 120 ∑X2 = 600 ∑Y = 90 ∑Y2 = 250 ∑XY = 335
It was, however, later discovered at the time of checking that it had copied down two
pairs of observations as: (X, Y) : (8, 10) (12, 7)
While the correct values were: (X, Y) : (8, 12) (10, 8)
Obtain the correct value of the correlation coefficient between X and Y.

Solution:
Correct ∑X = 120 – 8 – 12 + 8 + 10 = 118
Correct ∑X² = 600 – 8² – 12² + 8² + 10²
= 600 – 64 – 144 + 64 + 100 = 556
Correct ∑Y = 90 – 10 – 7 + 12 + 8 = 93
Correct ∑Y² = 250 – 10² – 7² + 12² + 8²
= 250 – 100 – 49 + 144 + 64 = 309
Correct ∑XY = 335 – (8*10) – (12*7) + (8*12) + (10*8)
= 335 – 80 – 84 + 96 + 80 = 347

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$$

$$r = \frac{(30 \times 347) - (118 \times 93)}{\sqrt{[(30 \times 556) - 118^2][(30 \times 309) - 93^2]}}$$

$$r = \frac{10410 - 10974}{\sqrt{[16680 - 13924][9270 - 8649]}} = \frac{-564}{\sqrt{2756 \times 621}} = \frac{-564}{\sqrt{1711476}}$$

$$r = \frac{-564}{1308.2339} = -0.4311$$

Therefore, the correct value of the correlation coefficient between X and Y is
-0.4311, a moderate negative correlation.
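The correction procedure of Illustration 06 amounts to subtracting the contribution of the wrongly copied pairs from each sum and adding the contribution of the correct pairs. A minimal Python sketch, with illustrative variable names:

```python
from math import sqrt

n = 30
sum_x, sum_x2, sum_y, sum_y2, sum_xy = 120, 600, 90, 250, 335

wrong_pairs = [(8, 10), (12, 7)]
correct_pairs = [(8, 12), (10, 8)]

# Remove the wrong pairs (sign = -1), then add the correct pairs (sign = +1).
for sign, pairs in ((-1, wrong_pairs), (+1, correct_pairs)):
    for x, y in pairs:
        sum_x += sign * x
        sum_x2 += sign * x * x
        sum_y += sign * y
        sum_y2 += sign * y * y
        sum_xy += sign * x * y

r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 4))  # -0.4311
```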

Illustration 07:
The coefficient of correlation between X and Y is 0.3. Their covariance is 9. The
variance of X is 16. Find the standard deviation of the Y series.

Solution:
Given information:
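r = 0.3, Cov (X, Y) = 9, Var (X) = 16

$$r = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X) \cdot \operatorname{Var}(Y)}} \Rightarrow 0.3 = \frac{9}{\sqrt{16 \cdot \operatorname{Var}(Y)}} \Rightarrow 0.3 = \frac{9}{4\sqrt{\operatorname{Var}(Y)}}$$

$$0.3 \times 4 = \frac{9}{SD(Y)} \Rightarrow 1.2 = \frac{9}{SD(Y)} \Rightarrow SD(Y) = \frac{9}{1.2} = 7.5$$

Therefore the standard deviation of the Y series = σ(Y) = 7.5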
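The rearrangement in Illustration 07 can be verified with a few lines of Python. The variable names are illustrative; the check at the end simply substitutes the result back into the covariance form of r.

```python
from math import sqrt

r = 0.3
cov_xy = 9
var_x = 16

# From r = Cov(X, Y) / (SD(X) * SD(Y)), solve for SD(Y).
sd_x = sqrt(var_x)
sd_y = cov_xy / (r * sd_x)

# Substituting back should recover the given correlation coefficient.
r_check = cov_xy / (sd_x * sd_y)
```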

Illustration 08:
Calculate correlation coefficient from the following two-way table, with X representing
the average salary of families selected at random in a given area and Y representing
the average expenditure on entertainment.
Expenditure on Average Salary (Rs. ‘000)
Entertainment (Rs. ‘000) 100-150 150-200 200-250 250-300 300-350
0 – 10 5 4 5 2 4
10 – 20 2 7 3 7 1
20 – 30 - 6 - 4 5
30 – 40 8 - 4 - 8
40 – 50 - 7 3 5 10

Solution:
Let us assume that Average Salary is variable X and Expenditure on
Entertainment is variable Y.
In the case of grouped data, we follow the assumed mean method to
calculate Karl Pearson’s Coefficient of Correlation. The following steps are used to
compute the correlation.
1. Identify the mid-point of each class interval for variables X and Y.
2. Choose an assumed mean from the mid-points identified above for both X and Y.
3. Compute the deviations of the mid-points from the assumed mean and, to simplify
further, divide them by a common factor (the class width) to obtain dx and dy.
4. Add the values in the cells row-wise and column-wise to compute the frequencies (f).
The sum of either the row totals or the column totals gives the value of N.
5. Multiply dx, dy and the corresponding frequency (f) in each cell. Write the
figure thus obtained in the corner of each cell; it represents the value of
fdxdy.

Calculation of Karl Pearson’s coefficient of correlation
Each cell shows the frequency f, with the product fdxdy in brackets.

Y class   Mid-point   X: 100–150   150–200   200–250   250–300   300–350   f     dy    fdy    fdy²   fdxdy
          of X:       125          175       225       275       325
0–10      5           5 (20)       4 (8)     5 (0)     2 (-4)    4 (-16)   20    -2    -40    80     8
10–20     15          2 (4)        7 (7)     3 (0)     7 (-7)    1 (-2)    20    -1    -20    20     2
20–30     25          -            6 (0)     -         4 (0)     5 (0)     15    0     0      0      0
30–40     35          8 (-16)      -         4 (0)     -         8 (16)    20    1     20     20     0
40–50     45          -            7 (-14)   3 (0)     5 (10)    10 (40)   25    2     50     100    36
f                     15           24        15        18        28        N = 100   ∑fdy = 10   ∑fdy² = 220   ∑fdxdy = 46
dx                    -2           -1        0         1         2
fdx                   -30          -24       0         18        56        ∑fdx = 20
fdx²                  60           24        0         18        112       ∑fdx² = 214
fdxdy                 8            1         0         -1        38        ∑fdxdy = 46

dx = (Mid Point of Series X – Assumed Mean of Series X) / Class Width = (MP(X) – 225) / 50

dy = (Mid Point of Series Y – Assumed Mean of Series Y) / Class Width = (MP(Y) – 25) / 10

$$r = \frac{n\sum f d_x d_y - \sum f d_x \sum f d_y}{\sqrt{[n\sum f d_x^2 - (\sum f d_x)^2][n\sum f d_y^2 - (\sum f d_y)^2]}} = \frac{(100 \times 46) - (20 \times 10)}{\sqrt{[(100 \times 214) - 20^2][(100 \times 220) - 10^2]}}$$

$$r = \frac{4600 - 200}{\sqrt{[21400 - 400][22000 - 100]}} = \frac{4400}{\sqrt{21000 \times 21900}} = \frac{4400}{21445.2792} = 0.2052$$

Interpretation: The calculation shows a low degree of positive correlation,
r = 0.2052, between salary and entertainment expenditure. It means average
salary has only a slight association with expenditure on entertainment.
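The grouped-data calculation of Illustration 08 can be sketched in Python as below. The step deviations divide by the class widths (50 for salary, 10 for expenditure), which is an assumption read off the dx and dy rows of the worked table; empty cells are entered as 0. Variable names are illustrative.

```python
from math import sqrt

x_mid = [125, 175, 225, 275, 325]   # salary class mid-points
y_mid = [5, 15, 25, 35, 45]         # expenditure class mid-points
freq = [                            # rows follow y_mid, columns follow x_mid
    [5, 4, 5, 2, 4],
    [2, 7, 3, 7, 1],
    [0, 6, 0, 4, 5],
    [8, 0, 4, 0, 8],
    [0, 7, 3, 5, 10],
]

# Step deviations from the assumed means 225 and 25.
dx = [(x - 225) // 50 for x in x_mid]
dy = [(y - 25) // 10 for y in y_mid]

cells = [(freq[i][j], dx[j], dy[i]) for i in range(5) for j in range(5)]
n = sum(f for f, _, _ in cells)
sum_fdx = sum(f * u for f, u, _ in cells)
sum_fdy = sum(f * v for f, _, v in cells)
sum_fdx2 = sum(f * u * u for f, u, _ in cells)
sum_fdy2 = sum(f * v * v for f, _, v in cells)
sum_fdxdy = sum(f * u * v for f, u, v in cells)

r = (n * sum_fdxdy - sum_fdx * sum_fdy) / sqrt(
    (n * sum_fdx2 - sum_fdx ** 2) * (n * sum_fdy2 - sum_fdy ** 2)
)
print(round(r, 4))  # 0.2052
```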

Spearman’s Rank Coefficient of Correlation:
When quantification of variables is difficult, as with beauty, leadership
ability, knowledge of a person etc., the method of rank correlation is useful. It
was developed by the British psychologist Charles Edward Spearman in 1904. In this
method ranks are allotted to each element in either ascending or descending order.
The correlation coefficient between the two allotted series of ranks is popularly
called “Spearman’s Rank Correlation” and is denoted by “R”.

To find the correlation under this method, the following formula is used.

$$R = 1 - \frac{6\sum D^2}{N^3 - N}$$

where, D = Difference of the ranks between paired items in the two series.
N = Number of pairs of ranks

In case of tie in ranks or equal ranks:

In some cases it becomes necessary to assign the same rank to two or more
elements, individuals or entries. In such a situation, it is customary to
give each individual or entry an average rank. For example, if two individuals are
ranked equal at 5th place, both are allotted the common rank (5+6)/2 =
5.5, and if three are ranked equal at 5th place, they are given the rank (5+6+7)/3 = 6.
That is, where two or more individuals are to be ranked equal, the rank assigned for
the purpose of calculating the coefficient of correlation is the average of the ranks
these individuals, items or entries would have received had they differed slightly
from each other.
Where equal ranks are assigned to some entries, an adjustment factor is to be
added to the value of ∑D² in the above formula for calculating the rank coefficient
of correlation. This adjustment factor is added once for every repeated rank.

$$\text{Adjustment factor} = \frac{1}{12}(m^3 - m)$$

where, m = number of items whose ranks are common.
For example, if a particular rank is repeated two times then m = 2, if it is repeated
three times then m = 3, and so on.
Hence the above formula can be re-written as follows:

$$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \dots\right]}{N^3 - N}$$

Illustration 09:
Find out spearman’s coefficient of correlation between the two kinds of assessment of
graduate students’ performance in a college.
Name of students A B C D E F G H I
Internal Exam 51 68 73 46 50 65 47 38 60
External Exam 49 72 74 44 58 66 50 30 35

Solution:
Calculation of Spearman’s Rank Coefficient of Correlation
Name   Internal Exam   Ranks (R1)   External Exam   Ranks (R2)   D = R1 – R2   D²
A      51              5            49              6            -1            1
B      68              2            72              2            0             0
C      73              1            74              1            0             0
D      46              8            44              7            1             1
E      50              6            58              4            2             4
F      65              3            66              3            0             0
G      47              7            50              5            2             4
H      38              9            30              9            0             0
I      60              4            35              8            -4            16
∑D² = 26

$$R = 1 - \frac{6\sum D^2}{N^3 - N} = 1 - \frac{6 \times 26}{9^3 - 9} = 1 - \frac{156}{729 - 9} = 1 - \frac{156}{720} = 1 - 0.2167 = 0.7833$$

Interpretation: The calculation shows a high degree of positive correlation,
R = 0.7833, between the internal exam and external exam performance of the
students.
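Spearman’s formula for untied data, as used in Illustration 09, can be sketched in Python as follows. The helper names `ranks_desc` and `spearman_r` are illustrative, and the ranking helper assumes there are no tied values; the marks are those listed in the problem statement.

```python
def ranks_desc(values):
    """Rank 1 goes to the largest value; assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_r(xs, ys):
    """Spearman's rank coefficient R = 1 - 6*sum(D^2) / (N^3 - N), no ties."""
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_desc(xs), ranks_desc(ys)))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

internal = [51, 68, 73, 46, 50, 65, 47, 38, 60]
external = [49, 72, 74, 44, 58, 66, 50, 30, 35]

R = spearman_r(internal, external)
print(round(R, 4))  # 0.7833
```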

Illustration 10:
The coefficient of rank correlation of the marks obtained by 10 students in statistics
and accountancy was found to be 0.8. It was later discovered that the difference in
ranks in the two subjects obtained by one of the students was wrongly taken as 7
instead of 9. Find the correct coefficient of rank correlation.

Solution:
$$R = 1 - \frac{6\sum D^2}{N^3 - N} \Rightarrow 0.8 = 1 - \frac{6\sum D^2}{10^3 - 10} \Rightarrow 0.8 = 1 - \frac{6\sum D^2}{990} \Rightarrow \frac{6\sum D^2}{990} = 1 - 0.8 = 0.2$$

$$6\sum D^2 = 0.2 \times 990 = 198 \Rightarrow \sum D^2 = \frac{198}{6} = 33$$

But this is not the correct ∑D², therefore we need to compute the correct value:
Correct ∑D² = 33 – 7² + 9² = 33 – 49 + 81 = 65
Hence, the correct value of the rank coefficient of correlation is:

$$R = 1 - \frac{6\sum D^2}{N^3 - N} = 1 - \frac{6 \times 65}{990} = 1 - \frac{390}{990} = 1 - 0.394 = 0.606$$
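The back-calculation in Illustration 10 can be sketched in Python. The variable names are illustrative; the logic recovers the reported ∑D² from the reported coefficient, swaps the wrong squared difference for the correct one, and recomputes R.

```python
n = 10
reported_r = 0.8
denominator = n ** 3 - n                      # 990

# Invert R = 1 - 6*sum(D^2)/(N^3 - N) to recover the reported sum of D^2.
reported_sum_d2 = (1 - reported_r) * denominator / 6

# Replace the wrong rank difference (7) with the correct one (9).
corrected_sum_d2 = reported_sum_d2 - 7 ** 2 + 9 ** 2
corrected_r = 1 - 6 * corrected_sum_d2 / denominator

print(round(corrected_r, 3))  # 0.606
```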

Illustration 11:
Ten competitors in a beauty contest are ranked by three judges in the following order:
1st Judge 1 6 5 10 3 2 4 9 7 8
2nd Judge 3 5 8 4 7 10 2 1 6 9
3rd Judge 6 4 9 8 1 2 3 10 5 7

Use the rank correlation coefficient to determine which pair of judges has the nearest
approach to common tastes in beauty.

Solution:
In order to find out which pair of judges has the nearest approach to common tastes in
beauty, we compare rank correlation between the judgements of
1. 1st Judge and 2nd Judge
2. 2nd Judge and 3rd Judge
3. 1st Judge and 3rd Judge
Calculation of Spearman’s Rank Coefficient of Correlation
Rank by 1st Judge (R1)   Rank by 2nd Judge (R2)   Rank by 3rd Judge (R3)   D² = (R1–R2)²   D² = (R2–R3)²   D² = (R1–R3)²
1                        3                        6                        4               9               25
6                        5                        4                        1               1               4
5                        8                        9                        9               1               16
10                       4                        8                        36              16              4
3                        7                        1                        16              36              4
2                        10                       2                        64              64              0
4                        2                        3                        4               1               1
9                        1                        10                       64              81              1
7                        6                        5                        1               1               4
8                        9                        7                        1               4               1
N = 10                   N = 10                   N = 10                   ∑D² = 200       ∑D² = 214       ∑D² = 60

1. 1st Judge and 2nd Judge:
$$R = 1 - \frac{6\sum D^2}{N^3 - N} = 1 - \frac{6 \times 200}{10^3 - 10} = 1 - \frac{1200}{990} = 1 - 1.2121 = -0.2121$$

2. 2nd Judge and 3rd Judge:
$$R = 1 - \frac{6\sum D^2}{N^3 - N} = 1 - \frac{6 \times 214}{10^3 - 10} = 1 - \frac{1284}{990} = 1 - 1.297 = -0.297$$

3. 1st Judge and 3rd Judge:
$$R = 1 - \frac{6\sum D^2}{N^3 - N} = 1 - \frac{6 \times 60}{10^3 - 10} = 1 - \frac{360}{990} = 1 - 0.3636 = 0.6364$$

Interpretation: The coefficient of correlation is positive, and highest, only for the
judgements of the first and third judges. Therefore, it can be concluded that the
first and third judges have the nearest approach to common tastes in beauty.
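The pairwise comparison in Illustration 11 can be automated with a short Python sketch. The dictionary keys and the function name `rank_correlation` are illustrative; the ranks are those awarded by the three judges.

```python
from itertools import combinations

judges = {
    "1st": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "2nd": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "3rd": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

def rank_correlation(r1, r2):
    """Spearman's R for two series of untied ranks."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# R for every pair of judges; the pair with the largest R agrees most closely.
results = {
    (a, b): rank_correlation(judges[a], judges[b])
    for a, b in combinations(judges, 2)
}
closest_pair = max(results, key=results.get)
```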

Illustration 12:
From the following data, compute the rank correlation.
X 82 68 75 61 68 73 85 68
Y 81 71 71 68 62 69 80 70

Solution:
In the problem we find there are repetitions of ranks. Value of X = 68 repeated 3 times
and Value of Y = 71 repeated 2 times. Therefore we need to compute adjustment factor
to be added to the value of ∑D2.

Calculation of Spearman’s Rank Coefficient of Correlation
X     Y     R1    R2     D = R1 – R2   D²
82    81    2     1      1             1
68    71    6     3.5    2.5           6.25
75    71    3     3.5    -0.5          0.25
61    68    8     7      1             1
68    62    6     8      -2            4
73    69    4     6      -2            4
85    80    1     2      -1            1
68    70    6     5      1             1
∑D² = 18.5

$$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2)\right]}{N^3 - N}$$

When the value of X is repeated three times, m = 3:
$$\text{Adjustment factor (1)} = \frac{1}{12}(3^3 - 3) = \frac{1}{12}(27 - 3) = \frac{24}{12} = 2$$

When the value of Y is repeated two times, m = 2:
$$\text{Adjustment factor (2)} = \frac{1}{12}(2^3 - 2) = \frac{1}{12}(8 - 2) = \frac{6}{12} = 0.5$$

$$R = 1 - \frac{6 \times [18.5 + 2 + 0.5]}{8^3 - 8} = 1 - \frac{6 \times 21}{512 - 8} = 1 - \frac{126}{504} = 1 - 0.25 = 0.75$$

Spearman’s Rank Coefficient of Correlation = 0.75, which indicates a high
degree of positive correlation.
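The tie-adjusted calculation in Illustration 12 can be sketched in Python as follows. The helper names `average_ranks` and `spearman_with_ties` are illustrative; tied values receive the average of the ranks they occupy, and each group of m tied values contributes (m³ − m)/12 to the adjustment.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest value; tied values share the average of their ranks."""
    order = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = order.index(v) + 1        # first rank position occupied by v
        count = order.count(v)            # number of items tied at this value
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]

def spearman_with_ties(xs, ys):
    n = len(xs)
    sum_d2 = sum(
        (a - b) ** 2 for a, b in zip(average_ranks(xs), average_ranks(ys))
    )
    # (m^3 - m)/12 for every group of m equal values; untied values give 0.
    adjustment = sum(
        (m ** 3 - m) / 12
        for series in (xs, ys)
        for m in Counter(series).values()
    )
    return 1 - 6 * (sum_d2 + adjustment) / (n ** 3 - n)

X = [82, 68, 75, 61, 68, 73, 85, 68]
Y = [81, 71, 71, 68, 62, 69, 80, 70]

R = spearman_with_ties(X, Y)
print(R)  # 0.75
```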

Properties of Coefficient of Correlation:

1. The coefficient of correlation always lies between –1 and +1; symbolically it can
be written as –1 ≤ r ≤ 1.
2. The coefficient of correlation is independent of change of origin and scale.
3. The coefficient of correlation is a pure number and is independent of the units of
measurement. It means if X represents, say, height in inches and Y represents, say,
weight in kgs, then the correlation coefficient will be neither in inches nor in kgs
but only a pure number.
4. The coefficient of correlation is the geometric mean of the two regression
coefficients; symbolically $r = \pm\sqrt{b_{xy} \cdot b_{yx}}$, with the same sign as
the regression coefficients.
5. If X and Y are independent variables then the coefficient of correlation is zero.
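Properties 1 to 3 can be illustrated numerically. The sketch below is a hedged example using the height and weight figures given earlier in these notes: heights are converted from centimetres to inches (a change of scale) and 60 is subtracted from every weight (a change of origin); the function name `pearson_r` and the choice of transformations are assumptions made for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_x2 * sum_y2)

heights_cm = [158, 160, 163, 166, 168, 171, 174, 176]
weights_kg = [60, 62, 64, 65, 67, 69, 71, 72]

r_original = pearson_r(heights_cm, weights_kg)

# Change of scale (cm to inches) and change of origin (subtract 60 kg).
heights_in = [h / 2.54 for h in heights_cm]
weights_shifted = [w - 60 for w in weights_kg]
r_transformed = pearson_r(heights_in, weights_shifted)
```

Up to floating-point rounding, both calls return the same value, and that value lies between –1 and +1.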

REGRESSION
Meaning:
The study of measuring the relationship between associated variables, wherein
one variable is dependent on another independent variable, is called Regression. It
was developed by Sir Francis Galton in 1877 to measure the relationship of height
between parents and their children.
Regression analysis is a statistical tool to study the nature and extent of
functional relationship between two or more variables and to estimate (or predict) the
unknown values of dependent variable from the known values of independent
variable.
The variable that forms the basis for predicting another variable is known as
the Independent Variable and the variable that is predicted is known as dependent
variable. For example, if we know that two variables price (X) and demand (Y) are
closely related we can find out the most probable value of X for a given value of Y or
the most probable value of Y for a given value of X. Similarly, if we know that the
amount of tax and the rise in the price of a commodity are closely related, we can find
out the expected price for a certain amount of tax levy.

Uses of Regression Analysis:

1. It provides estimates of values of the dependent variables from values of
independent variables.
2. It is used to obtain a measure of the error involved in using the regression line as
a basis for estimation.
3. With the help of regression analysis, we can obtain a measure of degree of
association or correlation that exists between the two variables.
4. It is a highly valuable tool in economics and business research, since most of the
problems of economic analysis are based on cause and effect relationships.

Distinction between Correlation and Regression

1. Correlation: It measures the degree and direction of relationship between the variables.
   Regression: It measures the nature and extent of average relationship between two or
   more variables in terms of the original units of the data.
2. Correlation: It is a relative measure showing association between the variables.
   Regression: It is an absolute measure of relationship.
3. Correlation: Correlation Coefficient is independent of change of both origin and scale.
   Regression: Regression Coefficient is independent of change of origin but not scale.
4. Correlation: Correlation Coefficient is independent of units of measurement.
   Regression: Regression Coefficient is not independent of units of measurement.
5. Correlation: Expression of the relationship between the variables ranges from –1 to +1.
   Regression: Expression of the relationship between the variables may be in any of the
   forms like Y = a + bX or Y = a + bX + cX².
6. Correlation: It is not a forecasting device.
   Regression: It is a forecasting device which can be used to predict the value of the
   dependent variable from the given value of the independent variable.
7. Correlation: There may be zero correlation, such as between weight of wife and income
   of husband.
   Regression: There is nothing like zero regression.

Regression Lines and Regression Equation:

Regression lines and regression equations are used synonymously. Regression
equations are algebraic expressions of the regression lines. Let us consider two
variables X and Y. If Y depends on X, the result takes the form of simple
regression. With two variables X and Y we have two regression lines: the regression
line of X on Y and the regression line of Y on X. The regression line of Y on X gives
the most probable value of Y for a given value of X, and the regression line of X on Y
gives the most probable value of X for a given value of Y. However, when there is
either perfect positive or perfect negative correlation between the two variables,
the two regression lines coincide, i.e. we have one line. If the variables are
independent, r is zero and the lines of regression are at right angles, i.e. parallel
to the X axis and the Y axis.
Therefore, with the help of simple linear regression model we have the
following two regression lines
1. Regression line of Y on X: This line gives the probable value of Y (Dependent
variable) for any given value of X (Independent variable).
Regression line of Y on X : Y – Ȳ = byx (X – X̄)
OR : Y = a + bX

2. Regression line of X on Y: This line gives the probable value of X (Dependent
variable) for any given value of Y (Independent variable).
Regression line of X on Y : X – X̄ = bxy (Y – Ȳ)
OR : X = a + bY

In the above two regression lines or regression equations, there are two
regression parameters, “a” and “b”. Here “a” is an unknown constant, and “b”,
which is also denoted “byx” or “bxy”, is another unknown constant popularly
called the regression coefficient. These two unknown constants
(fixed numerical values) determine the position of the line completely. If the
value of either or both of them is changed, another line is determined. The parameter
“a” determines the level of the fitted line (i.e. the distance of the line directly above or
below the origin). The parameter “b” determines the slope of the line (i.e. the change
in Y for unit change in X).

If the values of constants “a” and “b” are obtained, the line is completely
determined. But the question is how to obtain these values. The answer is provided by
the method of least squares. With a little algebra and differential calculus, it can be
shown that the following two normal equations, if solved simultaneously, will yield
the values of the parameters “a” and “b”.
Two normal equations:
X on Y: ∑X = Na + b∑Y and ∑XY = a∑Y + b∑Y²
Y on X: ∑Y = Na + b∑X and ∑XY = a∑X + b∑X²
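As an illustration of solving the two normal equations simultaneously, the following Python sketch fits the regression line of Y on X using the sums from Illustration 05 (capital employed as X, profit as Y). Reusing that data here is an assumption made purely for illustration, and the function name `solve_normal_equations` is hypothetical.

```python
def solve_normal_equations(n, sum_x, sum_y, sum_x2, sum_xy):
    """Solve sum_y = n*a + b*sum_x and sum_xy = a*sum_x + b*sum_x2 for (a, b)."""
    determinant = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / determinant
    b = (n * sum_xy - sum_x * sum_y) / determinant
    return a, b

# Sums taken from the table of Illustration 05.
n, sum_x, sum_y, sum_x2, sum_xy = 10, 550, 150, 38500, 11500
a, b = solve_normal_equations(n, sum_x, sum_y, sum_x2, sum_xy)

# Substituting a and b back should reproduce both normal equations.
lhs_1 = n * a + b * sum_x          # should equal sum_y
lhs_2 = a * sum_x + b * sum_x2     # should equal sum_xy
```

For the regression of X on Y the same function can be used with the roles of X and Y interchanged.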

The above method is popularly known as the direct method, which becomes quite
cumbersome when the values of X and Y are large. The work can be simplified if,
instead of dealing with the actual values of X and Y, we take the deviations of the X
and Y series from their respective means. In that case:
Regression equation Y on X:
Y = a + bX will change to (Y – Ȳ) = byx (X – X̄)
Regression equation X on Y:
X = a + bY will change to (X – X̄) = bxy (Y – Ȳ)
In this new form of the regression equation, we need to compute only one
parameter, “b”, which is denoted either “byx” or “bxy” and is called the
regression coefficient.

Regression Coefficient:
The quantity “b” in the regression equation is called the regression
coefficient or slope coefficient. Since there are two regression equations, we
have two regression coefficients.
1. Regression Coefficient X on Y, symbolically written as “bxy”
2. Regression Coefficient Y on X, symbolically written as “byx”
Different formulae used to compute regression coefficients:
1. Using the correlation coefficient (r) and standard deviations (σ):
   bxy = r (σx / σy)          byx = r (σy / σx)
2. Direct method, using sums of X and Y:
   bxy = (N∑XY − ∑X∑Y) / (N∑Y² − (∑Y)²)
   byx = (N∑XY − ∑X∑Y) / (N∑X² − (∑X)²)
3. When deviations are taken from the arithmetic means, where x = X - Ẋ and y = Y - Ẏ:
   bxy = ∑xy / ∑y²            byx = ∑xy / ∑x²
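
The three formulae give the same value for a regression coefficient. The sketch below checks this for byx on a small data set; the data and variable names are hypothetical and only serve the illustration. Population standard deviations (dividing by N) are assumed.

```python
from math import sqrt

# Hypothetical data, used only to compare the three formulae
X = [2, 4, 6, 8, 10]
Y = [1, 3, 7, 9, 15]

n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n
dev_x = [x - mean_x for x in X]          # x = X - mean of X
dev_y = [y - mean_y for y in Y]          # y = Y - mean of Y
sum_xy = sum(p * q for p, q in zip(dev_x, dev_y))
sum_xx = sum(p * p for p in dev_x)
sum_yy = sum(q * q for q in dev_y)

# Method 3: deviations taken from the arithmetic means
byx_deviation = sum_xy / sum_xx

# Method 2: direct method, using raw sums
byx_direct = (n * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)) / (
    n * sum(x * x for x in X) - sum(X) ** 2
)

# Method 1: correlation coefficient and standard deviations
sd_x, sd_y = sqrt(sum_xx / n), sqrt(sum_yy / n)
r = sum_xy / (n * sd_x * sd_y)
byx_from_r = r * sd_y / sd_x

print(byx_deviation, byx_direct, byx_from_r)
```

All three printed values agree up to floating-point rounding.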

Properties of Regression Coefficients:

1. The coefficient of correlation is the geometric mean of the two regression
coefficients. Symbolically r = ±√(bxy ∗ byx), where r takes the sign of the
regression coefficients.

2. If one of the regression coefficients is greater than unity, the other must be less
than unity, since the value of the coefficient of correlation cannot exceed unity.
For example, if bxy = 1.2 and byx = 1.4, “r” would be √(1.2 ∗ 1.4) = 1.296, which is
not possible.
3. Both regression coefficients will have the same sign, i.e. they will be either
both positive or both negative. In other words, it is not possible for one of the
regression coefficients to have a minus sign and the other a plus sign.
4. The coefficient of correlation will have the same sign as the regression
coefficients, i.e. if the regression coefficients have a negative sign, “r” will also
have a negative sign, and if the regression coefficients have a positive sign, “r”
will also be positive. For example, if bxy = -0.2 and byx = -0.8 then
r = – √(0.2 ∗ 0.8) = – 0.4
5. The average value of the two regression coefficients is greater than or equal to
the value of the coefficient of correlation when the coefficients are positive, with
equality only when the two coefficients are equal. In symbols, (bxy + byx) / 2 ≥ r.
For example, if bxy = 0.8 and byx = 0.4 then the average of the two values =
(0.8 + 0.4) / 2 = 0.6 and the value of r = √(0.8 ∗ 0.4) = 0.566, which is less than 0.6.
6. Regression coefficients are independent of change of origin but not scale.
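
The properties above can be checked numerically. The following is a small sketch with hypothetical data and a hypothetical helper named slope; it is an illustration of the properties, not a proof.

```python
from math import sqrt


def slope(us, vs):
    """Regression coefficient of v on u: sum(dev_u * dev_v) / sum(dev_u^2)."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    num = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    den = sum((u - mean_u) ** 2 for u in us)
    return num / den


# Hypothetical data
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 10]

byx = slope(X, Y)   # regression coefficient of Y on X
bxy = slope(Y, X)   # regression coefficient of X on Y

# Properties 1 and 4: r is the geometric mean and carries the common sign
r = sqrt(bxy * byx) if byx >= 0 else -sqrt(bxy * byx)

# Property 5: the average of the coefficients is not below r
average = (bxy + byx) / 2

# Property 6: a change of origin leaves byx unchanged, a change of scale does not
byx_shifted = slope([x - 3 for x in X], [y - 10 for y in Y])
byx_scaled = slope([x * 10 for x in X], Y)

print(byx, bxy, r, average, byx_shifted, byx_scaled)
```

Here multiplying X by 10 divides byx by 10, while subtracting constants from X and Y leaves it unchanged.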

Illustration 01:
Find the two regression equations, X on Y and Y on X, from the following data:
X : 10 12 16 11 15 14 20 22
Y : 15 18 23 14 20 17 25 28

Solution:
Calculation of Regression Equation
   X      Y      X²      Y²      XY
  10     15     100     225     150
  12     18     144     324     216
  16     23     256     529     368
  11     14     121     196     154
  15     20     225     400     300
  14     17     196     289     238
  20     25     400     625     500
  22     28     484     784     616
 120    160   1,926   3,372   2,542
  ∑X     ∑Y     ∑X²     ∑Y²     ∑XY
Here N = Number of elements in either series X or series Y = 8
Now we will proceed to compute regression equations using normal equations.
Regression equation of X on Y: X = a + bY
The two normal equations are:
∑X = Na + b∑Y
∑XY = a∑Y + b∑Y2
Substituting the values in above normal equations, we get

120 = 8a + 160b ..... (i)
2542 = 160a + 3372b ..... (ii)
Let us solve equations (i) and (ii) by the simultaneous equation method.
Multiplying equation (i) by 20 we get 2400 = 160a + 3200b
Subtracting equation (ii) from this:
2400 = 160a + 3200b
2542 = 160a + 3372b
(-) (-) (-)
-142 = -172b
Therefore we have -142 = -172b, which can be rewritten as 172b = 142
Now, b = 142 / 172 = 0.8256 (rounded off)
Substituting the value of b in equation (i), we get
120 = 8a + (160 * 0.8256)
120 = 8a + 132.096
8a = 120 - 132.096
8a = -12.096
a = -12.096/8
a = -1.512
Thus we get the values a = -1.512 and b = 0.8256
Hence the required regression equation of X on Y:
X = a + bY => X = -1.512 + 0.8256Y

Regression equation of Y on X: Y = a + bX
The two normal equations are:
∑Y = Na + b∑X
∑XY = a∑X + b∑X2
Substituting the values in above normal equations, we get
160 = 8a + 120b ..... (iii)
2542 = 120a + 1926b ..... (iv)
Let us solve equations (iii) and (iv) by the simultaneous equation method.
Multiplying equation (iii) by 15 we get 2400 = 120a + 1800b
Subtracting equation (iv) from this:
2400 = 120a + 1800b
2542 = 120a + 1926b
(-) (-) (-)
-142 = -126b
Therefore we have -142 = -126b, which can be rewritten as 126b = 142
Now, b = 142 / 126 = 1.127 (rounded off)
Substituting the value of b in equation (iii), we get
160 = 8a + (120 * 1.127)
160 = 8a + 135.24

8a = 160 - 135.24
8a = 24.76
a = 24.76/8
a = 3.095
Thus we got the values of a = 3.095 and b = 1.127
Hence the required regression equation of Y on X:
Y = a + bX => Y = 3.095 + 1.127X
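
The hand calculation in Illustration 01 can be cross-checked with a short script. This is a sketch only; the helper name normal_equation_fit is hypothetical. Carrying full precision gives an intercept of about -1.51 for X on Y and about 3.095 for Y on X, so small differences from the hand-worked figures come from intermediate rounding.

```python
X = [10, 12, 16, 11, 15, 14, 20, 22]
Y = [15, 18, 23, 14, 20, 17, 25, 28]


def normal_equation_fit(us, vs):
    """Return (a, b) for the least-squares line v = a + b*u."""
    n = len(us)
    sum_u, sum_v = sum(us), sum(vs)
    sum_uu = sum(u * u for u in us)
    sum_uv = sum(u * v for u, v in zip(us, vs))
    b = (n * sum_uv - sum_u * sum_v) / (n * sum_uu - sum_u ** 2)
    a = (sum_v - b * sum_u) / n
    return a, b


a_xy, b_xy = normal_equation_fit(Y, X)   # X on Y: Y is the given variable
a_yx, b_yx = normal_equation_fit(X, Y)   # Y on X: X is the given variable

print(f"X = {a_xy:.4f} + {b_xy:.4f}Y")
print(f"Y = {a_yx:.4f} + {b_yx:.4f}X")
```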

Illustration 02:
After investigation it has been found that the demand for automobiles in a city depends
mainly, if not entirely, upon the number of families residing in that city. Given below
are the sales of automobiles in five cities for the year 2019 and the number of
families residing in those cities.
City          No. of Families (in lakhs): X    Sale of automobiles (in ‘000): Y
Belagavi                  70                              25.2
Bangalore                 75                              28.6
Hubli                     80                              30.2
Kalaburagi                60                              22.3
Mangalore                 90                              35.4
Fit a linear regression equation of Y on X by the least square method and estimate the
sales for the year 2020 for the city Belagavi which is estimated to have 100 lakh
families assuming that the same relationship holds true.

Solution:
Calculation of Regression Equation
City            X       Y       X²       XY
Belagavi       70    25.2    4,900    1,764
Bangalore      75    28.6    5,625    2,145
Hubli          80    30.2    6,400    2,416
Kalaburagi     60    22.3    3,600    1,338
Mangalore      90    35.4    8,100    3,186
Total         375   141.7   28,625   10,849
               ∑X      ∑Y      ∑X²      ∑XY
Regression equation of Y on X: Y = a + bX
The two normal equations are:
∑Y = Na + b∑X
∑XY = a∑X + b∑X2
Substituting the values in above normal equations, we get
141.7 = 5a + 375b ....................................... (i)
10849= 375a + 28625b .................................. (ii)
Let us solve these equations (i) and (ii) by simultaneous equation method
Multiply equation (i) by 75 we get 10627.5 = 375a + 28125b

Now rewriting these equations:
10627.5 = 375a + 28125b
10849 = 375a + 28625b
(-) (-) (-) .
-221.5 = -500b
Therefore we have -221.5 = -500b, which can be rewritten as 500b = 221.5
Now, b = 221.5 / 500 = 0.443
Substituting the value of b in equation (i), we get
141.7 = 5a + (375 * 0.443)
141.7 = 5a + 166.125
5a = 141.7 - 166.125
5a = -24.425
a = -24.425/5
a = -4.885
Thus we got the values of a = -4.885 and b = 0.443
Hence, the required regression equation of Y on X:
Y = a + bX => Y = -4.885 + 0.443X
Estimated sales of automobiles (Y) in the city of Belagavi for the year 2020, where the
number of families (X) is 100 (in lakhs):
Y = -4.885 + 0.443X
Y = -4.885 + (0.443 * 100)
Y = -4.885 + 44.3
Y = 39.415 (‘000)
This means the estimated sales of automobiles would be 39,415 when the number of
families is 100 lakh (1,00,00,000).
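
The fit and the estimate in Illustration 02 can be reproduced as follows. This is a sketch with hypothetical variable and function names. Note that X = 100 lies outside the observed range of 60 to 90 lakh families, so the estimate rests on the stated assumption that the same linear relationship continues to hold.

```python
families = [70, 75, 80, 60, 90]           # X, in lakhs
sales = [25.2, 28.6, 30.2, 22.3, 35.4]    # Y, in thousands

n = len(families)
sum_x, sum_y = sum(families), sum(sales)
sum_xx = sum(x * x for x in families)
sum_xy = sum(x * y for x, y in zip(families, sales))

# Solution of the two normal equations for Y = a + bX
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = (sum_y - b * sum_x) / n


def predict_sales(num_families):
    """Estimated sales (in thousands) for a given number of families (in lakhs)."""
    return a + b * num_families


estimate = predict_sales(100)
print(round(a, 3), round(b, 3), round(estimate, 3))
```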

Illustration 03:
From the following data obtain the two regression lines:
Capital Employed (Rs. in lakh): 7 8 5 9 12 9 10 15
Sales Volume (Rs. in lakh): 4 5 2 6 9 5 7 12

Solution:
Calculation of Regression Equation
   X     Y     X²     Y²     XY
   7     4     49     16     28
   8     5     64     25     40
   5     2     25      4     10
   9     6     81     36     54
  12     9    144     81    108
   9     5     81     25     45
  10     7    100     49     70
  15    12    225    144    180
  75    50    769    380    535
  ∑X    ∑Y    ∑X²    ∑Y²    ∑XY

Means of the two series:
Ẋ = ∑X / n = 75 / 8 = 9.375
Ẏ = ∑Y / n = 50 / 8 = 6.25

Regression line/equation of X on Y: (X – Ẋ) = bxy (Y – Ẏ)
Regression coefficient of X on Y:
bxy = (n∑XY − ∑X∑Y) / (n∑Y² − (∑Y)²)
    = ((8 ∗ 535) – (75 ∗ 50)) / ((8 ∗ 380) – (50)²)
    = (4280 – 3750) / (3040 – 2500)
    = 530 / 540 = 0.9815
(X – Ẋ) = bxy (Y – Ẏ)
=> X – 9.375 = 0.9815 (Y – 6.25)
=> X – 9.375 = 0.9815Y – 6.1344
=> X = 9.375 – 6.1344 + 0.9815Y
=> X = 3.2406 + 0.9815Y

Regression line/equation of Y on X: (Y – Ẏ) = byx (X – Ẋ)
Regression coefficient of Y on X:
byx = (n∑XY − ∑X∑Y) / (n∑X² − (∑X)²)
    = ((8 ∗ 535) – (75 ∗ 50)) / ((8 ∗ 769) – (75)²)
    = (4280 – 3750) / (6152 – 5625)
    = 530 / 527 = 1.0057
(Y – Ẏ) = byx (X – Ẋ)
=> Y – 6.25 = 1.0057 (X – 9.375)
=> Y – 6.25 = 1.0057X – 9.4284
=> Y = 6.25 – 9.4284 + 1.0057X
=> Y = -3.1784 + 1.0057X

Illustration 04:
From the following information find regression equations and estimate the production
when the capacity utilisation is 70%.
                              Average (Mean)    Standard Deviation
Production (in lakh units)          42                12.5
Capacity Utilisation (%)            88                 8.5
Correlation Coefficient (r) = 0.72
Solution:
Let production be variable X and capacity utilisation be variable Y. The regression
equation of production on capacity utilisation is given by X on Y, and the
regression equation of capacity utilisation on production is given by Y on X.
These can be computed as given below:
Given Information: Ẋ = 42, Ẏ = 88, σx = 12.5, σy = 8.5, r = 0.72

Regression coefficient of X on Y:
bxy = r (σx / σy) = 0.72 ∗ (12.5 / 8.5) = 1.0588
Regression Equation of X on Y:
(X – Ẋ) = bxy (Y – Ẏ)
=> X – 42 = 1.0588 (Y – 88)
=> X = 42 – 93.1744 + 1.0588Y
=> X = -51.1744 + 1.0588Y

Regression coefficient of Y on X:
byx = r (σy / σx) = 0.72 ∗ (8.5 / 12.5) = 0.4896
Regression Equation of Y on X:
(Y – Ẏ) = byx (X – Ẋ)
=> Y – 88 = 0.4896 (X – 42)
=> Y = 88 – 20.5632 + 0.4896X
=> Y = 67.4368 + 0.4896X

The production when the capacity utilisation is 70% is estimated from the regression
equation of X on Y, with Y = 70.
Regression Equation of X on Y:
(X – Ẋ) = bxy (Y – Ẏ)
X = -51.1744 + 1.0588Y
= -51.1744 + (1.0588 * 70)
= -51.1744 + 74.116
= 22.9416 (lakh units)
Therefore, the estimated production would be 22,94,160 units when there is a
capacity utilisation of 70%.
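
When only the means, standard deviations and r are known, both regression lines follow directly from those summary figures. The sketch below uses a hypothetical helper named lines_from_summary; with full precision the estimate is about 22.94 lakh units, matching the hand calculation up to rounding.

```python
def lines_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Return ((a_xy, b_xy), (a_yx, b_yx)) for X = a_xy + b_xy*Y and Y = a_yx + b_yx*X."""
    b_xy = r * sd_x / sd_y
    b_yx = r * sd_y / sd_x
    a_xy = mean_x - b_xy * mean_y   # both lines pass through the point of means
    a_yx = mean_y - b_yx * mean_x
    return (a_xy, b_xy), (a_yx, b_yx)


# X = production (lakh units), Y = capacity utilisation (%)
(a_xy, b_xy), (a_yx, b_yx) = lines_from_summary(42, 88, 12.5, 8.5, 0.72)

# Estimated production at 70% capacity utilisation, using X on Y
production_at_70 = a_xy + b_xy * 70
print(round(b_xy, 4), round(b_yx, 4), round(production_at_70, 4))
```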

Illustration 05:
The following data gives the age and blood pressure (BP) of 10 sports persons.
Name : A B C D E F G H I J
Age (X) : 42 36 55 58 35 65 60 50 48 51
BP (Y) : 98 93 110 85 105 108 82 102 118 99
i. Find regression equation of Y on X and X on Y (Use the method of deviation
from arithmetic mean)
ii. Find the correlation coefficient (r) using the regression coefficients.
iii. Estimate the blood pressure of a sports person whose age is 45.

Solution:
Calculation of Regression Equation
Name   Age (X)   BP (Y)   x = X – Ẋ    y = Y – Ẏ      x²      y²      xy
                          (X – 50)     (Y – 100)
A         42       98        -8           -2          64       4      16
B         36       93       -14           -7         196      49      98
C         55      110         5           10          25     100      50
D         58       85         8          -15          64     225    -120
E         35      105       -15            5         225      25     -75
F         65      108        15            8         225      64     120
G         60       82        10          -18         100     324    -180
H         50      102         0            2           0       4       0
I         48      118        -2           18           4     324     -36
J         51       99         1           -1           1       1      -1
Total    500    1,000         0            0         904   1,120    -128
          ∑X       ∑Y        ∑x           ∑y         ∑x²     ∑y²     ∑xy

Ẋ = ∑X / n = 500 / 10 = 50
Ẏ = ∑Y / n = 1000 / 10 = 100

Regression coefficients can be computed using the following formulae,
where x = X - Ẋ and y = Y - Ẏ:
bxy = ∑xy / ∑y²          byx = ∑xy / ∑x²

Regression coefficient of X on Y:
bxy = ∑xy / ∑y² = -128 / 1120 = -0.1143
Regression coefficient of Y on X:
byx = ∑xy / ∑x² = -128 / 904 = -0.1416

Regression equation of X on Y:
(X – Ẋ) = bxy (Y – Ẏ)
=> X – 50 = -0.1143 (Y – 100)
=> X – 50 = -0.1143Y + 11.43
=> X = 50 + 11.43 – 0.1143Y
=> X = 61.43 – 0.1143Y

Regression equation of Y on X:
(Y – Ẏ) = byx (X – Ẋ)
=> Y – 100 = -0.1416 (X – 50)
=> Y – 100 = -0.1416X + 7.08
=> Y = 100 + 7.08 – 0.1416X
=> Y = 107.08 – 0.1416X

Computation of coefficient of correlation using regression coefficient:

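r = ±√(bxy ∗ byx), taking the sign of the regression coefficients. Both coefficients
are negative here, so
r = – √(0.1143 ∗ 0.1416) = – √0.01618488 = – 0.1272
Therefore, we have a low degree of negative correlation between the age and blood
pressure of the sports persons.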

The blood pressure (Y) of a sports person whose age is X = 45 can
be estimated using the regression equation of Y on X:
(Y – Ẏ) = byx (X – Ẋ)
=> Y = 107.08 – 0.1416X = 107.08 – (0.1416 * 45) = 107.08 – 6.372 = 100.708
It means the estimated blood pressure of a sports person whose age is 45
is 101 (rounded off).
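
The three parts of Illustration 05 can be traced in code using the deviation method. This is a sketch with hypothetical variable names; the sign of r is taken from the regression coefficients, as described in the properties of regression coefficients.

```python
from math import sqrt

age = [42, 36, 55, 58, 35, 65, 60, 50, 48, 51]       # X
bp = [98, 93, 110, 85, 105, 108, 82, 102, 118, 99]   # Y

n = len(age)
mean_x, mean_y = sum(age) / n, sum(bp) / n
x = [v - mean_x for v in age]   # deviations from the mean age
y = [v - mean_y for v in bp]    # deviations from the mean blood pressure

sum_xy = sum(p * q for p, q in zip(x, y))
sum_xx = sum(p * p for p in x)
sum_yy = sum(q * q for q in y)

b_xy = sum_xy / sum_yy   # regression coefficient of X on Y
b_yx = sum_xy / sum_xx   # regression coefficient of Y on X

# r is the geometric mean of the two coefficients, with their common sign
sign = 1 if b_yx >= 0 else -1
r = sign * sqrt(b_xy * b_yx)

# Estimated blood pressure at age 45 from the line of Y on X
bp_at_45 = mean_y + b_yx * (45 - mean_x)
print(round(b_xy, 4), round(b_yx, 4), round(r, 4), round(bp_at_45, 3))
```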

Illustration 06:
There are two series of index numbers, P for price index and S for stock of commodity.
The mean and standard deviation of P are 100 and 8 and S are 103 and 4 respectively.
The correlation coefficient between the two series is 0.4. With these data, work out a
linear equation to read off values of P for various values of S. Can the same equation be
used to read off values of S for various values of P?

Solution:
Let us assume that P = Price Index is variable X and S = Stock of Commodity is variable Y.
Linear equation to read off values of P for various values of S would be regression
equation of X on Y. Regression coefficient is to be computed using mean and standard
deviation.
From the problem we can list out the given information:
Ẋ = 100 Ẏ = 103 σx = 8 σy = 4 r = 0.4

Regression equation of X on Y:
(X – Ẋ) = bxy (Y – Ẏ)
=> (X – Ẋ) = r (σx / σy) (Y – Ẏ)
=> (X – 100) = (0.4 ∗ 8 / 4) (Y – 103)
=> (X – 100) = 0.8 (Y – 103)
=> (X – 100) = 0.8Y – 82.4
=> X = 100 – 82.4 + 0.8Y
=> X = 17.6 + 0.8Y
Linear equation to read off values of P for various values of S is X = 17.6 + 0.8Y
To read off values of S for various values of P we need the regression equation of Y on X,
and therefore the above linear equation cannot be used. Hence, the following regression
equation of Y on X is computed:
(Y – Ẏ) = byx (X – Ẋ)
=> (Y – Ẏ) = r (σy / σx) (X – Ẋ)
=> (Y – 103) = (0.4 ∗ 4 / 8) (X – 100)
=> (Y – 103) = 0.2 (X – 100)
=> Y – 103 = 0.2X – 20
=> Y = 103 – 20 + 0.2X
=> Y = 83 + 0.2X
Hence, the linear equation to read off values of S for various values of P is Y = 83 + 0.2X.
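
A short numerical comparison shows why the line of X on Y cannot simply be rearranged to read off S from P. The sketch below uses hypothetical function names and an arbitrary price index of 108; it compares the proper regression of Y on X with the algebraic inversion of X = 17.6 + 0.8Y.

```python
def s_from_p(p):
    """Regression of Y on X: stock index estimated from price index."""
    return 83 + 0.2 * p


def s_by_inverting(p):
    """Algebraic inversion of X = 17.6 + 0.8Y, which is NOT the regression of Y on X."""
    return (p - 17.6) / 0.8


price = 108   # arbitrary price index chosen for the comparison
print(s_from_p(price), s_by_inverting(price))

# Both lines pass through the point of means (P = 100, S = 103)
print(s_from_p(100), s_by_inverting(100))
```

At P = 108 the regression of Y on X gives about 104.6, while the inverted equation gives about 113. The two agree only at the point of means, and they would coincide everywhere only if r were +1 or -1.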

Applications of Correlation and Regression

Some real-life applications of correlation and regression analysis:

Economics and Finance
Healthcare and Medicine
Social Sciences
Engineering and Technology
Environmental Science

Economics and Finance

Correlation and regression analyses are integral tools in economics and finance,
enabling investors, economists, and marketers to make data-driven decisions and
predictions to optimize portfolio performance, economic policy, and marketing
strategies.


In Stock Market Analysis

Portfolio Diversification: Investors use correlation analysis to measure how closely
the returns of the stocks or assets held in a portfolio move together. By selecting
assets that have a low or even negative correlation with each other, they can build
diversified portfolios that reduce overall risk.

Risk Management: Regression analysis enables investors to put a number on how the
return on a stock or asset is associated with indicators such as market indices,
interest rates, or macroeconomic variables. Such assessments help investors see which
factors drive performance and adjust their risk exposure if the need arises.

In Economic Forecasting

GDP Growth Forecasting: With multiple regression analysis, future GDP growth is
forecast from historical GDP data and pertinent economic indicators, including
consumer spending, investment, government expenditure, and net exports. Economists
combine these indicators into models or equations that help them forecast future
economic performance.

Inflation Rate Forecasting: Regression can also be used to predict inflation rates by
analysing historical data on the drivers of inflation, such as money supply growth,
interest rates, and labor market conditions. This helps policymakers and investors
anticipate future trends and formulate sound economic policies.

Unemployment Rate Forecasting: Regression analysis is also employed in predicting
unemployment rates by relating the unemployment rate to its determinants, which
include economic growth, labor force participation, and demographic developments.
By identifying the forces that lead to unemployment, policymakers can design targeted
measures to address labor market imperfections.

In Marketing ROI

Advertising Effectiveness: Regression analysis is used to examine how effective
marketing campaigns are. It relates advertising expenditure to performance metrics
such as sales revenue, brand awareness, or customer engagement. By measuring the
effect of advertising on these outcomes, marketers can adjust their advertising
approach and increase the return on investment (ROI).

Market Segmentation: Correlation analysis is also an important tool for tailoring
marketing campaigns to market segments that respond at different rates. Response to
marketing can be correlated with customer demographics, psychographics, or buying
behavior, and the results guide the choice of marketing strategies that suit specific
customer segments.

Healthcare and Medicine

In healthcare and medicine, correlation and regression analyses play crucial roles in
improving patient outcomes, optimizing treatments, and informing preventive
measures. Here are real-life examples of their application:

In Clinical Trials
Example: A pharmaceutical company receives permission to test, in a clinical trial, a
new drug for lowering the blood pressure of hypertensive patients.

Application of Regression Analysis: Investigators apply regression analysis to the
data obtained during the clinical trial. They examine how the dosage of the drug, its
frequency of administration, and patient characteristics (for example, gender, age,
weight) are associated with the changes in blood pressure observed during the study.
This helps doctors to determine the effective dosage and the appropriate treatment
duration and regimen that bring the desired therapeutic effect while limiting adverse
effects, or side effects.
In Disease Risk Assessment
Example: Public health officials set out to evaluate the risk factors for developing
cardiovascular diseases (CVDs) in their area.

Application of Correlation Analysis: The analysis team uses correlation analysis to
uncover relationships between lifestyle factors (for example: diet, exercise habits,
smoking status) and CVD-related health outcomes such as incidence rates, mortality
rates, and biomarkers (for example: cholesterol levels, blood pressure). Such an
analysis may show that behaviors like unhealthy eating habits and physical inactivity
are linked with CVD risk factors.

Application of Regression Analysis: Regression is the next step, with the objective
of creating predictive models that estimate an individual’s risk of developing CVD on
the basis of their lifestyle factors. These models quantify the strength of association
between the risk factors and the disease outcomes, which makes possible targeted
intervention strategies that promote healthy behaviors and reduce modifiable risk
factors.
Social Sciences
Correlation and regression analyses are valuable tools in Social Sciences for identifying
patterns, understanding relationships between variables, and informing decision-
making processes.

In Education

Regression Analysis in Academic Achievement: A school district is concerned about
student achievement but is not sure how to address the issue. It gathers data on
factors such as class size, teacher experience, student-to-teacher ratio, and student
demographics. By applying regression analysis, it can determine which factors are
most strongly associated with academic performance. For example, it may discover that
smaller classes and more experienced teachers are associated with better student
outcomes. The information could be used to guide resource allocation, and policy
makers could use this insight to improve learner outcomes.

In Sociology

Correlation Analysis in Crime Rates: A sociologist tries to explain the differences in
the level of crime across neighborhoods. They compile data on factors like income
bracket, education, unemployment percentage, and police presence. Correlation
analysis can then establish the relationships between crime rates and the selected
social factors. For instance, the analysis may show that regions with low income and
high unemployment tend to have higher crime rates. The resulting awareness can help
formulate social policies aimed at reducing crime by tackling the underlying
socio-economic problems.
Correlation Analysis in Voting Behavior: During an election campaign, political
scientists and analysts want to understand the factors that influence voting
behavior. They gather data on variables including candidate endorsements, party
affiliation, socioeconomic level, and media coverage. Correlation analysis allows
campaigns to explore the relationships between these variables and the election
results. They may discover, for example, that socioeconomic status and media coverage
are associated with voting patterns. This insight enables political strategists to
craft their messages and customize them for the specific demographic groups of voters
they intend to reach.
Engineering and Technology
Real-life applications of correlation and regression in Engineering and Technology,
particularly in the areas of Quality Control and Predictive Maintenance are:

In Quality Control
Example: In automotive manufacturing, regression analysis is applied to relate process
variables like pressure, temperature, and material composition to product qualities
like rigidity, durability, and dimensional precision. By collecting data from the
various stages of the production cycle and submitting it to regression analysis,
manufacturers identify the process parameters that most significantly influence the
quality of the product.

In Predictive Maintenance
Example: Where safety and efficiency are of the utmost importance, as with aircraft
engines, regression analysis is a powerful tool for predictive maintenance programs.
An airline can collect a large amount of historical data on variables such as engine
operating conditions, fuel consumption, and vibration levels. Regression models built
on this data can help forecast potential failures or maintenance needs before they
occur.

Environmental Science

Correlation and regression analyses are invaluable tools in environmental science for
understanding complex relationships between environmental variables and ecosystem
dynamics. Whether modeling future climate scenarios or assessing the impact of habitat
fragmentation on biodiversity, these statistical techniques provide essential insights that
inform decision-making and conservation efforts aimed at preserving our planet’s
ecosystems and mitigating the effects of climate change.

CONCLUSION
Review of Correlation and Regression Analysis:

In correlation analysis, we are keen to know whether two variables under study
are associated or correlated and, if correlated, what the strength of the correlation
is. The best measure of correlation is provided by Karl Pearson’s Coefficient of
Correlation. However, one severe limitation of this method is that it is applicable
only in case of a linear relationship between two variables. If two variables, say X
and Y, are independent or not correlated, then the correlation coefficient is zero.
The correlation coefficient, measuring a linear relationship between the two
variables, indicates the amount of variation in one variable accounted for by the
other variable. A better measure for this purpose is provided by the square of the
correlation coefficient, known as the “coefficient of determination”. This can be
interpreted as the ratio of the explained variance to the total variance:
r² = Explained variance / Total variance
Similarly, Coefficient of non-determination = (1 – r²).
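
The interpretation of r² as explained variance over total variance can be verified on a small example. The data below are hypothetical and the variable names are illustrative; the sketch fits Y on X, measures how much of the variation in Y the fitted values account for, and compares that ratio with the square of r.

```python
from math import sqrt

# Hypothetical data
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 10]

n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
sum_xx = sum((x - mean_x) ** 2 for x in X)
sum_yy = sum((y - mean_y) ** 2 for y in Y)

# Regression line of Y on X
b = sum_xy / sum_xx
a = mean_y - b * mean_x
fitted = [a + b * x for x in X]

explained = sum((f - mean_y) ** 2 for f in fitted)   # variation explained by the line
total = sum_yy                                       # total variation in Y
r_squared = explained / total

r = sum_xy / sqrt(sum_xx * sum_yy)                   # Pearson's coefficient
print(round(r_squared, 4), round(r ** 2, 4), round(1 - r_squared, 4))
```

The first two printed values are equal, and the third is the coefficient of non-determination.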
Regression analysis is concerned with establishing a functional relationship
between two variables and using this relationship for making future projections.
Unlike correlation, it can be applied to any type of relationship, linear as well as
curvilinear. The two lines of regression coincide, i.e. become identical, when r = -1
or +1; in other words, when there is a perfect negative or positive correlation
between the two variables under discussion. If r = 0, the regression lines are
perpendicular to each other.
In conclusion, correlation and regression are like detectives in the world of
numbers, helping us understand how things are connected. Correlation gives us a sense
of how two things change together, while regression dives deeper, predicting what
might happen next. Whether it’s figuring out where the stock market is headed or how
our actions affect the environment, these tools are essential.

Page | 32
Page | 33

Your Space Student's Book
No ratings yet
Your Space Student's Book
4 pages
Assignment On Correlation
100% (1)
Assignment On Correlation
7 pages
Unit 3 Correlation and Regression
No ratings yet
Unit 3 Correlation and Regression
27 pages
Business Statistics Unit 4 Correlation and Regression
No ratings yet
Business Statistics Unit 4 Correlation and Regression
27 pages
Correlation
No ratings yet
Correlation
22 pages
Business Statistic-Correlation and Regression
No ratings yet
Business Statistic-Correlation and Regression
30 pages
Business Statistics Unit 4 Correlation and Regression
No ratings yet
Business Statistics Unit 4 Correlation and Regression
27 pages
Correlation and Regression
No ratings yet
Correlation and Regression
64 pages
Correlation
No ratings yet
Correlation
7 pages
Peter
No ratings yet
Peter
48 pages
05correlation Lecture
No ratings yet
05correlation Lecture
14 pages
Correlation Analysis PDF
No ratings yet
Correlation Analysis PDF
30 pages
Correlation Maths
No ratings yet
Correlation Maths
27 pages
Correlation Regreesion Sums
No ratings yet
Correlation Regreesion Sums
50 pages
Business Statistics and analytics Unit 2 notes, Correlation and Regression
No ratings yet
Business Statistics and analytics Unit 2 notes, Correlation and Regression
26 pages
Correlation Analysis: Concept and Importance of Correlation
No ratings yet
Correlation Analysis: Concept and Importance of Correlation
8 pages
Correlation
No ratings yet
Correlation
17 pages
Notes For Correlation Unit - 3 Business Statistics
No ratings yet
Notes For Correlation Unit - 3 Business Statistics
21 pages
Correlation & Regression Analysis
No ratings yet
Correlation & Regression Analysis
16 pages
Scatter plot
No ratings yet
Scatter plot
33 pages
Correlation Analysis
No ratings yet
Correlation Analysis
20 pages
Correlation Bmlt
No ratings yet
Correlation Bmlt
5 pages
Statistics
No ratings yet
Statistics
21 pages
ECAP790 U06L01 Correlation
No ratings yet
ECAP790 U06L01 Correlation
37 pages
Chapter 6 PDF
No ratings yet
Chapter 6 PDF
3 pages
Correlation and Regression: Jaipur National University
No ratings yet
Correlation and Regression: Jaipur National University
32 pages
Correlation 805deee567bf3bca405e2e973070a021
No ratings yet
Correlation 805deee567bf3bca405e2e973070a021
18 pages
Correlation: Definitions
No ratings yet
Correlation: Definitions
24 pages
Correlation: Self Instructional Study Material Programme: M.A. Development Studies
No ratings yet
Correlation: Self Instructional Study Material Programme: M.A. Development Studies
21 pages
R Programming
No ratings yet
R Programming
21 pages
Correlation and Regression Analysis
No ratings yet
Correlation and Regression Analysis
100 pages
Correlation and Regression-1
No ratings yet
Correlation and Regression-1
32 pages
Correlation: (For M.B.A. I Semester)
100% (2)
Correlation: (For M.B.A. I Semester)
46 pages
Correlation Analysis
No ratings yet
Correlation Analysis
49 pages
Correlation
No ratings yet
Correlation
20 pages
Earthquake Microzonation of Yogyakarta City
No ratings yet
Earthquake Microzonation of Yogyakarta City
23 pages
Unit 2 - (A) Correlation & Regression
No ratings yet
Unit 2 - (A) Correlation & Regression
15 pages
Qt Module II Correlation and Regression Analysis
No ratings yet
Qt Module II Correlation and Regression Analysis
10 pages
Unit 3-1
No ratings yet
Unit 3-1
12 pages
Concept of Correlation (1)
No ratings yet
Concept of Correlation (1)
17 pages
QT-Correlation and Regression-1
No ratings yet
QT-Correlation and Regression-1
3 pages
Correlation
No ratings yet
Correlation
41 pages
Correlation
No ratings yet
Correlation
83 pages
24 11
No ratings yet
24 11
24 pages
Correlation Analysis1
No ratings yet
Correlation Analysis1
25 pages
MA IF - Quantitative Techniques For Business
No ratings yet
MA IF - Quantitative Techniques For Business
114 pages
Strategic Management
No ratings yet
Strategic Management
114 pages
Business Statistics Unit 3-5
No ratings yet
Business Statistics Unit 3-5
113 pages
Correlation Analysis Notes-2
No ratings yet
Correlation Analysis Notes-2
5 pages
Statistics module 3hejeiehhwwhgsysysudhhdbb
No ratings yet
Statistics module 3hejeiehhwwhgsysysudhhdbb
44 pages
Correlation & Regression
No ratings yet
Correlation & Regression
10 pages
Correlation
No ratings yet
Correlation
4 pages
Correlation Analysis
No ratings yet
Correlation Analysis
48 pages
Correlation
No ratings yet
Correlation
27 pages
Correlation Analysis
No ratings yet
Correlation Analysis
16 pages
Correlation and Regression
No ratings yet
Correlation and Regression
22 pages
Correlation Analysis
No ratings yet
Correlation Analysis
30 pages
Correlation: Hapter
No ratings yet
Correlation: Hapter
16 pages
Core La Ti On
No ratings yet
Core La Ti On
12 pages
Correlation
No ratings yet
Correlation
19 pages
CORRELATION

Methods of Measurement of Correlation

Graphic Method / Algebraic Method

1. Karl Pearson’s Coefficient of

Among these methods we will discuss only the following methods:

[Figures: Diagram III and Diagram IV; axis label: Profit (Rs. in Lakhs)]

Karl Pearson’s Coefficient of Correlation:

Calculation of Karl Pearson’s coefficient of correlation

X̄ = ∑X / n = 105 / 6 = 17.5        Ȳ = ∑Y / n = 300 / 6 = 50

∑(X − X̄)(Y − Ȳ) = −240

n∑XY − ∑X∑Y = (30 × 347) − (118 × 93) = 10,410 − 10,974 = −564

Therefore, the correct value of correlation

0.3 × 4 × SD(Y) = 9, so 1.2 × SD(Y) = 9 and SD(Y) = 9 / 1.2 = 7.5

Therefore, the standard deviation of the Y series = σ(Y) = 7.5
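
The raw-sum form of Karl Pearson's coefficient that appears in the calculations above, with numerator n∑XY − ∑X∑Y, can be sketched in Python as follows. This is only an illustrative sketch: the function name `pearson_r` and the five data pairs are hypothetical and not taken from the project. The one figure that does come from the text is the numerator arithmetic (30 × 347) − (118 × 93), which the code re-checks.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from raw sums (hypothetical helper, not from the source)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Numerator arithmetic quoted in the text: n = 30, sum XY = 347, sum X = 118, sum Y = 93.
source_numerator = 30 * 347 - 118 * 93

# Hypothetical data, used only to show the function running end to end.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)

print(source_numerator)
print(round(r, 4))
```

A negative numerator, as in the quoted calculation, always gives a negative coefficient, because the denominator is a product of square roots and so cannot be negative.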

fdx: −30, −24, 0, 18, 56; ∑fdx = 20

fdx²: 60, 24, 0, 18, 112; ∑fdx² = 214

dx = Mid Point of Series X – Assumed Mean of Series X = MP(X) - 225

n∑fdxdy − ∑fdx∑fdy = (100 × 46) − (20 × 10) = 4,600 − 200 = 4,400

In case of tie in ranks or equal ranks:

Interpretation: From the above calculation it can be observed that coefficient of

When value X repeated three times, m=3,

Spearman’s Rank Coefficient of Correlation = 0.75, which indicates there is high
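
The tie adjustment mentioned above (m = 3 when a value is repeated three times) is usually applied by adding m(m² − 1)/12 to ∑D² for every group of tied values. A minimal Python sketch under that assumption follows. The function names and the six data pairs are hypothetical, and they are not the figures behind the 0.75 result quoted in the text.

```python
def average_ranks(values):
    """Rank 1 goes to the largest value; tied values share the mean of their ranks."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1   # first rank position occupied by v
        count = order.count(v)       # how many times v occurs (m)
        ranks.append(first + (count - 1) / 2)
    return ranks

def spearman_with_ties(xs, ys):
    """Spearman's rank coefficient with the m(m^2 - 1)/12 correction per tied group."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = 0.0
    for series in (xs, ys):
        for v in set(series):
            m = series.count(v)
            if m > 1:
                correction += m * (m * m - 1) / 12
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

# Hypothetical data: the value 60 appears three times in X, so m = 3 for that group.
xs = [50, 60, 60, 60, 70, 80]
ys = [1, 2, 3, 4, 5, 6]
rho = spearman_with_ties(xs, ys)
print(round(rho, 4))
```

In this example the three tied values share rank 4 (the mean of ranks 3, 4 and 5), and the single tied group adds 3(9 − 1)/12 = 2 to ∑D².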

Properties of Coefficient of Correlation:

Uses of Regression Analysis:

Distinction between Correlation and Regression

Regression Lines and Regression Equation:

2. Regression line of X on Y: This line gives the probable value of X (Dependent

Properties of Regression Coefficients:

Regression coefficient of X on Y: Regression coefficient of Y on X:

Numerator (the same for both coefficients): (8 × 535) − (75 × 50) = 4,280 − 3,750 = 530

X̄ = ∑X / n = 500 / 10 = 50        Ȳ = ∑Y / n = 1000 / 10 = 100

Regression equation of X on Y: Regression equation of Y on X:

Computation of coefficient of correlation using regression coefficient:
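
The standard relationship behind this computation is r = ±√(bxy × byx), where r takes the common sign of the two regression coefficients. The Python sketch below illustrates it under stated assumptions: the function names and the five data pairs are hypothetical, and only the numerator arithmetic (8 × 535) − (75 × 50) is taken from the text.

```python
from math import sqrt

def regression_coefficients(xs, ys):
    """Return (b_xy, b_yx): regression coefficients of X on Y and of Y on X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y          # shared by both coefficients
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)    # X on Y
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)    # Y on X
    return b_xy, b_yx

def r_from_regression(b_xy, b_yx):
    """r is the geometric mean of the two coefficients, carrying their common sign."""
    sign = 1 if b_yx >= 0 else -1
    return sign * sqrt(b_xy * b_yx)

# Numerator arithmetic quoted in the text.
source_numerator = 8 * 535 - 75 * 50

# Hypothetical data, used only to show the relationship numerically.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b_xy, b_yx = regression_coefficients(xs, ys)
r = r_from_regression(b_xy, b_yx)
print(source_numerator, b_xy, b_yx, round(r, 4))
```

Because both coefficients share the same numerator, they always have the same sign, which is why a single sign can be attached to the square root.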

Applications of Correlation and Regression

Some real-life applications of correlation and regression analysis:

Economics and Finance

In Stock Market Analysis

Portfolio Diversification: Investors rely on such correlation analysis to reveal the

Inflation Rate Forecasting: Furthermore, regression can be useful in predicting the

Unemployment Rate Forecasting: A regression analysis is also employed in

Advertising Effectiveness: The purpose of regression analysis is to examine how

Healthcare and Medicine

Application of Regression Analysis: Regression analysis is applied by the

Application of Correlation Analysis: The analyzing team relies on correlation

Regression Analysis in Academic Achievement: A school district is concerned
