Correlation and Regression
Prepared by: B. S. Parajuli
Correlation
• Correlation is a statistical tool used to measure the degree of association between two or more variables.
• Correlation is concerned only with the strength (and direction) of the relationship.
• Correlation does not imply any cause-and-effect relationship.
The following are the types of correlation:
i. Positive and Negative Correlation
ii. Linear and Non-linear Correlation
iii. Simple, Partial and Multiple Correlation
Positive Correlation:
The relationship between two variables is such that as one variable’s
values tend to increase, the other variable’s values also tend to increase,
or if one variable’s values tend to decrease, the other variable’s values
also tend to decrease.
That is, positive correlation indicates that the two variables' values move in the same direction.
Example: height and weight of children up to a certain age, demand and supply, income and expenditure of a person, etc.
Negative Correlation:
The relationship between two variables is such that as one variable’s
values tend to increase, the other variable’s values also tend to decrease,
or if one variable’s values tend to decrease, the other variable’s values
also tend to increase.
That is, negative correlation indicates that the two variables' values move in opposite directions.
Example: altitude and temperature, price and demand, etc.
Linear Correlation and Non-linear Correlation
The distinction between linear and non-linear correlation is based upon the constancy of the ratio of change between the variables.
Linear Correlation:
If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, then the correlation is said to be linear; if the ratio of change is not constant, the correlation is non-linear (curvilinear).
Example:
X:  10   20   30   40   50
Y:  70  140  210  280  350
Methods of Studying Correlation
Graphical method:
i. Scatter diagram
Algebraic methods:
i. Karl Pearson's correlation coefficient
ii. Spearman's rank correlation coefficient
iii. Correlation in a bivariate frequency table
Scatter Diagram:
[Figure: scatter plots illustrating different degrees and directions of correlation between X and Y.]
Karl Pearson’s Correlation Coefficient
Prof. Karl Pearson developed this method. Karl Pearson's correlation coefficient between two variables X and Y, denoted by 'r', is the ratio of the covariance between X and Y to the product of the standard deviations of X and Y.
Thus 'r' is a numerical measure of the linear relationship between the two variables: it is a number that indicates to what extent they are related.
$$ r = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}} = \frac{\operatorname{Cov}(X,Y)}{\sigma_x\,\sigma_y} $$
where
$$ \operatorname{Cov}(X,Y) = \frac{1}{n}\sum (x-\bar{x})(y-\bar{y}) $$
so that
$$ r = \frac{\frac{1}{n}\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\frac{1}{n}\sum (x-\bar{x})^2}\,\sqrt{\frac{1}{n}\sum (y-\bar{y})^2}} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2\,\sum (y-\bar{y})^2}} $$
The value of r ranges between ( -1) and ( +1) i.e. -1≤ r≤1
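As a quick computational illustration, here is a minimal Python sketch (standard library only) of the last formula above. The function name pearson_r and the sample data are illustrative; the data happen to be the X, Y values used in the regression example later in these slides.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient for paired samples."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # covariance and variances (the 1/n factors cancel in the ratio)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

X = [0, 2, 3, 5, 6]
Y = [5, 7, 8, 10, 12]
print(round(pearson_r(X, Y), 3))   # about 0.992
```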
Goodness of fit measure in terms of correlation coefficient
The square of the correlation coefficient, r², is called the coefficient of determination: it gives the proportion of the total variation in the dependent variable that is explained by the variation in the independent variable.
Case I: When actual ranks are given
Rank correlation coefficient:
$$ R = 1 - \frac{6\sum d^2}{n(n^2-1)} $$
where d = R1 − R2 and ∑d = 0 (always),
R1 = rank of the items with respect to the first variable,
R2 = rank of the items with respect to the second variable.
The limits of R are −1 to +1.
Case II: When ranks are not given and values are not repeated
In this case we first assign ranks to the data, from largest to smallest or from smallest to largest, as 1, 2 and so on. The procedure is then the same:
$$ R = 1 - \frac{6\sum d^2}{n(n^2-1)} $$
Case I: When actual ranks are given
Example: The rankings of ten students in Statistics and Microprocessor are as follows. Calculate the coefficient of rank correlation.

Statistics (X):      5  3  4  10  8  7  2  1   6  9
Microprocessor (Y):  6  4  9   8  1  2  3  10  5  7

Rank of X (R1)   Rank of Y (R2)   d = R1 − R2   d²
5                6                −1            1
3                4                −1            1
4                9                −5            25
10               8                 2            4
8                1                 7            49
7                2                 5            25
2                3                −1            1
1                10               −9            81
6                5                 1            1
9                7                 2            4
                                  ∑d = 0        ∑d² = 192

Now, rank correlation:
$$ R = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6\times 192}{10(10^2-1)} = 1 - \frac{1152}{990} = \frac{-162}{990} = -0.164 $$
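A minimal Python check of this calculation; the helper spearman_r below is illustrative and assumes the two rank lists contain no ties.

```python
def spearman_r(r1, r2):
    """Spearman's rank correlation coefficient from two lists of (untied) ranks."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

stats_rank = [5, 3, 4, 10, 8, 7, 2, 1, 6, 9]
micro_rank = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]
print(round(spearman_r(stats_rank, micro_rank), 3))   # -0.164
```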
Case II: Ranks are not given and not repeated
Example: The scores of 8 students in an examination in two subjects X and Y are as follows. Find the rank correlation coefficient.

X:  48  89  92  50  29  60  55  35
Y:  60  49  69  45  50  51  70  75

X    Rank of X (R1)   Y    Rank of Y (R2)   d = R1 − R2   d²
48   6                60   4                 2            4
89   2                49   7                −5            25
92   1                69   3                −2            4
50   5                45   8                −3            9
29   8                50   6                 2            4
60   3                51   5                −2            4
55   4                70   2                 2            4
35   7                75   1                 6            36
                                            ∑d = 0        ∑d² = 90

Now,
$$ R = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6\times 90}{8(8^2-1)} = 1 - \frac{540}{504} = \frac{-36}{504} = -0.071 $$
Case III: Repeated ranks are given (tied ranks)
Sometimes two or more of the values are equal. In such a case it is necessary to rank the tied items equally: each of them is given the average of the ranks they would otherwise occupy (a common rank).
For example, if the 2nd and 3rd items have the same value, then each of them gets the rank (2+3)/2 = 2.5, and the next item gets rank 4.
In this case an adjustment is made in Spearman's formula: the correction factor m(m²−1)/12 is added to ∑d² for each group of tied values, giving
$$ R = 1 - \frac{6\left\{\sum d^2 + \frac{m_1(m_1^2-1)}{12} + \frac{m_2(m_2^2-1)}{12} + \dots\right\}}{n(n^2-1)} $$
where m1, m2, … are the numbers of times an item is repeated.
Case III: Repeated ranks are given (tied ranks)
Example: From the following data, compute the coefficient of rank correlation between X and Y.

X:  33  56  50  65  44  38  44  50  15  26
Y:  50  35  70  25  35  58  75  60  55  26
Calculation
Score (X)   Rank of X (R1)   Score (Y)   Rank of Y (R2)   d = R1 − R2   d²
33          8                50          6                 2            4.00
56          2                35          7.5              −5.5          30.25
50          3.5              70          2                 1.5          2.25
65          1                25          10               −9            81.00
44          5.5              35          7.5              −2            4.00
38          7                58          4                 3            9.00
44          5.5              75          1                 4.5          20.25
50          3.5              60          3                 0.5          0.25
15          10               55          5                 5            25.00
26          9                26          9                 0            0.00
                                                          ∑d = 0        ∑d² = 176
Here, n = 10.
For series X: the value 50 is repeated m1 = 2 times, and the value 44 is repeated m2 = 2 times.
For series Y: the value 35 is repeated m3 = 2 times.
$$ R = 1 - \frac{6\left\{\sum d^2 + \frac{m_1(m_1^2-1)}{12} + \frac{m_2(m_2^2-1)}{12} + \frac{m_3(m_3^2-1)}{12}\right\}}{n(n^2-1)} $$
$$ R = 1 - \frac{6\left[176 + \frac{2(2^2-1)}{12} + \frac{2(2^2-1)}{12} + \frac{2(2^2-1)}{12}\right]}{10(10^2-1)} = 1 - \frac{6(176 + 1.5)}{990} = 1 - \frac{1065}{990} \approx -0.076 $$
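As a sketch of the tie-corrected calculation in Python, the helpers below (avg_ranks, spearman_tied) are illustrative names, not part of the slides; they rank in descending order, average tied ranks, and add one m(m²−1)/12 term per group of ties.

```python
from collections import Counter

def avg_ranks(values):
    """Rank values in descending order, giving tied values their average rank."""
    order = sorted(values, reverse=True)
    positions = {}
    for i, v in enumerate(order, start=1):
        positions.setdefault(v, []).append(i)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman_tied(xs, ys):
    """Spearman's R with the m(m^2 - 1)/12 correction for tied ranks."""
    n = len(xs)
    r1, r2 = avg_ranks(xs), avg_ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    # one correction term per group of repeated values in either series
    cf = sum(m * (m ** 2 - 1) / 12
             for counts in (Counter(xs), Counter(ys))
             for m in counts.values() if m > 1)
    return 1 - 6 * (d2 + cf) / (n * (n ** 2 - 1))

X = [33, 56, 50, 65, 44, 38, 44, 50, 15, 26]
Y = [50, 35, 70, 25, 35, 58, 75, 60, 55, 26]
print(round(spearman_tied(X, Y), 3))   # about -0.076
```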
Regression Analysis
Linear component: a line fitted to a set of data points to estimate the relationship between two variables is called a regression line (y = a + bx).
[Figure: the simple linear regression model Yi = β0 + β1Xi + εi, showing the observed value of Y for Xi, the predicted value of Y for Xi, the random error εi for that Xi value, the slope β1 and the intercept β0.]
The two lines of regression
[Figure: the regression line of Y on X (Y = a + bX) and the regression line of X on Y (X = a + bY), intersecting at the point of means (X̄, Ȳ).]
Least square method
Let Y = a + bX ….. (1) be the linear regression equation of Y on X,
where Y = dependent variable and X = independent variable,
a = constant or Y-intercept (the value of Y when X = 0),
b = byx = regression coefficient of Y on X; it measures the average change in the dependent variable Y corresponding to a unit change in the independent variable X.
By the principle of least squares, the two normal equations of regression equation (1) are:
∑Y = na + b∑X ………….(2)
∑XY = a∑X + b∑X² …….(3)
Solving equations (2) and (3) gives the values of 'a' and 'b'; substituting them in equation (1) gives the required fitted regression line.
Alternatively,
$$ b = b_{yx} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}, \qquad \text{also } b_{yx} = r\,\frac{\sigma_y}{\sigma_x} $$
$$ b = b_{yx} = \frac{\sum xy}{\sum x^2}, \quad \text{where } x = X-\bar{X} \text{ and } y = Y-\bar{Y} \text{ are deviations measured from the means.} $$
The computational formula for the y-intercept 'a' is:
$$ a = \bar{Y} - b\bar{X} = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} $$
After finding the values of 'a' and 'b', we get the required fitted linear regression model of Y on X as:
$$ \hat{Y} = a + bX $$
Alternatively, we can also use Y − Ȳ = byx (X − X̄).
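As a small sketch of these computational formulas in Python (the function name fit_y_on_x is illustrative; the sample data are the X, Y values of the worked example later in the slides):

```python
def fit_y_on_x(xs, ys):
    """Least-squares fit of Y = a + bX using the sum formulas above."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # b_yx
    a = sy / n - b * sx / n                          # a = Y-bar - b * X-bar
    return a, b

a, b = fit_y_on_x([0, 2, 3, 5, 6], [5, 7, 8, 10, 12])
print(round(a, 2), round(b, 2))   # about 4.81 and 1.12 (cf. the worked example below)
```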
Let X = a + bY ….. (1) be the linear regression equation of X on Y,
where X = dependent variable and Y = independent variable,
a = constant or X-intercept (the value of X when Y = 0),
b = bxy = regression coefficient of X on Y; it measures the average change in the dependent variable X corresponding to a unit change in the independent variable Y.
By the principle of least squares, the two normal equations of regression equation (1) are:
∑X = na + b∑Y ………….(2)
∑XY = a∑Y + b∑Y² …….(3)
Solving equations (2) and (3) gives the values of 'a' and 'b'; substituting them in equation (1) gives the required fitted regression line.
Alternatively,
$$ b = b_{xy} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum Y^2 - n\bar{Y}^2}, \qquad \text{also } b_{xy} = r\,\frac{\sigma_x}{\sigma_y} $$
$$ b = b_{xy} = \frac{\sum xy}{\sum y^2}, \quad \text{where } x = X-\bar{X} \text{ and } y = Y-\bar{Y} \text{ are deviations measured from the means.} $$
The computational formula for the x-intercept 'a' is:
$$ a = \bar{X} - b\bar{Y} = \frac{\sum X}{n} - b\,\frac{\sum Y}{n} $$
After finding the values of 'a' and 'b', we get the required fitted linear regression model of X on Y as:
$$ \hat{X} = a + bY $$
Alternatively, we can also use X − X̄ = bxy (Y − Ȳ).
$$ \text{Correlation coefficient } r = \pm\sqrt{b_{yx}\, b_{xy}} $$
If both regression coefficients are positive we take the positive sign, and if both are negative we take the negative sign.
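This identity follows directly from the σ-forms of the two regression coefficients given above:
$$ b_{yx}\, b_{xy} = \left(r\,\frac{\sigma_y}{\sigma_x}\right)\left(r\,\frac{\sigma_x}{\sigma_y}\right) = r^2 \quad\Longrightarrow\quad r = \pm\sqrt{b_{yx}\, b_{xy}} $$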
Example
From the following data, (i) compute the correlation coefficient, (ii) find the coefficient of determination, (iii) fit the regression equation of Y on X, (iv) estimate Y when X = 4, and (v) find the standard error of estimate.

X:  0  2  3  5  6
Y:  5  7  8  10  12
Solution

X    Y    X²   Y²    X·Y
0    5    0    25    0
2    7    4    49    14
3    8    9    64    24
5    10   25   100   50
6    12   36   144   72
∑X = 16   ∑Y = 42   ∑X² = 74   ∑Y² = 382   ∑XY = 160

$$ \bar{Y} = \frac{\sum Y}{n} = \frac{42}{5} = 8.4 \qquad \bar{X} = \frac{\sum X}{n} = \frac{16}{5} = 3.2 $$
(i)
$$ r = \frac{n\sum XY - \sum X\sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} = \frac{5\times 160 - 16\times 42}{\sqrt{5\times 74 - 16^2}\,\sqrt{5\times 382 - 42^2}} = \frac{128}{\sqrt{114\times 146}} = \frac{128}{129.01} = 0.992 $$
(ii) r² = (0.992)² = 0.984, i.e. 98.4% of the total variation in Y is explained by the variation in X.
(iii) Let Y = a + bX ….(i) be the regression equation of Y on X.
$$ b = \frac{n\sum XY - \sum X\sum Y}{n\sum X^2 - (\sum X)^2} = \frac{128}{114} = 1.12 \qquad a = \bar{Y} - b\bar{X} = 8.4 - 1.12\times 3.2 = 4.8 $$
Hence the fitted regression equation of Y on X is
$$ \hat{Y} = 4.8 + 1.12X $$
Interpretation of the regression coefficient:
Since b = 1.12, if X increases by 1 unit then, on average, Y increases by 1.12 units.
(iv) The value of Y when X = 4 is
$$ \hat{Y} = 4.8 + 1.12\times 4 = 9.28 $$
(v) Standard error of estimate:
$$ S_e = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n-2}} = \sqrt{\frac{382 - 4.8\times 42 - 1.12\times 160}{5-2}} = 0.63 $$
(using the rounded values a = 4.8 and b = 1.12; with unrounded coefficients Se ≈ 0.39).
Hence the average variability of the observations around the fitted regression line is about 0.63. Since Se ≠ 0, the regression line is not perfect for estimating the dependent variable.
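A compact Python sketch that reproduces parts (i)–(v) of this example from the raw data (variable names are illustrative; the small differences from the slide values come from rounding a and b):

```python
import math

X = [0, 2, 3, 5, 6]
Y = [5, 7, 8, 10, 12]
n = len(X)

sx, sy = sum(X), sum(Y)
sxx = sum(x * x for x in X)
syy = sum(y * y for y in Y)
sxy = sum(x * y for x, y in zip(X, Y))

# (i) correlation coefficient and (ii) coefficient of determination
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# (iii) regression of Y on X
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = sy / n - b * sx / n

# (iv) prediction at X = 4 and (v) standard error of estimate
y_hat_4 = a + b * 4
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))
se = math.sqrt(sse / (n - 2))

print(round(r, 3), round(r ** 2, 3))   # 0.992, 0.984
print(round(a, 2), round(b, 2))        # about 4.81, 1.12
print(round(y_hat_4, 2))               # about 9.3 (9.28 with the rounded coefficients)
print(round(se, 2))                    # about 0.39 (0.63 with the rounded coefficients)
```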
Correlation Analysis vs. Regression Analysis
1. Correlation analysis is the statistical tool used to describe the degree to which two variables are linearly related, whereas regression analysis is a measure expressing the average relationship between the two variables, whether they are linearly or non-linearly related.
2. The correlation coefficient (r) between two variables X and Y measures the direction and degree of the linear relationship between them, whereas regression analysis is concerned with establishing the functional relationship between the two variables and is used to predict or estimate the value of the dependent variable for any given value of the independent variable.
3. Correlation coefficients are symmetric, i.e. rXY = rYX, whereas regression coefficients are not symmetric, i.e. bYX ≠ bXY.
4. Correlation need not imply a cause-and-effect relationship between the variables under study, whereas regression clearly indicates a cause-and-effect relationship: the variables corresponding to cause and effect are the independent and dependent variables respectively.
5. The correlation coefficient (r) is a pure number, independent of the units of measurement, whereas regression coefficients are not pure numbers; they carry the units of measurement.
6. Correlation analysis has limited applications in comparison with regression analysis, which has much wider applications.
Measures of Variation
Total sum of squares = sum of squares due to regression + sum of squares due to error
TSS = SSR + SSE
For Y = a + bX:
Total sum of squares: the measure of variation of the values of the dependent variable (Y) around their mean value (Ȳ).
$$ TSS = \sum (Y - \bar{Y})^2 = \sum Y^2 - n\bar{Y}^2 $$
Regression sum of squares: the sum of the squared differences between the predicted values of Y and the mean value of Y.
$$ SSR = \sum (\hat{Y} - \bar{Y})^2 = a\sum Y + b\sum XY - n\bar{Y}^2 $$
Error sum of squares: the sum of the squared differences between the observed values of Y and the predicted values of Y.
$$ SSE = \sum (Y - \hat{Y})^2 = \sum Y^2 - a\sum Y - b\sum XY $$
Thus the coefficient of determination is r² = SSR/TSS, and since TSS = SSR + SSE, we also have SSR = TSS − SSE.
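A short Python check of these identities on the example data above (a sketch, not part of the slides):

```python
X = [0, 2, 3, 5, 6]
Y = [5, 7, 8, 10, 12]
n = len(X)

# coefficients of the fitted line Y-hat = a + b*X (as in the earlier example)
b = (n * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)) / (n * sum(x * x for x in X) - sum(X) ** 2)
a = sum(Y) / n - b * sum(X) / n

y_bar = sum(Y) / n
y_hat = [a + b * x for x in X]

tss = sum((y - y_bar) ** 2 for y in Y)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(Y, y_hat))

print(round(tss, 3), round(ssr + sse, 3))   # equal: TSS = SSR + SSE
print(round(ssr / tss, 3))                  # about 0.984 = r^2
```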
Standard Error of Estimate
The standard error of estimate of Y on X, denoted by Se, measures the average variation or scatter of the observed data points around the regression line.
It is used to measure the reliability of the regression equation. It is calculated by the following relation:
$$ S_e = \sqrt{MSE} = \sqrt{\frac{SSE}{n-k-1}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n-2}} \qquad \text{(for simple regression, } k = 1\text{)} $$
$$ S_e = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n-2}} $$
where n = number of pairs of observations and k = number of independent variables.
If Se = 0, there is no variation of the observed data around the regression line; in that case the regression line is perfect for estimating the dependent variable.
Residual (error):
The residual or error term for a given X in the model is the difference between the actual value of Y and the estimated or predicted value Ŷ; it is denoted by e:
$$ e = Y - \hat{Y} $$
Example: A computer operator is interested in knowing how the data rate of internet users depends upon the bandwidth. The following results were gathered by the operator.

Bandwidth:  17  35  41  19  25  20  10  15
Data rate:  47  64  68  50  60  55  30  33

a. Is there any association between bandwidth and data rate?
b. Fit a regression model to describe the given data and also interpret the estimated regression coefficient.
c. What percentage of the variation in data rate is explained by the variation in bandwidth?
d. Estimate the data rate when the bandwidth is 22.
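A sketch of how parts (a)–(d) could be worked through in Python, reusing the sum formulas given earlier (variable names are illustrative; no answers are stated on the slide, so the printed values are simply what the formulas give for this data):

```python
import math

bw   = [17, 35, 41, 19, 25, 20, 10, 15]   # bandwidth (X)
rate = [47, 64, 68, 50, 60, 55, 30, 33]   # data rate (Y)
n = len(bw)

sx, sy = sum(bw), sum(rate)
sxx = sum(x * x for x in bw)
syy = sum(y * y for y in rate)
sxy = sum(x * y for x, y in zip(bw, rate))

# (a) association between bandwidth and data rate: Pearson's r
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# (b) regression of data rate on bandwidth: Y = a + bX
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = sy / n - b * sx / n

# (c) percentage of variation explained, (d) estimated data rate at bandwidth 22
print(round(r, 3), round(r ** 2 * 100, 1), round(a, 2), round(b, 2), round(a + b * 22, 1))
```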
Example: You are given the following data:
X̄ = 20, Ȳ = 20, σX = 4, σY = 3, r = 0.7. Obtain the two regression equations and find the most likely value of Y when X = 24.
Solution hint: The regression line of Y on X is
$$ Y - \bar{Y} = r\,\frac{\sigma_Y}{\sigma_X}\,(X - \bar{X}) $$
(iii) Finding σy: given byx = 4/5, r = 6/10 and σx² = 9 (so σx = 3), the relation byx = r·σy/σx gives 4/5 = (6/10)(σy/3), hence σy = 4.
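For completeness, a sketch of the solution to the example above (these steps are not shown on the slide):
$$ b_{yx} = r\,\frac{\sigma_Y}{\sigma_X} = 0.7\times\frac{3}{4} = 0.525, \qquad Y - 20 = 0.525\,(X-20) \;\Rightarrow\; Y = 9.5 + 0.525X $$
$$ b_{xy} = r\,\frac{\sigma_X}{\sigma_Y} = 0.7\times\frac{4}{3} \approx 0.933, \qquad X - 20 = 0.933\,(Y-20) \;\Rightarrow\; X \approx 1.33 + 0.933Y $$
$$ \text{At } X = 24:\quad Y = 9.5 + 0.525\times 24 = 22.1 $$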
Example: For fifty files transmitted, the regression equation of time taken (Y) on file size (X) is 4Y − 5X − 8 = 0. The average size of the transmitted files is 40 GB, and the ratio of the standard deviations σY : σX is 5 : 2. Find the average time taken to transmit a file and the coefficient of correlation between the time and the size of the file.
Solution: Here we are given the number of files transmitted, n = 50; the regression line of time taken (Y) on file size (X), 4Y − 5X − 8 = 0; X̄ = 40; and σY/σX = 5/2.
We have to obtain Ȳ and rXY.
We know that the regression equation passes through the point of means (X̄, Ȳ), so
$$ 4\bar{Y} - 5\bar{X} = 8 \;\Rightarrow\; 4\bar{Y} - 5\times 40 = 8 \;\Rightarrow\; 4\bar{Y} = 208 \;\Rightarrow\; \bar{Y} = 52 $$
∴ The average time taken to transmit a file is Ȳ = 52.
Since the given line is Y on X,
$$ 4Y = 8 + 5X \;\Rightarrow\; Y = 2 + \frac{5}{4}X, $$
so the regression coefficient of Y on X is bYX = 5/4.
Again,
$$ b_{YX} = r\,\frac{\sigma_Y}{\sigma_X} \;\Rightarrow\; \frac{5}{4} = r\times\frac{5}{2} \;\Rightarrow\; r_{XY} = \frac{1}{2} = 0.5 $$