r23 p & s Unit 2 Material


UNIT – 2 :: PART – 1/3 :: CORRELATION

Syllabus:- Correlation – Correlation Coefficient – Rank correlation – Regression Coefficients and
properties – Regression lines – Method of least squares – Straight line – Parabola – Exponential – Power
Curves.

Introduction :- Correlation is a statistical measure that represents the relation between two statistical
variables. The relationship between the two random variables is measured by a single number “r”, called
the correlation coefficient. It is a numerical measure of the linear relationship between the variables.

Q: Define Correlation. Explain briefly the different types of Correlation?

Ans: Correlation :- If two quantities are related in such a way that a change in one of them is
accompanied by a change in the other, then the quantities are said to be correlated, and the
relationship between them is called “Correlation”. The measurement of the degree of relationship
between these quantities is called “Correlation Analysis”.

Ex. 1] Income and Expenditure

2] Rainfall and agricultural production.

3] Job experience and Job opportunity.

Approximate degrees of correlation are listed below.

                                              “r” value for           “r” value for
Degree of Correlation                         Positive Correlation    Negative Correlation
1. Perfect Correlation                        +1                      -1
2. Very High degree of Correlation            +0.9 to +1              -1 to -0.9
3. Sufficiently high degree of Correlation    +0.75 to +0.9           -0.9 to -0.75
4. Moderate degree of Correlation             +0.6 to +0.75           -0.75 to -0.6
5. Only possibility of Correlation            +0.3 to +0.6            -0.6 to -0.3
6. Possibly no Correlation                    0 to +0.3               -0.3 to 0
7. Absence of Correlation                     0                       0

Properties of Correlation Coefficient (r):

1. Correlation Coefficient lies between -1 and +1.


2. Two independent variables are uncorrelated.

Types of Correlation :- Correlation is classified in three different ways. They are

1. Based on the direction of change of the variables.
2. Based on the number of variables being studied.
3. Based on the ratio of change between the variables being studied.

Based on the Direction of change of the variable :-

(a) Positive Correlation:- If the change is in the same direction i.e., if a variable increases then the
other variable also increases. And, if a variable decreases then the other variable also
decreases. Then, this type of correlation is said to be a positive correlation.

x 15 29 38 40 55 71
y 11 14 18 35 44 65
(b) Negative Correlation:- If the change is in opposite direction i.e., if a variable increases then the
other variable decreases. And if a variable decreases then the other variable increases. This
type of correlation is said to be a “Negative Correlation”.

x 72 53 45 39 29 13
y 8 18 28 38 48 58

Based on the Number of variables being Identified:

(a) Simple Correlation:- When only two variables are studied, the relationship between them
is said to be “Simple Correlation”.
(b) Partial Correlation:- The study of two variables while excluding the effect of some other
variables is called partial correlation. For example, we study price and demand, eliminating the supply side.
(c) Multiple Correlation:- When more than two variables are studied and only the relationship
between a single dependent variable and multiple independent variables is considered,
this type of correlation is said to be “Multiple Correlation”.

Based on the ratio of change between the variables that are being identified:

(a) Linear Correlation:- If the ratio of change between two variables is constant, then
the relationship between them is said to be Linear Correlation.

(b) Non-Linear Correlation:- If the ratio of change between two variables is not constant,
i.e., the amount of change in one variable does not bear a constant ratio to the amount of
change in the other, then this type of relationship is said to be Non-Linear Correlation
(or) Curvilinear Correlation.

Coefficient of Correlation:- Correlation is a statistical technique used for analyzing the behavior of two
or more variables. Its analysis deals with the association between two or more variables. Statistical
measures of correlation relate to co-variation between series, but not to a functional or causal relationship.

Karl Pearson’s Coefficient of Correlation:-

Karl Pearson’s Coefficient of Correlation between the variables x and y is denoted by r or r(x, y)
or $r_{xy}$ and is defined as

\[ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} = \frac{\sum XY}{N \sigma_x \sigma_y} = \frac{\sum XY}{\sqrt{\sum X^2} \cdot \sqrt{\sum Y^2}} \]

Where $X = x - \bar{x}$ and $Y = y - \bar{y}$
$\bar{x}$ = Mean of the x-series; $\bar{y}$ = Mean of the y-series
$\sigma_x$ = S.D. of the x-series;
$\sigma_y$ = S.D. of the y-series.

N = Number of paired observations.

Note:- 1] The above formula $r = \frac{\sum XY}{\sqrt{\sum X^2} \cdot \sqrt{\sum Y^2}}$ is used only when the mean values of the x and y series are
integers.

2] If x, y are random variables and a, b, c, d are any real numbers such that a ≠ 0 and c ≠ 0, then

\[ r(ax + b, \; cy + d) = \frac{a c}{|a c|} \, r(x, y). \]

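The deviation form of the formula can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the sample data are assumptions, not part of the notes, and the last lines check Note 2 for a = -2, c = 3 (so the sign of r should flip).

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r from deviations X = x - mean(x), Y = y - mean(y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))  # sum of X.Y
    sum_xx = sum(dx * dx for dx in dev_x)                  # sum of X^2
    sum_yy = sum(dy * dy for dy in dev_y)                  # sum of Y^2
    return sum_xy / (math.sqrt(sum_xx) * math.sqrt(sum_yy))

# Illustrative data only (not taken from the exercises).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)

# Note 2 with a = -2, b = 7, c = 3, d = -1: a.c < 0, so r changes sign.
r_scaled = pearson_r([-2 * x + 7 for x in xs], [3 * y - 1 for y in ys])
print(round(r, 4), round(r_scaled, 4))
```

The function divides by zero if either series is constant, since the standard deviation is then 0 and r is undefined.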
Q1] Find if there is any significant correlation between the heights and weights given below:

Height in inches 57 59 62 63 64 65 55 58 57
Weight in lbs. 113 117 126 126 130 129 111 116 112

Q2] Psychological tests of intelligence and of engineering ability were applied to 10 students. Here is a
record of ungrouped data showing intelligence ratio (I.R.) and engineering ratio (E.R.). Calculate the
coefficient of correlation.

Student A B C D E F G H I J
I.R. 105 104 102 101 100 99 98 96 93 92
E.R. 101 103 100 98 95 96 104 92 97 94

Q3] Find Karl Pearson’s coefficient of correlation from the following data:

Wages 100 101 102 102 100 99 97 98 96 95
Cost of living 98 99 99 97 95 92 95 94 90 91

When deviations are taken from an Assumed Mean:-

When the actual mean is not an integer but a fraction, or when the series is large, calculation by the
direct method takes a lot of time. To avoid such tedious calculation, we can use the assumed
mean method.

\[ r = \frac{\sum XY - \dfrac{\sum X \cdot \sum Y}{N}}{\sqrt{\sum X^2 - \dfrac{(\sum X)^2}{N}} \cdot \sqrt{\sum Y^2 - \dfrac{(\sum Y)^2}{N}}} \]

Where $X = x - A_x$ ; $Y = y - A_y$ ; $A_x$, $A_y$ = assumed means of the x-series and y-series; N = No. of paired observations.
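As a quick check, the following Python sketch evaluates the assumed-mean formula for two different choices of assumed means and confirms that r does not depend on the choice. The function name, the data and the assumed means are illustrative assumptions only.

```python
import math

def pearson_r_assumed_mean(xs, ys, a_x, a_y):
    """r from deviations X = x - a_x, Y = y - a_y taken from assumed means."""
    n = len(xs)
    X = [x - a_x for x in xs]
    Y = [y - a_y for y in ys]
    sum_X, sum_Y = sum(X), sum(Y)
    sum_XY = sum(u * v for u, v in zip(X, Y))
    sum_XX = sum(u * u for u in X)
    sum_YY = sum(v * v for v in Y)
    numerator = sum_XY - sum_X * sum_Y / n
    denominator = (math.sqrt(sum_XX - sum_X ** 2 / n)
                   * math.sqrt(sum_YY - sum_Y ** 2 / n))
    return numerator / denominator

# Illustrative data whose actual means (3.2 and 4.2) are fractions.
xs = [1, 2, 3, 4, 6]
ys = [2, 4, 5, 4, 6]
r_a = pearson_r_assumed_mean(xs, ys, 3, 4)  # assumed means 3 and 4
r_b = pearson_r_assumed_mean(xs, ys, 0, 0)  # assumed means 0 and 0
print(round(r_a, 4), round(r_b, 4))
```

Both calls should print the same value, which is why any convenient integer near the actual mean can be used as the assumed mean.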

Q4] Calculate Karl Pearson’s correlation coefficient for the following data:
x 28 41 40 38 35 33 40 32 36 33
y 23 34 33 34 30 26 28 31 36 38
Q5] Find out the coefficient of correlation in the following case:

Height of father [in inches] 65 66 67 64 68 69 71 73
Height of son [in inches] 67 68 64 68 72 70 69 70

Q6] Calculate Karl Pearson’s correlation coefficient for the following data:

x 38 45 46 38 35 38 46 32 36 38
y 28 34 38 34 36 26 28 29 25 36

Rank Correlation Coefficient :- The British psychologist Charles Edward Spearman developed a method
of finding the coefficient of correlation by ranks. It is based on the ranks given to the observations.
Rank correlation is applicable only to individual observations. The formula for Spearman’s rank
correlation is given by

\[ \rho = 1 - \frac{6 \sum D^2}{N (N^2 - 1)} \]

Where ρ : Rank coefficient of correlation
D : Difference between the two ranks of a pair
$\sum D^2$ : Sum of the squares of the differences of the two ranks
N : Number of paired observations

Properties of Rank Correlation coefficient :-

1. The value of rank correlation lies between +1 and -1.
2. If rank correlation is +1, there is complete agreement in the order of the ranks and the direction
of the ranks is the same.
3. If rank correlation is -1, there is complete disagreement in the order of the ranks and they
are in opposite directions.
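A minimal Python sketch of Spearman's formula for untied ranks follows. The function name and the rank lists are illustrative assumptions; the last two calls reproduce properties 2 and 3 (identical ranks give +1, reversed ranks give -1).

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rank correlation for two lists of untied ranks."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two rank lists of equal length (at least 2)")
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))  # sum of D^2
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Illustrative ranks only (not taken from the exercises).
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])   # sum D^2 = 4
rho_same = spearman_rho([1, 2, 3], [1, 2, 3])          # complete agreement
rho_reversed = spearman_rho([1, 2, 3], [3, 2, 1])      # complete disagreement
print(rho, rho_same, rho_reversed)
```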

Q7] A random sample of 5 college students is selected and their grades in Mathematics and Statistics are
found to be as given below. Calculate the rank correlation coefficient.

Student A B C D E
Mathematics 85 60 73 40 90
Statistics 93 75 65 50 80

Q8] Following are the ranks obtained by 10 students in two subjects x and y. Find the rank correlation
between x and y.

x 1 2 3 4 5 6 7 8 9 10
y 2 4 1 5 3 9 7 10 6 8

Q9] Calculate the coefficient of correlation between x and y from the following data:

x 1 2 3 4 5 6 7
y 2 4 5 3 8 6 7

Q10] The ranks of the same 16 students in Mathematics and Physics are as follows. The two numbers within
brackets denote the ranks of a student in Mathematics and Physics: (1,1), (2,10), (3,3), (4,4), (5,5),
(6,7), (7,2), (8,6), (9,8), (10,11), (11,15), (12,9), (13,14), (14,12), (15,16), (16,13). Calculate the rank
correlation coefficient for proficiencies of this group in Mathematics and Physics.

Rank Correlation for Equal or Repeated Ranks:- If two or more persons are bracketed equal in any
classification, or if there is more than one item with the same value in the series, then Spearman’s
formula for calculating the rank correlation coefficient breaks down. In this case common ranks are
given to the repeated items. The common rank is the average of the ranks which these items would have
assumed if they were different from each other, and the next item gets the rank next to the ranks
already assumed.

For example, if two individuals are tied for the 7th and 8th places, each of them is given
the rank 7.5 and the next rank will be 9. Similarly, if 3 are ranked equal at the 7th, 8th and
9th places, then each is given the common rank (7+8+9)/3 = 8 and the next rank will be 10.
We use a slightly different formula, shown below:

\[ \rho = 1 - \frac{6 \left[ \sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \cdots \right]}{N (N^2 - 1)} \]

where m = the number of items whose ranks are common; one term $\frac{1}{12}(m^3 - m)$ is added for each
group of tied ranks.
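The tied-rank procedure can be sketched in Python as below. The helper names are hypothetical, the data are illustrative, and the ranking direction (rank 1 for the largest value) is an assumption, since the notes do not fix it; using the opposite direction for both series gives the same ρ.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest value (assumed); tied values share the average rank."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (m^3 - m)/12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_rho_tied(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

# Illustrative data: the two 20s would occupy ranks 2 and 3, so each gets 2.5.
xs = [10, 20, 20, 30]
ys = [1, 2, 3, 4]
ranks_x = average_ranks(xs)
rho_tied = spearman_rho_tied(xs, ys)
print(ranks_x, rho_tied)
```

Here ΣD² = 0.5 and the single tie group (m = 2) contributes (8 - 2)/12 = 0.5, so ρ = 1 - 6(1.0)/60.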

Q11] From the following data, calculate the rank correlation coefficient after making adjustment for tied
ranks:

x 48 33 40 9 16 16 65 24 16 57
y 13 13 24 6 15 4 20 9 6 19

Q12] A sample of 12 fathers and their eldest sons gave the following data about their height in inches.

Father 65 63 67 64 68 62 70 66 68 67 69 71
Son 68 66 68 65 69 66 68 65 71 67 68 70

Q13] Obtain the rank correlation coefficient for the following data:

x 68 64 75 50 64 80 75 40 55 64
y 62 58 68 45 81 60 68 48 50 70

- oOo -
UNIT – 2 :: PART – 2/3 :: REGRESSION

REGRESSION:- The statistical method which helps us to estimate the unknown value of one variable
from the known value of the related variable is called regression.

The line describing the average relationship between two variables is known as the line of
regression. Nowadays the term estimating line is used instead of regression line.

REGRESSION EQUATION :- The regression equation is an algebraic expression of the regression line. The
standard form of the regression equation is y = a + bx, where a, b are constants. Here ‘a’ indicates the
value of y when x = 0. It is called the y-intercept. ‘b’ indicates the slope of the regression line and
gives a measure of the change in y for a unit change in x. It is also called the regression coefficient of y on x.
Thus, if we know the values of a and b, we can easily calculate the value of y for any given value of x.

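Deviations taken from the Arithmetic Means of X and Y :-

Regression equation of X on Y is
\[ x - \bar{x} = r \cdot \frac{\sigma_x}{\sigma_y} (y - \bar{y}) \]

Regression equation of Y on X is
\[ y - \bar{y} = r \cdot \frac{\sigma_y}{\sigma_x} (x - \bar{x}) \]

Where $\bar{x}$ = Mean of X-Series

$\bar{y}$ = Mean of Y-Series

r = Coefficient of correlation

$\sigma_x$ = S.D. of X-Series

$\sigma_y$ = S.D. of Y-Series

The regression coefficient of X on Y is denoted by $b_{xy}$ and defined as
\[ b_{xy} = r \cdot \frac{\sigma_x}{\sigma_y} = \frac{\sum XY}{\sum Y^2} \]

The regression coefficient of Y on X is denoted by $b_{yx}$ and defined as
\[ b_{yx} = r \cdot \frac{\sigma_y}{\sigma_x} = \frac{\sum XY}{\sum X^2} \]

Where $X = x - \bar{x}$ and $Y = y - \bar{y}$

Note: $b_{xy} \cdot b_{yx} = r \cdot \frac{\sigma_x}{\sigma_y} \cdot r \cdot \frac{\sigma_y}{\sigma_x} = r^2 \Rightarrow r = \pm\sqrt{b_{xy} \cdot b_{yx}}$ is the coefficient of correlation,
where r takes the common sign of the two regression coefficients.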
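The following Python sketch computes both regression coefficients from deviations about the actual means, recovers r from their product, and uses the line of Y on X to estimate y. The function names and the data are illustrative assumptions, not part of the notes.

```python
import math

def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy) using deviations from actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    X = [x - mean_x for x in xs]
    Y = [y - mean_y for y in ys]
    sum_XY = sum(u * v for u, v in zip(X, Y))
    b_yx = sum_XY / sum(u * u for u in X)   # regression coefficient of Y on X
    b_xy = sum_XY / sum(v * v for v in Y)   # regression coefficient of X on Y
    return mean_x, mean_y, b_yx, b_xy

# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
mean_x, mean_y, b_yx, b_xy = regression_coefficients(xs, ys)

# r = +/- sqrt(b_xy * b_yx), taking the common sign of the coefficients.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

def estimate_y(x):
    """Line of Y on X: y - mean_y = b_yx (x - mean_x)."""
    return mean_y + b_yx * (x - mean_x)

print(b_yx, b_xy, round(r, 4), estimate_y(6))
```

For this data ΣXY = 6, ΣX² = 10 and ΣY² = 6, so b_yx = 0.6 and b_xy = 1.0.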

Q14] Calculate the regression equation of Y on X from the following data, taking deviations from actual
means of X and Y, and Estimate the likely demand when the price is Rs. 20.

Price (Rs.) 10 12 13 12 16 15
Amount Demanded 40 38 43 45 37 43

Q15] Find the equations to the lines of regression for the following data and also find coefficient of
correlation.

x 1 2 3 4 5 6 7 8 9
y 9 8 10 12 11 13 14 16 15

Deviations taken from the Assumed Mean :-

If the actual mean is a fraction, we use this method. In this method we take deviations from the
assumed mean instead of the arithmetic mean.

Regression equation of X on Y is
\[ x - \bar{x} = r \cdot \frac{\sigma_x}{\sigma_y} (y - \bar{y}) \]

Regression equation of Y on X is
\[ y - \bar{y} = r \cdot \frac{\sigma_y}{\sigma_x} (x - \bar{x}) \]

Where $\bar{x}$ = Mean of X-Series

$\bar{y}$ = Mean of Y-Series

r = Coefficient of correlation

$\sigma_x$ = S.D. of X-Series

$\sigma_y$ = S.D. of Y-Series

The regression coefficient of X on Y is denoted by $b_{xy}$ and defined as
\[ b_{xy} = r \cdot \frac{\sigma_x}{\sigma_y} = \frac{\sum XY - \dfrac{\sum X \cdot \sum Y}{N}}{\sum Y^2 - \dfrac{(\sum Y)^2}{N}} \]

The regression coefficient of Y on X is denoted by $b_{yx}$ and defined as
\[ b_{yx} = r \cdot \frac{\sigma_y}{\sigma_x} = \frac{\sum XY - \dfrac{\sum X \cdot \sum Y}{N}}{\sum X^2 - \dfrac{(\sum X)^2}{N}} \]

Where $X = x - A_x$ and $Y = y - A_y$

$A_x$ = Assumed Mean of X-Series and $A_y$ = Assumed Mean of Y-Series

Q16] The heights of mothers and daughters are given in the following table. From the table, estimate
the expected average height of daughter when the height of the mother is 64.5 inches by using
regression equation.

Height of the Mother (inches) 62 63 64 64 65 66 68 70
Height of the daughter (inches) 64 65 61 69 67 68 71 65

Q17] Fit a regression line of Y on X for the following data

x 1 2 3 4 5 6 7 8 9 10
y 10 9 8 7 6 5 4 3 2 1

Q18] Price indices of cotton and wool are given below for the 12 months of a year. Obtain the equations
of lines of regression between the indices.

Price index of cotton (x) 78 77 85 88 87 82 81 77 76 83 97 93
Price index of wool (y) 84 82 82 85 89 90 88 92 83 89 98 99

Q19] The following data, based on 450 students, are given for marks in Statistics and Economics at a
certain examination.
Mean Marks in Statistics = 40
Mean Marks in Economics = 8
S.D. of Marks in Statistics = 12
Variance of Marks in Economics = 256
Sum of the products of deviations of marks from their respective means = 42075.
Find the equations of the two lines of regression and estimate the average marks in Economics of
candidates who obtained 50 marks in Statistics.

Q20] Find the most likely production corresponding to a rainfall of 40 from the following data.
                              Rainfall (x)    Production (y)
Average                       30              500 Kgs.
S.D.                          5               100 Kgs.
Coefficient of Correlation    0.8

Q21] In a partially destroyed laboratory record of an analysis of correlation data, only the following are
legible: Variance of X = 9; Regression equations 8x – 10y + 66 = 0, 40x – 18y = 214. What are (i) the
mean values of X and Y, (ii) the correlation coefficient between X and Y and (iii) the S.D. of Y?

Q22] Two random variables have the regression lines with equations 3x + 2y = 26 and 6x + y = 31. Find
the mean values and the correlation coefficient between x and y.

Q23] The equations of two regression lines are 7x – 16y + 9 = 0 and 5y – 4x – 3 = 0, find the coefficient of
correlation and the means of x and y.

Angle between two regression Lines :-


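Let the lines of regression of X on Y and Y on X be given by
\[ x - \bar{x} = r \cdot \frac{\sigma_x}{\sigma_y} (y - \bar{y}) \quad \text{and} \quad y - \bar{y} = r \cdot \frac{\sigma_y}{\sigma_x} (x - \bar{x}). \]

The slopes of the lines are
\[ m_1 = \frac{1}{r} \cdot \frac{\sigma_y}{\sigma_x} \quad \text{and} \quad m_2 = r \cdot \frac{\sigma_y}{\sigma_x}. \]

Let θ be the angle between the above two lines. Then
\[ \tan\theta = \frac{m_1 - m_2}{1 + m_1 m_2} = \frac{\dfrac{1}{r} \cdot \dfrac{\sigma_y}{\sigma_x} - r \cdot \dfrac{\sigma_y}{\sigma_x}}{1 + \dfrac{\sigma_y^2}{\sigma_x^2}} = \frac{\dfrac{\sigma_y}{\sigma_x} \left( \dfrac{1}{r} - r \right)}{\dfrac{\sigma_x^2 + \sigma_y^2}{\sigma_x^2}} = \frac{\sigma_y}{\sigma_x} \cdot \frac{\sigma_x^2}{\sigma_x^2 + \sigma_y^2} \cdot \left( \frac{1 - r^2}{r} \right) \]

\[ \tan\theta = \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \cdot \left( \frac{1 - r^2}{r} \right) \]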
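As a numerical check of the derivation, this Python sketch evaluates tan θ both from the final formula and directly from the two slopes. The function names and the values σx = 3, σy = 4, r = 0.5 are illustrative assumptions.

```python
def tan_theta(sigma_x, sigma_y, r):
    """tan(theta) between the two regression lines (final formula)."""
    return (sigma_x * sigma_y / (sigma_x ** 2 + sigma_y ** 2)) * ((1 - r * r) / r)

def tan_theta_from_slopes(sigma_x, sigma_y, r):
    """Same quantity via (m1 - m2) / (1 + m1*m2)."""
    m1 = (1 / r) * (sigma_y / sigma_x)   # slope of the line of X on Y
    m2 = r * (sigma_y / sigma_x)         # slope of the line of Y on X
    return (m1 - m2) / (1 + m1 * m2)

t_formula = tan_theta(3, 4, 0.5)               # (12/25) * (0.75/0.5)
t_slopes = tan_theta_from_slopes(3, 4, 0.5)
t_perfect = tan_theta(3, 4, 1.0)               # r = 1: the two lines coincide
print(t_formula, t_slopes, t_perfect)
```

Both functions raise `ZeroDivisionError` for r = 0, which corresponds to the two regression lines being perpendicular (tan θ undefined).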

Q24] If Ɵ is the angle between two regression lines and S.D. of Y is twice the S.D. of X and r = 0.25, find
tan Ɵ.

Q25] If $\sigma_x = \sigma_y = \sigma$ and the angle between the regression lines is $\tan^{-1}\left(\frac{4}{3}\right)$, then find ‘r’.

Q26] The tangent of the angle between two regression lines is 0.6 and if $\sigma_x = \frac{1}{2}\sigma_y$, find the correlation
coefficient between x and y. (Ans. r = 0.5).

UNIT – 2 :: PART – 3/3 :: CURVE FITTING

LEAST-SQUARES CURVE FITTING PROCEDURES :- The usual way to fit a mathematical equation to
experimental data is to plot the data on graph paper and draw a straight line through the plotted
points. A line drawn by eye is not unique, however. The method of least squares is the most systematic
procedure to fit a unique curve through the given data points.

Let the set of data points be

x x1 x2 x3 ………….. xn
y y1 y2 y3 ………….. yn

Suppose the curve y = f(x) is fitted to this data.

Let the observed value at $x = x_i$ be $y_i$ and the corresponding value on the curve be $f(x_i)$. Let $e_i$ be the error of
approximation at $x = x_i$; then we have $e_i = y_i - f(x_i)$.

Consider
\[ S = [y_1 - f(x_1)]^2 + [y_2 - f(x_2)]^2 + \cdots + [y_n - f(x_n)]^2 = e_1^2 + e_2^2 + \cdots + e_n^2 . \]

The method of least squares consists of minimizing S.

Fitting a Straight Line: -


Let y = a + bx be a straight line to be fitted to the given data. Consider the following data for
fitting the above straight line.
x x1 x2 x3 ………….. xn
y y1 y2 y3 ………….. yn
In the above straight line, x and y are variables, and a and b are two unknowns. To
calculate the two unknowns a and b, we need the following equations, derived from the least squares
method by minimizing the error S.
\[ \sum y = n a + b \sum x \]
\[ \sum xy = a \sum x + b \sum x^2 \]
These equations are called Normal equations to fit the straight line y = a + bx.
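The two normal equations form a 2×2 linear system in a and b, so they can be solved in closed form. The Python sketch below does this with Cramer's rule; the function name `fit_line` and the data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Solve the normal equations for y = a + b*x and return (a, b)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Normal equations:
    #   sum_y  = n*a     + b*sum_x
    #   sum_xy = a*sum_x + b*sum_xx
    det = n * sum_xx - sum_x * sum_x
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    a = (sum_y * sum_xx - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b

# Illustrative data lying exactly on y = 1 + 2x.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

For this data n = 4, Σx = 6, Σy = 16, Σxy = 34 and Σx² = 14, which gives a = 1 and b = 2.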
Q27] Fit a straight line for the following data:
x 6 7 7 8 8 8 9 9 10
y 5 5 4 5 4 3 4 3 3

Q28] By the method of least squares, find the straight line best fits for the following data:
x 0 5 10 15 20 25
y 12 15 17 22 24 30

Q29] The temperatures T [in degrees Centigrade] and lengths L [in mm] of a heated rod are given below.
If L = a + b T, Find the best values for a and b.
T 20 30 40 50 60 70
L 800.3 800.4 800.6 800.7 800.9 801.0

Q30] A chemical company, wishing to study the effect of extraction time on the efficiency of an operation,
obtained the data shown in the following table. Fit a straight line to the given data by the method of
least squares.
Extraction time minutes (x) 27 45 41 19 3 39 19 49 15 31
Efficiency (y) 57 64 80 46 62 72 52 77 57 68

Fitting a Parabola or Second degree Polynomial :-

Let $y = a + bx + cx^2$ be a parabola (second degree polynomial) to be fitted to the given data. Consider the
following given data.

x x1 x2 x3 ………….. xn
y y1 y2 y3 ………….. yn

In the above parabola, x and y are variables, and a, b, c are three unknowns. To calculate the
three unknowns a, b, c, we need the following equations, derived from the least squares method by
minimizing the error S.

\[ \sum y = n a + b \sum x + c \sum x^2 \]
\[ \sum xy = a \sum x + b \sum x^2 + c \sum x^3 \]
\[ \sum x^2 y = a \sum x^2 + b \sum x^3 + c \sum x^4 \]
These equations are called Normal equations to fit the parabola $y = a + bx + cx^2$.
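The three normal equations can be assembled from the power sums and solved as a 3×3 linear system. The sketch below uses a small Gaussian elimination written out by hand so that it needs no libraries; the helper names and the data are illustrative assumptions.

```python
def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):   # back substitution
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution

def fit_parabola(xs, ys):
    """Solve the normal equations for y = a + b*x + c*x^2; return (a, b, c)."""
    s = [sum(x ** p for x in xs) for p in range(5)]   # s[p] = sum of x^p, s[0] = n
    t = [sum((x ** p) * y for x, y in zip(xs, ys)) for p in range(3)]
    matrix = [[s[0], s[1], s[2]],
              [s[1], s[2], s[3]],
              [s[2], s[3], s[4]]]
    a, b, c = solve_linear(matrix, t)
    return a, b, c

# Illustrative data lying exactly on y = 1 + 2x + 3x^2.
a, b, c = fit_parabola([0, 1, 2, 3, 4], [1, 6, 17, 34, 57])
print(round(a, 6), round(b, 6), round(c, 6))
```

Because the sample points lie exactly on a parabola, the fit should recover a = 1, b = 2, c = 3 up to rounding error.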

Q31] Fit a parabola of the form $y = a + bx + cx^2$ to the following data:

x 10 15 20 25 30 35
y 35.3 32.4 29.2 26.1 23.2 20.5

Q32] By the method of least squares, fit a parabola of the form $y = a + bx + cx^2$ for the following data:
x 2 4 6 8 10
y 3.07 12.85 31.47 57.38 91.29

Q33] Fit a second degree polynomial to the following data by the method of least squares.
x 10 12 15 23 20
y 14 17 23 25 21

Q34] Fit a parabola of the form $y = a + bx + cx^2$ to the following data:

x 1 2 3 4 5 6 7
y 2.3 5.2 9.7 16.5 29.4 35.5 54.5

Exponential Curve $y = a e^{bx}$ :-

Suppose the curve to be fitted to the given data is $y = a e^{bx}$.

\[ \Rightarrow \log_e y = \log_e (a e^{bx}) \]
\[ \Rightarrow \log_e y = \log_e a + \log_e (e^{bx}) \]
\[ \Rightarrow \log_e y = \log_e a + bx \log_e e \]
\[ \Rightarrow \log_e y = \log_e a + bx \]
\[ \Rightarrow Y = A + bx, \quad \text{where } Y = \log_e y ; \; A = \log_e a \]

Normal Equations are

\[ \sum Y = A n + b \sum x \]
\[ \sum xY = A \sum x + b \sum x^2 \]
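The log transformation turns the exponential fit into a straight-line fit of Y = logₑ y on x. A minimal Python sketch, with a hypothetical function name and illustrative data generated from a = 2, b = 0.5:

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a*exp(b*x) through the normal equations for Y = A + b*x."""
    n = len(xs)
    Y = [math.log(y) for y in ys]          # Y = log_e y, requires y > 0
    sum_x = sum(xs)
    sum_Y = sum(Y)
    sum_xY = sum(x * v for x, v in zip(xs, Y))
    sum_xx = sum(x * x for x in xs)
    det = n * sum_xx - sum_x * sum_x
    A = (sum_Y * sum_xx - sum_x * sum_xY) / det
    b = (n * sum_xY - sum_x * sum_Y) / det
    return math.exp(A), b                  # a = e^A

# Illustrative data generated from y = 2 * e^(0.5 x).
xs = [0, 1, 2, 3]
ys = [2 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(round(a, 6), round(b, 6))
```

Note that this procedure minimizes the squared errors in logₑ y rather than in y itself, so for noisy data it generally differs from a direct least-squares fit of the exponential.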
Q35] Determine the constants a and b by the method of least squares such that $y = a e^{bx}$.
x 2 4 6 8 10
y 4.077 11.084 30.128 81.897 222.62

Q36] Fit the curve of the form $y = a e^{bx}$.

x 77 100 185 239 285
y 2.4 3.4 7.0 11.1 19.6

Q37] Fit the curve of the form $y = a e^{bx}$.
x 0 1 2 3
y 1.05 2.10 3.85 8.30

Power Curve $y = a b^x$ :-

Suppose the curve to be fitted to the given data is $y = a b^x$.

\[ \Rightarrow \log_e y = \log_e (a b^x) \]
\[ \Rightarrow \log_e y = \log_e a + \log_e (b^x) \]
\[ \Rightarrow \log_e y = \log_e a + x \log_e b \]
\[ \Rightarrow Y = A + Bx, \quad \text{where } Y = \log_e y ; \; A = \log_e a ; \; B = \log_e b \]

Normal Equations are

\[ \sum Y = A n + B \sum x \]
\[ \sum xY = A \sum x + B \sum x^2 \]
Q38] Fit the curve of the form $y = a b^x$.

x 2 3 4 5 6
y 8.3 15.4 33.1 65.2 127.4

Q39] Fit the curve of the form $y = a b^x$.

x 0 1 2 3 4 5 6 7
y 10 21 35 59 92 200 400 610

Power Curve $y = a x^b$ :-

Suppose the curve to be fitted to the given data is $y = a x^b$.

\[ \Rightarrow \log_e y = \log_e (a x^b) \]
\[ \Rightarrow \log_e y = \log_e a + \log_e (x^b) \]
\[ \Rightarrow \log_e y = \log_e a + b \log_e x \]
\[ \Rightarrow Y = A + bX, \quad \text{where } Y = \log_e y ; \; A = \log_e a ; \; X = \log_e x \]

Normal Equations are

\[ \sum Y = A n + b \sum X \]
\[ \sum XY = A \sum X + b \sum X^2 \]
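Here both variables are transformed, so the straight line is fitted to (logₑ x, logₑ y). A short Python sketch under the same assumptions as before (hypothetical function name, illustrative data generated from a = 2, b = 1.5):

```python
import math

def fit_power(xs, ys):
    """Fit y = a * x**b through the normal equations for Y = A + b*X."""
    n = len(xs)
    X = [math.log(x) for x in xs]          # X = log_e x, requires x > 0
    Y = [math.log(y) for y in ys]          # Y = log_e y, requires y > 0
    sum_X = sum(X)
    sum_Y = sum(Y)
    sum_XY = sum(u * v for u, v in zip(X, Y))
    sum_XX = sum(u * u for u in X)
    det = n * sum_XX - sum_X * sum_X
    A = (sum_Y * sum_XX - sum_X * sum_XY) / det
    b = (n * sum_XY - sum_X * sum_Y) / det
    return math.exp(A), b                  # a = e^A

# Illustrative data generated from y = 2 * x^1.5.
xs = [1, 2, 3, 4, 5]
ys = [2 * x ** 1.5 for x in xs]
a, b = fit_power(xs, ys)
print(round(a, 6), round(b, 6))
```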
Q40] Fit the curve of the form $y = a x^b$.

x 3 4 5 6 7
y 6 9 10 11 12

Q41] Fit the curve of the form $y = a x^b$.

x 1 2 3 4 5 6
y 151 100 61 50 20 8

Power Curve $x y^a = b$ :-

Suppose the curve to be fitted to the given data is $x y^a = b$.

\[ \Rightarrow \log_e (x y^a) = \log_e b \]
\[ \Rightarrow \log_e x + \log_e (y^a) = \log_e b \]
\[ \Rightarrow \log_e x + a \log_e y = \log_e b \]
\[ \Rightarrow X = B + (-a) Y \]
\[ \Rightarrow X = B + AY, \quad \text{where } Y = \log_e y ; \; B = \log_e b ; \; X = \log_e x ; \; A = -a \]

Normal Equations are

\[ \sum X = B n + A \sum Y \]
\[ \sum XY = B \sum Y + A \sum Y^2 \]
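In this form X = logₑ x is regressed on Y = logₑ y, so the roles of the two variables are swapped compared with the earlier fits. The Python sketch below follows the normal equations above; the function name is hypothetical and the data are generated from a = 2, b = 8 purely for illustration.

```python
import math

def fit_x_y_pow_a(xs, ys):
    """Fit x * y**a = b through the normal equations for X = B + A*Y."""
    n = len(xs)
    X = [math.log(x) for x in xs]          # X = log_e x, requires x > 0
    Y = [math.log(y) for y in ys]          # Y = log_e y, requires y > 0
    sum_X = sum(X)
    sum_Y = sum(Y)
    sum_XY = sum(u * v for u, v in zip(X, Y))
    sum_YY = sum(v * v for v in Y)
    det = n * sum_YY - sum_Y * sum_Y
    B = (sum_X * sum_YY - sum_Y * sum_XY) / det
    A = (n * sum_XY - sum_Y * sum_X) / det
    return -A, math.exp(B)                 # a = -A, b = e^B

# Illustrative data satisfying x * y^2 = 8 exactly.
ys = [1, 2, 4]
xs = [8 / y ** 2 for y in ys]              # 8.0, 2.0, 0.5
a, b = fit_x_y_pow_a(xs, ys)
print(round(a, 6), round(b, 6))
```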
Q42] Fit the curve of the form $x y^a = b$.
x 200 150 100 50 30 10
y 1.1 1.4 1.7 2.8 4.6 6.4

Q43] Fit the curve of the form $x y^a = b$.
x 3.4567 1.4691 0.8905 0.6243 0.4740
y 1 2 3 4 5

** ** **
