Maths 1
Statistical Technique 1
Unit: 1
Subject Name: Mathematics-IV
Subject Code: AAS0402
B Tech 4th Sem
Dr. Kunti Mishra
Department of Mathematics
NIET, Gr Noida
Qualifications :
M.Sc.(Maths), M. Tech.(Gold Medalist) in Applied and
Computational Mathematics, Ph.D
Data Analysis
Artificial intelligence
Network and Traffic modeling
PSO1 The ability to identify, analyze real world problems and design
their ethical solutions using artificial intelligence, robotics,
virtual/augmented reality, data analytics, block chain technology,
and cloud computing
PSO2 The ability to design and develop the hardware sensor devices and
related interfacing software systems for solving complex
engineering problems.
PSO3 The ability to understand inter disciplinary computing techniques
and to apply them in the design of advanced computing.
PSO4 The ability to conduct investigation of complex problem with the
help of technical, managerial, leadership qualities, and modern
engineering tools provided by industry sponsored laboratories.
1 CO1 H H H H L L L L L L L M
2 CO2 H H H H L L L L L L M M
3 CO3 H H H H L L L L L L M M
4 CO4 H H H H L L L L L L L M
5 CO5 H H H H L L L L L L M M
CO.1 H L M L
CO.2 L M L M
CO.3 M M M M
CO.4 H M M M
CO.5 H M M M
IOT IV A 49 45 91.83%
• Introduction
• Measures of central tendency: Mean, Median, Mode.
• Moment
• Skewness
• Kurtosis
• Curve Fitting
• Method of least squares
• Fitting of straight lines
• Fitting of second degree parabola
• Exponential curves
• Correlation and Rank correlation,
• Linear regression
• Nonlinear regression
• Multiple linear regression
Arithmetic Mean
Definition
The arithmetic mean of a set of observations is their sum divided by the
number of observations; that is, the arithmetic mean $\bar{x}$ of n observations
x1, x2, ..., xn is given by:
\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \]
In the case of a frequency distribution xi | fi, i = 1, 2, ..., n, where
fi is the frequency of the variable xi,
\[ \bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \quad \text{where } \sum_{i=1}^{n} f_i = N. \]
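The two formulas for the arithmetic mean can be checked numerically. The following is a minimal Python sketch; the function names and the small frequency table (x = 1, 2, 3 with f = 2, 3, 5) are illustrative assumptions, not data from this unit.

```python
def arithmetic_mean(values):
    """Mean of an individual series: sum of observations / number of observations."""
    return sum(values) / len(values)


def weighted_mean(values, freqs):
    """Mean of a frequency distribution: sum(f_i * x_i) / N, with N = sum(f_i)."""
    total = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / total


# Individual series
print(arithmetic_mean([3, 6, 8, 10, 18]))

# Hypothetical frequency distribution (illustration only)
print(weighted_mean([1, 2, 3], [2, 3, 5]))
```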
Median:
Definition: Median of a distribution is the value of the variable which
divides it into two equal parts.
It is the value such that the number of observations above it is equal to
the number of observations below it. The median is thus a positional
average.
Ungrouped Data:
• If the number of observations is odd then median is the middle
value after the values have been arranged in ascending or descending
order of magnitude.
• In case of even number of observations, there are two middle
terms and median is obtained by taking the arithmetic mean of
middle terms.
Example
1. Median of Values 25, 20, 15, 35, 18. Median: 20
2. Median of Values 8, 20, 50, 25, 15, 30. Median: 22.5
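The rule for ungrouped data (middle value for an odd count, mean of the two middle values for an even count) can be sketched in Python as follows. The function name `median` is chosen here for illustration; the two calls reuse the example values above.

```python
def median(values):
    """Median of ungrouped data: sort, then take the middle value
    (odd count) or the mean of the two middle values (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([25, 20, 15, 35, 18]))      # odd number of observations
print(median([8, 20, 50, 25, 15, 30]))   # even number of observations
```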
where $N = \sum_{i=1}^{n} f_i$.
i. Find the cumulative frequency (c.f.) just greater than $N/2$.
Here N = 120, so N/2 = 60. The cumulative frequency just greater than N/2 is 65 and
the value of x corresponding to 65 is 5. Therefore, the median is 5.
Mode:
• Mode is the value which occurs most frequently in a set of
observations and around which the other items of the set cluster
densely.
• It is the point of maximum frequency or the point of greatest
density.
• In other words the mode or modal value of the distribution is that
value of the variate for which frequency is maximum.
Calculation of Mode
In the case of a discrete distribution, the mode is the value of x
corresponding to the maximum frequency, except in any one (or more) of
the following cases.
where $l$ is the lower limit, $h$ the width and $f_m$ the frequency of the
modal class, and $f_1$ and $f_2$ are the frequencies of the classes preceding and
succeeding the modal class respectively. While applying the above
formula, it is necessary to check that the class intervals are of the same
size.
Size (x)    1     2     3     4     5     6
   4        2     7
   5        5          13
   6        8    17          15
   7        9          21          22    29
   8       12    26          35
   9       14          28          40    43
  10       14    29          40
  11       15          26          39
  12       11    24
  13       13
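\[ \text{Mode} = l + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h \]
\[ = 15.5 + \frac{32 - 16}{64 - 16 - 24} \times 5 \]
\[ = 15.5 + \frac{16}{24} \times 5 \]
\[ = 15.5 + \frac{10}{3} \]
\[ = 18.83 \text{ marks} \]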
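The modal-class formula can be wrapped in a small helper. This is a minimal Python sketch; the function name `grouped_mode` and its parameter names are assumptions made for illustration, and the call reuses the numbers from the calculation above (l = 15.5, h = 5, f_m = 32, f_1 = 16, f_2 = 24).

```python
def grouped_mode(l, h, fm, f1, f2):
    """Mode of a grouped distribution with equal class widths.

    l  : lower limit of the modal class
    h  : class width
    fm : frequency of the modal class
    f1 : frequency of the class preceding the modal class
    f2 : frequency of the class succeeding the modal class
    """
    return l + (fm - f1) / (2 * fm - f1 - f2) * h


mode = grouped_mode(l=15.5, h=5, fm=32, f1=16, f2=24)
print(round(mode, 2))
```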
Q.1 Calculate the mean, median and mode of the following data:
Wages (in Rs)     0-20  20-40  40-60  60-80  80-100  100-120  120-140
No. of Workers    6     8      10     12     6       5        3
Moments
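• In mathematical statistics, moments are basic quantities computed from the
data. They can be used to find a probability distribution's mean,
variance, and skewness.
Moment about mean (individual series):
\[ \mu_r = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r ; \quad r = 0, 1, 2, \dots \]
Moment about mean (frequency distribution):
\[ \mu_r = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^r ; \quad r = 0, 1, 2, \dots \]
where $N = \sum_{i=1}^{n} f_i$.
In particular,
\[ \mu_0 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^0 = \frac{1}{N}\sum_{i=1}^{n} f_i = \frac{N}{N} = 1 \]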
Note. In case of a frequency distribution with class intervals, the values
of 𝑥 are the midpoints of the intervals.
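Example 1. Find the first four moments for the following individual
series.
x:  3  6  8  10  18
Solution: Calculation of Moments
\[ \bar{x} = \frac{\sum x}{n} = \frac{45}{5} = 9 \]
\[ \mu_1 = \frac{\sum (x - \bar{x})}{n} = \frac{0}{5} = 0 \]
\[ \mu_2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{128}{5} = 25.6 \]
\[ \mu_3 = \frac{\sum (x - \bar{x})^3}{n} = \frac{486}{5} = 97.2 \]
\[ \mu_4 = \frac{\sum (x - \bar{x})^4}{n} = \frac{7940}{5} = 1588 \]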
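The four central moments of Example 1 can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `central_moment` is an assumption introduced here for illustration.

```python
def central_moment(values, r):
    """r-th moment about the mean of an individual series."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


data = [3, 6, 8, 10, 18]
for r in range(1, 5):
    print(r, central_moment(data, r))
```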
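(f = no. of students, x = mid-point, $\bar{x} = 34$)
Marks   f    x    fx    x-34   f(x-34)   f(x-34)^2   f(x-34)^3   f(x-34)^4
5-15    10   10   100   -24    -240      5760        -138240     3317760
\[ \bar{x} = \frac{\sum fx}{N} = \frac{3400}{100} = 34 \]
\[ \mu_1 = \frac{\sum f(x - \bar{x})}{N} = \frac{0}{100} = 0 \]
\[ \mu_2 = \frac{\sum f(x - \bar{x})^2}{N} = \frac{21400}{100} = 214 \]
\[ \mu_3 = \frac{\sum f(x - \bar{x})^3}{N} = \frac{46800}{100} = 468 \]
\[ \mu_4 = \frac{\sum f(x - \bar{x})^4}{N} = \frac{9671200}{100} = 96712 \]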
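Moments about an arbitrary point A:
\[ \mu'_r = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^r ; \quad r = 0, 1, 2, \dots \]
where $N = \sum_{i=1}^{n} f_i$.
For $r = 0$: \[ \mu'_0 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^0 = 1 \]
For $r = 1$: \[ \mu'_1 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A) = \frac{1}{N}\sum_{i=1}^{n} f_i x_i - \frac{A}{N}\sum_{i=1}^{n} f_i = \bar{x} - A \]
For $r = 2$: \[ \mu'_2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^2 \]
For $r = 3$: \[ \mu'_3 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^3 \] and so on.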
In calculation work, if there is some common factor $h$ (> 1)
in the values of $x - A$, we can ease the calculation by defining
$u = \frac{x - A}{h}$.
In that case, we have
\[ \mu'_r = \frac{h^r}{N}\sum_{i=1}^{n} f_i u_i^r ; \quad r = 0, 1, 2, \dots \]
where $N = \sum_{i=1}^{n} f_i$.
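Moments about the origin:
For $r = 0$: \[ v_0 = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^0 = \frac{N}{N} = 1 \]
For $r = 1$: \[ v_1 = \frac{1}{N}\sum_{i=1}^{n} f_i x_i = \bar{x} \]
For $r = 2$: \[ v_2 = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 \] and so on.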
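Relations:
\[ \mu_1 = 0 \]
\[ \mu_2 = \mu'_2 - \mu_1'^{\,2} \]
\[ \mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu_1'^{\,3} \]
\[ \mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu_1'^{\,2} - 3\mu_1'^{\,4} \]
\[ \gamma_1 = +\sqrt{\beta_1}, \qquad \gamma_2 = \beta_2 - 3 \quad (\gamma\text{-coefficients}) \]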
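The relations between moments about an arbitrary point and moments about the mean translate directly into code. Below is a minimal Python sketch; the function name `central_from_raw` is hypothetical, and the input values (0.294, 7.144, 42.409, 454.98) are the moments about the working mean 28.5 used in a worked example later in this unit.

```python
def central_from_raw(m1, m2, m3, m4):
    """Convert moments about an arbitrary point (m1..m4) into
    moments about the mean (mu1..mu4)."""
    mu1 = 0.0
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu1, mu2, mu3, mu4


mu1, mu2, mu3, mu4 = central_from_raw(0.294, 7.144, 42.409, 454.98)
print(round(mu2, 4), round(mu3, 4), round(mu4, 4))
```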
= −40 − 48 + 2 = −86.
= 141290.11.
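Example 3: Calculate the variance and third central moment from the
following data.
x_i:  0  1  2   3   4   5   6   7  8
f_i:  1  9  26  59  72  52  29  7  1
Solution: Calculation of Moments
Taking A = 4 and h = 1, so that u = x - 4:
x      f     u     fu     fu^2   fu^3
0      1     -4    -4     16     -64
1      9     -3    -27    81     -243
2      26    -2    -52    104    -208
3      59    -1    -59    59     -59
4      72    0     0      0      0
5      52    1     52     52     52
6      29    2     58     116    232
7      7     3     21     63     189
8      1     4     4      16     64
Total  256         -7     507    -37
Moments about the point x = 4:
\[ \mu'_1 = h\,\frac{\sum fu}{N} = \frac{-7}{256} = -0.02734 \]
\[ \mu'_2 = h^2\,\frac{\sum fu^2}{N} = \frac{507}{256} = 1.9805 \]
\[ \mu'_3 = h^3\,\frac{\sum fu^3}{N} = \frac{-37}{256} = -0.1445 \]
Moments about Mean:
\[ \mu_1 = 0 \]
\[ \mu_2 = \mu'_2 - \mu_1'^{\,2} = 1.9805 - (-0.02734)^2 = 1.97975 \]
Variance = 1.97975
Also
\[ \mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu_1'^{\,3} = -0.1445 - 3(1.9805)(-0.02734) + 2(-0.02734)^3 = 0.0178997 \]
Third central moment = 0.0178997.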
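As a cross-check on Example 3, the central moments of a frequency distribution can also be computed directly from the definition, without the step-deviation shortcut. The sketch below is illustrative only; `central_moment_freq` is a hypothetical helper name. Small differences in the last digits relative to the hand calculation come from rounding the intermediate values there.

```python
def central_moment_freq(xs, fs, r):
    """r-th moment about the mean for a frequency distribution (x_i, f_i)."""
    n_total = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n_total
    return sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n_total


xs = [0, 1, 2, 3, 4, 5, 6, 7, 8]
fs = [1, 9, 26, 59, 72, 52, 29, 7, 1]

variance = central_moment_freq(xs, fs, 2)
mu3 = central_moment_freq(xs, fs, 3)
print(round(variance, 4), round(mu3, 4))
```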
Skewness
• It tells us whether the distribution is normal or not
• It gives us an idea about the nature and degree of concentration of
observations about the mean
• The empirical relation of mean, median and mode are based on a
moderately skewed distribution
Skewness:
• It means lack of symmetry.
• It gives us an idea about the shape of the curve which we can draw with
the help of the given data.
• A distribution is said to be skewed if:
• Mean, median and mode fall at different points, i.e.,
Mean ≠ Median ≠ Mode;
• Quartiles are not equidistant from the median; and
• The curve drawn with the help of the given data is not symmetrical but
stretched more to one side than to the other.
Symmetrical Distribution
A symmetric distribution is a type of distribution where the left side of the
distribution mirrors the right side. In a symmetric distribution, the mean,
mode and median all fall at the same point.
Measures of Skewness:
The measures of skewness are:
• Sk = M − Md,
• Sk = M − Mo,
• Sk = (Q3 − Md) − (Md − Q1),
where M is the mean, Md the median, Mo the mode, Q1 the first quartile
and Q3 the third quartile of the distribution.
These are the absolute measures of skewness.
• Coefficients of Skewness: For comparing two series we do
not calculate these absolute measures; instead we calculate the relative measures
called the coefficients of skewness, which are pure numbers independent of
the units of measurement.
Pearson's $\beta_1$ and $\gamma_1$ Coefficients:
\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \gamma_1 = \sqrt{\beta_1} = \pm\frac{\mu_3}{\sqrt{\mu_2^{3}}} \]
Kurtosis
• Describe the concept of kurtosis
• Explain the different measures of kurtosis
• Explain how kurtosis describes the shape of a distribution.
Kurtosis
• If we know the measures of central tendency, dispersion and skewness, we
still cannot form a complete idea about the distribution. Let us consider the
figure in which all the three curves A, B, and C are symmetrical about the
mean and have the same range.
A curve of type C, which is more peaked than the normal curve, is called a leptokurtic
curve, and for such a curve $\beta_2 > 3$, i.e., $\gamma_2 > 0$.
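Q2. For a distribution, the mean is 10, variance is 16, $\gamma_1$ is +1 and $\beta_2$ is 4. Comment
on the nature of the distribution. Also find the third central moment.
Solution: Here $\mu_2 = 16$, so $\mu_2^3 = 4096$ and $\mu_2^2 = 256$.
\[ \gamma_1 = 1 = \pm\frac{\mu_3}{\sqrt{4096}} \Rightarrow \mu_3 = 64 \]
\[ \beta_2 = 4 = \frac{\mu_4}{256} \Rightarrow \mu_4 = 1024 \]
Since $\gamma_1 = +1 > 0$, the distribution is positively skewed, and since $\beta_2 = 4 > 3$, it is leptokurtic.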
Example 3: The first four moments about the working mean 28.5 of a distribution
are 0.294, 7.144, 42.409 and 454.98. Calculate the first four moments about the mean.
Also evaluate $\beta_1$ and $\beta_2$ and comment upon the skewness and kurtosis of
the distribution.
Solution: $\mu'_1 = 0.294$, $\mu'_2 = 7.144$, $\mu'_3 = 42.409$, $\mu'_4 = 454.98$
Moments about mean:
\[ \mu_1 = 0 \]
\[ \mu_2 = \mu'_2 - \mu_1'^{\,2} = 7.0576 \]
\[ \mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu_1'^{\,3} = 36.1588 \]
\[ \mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu_1'^{\,2} - 3\mu_1'^{\,4} = 408.7896 \]
\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3} = 3.7193 \]
\[ \beta_2 = \frac{\mu_4}{\mu_2^2} = 8.207 \]
Skewness: $\beta_1$ is positive and
$\gamma_1 = 1.9285$, so the distribution is positively skewed.
Kurtosis: $\beta_2 = 8.207 > 3$, so the distribution is leptokurtic.
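The coefficients used in these examples can be computed from the central moments as follows. This is a minimal Python sketch under stated assumptions: the function name `shape_coefficients` is hypothetical, $\gamma_1$ is given the sign of $\mu_3$, and the labels "mesokurtic" and "platykurtic" are the standard names for $\beta_2 = 3$ and $\beta_2 < 3$, which this section does not spell out.

```python
import math


def shape_coefficients(mu2, mu3, mu4):
    """Return beta1, beta2, gamma1, gamma2 and a short verdict
    from the central moments mu2, mu3, mu4."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    gamma1 = math.copysign(math.sqrt(beta1), mu3)  # sign follows mu3
    gamma2 = beta2 - 3

    if gamma1 > 0:
        skew = "positively skewed"
    elif gamma1 < 0:
        skew = "negatively skewed"
    else:
        skew = "symmetrical"

    if beta2 > 3:
        kurt = "leptokurtic"
    elif beta2 < 3:
        kurt = "platykurtic"
    else:
        kurt = "mesokurtic"
    return beta1, beta2, gamma1, gamma2, skew, kurt


# Values from Q2: variance 16, third central moment 64, fourth central moment 1024
print(shape_coefficients(16, 64, 1024))
```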
Q1. Find all four central moments and Discuss Skewness and Kurtosis
for the following distribution-
No.of Library 5 10 8 16 14 12
Moments
Relation between 𝑣𝑟 𝑎𝑛𝑑 𝜇𝑟
Relation between 𝜇𝑟 𝑎𝑛𝑑 𝜇′𝑟
Skewness
Kurtosis
Curve Fitting
• The objective of curve fitting is to find the parameters of a
mathematical model that describes a set of data in a way that
minimizes the difference between the model and the data.
Sol. Let the straight line obtained from the given data be
\[ y = a + bx \quad (1) \]
then the normal equations are
\[ \sum y = ma + b\sum x \quad (2) \]
\[ \sum xy = a\sum x + b\sum x^2 \quad (3) \]
where m = 5 is the number of data points.
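The two normal equations for a straight line can be solved in closed form. The following is a minimal Python sketch; the function name `fit_line` is hypothetical, and the five (x, y) pairs are illustrative values (they match the X, Y practice table that appears later in this unit), not the data of the solution above.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = a + b*x via the normal equations
    sum(y) = m*a + b*sum(x) and sum(xy) = a*sum(x) + b*sum(x^2)."""
    m = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (m * sxy - sx * sy) / (m * sxx - sx ** 2)
    a = (sy - b * sx) / m
    return a, b


a, b = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 3, 6])
print(f"y = {a:.2f} + {b:.2f} x")
```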
Example: Use the method of least squares to fit the curve
\[ y = \frac{c_0}{x} + c_1\sqrt{x} \]
to the following table of values:
x       y     y/x     y√x       1/√x      1/x^2
0.1     21    210     6.64078   3.16228   100
0.2     11    55      4.91935   2.23607   25
0.4     7     17.5    4.42719   1.58114   6.25
0.5     6     12      4.24264   1.41421   4
1       5     5       5         1         1
2       6     3       8.48528   0.70711   0.25
Total:  Σx = 4.2,  Σ(y/x) = 302.5,  Σ(y√x) = 33.7152,  Σ(1/√x) = 10.1008,  Σ(1/x^2) = 136.5
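For a model of the form y = c0/x + c1·√x, minimizing the sum of squared residuals gives two normal equations: Σ(y/x) = c0·Σ(1/x²) + c1·Σ(1/√x) and Σ(y√x) = c0·Σ(1/√x) + c1·Σx. The Python sketch below builds these sums from the tabulated x and y values and solves the 2×2 system; the function name `fit_inverse_sqrt` is an assumption introduced for illustration, and the printed coefficients are a numerical check rather than a result quoted from the slides.

```python
import math


def fit_inverse_sqrt(xs, ys):
    """Least-squares fit of y = c0/x + c1*sqrt(x) via its normal equations."""
    s_inv_x2 = sum(1 / x ** 2 for x in xs)          # sum of 1/x^2
    s_inv_sqrt = sum(1 / math.sqrt(x) for x in xs)  # sum of 1/sqrt(x)
    s_x = sum(xs)                                   # sum of x
    s_y_over_x = sum(y / x for x, y in zip(xs, ys))
    s_y_sqrt = sum(y * math.sqrt(x) for x, y in zip(xs, ys))

    # Solve the 2x2 linear system by Cramer's rule
    det = s_inv_x2 * s_x - s_inv_sqrt ** 2
    c0 = (s_y_over_x * s_x - s_inv_sqrt * s_y_sqrt) / det
    c1 = (s_inv_x2 * s_y_sqrt - s_inv_sqrt * s_y_over_x) / det
    return c0, c1


xs = [0.1, 0.2, 0.4, 0.5, 1, 2]
ys = [21, 11, 7, 6, 5, 6]
c0, c1 = fit_inverse_sqrt(xs, ys)
print(round(c0, 3), round(c1, 3))
```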
𝑥 0 1 2 3 4
𝑓 1 0 3 10 21
Moments
Relation between 𝑣𝑟 𝑎𝑛𝑑 𝜇𝑟
Relation between 𝜇𝑟 𝑎𝑛𝑑 𝜇′𝑟
Skewness & kurtosis
Curve fitting
Correlation
• Identify the direction and strength of a correlation between two factors.
• Compute and interpret the Pearson correlation coefficient and test for
significance.
• Compute and interpret the coefficient of determination.
• Compute and interpret the Spearman correlation coefficient and test for
significance.
Negative Correlation:
• If the two variables deviate in the opposite directions, i.e., if increase (or
decrease) in one results in corresponding decrease (or increase) in the other,
correlation is said to be diverse or negative.
• For example, the correlation between (i) the price and demand of a
commodity, and (ii) the volume and pressure of a perfect gas; is
negative.
Perfect Correlation:
• Correlation is said to be perfect if the deviation in one variable is
followed by a corresponding and proportional deviation in the other.
Correlation Coefficient:
• The correlation coefficient due to Karl Pearson is defined as a measure of
intensity or degree of linear relationship between two variables.
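• Karl Pearson's Correlation Coefficient
• Karl Pearson's correlation coefficient between two variables X and Y,
denoted by r(X, Y) or $r_{XY}$, is a measure of the linear relationship between
them and is defined as:
\[ r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} \]
• If $(x_i, y_i)$; i = 1, 2, ..., n is the bivariate distribution, then
\[ r_{xy} = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \sigma_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n\,\sigma_x \sigma_y} \]
Or
\[ r(x, y) = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \]
Here $n$ is the number of pairs of values of $x$ and $y$.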
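Note: The correlation coefficient is independent of change of origin and
scale.
Let us define two new variables $u$ and $v$ as
\[ u = \frac{x - a}{h}, \qquad v = \frac{y - b}{k} \]
where $a, b, h, k$ are constants; then $r_{xy} = r_{uv}$, and
\[ r(u, v) = \frac{n\sum uv - \sum u \sum v}{\sqrt{\left[n\sum u^2 - \left(\sum u\right)^2\right]\left[n\sum v^2 - \left(\sum v\right)^2\right]}} \]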
$\sum x = 34$, $\sum y = 90$, $\sum x^2 = 248$, $\sum y^2 = 1446$, $\sum xy = 582$
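Solution: Let $u = \frac{x - 22}{4}$, $v = \frac{y - 24}{6}$
x      y     u     v     u^2   v^2   uv
10     18    -3    -1    9     1     3
14     12    -2    -2    4     4     4
18     24    -1    0     1     0     0
22     6     0     -3    0     9     0
26     30    1     1     1     1     1
30     36    2     2     4     4     4
Total        -3    -3    19    19    12
Hence, n = 6, $\bar{u} = \frac{1}{n}\sum u = \frac{1}{6}(-3) = -\frac{1}{2}$; $\bar{v} = \frac{1}{n}\sum v = \frac{1}{6}(-3) = -\frac{1}{2}$
Then
\[ r_{uv} = \frac{n\sum uv - \sum u \sum v}{\sqrt{\left[n\sum u^2 - \left(\sum u\right)^2\right]\left[n\sum v^2 - \left(\sum v\right)^2\right]}} \]
\[ = \frac{6 \times 12 - (-3)(-3)}{\sqrt{\left[6 \times 19 - (-3)^2\right]\left[6 \times 19 - (-3)^2\right]}} = \frac{63}{\sqrt{105 \times 105}} = 0.6 \]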
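Because the correlation coefficient is unaffected by a change of origin and scale, applying the product-moment formula directly to the original x and y values should give the same answer as the u, v calculation. A minimal Python sketch (the function name `pearson_r` is chosen here for illustration):

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient using
    r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


x = [10, 14, 18, 22, 26, 30]
y = [18, 12, 24, 6, 30, 36]
u = [(xi - 22) / 4 for xi in x]
v = [(yi - 24) / 6 for yi in y]

print(round(pearson_r(x, y), 4), round(pearson_r(u, v), 4))
```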
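\[ r_{xy} = \frac{\sum fxy - \frac{1}{n}\sum fx \sum fy}{\sqrt{\left[\sum fx^2 - \frac{1}{n}\left(\sum fx\right)^2\right]\left[\sum fy^2 - \frac{1}{n}\left(\sum fy\right)^2\right]}} \]
Since change of origin and scale do not affect the coefficient of
correlation, $r_{xy} = r_{uv}$, where the new variables $u, v$ are properly chosen.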
Q. The following table gives, according to age, the frequency of marks
obtained by 100 students in an intelligence test:
Marks \ Age   18    19    20    21    Total
10-20         4     2     2     -     8
20-30         5     4     6     4     19
30-40         6     8     10    11    35
40-50         4     4     6     8     22
50-60         -     2     4     4     10
60-70         -     2     3     1     6
Total         19    22    31    28    100
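Let us define two new variables $u$ and $v$ as $u = \frac{y - 45}{10}$, $v = x - 20$,
where $x$ is the age and $y$ is the mid-value of the marks class. With n = 100,
\[ r_{xy} = r_{uv} = \frac{\sum fuv - \frac{1}{n}\sum fu \sum fv}{\sqrt{\left[\sum fu^2 - \frac{1}{n}\left(\sum fu\right)^2\right]\left[\sum fv^2 - \frac{1}{n}\left(\sum fv\right)^2\right]}} \]
\[ = \frac{59 - \frac{1}{100}(-75)(-32)}{\sqrt{\left[217 - \frac{1}{100}(-75)^2\right]\left[126 - \frac{1}{100}(-32)^2\right]}} = \frac{59 - 24}{\sqrt{\frac{643}{4} \times \frac{2894}{25}}} \]
\[ \approx 0.26 \]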
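The bivariate frequency calculation can be checked numerically. The Python sketch below is an illustration under assumptions: the function name `grouped_correlation` is hypothetical, class mid-values 15, 25, ..., 65 stand in for the marks classes, and the empty cells of the table are taken as zero in the positions that reproduce the row and column totals.

```python
import math


def grouped_correlation(xs, ys, table):
    """Correlation for a bivariate frequency table.
    table[i][j] is the frequency of the pair (ys[i], xs[j])."""
    n = s_x = s_y = s_xx = s_yy = s_xy = 0
    for y, row in zip(ys, table):
        for x, f in zip(xs, row):
            n += f
            s_x += f * x
            s_y += f * y
            s_xx += f * x * x
            s_yy += f * y * y
            s_xy += f * x * y
    num = s_xy - s_x * s_y / n
    den = math.sqrt((s_xx - s_x ** 2 / n) * (s_yy - s_y ** 2 / n))
    return num / den


ages = [18, 19, 20, 21]                 # columns
mark_mids = [15, 25, 35, 45, 55, 65]    # rows (assumed class mid-values)
freq = [
    [4, 2, 2, 0],
    [5, 4, 6, 4],
    [6, 8, 10, 11],
    [4, 4, 6, 8],
    [0, 2, 4, 4],
    [0, 2, 3, 1],
]

print(round(grouped_correlation(ages, mark_mids, freq), 4))
```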
RANK CORRELATION:
Definition: Assuming that no two individuals are bracketed equal in either
classification, each of the variables X and Y takes the values 1, 2, ..., n.
Hence, the rank correlation coefficient between A and B is denoted by r, and
is given as:
\[ r = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)} \]
where $D_i$ is the difference between the two ranks of the i-th individual.
Person           A   B    C   D   E   F   G   H   I   J
Rank in maths    9   10   6   5   7   2   4   8   1   3
Rank in physics  1   2    3   4   5   6   7   8   9   10
Person   R1    R2    D = R1 - R2   D^2
A        9     1     8             64
B        10    2     8             64
C        6     3     3             9
D        5     4     1             1
E        7     5     2             4
F        2     6     -4            16
G        4     7     -3            9
H        8     8     0             0
I        1     9     -8            64
J        3     10    -7            49
                                   ΣD^2 = 280
\[ r = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 280}{10(100 - 1)} = 1 - 1.697 = -0.697 \]
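Spearman's formula for untied ranks is easy to verify in code. A minimal Python sketch follows; the function name `spearman_from_ranks` is an assumption for illustration, and the two rank lists are those of persons A to J above.

```python
def spearman_from_ranks(r1, r2):
    """Rank correlation r = 1 - 6*sum(D^2) / (n*(n^2 - 1)) for untied ranks."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n * n - 1))


maths = [9, 10, 6, 5, 7, 2, 4, 8, 1, 3]
physics = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(round(spearman_from_ranks(maths, physics), 3))
```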
Uses:
• It is used for finding correlation coefficient if we are dealing with
qualitative characteristics which cannot be measured quantitatively but
can be arranged serially.
• It can also be used where actual data are given.
• In case of extreme observations, Spearman’s formula is preferred to
Pearson’s formula.
Limitations:
• It is not applicable in the case of bivariate frequency distribution.
• For n > 30, this formula should not be used unless the ranks are given,
since in the contrary case the calculations are quite time-consuming.
𝒙 68 64 75 50 64 80 75 40 55 64
𝑦 62 58 68 45 81 60 68 48 50 70
X                 68   64   75    50   64   80   75    40   55   64   Total
Y                 62   58   68    45   81   60   68    48   50   70
Ranks in X (x)    4    6    2.5   9    6    1    2.5   10   8    6
Ranks in Y (y)    5    7    3.5   10   1    6    3.5   9    8    2
D = x - y         -1   -1   -1    -1   5    -5   -1    1    0    4    0
D^2               1    1    1     1    25   25   1     1    0    16   72
Repeated values:
75 occurs 2 times
64 occurs 3 times
68 occurs 2 times
\[ r = 1 - \frac{6\left[\sum D^2 + \frac{1}{12} m_1(m_1^2 - 1) + \frac{1}{12} m_2(m_2^2 - 1) + \frac{1}{12} m_3(m_3^2 - 1)\right]}{n(n^2 - 1)} \]
\[ = 1 - \frac{6\left[72 + \frac{1}{12}\cdot 2(2^2 - 1) + \frac{1}{12}\cdot 3(3^2 - 1) + \frac{1}{12}\cdot 2(2^2 - 1)\right]}{10(10^2 - 1)} \]
\[ = 1 - \frac{6 \times 75}{990} = \frac{6}{11} = 0.545 \]
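The tied-rank procedure (average ranks for repeated values, plus a correction of m(m² − 1)/12 for each group of m tied values) can be sketched in Python as below. The helper names are hypothetical, and rank 1 is assigned to the largest value, as in the table above.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 goes to the largest value; tied values share the mean of
    the positions they occupy."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of m*(m^2 - 1)/12 over every value repeated m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)


def spearman_tied(xs, ys):
    """Rank correlation with the tied-rank correction factor."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (d2 + correction) / (n * (n * n - 1))


X = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
Y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
print(round(spearman_tied(X, Y), 3))
```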
Q1. Find the rank correlation coefficient for the following data:
𝑥 23 27 28 28 29 30 31 33 35 36
𝑦 18 20 22 27 21 29 27 29 28 29
Correlation
Karl Pearson coefficient of correlation
Rank Correlation
Tied Rank
Regression
• Explain the variation in the dependent variable based on the
variation in the independent variables, and predict the values of the
dependent variable.
REGRESSION ANALYSIS:
• Regression measures the nature and extent of correlation.
Regression is the estimation or prediction of unknown values of one
variable from known values of another variable.
Difference between curve fitting and regression analysis: The only
fundamental difference, if any, between problems of curve fitting and
regression is that in regression, any of the variables may be considered
as independent or dependent, while in curve fitting, one variable cannot
be dependent.
Curve of regression and regression equation:
• If two variates 𝑥 𝑎𝑛𝑑 𝑦 are correlated i.e., there exists an
association or relationship between them, then the scatter diagram
will be more or less concentrated round a curve. This curve is called the
curve of regression and the relationship is said to be expressed by
means of curvilinear regression.
• The mathematical equation of the regression curve is called
regression equation.
LINEAR REGRESSION:
• When the points of the scatter diagram are concentrated around a straight
line, the regression is called linear and this straight line is known as
the line of regression.
• Regression will be called non-linear if there exists a relationship
other than a straight line between the variables under consideration.
LINES OF REGRESSION:
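Let
\[ y = a + bx \quad (1) \]
be the equation of the regression line of $y$ on $x$. The normal equations are
\[ \sum y = na + b\sum x \quad (2) \]
\[ \sum xy = a\sum x + b\sum x^2 \quad (3) \]
Solving (2) and (3) for $a$ and $b$, we get
\[ b = \frac{\sum xy - \frac{1}{n}\sum x \sum y}{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \quad (4) \]
\[ a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \bar{y} - b\bar{x} \quad (5) \]
Eq. (5) gives $\bar{y} = a + b\bar{x}$.
Hence the line $y = a + bx$ passes through the point $(\bar{x}, \bar{y})$.
Putting $a = \bar{y} - b\bar{x}$ in the equation $y = a + bx$, we get
\[ y - \bar{y} = b(x - \bar{x}) \quad (6) \]
Eq. (6) is called the regression line of $y$ on $x$. $b$ is called the regression
coefficient of $y$ on $x$ and is usually denoted by $b_{yx}$.
\[ y - \bar{y} = b_{yx}(x - \bar{x}) \]
\[ b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \]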
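Similarly, the regression line of $x$ on $y$, $x = a + by$, reduces to
\[ x - \bar{x} = b_{xy}(y - \bar{y}) \]
where $b_{xy}$ is the regression coefficient of $x$ on $y$ and is given by
\[ b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2} \]
or $b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$, where the terms have their usual meanings.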
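Both regression coefficients, and the two regression lines through the point of means, can be computed directly from the sums in the formulas above. The sketch below is illustrative: the function name `regression_coefficients` is hypothetical, and the data are the six (x, y) pairs from the correlation example earlier in this unit, for which r = 0.6, so the product of the two coefficients should equal r² = 0.36.

```python
def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy, x_bar, y_bar) for paired data."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = n * sxy - sx * sy
    b_yx = num / (n * sxx - sx ** 2)   # regression coefficient of y on x
    b_xy = num / (n * syy - sy ** 2)   # regression coefficient of x on y
    return b_yx, b_xy, sx / n, sy / n


x = [10, 14, 18, 22, 26, 30]
y = [18, 12, 24, 6, 30, 36]
b_yx, b_xy, x_bar, y_bar = regression_coefficients(x, y)

print(f"y on x: y - {y_bar} = {b_yx:.2f} (x - {x_bar})")
print(f"x on y: x - {x_bar} = {b_xy:.2f} (y - {y_bar})")
print(f"b_yx * b_xy = {b_yx * b_xy:.2f}")
```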
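\[ \Rightarrow n r \sigma_x \sigma_y = a(0) + b\,n\,\sigma_x^2 \]
\[ \Rightarrow b = r\,\frac{\sigma_y}{\sigma_x} \]
where $r$ is the coefficient of correlation, and $\sigma_x$ and $\sigma_y$ are the standard
deviations of the $x$ and $y$ series respectively.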
\[ \text{G.M. between them} = \sqrt{\frac{r\sigma_y}{\sigma_x} \times \frac{r\sigma_x}{\sigma_y}} = \sqrt{r^2} = r = \text{coefficient of correlation.} \]
Let $b_{yx} > 1$, then $\frac{1}{b_{yx}} < 1$.
\[ \sigma_x^2 + \sigma_y^2 > 2\sigma_x\sigma_y \]
\[ (\sigma_x - \sigma_y)^2 > 0, \text{ which is true.} \]
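Property 4: Regression coefficients are independent of the origin but
not of scale.
Proof. Let $u = \frac{x - a}{h}$, $v = \frac{y - b}{k}$, where a, b, h and k are constants.
\[ b_{yx} = \frac{r\sigma_y}{\sigma_x} = r\cdot\frac{k\sigma_v}{h\sigma_u} = \frac{k}{h}\cdot\frac{r\sigma_v}{\sigma_u} = \frac{k}{h}\,b_{vu} \]
Similarly, $b_{xy} = \frac{h}{k}\,b_{uv}$.
Thus $b_{yx}$ and $b_{xy}$ are both independent of a and b but not of $h$ and $k$.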
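\[ \tan\theta = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}, \] where $r, \sigma_x, \sigma_y$ have their usual meanings.
\[ \tan\theta = \pm\frac{1 - r^2}{r}\cdot\frac{\sigma_y}{\sigma_x}\cdot\frac{\sigma_x^2}{\sigma_x^2 + \sigma_y^2} = \pm\frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2} \]
Since $r^2 \le 1$ and $\sigma_x, \sigma_y$ are positive, the positive sign gives the acute angle between the lines:
\[ \tan\theta = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2} \]
When $r = 0$, $\theta = \frac{\pi}{2}$: the two lines of regression
are perpendicular to each other. Hence the estimated value of $y$ is the
same for all values of $x$ and vice versa.
When $r = \pm 1$, $\tan\theta = 0$ so that $\theta = 0$ or $\pi$.
Hence the lines of regression coincide and there is perfect correlation
between the two variates $x$ and $y$.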
Non-linear Regression:
Let
\[ y = a + bx + cx^2 \]
be a second degree parabolic curve of regression of $y$ on $x$. The normal equations are
\[ \sum y = na + b\sum x + c\sum x^2 \]
\[ \sum xy = a\sum x + b\sum x^2 + c\sum x^3 \]
\[ \sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4 \]
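The three normal equations of the parabola form a 3×3 linear system in a, b and c. The Python sketch below solves it with Cramer's rule; the function names are hypothetical, and the five points are the x, f values tabulated earlier in this unit (x = 0..4, f = 1, 0, 3, 10, 21), used here purely as illustrative data.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(xs, ys):
    """Least-squares parabola y = a + b*x + c*x^2 via the normal equations."""
    n = len(xs)
    sx = sum(xs)
    sx2 = sum(x ** 2 for x in xs)
    sx3 = sum(x ** 3 for x in xs)
    sx4 = sum(x ** 4 for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x ** 2 * y for x, y in zip(xs, ys))

    lhs = [[n, sx, sx2], [sx, sx2, sx3], [sx2, sx3, sx4]]
    rhs = [sy, sxy, sx2y]
    d = det3(lhs)

    coeffs = []
    for col in range(3):
        # Replace one column with the right-hand side (Cramer's rule)
        m = [row[:] for row in lhs]
        for i in range(3):
            m[i][col] = rhs[i]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


a, b, c = fit_parabola([0, 1, 2, 3, 4], [1, 0, 3, 10, 21])
print(a, b, c)
```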
\[ \sum xy = a\sum x + b\sum x^2 + c\sum xz \]
\[ \sum yz = a\sum z + b\sum xz + c\sum z^2 \]
x   1    2    3    4
y   12   18   24   30
z   0    1    2    3
\[ \sum xy = a\sum x + b\sum x^2 + c\sum xz \]
\[ \sum yz = a\sum z + b\sum xz + c\sum z^2 \]
Q1. Fit a straight line trend by the method of least square to the
following data:
Year        1979  1980  1981  1982  1983  1984
Production  5     7     9     10    12    17
X 1 2 3 4 5
Y 2 4 5 3 6
Q4. Fit a straight line trend by the method of least squares to the
following data: -
Year                          2012  2013  2014  2015  2016  2017
Sales of T.V. sets (in '000)  7     10    12    14    17    24
Q3. The sum of squares of the items is 2430, the mean is 7 and N = 12. Find the variance.
i. 176.5
ii. 12.38
iii. 153.26
iv. 14
Q4. Calculate the standard deviation of the following:
9, 8, 6, 5, 8, 6
i. 2
ii. 3
iii. 1.414
iv. 2.414
Q2. Find the multiple linear regressions of 𝑥 on 𝑦 and 𝑧 from the data relating
to three variables:
𝑥 7 12 17 20
𝑦 4 7 9 12
𝑧 1 2 5 8
Q3. If $\theta$ is the angle between the two lines of regression, then express $\tan\theta$ in
terms of the correlation coefficient ($r$). Explain the significance when $r = 0$ and
$r = \pm 1$.
Q4. Two lines of regression are given by 7𝑥 − 16𝑦 + 9 = 0 and −4𝑥 +
5𝑦 − 3 = 0 and 𝑣𝑎𝑟(𝑥)=16.Calculate-(i) the mean of 𝑥 and 𝑦 (ii) S.D. of 𝑦
(iii) the correlation coefficient.
2/14/2023 Faculty Name Dr. Kunti Mishra Unit 1 151
Expected Questions for University Exam(CO1)
Q6. The first four moments of a distribution about 2 are 1, 2.5, 5.5 and
16 respectively. Calculate the four moments about the mean and about the origin.
Text Books
• Erwin Kreyszig, Advanced Engineering Mathematics, 9th Edition,
John Wiley & Sons, 2006.
Reference Books
• B.S. Grewal, Higher Engineering Mathematics, Khanna Publishers,
35th Edition, 2000.
• T. Veerarajan, Engineering Mathematics (for semester III),
Tata McGraw-Hill, New Delhi.