Final 2nd MAT1243 Handout 2023 Ac Year
For example:
Relationship between price and supply, yield of crop
and fertilizer input, etc.
Correlation Analysis, Cont’D.
If two variables vary in such a way that movement in
one is accompanied by movement in the other, they are
said to be correlated.
Quantity of fertilizer (Kg):  10   12   7    15   20   13
Yield (in Kg):                100  130  92   145  200  150
[Figure: scatter plot of Yield (Kg) against Quantity of fertilizer (Kg)]
Recall on Scatter plot, Cont’D;
Example 2:
Consider the following data which relate 𝑥, the respective number of
branches that 10 different banks have in a given common market,
with 𝑦, the corresponding market share of total deposits held by the
banks:
2nd Curvilinear:
If the change in one variable does not bear a constant ratio to the amount of
change in the other variable, then the correlation is said to be non-linear or
curvilinear correlation.
3rd No correlation:
When the points are scattered all over the graph and it is difficult to conclude whether the
values are increasing or decreasing then there is no correlation between the variables.
2.2.3 Linear correlation and no correlation in detail
Positive correlation
This happens when the points in the graph rise from left to right.
It means that the values of one variable increase as the values of the other increase.
• By looking at the scatter of the points, one can form an idea of whether the two
variables are correlated or not.
• The greater the scatter of the points, the weaker the association / relationship between
the two variables.
• The closer the points come to a straight line, the higher the correlation is said to be.
Methods of studying correlation: 1st. Scatter Diagrams, Cont’D.
Different examples for Scatter diagram to determine the types of correlation
If each point (𝑥, 𝑦) of the data is plotted in a plane, the scatter plot or
Scatter diagram is obtained.
The scatter plot or scatter diagram (in the figure above) indicates that,
roughly speaking, the market share increases as the number of
branches increases. We say that 𝑥 and 𝑦 have a positive correlation.
ii. On the other hand, consider the data below, which relate
average daily temperature 𝑥 in degrees Fahrenheit, and daily
natural gas consumption 𝑦 in cubic meters.
We see that 𝑦 tends to decrease as 𝑥 increases. Here, 𝑥 and 𝑦 have a
negative correlation.
iii. Consider the data items (𝑥, 𝑦) below, which relate daily temperature
𝑥 over a 10-day period to the Dow Jones stock average 𝑦: (63, 3385);
(72, 3330); (76, 3325); (70, 3320); (71, 3330); (65, 3325); (70, 3280);
(74, 3280); (68, 3300); (61, 3265).
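As a rough numerical check on the pairs in item iii, the sketch below computes the correlation coefficient 𝑟 with the raw-sums formula. The function name `pearson_r` is an illustrative choice and does not come from the handout. A value of 𝑟 close to zero would suggest that there is no linear correlation between temperature and the stock average.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r computed from raw sums of x, y, xy, x^2, y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Data from item iii: daily temperature (x) and Dow Jones average (y).
temperature = [63, 72, 76, 70, 71, 65, 70, 74, 68, 61]
dow_jones = [3385, 3330, 3325, 3320, 3330, 3325, 3280, 3280, 3300, 3265]

r = pearson_r(temperature, dow_jones)
print(f"r = {r:.3f}")
```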
Solution:
First order the values of 𝑥 and the values of 𝑦 from lowest to highest:
𝒙: 7; 8; 9; 10; 12; 12; 12; 12; 16; 16
𝒚: 4; 5; 6; 6; 7; 7; 8; 10; 10; 13
Then assign ranks from 1 to 10 to the values in each data set:
𝒙:    7; 8; 9; 10; 12; 12; 12; 12; 16; 16
Rank: 1; 2; 3; 4; 5; 6; 7; 8; 9; 10
𝒚:    4; 5; 6; 6; 7; 7; 8; 10; 10; 13
Rank: 1; 2; 3; 4; 5; 6; 7; 8; 9; 10
The second row under each data set gives the ranks. For repeated values we need
the average of those ranks, which is given by the average of their positions.
e.g. the rank of 12 in the data of 𝑥 is given by $\frac{5+6+7+8}{4} = 6.5$
This means that each 12 will be ranked 6.5.
Also, 16 appears 2 times, in positions 9 and 10. Then the rank of 16 is given by
$\frac{9+10}{2} = 9.5$, i.e. each 16 will be ranked 9.5.
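The averaging rule for tied values can be sketched in a few lines of Python. The helper name `average_ranks` is hypothetical. It assumes ranks are assigned from the smallest value (rank 1) upward, as in the solution above.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the mean of their positions."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    mean_rank = {value: sum(p) / len(p) for value, p in positions.items()}
    return [mean_rank[value] for value in values]


# Ordered data from the solution above.
x_values = [7, 8, 9, 10, 12, 12, 12, 12, 16, 16]
y_values = [4, 5, 6, 6, 7, 7, 8, 10, 10, 13]

x_ranks = average_ranks(x_values)
y_ranks = average_ranks(y_values)
print(x_ranks)
print(y_ranks)
```

Each 12 receives rank 6.5 and each 16 receives rank 9.5, matching the hand calculation.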
Take $d^2 = d_i^2$
• Therefore,
$$\rho = 1 - \frac{6\sum_{i=1}^{10} d_i^2}{n(n^2-1)}, \qquad \rho = 1 - \frac{6 \times 61}{10(100-1)} = \frac{990-366}{990}$$
• Therefore, $\rho = 0.63$
$$\rho = 1 - \frac{6 \times 3}{6(36-1)}$$
$$\rho = 1 - \frac{18}{210}$$
$$\rho = 0.91$$
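The two rank-correlation calculations above can be checked with a short sketch. The function name `spearman_rho` is an illustrative assumption. It takes the sum of squared rank differences and the number of pairs as inputs, exactly as they appear in the worked formulas.

```python
def spearman_rho(sum_d_squared, n):
    """Rank correlation: rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# First worked example: sum of d^2 = 61 with n = 10 pairs.
rho_first = spearman_rho(61, 10)
# Second worked example: sum of d^2 = 3 with n = 6 pairs.
rho_second = spearman_rho(3, 6)

print(round(rho_first, 2), round(rho_second, 2))
```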
2.3 Regression Analysis
▪ After having established that two variables are closely
associated, one may be interested in estimating the value of one
variable given the value of the other.
▪ Regression analysis reveals the average relationship between two variables
and makes it possible to estimate or predict the value of the variable under study.
▪ In mathematics, Y is called a function of X; in statistics, this
relationship is described by regression.
➢ Regression is the study of the functional relationship between two variables,
of which one is dependent (Y) and the other is independent (X), using an equation.
• Example: Yield of crop and quantity of fertilizer
Crop yield and quantity of rain (see Example 1 on the scatter plot)
2.3.1 Regression line and regression coefficient
• Formulas for regression line:
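$$Y_{est} = a + b_{yx} X$$
Where;
$$a = \bar{Y} - b_{yx}\bar{X} \quad \text{and} \quad b_{yx} = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} \quad \text{or} \quad b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$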
• Interpretation:
➢ The slope (𝑏) is interpreted as follows: on average, a one-unit increase in X changes Y by 𝑏 units.
If 𝑏 is positive, Y increases; if 𝑏 is negative, Y decreases.
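A minimal sketch of these formulas follows, applied to the fertilizer and yield data given earlier in this handout. The function names `regression_line` and `slope_divided_form` are hypothetical. The second function is included only to show that the two ways of writing the slope give the same value.

```python
def regression_line(xs, ys):
    """Return (a, b) for Y_est = a + b * X using b = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)  # a = mean(Y) - b * mean(X)
    return a, b


def slope_divided_form(xs, ys):
    """Same slope, written as (Sxy - Sx*Sy/n) / (Sx2 - Sx^2/n)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)


# Fertilizer (Kg) and yield (Kg) data from the scatter plot example.
fertilizer = [10, 12, 7, 15, 20, 13]
crop_yield = [100, 130, 92, 145, 200, 150]

a, b = regression_line(fertilizer, crop_yield)
b_alt = slope_divided_form(fertilizer, crop_yield)
print(f"Y_est = {a:.2f} + {b:.2f} X")
```

Since the slope is positive here, each additional Kg of fertilizer is associated, on average, with an increase of about 𝑏 Kg in yield.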
Regression line and coefficient interpretation, Cont’d;
➢ If the value of the correlation coefficient is significant, the next step
is to determine the equation of the regression line.
➢ The purpose of the regression line is to enable the researcher to see
the trend and make predictions on the basis of the data.
➢ In addition, linear regression fits a straight line, 𝑦 = 𝑎𝑥 + 𝑏, to the data
(in this form, 𝑎 is the slope and 𝑏 is the intercept), giving the best prediction
of y for any value of x. This is the line that minimises the distance between the
data and the fitted line, i.e. the residuals, and it is known as the line of best fit.
➢ Given a scatter plot, you must be able to draw the line of best fit.
Best fit means that the sum of the squares of the vertical distances
from each point to the line is at a minimum.
The reason you need a line of best fit is that the values of y will be
predicted from the values of x; hence, the closer the points are to the
line, the better the fit and the prediction will be.
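The least-squares idea can be illustrated with a short sketch: fit the line, then compare its sum of squared vertical distances with that of a few slightly different lines. The function names, the deviation-from-mean form of the slope, and the particular nudges are assumptions made for illustration. The data are the fertilizer and yield values from earlier in this handout.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line, using deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances from each point to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


fertilizer = [10, 12, 7, 15, 20, 13]
crop_yield = [100, 130, 92, 145, 200, 150]

intercept, slope = least_squares(fertilizer, crop_yield)
best_sse = sum_squared_residuals(fertilizer, crop_yield, intercept, slope)

# Nudge the intercept and/or slope away from the fitted values.
nudges = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5), (5.0, -0.3)]
other_sse = [
    sum_squared_residuals(fertilizer, crop_yield, intercept + di, slope + ds)
    for di, ds in nudges
]
print(best_sse, min(other_sse))
```

Every nudged line has a larger sum of squared residuals than the fitted line, which is what "best fit" means here.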
Example of best fit line:
[Figure: scatter plot with the fitted line y = ax + b, where a is the slope, b is the intercept, and ε is the residual error]
Note that:
• When r is positive, the line slopes upward from left to right.
• When r is negative, the line slopes downward from left to right.
Number of employees (X):    6    7    4    5    8
Average daily income (Y):   90   110  50   80   100
$$r = \frac{5 \times 2710 - 30 \times 430}{\sqrt{\left[5(190) - 30^2\right]\left[5(39100) - 430^2\right]}} = 0.89$$
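The intermediate arithmetic behind this value can be written out as follows, as a worked sketch using the totals n = 5, Σx = 30, Σy = 430, Σxy = 2710, Σx² = 190 and Σy² = 39100:

```latex
\begin{aligned}
5 \times 2710 - 30 \times 430 &= 13550 - 12900 = 650 \\
5 \times 190 - 30^2 &= 950 - 900 = 50 \\
5 \times 39100 - 430^2 &= 195500 - 184900 = 10600 \\
r &= \frac{650}{\sqrt{50 \times 10600}} = \frac{650}{\sqrt{530000}} \approx 0.89
\end{aligned}
```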
Answer 1 (ii) Regression line
No      𝒙     𝒚      𝒙𝒚      𝒙²     𝒚²
1       6     90     540     36     8100
2       7     110    770     49     12100
3       4     50     200     16     2500
4       5     80     400     25     6400
5       8     100    800     64     10000
Total   30    430    2710    190    39100
$\bar{x} = \frac{30}{5} = 6$
$\bar{y} = \frac{430}{5} = 86$
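Applying the formulas from Section 2.3.1 to the totals in the table gives the following worked sketch of the slope, the intercept and the resulting line (the symbols a, b and Y_est follow that section):

```latex
\begin{aligned}
b &= \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}
   = \frac{5 \times 2710 - 30 \times 430}{5 \times 190 - 30^2}
   = \frac{650}{50} = 13 \\
a &= \bar{y} - b\,\bar{x} = 86 - 13 \times 6 = 8 \\
Y_{est} &= 8 + 13X
\end{aligned}
```

Read this way, one additional employee is associated, on average, with an increase of 13 units in average daily income.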
Interpretation:
• The slope b and the correlation r always have the same sign.
• If the correlation r is positive, the slope will be positive and the line
will move upward.
• If the correlation r is negative, the slope will be negative and the line
will move downward.
• The least-squares regression line (LSRL) always passes through the point (𝑥̄, 𝑦̄), where
the mean of x and the mean of y intersect.
• The square of the correlation, 𝑟², is the fraction of the variation in
the values of y that is explained by the least-squares regression of y
on x. This 𝑟² is also called the coefficient of determination.
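These properties can be checked numerically on the employees and income data. The sketch below is illustrative only: the variable names are assumptions, and it uses the deviation-from-mean forms of the formulas, which are equivalent to the raw-sums forms. It computes 𝑟² in two ways: by squaring 𝑟, and as 1 minus the ratio of unexplained variation to total variation in y.

```python
from math import sqrt

employees = [6, 7, 4, 5, 8]
income = [90, 110, 50, 80, 100]

n = len(employees)
mean_x = sum(employees) / n
mean_y = sum(income) / n

# Sums of squares and cross-products about the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(employees, income))
sxx = sum((x - mean_x) ** 2 for x in employees)
syy = sum((y - mean_y) ** 2 for y in income)

r = sxy / sqrt(sxx * syy)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Unexplained variation: squared residuals about the fitted line.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(employees, income))
r_squared_from_fit = 1 - sse / syy

print(f"r = {r:.2f}, r^2 = {r ** 2:.3f}, 1 - SSE/SST = {r_squared_from_fit:.3f}")
```

The slope and 𝑟 share the same sign, the fitted line passes through (𝑥̄, 𝑦̄), and both routes to 𝑟² agree.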
The regression line can also be written as:
We may write:
Note that the regression line of 𝑥 on 𝑦 is 𝑥 = 𝑐𝑦 + 𝑑, given by