Correlation, Regression Analysis in Civil Engineering

The key takeaways are that correlation measures the strength and direction of relationship between two variables, and regression analysis is used to model the relationship between variables and make predictions.

Correlation measures the degree of association between two variables. It is measured using correlation coefficients, which range from -1 to 1, with 0 indicating no relationship and 1 or -1 indicating a perfect positive or negative relationship respectively.

The main types of correlation coefficients discussed are Pearson, Spearman, and Kendall rank correlations, which are used for different data types (parametric vs non-parametric).


CORRELATION AND REGRESSION ANALYSIS


SUBJECT: Analytical and Numerical Methods for Structural Engineers - ANSE (3712013)

Prepared by: PIUS NYANZI (1807020006), STUDENT: M.E CIVIL (STRUCTURAL)

Presented to: Dr Subhanshu Goyal, Head, Dept. of Mathematics (MEFGI)

MEFGI - GTU 10/12/2018


CONTENT

▪ Introduction
▪ Scatter diagrams
▪ Correlation analysis
o Pearson correlation coefficient with example
o Spearman rank correlation coefficient with example
o Kendall’s rank correlation coefficient with example
o Differences between Spearman and Kendall’s tau
▪ Regression Analysis
o Regression (curve fitting)
o Methods of regression
o Multiple regression model
▪ Some Statistical software Packages for regression analysis
▪ Conclusion
CORRELATION AND REGRESSION – Introduction

▪ Scientists and engineers often need to estimate the value of a dependent variable y at an intermediate value of the independent variable x, given discrete data points (x, y).

The available data fall into two main categories:

1. Values of well-defined functions, e.g. log tables, trigonometric tables, interest tables.
2. Data values from experiments, e.g. the relationship between stress and strain in a metal strip, voltage applied and speed of a fan, or drag force and velocity of a falling body. Here the relationship is not well defined.



SCATTER DIAGRAMS

▪ A scatter diagram is a diagram that shows the values of two variables X and Y, along with the way in which these two variables relate to each other.



Scatter diagrams

Steel bar temp. (°C)    67   69   85   83   74   81   97   97  114   85
Length (mm)            120  125  140  160  130  180  150  140  200  130

[Scatter plot: Length (y), mm, against Temp. (x), °C]
CORRELATION
▪ Correlation is a bivariate analysis that measures the strength and the direction of the relationship (association) between two variables.

▪ It is used to find the relationship between two quantitative variables.

▪ Correlation coefficient: a statistic showing the degree of relation between two variables.



Correlation coefficient

▪ In terms of the strength of the relationship, the value of the correlation coefficient varies between +1 and -1.
▪ The direction of the relationship is indicated by the sign of the coefficient; a + sign indicates a positive relationship and a – sign indicates a negative relationship.
▪ Three types of correlation are commonly used in statistics:

i. Pearson correlation
ii. Spearman correlation
iii. Kendall rank correlation



Pearson correlation (r)

$$ r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}} $$

▪ The value of r ranges between -1 and +1.

▪ The magnitude of r denotes the strength of the relationship; the sign denotes its direction.



Pearson correlation (r)

[Scale of r from -1 to +1: -1 = perfect indirect correlation, 0 = no relation, +1 = perfect direct correlation; the strong, intermediate and weak bands are separated at ±0.75 and ±0.25]

If r = 0, there is no association or correlation between the two variables.

If 0 < |r| < 0.25: weak correlation.

If 0.25 ≤ |r| < 0.75: intermediate correlation.

If 0.75 ≤ |r| < 1: strong correlation.

If |r| = 1: perfect correlation.
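
The bands above can be expressed as a small helper. This is only a sketch: the function name `describe_correlation` is an assumption made for illustration, and the thresholds are applied to the magnitude of r so that negative values are labelled symmetrically.

```python
def describe_correlation(r: float) -> str:
    """Label a correlation coefficient using the 0.25 / 0.75 bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "no correlation"
    if size < 0.25:
        strength = "weak"
    elif size < 0.75:
        strength = "intermediate"
    elif size < 1:
        strength = "strong"
    else:
        strength = "perfect"
    # The sign gives the direction: positive is direct, negative is indirect.
    direction = "direct" if r > 0 else "indirect"
    return f"{strength} {direction} correlation"


print(describe_correlation(0.759))  # strong direct correlation
print(describe_correlation(-0.1))   # weak indirect correlation
```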
Example 1 - Pearson correlation
A sample of 6 concrete cubes was selected, and data about their age in days and strength in N/mm² was recorded as shown in the following table. It is required to find the correlation between age and strength.

Serial no.   Age (days)   Strength (N/mm²)
1            7            12
2            6            8
3            8            12
4            5            10
5            6            11
6            9            13
Example 1 - Pearson correlation

▪ Independent variable (x) – Age
▪ Dependent variable (y) – Strength
▪ Simple correlation coefficient:



• Pearson correlation coefficient

Serial no.   Age (days) (x)   Strength (N/mm²) (y)   xy          x²          y²
1            7                12                     84          49          144
2            6                8                      48          36          64
3            8                12                     96          64          144
4            5                10                     50          25          100
5            6                11                     66          36          121
6            9                13                     117         81          169
Total        ∑x = 41          ∑y = 66                ∑xy = 461   ∑x² = 291   ∑y² = 742
Example 1 - Pearson correlation

$$ r = \frac{461 - \dfrac{41 \times 66}{6}}{\sqrt{\left(291 - \dfrac{(41)^2}{6}\right)\left(742 - \dfrac{(66)^2}{6}\right)}} $$

• r = 0.759 (strong direct correlation)


Interpretation
• There is a strong positive correlation between the age of the concrete cubes in days and the strength of the concrete, since r is greater than 0.75.
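
The arithmetic of this example can be checked with a short script. The sketch below uses the same computational form of the Pearson formula and the six (age, strength) pairs from the table; the function name `pearson_r` is an assumption made for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient from raw sums."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


age = [7, 6, 8, 5, 6, 9]             # days
strength = [12, 8, 12, 10, 11, 13]   # N/mm2
r = pearson_r(age, strength)
print(f"{r:.4f}")  # 0.7596, which the example reports truncated as 0.759
```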

Spearman correlation coefficient (rs)
• It is a non-parametric measure of correlation that makes use of the two sets of ranks assigned to the variables:

$$ r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} $$

• The Spearman rank correlation coefficient can be computed in the following cases:
I. Both variables are quantitative.
II. Both variables are qualitative ordinal.
III. One variable is quantitative and the other is qualitative ordinal.
Spearman correlation coefficient
Procedure

▪ Rank the values of X from 1 to n, where n is the number of pairs of values of X and Y in the sample.
▪ Rank the values of Y from 1 to n.
▪ Compute the value of di for each pair of observations by subtracting the rank of Yi from the rank of Xi.
▪ Square each di and compute ∑(di)², which is the sum of the squared values.
▪ Substitute ∑(di)² and n into the formula for rs.



Example 2 - Spearman correlation coefficient
In a study of the relationship between level of education and income, the following data was obtained. Find the relationship between them and comment.

Sample   Level of education (X)   Income (Y)
A        Preparatory              25
B        Primary                  10
C        University               8
D        Secondary                10
E        Secondary                15
F        Illiterate               50
G        University               60
Example 2 - Spearman correlation coefficient

Sample   Education (X)   Income (Y)   Rank X   Rank Y   di     di²
A        Preparatory     25           5        3        2      4
B        Primary         10           6        5.5      0.5    0.25
C        University      8            1.5      7        -5.5   30.25
D        Secondary       10           3.5      5.5      -2     4
E        Secondary       15           3.5      4        -0.5   0.25
F        Illiterate      50           7        2        5      25
G        University      60           1.5      1        0.5    0.25

∑(di)² = 64
rs = 1 – 6(64) / [7(7² – 1)] = -0.14, i.e. roughly -0.1: a weak negative (indirect) correlation
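
The ranking and summation in this example can be reproduced with the sketch below. Two things are assumptions made for illustration: the numeric coding of the education levels in `LEVEL` (chosen only so that the ordering matches the ranks in the table), and the helper names `average_ranks` and `spearman_rs`. Tied values share the mean of the ranks they occupy, as in the table.

```python
def average_ranks(values):
    """Rank from largest (rank 1) to smallest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = ((pos + 1) + (end + 1)) / 2  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rs(x, y):
    """Return (rs, sum of squared rank differences)."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


# Assumed ordinal coding: a higher number means a higher level of education.
LEVEL = {"illiterate": 0, "primary": 1, "preparatory": 2,
         "secondary": 3, "university": 4}
education = ["preparatory", "primary", "university", "secondary",
             "secondary", "illiterate", "university"]
income = [25, 10, 8, 10, 15, 50, 60]

rs, sum_d2 = spearman_rs([LEVEL[e] for e in education], income)
print(sum_d2)        # 64.0
print(f"{rs:.4f}")   # -0.1429
```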
Kendall rank correlation coefficient, tau
• Kendall rank correlation is a non-parametric test that measures the degree of concordance between two columns of ranked data.

• Its range is -1.0 to +1.0, just like r and rs.

• Kendall’s tau = (C – D) / (C + D)
C – number of concordant pairs
D – number of discordant pairs

• Kendall's rank correlation improves upon the Spearman coefficient by reflecting the strength of the dependence between the variables being compared.
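
A minimal sketch of counting concordant and discordant pairs is given below. The five rank pairs are hypothetical and deliberately free of ties, and the function name `kendall_tau` is an assumption. A pair of observations is concordant when both ranks move in the same direction and discordant when they move in opposite directions; this sketch simply skips tied pairs, which is one of several possible conventions.

```python
from itertools import combinations


def kendall_tau(rank_x, rank_y):
    """Return (tau, C, D) with tau = (C - D) / (C + D)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(rank_x, rank_y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1      # both ranks change in the same direction
        elif product < 0:
            discordant += 1      # ranks change in opposite directions
        # product == 0 is a tie in X or Y; skipped here (assumption)
    if concordant + discordant == 0:
        raise ValueError("every pair is tied; tau is undefined")
    tau = (concordant - discordant) / (concordant + discordant)
    return tau, concordant, discordant


# Hypothetical, tie-free ranks used only for illustration.
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]
tau, c, d = kendall_tau(rank_x, rank_y)
print(c, d, tau)  # 8 2 0.6
```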
Example 3. Kendall’s tau

Sample   Educ. level (X)   Income (Y)   Rank X   Rank Y
A        Preparatory       25           5        3
B        Primary           10           6        5.5
C        University        8            1.5      7
D        Secondary         10           3.5      5.5
E        Secondary         15           3.5      4
F        Illiterate        50           7        2
G        University        60           1.5      1

Pairs sorted by Rank X, with the concordant (C) and discordant (D) counts for each row:

Rank X   Rank Y   C   D
1.5      7        0   6
1.5      1        5   0
3.5      5.5      0   3
3.5      4        1   2
5        3        1   2
6        5.5      0   1
7        2
Total             7   14

tau = (C – D) / (C + D)
= (7 – 14) / (7 + 14) = -0.33 (weak negative relationship)

Spearman, rs = -0.1
Pearson Vs Spearman rs Vs Kendall’s tau

Pearson (r)
▪ Parametric statistic
▪ Parametric methods produce more accurate and precise estimates than non-parametric methods, provided their assumptions about the data hold.

Spearman (rs) and Kendall’s tau
▪ Non-parametric statistics
▪ tau = (C – D) / (C + D)
▪ rs is greater than tau in most cases; in Examples 2 and 3, rs = -0.1 and tau = -0.33.
Regression Analysis
▪ Regression analysis is a predictive modelling technique which investigates the relationship between a dependent variable (y) and an independent variable (x), the predictor.

▪ The technique is used for forecasting and for finding the cause-effect relationship between the variables.

▪ For example:
1) Relationship between the strength of concrete and the number of curing days
2) Relationship between the strength of road subgrade and lime content, ground temperature and delay in compaction



Methods of regression
1. Graphical methods
2. Method of group averages
3. Method of moments
4. Method of least squares

▪ The graphical method and the method of group averages fail to give the values of the unknown constants uniquely and accurately, while the other methods do.
▪ The method of least squares is the best for fitting a unique curve to given data. It is also widely used in applications and can be easily implemented on a computer.



Graphical methods



Method of group averages


Method of group averages - Example

r = a + bt, giving r = 1090.26 – 0.534t

Method of moments


Method of moments - Example



Method of least squares

• We need to minimise the sum of the squares of the errors.
• Error = vertical distance between a data point (xi, yi) and the fitted curve.
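
Written out, the quantity being minimised takes the form below. The symbols S (the sum of squares) and e_i (the error at the i-th point) are notation introduced here for illustration, and f(x) stands for whichever curve is being fitted.

```latex
\[
  e_i = y_i - f(x_i), \qquad
  S = \sum_{i=1}^{n} e_i^{2} = \sum_{i=1}^{n} \bigl[\, y_i - f(x_i) \,\bigr]^{2}
\]
```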



Method of least squares (MLS)

▪ The aim is to minimise the sum of the squares of the errors.

MLS can be used to fit the data in the following situations:
1. Relationship is linear: y = f(x) = a + bx
2. Relationship is a polynomial: f(x) = a + bx + cx²
3. Relationship is transcendental: f(x) = ae^(bx)
4. Multiple linear regression



Method of least squares (MLS) – linear regression

Relationship is linear: y = f(x) = a + bx

The normal equations are:

$$ \sum y = na + b\sum x \qquad \text{eqn (1)} $$

$$ \sum xy = a\sum x + b\sum x^2 \qquad \text{eqn (2)} $$
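
A minimal sketch of solving the two normal equations of the straight-line fit for a and b in closed form is given below. It is applied to the concrete cube data of Example 1 purely as an illustration; the slides do not fit this particular line, and the function name `fit_line` is an assumption.

```python
def fit_line(x, y):
    """Least-squares straight line y = a + b*x from the normal equations.

    sum(y)  = n*a      + b*sum(x)
    sum(xy) = a*sum(x) + b*sum(x^2)
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom   # slope
    a = (sum_y - b * sum_x) / n                # intercept
    return a, b


age = [7, 6, 8, 5, 6, 9]             # days
strength = [12, 8, 12, 10, 11, 13]   # N/mm2
a, b = fit_line(age, strength)
print(f"strength = {a:.3f} + {b:.3f} * age")  # strength = 4.692 + 0.923 * age
```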



Method of least squares (MLS) – polynomial relationship (second order) - Example

y = a1 + a2 x + a3 x²
Normal equations are as below;
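
For a second-order polynomial, the standard least-squares normal equations are obtained by setting the partial derivatives of the sum of squared errors with respect to a1, a2 and a3 to zero. The form below is the textbook result for n data points, offered here as a reconstruction rather than a copy of the original slide.

```latex
\[
\begin{aligned}
  \sum y      &= n\,a_1 + a_2 \sum x + a_3 \sum x^{2} \\
  \sum x y    &= a_1 \sum x + a_2 \sum x^{2} + a_3 \sum x^{3} \\
  \sum x^{2}y &= a_1 \sum x^{2} + a_2 \sum x^{3} + a_3 \sum x^{4}
\end{aligned}
\]
```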



Multiple linear regression model
A multiple linear regression model helps to learn more about the relationship between several independent (predictor) variables and a dependent (criterion) variable.

Example: to study the relationship between the strength of road subgrade (Y) and lime content (A), ground temperature (B) and delay in compaction (C).
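
In general form, a multiple linear regression model with these three predictors can be written as below. The coefficient symbols b_0 to b_3 and the error term e are notation assumed here for illustration; their values have to be estimated from the data, typically by least squares.

```latex
\[
  Y = b_0 + b_1 A + b_2 B + b_3 C + e
\]
```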



Multiple linear regression model - Example

Subgrade strength (CBR), Y   Lime content (%), A   Ground temperature (°C), B   Delay in compaction (hrs), C
68.5                         2                     25                           0.25
98.9                         4                     30                           0.5
102.5                        6                     35                           0.75
120.5                        8                     40                           1
99.8                         10                    45                           1.25
99.9                         12                    50                           1.5
85                           14                    55                           1.75
Using SPSS a regression model was obtained as



Some statistical packages for correlation and regression analysis
▪ MS Excel
▪ SPSS
▪ MATLAB
▪ Stata
▪ Statistica
▪ StatXact
▪ Systat



Conclusion
▪ The correlation coefficient measures the strength and direction of the relationship between two variables.
▪ The Pearson correlation coefficient is better suited to parametric statistics, whereas the Spearman coefficient is better suited to non-parametric statistics.
▪ The method of least squares minimises the sum of the squared errors (vertical distances) about the regression line. It is the best of the methods compared.
▪ A multiple regression model gives the relationship between one dependent variable (Y) and several independent variables A, B, C.



