Me310 Numerical Methods
5. Regression
About Curve Fitting

Curve fitting is expressing a discrete set of data points as a continuous function. It is frequently used in engineering. For example, the empirical relations that we use in heat transfer and fluid mechanics are functions fitted to experimental data.

Regression: Mainly used with experimental data, which might contain a significant amount of error (noise). There is no need to find a function that passes through all discrete points.
[Figure: polynomial regression — a smooth curve through scattered data points, not passing through every point]
Interpolation: Used if the data is known to be very precise. Find a function (or a series of
functions) that passes through all discrete points.
[Figure: polynomial interpolation (a single function f(x) through all points) vs. spline interpolation (four different functions, one per interval)]
Least Squares Regression
(Read the statistics review from the book.)
Fitting a straight line to a set of paired data points:

(x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)
The fitted line is $y = a_0 + a_1 x$, where $a_0$ is the y-intercept and $a_1$ is the slope (both unknown). The error (deviation) for the $i$th data point is

$$e_i = y_i - a_0 - a_1 x_i$$

[Figure: data points, the line $a_0 + a_1 x$, and the error $e_i$ at $x_i$]
Minimize the error (deviation) to get a best-fit line (i.e., to find $a_0$ and $a_1$). Several possibilities are:

- Minimize the sum of individual errors.
- Minimize the sum of absolute values of individual errors.
- Minimize the maximum error.
- Minimize the sum of squares of individual errors. This is the preferred strategy (check the book to see why the others fail).
Minimizing the Sum of Squares of Individual Errors

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2 \qquad \text{(sum of squares of the residuals)}$$

Setting both partial derivatives to zero:

$$\frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right) = 0$$

$$\frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{n} \left[ \left( y_i - a_0 - a_1 x_i \right) x_i \right] = 0$$

or

$$n\,a_0 + \left( \sum x_i \right) a_1 = \sum y_i$$

$$\left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 = \sum x_i y_i$$

These are called the normal equations. Solving them for the two unknowns gives

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}, \qquad a_0 = \bar{y} - a_1 \bar{x}$$
Example 24: Fit a straight line to the following data set.

x 1 3 5 7 10 12 13 16 18 20
y 4 5 6 5 8 7 6 9 12 11

$n = 10$, $\sum x_i = 105$, $\sum y_i = 73$, $\sum x_i^2 = 1477$, $\sum x_i y_i = 906$, $\bar{x} = 10.5$, $\bar{y} = 7.3$

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2} = \frac{10 \cdot 906 - 105 \cdot 73}{10 \cdot 1477 - 105^2} = 0.3725$$

$$a_0 = \bar{y} - a_1 \bar{x} = 7.3 - 0.3725 \cdot 10.5 = 3.3888$$
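As a quick check of these formulas, here is a minimal sketch in plain Python (the function name `fit_line` is my own, not from the course) that reproduces the coefficients of Example 24 from the normal equations:

```python
def fit_line(x, y):
    """Least-squares straight line y = a0 + a1*x via the normal equations."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a0 = sy / n - a1 * sx / n          # a0 = ybar - a1*xbar
    return a0, a1

x = [1, 3, 5, 7, 10, 12, 13, 16, 18, 20]
y = [4, 5, 6, 5, 8, 7, 6, 9, 12, 11]
a0, a1 = fit_line(x, y)
print(round(a0, 4), round(a1, 4))   # 3.3888 0.3725
```

The same two-line formula is all a spreadsheet (or Excel's trendline feature, mentioned below) uses internally.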
Exercise 24: It is always a good idea to plot the data points and the regression line to see
how well the line represents the points. You can do this with Excel. Excel will calculate a0
and a1 for you.
Error of Linear Regression (How good is the best-fit line?)

[Figure: spread of the data around the mean $\bar{y}$ vs. spread of the data around the regression line $a_0 + a_1 x$]

$$S_t = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2, \qquad S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2$$

$$s_y = \sqrt{\frac{S_t}{n-1}} \;\; \text{(standard deviation)}, \qquad s_{y/x} = \sqrt{\frac{S_r}{n-2}} \;\; \text{(standard error of estimate)}$$

The improvement obtained by using the regression line instead of the mean gives a measure of how good the regression fit is:

$$r^2 = \frac{S_t - S_r}{S_t} \;\; \text{(coefficient of determination)}$$

$$r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left( \sum x_i \right)^2}\,\sqrt{n \sum y_i^2 - \left( \sum y_i \right)^2}} \;\; \text{(correlation coefficient)}$$
How to interpret the correlation coefficient? The two extreme cases are:

- $S_r = 0 \Rightarrow r = 1$: a perfect fit (the straight line passes through all points).
- $S_r = S_t \Rightarrow r = 0$: no improvement over simply using the mean.

Usually an r value close to 1 represents a good fit. But be careful: always plot the data points and the regression line together to see what is going on.
Example 24 (contd): For the same data set ($n = 10$, $\sum x_i = 105$, $\sum y_i = 73$, $\bar{x} = 10.5$, $\bar{y} = 7.3$, $\sum x_i^2 = 1477$, $\sum x_i y_i = 906$):

$$S_t = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2 = 64.1, \qquad S_r = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2 = 12.14$$

$$r^2 = \frac{S_t - S_r}{S_t} = 0.8107, \qquad r = 0.9$$
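These error measures can be verified with the same kind of sketch as before (plain Python; the variable names are my own). It recomputes the fit, then $S_t$, $S_r$, the standard error of estimate, and $r^2$:

```python
import math

x = [1, 3, 5, 7, 10, 12, 13, 16, 18, 20]
y = [4, 5, 6, 5, 8, 7, 6, 9, 12, 11]
n = len(x)

# Coefficients from the normal equations (same as in Example 24)
sx, sy, sxx, sxy = sum(x), sum(y), sum(v * v for v in x), sum(a * b for a, b in zip(x, y))
a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
a0 = sy / n - a1 * sx / n

ybar = sy / n
St = sum((yi - ybar) ** 2 for yi in y)                       # spread around the mean
Sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))   # spread around the line
sy_x = math.sqrt(Sr / (n - 2))                               # standard error of estimate
r2 = (St - Sr) / St                                          # coefficient of determination
print(round(St, 1), round(Sr, 2), round(r2, 4))              # 64.1 12.14 0.8107
```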
Example 24 (contd): Reverse x and y. Find the linear regression line and calculate r.
x = -5.3869 + 2.1763 y
St = 374.5, Sr = 70.91 (different from before).
r2 = 0.8107, r = 0.9 (same as before).
Exercise 25: When working with experimental data we usually take the variable that is
controlled by us in a precise way as x. The measured or calculated quantities are y. See
Midterm II of Fall 2003 for an example.
Linearization of Nonlinear Behavior

(1) Exponential equation ($y = A_1 e^{B_1 x}$). Taking the natural logarithm of both sides linearizes it:

$$\ln y = \ln A_1 + B_1 x$$

Create the following table.

x     0.4  0.8  1.2  1.6  2.0  2.3
ln y  6.62 6.91 7.24 7.60 7.90 8.23

Fit a straight line to this new data set. Be careful with the notation; you can define z = ln y.
Calculate $a_0 = 6.25$ and $a_1 = 0.841$. The straight line is $\ln y = 6.25 + 0.841x$.
Switch back to the original equation: $A_1 = e^{a_0} = 518$, $B_1 = a_1 = 0.841$.
Therefore the exponential equation is $y = 518\,e^{0.841x}$. Check this solution with a couple of data points. For example, $y(1.2) = 518\,e^{0.841 \cdot 1.2} = 1421$ and $y(2.3) = 518\,e^{0.841 \cdot 2.3} = 3584$. OK.
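The same straight-line machinery verifies this fit; the sketch below (plain Python, variable names my own) fits a line to the $(x, \ln y)$ pairs and converts back:

```python
import math

x = [0.4, 0.8, 1.2, 1.6, 2.0, 2.3]
z = [6.62, 6.91, 7.24, 7.60, 7.90, 8.23]   # z = ln y

n = len(x)
sx, sz = sum(x), sum(z)
sxx = sum(xi * xi for xi in x)
sxz = sum(xi * zi for xi, zi in zip(x, z))
a1 = (n * sxz - sx * sz) / (n * sxx - sx * sx)   # B1
a0 = sz / n - a1 * sx / n                        # ln A1
A1 = math.exp(a0)   # about 519; the slide's 518 comes from rounding a0 to 6.25 first
print(round(a0, 2), round(a1, 3))                # 6.25 0.841
```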
Linearization of Nonlinear Behavior (contd)

(2) Power equation ($y = A_2 x^{B_2}$). Taking the base-10 logarithm of both sides linearizes it:

$$\log y = \log A_2 + B_2 \log x$$

[Figure: the curve $y = A_2 x^{B_2}$ becomes the straight line $\log y = a_0 + a_1 \log x$ when $\log y$ is plotted against $\log x$]

log x  0.398 0.544 0.699 0.778 0.875 1.000 1.097 1.176 1.243 1.301
log y  0.845 0.740 0.591 0.556 0.491 0.447 0.415 0.380 0.362 0.362

Fit a straight line to this new data set. Be careful with the notation.
Calculate $a_0 = 1.002$ and $a_1 = -0.53$. The straight line is $\log y = 1.002 - 0.53 \log x$.
Switch back to the original equation: $A_2 = 10^{a_0} = 10.05$, $B_2 = a_1 = -0.53$.
Therefore the power equation is $y = 10.05\,x^{-0.53}$. Check this solution with a couple of data points. For example, $y(5) = 10.05 \cdot 5^{-0.53} = 4.28$ and $y(15) = 10.05 \cdot 15^{-0.53} = 2.39$. OK.
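This case can be checked the same way; the sketch below (plain Python, variable names my own) fits a line to the tabulated $(\log x, \log y)$ pairs and converts back:

```python
x_log = [0.398, 0.544, 0.699, 0.778, 0.875, 1.000, 1.097, 1.176, 1.243, 1.301]
y_log = [0.845, 0.740, 0.591, 0.556, 0.491, 0.447, 0.415, 0.380, 0.362, 0.362]

n = len(x_log)
sx, sy = sum(x_log), sum(y_log)
sxx = sum(v * v for v in x_log)
sxy = sum(a * b for a, b in zip(x_log, y_log))
a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # B2
a0 = sy / n - a1 * sx / n                        # log10 A2
A2 = 10 ** a0   # about 10.04; the slide's 10.05 comes from rounding a0 to 1.002 first
print(round(a0, 3), round(a1, 2))                # 1.002 -0.53
```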
Linearization of Nonlinear Behavior (contd)

(3) Saturation-growth-rate equation ($y = A_3 x / (B_3 + x)$). Taking the reciprocal of both sides linearizes it:

$$\frac{1}{y} = \frac{1}{A_3} + \frac{B_3}{A_3}\,\frac{1}{x}$$

[Figure: the curve $y = A_3 x/(B_3 + x)$ becomes the straight line $1/y = a_0 + a_1(1/x)$ when $1/y$ is plotted against $1/x$]

Fit a straight line to the transformed data set ($1/y$ versus $1/x$). Be careful with the notation.
Calculate $a_0 = 0.512$ and $a_1 = 0.562$. The straight line is $1/y = 0.512 + 0.562\,(1/x)$.
Switch back to the original equation: $A_3 = 1/a_0 = 1.953$, $B_3 = a_1 A_3 = 1.097$.
Therefore the saturation-growth-rate equation is $y = 1.953\,x/(1.097 + x)$. Check this solution with a couple of data points. For example, $y(2) = 1.953 \cdot 2/(1.097 + 2) = 1.26$. OK.
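Since no data table is given for this case, the sketch below (plain Python) only converts the fitted straight-line coefficients back to the original parameters and repeats the spot check at x = 2:

```python
a0, a1 = 0.512, 0.562      # from the straight-line fit 1/y = a0 + a1*(1/x)
A3 = 1 / a0                # A3 = 1/a0
B3 = a1 * A3               # B3 = a1*A3, since a1 = B3/A3 (slide rounds this to 1.097)
y2 = A3 * 2 / (B3 + 2)     # check at x = 2
print(round(A3, 3), round(y2, 2))   # 1.953 1.26
```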
Polynomial Regression (Extension of Linear Least Squares)

Used to find a best-fit curve for nonlinear behavior. This is not the nonlinear regression described on page 468 of the book; that section is omitted.

Example for a second-order polynomial regression: the fitted curve is $y = a_0 + a_1 x + a_2 x^2$, and the error (deviation) for the $i$th data point is

$$e_i = y_i - a_0 - a_1 x_i - a_2 x_i^2$$

[Figure: data points, the parabola $a_0 + a_1 x + a_2 x^2$, and the error $e_i$ at $x_i$]

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)^2 \qquad \text{(sum of squares of the residuals)}$$

Minimize this sum to get the normal equations:

$$\frac{\partial S_r}{\partial a_0} = 0, \qquad \frac{\partial S_r}{\partial a_1} = 0, \qquad \frac{\partial S_r}{\partial a_2} = 0$$

Solve these equations with one of the techniques that we learned to get $a_0$, $a_1$ and $a_2$.
Polynomial Regression Example

Find the least-squares parabola that fits the following data set.

x 0 1 2 3 4 5
y 2.1 7.7 13.6 27.2 40.9 61.1

$n = 6$, $\sum x_i = 15$, $\sum y_i = 152.6$, $\sum x_i^2 = 55$, $\sum x_i y_i = 585.6$, $\sum x_i^3 = 225$, $\sum x_i^2 y_i = 2488.8$, $\sum x_i^4 = 979$

$$a_0 = 2.479, \quad a_1 = 2.359, \quad a_2 = 1.861 \qquad \Rightarrow \qquad y = 2.479 + 2.359x + 1.861x^2$$

$$r^2 = \frac{S_t - S_r}{S_t} = \frac{2513.4 - 3.75}{2513.4} = 0.999, \qquad r = 0.999$$
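The example can be reproduced by building and solving the 3x3 normal equations directly. In this sketch (plain Python), the helper `solve3` is my own naive Gaussian elimination, standing in for "one of the techniques that we learned":

```python
def solve3(A, b):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    n = 3
    M = [row[:] + [bi] for row, bi in zip(A, b)]          # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                        # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

x = [0, 1, 2, 3, 4, 5]
y = [2.1, 7.7, 13.6, 27.2, 40.9, 61.1]
n = len(x)
S = lambda p: sum(xi ** p for xi in x)                    # power sums of x
Sy = lambda p: sum(xi ** p * yi for xi, yi in zip(x, y))  # mixed sums
A = [[n,    S(1), S(2)],
     [S(1), S(2), S(3)],
     [S(2), S(3), S(4)]]
b = [Sy(0), Sy(1), Sy(2)]
a0, a1, a2 = solve3(A, b)
print(round(a0, 3), round(a1, 3), round(a2, 3))   # 2.479 2.359 1.861
```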
Multiple Linear Regression

Now the dependent variable is a function of two independent variables, $y = y(x_1, x_2)$, fitted as $y = a_0 + a_1 x_1 + a_2 x_2$.

Sum of squares of the residuals:

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_{1i} - a_2 x_{2i} \right)^2$$

Minimize this sum to get the normal equations:

$$\frac{\partial S_r}{\partial a_0} = 0, \qquad \frac{\partial S_r}{\partial a_1} = 0, \qquad \frac{\partial S_r}{\partial a_2} = 0$$

which in matrix form are

$$\begin{bmatrix} n & \sum x_{1i} & \sum x_{2i} \\ \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{bmatrix}$$
Example 28: Fit $z = a_0 + a_1 x + a_2 y$ to the following data set.

x 0 1 1 2 2 3 3 4 4
y 0 1 2 1 2 1 2 1 2
z 15 18 12.8 25.7 20.6 35 29.8 45.5 40.3

$n = 9$, $\sum x_i = 20$, $\sum x_i^2 = 60$, $\sum y_i = 12$, $\sum y_i^2 = 20$, $\sum x_i y_i = 30$, $\sum z_i = 242.7$, $\sum x_i z_i = 661$, $\sum y_i z_i = 331.2$

$$\begin{bmatrix} 9 & 20 & 12 \\ 20 & 60 & 30 \\ 12 & 30 & 20 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 242.7 \\ 661 \\ 331.2 \end{bmatrix}$$
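This system can be assembled and solved in code. The sketch below (plain Python; `solve3` is my own naive Gaussian elimination helper) gives $a_0 \approx 14.40$, $a_1 \approx 9.03$, $a_2 \approx -5.62$ — these coefficient values are my computation, since the slide stops at the normal equations:

```python
def solve3(A, b):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    n = 3
    M = [row[:] + [bi] for row, bi in zip(A, b)]          # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    out = [0.0] * n
    for i in range(n - 1, -1, -1):                        # back substitution
        out[i] = (M[i][n] - sum(M[i][j] * out[j] for j in range(i + 1, n))) / M[i][i]
    return out

x = [0, 1, 1, 2, 2, 3, 3, 4, 4]
y = [0, 1, 2, 1, 2, 1, 2, 1, 2]
z = [15, 18, 12.8, 25.7, 20.6, 35, 29.8, 45.5, 40.3]
n = len(x)
sx, sy = sum(x), sum(y)
sxx, syy = sum(a * a for a in x), sum(b * b for b in y)
sxy = sum(a * b for a, b in zip(x, y))
sz = sum(z)
sxz = sum(a * c for a, c in zip(x, z))
syz = sum(b * c for b, c in zip(y, z))
A = [[n,  sx,  sy],
     [sx, sxx, sxy],
     [sy, sxy, syy]]
b = [sz, sxz, syz]
a0, a1, a2 = solve3(A, b)
print(round(a0, 2), round(a1, 2), round(a2, 2))   # 14.4 9.03 -5.62
```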
Exercise 26: Calculate the standard error of the estimate (sy/x) and the correlation coefficient (r).