Least Square

The document discusses finding the best-fit line through a set of data points using the least squares method. It defines the best-fit line as the line where the sum of the squares of the errors between the observed y-values and the projected y-values is minimized. It derives the equations to calculate the slope and intercept of the least squares line. The correlation coefficient is introduced as a measure of how well the best-fit line represents the linear relationship between the variables, ranging from -1 to 1. Examples of linearizing transformations and using Matlab's polyfit and corrcoef functions are provided.

Uploaded by

Serkan Sancak

Least-squares solution of the estimation problem.

Given a set of data $(x_i, y_i)$, $i = 1, \dots, n$, is there any relationship between the two variables?

Could we find a best-fit line through these data?


What do we mean by “the best-fit line”?

There are many ways we could define a "best-fit" line. Here is one: the best-fit line is the line for which the sum of the squares of all errors (the differences between the observed $y_i$ and the values the line predicts at $x_i$) is minimum.

Suppose the best-fit line is the line $y = a_0 + a_1 x$. In that case, the estimation error at the point $x_i$ is

$$e_i = |a_0 + a_1 x_i - y_i|$$

Let

$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (a_0 + a_1 x_i - y_i)^2$$

We want to find the line (i.e. the values of $a_0$ and $a_1$) for which $S$ is minimized. Such a line is called the least-squares line (or trend line).

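Since $S$ is a function of both $a_0$ (the intercept) and $a_1$ (the slope), both partial derivatives must vanish at the minimum:

$$\frac{\partial S}{\partial a_0} = \sum_{i=1}^{n} 2(a_0 + a_1 x_i - y_i) = 0 \qquad (1)$$

and

$$\frac{\partial S}{\partial a_1} = \sum_{i=1}^{n} 2 x_i (a_0 + a_1 x_i - y_i) = 0 \qquad (2)$$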

From (1), we get

$$n a_0 + a_1 \sum_i x_i = \sum_i y_i \qquad (3)$$

and from (2), we get

$$a_0 \sum_i x_i + a_1 \sum_i x_i^2 = \sum_i x_i y_i \qquad (4)$$
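One way to solve (3) and (4) for the slope, sketched here as a worked step: multiply (3) by $\sum_i x_i$ and (4) by $n$, then subtract so that $a_0$ cancels.

```latex
\begin{aligned}
n a_0 \sum_i x_i + a_1 \Big(\sum_i x_i\Big)^2 &= \sum_i x_i \sum_i y_i
  && \text{(3) times } \textstyle\sum_i x_i \\
n a_0 \sum_i x_i + n a_1 \sum_i x_i^2 &= n \sum_i x_i y_i
  && \text{(4) times } n \\
a_1 \Big[ n \sum_i x_i^2 - \Big(\sum_i x_i\Big)^2 \Big] &= n \sum_i x_i y_i - \sum_i x_i \sum_i y_i
  && \text{second minus first}
\end{aligned}
```

Dividing by the bracket, which is nonzero as long as the $x_i$ are not all equal, isolates $a_1$.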

From these, we get

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$

Given this, we obtain

$$a_0 = \frac{\sum y_i - a_1 \sum x_i}{n}$$
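The closed-form expressions for the slope and intercept can be checked numerically. The sketch below applies them to the nine data points used in the Matlab session later in these notes; the variable names (`Sx`, `Sy`, `Sxy`, `Sxx`) are chosen here for illustration only.

```matlab
% Data: the nine (x, y) points from the Matlab session in these notes.
x = [1 2 3 4 5 6 7 8 9];
y = [2 4 5 6 9 3 12 15 14];
n = numel(x);

% Sums that appear in the normal equations (3) and (4).
Sx  = sum(x);
Sy  = sum(y);
Sxy = sum(x .* y);
Sxx = sum(x .^ 2);

% Closed-form slope and intercept of the least-squares line.
a1 = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2);
a0 = (Sy - a1*Sx) / n;

fprintf('a0 = %.4f, a1 = %.4f\n', a0, a1);   % a0 = 0.1111, a1 = 1.5333
```

These values agree with the coefficients that polyfit returns for the same data.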
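How well does the line $y = a_0 + a_1 x$ fit the data? This is measured by the correlation coefficient $\rho$, where

$$\rho(x, y) = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$

which, written in terms of the data, is

$$\rho(x, y) = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$$

The correlation coefficient $\rho$ is bounded: $-1 \le \rho \le 1$. A good fit implies that the absolute value of $\rho$ is close to 1.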
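As a check on the sum form of the correlation coefficient, the following sketch evaluates it directly and compares it with Matlab's built-in corrcoef. It uses the nine data points from the Matlab session later in these notes; the names `num`, `den` and `rho` are illustrative.

```matlab
% Data: the nine (x, y) points from the Matlab session in these notes.
x = [1 2 3 4 5 6 7 8 9];
y = [2 4 5 6 9 3 12 15 14];
n = numel(x);

% Correlation coefficient from the sums.
num = n*sum(x .* y) - sum(x)*sum(y);
den = sqrt((n*sum(x.^2) - sum(x)^2) * (n*sum(y.^2) - sum(y)^2));
rho = num / den;

% Built-in check: corrcoef returns a 2-by-2 matrix, R(1,2) is the x-y entry.
R = corrcoef(x, y);

fprintf('rho from the sums: %.4f\n', rho);     % 0.8582
fprintf('rho from corrcoef: %.4f\n', R(1,2));
```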

The correlation coefficient is a measure of the linearity of the data. Even if the data are not linear, one can try various transformations of the data and then a linear fit, particularly if the correlation coefficient of the transformed data is high in magnitude.
Example: linearizing transformations that bring a relation into the form $Y = a_0 + a_1 X$

     Transformation of x   Transformation of y   Relation
a.   $X = x$               $Y = \log y$          $y = a e^{bx}$
b.   $X = \log x$          $Y = \log y$          $y = a x^b$
c.   $X = \log x$          $Y = \sqrt{y}$        $y = [\log(a x^b)]^2$
d.   $X = x^2$             $Y = e^y$             $e^y = a + b x^2$
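To make the first row of the table concrete, here is a minimal sketch that fits $y = a e^{bx}$ by a straight-line fit of $\log y$ against $x$. The data are synthetic and noise-free, generated from made-up values $a = 2$, $b = 0.5$ (assumptions for this example, not values from these notes), and `log` is taken to be the natural logarithm.

```matlab
% Synthetic, noise-free data from y = a*exp(b*x) with made-up a = 2, b = 0.5.
a_true = 2;
b_true = 0.5;
x = 0:5;
y = a_true * exp(b_true * x);

% Row a of the table: X = x, Y = log y, so Y = log(a) + b*X.
X = x;
Y = log(y);
p = polyfit(X, Y, 1);   % p(1) is the slope a1, p(2) is the intercept a0

b_est = p(1);           % a1 = b
a_est = exp(p(2));      % a0 = log(a), so a = exp(a0)

fprintf('a = %.4f, b = %.4f\n', a_est, b_est);
```

With noise-free data the fit should recover the generating values up to rounding. With real, noisy data the result minimizes the squared error in $\log y$, not in $y$ itself.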

In Matlab, we obtain the least-squares fit as follows.

Let a be the matrix with the x values in the first column and the y values in the second column.

>> a

a=

1 2
2 4
3 5
4 6
5 9
6 3
7 12
8 15
9 14
>> x=a(:,1)'

x=

1 2 3 4 5 6 7 8 9

>> y=a(:,2)'

y=

2 4 5 6 9 3 12 15 14
>> pcoeff=polyfit(x,y,1)

pcoeff =

1.5333 0.1111

>>

The best-fit line is $y = 0.1111 + 1.5333x$ in this case.

How about fitting a quadratic function? In this case, we try

>> pcoeff=polyfit(x,y,2)

pcoeff =

0.1212 0.3212 2.3333

The best-fit quadratic curve is now

$$y = 0.1212 x^2 + 0.3212 x + 2.3333$$
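One way to compare the linear and quadratic fits is to evaluate each polynomial at the data points with polyval and compute the sum of squared errors $S$ for each. The sketch below does this for the same data; the names `p1`, `p2`, `S1` and `S2` are illustrative.

```matlab
% Data: the nine (x, y) points from the session above.
x = [1 2 3 4 5 6 7 8 9];
y = [2 4 5 6 9 3 12 15 14];

p1 = polyfit(x, y, 1);   % least-squares line
p2 = polyfit(x, y, 2);   % least-squares quadratic

% Sum of squared errors of each fit at the data points.
S1 = sum((y - polyval(p1, x)).^2);
S2 = sum((y - polyval(p2, x)).^2);

fprintf('S (line):      %.4f\n', S1);
fprintf('S (quadratic): %.4f\n', S2);
```

The quadratic can never do worse than the line on this measure, because a line is the special case of a quadratic with zero leading coefficient. A smaller $S$ alone therefore does not show that the quadratic model is more appropriate.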

To get the value of the correlation coefficient, compute in Matlab

>> r=corrcoef(a)

r=

1.0000 0.8582
0.8582 1.0000
>>

The $(i, j)$ element is the correlation coefficient between the $i$th and the $j$th variable (the columns of a). Here the off-diagonal value 0.8582 is the correlation between x and y.
