ML - Lec 4-Introduction To Regression

This document introduces regression analysis, focusing on linear regression and its variations, including Bayesian and multivariate regression. It explains the concepts of maximum likelihood estimation and weighted regression, as well as the inclusion of constant terms and handling varying noise in data. The document also touches on non-linear regression and polynomial regression, providing a comprehensive overview of regression techniques and their applications.

Lecture 4: An Introduction to Regression
based on Andrew W. Moore's slides


Single-Parameter Linear Regression
2
Linear Regression

DATASET
inputs     outputs
x1 = 1     y1 = 1
x2 = 3     y2 = 2.2
x3 = 2     y3 = 2
x4 = 1.5   y4 = 1.9
x5 = 4     y5 = 3.1

Linear regression assumes that the expected value of the
output given an input, E[y|x], is linear.
Simplest case: Out(x) = wx for some unknown w.
Given the data, we can estimate w.
3
1-parameter linear regression
Assume that the data is formed by
  yi = w xi + noisei
where…
• the noise signals are independent
• the noise has a normal distribution with mean 0 and unknown variance σ²

p(y | w, x) has a normal distribution with
• mean wx
• variance σ²
4
Bayesian Linear Regression
p(y | w, x) = Normal(mean wx, variance σ²)

We have a set of datapoints (x1,y1), (x2,y2), …, (xn,yn) which
are EVIDENCE about w.

We want to infer w from the data:
  p(w | x1, x2, x3, …, xn, y1, y2, …, yn)

• You can use BAYES rule to work out a posterior distribution for w given the data.
• Or you could do Maximum Likelihood Estimation.
5
Maximum likelihood estimation of w

Asks the question:
"For which value of w is this data most likely to have happened?"
<=>
For what w is
  p(y1, y2, …, yn | x1, x2, x3, …, xn, w) maximized?
<=>
For what w is
  Π_{i=1..n} p(yi | w, xi) maximized?
6
For what w is
  Π_{i=1..n} p(yi | w, xi) maximized?
<=>
For what w is
  Π_{i=1..n} exp( -½ ((yi - w xi)/σ)² ) maximized?
<=>
For what w is
  Σ_{i=1..n} -½ ((yi - w xi)/σ)² maximized?
<=>
For what w is
  Σ_{i=1..n} (yi - w xi)² minimized?

Least Squares
7
Linear Regression

The maximum likelihood w is the one that minimizes the
sum-of-squares of residuals:

  E(w) = Σ_i (yi - w xi)²
       = (Σ_i yi²) - 2 (Σ_i xi yi) w + (Σ_i xi²) w²

We want to minimize a quadratic function of w.
(E(w) plotted against w is a parabola.)
8
Linear Regression

Easy to show the sum of squares is minimized when

  w = (Σ_i xi yi) / (Σ_i xi²)

The maximum likelihood model is

  Out(x) = wx

We can use it for prediction.
9
Linear Regression

Easy to show the sum of squares is minimized when

  w = (Σ_i xi yi) / (Σ_i xi²)

The maximum likelihood model is

  Out(x) = wx

We can use it for prediction.

Note: In Bayesian stats you'd have ended up with a probability
distribution p(w) over w, and predictions would have given a
probability distribution over the expected output.

Often useful to know your confidence. Max likelihood can give
some kinds of confidence too.
10
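As a concrete illustration, here is a minimal sketch (not from the slides) of the estimate w = Σ xi yi / Σ xi², applied to the dataset from slide 3:

import numpy as np

# Dataset from the "Linear Regression" slide (slide 3)
x = np.array([1.0, 3.0, 2.0, 1.5, 4.0])
y = np.array([1.0, 2.2, 2.0, 1.9, 3.1])

# Maximum-likelihood / least-squares slope for the model Out(x) = w*x
w = np.sum(x * y) / np.sum(x * x)

def out(x_new):
    # Prediction with the fitted model
    return w * x_new

print(w, out(2.5))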
Regression example
• Generated: w=2
• Recovered: w=2.03
• Noise: std=1

11
Regression example
• Generated: w=2
• Recovered: w=2.05
• Noise: std=2

12
Regression example
• Generated: w=2
• Recovered: w=2.08
• Noise: std=4

13
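A minimal simulation sketch of these examples (an assumed setup, not the slides' actual code): generate data with true w = 2 and Gaussian noise, then recover w with the least-squares formula above.

import numpy as np

rng = np.random.default_rng(0)
true_w, noise_std = 2.0, 1.0          # try noise_std = 2 and 4 as in the slides

x = rng.uniform(0.0, 5.0, size=50)
y = true_w * x + rng.normal(0.0, noise_std, size=50)

w_hat = np.sum(x * y) / np.sum(x * x)
print(round(w_hat, 2))                # close to 2; the spread grows with noise_std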
Multivariate Linear Regression
14
Multivariate Regression
What if the inputs are vectors?

(figure: a 2-d input example, with datapoints scattered in the (x1, x2) plane,
each labelled with its output value)

Dataset has form
x1   y1
x2   y2
x3   y3
 :    :
xR   yR
15
Multivariate Regression
Write matrix X and vector Y thus:

  X = [ ..... x1 ..... ]   [ x11 x12 ... x1m ]        [ y1 ]
      [ ..... x2 ..... ] = [ x21 x22 ... x2m ]    Y = [ y2 ]
      [       :        ]   [  :    :      :  ]        [  : ]
      [ ..... xR ..... ]   [ xR1 xR2 ... xRm ]        [ yR ]

(there are R datapoints; each input has m components)

The linear regression model assumes a vector w such that
  Out(x) = wᵀx = w1 x[1] + w2 x[2] + … + wm x[m]

The max. likelihood w is  w = (XᵀX)⁻¹(XᵀY)
16
Multivariate Regression
(figure only)
18
Multivariate Regression (con't)

The max. likelihood w is  w = (XᵀX)⁻¹(XᵀY)

XᵀX is an m × m matrix: its (i,j)'th element is  Σ_{k=1..R} xki xkj

XᵀY is an m-element vector: its i'th element is  Σ_{k=1..R} xki yk
19
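A minimal sketch of the multivariate case on simulated data (the data here is hypothetical, not from the slides):

import numpy as np

rng = np.random.default_rng(1)
R, m = 100, 3                              # R datapoints, m input components
X = rng.normal(size=(R, m))
true_w = np.array([1.5, -2.0, 0.5])
Y = X @ true_w + rng.normal(0.0, 0.1, size=R)

# Maximum-likelihood weights w = (X^T X)^{-1} (X^T Y);
# solving the normal equations is preferred to forming the inverse explicitly.
w = np.linalg.solve(X.T @ X, X.T @ Y)
print(w)                                   # close to true_w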
Constant Term in Linear Regression
20
What about a constant term?
We may expect linear data that does not go through the origin.

Statisticians and Neural Net Folks all agree on a simple obvious hack.

Can you guess??
21
The constant term
• The trick is to create a fake input "X0" that always takes the value 1.

Before:            After:
X1  X2   Y         X0  X1  X2   Y
 2   4  16          1   2   4  16
 3   4  17          1   3   4  17
 5   5  20          1   5   5  20

Before: Y = w1X1 + w2X2 …has to be a poor model.
After:  Y = w0X0 + w1X1 + w2X2 = w0 + w1X1 + w2X2 …has a fine constant term.

In this example, you should be able to see the MLE w0, w1 and w2 by inspection.
22
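A minimal sketch of the X0 trick applied to the table above; for this small table the fit comes out exactly, so the printed weights should be approximately [10, 1, 1]:

import numpy as np

X = np.array([[2.0, 4.0],
              [3.0, 4.0],
              [5.0, 5.0]])
Y = np.array([16.0, 17.0, 20.0])

# Prepend the fake input X0 = 1 so that w[0] becomes the constant term w0
X_aug = np.column_stack([np.ones(len(X)), X])

w = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ Y)
print(w)   # expect [10, 1, 1]: Y = 10 + 1*X1 + 1*X2 fits this table exactly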
Linear Regression with Varying Noise
23
Regression with varying noise
• Suppose you know the variance of the noise that was added to each datapoint.

xi    yi    σi²
½     ½     4
1     1     1
2     1     1/4
2     3     4
3     2     1/4

(figure: the datapoints plotted in the (x, y) plane, each with an error bar of width σi)

Assume yi ~ N(w xi, σi²)
24
MLE estimation with varying noise

argmax_w  log p(y1, y2, …, yR | x1, x2, …, xR, σ1², σ2², …, σR², w)
    (assuming independence among the noise terms, then plugging in
     the equation for the Gaussian and simplifying)

= argmin_w  Σ_{i=1..R} (yi − w xi)² / σi²
    (setting dLL/dw equal to zero)

= the w such that  Σ_{i=1..R} xi (yi − w xi) / σi² = 0
    (trivial algebra)

= ( Σ_{i=1..R} xi yi / σi² ) / ( Σ_{i=1..R} xi² / σi² )
25
This is Weighted Regression
• We are asking to minimize the weighted sum of squares

  argmin_w  Σ_{i=1..R} (yi − w xi)² / σi²

(figure: the same datapoints with their error bars)

where the weight for the i'th datapoint is 1/σi²
26
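A minimal sketch of weighted regression for the varying-noise table on slide 24, using the closed form derived above:

import numpy as np

x   = np.array([0.5, 1.0, 2.0, 2.0, 3.0])
y   = np.array([0.5, 1.0, 1.0, 3.0, 2.0])
var = np.array([4.0, 1.0, 0.25, 4.0, 0.25])   # known noise variance sigma_i^2

# w = (sum_i x_i y_i / sigma_i^2) / (sum_i x_i^2 / sigma_i^2);
# each datapoint is weighted by 1/sigma_i^2, so low-noise points dominate.
weights = 1.0 / var
w = np.sum(weights * x * y) / np.sum(weights * x * x)
print(w)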
Non-linear Regression
27
Non-linear Regression
• Suppose you know that y is related to a function of x in such a way that the
  predicted values have a non-linear dependence on w, e.g.:

xi    yi
½     ½
1     2.5
2     3
3     2
3     3

(figure: the datapoints plotted in the (x, y) plane)

Assume yi ~ N(√(w + xi), σ²)
28
Non-linear MLE estimation

argmax_w  log p(y1, y2, …, yR | x1, x2, …, xR, σ, w)
    (assuming i.i.d. and then plugging in the equation
     for the Gaussian and simplifying)

= argmin_w  Σ_{i=1..R} (yi − √(w + xi))²
    (setting dLL/dw equal to zero)

= the w such that  Σ_{i=1..R} (yi − √(w + xi)) / √(w + xi) = 0
29
Non-linear MLE estimation

argmax_w  log p(y1, y2, …, yR | x1, x2, …, xR, σ, w)
= argmin_w  Σ_{i=1..R} (yi − √(w + xi))²
= the w such that  Σ_{i=1..R} (yi − √(w + xi)) / √(w + xi) = 0

Is there an algebraic solution???
30
Non-linear MLE estimation

argmax_w  log p(y1, y2, …, yR | x1, x2, …, xR, σ, w)
= argmin_w  Σ_{i=1..R} (yi − √(w + xi))²
= the w such that  Σ_{i=1..R} (yi − √(w + xi)) / √(w + xi) = 0

An algebraic solution??? Common (but not the only) approach: numerical solutions
• Line Search
• Simulated Annealing
• Gradient Descent
• Conjugate Gradient
• Levenberg-Marquardt
• Newton's Method

Also, special-purpose statistical-optimization-specific tricks such as E.M.
(see the Gaussian Mixtures lecture for an introduction).
31
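As a concrete illustration of one of the options above, here is a minimal gradient-descent sketch for the example model yi ~ N(√(w + xi), σ²), using the dataset from slide 28 (the learning rate and starting point are arbitrary choices):

import numpy as np

x = np.array([0.5, 1.0, 2.0, 3.0, 3.0])
y = np.array([0.5, 2.5, 3.0, 2.0, 3.0])

def grad(w):
    pred = np.sqrt(w + x)
    # d/dw of sum_i (y_i - sqrt(w + x_i))^2 = -sum_i (y_i - sqrt(w + x_i)) / sqrt(w + x_i)
    return -np.sum((y - pred) / pred)

w, lr = 1.0, 0.05
for _ in range(500):          # plain gradient descent on the sum of squares
    w = w - lr * grad(w)
print(w)                      # approximately satisfies the dLL/dw = 0 condition above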
Polynomial Regression
32
Polynomial Regression
So far we've mainly been dealing with linear regression.

X1  X2  Y
 3   2  7
 1   1  3
 :   :  :

X = [ 3 2        y = [ 7
      1 1              3
      : : ]            : ]
x1 = (3,2)…   y1 = 7…

Z = [ 1 3 2      y = [ 7
      1 1 1            3
      : : : ]          : ]
z1 = (1,3,2)…   y1 = 7…
zk = (1, xk1, xk2)

b = (ZᵀZ)⁻¹(Zᵀy)

yest = b0 + b1 x1 + b2 x2
33
Quadratic Regression
It's trivial to do linear fits of fixed nonlinear basis functions.

X1  X2  Y
 3   2  7
 1   1  3
 :   :  :

X = [ 3 2        y = [ 7
      1 1              3
      : : ]            : ]
x1 = (3,2)…   y1 = 7…

Z = [ 1 3 2 9 6 4      y = [ 7
      1 1 1 1 1 1            3
      : : : : : : ]          : ]
z = (1, x1, x2, x1², x1x2, x2²)

b = (ZᵀZ)⁻¹(Zᵀy)

yest = b0 + b1 x1 + b2 x2 + b3 x1² + b4 x1x2 + b5 x2²
34
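A minimal sketch of a quadratic fit as a linear regression on the fixed basis z = (1, x1, x2, x1², x1x2, x2²); the data here is simulated, not the slides' table:

import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-2.0, 2.0, size=(50, 2))
true_b = np.array([1.0, 2.0, -1.0, 0.5, 0.3, -0.2])

def quad_terms(x1, x2):
    # one row of Z: constant, linear and quadratic terms
    return [1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2]

Z = np.array([quad_terms(x1, x2) for x1, x2 in X])
y = Z @ true_b + rng.normal(0.0, 0.05, size=len(X))

b = np.linalg.solve(Z.T @ Z, Z.T @ y)   # b = (Z^T Z)^{-1} (Z^T y)
print(b)                                # close to true_b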
Quadratic Regression
Each component of a z vector is called a term.
Each column of the Z matrix is called a term column.

How many terms in a quadratic regression with m inputs?
• 1 constant term
• m linear terms
• (m+1)-choose-2 = m(m+1)/2 quadratic terms

(m+2)-choose-2 terms in total = O(m²)

Note that solving b = (ZᵀZ)⁻¹(Zᵀy) is thus O(m⁶).
35
Qth-degree polynomial Regression

X1  X2  Y
 3   2  7
 1   1  3
 :   :  :

X = [ 3 2        y = [ 7
      1 1              3
      : : ]            : ]
x1 = (3,2)…   y1 = 7…

Z = [ 1 3 2 9 6 …      y = [ 7
      1 1 1 1 1 …            3
      : : : : : … ]          : ]
z = (all products of powers of inputs in which the sum of the powers is Q or less)

b = (ZᵀZ)⁻¹(Zᵀy)

yest = b0 + b1 x1 + …
36
m inputs, degree Q: how many terms?

= the number of unique terms of the form
    x1^q1 x2^q2 … xm^qm    where Σ_{i=1..m} qi ≤ Q

= the number of unique terms of the form
    1^q0 x1^q1 x2^q2 … xm^qm    where Σ_{i=0..m} qi = Q

= the number of lists of non-negative integers [q0, q1, q2, …, qm]
  in which Σ qi = Q

= the number of ways of placing Q red disks on a row of squares of length Q+m

= (Q+m)-choose-Q

Example: Q=11, m=4
  q0=2  q1=2  q2=0  q3=4  q4=3
37
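A quick brute-force check of the counting argument (the number of exponent vectors with Σ qi ≤ Q should equal (Q+m)-choose-Q):

from itertools import product
from math import comb

def n_terms_brute(m, Q):
    # count exponent vectors (q1, ..., qm) with q1 + ... + qm <= Q
    return sum(1 for qs in product(range(Q + 1), repeat=m) if sum(qs) <= Q)

m, Q = 4, 5
print(n_terms_brute(m, Q), comb(Q + m, Q))   # both print 126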
Radial Basis Functions
38
Radial Basis Functions (RBFs)

X1  X2  Y
 3   2  7
 1   1  3
 :   :  :

X = [ 3 2        y = [ 7
      1 1              3
      : : ]            : ]
x1 = (3,2)…   y1 = 7…

Z = [ … … …      y = [ 7
      … … …            3
      … … … ]          : ]
z = (list of radial basis function evaluations)

b = (ZᵀZ)⁻¹(Zᵀy)

yest = b0 + b1 x1 + …
39
1-d RBFs

(figure: three kernel bumps centred at c1, c2, c3 along the x axis)

yest = b1 f1(x) + b2 f2(x) + b3 f3(x)

where
fi(x) = KernelFunction( |x − ci| / KW )
40
Example

(figure: the same three basis functions, now scaled by their coefficients)

yest = 2 f1(x) + 0.05 f2(x) + 0.5 f3(x)

where
fi(x) = KernelFunction( |x − ci| / KW )
41
RBFs with Linear Regression

All ci's are held constant (initialized randomly or on a grid in
m-dimensional input space).

KW is also held constant (initialized to be large enough that there's
decent overlap between basis functions*).
*Usually much better than the crappy overlap on my diagram

yest = 2 f1(x) + 0.05 f2(x) + 0.5 f3(x)
where
fi(x) = KernelFunction( |x − ci| / KW )
42
RBFs with Linear Regression

All ci's are held constant (initialized randomly or on a grid in
m-dimensional input space). KW is also held constant (initialized to be
large enough that there's decent overlap between basis functions).

yest = 2 f1(x) + 0.05 f2(x) + 0.5 f3(x)
where
fi(x) = KernelFunction( |x − ci| / KW )

Then, given Q basis functions, define the matrix Z such that
  Zkj = KernelFunction( |xk − cj| / KW ), where xk is the kth vector of inputs.
And as before, b = (ZᵀZ)⁻¹(Zᵀy).
43
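A minimal 1-d sketch of RBFs with linear regression. The Gaussian kernel, grid-spaced centres, and the dataset are assumptions here; the slides leave the KernelFunction unspecified:

import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0.0, 10.0, size=80))
y = np.sin(x) + rng.normal(0.0, 0.1, size=x.shape)   # hypothetical 1-d dataset

centres = np.linspace(0.0, 10.0, 8)   # c_j held constant, on a grid
KW = 1.5                              # kernel width held constant, wide enough to overlap

def kernel(r):
    return np.exp(-0.5 * r ** 2)      # assumed Gaussian KernelFunction

# Z_kj = KernelFunction(|x_k - c_j| / KW)
Z = kernel(np.abs(x[:, None] - centres[None, :]) / KW)

b = np.linalg.solve(Z.T @ Z, Z.T @ y) # b = (Z^T Z)^{-1} (Z^T y), as before
print(np.mean((y - Z @ b) ** 2))      # in-sample mean squared error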
RBFs with NonLinear Regression

Allow the ci's to adapt to the data (initialized randomly or on a grid in
m-dimensional input space).

KW is allowed to adapt to the data. (Some folks even let each basis
function have its own KWj, permitting fine detail in dense regions of
input space.)

yest = 2 f1(x) + 0.05 f2(x) + 0.5 f3(x)
where
fi(x) = KernelFunction( |x − ci| / KW )

But how do we now find all the bj's, ci's and KW?
44
RBFs with NonLinear Regression

(as above: the ci's and KW are allowed to adapt to the data)

yest = 2 f1(x) + 0.05 f2(x) + 0.5 f3(x)
where
fi(x) = KernelFunction( |x − ci| / KW )

But how do we now find all the bj's, ci's and KW?

Answer: Gradient Descent
45
RBFs with NonLinear Regression

(as above: the ci's and KW are allowed to adapt to the data)

yest = 2 f1(x) + 0.05 f2(x) + 0.5 f3(x)
where
fi(x) = KernelFunction( |x − ci| / KW )

But how do we now find all the bj's, ci's and KW?

Answer: Gradient Descent

(But I'd like to see, or hope someone's already done, a hybrid, where the
ci's and KW are updated with gradient descent while the bj's use matrix
inversion.)
46
Radial Basis Functions in 2-d
Two inputs.
Outputs (heights sticking out of the page) not shown.

(figure: the (x1, x2) plane, showing one centre and the sphere of
significant influence of that centre)
47
Happy RBFs in 2-d
Blue dots denote coordinates of input vectors.

(figure: the input points in the (x1, x2) plane, with the centres and
their spheres of significant influence)
48
Crabby RBFs in 2-d
What's the problem in this example?
Blue dots denote coordinates of input vectors.

(figure: the input points in the (x1, x2) plane, with the centres and
their spheres of significant influence)
49
More crabby RBFs
And what's the problem in this example?
Blue dots denote coordinates of input vectors.

(figure: the input points in the (x1, x2) plane, with the centres and
their spheres of significant influence)
50
Hopeless!
Even before seeing the data, you should understand that this is a disaster!

(figure: the centres and their spheres of significant influence in the
(x1, x2) plane)
51
Unhappy
Even before seeing the data, you should understand that this isn't good either.

(figure: the centres and their spheres of significant influence in the
(x1, x2) plane)
52
Robust Regression
53
Robust Regression

(figure: datapoints in the (x, y) plane)
54
Robust Regression
This is the best fit that Quadratic Regression can manage.

(figure: the quadratic fit through the datapoints)
55
Robust Regression
…but this is what we'd probably prefer.

(figure: the preferred fit)
56
LOESS-based Robust Regression
After the initial fit, score each datapoint according to how well it's fitted…

"You are a very good datapoint."

(figure: the fitted curve, with one well-fitted point singled out)
57
LOESS-based Robust Regression
After the initial fit, score each datapoint according to how well it's fitted…

"You are a very good datapoint."
"You are not too shabby."

(figure: the fitted curve, with two reasonably well-fitted points singled out)
58
LOESS-based Robust Regression
After the initial fit, score each datapoint according to how well it's fitted…

"You are a very good datapoint."
"You are not too shabby."
"But you are pathetic."

(figure: the fitted curve, with a badly-fitted point singled out as well)
59
Robust Regression
For k = 1 to R…
• Let (xk, yk) be the kth datapoint.
• Let yestk be the predicted value of yk.
• Let wk be a weight for datapoint k that is large if the datapoint fits
  well and small if it fits badly:
    wk = KernelFn( [yk − yestk]² )
60
Robust Regression
For k = 1 to R…
• Let (xk, yk) be the kth datapoint.
• Let yestk be the predicted value of yk.
• Let wk be a weight for datapoint k that is large if the datapoint fits
  well and small if it fits badly:
    wk = KernelFn( [yk − yestk]² )

Then redo the regression using the weighted datapoints.
(Weighted regression was described earlier in the "varying noise" section,
and is also discussed in the "Memory-based Learning" lecture.)

Guess what happens next?
61
Robust Regression
For k = 1 to R…
• Let (xk, yk) be the kth datapoint.
• Let yestk be the predicted value of yk.
• Let wk be a weight for datapoint k that is large if the datapoint fits
  well and small if it fits badly:
    wk = KernelFn( [yk − yestk]² )

Then redo the regression using the weighted datapoints.
(I taught you how to do this in the "Instance-based" lecture; only then the
weights depended on distance in input space.)

Repeat the whole thing until converged!
62
62
Robust Regression --- what we're doing
What regular regression does:
Assume yk was originally generated using the following recipe:
    yk = b0 + b1 xk + b2 xk² + N(0, σ²)
Computational task is to find the Maximum Likelihood b0, b1 and b2.
63
Robust Regression --- what we're doing
What LOESS robust regression does:
Assume yk was originally generated using the following recipe:
    With probability p:
        yk = b0 + b1 xk + b2 xk² + N(0, σ²)
    But otherwise:
        yk ~ N(μ, σhuge²)
Computational task is to find the Maximum Likelihood b0, b1, b2, p, μ and σhuge.
64
Robust Regression --- what we're doing
What LOESS robust regression does:
Assume yk was originally generated using the following recipe:
    With probability p:
        yk = b0 + b1 xk + b2 xk² + N(0, σ²)
    But otherwise:
        yk ~ N(μ, σhuge²)
Computational task is to find the Maximum Likelihood b0, b1, b2, p, μ and σhuge.

Mysteriously, the reweighting procedure does this computation for us.

Your first glimpse of two spectacular letters: E.M.
65
