
CS 60050

Machine Learning

Linear Regression

Some slides taken from course materials of Andrew Ng


Dataset of living area and price
of houses in a city

This is a training set.

How can we learn to predict the prices of houses of other sizes in the city, as a function of their living area?
Dataset of living area and price
of houses in a city

This is an example of a supervised learning problem.

When the target variable we are trying to predict is continuous, we call it a regression problem.
Dataset of living area and price
of houses in a city

m = number of training examples
x's = input variables / features
y's = output variables / "target" variables
(x, y) = a single training example
(x^(i), y^(i)) = the i-th training example; i is an index into the training set
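As a concrete sketch (assuming NumPy; the numbers are the living areas and prices that appear in the housing table later in these slides):

```python
import numpy as np

# Training set: living areas (feet^2) and prices ($1000s)
x = np.array([2104.0, 1416.0, 1534.0, 852.0])  # inputs / features
y = np.array([460.0, 232.0, 315.0, 178.0])     # outputs / targets

m = len(x)             # m = number of training examples
x_1, y_1 = x[0], y[0]  # (x^(1), y^(1)): the first training example
```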
How to use the training set?

Learn a function h(x), so that h(x) is a good predictor for the corresponding value of y.

h: hypothesis function
How to represent hypothesis h?

h_θ(x) = θ0 + θ1x

θi are parameters
- θ0 is the zero condition (intercept)
- θ1 is the gradient (slope)
θ: vector of all the parameters

We assume y is a linear function of x.

This is univariate linear regression.
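A minimal sketch of this hypothesis in code; the parameter values are arbitrary, for illustration only:

```python
def h(x, theta0, theta1):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# e.g., with theta0 = 50 and theta1 = 0.1, a 2104 ft^2 house is predicted
# at 50 + 0.1 * 2104 = 260.4 (in $1000s)
print(h(2104, 50.0, 0.1))  # 260.4
```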
How to learn the values of the parameters θi?
Intuition of hypothesis function

• We are attempting to fit a straight line to the data in the training set
• Values of the parameters decide the equation of the straight line
• Which is the best straight line to fit the data?
• How to learn the values of the parameters θi?
• Choose the parameters such that the prediction is close to the actual y-value for the training examples
How good is the prediction given by the straight line?

(x, y): a training example
(x, h(x)): prediction of the model
( h(x) – y ): prediction error for this particular training example

Minimize the prediction error across all training examples.
Cost function

• Measure of how close the predictions are to the actual y-values
• Average over all the m training instances
• Squared error cost function:
  J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} ( h_θ(x^(i)) – y^(i) )²
• Choose parameters θ so that J(θ) is minimized
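A minimal sketch of this cost function, assuming the NumPy arrays x and y from the earlier sketch:

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared error cost: J = (1/2m) * sum over i of (h(x^(i)) - y^(i))^2."""
    m = len(x)
    predictions = theta0 + theta1 * x  # h_theta(x^(i)) for every example
    errors = predictions - y           # prediction errors
    return np.sum(errors ** 2) / (2 * m)
```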
Hypothesis: h_θ(x) = θ0 + θ1x

Parameters: θ0, θ1

Cost Function: J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} ( h_θ(x^(i)) – y^(i) )²

Goal: minimize J(θ0, θ1) over θ0, θ1

(For fixed θ0, θ1, h_θ(x) is a function of x; J is a function of the parameters θ0, θ1.)

[Plot: Price ($1000s) vs. Size in feet² (x), showing the training data]

For simplicity, first assume θ0 is a constant, so J can be plotted as a function of θ1 alone.

When J is a function of both θ0 and θ1, it can be visualized with a contour plot (contour figure).
Minimizing a function

• For now, let us consider some arbitrary function (not necessarily a cost function)

• Analytical minimization is not scalable to complex functions of hundreds of parameters

• Algorithm called gradient descent
  – Efficient and scalable to thousands of parameters
  – Used in many applications of minimizing functions
Have some function J(θ0, θ1)
Want: min over θ0, θ1 of J(θ0, θ1)

• Outline:
  • Start with some θ0, θ1
  • Keep changing θ0, θ1 to reduce J(θ0, θ1), until we hopefully end up at a minimum

• Iterative method, similar to the Newton-Raphson method for solving equations
[Surface plots of J(θ0, θ1) over the (θ0, θ1) plane]

If the function has multiple local minima, where one starts can decide which minimum is reached.
Gradient descent algorithm

Repeat until convergence:
  θj := θj – α (∂/∂θj) J(θ0, θ1)    (for j = 0 and j = 1)

α is the learning rate – more on this later

Correct (simultaneous update):
  temp0 := θ0 – α (∂/∂θ0) J(θ0, θ1)
  temp1 := θ1 – α (∂/∂θ1) J(θ0, θ1)
  θ0 := temp0
  θ1 := temp1

Incorrect (non-simultaneous update):
  temp0 := θ0 – α (∂/∂θ0) J(θ0, θ1)
  θ0 := temp0
  temp1 := θ1 – α (∂/∂θ1) J(θ0, θ1)    (this uses the already-updated θ0)
  θ1 := temp1
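A minimal sketch of one correct, simultaneous-update step; grad_J is a hypothetical helper returning the partial derivative ∂J/∂θj:

```python
def gradient_descent_step(theta0, theta1, alpha, grad_J):
    # Compute both partial derivatives from the CURRENT parameter values...
    temp0 = theta0 - alpha * grad_J(theta0, theta1, j=0)
    temp1 = theta1 - alpha * grad_J(theta0, theta1, j=1)
    # ...and only then overwrite the parameters (simultaneous update).
    return temp0, temp1
```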


For simplicity, let us first consider a function J of a single variable θ1:

  θ1 := θ1 – α (d/dθ1) J(θ1)

If the derivative is positive, the update reduces the value of θ1.
If the derivative is negative, the update increases the value of θ1.
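A sketch of this sign intuition on the toy function J(θ) = θ², whose derivative is 2θ:

```python
def minimize_parabola(theta, alpha=0.1, steps=50):
    """Gradient descent on J(theta) = theta^2 (derivative: 2 * theta)."""
    for _ in range(steps):
        theta = theta - alpha * (2 * theta)  # step against the derivative
    return theta

print(minimize_parabola(5.0))   # positive start: theta decreases toward 0
print(minimize_parabola(-5.0))  # negative start: theta increases toward 0
```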
The learning rate
• Do we need to change the learning rate over time?
  o No. Gradient descent can converge to a local minimum even with the learning rate α fixed
  o The step size is adjusted automatically, since the derivative shrinks as a minimum is approached

• But the value needs to be chosen judiciously
  o If α is too small, gradient descent can be slow to converge
  o If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
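A sketch on the same toy function J(θ) = θ², contrasting a small and a too-large learning rate (the values 0.01 and 1.1 are arbitrary illustrations):

```python
def run(theta, alpha, steps=20):
    for _ in range(steps):
        theta -= alpha * (2 * theta)  # gradient step on J(theta) = theta^2
    return theta

print(run(5.0, alpha=0.01))  # small alpha: converges, but slowly (~3.3 after 20 steps)
print(run(5.0, alpha=1.1))   # too-large alpha: overshoots and diverges (~192)
```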
Gradient descent for univariate linear regression

Plugging the linear regression model h_θ(x) = θ0 + θ1x and its cost J(θ0, θ1) into the gradient descent algorithm gives:

Repeat until convergence:
  θ0 := θ0 – α (1/m) Σ_{i=1}^{m} ( h_θ(x^(i)) – y^(i) )
  θ1 := θ1 – α (1/m) Σ_{i=1}^{m} ( h_θ(x^(i)) – y^(i) ) · x^(i)
  (update θ0 and θ1 simultaneously)
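Putting the pieces together, a minimal sketch of batch gradient descent for univariate linear regression; the learning rate and iteration count are arbitrary, and the sizes are scaled to thousands of square feet to keep the toy example numerically tame:

```python
import numpy as np

def univariate_gd(x, y, alpha=0.01, iters=1000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errors = (theta0 + theta1 * x) - y  # h(x^(i)) - y^(i) for all i
        grad0 = errors.sum() / m            # dJ/dtheta0
        grad1 = (errors * x).sum() / m      # dJ/dtheta1
        theta0 -= alpha * grad0             # simultaneous update
        theta1 -= alpha * grad1
    return theta0, theta1

# Living areas in 1000s of ft^2; prices in $1000s (from the dataset above)
x = np.array([2.104, 1.416, 1.534, 0.852])
y = np.array([460.0, 232.0, 315.0, 178.0])
print(univariate_gd(x, y, alpha=0.1, iters=10000))
```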
“Batch” Gradient Descent

“Batch”: each step of gradient descent uses all the training examples.

There are other variations, like “stochastic gradient descent” (used in learning over huge datasets).
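For contrast, a sketch of the stochastic variant, which updates the parameters from one shuffled example at a time instead of the full sum (a generic illustration, not from the slides):

```python
import numpy as np

def univariate_sgd(x, y, alpha=0.01, epochs=100, seed=0):
    """Stochastic gradient descent: one training example per update."""
    rng = np.random.default_rng(seed)
    theta0, theta1 = 0.0, 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(x)):  # shuffle each pass over the data
            error = (theta0 + theta1 * x[i]) - y[i]
            theta0 -= alpha * error         # update from a single example
            theta1 -= alpha * error * x[i]
    return theta0, theta1
```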
What about multiple local minima?
• The cost function in linear regression is always a convex function – it always has a single global minimum

• So, gradient descent (with a suitable learning rate) will always converge to it
Gradient descent in action
[Sequence of paired plots over successive iterations: for fixed θ0, θ1, the hypothesis h_θ(x) as a function of x on the left, and the corresponding point on the contour plot of J, a function of the parameters θ0, θ1, on the right]
Linear Regression for
multiple variables
Multiple features (variables)

Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
2104 | 5 | 1 | 45 | 460
1416 | 3 | 2 | 40 | 232
1534 | 3 | 2 | 30 | 315
852 | 2 | 1 | 36 | 178
… | … | … | … | …
Notation:
n = number of features; m = number of training examples
x^(i) = input (features) of the i-th training example
x_j^(i) = value of feature j in the i-th training example
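A sketch of this dataset stored as a NumPy feature matrix X (one row per training example, one column per feature) and a target vector y:

```python
import numpy as np

# Rows: training examples; columns: size, bedrooms, floors, age
X = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40],
              [1534, 3, 2, 30],
              [ 852, 2, 1, 36]], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

m, n = X.shape    # m = 4 training examples, n = 4 features
x_2 = X[1]        # x^(2): features of the 2nd training example
x_2_3 = X[1, 2]   # x_3^(2): value of feature 3 in the 2nd example (floors = 2)
```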
Hypothesis:
For univariate linear regression: h_θ(x) = θ0 + θ1x

For multi-variate linear regression: h_θ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn

For convenience of notation, define x0 = 1. Then
  h_θ(x) = θ0x0 + θ1x1 + … + θnxn = θᵀx
Hypothesis: h_θ(x) = θᵀx (with x0 = 1)

Parameters: θ = (θ0, θ1, …, θn), an (n+1)-dimensional vector

Cost function: J(θ) = (1/2m) Σ_{i=1}^{m} ( h_θ(x^(i)) – y^(i) )²

Gradient descent:
Repeat {
  θj := θj – α (∂/∂θj) J(θ)
} (simultaneously update θj for every j = 0, …, n)
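A sketch of the vectorized hypothesis, reusing the X and y above and prepending the x0 = 1 entry to every example:

```python
import numpy as np

# Prepend x0 = 1 to every row of the feature matrix X from the earlier sketch
X1 = np.hstack([np.ones((X.shape[0], 1)), X])  # shape (m, n+1)
theta = np.zeros(X1.shape[1])                  # (n+1)-dimensional parameter vector

predictions = X1 @ theta                       # h_theta(x^(i)) = theta^T x^(i), for all i
```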


New algorithm: gradient descent for multivariate linear regression

Previously (n = 1):
Repeat {
  θ0 := θ0 – α (1/m) Σ_{i=1}^{m} ( h_θ(x^(i)) – y^(i) )
  θ1 := θ1 – α (1/m) Σ_{i=1}^{m} ( h_θ(x^(i)) – y^(i) ) · x^(i)
} (simultaneously update θ0, θ1)

New algorithm (n ≥ 1):
Repeat {
  θj := θj – α (1/m) Σ_{i=1}^{m} ( h_θ(x^(i)) – y^(i) ) · x_j^(i)
} (simultaneously update θj for j = 0, …, n)
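A minimal sketch of this update in vectorized NumPy, assuming the X1 and y from the sketches above; on raw, unscaled features a much smaller α would be needed, which is what the feature scaling slides below address:

```python
import numpy as np

def multivariate_gd(X1, y, alpha=0.1, iters=1000):
    """Batch gradient descent for h_theta(x) = theta^T x.

    X1: (m, n+1) matrix whose first column is all ones (x0 = 1).
    """
    m = X1.shape[0]
    theta = np.zeros(X1.shape[1])
    for _ in range(iters):
        errors = X1 @ theta - y       # h_theta(x^(i)) - y^(i), shape (m,)
        gradient = X1.T @ errors / m  # (1/m) * sum of errors * x_j^(i), for each j
        theta -= alpha * gradient     # simultaneous update of every theta_j
    return theta
```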
Practical aspects of applying
gradient descent
Feature Scaling
Idea: Make sure features are on a similar scale.

E.g. x1 = size (0–2000 feet²)
     x2 = number of bedrooms (1–5)

Normalization wrt the maximum value:
  x1 = size (feet²) / 2000
  x2 = number of bedrooms / 5
Mean normalization:
Replace xj with xj – μj (where μj is the mean of feature j) to make features have approximately zero mean. (Do not apply to x0 = 1.)

E.g. x1 = (size – 1000) / 2000
     x2 = (number of bedrooms – 2) / 5

Other types of normalization: e.g., dividing by the range (max – min) of the feature, or by its standard deviation.
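A sketch of mean normalization in NumPy (here dividing by the feature range; dividing by the standard deviation works similarly), applied to the feature columns only, never to x0:

```python
def mean_normalize(X):
    """Scale each feature column to roughly [-0.5, 0.5] with ~zero mean."""
    mu = X.mean(axis=0)                   # per-feature mean
    span = X.max(axis=0) - X.min(axis=0)  # per-feature range (max - min)
    return (X - mu) / span

X_scaled = mean_normalize(X)  # X: the housing feature matrix from above
```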


Is gradient descent working properly?
• Plot how J(θ) changes with every iteration of gradient descent

• For a sufficiently small learning rate, J(θ) should decrease with every iteration

• If not, the learning rate needs to be reduced

• However, a too-small learning rate means slow convergence
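A sketch of this diagnostic, recording J(θ) at every iteration and plotting it (matplotlib assumed; reuses X, y, and mean_normalize from the earlier sketches):

```python
import numpy as np
import matplotlib.pyplot as plt

def gd_with_history(X1, y, alpha=0.1, iters=400):
    """Gradient descent that records J(theta) at every iteration."""
    m = X1.shape[0]
    theta = np.zeros(X1.shape[1])
    history = []
    for _ in range(iters):
        errors = X1 @ theta - y
        history.append((errors ** 2).sum() / (2 * m))  # J(theta) this iteration
        theta -= alpha * (X1.T @ errors) / m
    return theta, history

X1s = np.hstack([np.ones((X.shape[0], 1)), mean_normalize(X)])  # scaled features + x0
theta, history = gd_with_history(X1s, y)
plt.plot(history)  # should decrease in every iteration
plt.xlabel("iteration"); plt.ylabel("J(theta)")
plt.show()
```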
When to end gradient descent?
• Example convergence test: declare convergence if J(θ) decreases by less than 0.001 in an iteration (assuming J(θ) is decreasing in every iteration)
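The same loop with this convergence test added; the 0.001 threshold is the example value from the slide:

```python
import numpy as np

def gd_until_converged(X1, y, alpha=0.1, tol=1e-3, max_iters=100000):
    """Run gradient descent until J decreases by less than tol in an iteration."""
    m = X1.shape[0]
    theta = np.zeros(X1.shape[1])
    prev_cost = float("inf")
    for _ in range(max_iters):
        errors = X1 @ theta - y
        cost = (errors ** 2).sum() / (2 * m)  # current J(theta)
        if prev_cost - cost < tol:            # J fell by less than 0.001: stop
            break
        prev_cost = cost
        theta -= alpha * (X1.T @ errors) / m
    return theta
```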
Polynomial Regression for
multiple variables
Choice of features

[Plot: Price (y) vs. Size (x)]
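The deck ends on this choice-of-features question; as a sketch of where it leads, polynomial regression reuses the same machinery by building new features from powers of x (the helpers mean_normalize and multivariate_gd come from the earlier sketches, and scaling matters because the powers have wildly different ranges):

```python
import numpy as np

def polynomial_features(x, degree=3):
    """Build the feature matrix [x, x^2, ..., x^degree], one column per power."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])    # living areas (feet^2)
X_poly = mean_normalize(polynomial_features(sizes))  # powers differ hugely in scale
X1_poly = np.hstack([np.ones((len(sizes), 1)), X_poly])  # prepend x0 = 1
theta = multivariate_gd(X1_poly, y, alpha=0.1, iters=5000)
```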
