
CS 60050

Machine Learning

Linear Regression

Some slides taken from course materials of Andrew Ng


Dataset of living area and price
of houses in a city

This is a training set.

How can we learn to predict the prices of houses of other sizes in the city, as a function of their living area?
Dataset of living area and price
of houses in a city

This is an example of a supervised learning problem.

When the target variable we are trying to predict is continuous, we call it a regression problem.
Dataset of living area and price
of houses in a city

m = number of training examples
x's = input variables / features
y's = output variables / "target" variables
(x, y) = a single training example
(x^(i), y^(i)) = the i-th training example; i is an index into the training set
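As a concrete sketch (assuming NumPy; the numbers are the living areas and prices that appear in the housing table later in these slides):

```python
import numpy as np

# Training set: living areas (feet^2) and prices ($1000s)
x = np.array([2104.0, 1416.0, 1534.0, 852.0])  # inputs / features
y = np.array([460.0, 232.0, 315.0, 178.0])     # outputs / targets

m = len(x)             # m = number of training examples
x_1, y_1 = x[0], y[0]  # (x^(1), y^(1)): the first training example
```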
How to use the training set?

Learn a function h(x), so that h(x) is a good predictor for the corresponding value of y.

h: hypothesis function
How to represent hypothesis h?

h_θ(x) = θ0 + θ1x

θi are parameters
- θ0 is the zero condition (intercept)
- θ1 is the gradient (slope)
θ: vector of all the parameters

We assume y is a linear function of x.

This is univariate linear regression.
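A minimal sketch of this hypothesis in code; the parameter values are arbitrary, for illustration only:

```python
def h(x, theta0, theta1):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# e.g., with theta0 = 50 and theta1 = 0.1, a 2104 ft^2 house is predicted
# at 50 + 0.1 * 2104 = 260.4 (in $1000s)
print(h(2104, 50.0, 0.1))  # 260.4
```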
How to learn the values of the parameters θi?
Intuition of hypothesis function

• We are attempting to fit a straight line to the data in the training set
• Values of the parameters decide the equation of the straight line
• Which is the best straight line to fit the data?
• How to learn the values of the parameters θi?
• Choose the parameters such that the prediction is close to the actual y-value for the training examples
How good is the prediction given by the straight line?

(x, y): a training example
(x, h(x)): prediction of the model
( h(x) – y ): prediction error for this particular training example

Minimize the prediction error across all training examples.
Cost function

• Measure of how close the predictions are to the actual y-values
• Average over all the m training instances
• Squared error cost function:
  J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} ( h_θ(x^(i)) – y^(i) )²
• Choose parameters θ so that J(θ) is minimized
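A minimal sketch of this cost function, assuming the NumPy arrays x and y from the earlier sketch:

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared error cost: J = (1/2m) * sum over i of (h(x^(i)) - y^(i))^2."""
    m = len(x)
    predictions = theta0 + theta1 * x  # h_theta(x^(i)) for every example
    errors = predictions - y           # prediction errors
    return np.sum(errors ** 2) / (2 * m)
```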
Hypothesis: h_θ(x) = θ0 + θ1x

Parameters: θ0, θ1

Cost Function: J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} ( h_θ(x^(i)) – y^(i) )²

Goal: minimize J(θ0, θ1) over θ0, θ1

(For fixed θ0, θ1, h_θ(x) is a function of x; J is a function of the parameters θ0, θ1.)

[Plot: Price ($1000s) vs. Size in feet² (x), showing the training data]

For simplicity, first assume θ0 is a constant, so J can be plotted as a function of θ1 alone.

When J is a function of both θ0 and θ1, it can be visualized with a contour plot (contour figure).
Minimizing a function

• For now, let us consider some arbitrary function (not necessarily a cost function)

• Analytical minimization is not scalable to complex functions of hundreds of parameters

• Algorithm called gradient descent
  – Efficient and scalable to thousands of parameters
  – Used in many applications of minimizing functions
Have some function J(θ0, θ1)
Want: min over θ0, θ1 of J(θ0, θ1)

• Outline:
  • Start with some θ0, θ1
  • Keep changing θ0, θ1 to reduce J(θ0, θ1), until we hopefully end up at a minimum

• Iterative method, similar to the Newton-Raphson method for solving equations
[Surface plots of J(θ0, θ1) over the (θ0, θ1) plane]

If the function has multiple local minima, where one starts can decide which minimum is reached.
Gradient descent algorithm

Repeat until convergence:
  θj := θj – α (∂/∂θj) J(θ0, θ1)    (for j = 0 and j = 1)

α is the learning rate – more on this later

Correct (simultaneous update):
  temp0 := θ0 – α (∂/∂θ0) J(θ0, θ1)
  temp1 := θ1 – α (∂/∂θ1) J(θ0, θ1)
  θ0 := temp0
  θ1 := temp1

Incorrect (non-simultaneous update):
  temp0 := θ0 – α (∂/∂θ0) J(θ0, θ1)
  θ0 := temp0
  temp1 := θ1 – α (∂/∂θ1) J(θ0, θ1)    (this uses the already-updated θ0)
  θ1 := temp1
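A minimal sketch of one correct, simultaneous-update step; grad_J is a hypothetical helper returning the partial derivative ∂J/∂θj:

```python
def gradient_descent_step(theta0, theta1, alpha, grad_J):
    # Compute both partial derivatives from the CURRENT parameter values...
    temp0 = theta0 - alpha * grad_J(theta0, theta1, j=0)
    temp1 = theta1 - alpha * grad_J(theta0, theta1, j=1)
    # ...and only then overwrite the parameters (simultaneous update).
    return temp0, temp1
```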


For simplicity, let us first consider a function J of a single variable θ1:

  θ1 := θ1 – α (d/dθ1) J(θ1)

If the derivative is positive, the update reduces the value of θ1.
If the derivative is negative, the update increases the value of θ1.
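A sketch of this sign intuition on the toy function J(θ) = θ², whose derivative is 2θ:

```python
def minimize_parabola(theta, alpha=0.1, steps=50):
    """Gradient descent on J(theta) = theta^2 (derivative: 2 * theta)."""
    for _ in range(steps):
        theta = theta - alpha * (2 * theta)  # step against the derivative
    return theta

print(minimize_parabola(5.0))   # positive start: theta decreases toward 0
print(minimize_parabola(-5.0))  # negative start: theta increases toward 0
```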
The learning rate
• Do we need to change the learning rate over time?
  o No. Gradient descent can converge to a local minimum even with the learning rate α fixed
  o The step size is adjusted automatically, since the derivative shrinks as a minimum is approached

• But the value needs to be chosen judiciously
  o If α is too small, gradient descent can be slow to converge
  o If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
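A sketch on the same toy function J(θ) = θ², contrasting a small and a too-large learning rate (the values 0.01 and 1.1 are arbitrary illustrations):

```python
def run(theta, alpha, steps=20):
    for _ in range(steps):
        theta -= alpha * (2 * theta)  # gradient step on J(theta) = theta^2
    return theta

print(run(5.0, alpha=0.01))  # small alpha: converges, but slowly (~3.3 after 20 steps)
print(run(5.0, alpha=1.1))   # too-large alpha: overshoots and diverges (~192)
```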
Gradient descent for univariate linear regression

Plugging the linear regression model h_θ(x) = θ0 + θ1x and its cost J(θ0, θ1) into the gradient descent algorithm gives:

Repeat until convergence:
  θ0 := θ0 – α (1/m) Σ_{i=1}^{m} ( h_θ(x^(i)) – y^(i) )
  θ1 := θ1 – α (1/m) Σ_{i=1}^{m} ( h_θ(x^(i)) – y^(i) ) · x^(i)
  (update θ0 and θ1 simultaneously)
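Putting the pieces together, a minimal sketch of batch gradient descent for univariate linear regression; the learning rate and iteration count are arbitrary, and the sizes are scaled to thousands of square feet to keep the toy example numerically tame:

```python
import numpy as np

def univariate_gd(x, y, alpha=0.01, iters=1000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errors = (theta0 + theta1 * x) - y  # h(x^(i)) - y^(i) for all i
        grad0 = errors.sum() / m            # dJ/dtheta0
        grad1 = (errors * x).sum() / m      # dJ/dtheta1
        theta0 -= alpha * grad0             # simultaneous update
        theta1 -= alpha * grad1
    return theta0, theta1

# Living areas in 1000s of ft^2; prices in $1000s (from the dataset above)
x = np.array([2.104, 1.416, 1.534, 0.852])
y = np.array([460.0, 232.0, 315.0, 178.0])
print(univariate_gd(x, y, alpha=0.1, iters=10000))
```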
“Batch” Gradient Descent

“Batch”: each step of gradient descent uses all the training examples.

There are other variations, like “stochastic gradient descent” (used in learning over huge datasets).
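For contrast, a sketch of the stochastic variant, which updates the parameters from one shuffled example at a time instead of the full sum (a generic illustration, not from the slides):

```python
import numpy as np

def univariate_sgd(x, y, alpha=0.01, epochs=100, seed=0):
    """Stochastic gradient descent: one training example per update."""
    rng = np.random.default_rng(seed)
    theta0, theta1 = 0.0, 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(x)):  # shuffle each pass over the data
            error = (theta0 + theta1 * x[i]) - y[i]
            theta0 -= alpha * error         # update from a single example
            theta1 -= alpha * error * x[i]
    return theta0, theta1
```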
What about multiple local minima?
• The cost function in linear regression is always a convex function – it always has a single global minimum

• So, gradient descent (with a suitable learning rate) will always converge to it
Gradient descent in action
[Sequence of paired plots over successive iterations: for fixed θ0, θ1, the hypothesis h_θ(x) as a function of x on the left, and the corresponding point on the contour plot of J, a function of the parameters θ0, θ1, on the right]
Linear Regression for
multiple variables
Multiple features (variables)

Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
2104 | 5 | 1 | 45 | 460
1416 | 3 | 2 | 40 | 232
1534 | 3 | 2 | 30 | 315
852 | 2 | 1 | 36 | 178
… | … | … | … | …
Notation:
n = number of features; m = number of training examples
x^(i) = input (features) of the i-th training example
x_j^(i) = value of feature j in the i-th training example
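A sketch of this dataset stored as a NumPy feature matrix X (one row per training example, one column per feature) and a target vector y:

```python
import numpy as np

# Rows: training examples; columns: size, bedrooms, floors, age
X = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40],
              [1534, 3, 2, 30],
              [ 852, 2, 1, 36]], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

m, n = X.shape    # m = 4 training examples, n = 4 features
x_2 = X[1]        # x^(2): features of the 2nd training example
x_2_3 = X[1, 2]   # x_3^(2): value of feature 3 in the 2nd example (floors = 2)
```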
Hypothesis:
For univariate linear regression: h_θ(x) = θ0 + θ1x

For multi-variate linear regression: h_θ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn

For convenience of notation, define x0 = 1. Then
  h_θ(x) = θ0x0 + θ1x1 + … + θnxn = θᵀx
Hypothesis: h_θ(x) = θᵀx (with x0 = 1)

Parameters: θ = (θ0, θ1, …, θn), an (n+1)-dimensional vector

Cost function: J(θ) = (1/2m) Σ_{i=1}^{m} ( h_θ(x^(i)) – y^(i) )²

Gradient descent:
Repeat {
  θj := θj – α (∂/∂θj) J(θ)
} (simultaneously update θj for every j = 0, …, n)
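A sketch of the vectorized hypothesis, reusing the X and y above and prepending the x0 = 1 entry to every example:

```python
import numpy as np

# Prepend x0 = 1 to every row of the feature matrix X from the earlier sketch
X1 = np.hstack([np.ones((X.shape[0], 1)), X])  # shape (m, n+1)
theta = np.zeros(X1.shape[1])                  # (n+1)-dimensional parameter vector

predictions = X1 @ theta                       # h_theta(x^(i)) = theta^T x^(i), for all i
```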


New algorithm: gradient descent for multivariate linear regression

Previously (n = 1):
Repeat {
  θ0 := θ0 – α (1/m) Σ_{i=1}^{m} ( h_θ(x^(i)) – y^(i) )
  θ1 := θ1 – α (1/m) Σ_{i=1}^{m} ( h_θ(x^(i)) – y^(i) ) · x^(i)
} (simultaneously update θ0, θ1)

New algorithm (n ≥ 1):
Repeat {
  θj := θj – α (1/m) Σ_{i=1}^{m} ( h_θ(x^(i)) – y^(i) ) · x_j^(i)
} (simultaneously update θj for j = 0, …, n)
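A minimal sketch of this update in vectorized NumPy, assuming the X1 and y from the sketches above; on raw, unscaled features a much smaller α would be needed, which is what the feature scaling slides below address:

```python
import numpy as np

def multivariate_gd(X1, y, alpha=0.1, iters=1000):
    """Batch gradient descent for h_theta(x) = theta^T x.

    X1: (m, n+1) matrix whose first column is all ones (x0 = 1).
    """
    m = X1.shape[0]
    theta = np.zeros(X1.shape[1])
    for _ in range(iters):
        errors = X1 @ theta - y       # h_theta(x^(i)) - y^(i), shape (m,)
        gradient = X1.T @ errors / m  # (1/m) * sum of errors * x_j^(i), for each j
        theta -= alpha * gradient     # simultaneous update of every theta_j
    return theta
```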
Practical aspects of applying
gradient descent
Feature Scaling
Idea: Make sure features are on a similar scale.

E.g. x1 = size (0–2000 feet²)
     x2 = number of bedrooms (1–5)

Normalization wrt the maximum value:
  x1 = size (feet²) / 2000
  x2 = number of bedrooms / 5
Mean normalization:
Replace xj with xj – μj (where μj is the mean of feature j) to make features have approximately zero mean. (Do not apply to x0 = 1.)

E.g. x1 = (size – 1000) / 2000
     x2 = (number of bedrooms – 2) / 5

Other types of normalization: e.g., dividing by the range (max – min) of the feature, or by its standard deviation.
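A sketch of mean normalization in NumPy (here dividing by the feature range; dividing by the standard deviation works similarly), applied to the feature columns only, never to x0:

```python
def mean_normalize(X):
    """Scale each feature column to roughly [-0.5, 0.5] with ~zero mean."""
    mu = X.mean(axis=0)                   # per-feature mean
    span = X.max(axis=0) - X.min(axis=0)  # per-feature range (max - min)
    return (X - mu) / span

X_scaled = mean_normalize(X)  # X: the housing feature matrix from above
```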


Is gradient descent working properly?
• Plot how J(θ) changes with every iteration of gradient descent

• For a sufficiently small learning rate, J(θ) should decrease with every iteration

• If not, the learning rate needs to be reduced

• However, a too-small learning rate means slow convergence
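A sketch of this diagnostic, recording J(θ) at every iteration and plotting it (matplotlib assumed; reuses X, y, and mean_normalize from the earlier sketches):

```python
import numpy as np
import matplotlib.pyplot as plt

def gd_with_history(X1, y, alpha=0.1, iters=400):
    """Gradient descent that records J(theta) at every iteration."""
    m = X1.shape[0]
    theta = np.zeros(X1.shape[1])
    history = []
    for _ in range(iters):
        errors = X1 @ theta - y
        history.append((errors ** 2).sum() / (2 * m))  # J(theta) this iteration
        theta -= alpha * (X1.T @ errors) / m
    return theta, history

X1s = np.hstack([np.ones((X.shape[0], 1)), mean_normalize(X)])  # scaled features + x0
theta, history = gd_with_history(X1s, y)
plt.plot(history)  # should decrease in every iteration
plt.xlabel("iteration"); plt.ylabel("J(theta)")
plt.show()
```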
When to end gradient descent?
• Example convergence test: declare convergence if J(θ) decreases by less than 0.001 in an iteration (assuming J(θ) is decreasing in every iteration)
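The same loop with this convergence test added; the 0.001 threshold is the example value from the slide:

```python
import numpy as np

def gd_until_converged(X1, y, alpha=0.1, tol=1e-3, max_iters=100000):
    """Run gradient descent until J decreases by less than tol in an iteration."""
    m = X1.shape[0]
    theta = np.zeros(X1.shape[1])
    prev_cost = float("inf")
    for _ in range(max_iters):
        errors = X1 @ theta - y
        cost = (errors ** 2).sum() / (2 * m)  # current J(theta)
        if prev_cost - cost < tol:            # J fell by less than 0.001: stop
            break
        prev_cost = cost
        theta -= alpha * (X1.T @ errors) / m
    return theta
```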
Polynomial Regression for
multiple variables
Choice of features

[Plot: Price (y) vs. Size (x)]
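The deck ends on this choice-of-features question; as a sketch of where it leads, polynomial regression reuses the same machinery by building new features from powers of x (the helpers mean_normalize and multivariate_gd come from the earlier sketches, and scaling matters because the powers have wildly different ranges):

```python
import numpy as np

def polynomial_features(x, degree=3):
    """Build the feature matrix [x, x^2, ..., x^degree], one column per power."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])    # living areas (feet^2)
X_poly = mean_normalize(polynomial_features(sizes))  # powers differ hugely in scale
X1_poly = np.hstack([np.ones((len(sizes), 1)), X_poly])  # prepend x0 = 1
theta = multivariate_gd(X1_poly, y, alpha=0.1, iters=5000)
```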
