Lec2 Regression

The document discusses regression techniques, including linear and logistic regression, focusing on their applications in predicting housing prices and classification tasks. It explains concepts such as cost functions, gradient descent algorithms, and the importance of feature scaling and regularization to avoid overfitting. Additionally, it covers multi-class classification using the one-vs-all approach and the use of optimization algorithms for parameter fitting.

Regression

• Linear regression
• Logistic regression
Housing Prices (Portland, OR)
[Scatter plot: Size (feet²) on the x-axis, 0–3000, against Price (in 1000s of dollars) on the y-axis, 0–500.]
Supervised Learning: given the "right answer" for each example in the data.
Regression problem: predict real-valued output.
(Classification: discrete-valued output.)
Training set of housing prices (Portland, OR):

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Notation:
m = number of training examples
x's = "input" variable / features, e.g. x^(1) = 2104, x^(2) = 1416
y's = "output" variable / "target" variable, e.g. y^(1) = 460
(x, y) – one training example
(x^(i), y^(i)) – the i-th training example
Training Set:

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Hypothesis: h_θ(x) = θ₀ + θ₁x
θ₀, θ₁: parameters of the model.
How to choose θ₀, θ₁?
Idea: choose θ₀, θ₁ so that h_θ(x) is close to y for our training examples (x, y).
Hypothesis: h_θ(x) = θ₀ + θ₁x
Parameters: θ₀, θ₁
Cost function (squared-error function):
J(θ₀, θ₁) = (1/(2m)) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))²
Goal: minimize J(θ₀, θ₁) over θ₀, θ₁.

Simplified version (set θ₀ = 0): hypothesis h_θ(x) = θ₁x, single parameter θ₁, cost J(θ₁), goal minimize J(θ₁).

[Figures: left panels plot h_θ(x) = θ₁x against x for a fixed θ₁ together with the training points; right panels plot J(θ₁) as a function of the parameter θ₁. When θ₁ = 1 the line passes through every training point, so h_θ(x^(i)) = y^(i) and J(1) = 0; other values of θ₁ give a larger cost.]
Hypothesis: h_θ(x) = θ₀ + θ₁x
Parameters: θ₀, θ₁
Cost function: J(θ₀, θ₁) = (1/(2m)) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))²
Goal: minimize J(θ₀, θ₁) over θ₀, θ₁.

Have some function J(θ₀, θ₁).
Want min_{θ₀, θ₁} J(θ₀, θ₁).

Outline:
• Start with some θ₀, θ₁ (say θ₀ = 0, θ₁ = 0).
• Keep changing θ₀, θ₁ to reduce J(θ₀, θ₁) until we hopefully end up at a minimum.
Gradient descent algorithm:
repeat until convergence {
  θ_j := θ_j − α · ∂/∂θ_j J(θ₀, θ₁)    (for j = 0 and j = 1)
}

Correct (simultaneous update):
  temp0 := θ₀ − α ∂/∂θ₀ J(θ₀, θ₁)
  temp1 := θ₁ − α ∂/∂θ₁ J(θ₀, θ₁)
  θ₀ := temp0
  θ₁ := temp1

Incorrect (the θ₁ update would use the already-updated θ₀):
  temp0 := θ₀ − α ∂/∂θ₀ J(θ₀, θ₁)
  θ₀ := temp0
  temp1 := θ₁ − α ∂/∂θ₁ J(θ₀, θ₁)
  θ₁ := temp1
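In vectorized Octave code the simultaneous update comes for free, because the whole gradient is computed before θ is overwritten. A sketch (not from the slides) using the linear-regression gradient derived a few slides below; X, y, alpha and m are assumed to be in scope:

grad = zeros(size(theta));
grad(1) = (1 / m) * sum(X * theta - y);               % d/d theta_0
grad(2) = (1 / m) * sum((X * theta - y) .* X(:, 2));  % d/d theta_1
theta = theta - alpha * grad;                         % both parameters updated together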


Gradient descent (single parameter): θ₁ := θ₁ − α · (d/dθ₁) J(θ₁)

If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
At a local optimum the derivative is zero, so the update θ₁ := θ₁ − α · 0 leaves the current value of θ₁ unchanged.
Gradient descent can converge to a local minimum, even with the learning rate α fixed.

As we approach a local
minimum, gradient
descent will automatically
take smaller steps. So, no
need to decrease α over
time.
Gradient descent algorithm for the linear regression model:

∂/∂θ₀ J(θ₀, θ₁) = (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))
∂/∂θ₁ J(θ₀, θ₁) = (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) · x^(i)

repeat until convergence {
  θ₀ := θ₀ − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))
  θ₁ := θ₁ − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) · x^(i)
}  (update θ₀ and θ₁ simultaneously)
J()



J()



(for fixed , this is a function of x) (function of the parameters )
(for fixed , this is a function of x) (function of the parameters )
(for fixed , this is a function of x) (function of the parameters )
(for fixed , this is a function of x) (function of the parameters )
(for fixed , this is a function of x) (function of the parameters )
(for fixed , this is a function of x) (function of the parameters )
(for fixed , this is a function of x) (function of the parameters )
(for fixed , this is a function of x) (function of the parameters )
(for fixed , this is a function of x) (function of the parameters )
“Batch” Gradient Descent
“Batch”: Each step of gradient descent uses all the
training examples.
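A minimal Octave sketch of batch gradient descent for this model (illustrative, not from the slides); X is the m×2 design matrix with a column of ones, and alpha and num_iters are assumed hyperparameters:

function theta = gradientDescent(X, y, theta, alpha, num_iters)
  % Batch gradient descent: every update uses all m training examples.
  m = length(y);
  for iter = 1:num_iters
    errors = X * theta - y;          % h_theta(x^(i)) - y^(i) for all i
    grad   = (X' * errors) / m;      % gradient of J(theta)
    theta  = theta - alpha * grad;   % simultaneous update of all parameters
  end
end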
Stochastic Gradient Descent (SGD)
Updates the parameters for each training example, one by one.

Advantages:
- faster than batch gradient descent on some problems;
- the frequent updates give a detailed picture of the rate of improvement.
Disadvantages:
- performing an update for every single example is more computationally expensive over a full pass than one batch update;
- the frequent updates result in noisy gradients, so the cost fluctuates rather than decreasing smoothly.
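A hedged Octave sketch of one SGD pass (epoch) over the data, under the same conventions as above (not from the slides):

function theta = sgdEpoch(X, y, theta, alpha)
  % Visit the examples in random order, updating theta after each one.
  m = length(y);
  for k = randperm(m)
    x_k = X(k, :)';                      % single example as a column vector
    err = x_k' * theta - y(k);           % h_theta(x^(k)) - y^(k)
    theta = theta - alpha * err * x_k;   % update from this one example
  end
end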

Mini-Batch Gradient Descent

A combination of the concepts of SGD and batch gradient descent:
- splits the training dataset into small batches;
- performs an update for each of these batches.
This creates a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent.
Typical mini-batch sizes range between 50 and 256.
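A minimal Octave sketch of one mini-batch epoch (illustrative; batch_size is an assumed hyperparameter):

function theta = miniBatchEpoch(X, y, theta, alpha, batch_size)
  % One shuffled pass over the data, one gradient step per mini-batch.
  m = length(y);
  idx = randperm(m);
  for start = 1:batch_size:m
    batch = idx(start : min(start + batch_size - 1, m));
    Xb = X(batch, :);
    yb = y(batch);
    yb = yb(:);                                          % force a column vector
    grad = (Xb' * (Xb * theta - yb)) / length(batch);    % gradient on this batch
    theta = theta - alpha * grad;
  end
end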


Multiple features (variables)

Size (feet²)    Price ($1000)
2104            460
1416            232
1534            315
852             178
…               …
Multiple features (variables)

Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
2104           5                    1                  45                    460
1416           3                    2                  40                    232
1534           3                    2                  30                    315
852            2                    1                  36                    178
…              …                    …                  …                     …
Notation (for the table above):
n = number of features
x^(i) = the input (features) of the i-th training example
x_j^(i) = the value of feature j in the i-th training example
E.g. x^(2) = [1416, 3, 2, 40] and x_3^(2) = 2.
Hypothesis:
Previously (one variable): h_θ(x) = θ₀ + θ₁x
Multiple variables: h_θ(x) = θ₀ + θ₁x₁ + θ₂x₂ + … + θ_n x_n
For convenience of notation, define x₀ = 1, so that x = [x₀, x₁, …, x_n] and θ = [θ₀, θ₁, …, θ_n] are (n+1)-dimensional vectors and
h_θ(x) = θᵀx
(θᵀ is a 1 × (n+1) matrix, i.e. a row vector).
Multivariate linear regression

Hypothesis: h_θ(x) = θᵀx = θ₀x₀ + θ₁x₁ + … + θ_n x_n   (with x₀ = 1)
Parameters: θ = (θ₀, θ₁, …, θ_n)
Cost function: J(θ) = (1/(2m)) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))²

Gradient descent:
repeat {
  θ_j := θ_j − α · ∂/∂θ_j J(θ)
}  (simultaneously update θ_j for every j = 0, …, n)
Gradient descent

Previously (n = 1):
repeat {
  θ₀ := θ₀ − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))
  θ₁ := θ₁ − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x^(i)
}  (simultaneously update θ₀, θ₁)

New algorithm (n ≥ 1):
repeat {
  θ_j := θ_j − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x_j^(i)
}  (simultaneously update θ_j for j = 0, …, n)
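The multivariate batch update above vectorizes to one line in Octave (a sketch under the same conventions as before, with X of size m × (n+1)):

% One gradient-descent step for any number of features:
theta = theta - (alpha / m) * (X' * (X * theta - y));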
Feature Scaling
Idea: make sure features are on a similar scale (e.g. divide size in feet² by 2000 and the number of bedrooms by 5), so that gradient descent converges faster.
Get every feature into approximately a −1 ≤ x_j ≤ 1 range; ranges that are far too big or far too small are a problem.
Mean normalization
Replace x_j with (x_j − μ_j)/s_j to make features have approximately zero mean (do not apply to x₀ = 1), where μ_j is the mean of feature j and s_j is its range (max − min) or standard deviation.
E.g. with an average size of 1000: x₁ = (size − 1000)/2000; the 1–5 bedrooms feature is normalized the same way using its own mean and range.
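A minimal Octave sketch of mean normalization (not from the slides), applied to the raw feature columns before the column of ones is added:

function [X_norm, mu, sigma] = featureNormalize(X)
  % Subtract each feature's mean and divide by its standard deviation.
  mu = mean(X);
  sigma = std(X);
  X_norm = (X - mu) ./ sigma;   % relies on implicit broadcasting (Octave / recent MATLAB)
end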
Gradient descent
- "Debugging": how to make sure gradient descent is working correctly.
- How to choose the learning rate α.
Making sure gradient descent is working correctly
Plot J(θ) against the number of iterations; J(θ) should decrease after every iteration.
[Plot: J(θ) vs. no. of iterations, 0–400.]
Example automatic convergence test: declare convergence if J(θ) decreases by less than some small threshold ε (e.g. 10⁻³) in one iteration.
Making sure gradient descent is working correctly
[Plots: J(θ) vs. no. of iterations curves that increase, or repeatedly rise and fall.] In these cases gradient descent is not working: use a smaller α.
- For sufficiently small α, J(θ) should decrease on every iteration.
- But if α is too small, gradient descent can be slow to converge.
Summary:
- If α is too small: slow convergence.
- If α is too large: J(θ) may not decrease on every iteration; may not converge.
To choose α, try values spaced by roughly a factor of 3, e.g. …, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
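An illustrative Octave sketch of such a sweep (not from the slides), recording J after every step so the curves can be compared; X, y and m are as before:

alphas = [0.001 0.003 0.01 0.03 0.1 0.3];
num_iters = 400;
for a = alphas
  theta = zeros(size(X, 2), 1);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    theta = theta - (a / m) * (X' * (X * theta - y));       % one gradient step
    J_history(iter) = sum((X * theta - y) .^ 2) / (2 * m);  % cost after the step
  end
  plot(1:num_iters, J_history); hold on;   % J should fall smoothly for a good alpha
end
xlabel('No. of iterations'); ylabel('J(theta)');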
Polynomial regression
[Plot: Price (y) vs. Size (x) with a curved fit, e.g. a quadratic or cubic in the size.]
Choice of features
[Plot: Price (y) vs. Size (x).] Instead of a plain polynomial in the size, one can define new features, for example using the square root of the size: h_θ(x) = θ₀ + θ₁(size) + θ₂√(size).
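A short Octave sketch (illustrative) of building polynomial features from a raw size column and scaling them, since size, size² and size³ have very different ranges; size_ft is an assumed variable name:

X_poly = [size_ft, size_ft.^2, size_ft.^3];      % x1 = s, x2 = s^2, x3 = s^3
[X_poly, mu, sigma] = featureNormalize(X_poly);  % reuse the earlier normalization sketch
X = [ones(length(size_ft), 1), X_poly];          % prepend the x0 = 1 column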
Normal equation: a method to solve for θ analytically.
Intuition: if θ were a single real number (1D), J(θ) = aθ² + bθ + c; set the derivative dJ/dθ to 0 and solve for θ.
For θ ∈ R^(n+1): set every partial derivative ∂/∂θ_j J(θ) = 0 (for every j) and solve for θ₀, θ₁, …, θ_n.
Examples:

x₀   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1    2104           5                    1                  45                    460
1    1416           3                    2                  40                    232
1    1534           3                    2                  30                    315
1    852            2                    1                  36                    178
1    3000           4                    1                  38                    540

X is the m × (n+1) design matrix whose rows are the feature vectors (including x₀ = 1); y is the m-dimensional vector of prices.
m examples, n features.

Normal equation: set θ = (XᵀX)⁻¹ Xᵀ y, where (XᵀX)⁻¹ is the inverse of the matrix XᵀX.
Matlab/Octave: pinv(X'*X)*X'*y
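A hedged Octave sketch putting the pieces together for the table above (variable names are illustrative):

data = [2104 5 1 45 460;
        1416 3 2 40 232;
        1534 3 2 30 315;
         852 2 1 36 178;
        3000 4 1 38 540];
X = [ones(size(data, 1), 1), data(:, 1:4)];   % prepend x0 = 1 to the feature columns
y = data(:, 5);                               % prices in $1000
theta = pinv(X' * X) * X' * y;                % normal equation: no alpha, no iterations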

With m training examples and n features:
Gradient Descent:
• Need to choose α.
• Needs many iterations.
• Works well even when n is large.

Normal Equation:
• No need to choose α.
• Don't need to iterate.
• Need to compute (XᵀX)⁻¹, roughly an O(n³) operation.
• Slow if n is very large.
Normal equation
- What if XᵀX is non-invertible (singular/degenerate)?
- Matlab/Octave: pinv(X'*X)*X'*y uses the pseudo-inverse and still gives a usable θ, whereas inv(X'*X)*X'*y can fail.
What if XᵀX is non-invertible?
• Redundant features (linearly dependent), e.g. x₁ = size in feet² and x₂ = size in m².
• Too many features (e.g. m ≤ n): delete some features, or use regularization.
Logistic regression (classification)

• Classification
• Hypothesis representation
• Decision boundary
• Cost function
• Multi-class classification: One-vs-all
Classification

Email: Spam / Not Spam?
Online Transactions: Fraudulent (Yes / No)?
Tumor: Malignant / Benign?

y ∈ {0, 1}
0: "Negative Class" (e.g., benign tumor)
1: "Positive Class" (e.g., malignant tumor)
h_θ(x) = θᵀx (linear regression applied to a classification problem)
[Plot: Malignant? (0 = No, 1 = Yes) vs. Tumor Size, with a fitted straight line and a 0.5 threshold splitting the axis into "predict y = 0" and "predict y = 1" regions.]

Threshold the classifier output h_θ(x) at 0.5:
If h_θ(x) ≥ 0.5, predict "y = 1"
If h_θ(x) < 0.5, predict "y = 0"
Classification: y = 0 or y = 1.
With linear regression, h_θ(x) can be > 1 or < 0.
Logistic regression: 0 ≤ h_θ(x) ≤ 1.
Logistic Regression Model
Want 0 ≤ h_θ(x) ≤ 1.
h_θ(x) = g(θᵀx), where g(z) = 1/(1 + e^(−z)) is the sigmoid (logistic) function.
[Plot: g(z) is S-shaped, approaching 0 as z → −∞, equal to 0.5 at z = 0, and approaching 1 as z → +∞.]
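A one-line Octave sketch of the sigmoid, which later examples reuse (not from the slides):

function g = sigmoid(z)
  % Logistic function, applied elementwise to scalars, vectors, or matrices.
  g = 1 ./ (1 + exp(-z));
end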
Interpretation of Hypothesis Output
h_θ(x) = estimated probability that y = 1 on input x.
Example: if h_θ(x) = 0.7, tell the patient there is a 70% chance of the tumor being malignant.
Formally, h_θ(x) = P(y = 1 | x; θ), "the probability that y = 1, given x, parameterized by θ".
Since y is 0 or 1, P(y = 0 | x; θ) = 1 − P(y = 1 | x; θ).
Logistic regression: h_θ(x) = g(θᵀx) with g(z) = 1/(1 + e^(−z)).
[Plot: g(z) crossing 0.5 at z = 0.]
Suppose we predict "y = 1" if h_θ(x) ≥ 0.5, which happens exactly when θᵀx ≥ 0,
and predict "y = 0" if h_θ(x) < 0.5, i.e. when θᵀx < 0.
Decision Boundary
[Plot: x₂ vs. x₁ with two classes separated by a straight line.]
With h_θ(x) = g(θ₀ + θ₁x₁ + θ₂x₂), predict "y = 1" if θ₀ + θ₁x₁ + θ₂x₂ ≥ 0. The line θ₀ + θ₁x₁ + θ₂x₂ = 0 is the decision boundary.
Non-linear decision boundaries
[Plot: x₂ vs. x₁ (each axis from −1 to 1) with two classes separated by the unit circle.]
With h_θ(x) = g(θ₀ + θ₁x₁ + θ₂x₂ + θ₃x₁² + θ₄x₂²) and suitable parameters, predict "y = 1" if x₁² + x₂² ≥ 1; the decision boundary is the circle x₁² + x₂² = 1.
Training set: {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))}, m examples, with x ∈ R^(n+1), x₀ = 1, and y ∈ {0, 1}.
h_θ(x) = 1/(1 + e^(−θᵀx))
How to choose the parameters θ?
Cost function
Linear regression used J(θ) = (1/m) Σ_{i=1..m} ½ (h_θ(x^(i)) − y^(i))².
Plugging the logistic h_θ(x) = 1/(1 + e^(−θᵀx)) into this squared-error cost makes J(θ) "non-convex" (many local optima). We therefore build a "convex" cost from the logarithm function (the natural logarithm, with base e).
Logistic regression cost function

Cost(h_θ(x), y) = −log(h_θ(x))       if y = 1
Cost(h_θ(x), y) = −log(1 − h_θ(x))   if y = 0

If y = 1: the cost is 0 if h_θ(x) = 1, but as h_θ(x) → 0 the cost → ∞. This captures the intuition that predicting P(y = 1 | x; θ) ≈ 0 when in fact y = 1 should be penalized by a very large cost.
If y = 0: symmetrically, the cost is 0 if h_θ(x) = 0 and grows to ∞ as h_θ(x) → 1.
[Plots: Cost vs. h_θ(x) on [0, 1] for y = 1 and for y = 0.]
Logistic regression cost function

The two cases combine into a single expression:
Cost(h_θ(x), y) = −y·log(h_θ(x)) − (1 − y)·log(1 − h_θ(x))
If y = 1 this reduces to −log(h_θ(x)); if y = 0 it reduces to −log(1 − h_θ(x)).
Logistic regression cost function

J(θ) = (1/m) Σ_{i=1..m} Cost(h_θ(x^(i)), y^(i))
     = −(1/m) Σ_{i=1..m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]

To fit the parameters θ: minimize J(θ) over θ.
To make a prediction given a new x: output h_θ(x) = 1/(1 + e^(−θᵀx)), the estimated probability that y = 1.
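A small Octave sketch (illustrative) that turns this probability into a 0/1 prediction, reusing the sigmoid sketch above:

function p = predict(theta, X)
  % Predict y = 1 whenever h_theta(x) >= 0.5, i.e. whenever theta' * x >= 0.
  p = sigmoid(X * theta) >= 0.5;
end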
Gradient Descent

J(θ) = −(1/m) Σ_{i=1..m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]
Want min_θ J(θ):
repeat {
  θ_j := θ_j − α · ∂/∂θ_j J(θ)
}  (simultaneously update all θ_j)

The partial derivative works out to ∂/∂θ_j J(θ) = (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x_j^(i).
Gradient Descent

Want min_θ J(θ):
repeat {
  θ_j := θ_j − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x_j^(i)
}  (simultaneously update all θ_j)

The update rule looks identical to linear regression; the difference is that h_θ(x) is now the sigmoid 1/(1 + e^(−θᵀx)) rather than θᵀx.
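A hedged Octave sketch (not from the slides) of the logistic-regression cost and gradient, written in the [jVal, gradient] form used with the optimizers discussed next:

function [jVal, gradient] = logisticCost(theta, X, y)
  m = length(y);
  h = 1 ./ (1 + exp(-(X * theta)));                         % h_theta(x) for all examples
  jVal = -(1 / m) * (y' * log(h) + (1 - y)' * log(1 - h));  % J(theta)
  gradient = (X' * (h - y)) / m;                            % dJ/dtheta_j for every j
end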


Optimization algorithm
Cost function J(θ). Want min_θ J(θ).
Given θ, we have code that can compute:
- J(θ)
- ∂/∂θ_j J(θ)   (for j = 0, 1, …, n)

Gradient descent: repeat { θ_j := θ_j − α · ∂/∂θ_j J(θ) }.
Optimization algorithm
Given θ, we have code that can compute:
- J(θ)
- ∂/∂θ_j J(θ)   (for j = 0, 1, …, n)

Optimization algorithms:
- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS

Advantages (of the last three):
- No need to manually pick α.
- Often faster than gradient descent.
Disadvantages:
- More complex.
Multiclass classification
Email foldering/tagging: Work, Friends, Family, Hobby
Medical diagnosis: Not ill, Cold, Flu
Weather: Sunny, Cloudy, Rain, Snow


Binary classification vs. multi-class classification:
[Plots: binary classification shows two classes of points in the (x₁, x₂) plane; multi-class classification shows three classes.]

One-vs-all (one-vs-rest):
[Plots: for each class i, the examples of class i are relabeled as the positive class and all remaining examples as the negative class, and a separate classifier h_θ^(i)(x) is trained. Class 1: h_θ^(1)(x); Class 2: h_θ^(2)(x); Class 3: h_θ^(3)(x).]
h_θ^(i)(x) = P(y = i | x; θ)   for each class i.
One-vs-all

Train a logistic regression classifier h_θ^(i)(x) for each class i to predict the probability that y = i.
On a new input x, to make a prediction, pick the class i that maximizes h_θ^(i)(x).
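A hedged Octave sketch of one-vs-all with K classes, reusing the logisticCost sketch above; fminunc and optimset are standard Octave/MATLAB functions, everything else (K, X, y) is assumed:

K = 3;
all_theta = zeros(K, size(X, 2));
options = optimset('GradObj', 'on', 'MaxIter', 400);
for c = 1:K
  initial_theta = zeros(size(X, 2), 1);
  costFn = @(t) logisticCost(t, X, double(y == c));           % class c vs. the rest
  all_theta(c, :) = fminunc(costFn, initial_theta, options)';
end
% Prediction: pick the class whose classifier outputs the largest probability.
[~, prediction] = max(sigmoid(X * all_theta'), [], 2);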
Regularization

• The problem of overfitting


• Cost function
• Regularized logistic regression
Example: Linear regression (housing prices)
[Plots: Price vs. Size fitted with a straight line ("underfit" / "high bias"), a quadratic ("just right"), and a high-order polynomial ("overfit" / "high variance").]

Overfitting: if we have too many features, the learned hypothesis may fit the training set very well (J(θ) ≈ 0), but fail to generalize to new examples (e.g. fail to predict prices on new examples).
Example: Logistic regression
[Plots: two classes in the (x₁, x₂) plane fitted with a linear decision boundary (underfit), a moderate polynomial boundary, and a highly convoluted high-order boundary (overfit); g is the sigmoid function.]
Addressing overfitting:
[Plot: Price vs. Size with a high-order polynomial fit.]
With many features, e.g. x₁ = size of house, x₂ = no. of bedrooms, x₃ = no. of floors, x₄ = age of house, x₅ = average income in neighborhood, x₆ = kitchen size, …, it becomes hard to plot the data and hand-pick which features to keep.
Addressing overfitting:

Options:
1. Reduce the number of features.
   ― Manually select which features to keep.
   ― Model selection algorithm.
2. Regularization.
   ― Keep all the features, but reduce the magnitude/values of the parameters θ_j.
   ― Works well when we have a lot of features, each of which contributes a bit to predicting y.
Intuition
[Plots: Price vs. Size of house, comparing a quadratic fit with a higher-order fit θ₀ + θ₁x + θ₂x² + θ₃x³ + θ₄x⁴.]
Suppose we penalize θ₃ and θ₄ and make them really small, e.g. by adding a large penalty such as 1000·θ₃² + 1000·θ₄² to the cost. Minimizing the penalized cost drives θ₃ ≈ 0 and θ₄ ≈ 0, and the high-order fit collapses back to (roughly) the quadratic.
Regularization

Small values for the parameters θ₀, θ₁, …, θ_n:
― "simpler" hypothesis;
― less prone to overfitting.
Housing example: many features x₁, …, x_n and parameters θ₀, θ₁, …, θ_n, and we do not know in advance which parameters to shrink. So we shrink all of them by adding a regularization term to the cost:
J(θ) = (1/(2m)) [ Σ_{i=1..m} (h_θ(x^(i)) − y^(i))² + λ Σ_{j=1..n} θ_j² ]
λ is the regularization parameter; by convention θ₀ is not penalized (the penalty sum starts at j = 1).
[Plot: Price vs. Size of house. If λ is set to an extremely large value, θ₁, …, θ_n are driven toward 0, h_θ(x) ≈ θ₀ is a flat line, and the model underfits.]
Regularized linear regression
Gradient descent:
repeat {
  θ₀ := θ₀ − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x₀^(i)
  θ_j := θ_j − α [ (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x_j^(i) + (λ/m) θ_j ]    (j = 1, …, n)
}
Equivalently θ_j := θ_j(1 − α λ/m) − α (1/m) Σ_i (h_θ(x^(i)) − y^(i)) x_j^(i): each step shrinks θ_j slightly before applying the usual update.
Regularized logistic regression
[Plot: two classes in the (x₁, x₂) plane with a high-order polynomial decision boundary that overfits.]
Cost function:
J(θ) = −(1/m) Σ_{i=1..m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ] + (λ/(2m)) Σ_{j=1..n} θ_j²
Gradient descent:
repeat {
  θ₀ := θ₀ − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x₀^(i)
  θ_j := θ_j − α [ (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x_j^(i) + (λ/m) θ_j ]    (j = 1, …, n)
}
The updates look like the regularized linear-regression ones, but h_θ(x) is now the sigmoid.
Advanced optimization
function [jVal, gradient] = costFunction(theta)
  jVal = [ code to compute J(theta) ];
  gradient(1) = [ code to compute dJ/dtheta_0 ];
  gradient(2) = [ code to compute dJ/dtheta_1 ];
  gradient(3) = [ code to compute dJ/dtheta_2 ];
  ...
  gradient(n+1) = [ code to compute dJ/dtheta_n ];
end
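The skeleton above is handed to an advanced optimizer. A hedged usage sketch (fminunc and optimset are standard Octave/MATLAB; the costFunction body is whatever the placeholders compute, e.g. the regularized logistic cost):

options = optimset('GradObj', 'on', 'MaxIter', 100);   % tell fminunc we supply the gradient
initialTheta = zeros(n + 1, 1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);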


Reference:
- Machine Learning, Andrew Ng, coursera.org
