
Machine Learning – COMS3007

Logistic Regression
Benjamin Rosman

Based heavily on course notes by Chris Williams and Victor Lavrenko, Amos Storkey, Eric Eaton, and Clint van Alten
Classification
• Data X = {x^(0), …, x^(n)}, where x^(i) ∈ ℝ^d
• Labels y = {y^(0), …, y^(n)}, where y^(i) ∈ {0, 1}
• Want to learn a function y = f(x, θ) to predict y for a new x
y is the class (red/blue)

[Figure: two scatter plots of the data in the (x_1, x_2) plane, points coloured by class]
Generative vs discriminative
• In Naïve Bayes, we used a generative approach
• Class-conditional modeling
• p(y | x) ∝ p(x | y) p(y)

• Now model p(y | x) directly: the discriminative approach
• As was the case in decision trees
• Don't model p(x)

• Discriminative:
• Can’t generate data
• Often better
• Fewer variables

• Both are valid approaches


Two class discrimination
• Consider two classes: y ∈ {0, 1}
• We could use linear regression
• Doesn't perform well
• Values < 0 or > 1 don't make sense as probabilities
• We want a model of the form:
• P(y = 1 | x) = f(x; θ)
• It is a probability, so 0 ≤ f ≤ 1
• Also, probabilities sum to 1, so
• P(y = 0 | x) = 1 − f(x; θ)
• What form should we use for 𝑓?
The logistic function
• We need a function that gives probabilities: 0 ≤ f ≤ 1
• Logistic function:
• f(z) = σ(z) = 1 / (1 + exp(−z))
• "Sigmoid function": S-shaped
• "Squashing function"
• As z goes from −∞ to ∞, f goes from 0 to 1

[Figure: plot of σ(z) against z]

• Notes:
• σ(0) = 0.5: the "decision boundary"
• σ′(z) = σ(z)(1 − σ(z))
• Negative values of z → class 0; positive values of z → class 1
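As a minimal sketch in Python (assuming NumPy; function names are hypothetical), the logistic function and its derivative from the notes above:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# sigmoid(0) = 0.5 sits on the decision boundary;
# negative z gives probabilities below 0.5 (class 0), positive z above 0.5 (class 1).
```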
Linear weights
• Now we need a way of incorporating features x and parameters/weights θ
• Use the same idea of a linear weighting scheme from linear regression
• p(y = 1 | x) = σ(θ^T φ(x))
• θ is a vector of parameters
• φ(x) is the vector of features
• Decision boundary: σ(z) = 0.5 when z = 0
• So the decision boundary is θ^T φ(x) = 0
• For an M-dimensional problem, the boundary is an (M − 1)-dimensional hyperplane
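A small sketch of the resulting model and its decision rule (hypothetical names, assuming NumPy; phi_x stands for the feature vector φ(x)):

```python
import numpy as np

def predict_proba(theta, phi_x):
    """p(y = 1 | x) = sigma(theta^T phi(x)) for a single feature vector phi(x)."""
    return 1.0 / (1.0 + np.exp(-np.dot(theta, phi_x)))

def predict_class(theta, phi_x):
    """Class 1 when theta^T phi(x) >= 0 (probability >= 0.5), else class 0."""
    return int(np.dot(theta, phi_x) >= 0)
```

The decision boundary is exactly the set of points where np.dot(theta, phi_x) is zero.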
Linear decision boundary

In linear regression, θ^T φ(x) defined the function going through our data. Here, the decision boundary θ^T φ(x) = 0 is the function separating our classes.
Cost function
• So:
• p(y = 1 | x; θ) = σ(θ^T φ(x)) = h_θ(x)
• p(y = 0 | x; θ) = 1 − h_θ(x)
• Write this more compactly as:
• p(y | x; θ) = h_θ(x)^y (1 − h_θ(x))^(1−y)
  (Why does this work? Check what happens when y = 0, and when y = 1.)

• Likelihood of m data points:
• L(θ) = ∏_{i=1}^m p(y^(i) | x^(i); θ)
•      = ∏_{i=1}^m h_θ(x^(i))^(y^(i)) (1 − h_θ(x^(i)))^(1−y^(i))
Cost function
• Likelihood of m data points:
• L(θ) = ∏_{i=1}^m h_θ(x^(i))^(y^(i)) (1 − h_θ(x^(i)))^(1−y^(i))

• Take the log of the likelihood:
• l(θ) = log L(θ)
•      = ∑_{i=1}^m [ y^(i) log(h_θ(x^(i))) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]

• We need to maximise the log likelihood
• Equivalent to minimising E(θ) = −l(θ)
• Cannot use a closed-form solution
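A minimal sketch of the cost E(θ) = −l(θ) over m data points (assuming NumPy; the names and the eps clipping to avoid log(0) are assumptions):

```python
import numpy as np

def neg_log_likelihood(theta, Phi, y, eps=1e-12):
    """E(theta) = -sum_i [ y_i log(h(x_i)) + (1 - y_i) log(1 - h(x_i)) ].

    Phi: (m, d) matrix whose rows are the feature vectors phi(x_i)
    y:   (m,) vector of 0/1 labels
    """
    h = 1.0 / (1.0 + np.exp(-Phi @ theta))   # h_theta(x_i) for every data point
    h = np.clip(h, eps, 1.0 - eps)           # keep the logs finite
    return -np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))
```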


Regularisation
• Just as in linear regression, regularisation is useful here
• Penalise the weights for growing too large
• Note: the higher the weights, the "steeper" the S shape, so this stops the model becoming over-confident

• min_θ E(θ), where
• E(θ) = −∑_{i=1}^m [ y^(i) log(h_θ(x^(i))) + (1 − y^(i)) log(1 − h_θ(x^(i))) ] + λ ∑_{j=1}^d θ_j^2
• λ = strength of regularisation
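Reusing neg_log_likelihood from the sketch above, the regularised cost simply adds the penalty term (assuming theta[0] is the bias θ_0, which the gradient-descent slide later leaves unregularised):

```python
import numpy as np

def regularised_cost(theta, Phi, y, lam):
    """E(theta) plus the L2 penalty lambda * sum_j theta_j^2 (bias theta_0 not penalised)."""
    return neg_log_likelihood(theta, Phi, y) + lam * np.sum(theta[1:] ** 2)
```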
Regularisation
• Note: the higher the weights, the "steeper" the S shape, so regularisation stops the model becoming over-confident
• For example, in y = 1 / (1 + e^(−kx)), the weight k controls how steep the sigmoid is
Gradient descent (again)
• Initialise θ
• Repeat until convergence:
• θ_j ← θ_j − α ∂E(θ)/∂θ_j
• Simultaneous update for j = 0, …, d

• 0 < α ≤ 1 is the learning rate, usually set quite small
• Take a step of size α in the "downhill" direction (the negative gradient)
GD with regularisation
• Initialise θ
• Repeat until convergence:
• θ_0 ← θ_0 − α (h_θ(x^(i)) − y^(i))                (no regularisation on θ_0)
• θ_j ← θ_j − α [(h_θ(x^(i)) − y^(i)) x_j^(i) + λ θ_j]
• Simultaneous update for j = 0, …, d

• This is identical to linear regression!
• But the model is completely different:
• h_θ(x) = 1 / (1 + e^(−θ^T x))
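A compact sketch of these updates as a training loop (assuming NumPy; the convergence test and default values are assumptions):

```python
import numpy as np

def fit_logistic(Phi, y, alpha=0.1, lam=0.01, tol=1e-6, max_iters=1000):
    """Gradient descent for regularised logistic regression, one data point at a time.

    Phi: (m, d+1) design matrix with a leading column of ones (the bias feature x_0 = 1)
    y:   (m,) vector of 0/1 labels
    """
    theta = np.zeros(Phi.shape[1])
    for _ in range(max_iters):
        theta_old = theta.copy()
        for phi_i, y_i in zip(Phi, y):
            h = 1.0 / (1.0 + np.exp(-(phi_i @ theta)))   # h_theta(x_i)
            grad = (h - y_i) * phi_i                      # gradient of the data term
            grad[1:] += lam * theta[1:]                   # L2 penalty, not applied to theta_0
            theta = theta - alpha * grad
        if np.linalg.norm(theta - theta_old) < tol:       # converged: theta barely changed
            break
    return theta
```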
The effect of 𝛼
Example
• Generate two random classes of data, from Gaussians centered at (1, -1) and (-1, 1)
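A sketch of generating such data (assuming NumPy; the sample size, unit variance, and which Gaussian belongs to which class are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                                                      # points per class (assumed)
X0 = rng.normal(loc=[1.0, -1.0], scale=1.0, size=(n, 2))    # class 0, centred at (1, -1)
X1 = rng.normal(loc=[-1.0, 1.0], scale=1.0, size=(n, 2))    # class 1, centred at (-1, 1)
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n), np.ones(n)])
Phi = np.hstack([np.ones((2 * n, 1)), X])                   # prepend the bias feature x_0 = 1
```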
Example
• Weights randomly initialized: 𝜃 = (0.3, −0.01, −0.3)
Example
• Cycle through each data point i:
• Compute:
  δθ_0 = y^(i) − h_θ(x^(i))
  δθ_1 = (y^(i) − h_θ(x^(i))) x_1^(i)
  δθ_2 = (y^(i) − h_θ(x^(i))) x_2^(i)
• Update:
  θ_0 ← θ_0 + α δθ_0
  θ_1 ← θ_1 + α δθ_1
  θ_2 ← θ_2 + α δθ_2
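The same cycle written as code (a sketch reusing Phi and y from the data-generation sketch above; α is an assumed value, and there is no regularisation here, matching the example):

```python
import numpy as np

alpha = 0.1                                     # learning rate (assumed value)
theta = np.array([0.3, -0.01, -0.3])            # random initialisation from the slides

for phi_i, y_i in zip(Phi, y):                  # cycle through each data point i
    h = 1.0 / (1.0 + np.exp(-(phi_i @ theta)))  # h_theta(x_i)
    delta = (y_i - h) * phi_i                   # (delta_theta_0, delta_theta_1, delta_theta_2)
    theta = theta + alpha * delta               # step uphill on the log likelihood
```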
Example
• Run until convergence (threshold on the size of change of 𝜃)

Probabilities (of class 1) at four example points: 0.9999, 0.4386, 0.7759, 0.0007
Digression: the perceptron
• The logistic function gives a probabilistic output
• What if we wanted to instead force the output to be in {0, 1}?
• Instead of the logistic function, what about a step function?
• g(z) = 1 if z ≥ 0, and 0 if z < 0
• Use this as before:
• p(y = 1 | x) = g(θ^T φ(x)) = h_θ(x)

• Perceptron learning rule:
• θ_j ← θ_j + α (y^(i) − h_θ(x^(i))) x_j^(i)
• Exactly as before (with a different function)!
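A sketch of the perceptron rule under the same assumptions as the earlier sketches (NumPy, a design matrix Phi with a leading column of ones):

```python
import numpy as np

def step(z):
    """Hard threshold g(z): 1 if z >= 0, else 0."""
    return 1.0 if z >= 0 else 0.0

def perceptron_epoch(theta, Phi, y, alpha=0.1):
    """One pass of the perceptron learning rule over the data."""
    for phi_i, y_i in zip(Phi, y):
        h = step(phi_i @ theta)                        # h_theta(x_i) = g(theta^T phi(x_i))
        theta = theta + alpha * (y_i - h) * phi_i      # no change when the prediction is correct
    return theta
```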
The perceptron
• Historical model
• Invented by Frank Rosenblatt (1957)
• Thought to model neurons in the brain
• (Crudely)
• Originally a machine!

• Very controversial:
• It was basically claimed that the perceptron was expected to
• "be able to walk, talk, see, write, reproduce itself and be conscious of its existence"
Linear separability and XOR
• “Perceptrons” by Minsky and Papert (1969)
• Limitation of a perceptron: it cannot implement functions such as XOR
• Led to decreased research in neural networks, and increased research in symbolic AI
Basis functions (again)
• Use basis functions (again) to get around the linear separability problem
• Still need the data to be separable in some space

[Figure: adding polynomial basis functions makes the data separable]
Basis functions (again)
• Two Gaussian basis functions: centered at (-1, -1) and (0, 0)
• Data is separable under this transformation
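A sketch of a Gaussian (RBF) feature map with the two centres mentioned above (the width s and the added bias feature are assumptions):

```python
import numpy as np

def gaussian_features(X, centres, s=1.0):
    """Map each row x of X to [1, exp(-||x - c_1||^2 / (2 s^2)), exp(-||x - c_2||^2 / (2 s^2))]."""
    sq_dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)   # (m, n_centres)
    rbf = np.exp(-sq_dists / (2.0 * s ** 2))
    return np.hstack([np.ones((X.shape[0], 1)), rbf])                     # prepend the bias feature

centres = np.array([[-1.0, -1.0], [0.0, 0.0]])
Phi_rbf = gaussian_features(X, centres)          # X from the earlier data-generation sketch
```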
Basis functions (again)

Linear logistic regression Polynomial basis Gaussian basis functions


functions (RBFs)
Multiclass classification
• Instead of classifying between two classes, we may have more classes
Multiclass logistic regression
• For two classes:
• p(y = 1 | x; θ) = h_θ(x) = 1 / (1 + exp(−θ^T x)) = exp(θ^T x) / (1 + exp(θ^T x))

• Given C classes:
• p(y = c_k | x; θ) = exp(θ_k^T x) / ∑_{j=1}^C exp(θ_j^T x)

• This is the softmax function

• Note that 0 ≤ p(c_k | x; θ) ≤ 1, and ∑_{k=1}^C p(c_k | x; θ) = 1
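A sketch of the softmax (assuming NumPy; Theta holds one parameter vector θ_k per row):

```python
import numpy as np

def softmax_proba(Theta, x):
    """p(y = c_k | x; theta) = exp(theta_k^T x) / sum_j exp(theta_j^T x).

    Theta: (C, d) matrix, one row of parameters per class
    x:     (d,) feature vector
    """
    scores = Theta @ x
    scores = scores - scores.max()          # subtract the max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()    # entries lie in [0, 1] and sum to 1
```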


Multiclass classification
• Split into one-vs-rest for each of the C classes
• Use gradient descent: update all parameters for all models simultaneously
• Pick the most probable class
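A sketch of the one-vs-rest scheme, reusing the fit_logistic sketch from the gradient-descent slide (names hypothetical):

```python
import numpy as np

def fit_one_vs_rest(Phi, y, classes, alpha=0.1, lam=0.01):
    """Fit one binary logistic regression per class: class k versus the rest."""
    return {k: fit_logistic(Phi, (y == k).astype(float), alpha=alpha, lam=lam) for k in classes}

def predict_one_vs_rest(models, phi_x):
    """Pick the class whose model gives phi(x) the highest probability."""
    probs = {k: 1.0 / (1.0 + np.exp(-(phi_x @ theta_k))) for k, theta_k in models.items()}
    return max(probs, key=probs.get)
```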
Recap
• Discriminative vs generative
• Model (logistic function)
• Decision boundaries
• Cost function
• Regularisation
• Gradient descent
• The perceptron
• Basis functions
• Multiclass classification
