
IAML: Logistic Regression

Nigel Goddard
School of Informatics

Semester 1

Outline

I Logistic function
I Logistic regression
I Learning logistic regression
I Optimization
I The power of non-linear basis functions
I Least-squares classification
I Generative and discriminative models
I Relationships to Generative Models
I Multiclass classification

Decision Boundaries

I In this class we will discuss linear classifiers.
I For each class, there is a region of feature space in which the classifier selects that class over the other.
I The decision boundary is the boundary of this region (i.e., where the two classes are "tied").
I In linear classifiers the decision boundary is a line (in two dimensions).

Example Data

[Figure: example data plotted in the (x1, x2) plane; points of one class are marked 'o' and points of the other are marked 'x'.]

Linear Classifiers

I In a two-class linear classifier, we learn a function

F(x, w) = wᵀx + w0

that represents how aligned the instance is with y = 1.
I w are parameters of the classifier that we learn from data.
I To do classification of an input x: x ↦ (y = 1) if F(x, w) > 0.

[Figure: the example data in the (x1, x2) plane, shown alongside the classifier.]

A Geometric View

[Figure: the example data with the weight vector w drawn; w is normal to the decision boundary (explained on the next slide).]

Explanation of Geometric View

I The decision boundary in this case is {x | wᵀx + w0 = 0}.
I w is a normal vector to this surface.
I (Remember how lines can be written in terms of their normal vector.)
I Notice that in more than 2 dimensions, this boundary will be a hyperplane.

Two Class Discrimination

I For now consider a two-class case: y ∈ {0, 1}.
I From now on we'll write x = (1, x1, x2, . . . , xd) and w = (w0, w1, . . . , wd).
I We want a linear, probabilistic model. We could try P(y = 1|x) = wᵀx, but wᵀx can be any real number, so it is not a valid probability.
I Instead what we will do is

P(y = 1|x) = f(wᵀx)

I f must be between 0 and 1: it will squash the real line into [0, 1].
I Furthermore, the fact that probabilities sum to one means

P(y = 0|x) = 1 − f(wᵀx)

The logistic function

I We need a function that returns probabilities (i.e. stays between 0 and 1).
I The logistic function provides this:
I f(z) = σ(z) ≡ 1/(1 + exp(−z)).
I As z goes from −∞ to ∞, f goes from 0 to 1: a "squashing function".
I It has a "sigmoid" shape (i.e. an S-like shape).

[Figure: plot of the logistic function σ(z) for z from −6 to 6, rising from near 0 towards 1.]

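A minimal code sketch (not from the slides; NumPy assumed) of the logistic function:

    import numpy as np

    def sigmoid(z):
        # Logistic function: squashes any real number into (0, 1).
        return 1.0 / (1.0 + np.exp(-z))

    print(sigmoid(np.array([-6.0, 0.0, 6.0])))   # approx. [0.0025, 0.5, 0.9975]
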
Linear weights

I Linear weights + logistic squashing function == logistic regression.
I We model the class probabilities as

p(y = 1|x) = σ( Σ_{j=0}^{D} wj xj ) = σ(wᵀx)

I σ(z) = 0.5 when z = 0. Hence the decision boundary is given by wᵀx = 0.
I The decision boundary is an (M − 1)-dimensional hyperplane for an M-dimensional problem.

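As a minimal sketch (hypothetical inputs and weights; NumPy assumed), the model and its decision rule look like this:

    import numpy as np

    def predict_proba(X, w):
        # p(y = 1|x) = sigma(w^T x); the first column of X is the constant 1 (x0).
        return 1.0 / (1.0 + np.exp(-X @ w))

    def predict(X, w):
        # Classify as y = 1 exactly when w^T x > 0, i.e. when sigma(w^T x) > 0.5.
        return (X @ w > 0).astype(int)

    # Hypothetical 2-D inputs with a bias column, and made-up weights (w0, w1, w2).
    X = np.array([[1.0,  0.5, 2.0],
                  [1.0, -1.5, 0.3]])
    w = np.array([0.1, 2.0, -1.0])
    print(predict_proba(X, w), predict(X, w))
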
Logistic regression

I For this slide write w̃ = (w1, w2, . . . , wd) (i.e., exclude the bias w0).
I The bias parameter w0 shifts the position of the hyperplane, but does not alter the angle.
I The direction of the vector w̃ affects the angle of the hyperplane. The hyperplane is perpendicular to w̃.
I The magnitude of the vector w̃ affects how certain the classifications are (see the numeric sketch below).
I For small w̃, most of the probabilities within the region of the decision boundary will be near to 0.5.
I For large w̃, probabilities in the same region will be close to 1 or 0.

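A small numeric illustration of the last two points (made-up point and weights): rescaling w̃ leaves the decision boundary unchanged but pushes the probabilities towards 0 or 1.

    import numpy as np

    sigma = lambda z: 1.0 / (1.0 + np.exp(-z))

    x = np.array([1.0, 0.4, -0.2])    # a point near the boundary (bias input first)
    w = np.array([0.0, 1.0, 1.0])     # hypothetical weights; here w^T x = 0.2

    for c in (0.5, 2.0, 10.0):        # scale the weights up
        print(c, sigma(c * (w @ x)))  # approx. 0.52, 0.60, 0.88: same side, growing certainty
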
Learning Logistic Regression

I Want to set the parameters w using training data.


I As before:
I Write out the model and hence the likelihood
I Find the derivatives of the log likelihood w.r.t. the parameters.
I Adjust the parameters to maximize the log likelihood.

I Assume data is independent and identically distributed.
I Call the data set D = {(x1, y1), (x2, y2), . . . , (xn, yn)}.
I The likelihood is

p(D|w) = Π_{i=1}^{n} p(y = yi | xi, w)
       = Π_{i=1}^{n} p(y = 1|xi, w)^{yi} (1 − p(y = 1|xi, w))^{1−yi}

I Hence the log likelihood L(w) = log p(D|w) is given by

L(w) = Σ_{i=1}^{n} [ yi log σ(wᵀxi) + (1 − yi) log(1 − σ(wᵀxi)) ]

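A direct transcription of L(w) into code (a sketch; NumPy assumed, with a small epsilon to avoid log(0)):

    import numpy as np

    def log_likelihood(w, X, y):
        # L(w) = sum_i [ yi log sigma(w^T xi) + (1 - yi) log(1 - sigma(w^T xi)) ]
        p = 1.0 / (1.0 + np.exp(-X @ w))
        eps = 1e-12                      # numerical guard against log(0)
        return np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
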
I It turns out that the likelihood has a unique optimum (given sufficient training examples): the negative log likelihood is convex.
I How to maximize? Take the gradient:

∂L/∂wj = Σ_{i=1}^{n} (yi − σ(wᵀxi)) xij

I (Aside: something similar holds for linear regression,

∂E/∂wj = Σ_{i=1}^{n} (wᵀφ(xi) − yi) xij

where E is the squared error.)
I Unfortunately, you cannot maximize L(w) explicitly as for linear regression. You need to use a numerical optimisation method; see later.

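A minimal gradient-ascent sketch built on the gradient above (hypothetical learning rate and iteration count; in practice one of the optimizers on the next slide would be used):

    import numpy as np

    def fit_logistic(X, y, lr=0.1, n_iters=1000):
        # Batch gradient ascent on L(w).
        # dL/dwj = sum_i (yi - sigma(w^T xi)) xij, i.e. the full gradient is X^T (y - p).
        w = np.zeros(X.shape[1])
        for _ in range(n_iters):
            p = 1.0 / (1.0 + np.exp(-X @ w))
            w += lr * X.T @ (y - p)
        return w
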
Fitting this into the general structure for learning algorithms:

I Define the task: classification, discriminative


I Decide on the model structure: logistic regression model
I Decide on the score function: log likelihood
I Decide on optimization/search method to optimize the
score function: numerical optimization routine. Note we
have several choices here (stochastic gradient descent,
conjugate gradient, BFGS).

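For example, the score function and its gradient can be handed to an off-the-shelf routine such as BFGS (a sketch assuming SciPy's scipy.optimize.minimize; minimizing −L(w) is equivalent to maximizing L(w)):

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_likelihood(w, X, y):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        eps = 1e-12
        return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    def neg_gradient(w, X, y):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        return -(X.T @ (y - p))

    def fit(X, y):
        w0 = np.zeros(X.shape[1])
        res = minimize(neg_log_likelihood, w0, args=(X, y),
                       jac=neg_gradient, method="BFGS")
        return res.x   # fitted weight vector
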
XOR and Linear Separability

I A problem is linearly separable if we can find weights so that
I w̃ᵀx + w0 > 0 for all positive cases (where y = 1), and
I w̃ᵀx + w0 ≤ 0 for all negative cases (where y = 0).
I XOR: a classic failure case for the perceptron.
I XOR becomes linearly separable if we apply a non-linear transformation φ(x) of the input; can you find one? (One answer is sketched below.)

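One possible answer, as a minimal sketch (appending the product feature x1·x2 is just one choice of φ; the slide leaves finding one as an exercise):

    import numpy as np

    # XOR: the label is 1 exactly when the two binary inputs differ.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0])

    def phi(X):
        # Append the product x1*x2 as an extra feature.
        return np.column_stack([X, X[:, 0] * X[:, 1]])

    # In the transformed space a linear rule separates the classes, e.g.
    # f(x) = x1 + x2 - 2*x1*x2 - 0.5 is positive exactly for the XOR-true points.
    w, w0 = np.array([1.0, 1.0, -2.0]), -0.5
    print((phi(X) @ w + w0 > 0).astype(int))   # [0 1 1 0], matching y
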
The power of non-linear basis functions

[Figure: a two-class data set shown in the original (x1, x2) space and in the transformed (φ1, φ2) space, using two Gaussian basis functions φ1(x) and φ2(x). Figure credit: Chris Bishop, PRML.]

As for linear regression, we can transform the input space if we want: x → φ(x).

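A sketch of the transformation itself (hypothetical centres and width; the transformed features would then be fed to logistic regression exactly as before):

    import numpy as np

    def gaussian_basis(X, centres, s=1.0):
        # phi_k(x) = exp(-||x - c_k||^2 / (2 s^2)), one feature per centre c_k.
        sq_dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-sq_dists / (2.0 * s ** 2))

    centres = np.array([[0.0, 0.0], [1.0, 1.0]])      # two made-up centres
    X = np.random.randn(5, 2)                         # some 2-D inputs
    Phi = gaussian_basis(X, centres)                  # new features (phi1, phi2)
    Phi = np.column_stack([np.ones(len(Phi)), Phi])   # prepend the constant-1 feature
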
Generative and Discriminative Models
I Notice that we have done something very different here than with naive Bayes.
I Naive Bayes: we modelled how a class "generated" the feature vector, p(x|y). Then we could classify using

p(y|x) ∝ p(x|y) p(y)

This is called a generative approach.
I Logistic regression: we model p(y|x) directly. This is a discriminative approach.
I Discriminative advantage: why spend effort modelling p(x)? It seems a waste, since we are always given it as input.
I Generative advantage: it can be good with missing data (remember how naive Bayes handles missing data) and for detecting outliers. Or, sometimes you really do want to generate the input.

Generative Classifiers can be Linear Too

Two scenarios where naive Bayes gives you a linear classifier:

1. Gaussian data with equal covariance. If p(x|y = 1) ∼ N(µ1, Σ) and p(x|y = 0) ∼ N(µ2, Σ), then

   p(y = 1|x) = σ(w̃ᵀx + w0)

   for some (w0, w̃) that depend on µ1, µ2, Σ and the class priors.

2. Binary data. Let each component xj be a Bernoulli variable, i.e. xj ∈ {0, 1}. Then a Naïve Bayes classifier has the form

   p(y = 1|x) = σ(w̃ᵀx + w0)

3. Exercise for keeners: prove these two results.

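A sketch of the first result (standard Gaussian algebra, not worked through on the slides): with equal covariances the quadratic terms in x cancel from the log-odds,

    log [ p(y = 1|x) / p(y = 0|x) ]
      = log p(x|y = 1) − log p(x|y = 0) + log [ p(y = 1) / p(y = 0) ]
      = (µ1 − µ2)ᵀ Σ⁻¹ x − ½ µ1ᵀ Σ⁻¹ µ1 + ½ µ2ᵀ Σ⁻¹ µ2 + log [ p(y = 1) / p(y = 0) ]

so w̃ = Σ⁻¹(µ1 − µ2), w0 collects the remaining constant terms, and applying σ to the log-odds gives p(y = 1|x) = σ(w̃ᵀx + w0).
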
Multiclass classification

I Create a different weight vector wk for each class, to classify into k and not-k.
I Then use the "softmax" function

p(y = k|x) = exp(wkᵀx) / Σ_{j=1}^{C} exp(wjᵀx)

I Note that 0 ≤ p(y = k|x) ≤ 1 and Σ_{j=1}^{C} p(y = j|x) = 1.
I This is the natural generalization of logistic regression to more than 2 classes.

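A minimal sketch (NumPy assumed): stacking the C per-class weight vectors as the rows of a matrix W, the softmax probabilities are

    import numpy as np

    def softmax_proba(X, W):
        # p(y = k|x) = exp(wk^T x) / sum_j exp(wj^T x); row k of W is wk.
        scores = X @ W.T                               # shape (n, C)
        scores -= scores.max(axis=1, keepdims=True)    # subtract row max for numerical stability
        e = np.exp(scores)
        return e / e.sum(axis=1, keepdims=True)        # each row sums to 1
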
Least-squares classification
I Logistic regression is more complicated algorithmically than linear regression.
I Why not just use linear regression with 0/1 targets?

[Figure: two panels showing decision boundaries fit to two-class data. Green: logistic regression; magenta: least-squares regression. Figure credit: Chris Bishop, PRML.]
