
Linear classifiers

CE-717: Machine Learning


Sharif University of Technology

M. Soleymani
Fall 2016
Topics
• Discriminant functions
• Linear classifiers
• Perceptron
  (SVM will be covered in the later lectures)
• Fisher
• Multi-class classification

2
Classification problem
• Given: Training set
  • labeled set of 𝑁 input-output pairs 𝐷 = {(𝒙^(𝑖), 𝑦^(𝑖))}_{𝑖=1}^{𝑁}
  • 𝑦 ∈ {1, … , 𝐾}

• Goal: Given an input 𝒙, assign it to one of 𝐾 classes

• Examples:
  • Spam filter
  • Handwritten digit recognition
  • …

3
Discriminant functions
• A discriminant function can directly assign each vector 𝒙 to a specific class 𝑘

• A popular way of representing a classifier

• Many classification methods are based on discriminant functions

• Assumption: the classes are taken to be disjoint

• The input space is thereby divided into decision regions
  • The boundaries between regions are called decision boundaries or decision surfaces.

4
Discriminant Functions
• Discriminant functions: a discriminant function 𝑓𝑖(𝒙) for each class 𝒞𝑖 (𝑖 = 1, … , 𝐾):
  • 𝒙 is assigned to class 𝒞𝑖 if:
      𝑓𝑖(𝒙) > 𝑓𝑗(𝒙)  ∀𝑗 ≠ 𝑖

• Thus, we can easily divide the feature space into 𝐾 decision regions:
      ∀𝒙, 𝑓𝑖(𝒙) > 𝑓𝑗(𝒙) ∀𝑗 ≠ 𝑖  ⇒  𝒙 ∈ ℛ𝑖
  ℛ𝑖 : region of the 𝑖-th class

• Decision surfaces (or boundaries) can also be found using discriminant functions
  • Boundary between ℛ𝑖 and ℛ𝑗, separating samples of these two categories:
      𝑓𝑖(𝒙) = 𝑓𝑗(𝒙)
5
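As a minimal illustration of the rule above, a small Python sketch; the three discriminant functions below are made up purely for the example:

```python
import numpy as np

# Classify a point by the largest discriminant value f_i(x).
def classify(x, discriminants):
    """Assign x to the class whose discriminant function gives the largest value."""
    scores = [f(x) for f in discriminants]
    return int(np.argmax(scores))   # index i of the winning class C_i

# Three made-up discriminant functions, purely for illustration
f = [lambda x: x[0] + x[1],
     lambda x: 2.0 - x[0],
     lambda x: 0.5 * x[1] - 1.0]
print(classify(np.array([1.0, 2.0]), f))   # prints 0: the first discriminant wins here
```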
Discriminant Functions: Two-Category

• For a two-category problem, we need only find a single function 𝑓 ∶ ℝ^𝑑 → ℝ
  • 𝑓1(𝒙) = 𝑓(𝒙)
  • 𝑓2(𝒙) = −𝑓(𝒙)
  • Decision surface: 𝑓(𝒙) = 0

• First, we explain the two-category classification problem and then discuss multi-category problems.
  • Binary classification: a target variable 𝑦 ∈ {0, 1} or 𝑦 ∈ {−1, 1}

6
Linear classifiers
• Decision boundaries are linear in 𝒙, or linear in some given set of functions of 𝒙
• Linearly separable data: data points that can be exactly classified by a linear decision surface.

• Why linear classifiers?
  • Even when they are not optimal, we can benefit from their simplicity
  • They are relatively easy to compute
  • In the absence of information suggesting otherwise, linear classifiers are attractive candidates for initial, trial classifiers.

7
Two Category
• 𝑓(𝒙; 𝒘) = 𝒘^𝑇𝒙 + 𝑤0 = 𝑤0 + 𝑤1𝑥1 + ⋯ + 𝑤𝑑𝑥𝑑
  • 𝒙 = [𝑥1 𝑥2 … 𝑥𝑑]
  • 𝒘 = [𝑤1 𝑤2 … 𝑤𝑑]
  • 𝑤0 : bias

• if 𝒘^𝑇𝒙 + 𝑤0 ≥ 0 then 𝒞1, else 𝒞2

• Decision surface (boundary): 𝒘^𝑇𝒙 + 𝑤0 = 0
  • 𝒘 is orthogonal to every vector lying within the decision surface

8
Example

• Decision boundary: 3 − 𝑥1 − (3/4)𝑥2 = 0
• if 𝒘^𝑇𝒙 + 𝑤0 ≥ 0 then 𝒞1, else 𝒞2

[Figure: the line 3 − 𝑥1 − (3/4)𝑥2 = 0 plotted in the (𝑥1, 𝑥2) plane]
9
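A small Python sketch of the two-category rule, using the boundary from this example (3 − 𝑥1 − (3/4)𝑥2 = 0, i.e. 𝒘 = [−1, −0.75] and 𝑤0 = 3); the test points are made up:

```python
import numpy as np

def predict(x, w, w0):
    """Return C1 (+1) if w^T x + w0 >= 0, else C2 (-1)."""
    return 1 if w @ x + w0 >= 0 else -1

w, w0 = np.array([-1.0, -0.75]), 3.0          # boundary: 3 - x1 - (3/4) x2 = 0
print(predict(np.array([1.0, 1.0]), w, w0))   # +1: w^T x + w0 = 1.25 >= 0
print(predict(np.array([4.0, 3.0]), w, w0))   # -1: w^T x + w0 = -3.25 < 0
```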
Linear classifier: Two Category
• The decision boundary is a (𝑑 − 1)-dimensional hyperplane 𝐻 in the 𝑑-dimensional feature space
  • The orientation of 𝐻 is determined by the normal vector [𝑤1, … , 𝑤𝑑]
  • 𝑤0 determines the location of the surface.
  • The distance from the origin to the decision surface is |𝑤0| / ‖𝒘‖

• Writing 𝒙 = 𝒙⊥ + 𝑟 (𝒘 / ‖𝒘‖), where 𝒙⊥ is the projection of 𝒙 onto the surface (𝑓(𝒙⊥) = 0):
      𝒘^𝑇𝒙 + 𝑤0 = 𝑟‖𝒘‖  ⇒  𝑟 = (𝒘^𝑇𝒙 + 𝑤0) / ‖𝒘‖

• Thus 𝑓(𝒙) = 𝒘^𝑇𝒙 + 𝑤0 gives a signed measure of the perpendicular distance 𝑟 of the point 𝒙 from the decision surface
10
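The signed-distance formula can be checked with a short sketch (same illustrative 𝒘, 𝑤0 as in the earlier example):

```python
import numpy as np

def signed_distance(x, w, w0):
    """r = (w^T x + w0) / ||w||; the sign says on which side of the surface x lies."""
    return (w @ x + w0) / np.linalg.norm(w)

w, w0 = np.array([-1.0, -0.75]), 3.0
print(signed_distance(np.zeros(2), w, w0))            # 2.4 = |w0| / ||w||, distance of the origin
print(signed_distance(np.array([4.0, 3.0]), w, w0))   # -2.6: the point lies on the C2 side
```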
Linear boundary: geometry
[Figure: a hyperplane 𝒘^𝑇𝒙 + 𝑤0 = 0 dividing the space into the regions 𝒘^𝑇𝒙 + 𝑤0 > 0 and 𝒘^𝑇𝒙 + 𝑤0 < 0; the signed distance of a point from the plane is (𝒘^𝑇𝒙 + 𝑤0) / ‖𝒘‖]
11
Non-linear decision boundary
• Choose non-linear features
  • The classifier is still linear in the parameters 𝒘

• Example: 𝒙 = [𝑥1, 𝑥2], decision boundary −1 + 𝑥1² + 𝑥2² = 0 (a circle of radius 1)
    𝝓(𝒙) = [1, 𝑥1, 𝑥2, 𝑥1², 𝑥2², 𝑥1𝑥2]
    𝒘 = [𝑤0, 𝑤1, … , 𝑤𝑚] = [−1, 0, 0, 1, 1, 0]

• if 𝒘^𝑇𝝓(𝒙) ≥ 0 then 𝑦 = 1, else 𝑦 = −1
12
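A sketch of the same idea in code, using the feature map and weights from this slide (𝝓(𝒙) = [1, 𝑥1, 𝑥2, 𝑥1², 𝑥2², 𝑥1𝑥2], 𝒘 = [−1, 0, 0, 1, 1, 0]), which reproduces the circular boundary; the test points are made up:

```python
import numpy as np

def phi(x):
    """Non-linear feature map from the slide."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

def predict(x, w):
    """Linear classifier in feature space: sign of w^T phi(x)."""
    return 1 if w @ phi(x) >= 0 else -1

w = np.array([-1.0, 0.0, 0.0, 1.0, 1.0, 0.0])   # boundary: -1 + x1^2 + x2^2 = 0
print(predict((0.5, 0.5), w))   # -1: inside the unit circle
print(predict((1.5, 0.0), w))   # +1: outside the unit circle
```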
Cost Function for linear classification
• Finding a linear classifier can be formulated as an optimization problem:
  • Select how to measure the prediction loss
    • Based on the training set 𝐷 = {(𝒙^(𝑖), 𝑦^(𝑖))}_{𝑖=1}^{𝑛}, a cost function 𝐽(𝒘) is defined
  • Solve the resulting optimization problem to find the parameters:
    • Find the optimal 𝑓(𝒙) = 𝑓(𝒙; 𝒘̂) where 𝒘̂ = argmin_𝒘 𝐽(𝒘)

• Criterion or cost functions for classification:
  • We will investigate several cost functions for the classification problem

13
SSE cost function for classification (𝐾 = 2)

• The SSE cost function is not suitable for classification:
  • Least squares loss penalizes ‘too correct’ predictions (those that lie a long way on the correct side of the decision boundary)
  • Least squares loss also lacks robustness to noise

    𝐽(𝒘) = Σ_{𝑖=1}^{𝑁} (𝒘^𝑇𝒙^(𝑖) − 𝑦^(𝑖))²

14
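To make the criticism concrete, a tiny sketch (made-up data; the leading 1 in each row is the bias feature folded into 𝒘) that evaluates the SSE cost on ±1 labels:

```python
import numpy as np

def sse_cost(w, X, y):
    """J(w) = sum_i (w^T x^(i) - y^(i))^2."""
    residuals = X @ w - y
    return float(residuals @ residuals)

X = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 3.0]])   # first column: bias feature
y = np.array([1.0, -1.0, 1.0])
# The third sample is 'too correct' (w^T x = 6, far beyond its target +1),
# yet it dominates the cost: total SSE is 26, of which 25 comes from that sample.
print(sse_cost(np.array([0.0, 2.0]), X, y))
```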
SSE cost function for classification (𝐾 = 2)

[Figure: the squared error (𝒘^𝑇𝒙 − 𝑦)² plotted against 𝒘^𝑇𝒙 for 𝑦 = 1 and for 𝑦 = −1; predictions that are correct but lie far beyond the target are still penalized by SSE]  [Bishop]

15
SSE cost function for classification (𝐾 = 2)

• Is it more suitable if we set 𝑓(𝒙; 𝒘) = 𝑔(𝒘^𝑇𝒙)?

    𝐽(𝒘) = Σ_{𝑖=1}^{𝑁} (sign(𝒘^𝑇𝒙^(𝑖)) − 𝑦^(𝑖))²

    sign(𝑧) = −1 if 𝑧 < 0,  and 1 if 𝑧 ≥ 0

• 𝐽(𝒘) is a piecewise constant function that shows the number of misclassifications
  • i.e., the training error incurred in classifying the training samples

[Figure: (sign(𝒘^𝑇𝒙) − 𝑦)² as a function of 𝒘^𝑇𝒙 for 𝑦 = 1, and the resulting piecewise constant cost 𝐽(𝒘)]
16
Perceptron algorithm
• Linear classifier
• Two-class: 𝑦 ∈ {−1, 1}
  • 𝑦 = −1 for 𝒞2 , 𝑦 = 1 for 𝒞1

• Goal: ∀𝑖, 𝒙^(𝑖) ∈ 𝒞1 ⇒ 𝒘^𝑇𝒙^(𝑖) > 0
        ∀𝑖, 𝒙^(𝑖) ∈ 𝒞2 ⇒ 𝒘^𝑇𝒙^(𝑖) < 0

• 𝑓(𝒙; 𝒘) = sign(𝒘^𝑇𝒙)

18
Perceptron criterion

𝐽_𝑃(𝒘) = − Σ_{𝑖∈ℳ} 𝒘^𝑇𝒙^(𝑖) 𝑦^(𝑖)

ℳ: subset of training data that are misclassified

If there are many solutions, which one should we pick?

19
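A minimal sketch of computing the perceptron criterion; here a sample counts as misclassified when 𝑦^(𝑖)𝒘^𝑇𝒙^(𝑖) ≤ 0 (points exactly on the boundary included), and the data are made up:

```python
import numpy as np

def perceptron_cost(w, X, y):
    """J_P(w) = -sum over misclassified i of w^T x^(i) * y^(i).
    X: (N, d) inputs, y: (N,) labels in {-1, +1}, w: (d,) weights."""
    scores = X @ w
    misclassified = y * scores <= 0                      # the set M
    return -np.sum(scores[misclassified] * y[misclassified])

X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, -1.0]])
y = np.array([1, -1, -1])
print(perceptron_cost(np.array([0.5, 0.5]), X, y))       # 0.5: one misclassified sample
```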
Cost function
[Figure: the number-of-misclassifications cost 𝐽(𝒘) and the perceptron cost 𝐽_𝑃(𝒘) plotted as surfaces over the (𝑤0, 𝑤1) plane]

There may be many solutions that minimize these cost functions

20 [Duda, Hart, and Stork, 2002]


Batch Perceptron
“Gradient descent” to solve the optimization problem:

    𝒘^(𝑡+1) = 𝒘^(𝑡) − 𝜂 𝛻_𝒘 𝐽_𝑃(𝒘^(𝑡))

    𝛻_𝒘 𝐽_𝑃(𝒘) = − Σ_{𝑖∈ℳ} 𝒙^(𝑖) 𝑦^(𝑖)

Batch Perceptron converges in a finite number of steps for linearly separable data:

    Initialize 𝒘
    Repeat
      𝒘 = 𝒘 + 𝜂 Σ_{𝑖∈ℳ} 𝒙^(𝑖) 𝑦^(𝑖)
    Until ‖𝜂 Σ_{𝑖∈ℳ} 𝒙^(𝑖) 𝑦^(𝑖)‖ < 𝜃

21
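A sketch of the batch perceptron above; the learning rate 𝜂, stopping threshold 𝜃, and the max_iters safeguard are illustrative choices, and the bias is assumed to be folded into 𝒘 via a constant feature of 1:

```python
import numpy as np

def batch_perceptron(X, y, eta=1.0, theta=1e-6, max_iters=1000):
    """X: (N, d) inputs (include a column of 1s for the bias), y: labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        miscls = y * (X @ w) <= 0                 # the set M of misclassified samples
        update = eta * (X[miscls].T @ y[miscls])  # eta * sum_{i in M} x^(i) y^(i)
        if np.linalg.norm(update) < theta:        # until the update norm falls below theta
            break
        w = w + update
    return w
```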
Stochastic gradient descent for Perceptron
• Single-sample perceptron:
  • If 𝒙^(𝑖) is misclassified:
      𝒘^(𝑡+1) = 𝒘^(𝑡) + 𝜂 𝒙^(𝑖) 𝑦^(𝑖)

• Perceptron convergence theorem: if the training data are linearly separable, the single-sample perceptron is also guaranteed to find a solution in a finite number of steps

Fixed-increment single-sample Perceptron (𝜂 can be set to 1 and the proof still works):
    Initialize 𝒘, 𝑡 ← 0
    repeat
      𝑡 ← 𝑡 + 1
      𝑖 ← 𝑡 mod 𝑁
      if 𝒙^(𝑖) is misclassified then
        𝒘 = 𝒘 + 𝒙^(𝑖) 𝑦^(𝑖)
    until all patterns are properly classified

22
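A sketch of the fixed-increment single-sample perceptron (𝜂 = 1); the max_epochs cap is an added safeguard, since the loop only terminates on its own for linearly separable data:

```python
import numpy as np

def single_sample_perceptron(X, y, max_epochs=100):
    """X: (N, d) inputs (with a bias column of 1s), y: labels in {-1, +1}."""
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(max_epochs):
        errors = 0
        for i in range(N):
            if y[i] * (X[i] @ w) <= 0:      # x^(i) is misclassified
                w = w + X[i] * y[i]         # fixed-increment update
                errors += 1
        if errors == 0:                     # all patterns properly classified
            return w
    return w
```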
Example

23
Perceptron: Example

Change 𝒘 in a direction that corrects the error

24
[Bishop]
Convergence of Perceptron

[Duda, Hart & Stork, 2002]

• For data sets that are not linearly separable, the single-sample perceptron learning algorithm will never converge

25
Pocket algorithm
• For data that are not linearly separable (e.g., due to noise):
  • The algorithm keeps in its pocket the best 𝒘 encountered so far.

    Initialize 𝒘
    for 𝑡 = 1, … , 𝑇
      𝑖 ← 𝑡 mod 𝑁
      if 𝒙^(𝑖) is misclassified then
        𝒘_new = 𝒘 + 𝒙^(𝑖) 𝑦^(𝑖)
        if 𝐸_train(𝒘_new) < 𝐸_train(𝒘) then
          𝒘 = 𝒘_new
    end

    𝐸_train(𝒘) = (1/𝑁) Σ_{𝑛=1}^{𝑁} [sign(𝒘^𝑇𝒙^(𝑛)) ≠ 𝑦^(𝑛)]

26
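A sketch of the pocket algorithm exactly as written above: propose a perceptron update and keep it only if it lowers 𝐸_train; the number of steps 𝑇 is an illustrative parameter:

```python
import numpy as np

def error_rate(w, X, y):
    """E_train(w): fraction of samples with sign(w^T x) != y (sign(0) taken as +1)."""
    pred = np.where(X @ w >= 0, 1, -1)
    return np.mean(pred != y)

def pocket(X, y, T=1000):
    """X: (N, d) inputs (with a bias column), y: (N,) labels in {-1, +1}."""
    N, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = t % N
        if y[i] * (X[i] @ w) <= 0:                  # x^(i) is misclassified
            w_new = w + X[i] * y[i]                 # proposed perceptron update
            if error_rate(w_new, X, y) < error_rate(w, X, y):
                w = w_new                           # keep the better weights in the pocket
    return w
```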
Linear Discriminant Analysis (LDA)
• Fisher’s Linear Discriminant Analysis:
  • Dimensionality reduction
    • Finds linear combinations of features with large ratios of between-group scatter to within-group scatter (as new discriminant variables)
  • Classification
    • Predicts the class of an observation 𝒙 by first projecting it onto the space of discriminant variables and then classifying it in this space

27
Good Projection for Classification
• What is a good criterion?
  • Separating different classes in the projected space

[Figure: two-class data projected onto different candidate directions]
28
LDA Problem
• Problem definition:
  • 𝐶 = 2 classes
  • {(𝒙^(𝑖), 𝑦^(𝑖))}_{𝑖=1}^{𝑁} training samples, with 𝑁1 samples from the first class (𝒞1) and 𝑁2 samples from the second class (𝒞2)
  • Goal: find the best direction 𝒘 that we hope will enable accurate classification

• The projection of a sample 𝒙 onto a line in direction 𝒘 is 𝒘^𝑇𝒙

• What is a good measure of the separation between the projected points of different classes?

31
Measure of Separation in the Projected Direction
• Is the direction of the line joining the class means a good candidate for 𝒘?

[Bishop]

32
Measure of Separation in the Projected Direction
• The direction of the line joining the class means is the solution of the following problem:
  • Maximize the separation of the projected class means:

      max_𝒘 𝐽(𝒘) = (𝜇1′ − 𝜇2′)²
      s.t. ‖𝒘‖ = 1

      𝜇1′ = 𝒘^𝑇𝝁1,  𝝁1 = (1/𝑁1) Σ_{𝒙^(𝑖)∈𝒞1} 𝒙^(𝑖)
      𝜇2′ = 𝒘^𝑇𝝁2,  𝝁2 = (1/𝑁2) Σ_{𝒙^(𝑖)∈𝒞2} 𝒙^(𝑖)

• What is the problem with a criterion that considers only (𝜇1′ − 𝜇2′)?
  • It does not consider the variances of the classes in the projected direction
33
LDA Criteria
• Fisher's idea: maximize a function that will give
  • large separation between the projected class means
  • while also achieving a small variance within each class, thereby minimizing the class overlap.

    𝐽(𝒘) = (𝜇1′ − 𝜇2′)² / (𝑠1′² + 𝑠2′²)

34
LDA Criteria
• The scatters of the original data are:
      𝑠1² = Σ_{𝒙^(𝑖)∈𝒞1} ‖𝒙^(𝑖) − 𝝁1‖²
      𝑠2² = Σ_{𝒙^(𝑖)∈𝒞2} ‖𝒙^(𝑖) − 𝝁2‖²

• The scatters of the projected data are:
      𝑠1′² = Σ_{𝒙^(𝑖)∈𝒞1} (𝒘^𝑇𝒙^(𝑖) − 𝒘^𝑇𝝁1)²
      𝑠2′² = Σ_{𝒙^(𝑖)∈𝒞2} (𝒘^𝑇𝒙^(𝑖) − 𝒘^𝑇𝝁2)²

35
LDA Criteria
𝐽(𝒘) = (𝜇1′ − 𝜇2′)² / (𝑠1′² + 𝑠2′²)

(𝜇1′ − 𝜇2′)² = (𝒘^𝑇𝝁1 − 𝒘^𝑇𝝁2)²
             = 𝒘^𝑇(𝝁1 − 𝝁2)(𝝁1 − 𝝁2)^𝑇𝒘

𝑠1′² = Σ_{𝒙^(𝑖)∈𝒞1} (𝒘^𝑇𝒙^(𝑖) − 𝒘^𝑇𝝁1)²
     = 𝒘^𝑇 [ Σ_{𝒙^(𝑖)∈𝒞1} (𝒙^(𝑖) − 𝝁1)(𝒙^(𝑖) − 𝝁1)^𝑇 ] 𝒘

36
LDA Criteria
𝐽(𝒘) = (𝒘^𝑇𝑺_𝐵𝒘) / (𝒘^𝑇𝑺_𝑊𝒘)

Between-class scatter matrix:  𝑺_𝐵 = (𝝁1 − 𝝁2)(𝝁1 − 𝝁2)^𝑇

Within-class scatter matrix:   𝑺_𝑊 = 𝑺1 + 𝑺2
      𝑺1 = Σ_{𝒙^(𝑖)∈𝒞1} (𝒙^(𝑖) − 𝝁1)(𝒙^(𝑖) − 𝝁1)^𝑇
      𝑺2 = Σ_{𝒙^(𝑖)∈𝒞2} (𝒙^(𝑖) − 𝝁2)(𝒙^(𝑖) − 𝝁2)^𝑇

(scatter matrix = 𝑁 × covariance matrix)

37


LDA Derivation

𝐽(𝒘) = (𝒘^𝑇𝑺_𝐵𝒘) / (𝒘^𝑇𝑺_𝑊𝒘)

∂𝐽(𝒘)/∂𝒘 = [ 2𝑺_𝐵𝒘 (𝒘^𝑇𝑺_𝑊𝒘) − 2𝑺_𝑊𝒘 (𝒘^𝑇𝑺_𝐵𝒘) ] / (𝒘^𝑇𝑺_𝑊𝒘)²

∂𝐽(𝒘)/∂𝒘 = 0  ⇒  𝑺_𝐵𝒘 = 𝜆𝑺_𝑊𝒘   (with 𝜆 = 𝐽(𝒘))

38
LDA Derivation

If 𝑺_𝑊 is full-rank:
      𝑺_𝐵𝒘 = 𝜆𝑺_𝑊𝒘  ⇒  𝑺_𝑊⁻¹𝑺_𝐵𝒘 = 𝜆𝒘

• 𝑺_𝐵𝒘 (for any vector 𝒘) points in the same direction as 𝝁1 − 𝝁2:
      𝑺_𝐵𝒘 = (𝝁1 − 𝝁2)(𝝁1 − 𝝁2)^𝑇𝒘 ∝ 𝝁1 − 𝝁2

      ⇒  𝒘 ∝ 𝑺_𝑊⁻¹(𝝁1 − 𝝁2)

• Thus, we can solve this eigenvalue problem immediately

39
LDA Algorithm
• Find 𝝁1 and 𝝁2 as the means of class 1 and class 2, respectively
• Find 𝑺1 and 𝑺2 as the scatter matrices of class 1 and class 2, respectively
  • 𝑺_𝑊 = 𝑺1 + 𝑺2
  • 𝑺_𝐵 = (𝝁1 − 𝝁2)(𝝁1 − 𝝁2)^𝑇

• Feature extraction
  • 𝒘 = 𝑺_𝑊⁻¹(𝝁1 − 𝝁2) is the eigenvector corresponding to the largest eigenvalue of 𝑺_𝑊⁻¹𝑺_𝐵
• Classification
  • 𝒘 = 𝑺_𝑊⁻¹(𝝁1 − 𝝁2)
  • Using a threshold on 𝒘^𝑇𝒙, we can classify 𝒙
40
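A sketch of the two-class Fisher LDA procedure above; the slides only say "a threshold on 𝒘^𝑇𝒙", so the midpoint of the projected class means used below is an assumption for illustration:

```python
import numpy as np

def fisher_lda(X1, X2):
    """X1: (N1, d) samples of class 1, X2: (N2, d) samples of class 2."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - mu1).T @ (X1 - mu1)          # scatter matrix of class 1
    S2 = (X2 - mu2).T @ (X2 - mu2)          # scatter matrix of class 2
    Sw = S1 + S2                            # within-class scatter
    w = np.linalg.solve(Sw, mu1 - mu2)      # w proportional to Sw^{-1} (mu1 - mu2)
    threshold = w @ (mu1 + mu2) / 2.0       # assumed: midpoint of the projected means
    return w, threshold

def lda_predict(x, w, threshold):
    """Return class 1 if the projection w^T x exceeds the threshold, else class 2."""
    return 1 if w @ x > threshold else 2
```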
Multi-class classification
• Solutions to multi-category problems:
  • Extend the learning algorithm to support multi-class:
    • A function 𝑓𝑖(𝒙) for each class 𝑖 is found
    • 𝑦 = argmax_{𝑖=1,…,𝑐} 𝑓𝑖(𝒙), i.e., 𝒙 is assigned to class 𝐶𝑖 if 𝑓𝑖(𝒙) > 𝑓𝑗(𝒙) ∀𝑗 ≠ 𝑖
  • Convert the problem to a set of two-class problems:
41
Converting a multi-class problem to a set of two-class problems
• “one versus rest” or “one against all”
  • For each class 𝐶𝑖, a linear discriminant function that separates samples of 𝐶𝑖 from all the other samples is found.
  • Requires the classes to be totally linearly separable

• “one versus one”
  • 𝑐(𝑐 − 1)/2 linear discriminant functions are used, one to separate the samples of each pair of classes.
  • Requires the classes to be pairwise linearly separable

42
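A sketch of the "one versus rest" construction, reusing the single_sample_perceptron sketch from the earlier slide as the binary learner (any two-class linear classifier could be plugged in instead); the label coding 0, …, 𝑐−1 is an assumption:

```python
import numpy as np

def one_vs_rest_train(X, y, num_classes, max_epochs=100):
    """X: (N, d) inputs (with a bias column), y: (N,) labels in {0, ..., c-1}."""
    W = []
    for i in range(num_classes):
        y_bin = np.where(y == i, 1, -1)                        # class i vs. the rest
        W.append(single_sample_perceptron(X, y_bin, max_epochs))  # defined earlier
    return np.stack(W)                                         # (c, d) weight matrix

def one_vs_rest_predict(x, W):
    """Assign x to the class with the largest score w_i^T x."""
    return int(np.argmax(W @ x))
```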
Multi-class classification
• One-vs-all (one-vs-rest)

[Figure: three classes in the (𝑥1, 𝑥2) plane with the three "class 𝑖 vs. rest" linear boundaries; legend: Class 1, Class 2, Class 3]

43

Multi-class classification
• One-vs-one

[Figure: the same three classes with the pairwise "class 𝑖 vs. class 𝑗" linear boundaries; legend: Class 1, Class 2, Class 3]

44
Multi-class classification: ambiguity
• Converting the multi-class problem to a set of two-class problems can lead to regions in which the classification is undefined

[Figure: ambiguous regions under "one versus rest" and "one versus one"]

[Duda, Hart & Stork, 2002]


45
Multi-class classification: linear machine
• A discriminant function 𝑓𝑖(𝒙) = 𝒘𝑖^𝑇𝒙 + 𝑤𝑖0 for each class 𝒞𝑖 (𝑖 = 1, … , 𝐾):
  • 𝒙 is assigned to class 𝒞𝑖 if:
      𝑓𝑖(𝒙) > 𝑓𝑗(𝒙)  ∀𝑗 ≠ 𝑖

• Decision surfaces (boundaries) can also be found using the discriminant functions
  • Boundary between the contiguous regions ℛ𝑖 and ℛ𝑗: 𝑓𝑖(𝒙) = 𝑓𝑗(𝒙), i.e.
      (𝒘𝑖 − 𝒘𝑗)^𝑇𝒙 + (𝑤𝑖0 − 𝑤𝑗0) = 0

46
Multi-class classification: linear machine

[Duda, Hart & Stork, 2002]

47
Perceptron: multi-class
𝑦̂ = argmax_{𝑖=1,…,𝑐} 𝒘𝑖^𝑇𝒙

𝐽_𝑃(𝑾) = − Σ_{𝑖∈ℳ} (𝒘_{𝑦^(𝑖)} − 𝒘_{𝑦̂^(𝑖)})^𝑇 𝒙^(𝑖)

ℳ: subset of training data that are misclassified
ℳ = { 𝑖 | 𝑦̂^(𝑖) ≠ 𝑦^(𝑖) }

Initialize 𝑾 = [𝒘1, … , 𝒘𝑐], 𝑘 ← 0
repeat
  𝑘 ← (𝑘 + 1) mod 𝑁
  if 𝒙^(𝑘) is misclassified then
    𝒘_{𝑦̂^(𝑘)} = 𝒘_{𝑦̂^(𝑘)} − 𝒙^(𝑘)
    𝒘_{𝑦^(𝑘)} = 𝒘_{𝑦^(𝑘)} + 𝒙^(𝑘)
until all patterns are properly classified
48
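A sketch of the multi-class perceptron above: when sample 𝑘 is misclassified, subtract 𝒙^(𝑘) from the predicted class's weight vector and add it to the true class's. The max_epochs cap is an added safeguard, and labels are assumed to be coded as 0, …, 𝑐−1:

```python
import numpy as np

def multiclass_perceptron(X, y, num_classes, max_epochs=100):
    """X: (N, d) inputs (with a bias column), y: (N,) labels in {0, ..., c-1}."""
    N, d = X.shape
    W = np.zeros((num_classes, d))                 # one weight vector per class
    for _ in range(max_epochs):
        errors = 0
        for k in range(N):
            y_hat = int(np.argmax(W @ X[k]))       # predicted class for x^(k)
            if y_hat != y[k]:                      # x^(k) is misclassified
                W[y_hat] -= X[k]                   # penalize the predicted class
                W[y[k]]  += X[k]                   # reinforce the true class
                errors += 1
        if errors == 0:                            # all patterns properly classified
            break
    return W
```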
Resources
• C. Bishop, “Pattern Recognition and Machine Learning”, Chapter 4.1.

49
