L5-Support Vector Machine
❖ SVMs are one of the most robust prediction methods, being based on the statistical learning framework (VC theory) proposed by Vapnik (1982, 1995) and Chervonenkis (1974). [Wikipedia]
Main Ideas
• Max-Margin Classifier
• Formalize notion of the best linear separator
• Kernels
• Projecting data into a higher-dimensional space can make it linearly
separable
• Complexity
• Depends only on the number of training examples, not on
dimensionality of the kernel space!
Strengths of SVMs
❖ Good generalization
o in theory
o in practice
❖ Works well with few training instances
❖ Efficient algorithms
Tennis example
[Figure: training points plotted by Temperature vs. Humidity; legend: play tennis / do not play tennis]
Linear Support Vector Machines
[Figure: labeled data in the (x1, x2) plane, with classes = +1 and = −1]
Linear SVM
[Figure: separating hyperplane H with margin hyperplanes H1 and H2; f(x) = +1 on one side of H and −1 on the other; d+ and d− are the distances from H to the closest positive and negative examples]
Recall: the distance from a point (x₀, y₀) to the line Ax + By + c = 0 is |Ax₀ + By₀ + c| / √(A² + B²).
The distance between H and H1 is: |w·x + b| / ‖w‖ = 1 / ‖w‖
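As a quick numeric check (a sketch of mine, with made-up values for w and b, not from the slides), the point-to-hyperplane distance formula above can be evaluated directly:

```python
import numpy as np

# Illustrative hyperplane w·x + b = 0 (the values of w and b are made up).
w = np.array([3.0, 4.0])
b = -5.0

def distance_to_hyperplane(x, w, b):
    # |w·x + b| / ||w||, the hyperplane analogue of |Ax0 + By0 + c| / sqrt(A^2 + B^2)
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

x_on_H1 = np.array([2.0, 0.0])                # satisfies w·x + b = +1, i.e., lies on H1
print(distance_to_hyperplane(x_on_H1, w, b))  # 0.2
print(1.0 / np.linalg.norm(w))                # 1/||w|| = 0.2, as claimed
```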
Linear Separators
❖ Training instances: 𝑥 ∈ ℝ^(d+1), x₀ = 1; 𝑦 ∈ {−1, 1}
❖ Model parameters: 𝜃 ∈ ℝ^(d+1)
❖ Hyperplane: θᵀx = ⟨θ, x⟩ = 0
❖ Classifier: h(x) = sign(θᵀx) = sign(⟨θ, x⟩)
Recall the inner (dot) product: ⟨u, v⟩ = u·v = uᵀv = Σᵢ uᵢvᵢ
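A minimal sketch (mine, with an illustrative θ and example point) of the classifier h(x) = sign(θᵀx) under the x₀ = 1 convention:

```python
import numpy as np

def h(theta, x):
    # h(x) = sign(theta^T x); both vectors live in R^(d+1) with x[0] = 1 (bias term)
    return np.sign(np.dot(theta, x))

theta = np.array([-1.0, 2.0, 0.5])   # illustrative parameters
x = np.array([1.0, 0.4, 1.0])        # x0 = 1, followed by the d features
print(h(theta, x))                   # theta^T x = -1 + 0.8 + 0.5 = 0.3, so +1.0
```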
Intuitions
A “Good” Separator
Noise in the Observations
Ruling Out Some Separators
Lots of Noise
Only One Separator Remains
Maximizing the Margin
“Fat” Separators
Why Maximize Margin
Alternative View of Logistic Regression
h_θ(x) = 1 / (1 + e^(−θᵀx))
❖ If y = 1, we want h_θ(x) ≈ 1, i.e., θᵀx ≫ 0
❖ If y = 0, we want h_θ(x) ≈ 0, i.e., θᵀx ≪ 0
Alternative View of Logistic Regression
Cost of example: −yᵢ log h_θ(xᵢ) − (1 − yᵢ) log(1 − h_θ(xᵢ))
h_θ(x) = 1 / (1 + e^(−z)), where z = θᵀx
[Plots of the cost as a function of z: if y = 1 (want θᵀx ≫ 0), the cost is −log h_θ(x); if y = 0 (want θᵀx ≪ 0), the cost is −log(1 − h_θ(x))]
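A small numeric sketch (mine, with an illustrative θ and example) of the sigmoid and the per-example cost above:

```python
import numpy as np

def h_theta(theta, x):
    # Logistic hypothesis: 1 / (1 + exp(-theta^T x))
    return 1.0 / (1.0 + np.exp(-np.dot(theta, x)))

def logistic_cost(theta, x, y):
    # -y*log(h_theta(x)) - (1 - y)*log(1 - h_theta(x))
    h = h_theta(theta, x)
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

theta = np.array([0.5, 2.0])
x = np.array([1.0, 1.5])                 # theta^T x = 3.5 >> 0, so h_theta(x) ≈ 0.97
print(logistic_cost(theta, x, y=1))      # small cost: the confident prediction is correct
print(logistic_cost(theta, x, y=0))      # large cost: the confident prediction is wrong
```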
Logistic Regression to SVM
❖ Logistic Regression (with the log losses written as cost₁ and cost₀):
  min_θ  Σᵢ₌₁ⁿ [ yᵢ cost₁(θᵀxᵢ) + (1 − yᵢ) cost₀(θᵀxᵢ) ] + (λ/2) Σⱼ₌₁ᵈ θⱼ²
❖ 𝐶 is similar to 1/λ
Support Vector Machine
  min_θ  C Σᵢ₌₁ⁿ [ yᵢ cost₁(θᵀxᵢ) + (1 − yᵢ) cost₀(θᵀxᵢ) ] + (1/2) Σⱼ₌₁ᵈ θⱼ²
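A hedged sketch (mine, not from the slides) of the hinge surrogates cost₁ and cost₀ and of the objective above, with y ∈ {0, 1} as on the slides and tiny made-up data:

```python
import numpy as np

def cost1(z):
    # Surrogate used when y = 1: zero once z = theta^T x is at least +1
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    # Surrogate used when y = 0: zero once z = theta^T x is at most -1
    return np.maximum(0.0, 1.0 + z)

def svm_objective(theta, X, y, C):
    # C * sum_i [ y_i*cost1(theta^T x_i) + (1 - y_i)*cost0(theta^T x_i) ] + (1/2) * sum_{j>=1} theta_j^2
    z = X @ theta
    loss = np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg = 0.5 * np.sum(theta[1:] ** 2)   # the regularizer sums from j = 1, skipping the bias theta_0
    return C * loss + reg

X = np.array([[1.0, 2.0], [1.0, -1.5], [1.0, 0.2]])   # x0 = 1 column included
y = np.array([1, 0, 1])
theta = np.array([0.0, 1.0])
print(svm_objective(theta, X, y, C=1.0))              # 1.0 * 0.8 + 0.5 = 1.3
```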
Maximum Margin Hyperplane
Large Margin Classifier in Presence of Outliers
Vector Inner Product
Understanding the Hyperplane
Maximizing the Margin
Size of the Margin
❖ For the support vectors, we have |p| · ‖θ‖ = 1
o p is the length of the projection of the SVs onto θ
o so the margin is 1/‖θ‖: maximizing the margin is equivalent to minimizing ‖θ‖
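A short worked step (my reconstruction, consistent with the definitions above) showing why this makes the margin 1/‖θ‖:

```latex
% For a support vector x on the positive margin, \theta^{\top}x = 1.
% Writing \theta^{\top}x as the projection length p times \lVert\theta\rVert:
\theta^{\top}x \;=\; p\,\lVert\theta\rVert \;=\; 1
\quad\Longrightarrow\quad
p \;=\; \frac{1}{\lVert\theta\rVert}
% Maximizing the margin p is therefore equivalent to minimizing
% \tfrac{1}{2}\lVert\theta\rVert^{2}, the regularizer in the SVM objective above.
```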
What if the Surface is Non-Linear?
Structure of SVMs
Kernel Methods
When Linear Separators Fail
Mapping into a New Feature Space
Kernels
The Polynomial Kernel
❖ Given by K(xᵢ, xⱼ) = ⟨xᵢ, xⱼ⟩ᵈ
❖ Variation: K(xᵢ, xⱼ) = (⟨xᵢ, xⱼ⟩ + 1)ᵈ
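A small numpy sketch (mine, not from the slides) for d = 2 on 2-D inputs: the kernel value equals an ordinary inner product after an explicit quadratic feature map φ (the map and the points are illustrative):

```python
import numpy as np

def poly_kernel(x, z, d=2):
    # Polynomial kernel K(x, z) = <x, z>^d
    return np.dot(x, z) ** d

def phi(x):
    # Explicit feature map matching the d = 2 kernel on 2-D inputs:
    # phi(x) = [x1^2, sqrt(2)*x1*x2, x2^2]
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(poly_kernel(x, z, d=2))    # (1*3 + 2*(-1))^2 = 1.0
print(np.dot(phi(x), phi(z)))    # same value, computed via the explicit mapping
```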
The Kernel Trick
Incorporating Kernels into SVM
The Gaussian Kernel
❖ Also called Radial Basis Function (RBF) kernel
  K(xᵢ, xⱼ) = exp( −‖xᵢ − xⱼ‖² / (2σ²) )
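A minimal sketch (mine) of the RBF kernel formula above, with illustrative points and σ:

```python
import numpy as np

def rbf_kernel(x_i, x_j, sigma=1.0):
    # K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * sigma^2))
    sq_dist = np.sum((x_i - x_j) ** 2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

x = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])
print(rbf_kernel(x, z, sigma=1.0))   # exp(-2 / 2) = exp(-1) ≈ 0.368
print(rbf_kernel(x, x, sigma=1.0))   # identical points give the maximum value 1.0
```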
Gaussian Kernel Example
Other Kernels
❖ Sigmoid Kernel
  K(xᵢ, xⱼ) = tanh(xᵢᵀxⱼ + c)
❖ Cosine Similarity Kernel
  K(xᵢ, xⱼ) = xᵢᵀxⱼ / (‖xᵢ‖ ‖xⱼ‖)
Other Kernels
❖ Chi-squared Kernel
  K(xᵢ, xⱼ) = exp( − Σₖ (xᵢₖ − xⱼₖ)² / (xᵢₖ + xⱼₖ) )
❖ String kernels
❖ Tree kernels
❖ Graph kernels
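A hedged sketch (mine) of the chi-squared kernel as written above; the small epsilon and the example histograms are my additions for illustration:

```python
import numpy as np

def chi2_kernel(x_i, x_j, eps=1e-12):
    # K(x_i, x_j) = exp(-sum_k (x_ik - x_jk)^2 / (x_ik + x_jk))
    # Intended for non-negative features such as histograms; eps avoids division by zero.
    num = (x_i - x_j) ** 2
    den = x_i + x_j + eps
    return np.exp(-np.sum(num / den))

h1 = np.array([0.2, 0.5, 0.3])   # illustrative normalized histograms
h2 = np.array([0.1, 0.6, 0.3])
print(chi2_kernel(h1, h2))       # close to 1 for similar histograms
print(chi2_kernel(h1, h1))       # identical inputs give exactly 1.0
```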
Practical Advice for Applying SVMs
❖ Use an SVM software package to solve for the parameters
o e.g., SVMlight, libsvm, cvx (fast!), etc.
❖ Need to specify:
o Choice of the parameter C
o Choice of kernel function
o Associated kernel parameters
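As a concrete illustration of these choices, here is a minimal sketch using scikit-learn (the package referenced at the end of these slides); the dataset and parameter values are illustrative, not from the slides:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The two key choices from the slide: the penalty C and the kernel (plus its parameters).
clf = SVC(C=1.0, kernel="rbf", gamma="scale")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```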
Multi-Class Classification with SVMs
y ∈ {1, ..., K}
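One common reduction to binary SVMs is one-vs-rest; a hedged scikit-learn sketch (the toy data are mine, not from the slides):

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Toy 2-D data with K = 3 classes
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9], [2.0, 0.0], [2.1, 0.2]])
y = np.array([1, 1, 2, 2, 3, 3])

# Trains one binary SVM per class (class k vs. the rest) and predicts the class
# whose classifier returns the largest decision value.
clf = OneVsRestClassifier(LinearSVC())
clf.fit(X, y)
print(clf.predict([[0.05, 0.1], [2.0, 0.1]]))   # expected: [1 3]
```

scikit-learn's SVC also handles multi-class labels directly, using a one-vs-one scheme internally.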
SVMs in Practice
A Demo
• https://www.csie.ntu.edu.tw/~cjlin/libsvm/
SVM summary
• SVMs were originally proposed by Boser, Guyon and Vapnik in 1992 and gained increasing popularity in the late 1990s.
• SVMs represent a general methodology for many pattern recognition (PR) problems:
classification, regression, feature extraction, clustering, novelty detection,
etc.
• SVMs can be applied to complex data types beyond feature vectors (e.g.,
graphs, sequences, relational data) by designing kernel functions for such
data.
• SVM techniques have been extended to a number of tasks such as
regression [Vapnik et al. ’97], principal component analysis [Schölkopf et
al. ’99], etc.
• The most popular optimization algorithms for SVMs use decomposition to hill-climb over a subset of the αi's at a time, e.g., SMO [Platt '99] and SVMlight [Joachims '99]
Advantages of SVMs
• There are no problems with local minima, because the solution is a convex Quadratic Programming problem
• The optimal solution can be found in polynomial time
• There are few model parameters to select: the penalty term C, the kernel
function and parameters (e.g., spread σ in the case of RBF kernels)
• The final results are stable and repeatable (e.g., no random initial weights)
• The SVM solution is sparse; it only involves the support vectors
• SVMs rely on elegant and principled learning methods
• SVMs provide a method to control complexity independently of
dimensionality
• SVMs have been shown (theoretically and empirically) to have excellent
generalization capabilities
• Software
• SVMlight, by Joachims, is one of the most widely used SVM classification and regression packages. Distributed as C++ source and binaries for Linux, Windows, Cygwin, and Solaris. Kernels: polynomial, radial basis function, and neural (tanh).
• https://scikit-learn.org/stable/modules/svm.html in Python
References
• http://www.kernel-machines.org/
• http://www.support-vector.net/
• N. Cristianini and J. Shawe-Taylor: An Introduction to Support Vector Machines (and Other Kernel-Based Learning Methods). Cambridge University Press, 2000. ISBN 0-521-78019-5.
• Papers by Vapnik
• C.J.C. Burges: A tutorial on Support Vector Machines. Data Mining and Knowledge Discovery 2:121-167, 1998.