Support Vector Machine

Support vector machines (SVMs) are a type of supervised machine learning model used for classification and regression analysis. SVMs find a hyperplane in a high-dimensional space that distinctly classifies data points. The hyperplane is chosen so that the distance between it and the nearest data points of any class is maximized. These nearest data points are called support vectors. SVMs can efficiently perform nonlinear classification using what is called the kernel trick, implicitly mapping inputs into high-dimensional feature spaces. Common kernels include polynomial and radial basis function kernels. The SVM optimization problem can be solved using quadratic programming to find the hyperplane that maximizes margin and minimizes classification errors.

Classification: Definition
• Given a collection of records (training set)
  – Each record contains a set of attributes; one of the attributes is the class label.
• Find a model for the class attribute as a function of the values of the other attributes: q : X → Y.
• Goal: previously unseen records should be assigned a class as accurately as possible.
Classification Example
• Each example is a feature vector x = (x1, x2)ᵀ ∈ X ⊆ ℝ², e.g. x1 = height, x2 = weight, with class label y ∈ {H, J}.
• Training examples: {(x1, y1), …, (xl, yl)}.
• Linear classifier:
  q(x) = H if (w · x) + b ≥ 0
  q(x) = J if (w · x) + b < 0
• (Figure: the training examples plotted in the (x1, x2) plane, with the two classes separated by the line (w · x) + b = 0.)
Linear Classifiers
• f(x, w, b) = sign(w · x + b), where one class is denoted +1 and the other −1.
• Many different lines separate the training data perfectly. Any of these would be fine... but which is best?
• (Figure: the two classes of points and several candidate separating lines.)
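A minimal sketch of this decision function in Python; the toy data and the candidate (w, b) pairs below are made up for illustration:

import numpy as np

# Toy 2-D data: class +1 sits above the line x2 = x1, class -1 below it.
X = np.array([[1.0, 2.0], [2.0, 3.5], [3.0, 4.0],   # class +1
              [2.0, 1.0], [3.5, 2.0], [4.0, 3.0]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

def f(X, w, b):
    """Linear classifier f(x, w, b) = sign(w . x + b)."""
    return np.sign(X @ w + b)

# Several different (w, b) pairs separate this data perfectly --
# the SVM picks the one that maximizes the margin.
for w, b in [(np.array([-1.0, 1.0]), 0.0),
             (np.array([-1.5, 1.0]), 0.7),
             (np.array([-1.0, 1.2]), -0.2)]:
    print(w, b, "accuracy:", np.mean(f(X, w, b) == y))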
Support Vector Machine
• M = margin width. The support vectors are the data points that the margin pushes up against.
• What we know:
  – w · x⁺ + b = +1
  – w · x⁻ + b = −1
  – therefore w · (x⁺ − x⁻) = 2
• Margin width:
  M = ((x⁺ − x⁻) · w) / ‖w‖ = 2 / ‖w‖
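A quick numeric check of the margin formula, with a made-up weight vector:

import numpy as np

w = np.array([3.0, 4.0])      # hypothetical weight vector, ||w|| = 5
M = 2 / np.linalg.norm(w)     # margin width M = 2 / ||w||
print(M)                      # 0.4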
Linear SVM Mathematically
• Goal:
  1) Correctly classify all training data:
     w · xᵢ + b ≥ +1 if yᵢ = +1
     w · xᵢ + b ≤ −1 if yᵢ = −1
     equivalently, yᵢ(w · xᵢ + b) ≥ 1 for all i
  2) Maximize the margin M = 2 / ‖w‖, which is the same as minimizing (1/2) wᵀw.
• We can formulate a quadratic optimization problem and solve for w and b:
  Minimize Φ(w) = (1/2) wᵀw
  subject to yᵢ(w · xᵢ + b) ≥ 1 for all i
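A minimal sketch of this quadratic program in Python using the cvxpy modeling library (not mentioned in the slides), on made-up linearly separable data:

import cvxpy as cp
import numpy as np

# Toy linearly separable data: two classes with labels +1 / -1.
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],   # class +1
              [0.0, 0.5], [0.5, 0.0], [1.0, 0.5]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

w = cp.Variable(2)
b = cp.Variable()

# Minimize (1/2) w^T w  subject to  y_i (w . x_i + b) >= 1 for all i.
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value)
print("margin width M = 2/||w|| =", 2 / np.linalg.norm(w.value))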
Linear SVM. Cont.
• Requiring the derivatives with respect to w and b to vanish yields the dual problem:
  maximize  Σᵢ αᵢ − (1/2) Σᵢ Σⱼ αᵢ αⱼ yᵢ yⱼ ⟨xᵢ, xⱼ⟩    (sums over i, j = 1, …, m)
  subject to  Σᵢ αᵢ yᵢ = 0  and  αᵢ ≥ 0 for all i
• The KKT conditions yield: for any αᵢ > 0, b = yᵢ − ⟨w, xᵢ⟩
• Where: w = Σᵢ αᵢ yᵢ xᵢ
Linear SVM. Cont.
• The resulting separating function is:
  f(x) = Σᵢ αᵢ yᵢ ⟨xᵢ, x⟩ + b    (sum over i = 1, …, m)
• If f(x) > 0, x will be assigned class label +1, else −1.
Linear SVM. Cont.
• The resulting separating function is:
  f(x) = Σᵢ αᵢ yᵢ ⟨xᵢ, x⟩ + b
• Notes:
  – The points with αᵢ = 0 do not affect the solution.
  – The points with αᵢ ≠ 0 are called support vectors.
  – The equality conditions yᵢ(w · xᵢ + b) = 1 hold true only for the support vectors.
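A short check of these notes with scikit-learn on made-up data; the attribute names are scikit-learn's, and dual_coef_ stores αᵢ yᵢ for the support vectors only:

import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],
              [0.0, 0.5], [0.5, 0.0], [1.0, 0.5]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C approximates a hard margin

# Only the support vectors (alpha_i != 0) appear in the solution.
print("support vectors:\n", clf.support_vectors_)

# f(x) = sum_i alpha_i y_i <x_i, x> + b, summed over support vectors only.
x_new = np.array([1.5, 1.5])
f = clf.dual_coef_ @ (clf.support_vectors_ @ x_new) + clf.intercept_
print(f[0], clf.decision_function([x_new])[0])   # the two values agree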
Non-linear SVMs: Feature spaces
• General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable:
  Φ: x → φ(x)
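A small illustration of the idea on made-up concentric-rings data: adding the feature x1² + x2² as a third coordinate makes the two rings linearly separable.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 100)
r = np.r_[np.full(50, 1.0), np.full(50, 3.0)]        # inner ring = +1, outer ring = -1
X = np.c_[r * np.cos(theta), r * np.sin(theta)]
y = np.r_[np.ones(50), -np.ones(50)]

def phi(X):
    """Map x = (x1, x2) to (x1, x2, x1^2 + x2^2)."""
    return np.c_[X, (X ** 2).sum(axis=1)]

lin = SVC(kernel="linear")
print("input space accuracy:  ", lin.fit(X, y).score(X, y))            # not separable
print("feature space accuracy:", lin.fit(phi(X), y).score(phi(X), y))  # 1.0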
Non Linear SVM
• Note that the training data appears in the solution only in inner products.
• If we pre-map the data into a higher (and sparser) space, we can get more separability and a stronger family of separating functions.
• The pre-mapping might make the problem computationally infeasible.
• We want to avoid the pre-mapping and still have the same separation ability.
• Suppose we have a simple function that operates on two training points and implements the inner product of their pre-mappings; then we achieve the better separation with no added cost.
The “Kernel Trick”
• The linear classifier relies on the inner product between vectors, K(xi, xj) = xiᵀxj.
• If every datapoint is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the inner product becomes:
  K(xi, xj) = φ(xi)ᵀφ(xj)
• A kernel function is a function that is equivalent to an inner product in some feature space.
• Example: 2-dimensional vectors x = [x1 x2]; let K(xi, xj) = (1 + xiᵀxj)².
  We need to show that K(xi, xj) = φ(xi)ᵀφ(xj):
  K(xi, xj) = (1 + xiᵀxj)²
            = 1 + xi1²xj1² + 2 xi1xj1 xi2xj2 + xi2²xj2² + 2 xi1xj1 + 2 xi2xj2
            = [1  xi1²  √2 xi1xi2  xi2²  √2 xi1  √2 xi2]ᵀ [1  xj1²  √2 xj1xj2  xj2²  √2 xj1  √2 xj2]
            = φ(xi)ᵀφ(xj),  where φ(x) = [1  x1²  √2 x1x2  x2²  √2 x1  √2 x2]
• Thus, a kernel function implicitly maps data to a high-dimensional space (without the need to compute each φ(x) explicitly).
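A quick numerical check of this expansion in Python (random vectors; phi is the explicit mapping from the slide):

import numpy as np

def phi(x):
    """Explicit feature map for K(x, y) = (1 + x.T y)^2 in 2-D."""
    x1, x2 = x
    return np.array([1, x1**2, np.sqrt(2) * x1 * x2, x2**2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2])

rng = np.random.default_rng(0)
xi, xj = rng.normal(size=2), rng.normal(size=2)

k_implicit = (1 + xi @ xj) ** 2    # kernel value, no explicit mapping needed
k_explicit = phi(xi) @ phi(xj)     # inner product of the explicit mappings
print(np.isclose(k_implicit, k_explicit))   # True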
Mercer Kernels
• A Mercer kernel is a function k : Xᵈ × Xᵈ → ℝ for which there exists a function φ : Xᵈ → H such that:
  ∀ x, y ∈ Xᵈ:  k(x, y) = ⟨φ(x), φ(y)⟩
• A function k(·,·) is a Mercer kernel if, for any function g(·) such that
  ∫ g(x)² dx < ∞,
  the following holds true:
  ∫∫ g(x) g(y) k(x, y) dx dy ≥ 0
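Mercer's condition implies, for any finite sample, that the Gram matrix Kᵢⱼ = k(xᵢ, xⱼ) is positive semi-definite. A small numerical illustration with an RBF kernel on random points (the data and γ value are made up):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))
gamma = 0.5

# Gram matrix of the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * sq_dists)

# All eigenvalues are (numerically) non-negative, as Mercer's condition requires.
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # True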
Commonly used Mercer Kernels
• Homogeneous polynomial kernels:      k(x, y) = ⟨x, y⟩ᵖ
• Non-homogeneous polynomial kernels:  k(x, y) = (⟨x, y⟩ + 1)ᵖ
• Radial basis function (RBF) kernels: k(x, y) = exp(−γ ‖x − y‖²)
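The three kernels written as small Python functions (a direct transcription; p and gamma are the kernels' hyperparameters):

import numpy as np

def poly_homogeneous(x, y, p=2):
    """k(x, y) = <x, y>^p"""
    return (x @ y) ** p

def poly_non_homogeneous(x, y, p=2):
    """k(x, y) = (<x, y> + 1)^p"""
    return (x @ y + 1) ** p

def rbf(x, y, gamma=0.5):
    """k(x, y) = exp(-gamma * ||x - y||^2)"""
    return np.exp(-gamma * np.sum((x - y) ** 2))

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(poly_homogeneous(x, y), poly_non_homogeneous(x, y), rbf(x, y))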
Solution of non-linear SVM
• The problem:
  maximize  Σᵢ αᵢ − (1/2) Σᵢ Σⱼ αᵢ αⱼ yᵢ yⱼ k(xᵢ, xⱼ)    (sums over i, j = 1, …, m)
  subject to  Σᵢ αᵢ yᵢ = 0  and  0 ≤ αᵢ ≤ C for all i
• The separating function:
  sgn f(x) = sgn( Σᵢ αᵢ yᵢ k(xᵢ, x) + b )
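A minimal end-to-end sketch with scikit-learn's SVC, which solves this box-constrained dual internally (RBF kernel; the data and hyperparameter values are made up):

import numpy as np
from sklearn.svm import SVC

# Made-up non-linearly-separable data: an inner cluster vs. a surrounding ring.
rng = np.random.default_rng(0)
inner = rng.normal(scale=0.5, size=(50, 2))
angles = rng.uniform(0, 2 * np.pi, 50)
outer = np.c_[3 * np.cos(angles), 3 * np.sin(angles)] + rng.normal(scale=0.2, size=(50, 2))
X = np.vstack([inner, outer])
y = np.r_[np.ones(50), -np.ones(50)]

# C is the box constraint 0 <= alpha_i <= C; gamma is the RBF kernel parameter.
clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
print("prediction for the origin:", clf.predict([[0.0, 0.0]]))   # expected: +1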
