07 SVMs

Support Vector Machines & Kernels

Support Vector Machines
Notation:
• $y^{(i)}$, $y_i$: label of the $i$-th instance (non-bold denotes a scalar)
• $x_j^{(i)}$, $x_{ij}$: $j$-th feature of the $i$-th instance
Linear Separators
• Training instances: $x \in \mathbb{R}^{d+1}$ with $x_0 = 1$, and $y \in \{-1, +1\}$
• Model parameters: $\theta \in \mathbb{R}^{d+1}$
• Recall the inner (dot) product: $\langle u, v \rangle = u \cdot v = u^\top v = \sum_i u_i v_i$
• Hyperplane: $\theta^\top x = \langle \theta, x \rangle = 0$
• Decision function: $h(x) = \mathrm{sign}(\theta^\top x) = \mathrm{sign}(\langle \theta, x \rangle)$
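A minimal NumPy sketch of the decision function above; the values of $\theta$ and the instances are made up for illustration:

```python
import numpy as np

# Hypothetical parameters and instances; x_0 = 1 is the bias feature (d = 2 here).
theta = np.array([-1.0, 2.0, 0.5])           # theta in R^{d+1}
X = np.array([[1.0, 3.0, -1.0],              # each row is [x_0 = 1, x_1, x_2]
              [1.0, 0.1,  0.2]])

def h(X, theta):
    """Decision function h(x) = sign(theta^T x)."""
    return np.sign(X @ theta)

print(h(X, theta))   # predicted labels in {-1, +1}
```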
Intuitions

A “Good” Separator
Noise in the Observations
Ruling Out Some Separators
Lots of Noise
Only One Separator Remains
Maximizing the Margin
“Fat” Separators
Why Maximize the Margin?
• Increasing the margin reduces capacity (i.e., fewer possible models)
Alternative View of Logistic Regression
$h_\theta(x) = g(z)$, where $g(z) = \frac{1}{1 + e^{-z}}$ and $z = \theta^\top x$
• If $y = 1$, we want $h_\theta(x) \approx 1$, i.e., $\theta^\top x \gg 0$
• If $y = 0$, we want $h_\theta(x) \approx 0$, i.e., $\theta^\top x \ll 0$

$J(\theta) = -\sum_{i=1}^{n} \left[ y_i \log h_\theta(x_i) + (1 - y_i) \log(1 - h_\theta(x_i)) \right]$

$\min_\theta J(\theta)$; the two log terms play the roles of $\mathrm{cost}_1(\theta^\top x_i)$ and $\mathrm{cost}_0(\theta^\top x_i)$ in what follows.

Based on slide by Andrew Ng
Alternate View of Logistic Regression
Cost of a single example: $-y_i \log h_\theta(x_i) - (1 - y_i) \log(1 - h_\theta(x_i))$
with $h_\theta(x) = g(z) = \frac{1}{1 + e^{-z}}$ and $z = \theta^\top x$
• If $y = 1$ (want $\theta^\top x \gg 0$): the cost reduces to $-\log h_\theta(x_i)$
• If $y = 0$ (want $\theta^\top x \ll 0$): the cost reduces to $-\log(1 - h_\theta(x_i))$

Based on slide by Andrew Ng
Logistic Regression to SVMs
Logistic Regression:
$\min_\theta \; -\sum_{i=1}^{n} \left[ y_i \log h_\theta(x_i) + (1 - y_i) \log(1 - h_\theta(x_i)) \right] + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$

Support Vector Machines:
$\min_\theta \; C \sum_{i=1}^{n} \left[ y_i \, \mathrm{cost}_1(\theta^\top x_i) + (1 - y_i) \, \mathrm{cost}_0(\theta^\top x_i) \right] + \frac{1}{2} \sum_{j=1}^{d} \theta_j^2$

You can think of $C$ as similar to $\frac{1}{\lambda}$.
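To make the substitution of costs concrete, here is a small NumPy sketch comparing the per-example logistic costs with hinge-style surrogates. The specific forms $\mathrm{cost}_1(z) = \max(0, 1 - z)$ and $\mathrm{cost}_0(z) = \max(0, 1 + z)$ are an assumption here (the slides only plot them), but they match the standard SVM construction:

```python
import numpy as np

z = np.linspace(-3, 3, 7)                               # z = theta^T x

# Logistic regression per-example costs (y = 1 and y = 0 cases)
log_cost_y1 = -np.log(1.0 / (1.0 + np.exp(-z)))         # -log h_theta(x)
log_cost_y0 = -np.log(1.0 - 1.0 / (1.0 + np.exp(-z)))   # -log(1 - h_theta(x))

# Assumed hinge-style surrogates used by the SVM objective
cost1 = np.maximum(0.0, 1.0 - z)    # zero once z >= 1
cost0 = np.maximum(0.0, 1.0 + z)    # zero once z <= -1

for zi, l1, c1, l0, c0 in zip(z, log_cost_y1, cost1, log_cost_y0, cost0):
    print(f"z={zi:+.1f}  y=1: logistic={l1:.3f} cost1={c1:.3f}  "
          f"y=0: logistic={l0:.3f} cost0={c0:.3f}")
```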
Support Vector Machine
$\min_\theta \; C \sum_{i=1}^{n} \left[ y_i \, \mathrm{cost}_1(\theta^\top x_i) + (1 - y_i) \, \mathrm{cost}_0(\theta^\top x_i) \right] + \frac{1}{2} \sum_{j=1}^{d} \theta_j^2$

• If $y = 1$, we want $\theta^\top x \ge 1$ (not just $\ge 0$); $\mathrm{cost}_1(\theta^\top x)$ is zero in that region
• If $y = 0$, we want $\theta^\top x \le -1$ (not just $< 0$); $\mathrm{cost}_0(\theta^\top x)$ is zero in that region

Based on slide by Andrew Ng
Support Vector Machine
$\min_\theta \; C \sum_{i=1}^{n} \left[ y_i \, \mathrm{cost}_1(\theta^\top x_i) + (1 - y_i) \, \mathrm{cost}_0(\theta^\top x_i) \right] + \frac{1}{2} \sum_{j=1}^{d} \theta_j^2$

Relabel $y = 1/0$ as $y = +1/-1$. With $C$ very large, the cost terms must be driven to zero, which leaves

$\min_\theta \; \frac{1}{2} \sum_{j=1}^{d} \theta_j^2 \quad \text{s.t.} \quad \theta^\top x_i \ge 1 \text{ if } y_i = +1, \quad \theta^\top x_i \le -1 \text{ if } y_i = -1$

or equivalently

$\min_\theta \; \frac{1}{2} \sum_{j=1}^{d} \theta_j^2 \quad \text{s.t.} \quad y_i (\theta^\top x_i) \ge 1$
Maximum Margin Hyperplane
$\text{margin} = \frac{2}{\|\theta\|_2}$
The support vectors lie on the hyperplanes $\theta^\top x = 1$ and $\theta^\top x = -1$.
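As a quick numeric check of this margin formula, here is a sketch using scikit-learn's linear SVM on made-up, separable data; a very large C approximates the hard-margin problem, and note that scikit-learn keeps the bias separate (intercept_) rather than folding it into $\theta$ via $x_0 = 1$:

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (labels in {-1, +1})
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)    # very large C ~ hard margin

theta = clf.coef_.ravel()                      # weights; bias is kept separately in clf.intercept_
print(clf.decision_function(clf.support_vectors_))   # approximately +1 / -1 at the support vectors
print("margin =", 2.0 / np.linalg.norm(theta))        # margin = 2 / ||theta||_2
```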
Large Margin Classifier in Presence of Outliers
• Figure: decision boundaries when $C$ is very large

Based on slide by Andrew Ng
Vector Inner Product
$\|u\|_2 = \mathrm{length}(u) \in \mathbb{R} = \sqrt{u_1^2 + u_2^2}$

$u^\top v = v^\top u = u_1 v_1 + u_2 v_2 = \|u\|_2 \|v\|_2 \cos\theta = p \, \|u\|_2$, where $p = \|v\|_2 \cos\theta$ is the (signed) length of the projection of $v$ onto $u$

Based on example by Andrew Ng
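A tiny numeric check of the identity above; the vectors u and v are chosen arbitrarily:

```python
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([2.0, 1.0])

cos_angle = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
p = np.linalg.norm(v) * cos_angle        # signed length of the projection of v onto u

print(u @ v)                             # u^T v = 10.0
print(p * np.linalg.norm(u))             # p * ||u||_2 = 10.0 as well
```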
Understanding the Hyperplane
$\min_\theta \; \frac{1}{2} \sum_{j=1}^{d} \theta_j^2 \quad \text{s.t.} \quad \theta^\top x_i \ge 1 \text{ if } y_i = +1, \quad \theta^\top x_i \le -1 \text{ if } y_i = -1$

Assume $\theta_0 = 0$, so that the hyperplane is centered at the origin, and that $d = 2$.

$\theta^\top x = \|\theta\|_2 \underbrace{\|x\|_2 \cos\theta}_{p} = p \, \|\theta\|_2$

Based on example by Andrew Ng
• When $p$ is small, $\|\theta\|_2$ must be large in order to have $p \, \|\theta\|_2 \ge 1$ (or $\le -1$)
• When $p$ is larger, $\|\theta\|_2$ can be smaller and still have $p \, \|\theta\|_2 \ge 1$ (or $\le -1$)
Size of the Margin
For the support vectors, we have $p \, \|\theta\|_2 = \pm 1$
• $p$ is the length of the projection of the support vectors onto $\theta$

Therefore, $p = \frac{1}{\|\theta\|_2}$, and so $\text{margin} = 2p = \frac{2}{\|\theta\|_2}$
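The same relationship can be checked numerically: for a point x on the hyperplane $\theta^\top x = 1$, the projection length p of x onto $\theta$ equals $1 / \|\theta\|_2$. The $\theta$ and x below are arbitrary made-up values:

```python
import numpy as np

theta = np.array([1.0, 2.0])
x = np.array([1.0, 0.0])                 # lies on the hyperplane theta^T x = 1

p = (theta @ x) / np.linalg.norm(theta)  # projection length of x onto theta
print(p, 1.0 / np.linalg.norm(theta))    # equal: p = 1 / ||theta||_2
print("margin =", 2.0 / np.linalg.norm(theta))
```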
The SVM Dual Problem
The primal SVM problem was given as
$\min_\theta \; \frac{1}{2} \sum_{j=1}^{d} \theta_j^2 \quad \text{s.t.} \quad y_i (\theta^\top x_i) \ge 1 \;\; \forall i$

We can solve it more efficiently by taking the Lagrangian dual
• Duality is a common idea in optimization
• It transforms a difficult optimization problem into a simpler one
• Key idea: introduce a Lagrange multiplier (dual variable) $\alpha_i$ for each constraint
  – $\alpha_i$ indicates how important a particular constraint is to the solution
The SVM Dual Problem
• The Lagrangian is given by
$L(\theta, \alpha) = \frac{1}{2} \sum_{j=1}^{d} \theta_j^2 - \sum_{i=1}^{n} \alpha_i \left( y_i \, \theta^\top x_i - 1 \right)$
$\text{s.t.} \quad \alpha_i \ge 0 \;\; \forall i, \qquad \sum_i \alpha_i y_i = 0$
Understanding the Dual
Maximize
$J(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$
$\text{s.t.} \quad \alpha_i \ge 0 \;\; \forall i, \qquad \sum_i \alpha_i y_i = 0$
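The dual solution recovers the primal weights as $\theta = \sum_i \alpha_i y_i x_i$. A sketch of this using scikit-learn, whose dual_coef_ attribute stores the products $y_i \alpha_i$ for the support vectors (toy data as before):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

# dual_coef_ holds y_i * alpha_i for each support vector,
# so theta = sum_i alpha_i y_i x_i = dual_coef_ @ support_vectors_
theta_from_dual = clf.dual_coef_ @ clf.support_vectors_
print(theta_from_dual)    # matches the primal weights below
print(clf.coef_)
```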
What if Data Are Not Linearly Separable?
$\min_\theta \; \frac{1}{2} \sum_{j=1}^{d} \theta_j^2 \quad \text{s.t.} \quad y_i (\theta^\top x_i) \ge 1 \;\; \forall i$
• Cannot find a $\theta$ that satisfies the constraints
• New problem: introduce slack variables $\xi_i \ge 0$
$\min_\theta \; \frac{1}{2} \sum_{j=1}^{d} \theta_j^2 + C \sum_i \xi_i \quad \text{s.t.} \quad y_i (\theta^\top x_i) \ge 1 - \xi_i \;\; \forall i$
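Once a soft-margin SVM is fit, the slack of each training point can be read off as $\xi_i = \max(0, 1 - y_i \theta^\top x_i)$. A sketch on made-up data with a modest C so that some slack appears (scikit-learn again keeps the bias term separate):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1.0, 1.0, size=(20, 2)),
               rng.normal(-1.0, 1.0, size=(20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Slack of each training point: xi_i = max(0, 1 - y_i * f(x_i));
# positive values mark margin violations
xi = np.maximum(0.0, 1.0 - y * clf.decision_function(X))
print("margin violations:", int(np.sum(xi > 0)))
print("total slack:", xi.sum())
```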
Strengths of SVMs
• Good generalization in theory
• Good generalization in practice
• Work well with few training instances
• Find the globally best model
• Efficient algorithms
• Amenable to the kernel trick …
What if Surface is Non-Linear?
• Figure: a cluster of X’s surrounded by O’s, which no straight line can separate

Image from http://www.atrandomresearch.com/iclass/
Kernel Methods
Mapping into a New Feature Space
$\Phi : X \mapsto \hat{X} = \Phi(x)$
• For example, with $x_i \in \mathbb{R}^2$:
$\Phi([x_{i1}, x_{i2}]) = [x_{i1}, \; x_{i2}, \; x_{i1} x_{i2}, \; x_{i1}^2, \; x_{i2}^2]$
• Rather than run the SVM on $x_i$, run it on $\Phi(x_i)$
  – Find a non-linear separator in the input space
• Computing $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$ should be efficient, much more so than computing $\Phi(x_i)$ and $\Phi(x_j)$ explicitly
• Use $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$ in the SVM algorithm rather than $\langle x_i, x_j \rangle$
• Remarkably, this is possible!
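Here is a sketch of the "map, then run a linear SVM" idea using the $\Phi$ shown above; the ring-shaped toy dataset and the value of C are made up for the example:

```python
import numpy as np
from sklearn.svm import SVC

def phi(X):
    """Explicit feature map Phi([x1, x2]) = [x1, x2, x1*x2, x1^2, x2^2]."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.where(np.linalg.norm(X, axis=1) < 1.0, 1, -1)   # ring pattern: not linearly separable in R^2

clf = SVC(kernel="linear", C=10.0).fit(phi(X), y)      # linear SVM in the mapped space
print("training accuracy:", clf.score(phi(X), y))      # a non-linear separator in the original space
```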
The Polynomial Kernel
Let $x_i = [x_{i1}, x_{i2}]$ and $x_j = [x_{j1}, x_{j2}]$. With a kernel, the dual objective becomes

Maximize
$J(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$
$\text{s.t.} \quad \alpha_i \ge 0 \;\; \forall i, \qquad \sum_i \alpha_i y_i = 0$
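The slide's exact polynomial expansion is not recoverable here, but a standard degree-2 example makes the point: for $K(x, z) = (x^\top z)^2$, the kernel equals the inner product of the explicit map $\phi(x) = [x_1^2, \sqrt{2} x_1 x_2, x_2^2]$, so the kernel value can be computed without ever forming $\phi$:

```python
import numpy as np

def phi(x):
    """Explicit map whose inner product gives K(x, z) = (x . z)^2."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 3.0])
z = np.array([2.0, -1.0])

print((x @ z) ** 2)        # kernel value: one dot product, then a square
print(phi(x) @ phi(z))     # same value via the explicit 3-dimensional map
```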
The Gaussian Kernel
• Also called the Radial Basis Function (RBF) kernel
$K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|_2^2}{2\sigma^2} \right)$
  – Has value 1 when $x_i = x_j$
  – Value falls off to 0 with increasing distance
  – Note: need to do feature scaling before using the Gaussian kernel
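A minimal sketch of computing the Gaussian kernel matrix directly; the choice of $\sigma$ and the data points are arbitrary. As the note above says, features should be scaled to comparable ranges first:

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
K = rbf_kernel_matrix(X, sigma=1.0)
print(np.diag(K))   # all 1.0, since K(x, x) = 1
print(K[0, 2])      # near 0 for distant points
```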
Other kernels include:
• String kernels
• Tree kernels
• Graph kernels
An Aside: The Math Behind Kernels
What does it mean to be a kernel?
• $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$ for some $\Phi$
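One consequence worth knowing (background fact, not from the slides): the Gram matrix of any valid kernel is symmetric positive semidefinite, which can be checked numerically:

```python
import numpy as np

def is_psd_gram(K, tol=1e-10):
    """Check that a Gram matrix is symmetric and positive semidefinite."""
    if not np.allclose(K, K.T):
        return False
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

# Gram matrix of the linear kernel K(x_i, x_j) = <x_i, x_j> on random points
X = np.random.default_rng(2).normal(size=(5, 3))
print(is_psd_gram(X @ X.T))   # True: inner-product kernels always give PSD Gram matrices
```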
Many more...
• Cosine similarity kernel
• Chi-squared kernel
• String/tree/graph/wavelet/etc. kernels
Application: Automatic Photo Retouching (Leyvand et al., 2008)
Practical Advice for Applying SVMs
• Use an SVM software package to solve for the parameters
  – e.g., SVMlight, libsvm, cvx (fast!), etc.
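For example, scikit-learn's SVC wraps libsvm, so a typical call looks like the sketch below; the dataset and hyperparameter values are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder dataset standing in for real data
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)           # feature scaling, as advised for the RBF kernel
clf = SVC(kernel="rbf", C=1.0, gamma="scale")    # SVC solves the SVM problem via libsvm
clf.fit(scaler.transform(X_train), y_train)
print("test accuracy:", clf.score(scaler.transform(X_test), y_test))
```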
Multi-Class Classification with SVMs
$y \in \{1, \ldots, K\}$

Choosing a model:
• If $d$ is large relative to $n$ (e.g., $d > n$ with $d = 10{,}000$ and $n = 10$ to $1{,}000$): use logistic regression or an SVM with a linear kernel
• If $d$ is small (up to 1,000) and $n$ is intermediate (up to 10,000): use an SVM with a Gaussian kernel

Neural networks are likely to work well for most of these settings, but may be slower to train.

Based on slide by Andrew Ng
Other SVM Variations
• ν-SVM (a usage sketch follows this list)
  – The ν parameter controls the fraction of support vectors (lower bound) and the misclassification rate (upper bound)
    • E.g., $\nu = 0.05$ guarantees that at least 5% of training points are support vectors and that the training error rate is at most 5%
  – Harder to optimize than C-SVM and not as scalable
• SVMs for regression
• One-class SVMs
• SVMs for clustering
...
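A ν-SVM sketch using scikit-learn's NuSVC, with ν = 0.05 as in the example above; the toy data are made up:

```python
import numpy as np
from sklearn.svm import NuSVC

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(+2.0, 1.0, size=(100, 2)),
               rng.normal(-2.0, 1.0, size=(100, 2))])
y = np.hstack([np.ones(100), -np.ones(100)])

clf = NuSVC(nu=0.05, kernel="rbf", gamma="scale").fit(X, y)
print("fraction of support vectors:", clf.support_vectors_.shape[0] / len(X))  # at least ~0.05
print("training error rate:", 1.0 - clf.score(X, y))                           # at most ~0.05
```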
Conclusion
• SVMs find the optimal linear separator
• The kernel trick makes SVMs learn non-linear decision surfaces