Lecture 3
David Sontag
New York University
(Dual)
Lagrangian:

  L(w, b, α) = ½ ||w||² − Σj αj [ yj (w·xj + b) − 1 ],   αj ≥ 0        (1)

Setting ∂L/∂w = 0 and ∂L/∂b = 0 gives w = Σj αj yj xj and Σj αj yj = 0   (2)

Substituting back in yields the dual problem:

  max_α  Σj αj − ½ Σj,k αj αk yj yk (xj·xk)
  s.t.   αj ≥ 0,   Σj αj yj = 0                                          (3)
Dual formulation only depends on
dot-products of the features!
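Because the dual touches the inputs only through dot products, it can be evaluated from a precomputed Gram matrix alone. A minimal numpy sketch (the toy data and the α values are made up for illustration):

```python
import numpy as np

# The dual objective
#   W(alpha) = sum_j alpha_j - 1/2 sum_{j,k} alpha_j alpha_k y_j y_k (x_j . x_k)
# needs only the Gram matrix K[j, k] = x_j . x_k, never the raw features.
def dual_objective(alpha, y, K):
    """Evaluate the SVM dual objective from a precomputed Gram matrix."""
    return alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y)

# Made-up toy data: two points per class.
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T                          # all pairwise dot products
alpha = np.array([0.1, 0.0, 0.1, 0.0])
print(dual_objective(alpha, y, K))   # -> 0.16
```

Swapping K for any other kernel's Gram matrix leaves this code unchanged — that is the kernel trick.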
[Figure: linearly separable training data with the decision boundary w·x + b = 0 and the margin hyperplanes w·x + b = +1 and w·x + b = −1; the points lying on the margin are the support vectors.]
Final solution tends to be sparse
• αj = 0 for most j
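A sketch of this sparsity (a generic optimizer, not the lecture's solver, on a made-up 4-point problem): maximizing the hard-margin dual, only the two points that end up on the margin receive nonzero αj.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up, linearly separable toy data.
X = np.array([[1.0, 0.0], [2.0, 1.0], [-1.0, 0.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T

def neg_dual(alpha):
    # negative of: sum_j alpha_j - 1/2 sum_{j,k} alpha_j alpha_k y_j y_k K[j,k]
    return -(alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y))

res = minimize(neg_dual, np.zeros(4), method="SLSQP",
               bounds=[(0, None)] * 4,
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])
print(np.round(res.x, 3))  # only the two margin points get alpha > 0
```

Here the nearest pair (points 0 and 2) come out with αj = 0.5 each, and the two far points get exactly zero.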
Dual:
What changed?
• Added upper bound of C on αi!
• Intuitive explanation:
• Without slack, αi → ∞ when constraints are violated (points misclassified)
• Upper bound of C limits the αi, so misclassifications are allowed
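The box constraint can be seen numerically. In this sketch (made-up, non-separable data; a generic optimizer rather than the lecture's solver), one point sits on the wrong side, and the αi of the violated constraints get clipped at C instead of blowing up:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up toy data: the third point is a +1 on the wrong side of the boundary.
X = np.array([[1.0, 0.0], [-1.0, 0.0], [-0.5, 0.0]])
y = np.array([1.0, -1.0, 1.0])
K = X @ X.T
C = 1.0

def neg_dual(alpha):
    return -(alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y))

res = minimize(neg_dual, np.zeros(3), method="SLSQP",
               bounds=[(0, C)] * 3,          # the soft-margin box: 0 <= alpha_i <= C
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])
print(np.round(res.x, 3))  # the violating point's alpha hits the cap C
```

Without the `bounds` argument the same problem is unbounded: the optimizer would push these αi toward infinity, which is exactly the hard-margin failure mode described above.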
Common kernels
• Polynomials of degree exactly d: K(u, v) = (u·v)^d
• Polynomials of degree up to d: K(u, v) = (1 + u·v)^d
• Gaussian kernels: K(u, v) = exp(−||u − v||² / 2σ²)
• Sigmoid: K(u, v) = tanh(κ u·v + c)
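These four kernels are one-liners; a sketch with the parameters (d, σ, κ, c) chosen freely for illustration:

```python
import numpy as np

# The four common kernels listed above, written as plain functions.
def poly_exact(u, v, d):       return (u @ v) ** d                # degree exactly d
def poly_up_to(u, v, d):       return (1 + u @ v) ** d            # degree up to d
def gaussian(u, v, sigma):     return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))
def sigmoid(u, v, kappa, c):   return np.tanh(kappa * (u @ v) + c)

u, v = np.array([1.0, 2.0]), np.array([0.5, -1.0])   # u.v = -1.5
print(poly_exact(u, v, 2))    # (u.v)^2 = 2.25
print(poly_up_to(u, v, 2))    # (1 + u.v)^2 = 0.25
```

Note the Gaussian kernel of any point with itself is 1, its maximum value.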
For d = 2 in two dimensions, take Φ(u) = (u1², √2 u1u2, u2²). Then:

  Φ(u)·Φ(v) = (u1², √2 u1u2, u2²) · (v1², √2 v1v2, v2²)
            = u1²v1² + 2 u1u2v1v2 + u2²v2²
            = (u1v1 + u2v2)²
            = (u·v)²

For any d (we will skip the proof): Φ(u)·Φ(v) = (u·v)^d
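A quick numeric check of the d = 2 identity, computing the explicit feature map Φ(u) = (u1², √2 u1u2, u2²) on one side and the kernel (u·v)² on the other:

```python
import numpy as np

def phi(u):
    # Explicit degree-2 feature map for 2-D inputs.
    return np.array([u[0] ** 2, np.sqrt(2) * u[0] * u[1], u[1] ** 2])

u, v = np.array([1.0, 3.0]), np.array([2.0, -1.0])
lhs = phi(u) @ phi(v)     # dot product in the 3-D feature space
rhs = (u @ v) ** 2        # kernel evaluated in the original 2-D space
print(lhs, rhs)           # the two values agree
```

The point of the kernel trick is the right-hand side: it never materializes the feature space, which for degree-d polynomials grows combinatorially in the input dimension.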
Support vectors
Q: How would you prove that the “Gaussian kernel” is a valid kernel?
A: Expand the Euclidean norm as follows:
   ||u − v||² = u·u − 2 u·v + v·v, so
   K(u, v) = exp(−||u − v||²/2σ²) = exp(−u·u/2σ²) · exp(−v·v/2σ²) · exp(u·v/σ²)
The first two factors just rescale Φ(u) and Φ(v), and the last factor expands as the Taylor series Σk (u·v)^k / (σ^{2k} k!) — a nonnegative combination of polynomial kernels, each of which is valid. Hence the Gaussian kernel is valid.
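As a numeric sanity check of validity (not a proof): the Gram matrix of the Gaussian kernel on any point set should be positive semidefinite, i.e. have no negative eigenvalues. A sketch on random made-up data:

```python
import numpy as np

# Build the Gaussian Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / 2 sigma^2)
# for 20 random 3-D points and check its eigenvalues are nonnegative.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
sigma = 1.0
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise ||.||^2
K = np.exp(-sq / (2 * sigma ** 2))
eigs = np.linalg.eigvalsh(K)        # K is symmetric, so eigvalsh applies
print(eigs.min() >= -1e-8)          # no (numerically) negative eigenvalues
```

A single negative eigenvalue would disprove validity; all-nonnegative eigenvalues on samples is merely consistent with it.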
Any problems?
Simultaneously learn 3 sets of weights: w+, w−, wo
• How do we guarantee the correct labels?
• Need new constraints!
To predict, we use: ŷ = arg maxy (wy·x + by)
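A sketch of this prediction rule with one weight vector and bias per class; the three weight vectors and biases below are made-up illustrative values, not learned ones:

```python
import numpy as np

# One (w_y, b_y) pair per class; predict the class with the largest score.
W = {"plus":  np.array([1.0, 0.0]),
     "minus": np.array([-1.0, 0.0]),
     "o":     np.array([0.0, 1.0])}
b = {"plus": 0.5, "minus": 0.0, "o": -0.5}

def predict(x):
    # arg max over classes y of  w_y . x + b_y
    return max(W, key=lambda label: W[label] @ x + b[label])

print(predict(np.array([2.0, 0.0])))   # -> "plus" (score 2.5 beats -2.0 and -0.5)
```

The new constraints mentioned above would require, for each training point, that the correct class's score exceed every other class's score by a margin.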