
Disclaimer: These slides can include material from different sources. I'll be happy to explicitly acknowledge a source if required. Contact me for requests.

Introduction to Machine Learning


10-315 Fall ‘19

Lecture 15:
Support Vector Machines 2
Teacher:
Gianni A. Di Caro
Recap: SVM (hard-margin) optimization problem, linearly separable

Quadratic (convex) optimization problem with $m$ linear inequality constraints:

$$\min_{\boldsymbol{w},\,b}\ \frac{\lVert \boldsymbol{w} \rVert^2}{2} \qquad \text{s.t.}\ \ y^{(i)}\!\left(\boldsymbol{w}^{\top}\boldsymbol{x}^{(i)} + b\right) \ge 1,\quad i = 1,\cdots,m$$

Equivalently:

$$\min_{\boldsymbol{w},\,b}\ \frac{\boldsymbol{w}^{\top}\boldsymbol{w}}{2} \qquad \text{s.t.}\ \ y^{(i)}\!\left(\boldsymbol{w}^{\top}\boldsymbol{x}^{(i)} + b\right) \ge 1,\quad i = 1,\cdots,m$$

[Plot: linearly separable points in the $(x_1, x_2)$ plane with the max-margin separator.]
Recap: What if data is still not linearly separable? → Slack variables

- Allow errors in classification:

$$\min_{\boldsymbol{w},\,b,\,\boldsymbol{\xi}}\ \boldsymbol{w}^{\top}\boldsymbol{w} + p \sum_{j=1}^{m} \xi_j$$
$$\text{s.t.}\ \ y^{(i)}\!\left(\boldsymbol{w}^{\top}\boldsymbol{x}^{(i)} + b\right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1,\cdots,m$$

- $\xi_i$ = slack variable, gets a value $> 1$ if $\boldsymbol{x}^{(i)}$ is misclassified
- $0 < \xi_i < 1$ if $\boldsymbol{x}^{(i)}$ is correctly classified but within the margin
- $\xi_i = 0$ if $\boldsymbol{x}^{(i)}$ is correctly classified
- Soft margins: some examples are within the margin zone, some examples are misclassified
- Still a convex, quadratic programming problem
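To make the three slack cases concrete, here is a minimal pure-Python sketch; the separator $(\boldsymbol{w}, b)$ and the three points below are made up for illustration:

```python
def slack(x, y, w, b):
    """Minimal feasible slack: xi = max(0, 1 - y * (w^T x + b))."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)

w, b = [1.0, 0.0], 0.0          # hypothetical separator: the line x1 = 0

# correctly classified, outside the margin -> xi = 0
print(slack([2.0, 1.0], +1, w, b))   # 0.0
# correctly classified but inside the margin -> 0 < xi < 1
print(slack([0.5, 1.0], +1, w, b))   # 0.5
# misclassified -> xi > 1
print(slack([-1.0, 1.0], +1, w, b))  # 2.0
```

Each printed value matches one of the three bullet cases above.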
Recap: What if data is still not linearly separable? → Slack variables

- $\xi_i > 1$ if $\boldsymbol{x}^{(i)}$ is misclassified
- $0 < \xi_i < 1$ if $\boldsymbol{x}^{(i)}$ is correctly classified, within margin
- $\xi_i = 0$ if $\boldsymbol{x}^{(i)}$ is correctly classified
- $\xi_i$ is linearly proportional to the distance from the class margin if $\boldsymbol{x}^{(i)}$ is misclassified or within the margins
- In the objective we pay a linearly proportional penalty for mistakes, and a small one for being inside the margin
- $p$ = penalty per unit of distance mistake; trade-off parameter between hard and soft objectives (usually set by cross-validation, e.g. $p = 1/d$)

[Plot: margin boundaries $\boldsymbol{w}^{\top}\boldsymbol{x} + b = 1$ and $\boldsymbol{w}^{\top}\boldsymbol{x} + b = -1$ with slack vectors.]
Recap: Soft-margin SVM

Soften the constraints:
$$y^{(i)}\!\left(\boldsymbol{w}^{\top}\boldsymbol{x}^{(i)} + b\right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1,\cdots,m$$

Penalty for misclassifying $\boldsymbol{x}^{(j)}$ or taking it inside the margin: $p\,\xi_j$

$$\min_{\boldsymbol{w},\,b,\,\boldsymbol{\xi}}\ \boldsymbol{w}^{\top}\boldsymbol{w} + p \sum_{j=1}^{m} \xi_j$$

How do we recover hard-margin SVM? Set $p = \infty$.

[Plot: margin boundaries $\boldsymbol{w}^{\top}\boldsymbol{x} + b = \pm 1$.]
Recap: Support Vectors in soft-margin SVM

- Margin support vectors: $\xi_i = 0$, $y^{(i)}\!\left(\boldsymbol{w}^{\top}\boldsymbol{x}^{(i)} + b\right) = 1$
  - Don't contribute to the objective but enforce constraints on the solution
  - Correctly classified, lying exactly on the margin
- Non-margin support vectors: $\xi_i > 0$
  - Contribute to both objective and constraints
  - $0 < \xi_i < 1$: correctly classified but inside the margin
  - $\xi_i > 1$: incorrectly classified

[Plot: margin boundaries $\boldsymbol{w}^{\top}\boldsymbol{x} + b = \pm 1$ with both kinds of support vectors.]
Soft-margin SVM: Hinge loss

Notice that
$$\xi_i(\text{margin}) = \max\!\left(1 - \left(\boldsymbol{w}^{\top}\boldsymbol{x}^{(i)} + b\right) y^{(i)},\ 0\right)$$

[Plot: hinge loss and 0-1 loss as functions of $\left(\boldsymbol{w}^{\top}\boldsymbol{x}^{(i)} + b\right) y^{(i)}$.]
Loss Functions
Hinge Loss Function (SVMs)

- Intuition: the hinge loss upper-bounds the 0-1 loss and has a non-trivial gradient
- Loss = 0 only if the margin is at least 1
- Try to increase the margin if it is less than 1: max-margin classifier
- Hinge loss optimization problem:
$$\min_{\boldsymbol{w}}\ \sum_{i=1}^{m} \max\!\left(0,\ 1 - y^{(i)}\!\left(\boldsymbol{w}^{\top}\boldsymbol{x}^{(i)} + b\right)\right)$$
- Hinge loss regularized optimization problem:
$$\min_{\boldsymbol{w}}\ \sum_{i=1}^{m} \max\!\left(0,\ 1 - y^{(i)}\!\left(\boldsymbol{w}^{\top}\boldsymbol{x}^{(i)} + b\right)\right) + \lambda \sum_{j=1}^{d} w_j^2$$

[Plot: hinge loss vs. 0-1 loss.]
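A quick numeric check of these properties, as a sketch; the sample margin values are arbitrary:

```python
def hinge(z):
    """Hinge loss as a function of the margin z = y * (w^T x + b)."""
    return max(0.0, 1.0 - z)

def zero_one(z):
    """0-1 loss, with the convention that z = 0 counts as an error."""
    return 0.0 if z > 0 else 1.0

# hinge upper-bounds the 0-1 loss at every margin value
for z in [-2.0, -0.5, 0.0, 0.5, 1.0, 3.0]:
    assert hinge(z) >= zero_one(z)

print(hinge(0.5))   # 0.5 -> correctly classified but inside the margin, still penalized
print(hinge(1.0))   # 0.0 -> loss vanishes only once the margin reaches 1
print(hinge(-2.0))  # 3.0 -> misclassified, penalty grows linearly with the distance
```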
Hinge Loss Function

$$\min_{\boldsymbol{w}}\ \sum_{i=1}^{m} \max\!\left(0,\ 1 - y^{(i)}\!\left(\boldsymbol{w}^{\top}\boldsymbol{x}^{(i)} + b\right)\right) + \lambda \sum_{j=1}^{d} w_j^2$$

An error signal is triggered even when there is no classification error (margin > 0), pushing the algorithm to keep learning until a margin of at least 1 is achieved for all data.

The role of the regularization term is to keep the hypothesis simple by avoiding large values for the learned parameters, favoring generalization; this translates to maximizing the margin!
Large margins ⇒ good generalization.

Geometric interpretation: zero loss needs a margin $\boldsymbol{w}^{\top}\boldsymbol{x}\,y \ge 1$; in terms of the geometric margin this means $\lVert \boldsymbol{w} \rVert^{-1}\, \boldsymbol{w}^{\top}\boldsymbol{x}\,y \ge \lVert \boldsymbol{w} \rVert^{-1}$ ⟹ keeping $\lVert \boldsymbol{w} \rVert$ small increases the geometric margin.
Hinge loss optimization ↔ Soft-margin SVM

$$\xi_i = \max\!\left(0,\ 1 - y^{(i)}\!\left(\boldsymbol{w}^{\top}\boldsymbol{x}^{(i)} + b\right)\right)$$

Soft-margin SVM:
$$\min_{\boldsymbol{w},\,b,\,\boldsymbol{\xi}}\ \boldsymbol{w}^{\top}\boldsymbol{w} + p \sum_{j=1}^{m} \xi_j$$
$$\text{s.t.}\ \ y^{(i)}\!\left(\boldsymbol{w}^{\top}\boldsymbol{x}^{(i)} + b\right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1,\cdots,m$$

Regularized hinge loss optimization:
$$\min_{\boldsymbol{w}}\ \sum_{i=1}^{m} \max\!\left(0,\ 1 - y^{(i)}\!\left(\boldsymbol{w}^{\top}\boldsymbol{x}^{(i)} + b\right)\right) + \lambda \sum_{j=1}^{d} w_j^2$$
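The equivalence can be checked numerically: at a fixed $(\boldsymbol{w}, b)$, the smallest feasible slack for each point is exactly its hinge loss, and dividing the soft-margin objective by $p$ matches the regularized hinge objective under the identification $\lambda = 1/p$. That identification is my reading of these particular (unscaled) objectives, not stated on the slide, and the data and parameters below are made up:

```python
def hinge(z):
    return max(0.0, 1.0 - z)

X = [[1.0, 2.0], [-0.5, 0.0], [3.0, 1.0]]   # made-up training points
Y = [1, -1, 1]
w, b, p = [0.5, -1.0], 0.25, 4.0            # made-up parameters

margins = [y * (w[0]*x[0] + w[1]*x[1] + b) for x, y in zip(X, Y)]

# soft-margin SVM: at fixed (w, b) the optimal slack is exactly the hinge loss
xis = [hinge(z) for z in margins]
soft = (w[0]**2 + w[1]**2) + p * sum(xis)

# regularized hinge objective with lam = 1/p
lam = 1.0 / p
reg = sum(hinge(z) for z in margins) + lam * (w[0]**2 + w[1]**2)

assert abs(soft / p - reg) < 1e-12   # same objective, up to the 1/p scaling
```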
SVM vs. Logistic Regression

- SVM: hinge loss
- Logistic regression: log loss (negative log conditional likelihood)

[Plot: log loss, hinge loss, and 0-1 loss as functions of the margin.]
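To compare the two surrogate losses numerically (the margin values are arbitrary, and the log loss is written with the natural logarithm, as in the usual logistic-regression likelihood):

```python
import math

def hinge(z):
    return max(0.0, 1.0 - z)

def log_loss(z):
    # negative log conditional likelihood of the correct class,
    # as a function of the margin z = y * (w^T x + b)
    return math.log(1.0 + math.exp(-z))

for z in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    print(f"z={z:+.1f}  hinge={hinge(z):.3f}  log={log_loss(z):.3f}")

# hinge is exactly zero past margin 1; log loss stays positive for any
# finite margin and only vanishes asymptotically
assert hinge(3.0) == 0.0
assert log_loss(3.0) > 0.0
```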
SVM – Linearly separable case

$n$ training points $(\boldsymbol{x}_1, \boldsymbol{x}_2, \cdots, \boldsymbol{x}_n)$, $d$ features: $\boldsymbol{x}_i$ is a $d$-dimensional vector.

- Primal problem: $\boldsymbol{w}$ – weights on features (a $d$-dimensional problem); separating hyperplane $\boldsymbol{w}^{\top}\boldsymbol{x} + b = 0$
- Convex quadratic program: quadratic objective, linear constraints
- But expensive to solve if $d$ and $n$ are very large
- Often solved in dual form (an $n$-dimensional problem)
Constrained Optimization and Constraint Activation

Toy problem (the slide's equations were lost in extraction; reconstructed from the solution worked out on the later slides):
$$\min_{x}\ x^2 \qquad \text{s.t.}\ \ x \ge b$$
whose solution is $x^* = \max(b, 0)$:
- if $b \le 0$, the unconstrained minimum $x = 0$ is feasible: the constraint is inactive
- if $b > 0$, the constraint is active and tight: $x^* = b$
Constrained Optimization – Dual Problem

Primal problem, constrained (with $b$ positive):
$$\min_{x}\ x^2 \qquad \text{s.t.}\ \ x \ge b$$

Moving the constraint to the objective function gives the Lagrangian function:
$$L(x, \alpha) = x^2 - \alpha\,(x - b)$$
where $\alpha \ge 0$ is the Lagrange multiplier: the price to pay per unit of constraint violation.

Dual problem (inner problem unconstrained in $x$):
$$\max_{\alpha \ge 0}\ \min_{x}\ L(x, \alpha)$$

- $\alpha = 0$: constraint is inactive
- $\alpha > 0$: constraint is active
Dual problem ↔ Relaxation

- Given an optimization problem $P$ (primal), a relaxation $RP$ of $P$ is a derived problem that removes or aggregates constraints, and/or extends the range of the variables (e.g., passing from $x \in \{0,1\}$ to $x \in [0,1]$), and/or changes the objective function.
- The overall aim is to define a problem that is (hopefully) easier to solve than $P$.
- By solving the (easier) problem $RP$ we can obtain bounds on the primal's solution; in certain special cases (which include convex programming), the solution of the relaxed problem can be directly related to that of the primal problem, such that solving the relaxed problem provides the solution to the primal.
- A generic primal $P$ (min) can include both multiple inequality constraints and equality constraints. The scalar value of the objective for an assignment to the decision variables $\boldsymbol{x}$ is denoted $Z_P$:

$$\min_{\boldsymbol{x}}\ Z_P = f(\boldsymbol{x}) \quad \text{s.t.}\ \ g_i(\boldsymbol{x}) \le b_i,\ i = 1,\cdots,m; \quad h_j(\boldsymbol{x}) = d_j,\ j = 1,\cdots,n; \quad \boldsymbol{x} \in X \subseteq \mathbb{R}^d$$

- $V(P)$ = feasibility region for $P$, from the intersection of all inequality and equality constraints:
$$V(P) = \left\{ \boldsymbol{x} \in X \subseteq \mathbb{R}^d \;:\; g_i(\boldsymbol{x}) \le b_i,\ h_j(\boldsymbol{x}) = d_j,\ \ i = 1,\cdots,m,\ j = 1,\cdots,n \right\}$$
Dual problem ↔ Relaxation

Primal problem $P$:
$$\min_{\boldsymbol{x}}\ Z_P = f(\boldsymbol{x}) \qquad \text{s.t.}\ \ \boldsymbol{x} \in V(P)$$

Optimization problem $RP$ (possibly derived from $P$):
$$\min_{\boldsymbol{x}}\ Z_{RP} = \Phi(\boldsymbol{x}) \qquad \text{s.t.}\ \ \boldsymbol{x} \in V(RP)$$

$RP$ is a relaxation of the primal problem $P$ if:
1. $V(RP) \supseteq V(P)$: the feasibility region of the relaxed problem fully includes that of the primal
2. $\Phi(\boldsymbol{x}) \le f(\boldsymbol{x})\ \ \forall\, \boldsymbol{x} \in V(P)$: the objective function of $RP$ is always below $f(\boldsymbol{x})$ (always above for a max problem)
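Both conditions can be checked on a hypothetical toy problem of my own (not the slide's): take $P$ as $\min_x x^2$ s.t. $x \ge 1$, and as $RP$ its Lagrangian relaxation with a fixed multiplier $\lambda \ge 0$:

```python
def f(x):
    return x * x                       # primal objective of: min x^2  s.t.  x >= 1

def phi(x, lam):
    return x * x + lam * (1.0 - x)     # relaxed objective, multiplier lam >= 0

# condition 1: V(RP) is all reals, which contains V(P) = [1, inf)  (by construction)
# condition 2: phi(x, lam) <= f(x) for every x in V(P), since (1 - x) <= 0 there
for lam in [0.0, 0.5, 2.0, 10.0]:
    for x in [1.0, 1.5, 3.0, 10.0]:
        assert phi(x, lam) <= f(x)

# consequence: min_x phi = lam - lam**2/4 (attained at x = lam/2)
# lower-bounds the primal optimum p* = f(1) = 1, for every lam >= 0
for lam in [0.0, 0.5, 2.0, 10.0]:
    assert lam - lam**2 / 4.0 <= 1.0 + 1e-12
```

The bound happens to be tight at $\lambda = 2$, anticipating the convex case discussed below.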
Lagrangian relaxation (Lagrangian dual problem)

Primal problem $P$:
$$\min_{\boldsymbol{x}}\ Z_P = f(\boldsymbol{x}) \quad \text{s.t.}\ \ g_i(\boldsymbol{x}) \le b_i,\ i = 1,\cdots,m; \quad h_j(\boldsymbol{x}) = d_j,\ j = 1,\cdots,n; \quad \boldsymbol{x} \in X \subseteq \mathbb{R}^d$$

Lagrangian relaxation of $P$:
$$\min_{\boldsymbol{x}}\ Z_{RP} = \Phi(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}), \qquad \boldsymbol{x} \in X \subseteq \mathbb{R}^d, \quad \boldsymbol{\lambda} \ge \boldsymbol{0}_m, \quad \boldsymbol{\mu} \in \mathbb{R}^n$$

where $\Phi(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$ adds to the primal's $f(\boldsymbol{x})$ a linear combination of all (or a subset of) the constraints, weighted by the multipliers $\boldsymbol{\lambda}, \boldsymbol{\mu}$ (each constraint gets its own multiplier):
$$\Phi(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) = f(\boldsymbol{x}) + \sum_{i=1}^{m} \lambda_i \left( g_i(\boldsymbol{x}) - b_i \right) + \sum_{j=1}^{n} \mu_j \left( h_j(\boldsymbol{x}) - d_j \right)$$

- It's a relaxation: an easier problem to solve compared to the original
- The primal is constrained; the Lagrangian relaxation is unconstrained
Lagrangian relaxation

For a general primal, the optimal solution of the Lagrangian relaxation is either not feasible for the primal and/or does not correspond to $P$'s optimum.

- The choice of the multipliers also affects the gap.
Lagrangian relaxation

- Each multiplier weights the importance of the related constraint in the objective.
- Moving the $j$-th constraint to the objective, we potentially allow a violation of the constraint, "paying" $\lambda_j$ for each unit of constraint violation (the more the solution $\boldsymbol{x}$ violates the constraint, the more we pay).

Primal:
$$\min_{x_1, x_2}\ Z_P = 24 x_1 + 14 x_2 \quad \text{s.t.}\ \ 3x_1 + x_2 \ge 12, \quad 4x_1 + x_2 \le 10, \quad 2x_1 + x_2 \ge 7, \quad x_1, x_2 \in \mathbb{R}_+$$

Lagrangian relaxation:
$$\min_{x_1, x_2;\, \boldsymbol{\lambda}}\ Z_L = 24 x_1 + 14 x_2 - \lambda_1 (3x_1 + x_2 - 12) + \lambda_2 (4x_1 + x_2 - 10) - \lambda_3 (2x_1 + x_2 - 7), \quad x_1, x_2 \in \mathbb{R}_+,\ \ \boldsymbol{\lambda} \ge \boldsymbol{0}$$

- If $\lambda_1 = 8$ and $(x_1, x_2) = (3, 1)$, the first constraint gives $C_1 = 3x_1 + x_2 - 12 = -2 < 0$, meaning that the constraint is violated by 2 units. Therefore, a penalty $(-8) \cdot (-2) = 16$ has been paid in the objective.
- The $\lambda_1$ penalty has a $-$ sign in front of it because we want to add something to $Z_L$ if a violation of the $\ge$ constraint occurs. $\lambda_2$ has a $+$ sign because a violation of the second constraint, $\le$, generates a positive value.
- In a min (max) problem, we add (subtract) violation units to the objective, to search for a solution with minimal (or zero, if possible) violations!
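The penalty arithmetic on this slide can be verified directly:

```python
# the slide's assignment: multipliers lam = (8, 0, 0), point (x1, x2) = (3, 1)
x1, x2 = 3.0, 1.0
lam1, lam2, lam3 = 8.0, 0.0, 0.0

C1 = 3*x1 + x2 - 12      # first constraint requires 3*x1 + x2 >= 12, i.e. C1 >= 0
assert C1 == -2.0        # violated by 2 units

penalty = -lam1 * C1     # the minus sign in Z_L turns the violation into a cost
assert penalty == 16.0   # the 16 units paid in the objective

Z_P = 24*x1 + 14*x2                                   # raw objective: 86
Z_L = Z_P - lam1*C1 + lam2*(4*x1 + x2 - 10) - lam3*(2*x1 + x2 - 7)
assert Z_L == 86.0 + 16.0                             # relaxed objective: 102
```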
Lagrangian relaxation

- Central question: what is the right value to assign to the multiplier weights?
- What is the right price to pay for a violation of a constraint?
  - Look at the multipliers as constant parameters and assign a weight value based on the importance of each constraint
  - Look at the multipliers as variables and find the best assignment to the multipliers (the best price)

By construction of a relaxation, for any assignment of values to the multipliers, the solution of the Lagrangian relaxation is a lower bound for the solution of a primal minimization problem (or an upper bound, if the primal is a maximization problem). This is the general case.
Lagrangian relaxation: Convex problems

For convex programs, it is possible to find an assignment such that the lower bound is tight and the solution of the Lagrangian relaxation (which defines a concave problem in $\boldsymbol{\lambda}$) is the same as that of the primal problem!

[Left plot: with a generic assignment of values to the multipliers, $\Phi(x; \lambda)$ lies below $f(x)$ and $z_L < z_P$. Right plot: with the best assignment (the solution of a new optimization problem over the multipliers), $z_P = z_L$.]
When, in general, $Z_P \equiv Z_{RP}$: Complementarity conditions

If the solution of the relaxed problem, $\boldsymbol{x}^*_{\boldsymbol{\lambda},\boldsymbol{\mu}}$, is feasible in $V(P)$ (i.e., it's also a feasible solution point for the primal) and the following conditions, termed complementarity conditions, hold, then $\boldsymbol{x}^*_{\boldsymbol{\lambda},\boldsymbol{\mu}}$ is also $P$'s optimum:

$$\sum_{i=1}^{m} \lambda_i \left( g_i(\boldsymbol{x}^*) - b_i \right) = 0$$

(note that for the equality constraints the conditions above automatically hold)

[Plots: one scenario where the complementarity conditions are satisfied, one where they are not.]
Complementarity conditions and Lagrange multipliers

$$\sum_{i=1}^{m} \lambda_i \left( g_i(\boldsymbol{x}^*) - b_i \right) = 0 \qquad \text{(complementarity conditions)}$$

- Therefore, for the solution to be the same between a convex primal and its Lagrangian relaxation, the complementarity conditions must hold, implying that, at the optimum, for the $i$-th constraint either:
  - $\lambda_i = 0$, or
  - the constraint is active: $g_i(\boldsymbol{x}^*) - b_i = 0$
- Another way of saying the same thing is that, at the optimum:
  - all active constraints have a multiplier value $\lambda_i > 0$
  - all constraints that are not active are associated with a multiplier of value zero

Rationale: if a constraint $i$ is active (i.e., it is satisfied with equality) at the optimal solution, then its potential violation should have a price $\lambda_i > 0$, since a small change in the constraint would change the value of the objective and the solution. Instead, if a constraint $j$ is not active, a small change in the constraint would not change the solution or the objective: $j$ is practically irrelevant for finding the solution.
Let's go back to our specific case: Connection between Primal and Dual

Primal problem: $p^* = \min_{x \ge b} x^2$. Dual problem: $d^* = \max_{\alpha \ge 0} d(\alpha)$, with $d(\alpha) = \min_x L(x, \alpha)$.

- Weak duality: the dual solution $d^*$ lower-bounds the primal solution $p^*$, i.e. $d^* \le p^*$.

To see this, recall $L(x, \alpha) = x^2 - \alpha (x - b)$. For every feasible $x$ (i.e. $x \ge b$) and feasible $\alpha$ (i.e. $\alpha \ge 0$), notice that
$$d(\alpha) = \min_{x'} L(x', \alpha) \;\le\; L(x, \alpha) = x^2 - \alpha (x - b) \;\le\; x^2$$
so $d(\alpha) \le p^*$ for every feasible $\alpha$, hence $d^* \le p^*$.

- The dual problem (a maximization) is always concave even if the primal is not convex: $d(\alpha)$ is a pointwise minimum of functions linear in $\alpha$, hence a concave function.
Connection between Primal and Dual

Primal problem: $p^* = \min_{x \ge b} x^2$. Dual problem: $d^* = \max_{\alpha \ge 0} d(\alpha)$.

- Weak duality: the dual solution $d^*$ lower-bounds the primal solution $p^*$, i.e. $d^* \le p^*$.
- Strong duality: $d^* = p^*$ holds for many problems of interest, e.g. if the primal is a feasible convex objective with linear constraints.
Connection between Primal and Dual

What does strong duality say about $\alpha^*$ (the $\alpha$ that achieves the optimal value of the dual) and $x^*$ (the $x$ that achieves the optimal value of the primal problem)?

Whenever strong duality holds, the following conditions (known as KKT conditions) are true for $\alpha^*$ and $x^*$:

1. $\nabla L(x^*, \alpha^*) = 0$, i.e. the gradient of the Lagrangian at $x^*$ and $\alpha^*$ is zero
2. $x^* \ge b$, i.e. $x^*$ is primal feasible
3. $\alpha^* \ge 0$, i.e. $\alpha^*$ is dual feasible
4. $\alpha^* (x^* - b) = 0$ (called complementary slackness)

We use the first one to relate $x^*$ and $\alpha^*$. We use the last one (complementary slackness) to argue that $\alpha^* = 0$ if the constraint is inactive and $\alpha^* > 0$ if the constraint is active and tight.
Solving the dual

The optimization over $x$ is unconstrained:
$$\frac{\partial L}{\partial x} = 2x - \alpha = 0 \ \Longrightarrow\ x^* = \frac{\alpha}{2}, \qquad d(\alpha) = L(x^*, \alpha) = \alpha b - \frac{\alpha^2}{4}$$

Now we need to maximize $L(x^*, \alpha)$ over $\alpha \ge 0$: solve the unconstrained problem to get $\alpha'$ and then take $\max(\alpha', 0)$:
$$\frac{\partial d}{\partial \alpha} = b - \frac{\alpha}{2} = 0 \ \Longrightarrow\ \alpha' = 2b, \qquad \alpha^* = \max(2b, 0)$$

- $\alpha^* = 0$: constraint is inactive ($b \le 0$)
- $\alpha^* > 0$: constraint is active and tight ($b > 0$)
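The whole toy derivation, including the complementarity behavior of $\alpha^*$, fits in a few lines of Python (a sketch of the slide's example):

```python
def solve_dual(b):
    """Solve min x**2 s.t. x >= b via its dual max_{a >= 0} d(a) = a*b - a**2/4."""
    # unconstrained maximizer: d'(a) = b - a/2 = 0  ->  a' = 2b, then clip at 0
    a_star = max(2.0 * b, 0.0)
    # stationarity of L(x, a) = x**2 - a*(x - b): 2x - a = 0  ->  x = a/2;
    # with a* = 0 this is just the unconstrained minimizer x = 0
    x_star = a_star / 2.0
    return x_star, a_star

# b > 0: constraint active and tight, a* > 0, and x* = max(b, 0) = b
x, a = solve_dual(3.0)
assert (x, a) == (3.0, 6.0)
assert a * (x - 3.0) == 0.0            # complementary slackness holds

# b < 0: constraint inactive, a* = 0, x* = 0
x, a = solve_dual(-1.0)
assert (x, a) == (0.0, 0.0)

# strong duality at b = 3: d(a*) = 6*3 - 36/4 = 9 = p* = (x*)**2
assert 6.0 * 3.0 - 6.0**2 / 4.0 == 3.0**2
```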
Dual SVM – Linearly separable case

$n$ training points $(\boldsymbol{x}_1, \boldsymbol{x}_2, \cdots, \boldsymbol{x}_n)$, $d$ features: $\boldsymbol{x}_i$ is a $d$-dimensional vector.

- Primal problem ($\boldsymbol{w}$ – weights on features, a $d$-dimensional problem):
$$\min_{\boldsymbol{w},\,b}\ \frac{\boldsymbol{w}^{\top}\boldsymbol{w}}{2} \qquad \text{s.t.}\ \ y_i\!\left(\boldsymbol{w}^{\top}\boldsymbol{x}_i + b\right) \ge 1, \quad i = 1,\cdots,n$$
- Dual problem, derived from the Lagrangian ($\boldsymbol{\alpha}$ – weights on training points, an $n$-dimensional problem):
$$L(\boldsymbol{w}, b, \boldsymbol{\alpha}) = \frac{\boldsymbol{w}^{\top}\boldsymbol{w}}{2} - \sum_{i=1}^{n} \alpha_i \left[ y_i\!\left(\boldsymbol{w}^{\top}\boldsymbol{x}_i + b\right) - 1 \right], \qquad \alpha_i \ge 0$$
Dual SVM – linearly separable case

$$\max_{\boldsymbol{\alpha} \ge 0}\ \min_{\boldsymbol{w},\,b}\ L(\boldsymbol{w}, b, \boldsymbol{\alpha})$$

The inner problem $\min_{\boldsymbol{w},\,b} L$ is unconstrained. If we can solve for the $\alpha$'s (dual problem), then we have a solution for $\boldsymbol{w}, b$ (primal problem).
Dual SVM – linearly separable case

Solving $\min_{\boldsymbol{w},\,b} L$ by setting the gradients to zero:
$$\frac{\partial L}{\partial \boldsymbol{w}} = 0 \ \Longrightarrow\ \boldsymbol{w} = \sum_{i=1}^{n} \alpha_i y_i \boldsymbol{x}_i, \qquad \frac{\partial L}{\partial b} = 0 \ \Longrightarrow\ \sum_{i=1}^{n} \alpha_i y_i = 0$$

Substituting the solutions from the min part into $L$:
$$\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j\, \boldsymbol{x}_i^{\top} \boldsymbol{x}_j \qquad \text{s.t.}\ \ \alpha_i \ge 0, \quad \sum_{i=1}^{n} \alpha_i y_i = 0$$

- The dual problem is also a QP
- The solution gives the $\alpha_j$'s
- What about $b$? It does not appear in the dual equations; it is recovered afterwards from the support vectors.
Dual SVM: Sparsity of dual solution

Only a few $\alpha_j$'s can be non-zero: those where the constraint is active and tight,
$$\left(\boldsymbol{w}^{\top}\boldsymbol{x}_j + b\right) y_j = 1$$

Support vectors – training points $\boldsymbol{x}_j$ whose $\alpha_j$ is non-zero.

[Plot: points with $\alpha_j = 0$ away from the margin; points with $\alpha_j > 0$ on the margin boundaries, with the separator $\boldsymbol{w}^{\top}\boldsymbol{x} + b = 0$.]
Dual SVM – Linearly separable case

- The dual problem is also a QP
- The solution gives the $\alpha_j$'s
- Use the support vectors, those with $\alpha_j > 0$, to compute $b$: their constraint is tight, $\left(\boldsymbol{w}^{\top}\boldsymbol{x}_j + b\right) y_j = 1$
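On a tiny dataset the dual can be solved by hand and the primal recovered, as a check of the recipe above (the two 1-D training points are made up):

```python
# two 1-D training points: x = 2 with y = +1 and x = 0 with y = -1
X = [2.0, 0.0]
Y = [1.0, -1.0]

# the constraint sum_i a_i y_i = 0 forces a_1 = a_2 = a, and the dual objective
# reduces to D(a) = 2a - 2a**2, maximized where D'(a) = 2 - 4a = 0  ->  a = 1/2
a = 0.5
alphas = [a, a]
assert sum(ai * yi for ai, yi in zip(alphas, Y)) == 0.0

# recover the primal weight: w = sum_i a_i y_i x_i
w = sum(ai * yi * xi for ai, yi, xi in zip(alphas, Y, X))
assert w == 1.0

# recover b from a support vector (alpha_j > 0 -> tight constraint y_j*(w*x_j + b) = 1)
b = 1.0 / Y[0] - w * X[0]
assert b == -1.0

# both points are support vectors: they sit exactly on the margin boundaries
for x, y in zip(X, Y):
    assert y * (w * x + b) == 1.0

# strong duality check: dual optimum D(1/2) equals primal optimum w**2 / 2
assert 2*a - 2*a**2 == 0.5 * w * w
```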
Dual SVM – Non-separable case

- Primal problem:
$$\min_{\boldsymbol{w},\,b,\,\{\xi_j\}}\ \frac{\boldsymbol{w}^{\top}\boldsymbol{w}}{2} + C \sum_{j} \xi_j \qquad \text{s.t.}\ \ y_j\!\left(\boldsymbol{w}^{\top}\boldsymbol{x}_j + b\right) \ge 1 - \xi_j, \quad \xi_j \ge 0$$
- Dual problem, with Lagrange multipliers $\boldsymbol{\alpha} \ge 0$ and $\boldsymbol{\mu} \ge 0$ (one pair per training point):
$$\max_{\boldsymbol{\alpha},\,\boldsymbol{\mu}}\ \min_{\boldsymbol{w},\,b,\,\{\xi_j\}}\ L(\boldsymbol{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\mu})$$
Dual SVM – Non-separable case

$$\max_{\boldsymbol{\alpha}}\ \sum_{i} \alpha_i - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j y_i y_j\, \boldsymbol{x}_i^{\top} \boldsymbol{x}_j \qquad \text{s.t.}\ \ 0 \le \alpha_j \le C, \quad \sum_{j} \alpha_j y_j = 0$$

The upper bound $\alpha_j \le C$ comes from
$$\frac{\partial L}{\partial \xi_j} = 0 \ \Longrightarrow\ C - \alpha_j - \mu_j = 0, \quad \mu_j \ge 0 \ \Longrightarrow\ \alpha_j \le C$$

Intuition: if $C \to \infty$, we recover the hard-margin SVM.

- The dual problem is also a QP
- The solution gives the $\alpha_j$'s
