Poly Kernel
1 Introduction
The training examples are given as
\[
(x_1, y_1), \ldots, (x_l, y_l), \qquad x_i \in \mathbb{R}^n, \quad y_i \in \{+1, -1\}.
\]
The normal vector $w$ of the separating hyperplane can be written as
\[
w = \sum_{i=1}^{l} \alpha_i y_i x_i = \sum_{i \in SV} \alpha_i y_i x_i . \tag{2}
\]
Support vectors are those vectors $x_i$ which satisfy $y_i [(w, x_i) + b] = 1$, i.e. which have a nonzero $\alpha_i$ and effectively contribute to the description of the separating hyperplane. Hence in (2) one can reduce $w$ to a linear combination of the support vectors. Less formally, the support vectors can be viewed as the examples on the front line, guarding their own class against the examples of the other one, and they are essential for the concept to be learned.
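As an illustration of (2), the following small numpy sketch (with made-up multipliers and toy examples, not data from the paper) checks that the full sum and the sum restricted to the support vectors yield the same $w$:

```python
import numpy as np

# Equation (2): w as a linear combination of all training examples, and its
# reduction to the support vectors (the examples with nonzero alpha).
# alpha, X, y are toy values standing in for the result of a trained SVM.
alpha = np.array([0.0, 0.5, 0.0, 0.5])           # Lagrange multipliers alpha_i
X = np.array([[1.0, 2.0], [2.0, 1.0],
              [-1.0, -2.0], [-2.0, -1.0]])       # training examples x_i
y = np.array([1.0, 1.0, -1.0, -1.0])             # labels y_i

w_full = (alpha * y) @ X                         # sum over i = 1, ..., l
sv = alpha > 1e-12                               # support vector indices
w_sv = (alpha[sv] * y[sv]) @ X[sv]               # sum over support vectors only

assert np.allclose(w_full, w_sv)
```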
Considering (2) one has to solve the following optimization problem:
\[
\Lambda^T \mathbf{1} - \tfrac{1}{2}\, \Lambda^T A \Lambda \;\to\; \max, \qquad 0 \le \Lambda, \qquad \Lambda^T Y = 0 \tag{3}
\]
with $\Lambda = (\alpha_1, \ldots, \alpha_l)$, $\mathbf{1} = (1, \ldots, 1)$, and $Y = (y_1, \ldots, y_l)$. The Hesse matrix $A$ consists of the elements $A_{ij} = y_i y_j (x_i, x_j)$ for $i, j = 1, \ldots, l$ [CV95].
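A minimal sketch of solving (3) numerically, assuming numpy and scipy are available; the general-purpose SLSQP optimizer stands in for the dedicated QP solver one would normally use, and all names below are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def svm_dual(X, y):
    """Solve (3): maximize  L.1 - 0.5 L A L  subject to  0 <= L  and  L.Y = 0,
    with the Hesse matrix A_ij = y_i y_j (x_i, x_j)."""
    l = len(y)
    YX = y[:, None] * X
    A = YX @ YX.T                                    # A_ij = y_i y_j <x_i, x_j>
    objective = lambda L: 0.5 * L @ A @ L - L.sum()  # negated objective of (3)
    gradient = lambda L: A @ L - np.ones(l)
    constraint = {"type": "eq", "fun": lambda L: L @ y}   # Lambda^T Y = 0
    result = minimize(objective, np.zeros(l), jac=gradient,
                      bounds=[(0.0, None)] * l,           # 0 <= Lambda
                      constraints=[constraint], method="SLSQP")
    return result.x

# toy, linearly separable data
X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = svm_dual(X, y)
w = (alpha * y) @ X                                  # equation (2)
print(alpha, w)
```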
However, in the general case a linear separation in the original feature space will not provide a sufficient classifier. Therefore the original feature space is mapped into a higher-dimensional image space, in which the linear separation takes place.
It is not necessary to expand the feature space explicitly. One way to perform the mapping implicitly is to use kernels $K(u, v)$ (i.e. generalized dot products). In this context the fundamental interrelation is
\[
K(u, v) = (\Phi(u), \Phi(v))
\]
as well as its use in the Hesse matrix,
\[
A_{ij} = y_i y_j K(x_i, x_j) .
\]
The polynomial kernel considered in the following is $K(u, v) = ((u, v) + 1)^d$.
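A quick numerical check of the first interrelation for the polynomial kernel with n = 2 and d = 2; the explicit mapping Φ written out below is only one possible choice and serves purely as an illustration (its six components match the corresponding entry of Table 1):

```python
import numpy as np

def phi(x):
    # Explicit mapping for K(u, v) = ((u, v) + 1)^2 with n = 2:
    # six image features, matching the entry n = 2, d = 2 in Table 1.
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     1.0])

u = np.array([0.3, -1.2])
v = np.array([2.0, 0.5])

kernel_value = (u @ v + 1.0) ** 2      # implicit computation via the kernel
explicit_value = phi(u) @ phi(v)       # explicit dot product in the image space
assert np.isclose(kernel_value, explicit_value)
```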
I" \ all 11 21 31 al 51 6 7
2 2 6 10 15 21 28 36
16 16 1531 9 6 9 4 . 8 x l 0 s 2.0×104 7.5×104 2.5×105
256 256 3.3 × 1042.9 × 106 1.9 x 10s 9.7 × 109 4,2 x 1011 1.6 × 10 la
Table 1. Dimension of the image space for the polynomial kernel with exponent d and n original features. The dimension of the image space (where the linear separation takes place) grows quite rapidly; an explicit computation in this space would be impossible. But, as mentioned before, this is fortunately not required. Rather, the value itself should guide the user to a conjecture about the separating abilities of the associated hyperplane.
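For d ≥ 2 the entries of Table 1 coincide with the number of monomials of the n original features up to degree d, i.e. with binom(n + d, d) (the d = 1 column lists the original dimension n itself); a short check:

```python
from math import comb

# Number of monomials of the n original features up to degree d: C(n + d, d).
for n in (2, 16, 256):
    print(n, [comb(n + d, d) for d in range(2, 8)])
# e.g. n = 16: 153, 969, 4845, 20349, 74613, 245157  (cf. the second row of Table 1)
```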
For a fractional exponent $d$ the kernel may involve a negative base to raise.¹ Hence the Hesse matrix $A$ will no longer be real valued and therefore symmetric ($A^T = A$), but will in fact contain complex entries. Nevertheless, $A$ has the property of hermiticity ($\bar{A}^T = A$). This allows for a new formulation of (3). Because
\[
\Lambda^T A \Lambda = \Lambda^T A^T \Lambda = \Lambda^T \tfrac{1}{2}(A + A^T)\Lambda = \Lambda^T \tfrac{1}{2}(A + \bar{A})\Lambda = \Lambda^T \operatorname{Re}(A)\, \Lambda ,
\]
we equivalently solve
\[
\Lambda^T \mathbf{1} - \tfrac{1}{2}\, \Lambda^T \operatorname{Re}(A)\, \Lambda \;\to\; \max, \qquad 0 \le \Lambda, \qquad \Lambda^T Y = 0
\]
instead, and get rid of the complex entries. ($\operatorname{Re}(A)$ denotes the real part.)
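As a sketch of how Re(A) could be obtained in practice for a fractional exponent, the possibly negative bases can be raised via complex powers and only the real part kept, as in the reformulation above. The use of numpy and of the principal branch of the complex power are assumptions of this sketch:

```python
import numpy as np

def hessian_real_part(X, y, d):
    """Re(A) for the kernel ((x_i, x_j) + 1)^d with a fractional exponent d.
    Negative bases yield complex kernel values; following the reformulation
    above, only the real part enters the quadratic form of the QP."""
    base = X @ X.T + 1.0
    K = np.power(base.astype(complex), d)      # complex power handles negative bases
    A = (y[:, None] * y[None, :]) * K          # A_ij = y_i y_j K(x_i, x_j)
    return A.real

X = np.array([[1.0, 0.0], [-2.0, 0.5], [0.3, -1.5]])
y = np.array([1.0, -1.0, 1.0])
print(hessian_real_part(X, y, d=1.5))
```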
Expanding the kernel for arbitrary exponents $d$ we get, according to Taylor,
\[
K(u, v) = \left( (u, v) + 1 \right)^d = \sum_{k=0}^{\infty} \binom{d}{k} (u, v)^k .
\]
Non-integer exponents do not terminate the series as integer ones do, but the influence of the high-order terms decreases nevertheless. In contrast to kernels with an integer exponent, there is no mapping $\Phi$ with an image space of finite dimension corresponding to such a fractional-exponent kernel.
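The behaviour of the series coefficients can be inspected by evaluating the generalized binomial coefficients binom(d, k) for integer and fractional d; a small pure-Python check (the helper below is purely illustrative):

```python
def gen_binom(d, k):
    # Generalized binomial coefficient  d (d-1) ... (d-k+1) / k!
    c = 1.0
    for i in range(k):
        c *= (d - i) / (i + 1)
    return c

# Integer exponent: the coefficients vanish beyond k = d, the series terminates.
print([round(gen_binom(3, k), 4) for k in range(8)])
# Fractional exponent: the series does not terminate, but the magnitudes of the
# high-order coefficients decrease towards zero.
print([round(gen_binom(2.5, k), 4) for k in range(8)])
```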
Fractional degrees allow a more continuous range of concepts. The resulting separating hyperplanes change their shapes smoothly with the exponent $d$. This is of particular importance for domains dealing with feature spaces that already cover tens, hundreds or more dimensions (e.g. recognition of graphical images), where a low degree of the polynomial kernel is preferred. A simple artificial problem in a two-dimensional feature space is presented in Figure 1 [Fri93].
¹ One could imagine this in the original space: the representing vectors u and v of the two participating examples form a sufficiently obtuse angle.
[Figure 1 omitted.]
Fig. 1. The examples (x − δ, y − δ) and (x + δ, y + δ) are members of one class, while the two other examples (x − δ, y + δ) and (x + δ, y − δ) belong to a second class. The four points are centered on (x, y).
4 Summary
The Support Vector algorithm shows some promising properties but needs some refinement, especially on the level of practical realization, to soften the enormous effort of finding the "simplest" explanation for a learning problem. Polynomial kernels with fractional degrees provide a broader range of concepts as well as a way to reduce the numerical effort to be spent in the QP.
The algorithm works well with a feature space of "similar" features. It is often preferable to apply a componentwise transformation that normalizes the data before the number-crunching task of the SVM itself. For specific domains this could be done in the kernel function.
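A minimal sketch of such a componentwise normalization, scaling every feature to zero mean and unit variance before the SVM is run; this particular scaling is only one common choice, not the one prescribed by the text:

```python
import numpy as np

def normalize_columns(X):
    """Componentwise (per-feature) normalization to zero mean and unit
    variance, applied before the SVM training itself."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0          # leave constant features unchanged
    return (X - mean) / std

X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])
print(normalize_columns(X))
```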
References