
A Short Note About the Application of Polynomial Kernels with Fractional Degree in Support Vector Learning

Rolf Rossius, Gérard Zenker, Andreas Ittner, and Werner Dilger

Department of Computer Science
Artificial Intelligence Group
Chemnitz University of Technology
D-09107 Chemnitz
{ros,gze,ait,wdi}@informatik.tu-chemnitz.de
http://www.tu-chemnitz.de/informatik/HomePages/KI/

Abstract. In the mid 90's a fundamentally new Machine Learning approach was developed by V. N. Vapnik: the Support Vector Machine (SVM). This new method can be regarded as a very promising approach and is getting more and more attention in the fields where neural networks and decision tree methods are applied. Whilst neural networks may be considered (correctly or not) to be well understood and are in wide use, Support Vector Learning still has some rough edges in its theoretical details, and its inherent numerical tasks prevent it from being easily applied in practice. This paper picks up a new aspect - the use of fractional degrees on polynomial kernels in the SVM - discovered in the course of an implementation of the algorithm. Fractional degrees on polynomial kernels broaden the capabilities of the SVM and offer the possibility to deal with feature spaces of infinite dimension. We introduce a method to simplify the quadratic programming problem that forms the core of the SVM.

1 Introduction

Well known representatives of classification and prediction methods in the field of Machine Learning are neural networks and methods for generating different kinds of decision trees. An innovative and still relatively unknown learning approach is the Support Vector Machine (SVM), developed by V. N. Vapnik in the mid 90's. Support Vector Learning [IRZ98] is not just another approach to learning techniques; rather, it can be regarded as a fundamentally new philosophy in the area of Machine Learning.
The underlying principle of the SVM is the principle of Structural Risk Minimization (SRM) [Vap95]. In contrast to a pure minimization of the empirical risk, the SRM is based on the "idea of simplicity" and unifies Empirical Risk Minimization and the problem of Model Selection. The binary classifier sought for the problem

$(x_1, y_1), \ldots, (x_l, y_l), \quad x_i \in \mathbb{R}^n, \; y_i \in \{+1, -1\},$

has to be a function from the set

$\{f_\alpha : \alpha \in \Gamma\}, \quad f_\alpha : \mathbb{R}^n \to \{+1, -1\}, \quad x \mapsto y,$

and should reflect the real inherent essence of the given learning problem. This essence can be regarded as the simplest (in some sense) separation of the feature space. Here simplicity will be formalized by means of the VC dimension, i. e. a measure of the capacity of the considered set of feasible functions, e. g. the family of separating hyperplanes. The SRM is enforced by a controlled bounding of the VC dimension of the set $\{f_\alpha\}$ and ensures the excellent generalization ability of the SVM. The underlying theory of the SRM will not be explained in detail in this paper. We refer to [Vap95], which covers the SRM and its application in the SVM.
The separating hyperplane is characterized by (w, x) + b = 0. The distance
between the hyperplane and the examples should be maximized, i. e. one has to
solve a problem of mathematical programming. For the non-separable case slack variables $\xi_i \ge 0$ are introduced, which leads to:

$\frac{1}{2}(w, w) + C \sum_{i=1}^{l} \xi_i \to \min$
$y_i \left[ (w, x_i) + b \right] \ge 1 - \xi_i \quad \forall i = 1, \ldots, l$   (1)
$\xi_i \ge 0 \quad \forall i = 1, \ldots, l,$

where the capacity parameter C > 0 controls the interrelationship between the accuracy of the classifier on the learning set and its ability to generalize, i. e. the accuracy on an unseen test set.
The vector w, as the solution of (1), determines the optimal hyperplane. It can be expressed as a linear combination of a possibly small subset of the whole learning data:

$w = \sum_{i=1}^{l} \alpha_i y_i x_i = \sum_{SV} \alpha_i y_i x_i .$   (2)

Support Vectors are those vectors $x_i$ which satisfy $y_i \left[ (w, x_i) + b \right] = 1$, i. e. which have a nonzero $\alpha_i$ and effectively contribute to the description of the separating hyperplane. Hence in (2) one can reduce w to a linear combination of support vectors. Less formally, these support vectors can be viewed as the examples on the frontline, guarding their own class against the examples of the other one, and they are essential for the concept to be learned.
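As an informal illustration of the expansion (2) (not part of the original implementation; the function name, the threshold on the $\alpha_i$, the offset b and the toy values are assumptions made purely for illustration), a minimal Python sketch evaluating the resulting classifier could look as follows:

import numpy as np

# Sketch of (2): given coefficients alpha_i, labels y_i, training points x_i
# and an offset b (all assumed to be already known), the normal vector w is a
# linear combination of the support vectors, i.e. the points with nonzero alpha_i.
def decision(x, alphas, ys, xs, b):
    sv = alphas > 1e-12                                        # support vectors only
    w = np.sum((alphas[sv] * ys[sv])[:, None] * xs[sv], axis=0)
    return np.sign(np.dot(w, x) + b)

alphas = np.array([0.0, 0.7, 0.7, 0.0])                        # toy values
ys = np.array([+1.0, +1.0, -1.0, -1.0])
xs = np.array([[2.0, 2.0], [1.0, 1.0], [-1.0, -1.0], [-2.0, -2.0]])
print(decision(np.array([0.5, 0.5]), alphas, ys, xs, b=0.0))   # prints 1.0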
Considering (2), one has to solve the following optimization problem:

$\Lambda^T \mathbf{1} - \frac{1}{2} \Lambda^T A \Lambda \to \max$
$0 \le \Lambda \le C \mathbf{1}$   (3)
$\Lambda^T Y = 0,$

with $\Lambda = (\alpha_1, \ldots, \alpha_l)$, $\mathbf{1} = (1, \ldots, 1)$, and $Y = (y_1, \ldots, y_l)$. The Hesse matrix A consists of the elements $A_{ij} = y_i y_j (x_i, x_j)$ for $i, j = 1, \ldots, l$ [CV95].
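The Hesse matrix itself is straightforward to set up. The sketch below (an illustration, not the authors' code; hesse_matrix and the toy data are invented names and values) builds $A_{ij} = y_i y_j (x_i, x_j)$ for the plain linear kernel:

import numpy as np

# A_ij = y_i y_j <x_i, x_j>: the Gram matrix of the data multiplied entrywise
# by the outer product of the label vector with itself.
def hesse_matrix(xs, ys):
    gram = xs @ xs.T                 # pairwise dot products <x_i, x_j>
    return np.outer(ys, ys) * gram   # attach the label signs

xs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ys = np.array([+1.0, -1.0, +1.0])
print(hesse_matrix(xs, ys))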
However in the general case the linear separation in the original feature space
will not provide a sufficient classifier. Therefore the original feature space is

expanded to a very high dimensional image space by (e. g.):

$\Phi : \mathbb{R}^n \to \mathbb{R}^N, \quad n \ll N, \quad \Phi(x) = \left( 1, \gamma_1 x_1, \ldots, \gamma_n x_n, \gamma_{n+1} x_1^2, \gamma_{n+2} x_1 x_2, \ldots, \gamma_k x_n^d \right),$

and in this space the linear separation is performed. An inverse transformation back into $\mathbb{R}^n$ results in a non-linear separation in the original space of the task-supplied features:

$f(x) = (w, \Phi(x)) + b .$

It is not necessary to expand the feature space explicitly. One way to do the mapping implicitly is to use kernels K(u, v) (respectively dot products). In this context the fundamental interrelation is:

$K(u, v) = (\Phi(u), \Phi(v)) .$

The symmetric function K(u, v) may be a dot product for the high dimensional image space if its eigenvalues are positive. One rather simple type of such kernels is representable as

$K(u, v) = ((u, v) + 1)^d, \quad d = 1, 2, \ldots$   (4)

with degree d as an integer. Another choice may be $K(u, v) = e^{-\|u - v\|}$. A generalized kind of the kernel (4) will be examined in this paper.
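For concreteness, a small Python sketch of both kernel types is given below (illustrative only; the function names are assumptions, and the explicit promotion to a complex number anticipates the fractional degrees of Section 2, where the base may become negative):

import numpy as np

# Polynomial kernel ((u, v) + 1)^d. For fractional d the base may be negative,
# so it is cast to a complex number and the principal value of the power is taken.
def poly_kernel(u, v, d):
    return complex(np.dot(u, v) + 1.0) ** d

# Exponential kernel exp(-||u - v||).
def exp_kernel(u, v):
    return np.exp(-np.linalg.norm(u - v))

u, v = np.array([-1.0, -0.8]), np.array([1.0, 0.9])
print(poly_kernel(u, v, 2))      # integer degree: (essentially) real value
print(poly_kernel(u, v, 2.5))    # fractional degree, negative base: complex value
print(exp_kernel(u, v))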

2 Polynomial Kernels with Fractional Degree


Interestingly, a fixed chosen kernel K(u, v) induces not just one transformation but a whole manifold of such mappings $\Phi$. Even the dimensionality of the image space $\mathbb{R}^N$ is not determined. From (4), for d = 2 and n = 2 one gets:

$\Phi(u) = \left( 1, \sqrt{2}\, u_1, \sqrt{2}\, u_2, u_1^2, \sqrt{2}\, u_1 u_2, u_2^2 \right), \quad u = (u_1, u_2),$

as well as an infinite number of others.


Therefore a question arises: choosing a kernel K(u, v) - which is the space of smallest dimension for an image of $\Phi$? The answer for $d \in \mathbb{N}$ is $\binom{n+d}{d}$ (or equivalently $\binom{n+d}{n}$). While selecting an appropriate kernel K via the exponent d, there are huge discontinuities in the dimensionalities of the corresponding image spaces. The approximation and generalization capacity may be controlled by bounding the norm of the separating hyperplane, but another tuning parameter will still be there: the dimensionality (cf. Table 1).
Using a fractional exponent in the kernel (4) we encounter an interesting property: the dot product (u, v) may be less than -1, so that we have to raise a negative base to a fractional power.¹

I" \ all 11 21 31 al 51 6 7
2 2 6 10 15 21 28 36
16 16 1531 9 6 9 4 . 8 x l 0 s 2.0×104 7.5×104 2.5×105
256 256 3.3 × 1042.9 × 106 1.9 x 10s 9.7 × 109 4,2 x 1011 1.6 × 10 la

Table 1. Dimension of image space for polynomial kernel with exponent d and n origi-
nal features. The dimension of the image space (where the linear separation takes place)
grows quite rapidly - an explicit computation in this space would be impossible. But as
mentioned before, this is fortunately not required. Rather the value itself should guide
the user to a conjecture about the separating abilities of the associated hyperplane.
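The entries of Table 1 follow directly from the binomial coefficient given above; a short illustrative check in Python:

from math import comb

# Dimension of the smallest image space for integer degree d and n features.
for n in (2, 16, 256):
    print(n, [comb(n + d, d) for d in range(1, 8)])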

Hence the Hesse matrix A will no longer be real valued and therefore symmetric ($A^T = A$), but will in fact contain complex entries. Nevertheless, A has the property of Hermiticity ($\bar{A}^T = A$). This allows for a new formulation of (3). Because

$\Lambda^T A \Lambda = \Lambda^T A^T \Lambda = \Lambda^T \tfrac{1}{2}(A + A^T) \Lambda = \Lambda^T \tfrac{1}{2}(A + \bar{A}) \Lambda = \Lambda^T \mathrm{Re}(A) \Lambda,$

we equivalently solve

$\Lambda^T \mathbf{1} - \frac{1}{2} \Lambda^T \mathrm{Re}(A) \Lambda \to \max$
$0 \le \Lambda \le C \mathbf{1}$   (5)
$\Lambda^T Y = 0$

instead, and get rid of the complex entries. (Re(A) denotes the real part.)
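As a rough illustration of (5) (not the authors' implementation), the sketch below sets up Re(A) for an arbitrary, possibly complex-valued kernel and hands the problem to a general-purpose solver. The function names, the choice of SLSQP and the toy data are assumptions; in practice a dedicated QP code would be used instead:

import numpy as np
from scipy.optimize import minimize

# Solve (5): maximize Lambda^T 1 - 1/2 Lambda^T Re(A) Lambda
# subject to 0 <= Lambda <= C 1 and Lambda^T Y = 0.
def train_svm_dual(xs, ys, kernel, C=1.0):
    l = len(ys)
    K = np.array([[kernel(xs[i], xs[j]) for j in range(l)] for i in range(l)])
    ReA = np.real(np.outer(ys, ys) * K)              # Re(A) of (5)

    def neg_objective(a):                            # minimize the negative of (5)
        return -(a.sum() - 0.5 * a @ ReA @ a)

    res = minimize(neg_objective, np.zeros(l), method="SLSQP",
                   bounds=[(0.0, C)] * l,
                   constraints=[{"type": "eq", "fun": lambda a: a @ ys}])
    return res.x                                     # the coefficients alpha_i

xs = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
ys = np.array([+1.0, +1.0, -1.0, -1.0])
alphas = train_svm_dual(xs, ys, lambda u, v: complex(np.dot(u, v) + 1.0) ** 1.5)
print(np.round(alphas, 3))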
Expanding the kernel for arbitrary exponents d we get, according to Taylor:

$((u, v) + 1)^d = 1 + d\,(u, v) + \frac{d(d-1)}{2!} (u, v)^2 + \frac{d(d-1)(d-2)}{3!} (u, v)^3 + \frac{d(d-1)(d-2)(d-3)}{4!} (u, v)^4 + \ldots$

Non-integer exponents do not terminate the series like the integer ones do, but the influence of the high-order terms decreases nevertheless. In contrast to kernels with an integer exponent, there are no mappings $\Phi$ corresponding to such a fractional exponent kernel which have an image space of finite dimension.
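A small numerical check of this behaviour (illustrative only; note that the binomial series converges only for |(u, v)| < 1, which is assumed here):

# Compare the truncated Taylor series of (1 + t)^d with the exact value,
# where t plays the role of the dot product (u, v) and d is fractional.
def truncated(t, d, terms):
    value, coeff = 0.0, 1.0                  # coeff = d(d-1)...(d-k+1) / k!
    for k in range(terms):
        value += coeff * t ** k
        coeff *= (d - k) / (k + 1)
    return value

t, d = 0.4, 2.5
exact = (1.0 + t) ** d
for terms in (2, 4, 8, 16):
    print(terms, round(truncated(t, d, terms), 6), round(exact, 6))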
Fractional degrees allow a more continuous range of concepts. The resulting separating hyperplanes smoothly change their shapes with the exponent d. This is of importance especially for domains dealing with feature spaces which already cover tens, hundreds or more dimensions (e. g. recognition of graphical images), where a lower degree of a polynomial kernel is preferred. A simple artificial problem in a two dimensional feature space is presented in Figure 1 [Fri93].

¹ One could imagine this in the original space: the representing vectors u and v of the two participating examples form a sufficiently obtuse angle.

Fig. 1. Continuous variation of exponent d. 226 examples, class distribution 93/133, 90 % used to generate the separation. Two properties of the problem are significant: the low dimensionality of the original feature space and the difficult, crossed arrangement of examples in the lower right area. As expected, a somewhat higher exponent of the polynomial kernel is necessary for the approximation of the concept.

3 The "1/2 Trick"


Realizing the SVM as a whole, the solution of the quadratic optimization problem (quadratic programming, QP) - actually a series of such problems, with different parameters - constitutes the real amount of work. Generally the QP task is for the most part determined by the calculation of function values and gradients (or estimates thereof). It causes more difficulties here because of the (potentially) large Hesse matrix and its non-sparsity.
We tackle this by choosing a kernel of the type $((u, v) + 1)^d$ with $d = m + \frac{1}{2}$ and $m \in \mathbb{N}$. The corresponding entry in the resulting Hesse matrix (Re(A) in (5)) will vanish for negative $((u, v) + 1)$.
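The effect can be checked directly (illustrative sketch assuming principal values of the complex power): for $d = m + \frac{1}{2}$ a negative base raised to d is purely imaginary, so its real part is (numerically) zero.

d = 2.5                                  # m = 2, i.e. d = m + 1/2
for base in (-3.0, -0.4, 0.7, 2.0):      # base plays the role of (u, v) + 1
    val = complex(base) ** d             # principal value; purely imaginary for base < 0
    print(base, val, "Re =", round(val.real, 12))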
The SVM algorithm selects a separating hyperplane according to a criterion of sufficient values on the training examples as well as the minimization of the norm of the hyperplane. Unfortunately, $\Phi$ is nonlinear - the resulting shape of the function, and thus the border between the predicted areas of both classes, varies with uniform translations of the examples in the feature space. For instance, the resulting separation lines for differently centered sets of the well known XOR problem are depicted in Figure 2. A second degree kernel is used.
Despite the non-invariance against uniform translation of the examples in the feature space, one can center the set at the origin of the co-ordinate system to obtain a sufficiently obtuse angle between a large number of pairs of examples. This results in a sparser Hesse matrix for the QP task. Up to 50 % of the entries may be zeroed by means of this smart approach.
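A quick experiment on random data (an illustration only; the data distribution, the degree and the zero tolerance are assumptions) shows how centering increases the number of vanishing entries in Re(A) when $d = m + \frac{1}{2}$ is used. The label factors $y_i y_j$ are omitted since they only flip signs:

import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(loc=5.0, scale=1.0, size=(200, 2))   # data far away from the origin
d = 2.5

def re_kernel_matrix(points, d):
    base = (points @ points.T + 1.0).astype(complex)
    return np.real(base ** d)                        # real part of the kernel values

for name, pts in (("raw data", xs), ("centered", xs - xs.mean(axis=0))):
    A = re_kernel_matrix(pts, d)
    print(name, "fraction of (numerically) zero entries:",
          round(float(np.mean(np.isclose(A, 0.0))), 3))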

Fig. 2. Separation lines for differently centered versions of the XOR problem (second degree kernel). The examples $(x - \delta, y - \delta)$ and $(x + \delta, y + \delta)$ are members of one class, while the two other examples $(x - \delta, y + \delta)$ and $(x + \delta, y - \delta)$ belong to a second class. The four points are centered on (x, y).

4 Summary

The Support Vector algorithm shows some promising properties but needs some refinement, especially on the level of practical realization, to soften the enormous effort of finding the "simplest" explanation for a learning problem. Polynomial kernels with fractional degrees provide a broader range of concepts as well as a way to reduce the numerical effort to be spent in the QP.
The algorithm works well with a feature space of "similar" features. It is often preferable to apply a componentwise transformation that normalizes the data before the number crunching task of the SVM itself. For specific domains this could be done in the kernel function.
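A componentwise normalization of the kind mentioned above could, for example, look as follows (a sketch under the assumption of zero-mean, unit-variance scaling; other normalizations are equally possible):

import numpy as np

# Rescale every feature to zero mean and unit variance before training.
def normalize(xs):
    mean, std = xs.mean(axis=0), xs.std(axis=0)
    std[std == 0.0] = 1.0                 # leave constant features untouched
    return (xs - mean) / std

xs = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])
print(normalize(xs))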

References

[CV95] C. Cortes and V. N. Vapnik. Support-vector networks. Machine Learning, 20:273-297, 1995.
[Fri93] B. Fritzke. Growing cell structures - a self-organizing network for unsupervised and supervised learning. Technical Report 93-026, International Computer Science Institute, Berkeley, California, 1993.
[IRZ98] A. Ittner, R. Rossius, and G. Zenker. Support Vector Learning. Technical Report CSR-98, Chemnitz University of Technology, Chemnitz, Germany, 1998.
[Vap95] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
