SNNS v4.2 Manual
UNIVERSITY OF STUTTGART
WILHELM-SCHICKARD-INSTITUTE FOR COMPUTER SCIENCE
Department of Computer Architecture
SNNS
Stuttgart Neural Network Simulator
Andreas Zell, Günter Mamier, Michael Vogt,
Jens Wieland, Jürgen Gatter
external contributions by
Martin Reczko, Martin Riedmiller
Contents

1 Introduction to SNNS  1
2 Licensing, Installation and Acknowledgments  4
  2.1 SNNS License  5
  2.2 How to obtain SNNS  6
  2.3 Installation  7
  2.4 Contact Points  11
  2.5 Acknowledgments  12
  2.6 New Features of Release 4.2  15
3 Neural Network Terminology  18
  3.1 Building Blocks of Neural Nets  18
    3.1.1 Units  19
    3.1.2 Connections (Links)  23
    3.1.3 Sites  24
  3.2 Update Modes  24
  3.3 Learning in Neural Nets  25
  3.4 Generalization of Neural Networks  27
  3.5 An Example of a simple Network  28
4 Using the Graphical User Interface  29
  4.1 Basic SNNS usage  29
    4.1.1 Startup  29
    4.1.2 Reading and Writing Files  30
    4.1.3 Creating New Networks  31
    4.1.4 Training Networks  34
      4.1.4.1 Initialization  34
      4.1.4.2 Selecting a learning function  34
    4.1.5 Saving Results for Testing  36
    4.1.6 Further Explorations  36
    4.1.7 SNNS File Formats  36
Chapter 1
Introduction to SNNS
SNNS (Stuttgart Neural Network Simulator) is a simulator for neural networks developed at the Institute for Parallel and Distributed High Performance Systems (Institut für Parallele und Verteilte Höchstleistungsrechner, IPVR) at the University of Stuttgart since 1989. The goal of the project is to create an efficient and flexible simulation environment for research on and application of neural nets.
The SNNS simulator consists of four main components that are depicted in figure 1.1: simulator kernel, graphical user interface, batch execution interface batchman, and network compiler snns2c. There was also a fifth part, Nessus, that was used to construct networks for SNNS. Nessus, however, has become obsolete since the introduction of powerful interactive network creation tools within the graphical user interface and is no longer supported. The simulator kernel operates on the internal network data structures of the neural nets and performs all operations on them. The graphical user interface XGUI (X Graphical User Interface), built on top of the kernel, gives a graphical representation of the neural networks and controls the kernel during the simulation run. In addition, the user interface can be used to directly create, manipulate and visualize neural nets in various ways. Complex networks can be created quickly and easily. Nevertheless, XGUI should also be well suited for inexperienced users who want to learn about connectionist models with the help of the simulator. An online help system, partly context-sensitive, is integrated, which can offer assistance with problems.
An important design concept was to enable the user to select only those aspects of the
visual representation of the net in which he is interested. This includes depicting several
aspects and parts of the network with multiple windows as well as suppressing unwanted
information.
SNNS is implemented completely in ANSI-C. The simulator kernel has already been tested
on numerous machines and operating systems (see also table 1.1). XGUI is based upon
X11 Release 5 from MIT and the Athena Toolkit, and was tested under various window
managers, like twm, tvtwm, olwm, ctwm, fvwm. It also works under X11R6.
Figure 1.1: SNNS components: simulator kernel, graphical user interface xgui, batchman,
and network compiler snns2c
Chapter 2
Licensing, Installation and Acknowledgments

SNNS is © (Copyright) 1990-96 SNNS Group, Institute for Parallel and Distributed High-Performance Systems (IPVR), University of Stuttgart, Breitwiesenstrasse 20-22, 70565 Stuttgart, Germany, and © (Copyright) 1996-98 SNNS Group, Wilhelm-Schickard-Institute for Computer Science, University of Tübingen, Köstlinstr. 6, 72074 Tübingen, Germany.
SNNS is distributed by the University of Tübingen as 'Free Software' in a licensing agreement similar in some aspects to the GNU General Public License. There are a number of important differences, however, regarding modifications and distribution of SNNS to third parties. Note also that SNNS is not part of the GNU software nor is any of its authors connected with the Free Software Foundation. We only share some common beliefs about software distribution. Note further that SNNS is NOT PUBLIC DOMAIN.
The SNNS License is designed to make sure that you have the freedom to give away
verbatim copies of SNNS, that you receive source code or can get it if you want it and
that you can change the software for your personal use; and that you know you can do
these things.
We protect your and our rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy and distribute the unmodified software or modify it for your own purpose.

In contrast to the GNU license we do not allow modified copies of our software to be distributed. You may, however, distribute your modifications as separate files (e.g. patch files) along with our unmodified SNNS software. We encourage users to send changes and improvements which would benefit many other users to us so that all users may receive these improvements in a later version. The restriction not to distribute modified copies is also useful to prevent bug reports from someone else's modifications.

Also, for our protection, we want to make certain that everyone understands that there is NO WARRANTY OF ANY KIND for the SNNS software.
2.1 SNNS License
1. This License Agreement applies to the SNNS program and all accompanying programs and files that are distributed with a notice placed by the copyright holder saying it may be distributed under the terms of the SNNS License. "SNNS", below, refers to any such program or work, and a "work based on SNNS" means either SNNS or any work containing SNNS or a portion of it, either verbatim or with modifications. Each licensee is addressed as "you".
2. You may copy and distribute verbatim copies of SNNS's source code as you receive
it, in any medium, provided that you conspicuously and appropriately publish on
each copy an appropriate copyright notice and disclaimer of warranty; keep intact
all the notices that refer to this License and to the absence of any warranty; and
give any other recipients of SNNS a copy of this license along with SNNS.
3. You may modify your copy or copies of SNNS or any portion of it only for your own use. You may not distribute modified copies of SNNS. You may, however, distribute your modifications as separate files (e.g. patch files) along with the unmodified SNNS software. We also encourage users to send changes and improvements which would benefit many other users to us so that all users may receive these improvements in a later version. The restriction not to distribute modified copies is also useful to prevent bug reports from someone else's modifications.

4. If you distribute copies of SNNS you may not charge anything except the cost for the media and a fair estimate of the costs of computer time or network time directly attributable to the copying.
5. You may not copy, modify, sub-license, distribute or transfer SNNS except as ex-
pressly provided under this License. Any attempt otherwise to copy, modify, sub-
license, distribute or transfer SNNS is void, and will automatically terminate your
rights to use SNNS under this License. However, parties who have received copies,
or rights to use copies, from you under this License will not have their licenses
terminated so long as such parties remain in full compliance.
6. By copying, distributing or modifying SNNS (or any work based on SNNS) you
indicate your acceptance of this license to do so, and all its terms and conditions.
7. Each time you redistribute SNNS (or any work based on SNNS), the recipient auto-
matically receives a license from the original licensor to copy, distribute or modify
SNNS subject to these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
8. Incorporation of SNNS or parts of it in commercial programs requires a special
agreement between the copyright holder and the Licensee in writing and usually
involves the payment of license fees. If you want to incorporate SNNS or parts of it
in commercial programs write to the author about further details.
9. Because SNNS is licensed free of charge, there is no warranty for SNNS, to the
extent permitted by applicable law. The copyright holders and/or other parties
provide SNNS "as is" without warranty of any kind, either expressed or implied,
including, but not limited to, the implied warranties of merchantability and fitness
for a particular purpose. The entire risk as to the quality and performance of SNNS
is with you. Should the program prove defective, you assume the cost of all necessary
servicing, repair or correction.
10. In no event will any copyright holder, or any other party who may redistribute SNNS
as permitted above, be liable to you for damages, including any general, special,
incidental or consequential damages arising out of the use or inability to use SNNS
(including but not limited to loss of data or data being rendered inaccurate or losses
sustained by you or third parties or a failure of SNNS to operate with any other
programs), even if such holder or other party has been advised of the possibility of
such damages.
2.3 Installation

configure --enable-global
    -> will install to /usr/local/bin
If you are totally unhappy with your SNNS installation, you can run the command
make uninstall
If you want to compile and install, clean, or uninstall only parts of SNNS, you may also
call one or more of the following commands:
make compile-kernel
make compile-tools (implies making of kernel libraries)
make compile-xgui (implies making of kernel libraries)
make clean-kernel
make clean-tools
make clean-xgui
make uninstall-kernel
make uninstall-tools
make uninstall-xgui
If you are a developer and would like to modify SNNS or parts of it for your own purpose, there are even more make targets available for the Makefiles in each of the source directories. See the source of those Makefiles for details. Developers experiencing difficulties may also find the target

make bugreport

useful. Please send those reports to the contact address given below.
Note that SNNS is ready to work together with the genetic algorithm tool ENZO. A default installation will, however, not support this. If you plan to use genetic algorithms, you must specify --enable-enzo for the configure call and then later on compile ENZO in its respective directory. See the ENZO Readme file and manual for details.
Running

make install

in the <SNNSDIR>/kernel/sources directory will recreate the parser and reinstall the kernel libraries. If you have completely messed up your pattern parser, please use the original kr_pat_parse.c/y.tab.h combination from the SNNS distribution. Don't forget to "touch" these files before running make to ensure that they remain unchanged.

To rebuild the parser you should use bison version 1.22 or later. If your version of bison is older, you may have to change the definition of BISONFLAGS in Makefile.def. Also look for any warning messages while running "configure". Note that the common parser generator yacc will not work!
The equivalent bison discussion holds true for the parser which is used by the SNNS tool batchman in the tools directory. Here, the original grammar file is called gram1.y, while the bison-created files are named gram1.tab.c and gram1.tab.h.

The parsers in SNNS receive their input from scanners which were built by the program flex. A pre-generated version of every necessary scanner (kr_pat_scan.c in the kernel/sources directory, lex.yyy.c and lex.yyz.c in the tools/sources directory) is included in the distribution. These files are newer than the corresponding input files (kr_pat_scan.l, scan1.l, scan2.l) when the SNNS distribution is unpacked. Therefore flex is not called (and does not need to be) by default. Only if you want to change a scanner, or if you have trouble with compiling and linking, should you enter the sources directories and rebuild the scanners. To do this, you have either to touch the *.l files or to delete the files kr_pat_scan.c, lex.yyy.c, and lex.yyz.c. Running
make install

in the sources directories will then recreate and reinstall all necessary parts. If you have completely messed up your pattern scanners, please use the original files from the SNNS distribution. Don't forget to "touch" these files before running make to ensure that they remain unchanged.
Note that to rebuild the scanners you must use flex. The common scanner generator lex will not work!
Running SNNS
After installation, the executable for the graphical user interface can be found as the program xgui in the <SNNSDIR>/xgui/sources directory. We usually build a symbolic link named snns to point to the executable xgui program if we often work on the same machine architecture, e.g.:

ln -s xgui/bin/<architecture>/xgui snns

This link should be placed in the user's home directory (with the proper path prefix to SNNS) or in a directory of binaries in the local user's search path.
The simulator is then called simply with
snns
For further details about calling the various simulator tools see chapter 13.
2.4 Contact Points

If you would like to contact other SNNS users to exchange ideas, ask for help, or distribute advice, then post to the SNNS mailing list. Note that you must be subscribed to it before being able to post.
To subscribe, send a mail to
SNNS-Mail-Request@informatik.uni-tuebingen.de
with the one-line message (in the mail body, not in the subject)
subscribe
You will then receive a welcome message giving you all the details about how to post.
2.5 Acknowledgments
SNNS is a joint effort of a number of people: computer science students, research assistants, as well as faculty members at the Institute for Parallel and Distributed High Performance Systems (IPVR) at the University of Stuttgart, the Wilhelm-Schickard-Institute of Computer Science at the University of Tübingen, and the European particle research lab CERN in Geneva.

The project to develop an efficient and portable neural network simulator, which later became SNNS, has been led since 1989 by Prof. Dr. Andreas Zell, who designed the predecessor to the SNNS simulator and the SNNS simulator itself and acted as advisor for more than two dozen independent research and Master's thesis projects that made up the SNNS simulator and some of its applications. Over time the SNNS source grew to a total size of now 5 MB in 160,000+ lines of code. Research began under the supervision of Prof. Dr. Andreas Reuter and Prof. Dr. Paul Levi. We are all grateful for their support and for providing us with the necessary computer and network equipment. We also would like to thank Prof. Sau Lan Wu, head of the University of Wisconsin research group on high energy physics at CERN in Geneva, Switzerland, for her generous support of our work towards new SNNS releases.
The following persons were directly involved in the SNNS project. They are listed in the order in which they joined the SNNS team.

Andreas Zell: Design of the SNNS simulator, SNNS project team leader [ZMS90], [ZMSK91b], [ZMSK91c], [ZMSK91a].
Niels Mache: SNNS simulator kernel (really the heart of SNNS) [Mac90], parallel SNNS kernel on MasPar MP-1216.
Tilman Sommer: Original version of the graphical user interface XGUI with integrated network editor [Som89], PostScript printing.
Ralf Hübner: SNNS simulator 3D graphical user interface [Hub92], user interface development (version 2.0 to 3.0).
Thomas Korb: SNNS network compiler and network description language Nessus [Kor89].
Michael Vogt: Radial Basis Functions [Vog92]. Together with Günter Mamier, implementation of Time Delay Networks. Definition of the new pattern format and class scheme.
Günter Mamier: SNNS visualization and analyzing tools [Mam92]. Implementation of the batch execution capability. Together with Michael Vogt, implementation of the new pattern handling. Compilation and continuous update of the user manual. Bugfixes and installation of external contributions. Implementation of the pattern remapping mechanism.
Michael Schmalzl: SNNS network creation tool BigNet, implementation of Cascade Correlation, and printed character recognition with SNNS [Sch91a].
Kai-Uwe Herrmann: ART models ART1, ART2, ARTMAP and modification of the BigNet tool [Her92].
Artemis Hatzigeorgiou: Video documentation about the SNNS project, learning procedure Backpercolation 1.¹
Dietmar Posselt: ANSI-C translation of SNNS.
Sven Döring: ANSI-C translation of SNNS and source code maintenance. Implementation of the distributed kernel for workstation clusters.
Tobias Soyez: Jordan and Elman networks, implementation of the network analyzer [Soy93].
Tobias Schreiner: Network pruning algorithms [Sch94].
Bernward Kett: Redesign of the C-code generator snns2c.
Gianfranco Clemente: Help with the user manual.
Henri Bauknecht: Manager of the SNNS mailing list.
Jens Wieland: Design and implementation of batchman.
Jürgen Gatter: Implementation of TACOMA and some modifications of Cascade Correlation [Gat96].
We are proud of the fact that SNNS is experiencing growing support from people outside our development team. There are many people who helped us by pointing out bugs or offering bug fixes, both to us and other users. Unfortunately they are too numerous to list here, so we restrict ourselves to those who have made a major contribution to the source code.
¹ Backpercolation 1 was developed by JURIK RESEARCH & CONSULTING, PO 2379, Aptos, CA 95001 USA. Any and all SALES of products (commercial, industrial, or otherwise) that utilize the Backpercolation 1 process or its derivatives require a license from JURIK RESEARCH & CONSULTING. Write for details.
2.6 New Features of Release 4.2

10. the options of adding noise to the weights with the JogWeights function were improved in multiple ways.
11. improved plotting in the graph panel as well as printing options.
12. when the standard colormap is full, SNNS will now start with a private map instead of aborting.
13. the analyze tool now features a confusion matrix.
14. the pruning panel is now more "SNNS-like". You do not need to close the panel anymore before pruning a network.
15. Changes in batchman:
(a) batchman can now handle DLVQ training.
(b) the new batchman command "setActFunc" allows the changing of unit activation functions from within the training script. Thanks to Thomas Rausch, University of Dresden, Germany.
(c) batchman output now comes with a "#" prefix. This enables direct processing by a lot of Unix tools like gnuplot.
(d) batchman now automatically converts function parameters to the correct type instead of aborting.
(e) jogWeights can now also be called from batchman.
(f) batchman catches some non-fatal signals (SIGINT, SIGTERM, ...) and sets the internal variable SIGNAL so that the script can react to them.
(g) batchman features a ResetNet function (e.g. for Jordan networks).
16. new tool "linknets" introduced to combine existing networks.
17. new tools "td_bignet" and "ff_bignet" introduced for script-based generation of network files; the old tool bignet was removed.
18. displays will be refreshed more often when using the graphical editor.
19. weight and projection display with changed color scale. They now match the 2D-display scale.
20. pat_sel can now handle pattern files with multi-line comments.
21. manpages are now available for most of the SNNS programs.
22. the number of things stored in an xgui configuration file was greatly enhanced.
23. Extensive debugging:
(a) batchman now computes the MSE correctly from the number of (sub-)patterns.
(b) RBFs now receive the correct number of parameters.
(c) spurious segmentation faults in the graphical editor were tracked down and eliminated.
(d) a segmentation fault when training on huge pattern files was cleared.
(e) various seg-faults under single operating systems were tracked down and cleared.
(f) netperf can now test networks that need multiple training parameters.
(g) segmentation faults when displaying 3D networks were cleared.
(h) correct default values for initialization functions in batchman.
(i) the call "TestNet()" prohibited further training in batchman. Now everything works as expected.
(j) a segmentation fault in batchman when doing multiple string concats was cleared and a memory leak in string operations was closed. Thanks to Walter Prins, University of Stellenbosch, South Africa.
(k) the output of the validation error on the shell window was giving wrong values.
(l) the algorithm SCG now respects special units and handles them correctly.
(m) the description of the learning function parameters in section 4.4 is finally ordered alphabetically.
Chapter 3
Neural Network Terminology

[Figure: a simple example network with input units and a hidden unit; the weight and bias values shown in the figure are 6.97, -5.24 and 11.71.]
3.1.1 Units
Depending on their function in the net, one can distinguish three types of units: the units whose activations are the problem input for the net are called input units; the units whose activations represent the output of the net are called output units; the remaining units are called hidden units, because they are not visible from the outside.

¹ In the following the more common name "units" is used instead of "cells".
² The term transfer function often denotes the combination of activation and output function. To make matters worse, sometimes the term activation function is also used to comprise activation and output function.
[Figure: data flow inside a unit, shown for a unit without and with sites. In both cases the activation function computes the activation and the output function computes the output; in a unit with sites, the incoming links are first combined by a site function into a site value.]
activation function or actFunc: The activation function computes the new activation of a unit from its net input, its previous activation and its threshold:

    a_j(t+1) = f_act(net_j(t), a_j(t), θ_j)

The SNNS default activation function Act_logistic, for example, computes the network input simply by summing over all weighted activations and then squashing the result with the logistic function f_act(x) = 1 / (1 + e^(-x)). The new activation at time (t+1) lies in the range [0, 1].⁴ The variable θ_j is the threshold of unit j.

The net input net_j(t) is computed as

    net_j(t) = Σ_i w_ij · o_i(t)    if unit j has no sites
    net_j(t) = Σ_k s_jk(t)          if unit j has sites, with site values s_jk(t)
⁴ Mathematically correct would be ]0, 1[, but the values 0 and 1 are reached due to arithmetic inaccuracy.
where:
    a_j(t)     activation of unit j in step t
    net_j(t)   net input of unit j in step t
    o_i(t)     output of unit i in step t
    s_jk(t)    site value of site k on unit j in step t
    j          index of some unit in the net
    i          index of a predecessor of unit j
    k          index of a site of unit j
    w_ij       weight of the link from unit i to unit j
    θ_j        threshold (bias) of unit j
Activation functions in SNNS are relatively simple C functions which are linked to the simulator kernel. The user may easily write his own activation functions in C and compile and link them to the simulator kernel. How this can be done is described later; a small illustrative sketch of the computation such a function performs is given at the end of this section.
output function or outFunc: The output function computes the output of every unit from the current activation of this unit. The output function is in most cases the identity function (SNNS: Out_identity). This is the default in SNNS. The output function makes it possible to process the activation before an output occurs.

    o_j(t) = f_out(a_j(t))

where:
    a_j(t)   activation of unit j in step t
    o_j(t)   output of unit j in step t
    j        index for all units of the net
Another predefined SNNS standard function, Out_Clip01, clips the output to the range [0, 1] and is defined as follows:

    o_j(t) = 0        if a_j(t) < 0
    o_j(t) = 1        if a_j(t) > 1
    o_j(t) = a_j(t)   otherwise
Output functions are even simpler C functions than activation functions and can be user-defined in a similar way; the sketch at the end of this section also includes an Out_Clip01-style clipping step.
f-type: The user can assign so-called f-types (functionality types, prototypes) to a unit. The unusual name is for historical reasons. One may think of an f-type as a pointer to some prototype unit where a number of parameters has already been defined:

- activation function and output function
- whether sites are present and, if so, which ones

These types can be defined independently and are used for grouping units into sets of units with the same functionality. All changes in the definition of the f-type consequently also affect all units of that type. Therefore a variety of changes becomes possible with minimum effort.
position: Every unit has a specific position (coordinates in space) assigned to it. These positions consist of 3 integer coordinates in a 3D grid. For editing and 2D visualization only the first two (x and y) coordinates are needed; for 3D visualization of the networks the z coordinate is necessary.

subnet no: Every unit is assigned to a subnet. With the use of this variable, structured nets can be displayed more clearly than would otherwise be possible in a 2D presentation.
layers: Units can be visualized in 2D in up to 8 layers (changing this to 16 layers can be done very easily in the source code of the interface). Layers can be displayed selectively. This technique is similar to a presentation with several transparencies, where each transparency contains one aspect or part of the picture, and some or all transparencies can be selected to be stacked on top of each other in an arbitrary order. Only those units which are in layers (transparencies) that are 'on' are displayed. This way portions of the network can be selected to be displayed alone. It is also possible to assign one unit to multiple layers. Thereby it is feasible to assign any combination of units to a layer that represents an aspect of the network.
frozen: This attribute flag specifies that activation and output are frozen. This means that these values don't change during the simulation.

All 'important' unit parameters like activation, initial activation, output etc. and all function results are computed as floats with nine decimal digits of accuracy.
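To make the formulas above concrete, here is a small, purely illustrative C fragment computing the net input, a logistic activation and a clipped output for a single unit without sites. The names, the array layout, and the way the threshold enters the logistic function are assumptions of this sketch; it is not the actual SNNS kernel interface, which is described in the kernel chapters.

#include <math.h>

/* Illustrative only: net input, logistic activation and clipped output
 * of one unit without sites, following the formulas of this section.
 * weights[i] and outputs[i] hold w_ij and o_i of the n predecessor
 * units; theta is the threshold (bias) of the unit.                   */
double unit_output(const double *weights, const double *outputs, int n,
                   double theta)
{
    double net = 0.0, act, out;
    int i;

    for (i = 0; i < n; i++)                 /* net_j = sum_i w_ij * o_i */
        net += weights[i] * outputs[i];

    /* logistic squashing; subtracting theta from the net input here is
     * an assumption of this sketch (Act_logistic-style)                */
    act = 1.0 / (1.0 + exp(-(net - theta)));

    /* Out_Clip01-style output function: clip the activation to [0, 1] */
    out = act < 0.0 ? 0.0 : (act > 1.0 ? 1.0 : act);
    return out;
}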
3.1.2 Connections (Links)
The direction of a connection shows the direction of the transfer of activation. The unit from which the connection starts is called the source unit, or source for short, while the other is called the target unit, or target. Connections where source and target are identical (recursive connections) are possible. Multiple connections between one unit and the same input port of another unit are redundant and therefore prohibited. This is checked by SNNS.

Each connection has a weight (or strength) assigned to it. The effect of the output of one unit on the successor unit is defined by this value: if it is negative, then the connection is inhibitory, i.e. decreasing the activity of the target unit; if it is positive, it has an excitatory, i.e. activity enhancing, effect.
The most frequently used network architecture is built hierarchically bottom-up. The input into a unit comes only from the units of preceding layers. Because of the unidirectional flow of information within the net they are also called feed-forward nets (as an example see the neural net classifier introduced in chapter 3.5). In many models full connectivity between all units of adjoining levels is assumed.
Weights are represented as floats with nine decimal digits of precision.
3.1.3 Sites
A unit with sites does not have a direct input any more. All incoming links lead to different sites, where the arriving weighted output signals of preceding units are processed with different user-definable site functions (see figure 3.2). The result of the site function is represented by the site value. The activation function then takes this value of each site as network input.

The SNNS simulator does not allow multiple connections from a unit to the same input port of a target unit. Connections to different sites of the same target unit are allowed. Similarly, multiple connections from one unit to different input sites of itself are allowed as well.
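As a sketch of this data flow, the fragment below computes the net input of a unit with sites: each site first combines its own incoming weighted outputs with a site function (here simply a sum, standing in for any user-definable site function), and the unit's net input is then the sum of the site values. This is an illustration only, with invented names; it is not the SNNS kernel's actual site handling.

/* Illustration of sites: every site combines its own inputs into a
 * site value s_jk; the net input of the unit is the sum of all s_jk. */
typedef struct {
    const double *weights;  /* w_ij of the links arriving at this site */
    const double *outputs;  /* o_i of the corresponding source units   */
    int           n_links;
} Site;

static double site_value(const Site *s)     /* stand-in site function */
{
    double v = 0.0;
    int i;
    for (i = 0; i < s->n_links; i++)
        v += s->weights[i] * s->outputs[i];
    return v;
}

double net_input_with_sites(const Site *sites, int n_sites)
{
    double net = 0.0;                       /* net_j = sum_k s_jk */
    int k;
    for (k = 0; k < n_sites; k++)
        net += site_value(&sites[k]);
    return net;
}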
3.3 Learning in Neural Nets

A commonly used general form of a learning rule changes the weight w_ij of the connection from unit i to unit j by

    Δw_ij = g(a_j(t), t_j) · h(o_i(t), w_ij)

where:
    g(...)   function depending on the activation of the unit and the teaching input
    h(...)   function depending on the output of the preceding element and the current weight of the link
Training a feed-forward neural network with supervised learning consists of the following procedure:

An input pattern is presented to the network. The input is then propagated forward in the net until activation reaches the output layer. This constitutes the so-called forward propagation phase.

The output of the output layer is then compared with the teaching input. The error, i.e. the difference (delta) δ_j between the output o_j and the teaching input t_j of a target output unit j, is then used together with the output o_i of the source unit i to compute the necessary change of the link weight w_ij. To compute the deltas of inner units for which no teaching input is available (units of hidden layers), the deltas of the following layer, which are already computed, are used in a formula given below. In this way the errors (deltas) are propagated backward, so this phase is called backward propagation.
In online learning, the weight changes Δw_ij are applied to the network after each training pattern, i.e. after each forward and backward pass. In offline learning or batch learning the weight changes are cumulated over all patterns in the training file and the sum of all changes is applied after one full cycle (epoch) through the training pattern file.

The most famous learning algorithm which works in the manner described is currently backpropagation. In the backpropagation learning algorithm online training is usually significantly faster than batch training, especially in the case of large training sets with many similar training examples.
The backpropagation weight update rule, also called generalized delta rule, reads as follows:

    Δw_ij = η · δ_j · o_i

    δ_j = f'_j(net_j) · (t_j - o_j)           if unit j is an output unit
    δ_j = f'_j(net_j) · Σ_k δ_k w_jk          if unit j is a hidden unit
where:
    η      learning factor (a constant)
    δ_j    error (difference between the real output and the teaching input) of unit j
    t_j    teaching input of unit j
    o_i    output of the preceding unit i
    i      index of a predecessor of the current unit j, with link w_ij from i to j
    j      index of the current unit
    k      index of a successor of the current unit j, with link w_jk from j to k
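The following fragment sketches one online update of the output-layer weights according to the delta rule above, assuming the logistic activation function, for which f'(net_j) can be written as o_j(1 - o_j). It is a self-contained illustration with invented names, not code from the SNNS kernel; the hidden-layer deltas would be obtained analogously by summing δ_k w_jk over the successor units, and for batch learning the changes would be accumulated instead of applied immediately.

/* One online backpropagation step for a single fully connected
 * source->output weight matrix, following the generalized delta rule.
 * Logistic activation assumed, so f'(net_j) = o_j * (1 - o_j).        */
void backprop_output_step(double *w,          /* weights, w[i*n_out+j]  */
                          const double *o_in, /* outputs o_i of sources */
                          const double *o_out,/* outputs o_j of targets */
                          const double *t,    /* teaching input t_j     */
                          double *delta,      /* resulting deltas       */
                          int n_in, int n_out, double eta)
{
    int i, j;
    for (j = 0; j < n_out; j++) {
        /* delta_j = f'(net_j) * (t_j - o_j) for an output unit */
        delta[j] = o_out[j] * (1.0 - o_out[j]) * (t[j] - o_out[j]);
        for (i = 0; i < n_in; i++)
            w[i * n_out + j] += eta * delta[j] * o_in[i];  /* dw = eta*delta_j*o_i */
    }
}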
There are several backpropagation algorithms supplied with SNNS: a "vanilla backpropagation" called Std_Backpropagation, one with momentum term and flat spot elimination called BackpropMomentum, and a batch version called BackpropBatch. They can be chosen from the control panel with the button OPTIONS and the menu selection select learning function.
In SNNS, one may either set the number of training cycles in advance or train the network until it has reached a predefined error on the training set.
Chapter 4
Using the Graphical User Interface

This chapter describes how to use XGUI, the X-Window based graphical user interface to SNNS, which is the usual way to interact with SNNS on Unix workstations. It explains how to call SNNS and details the multiple windows and their buttons and menus. Together with chapters 5 and 6 it is probably the most important chapter in this manual.

4.1 Basic SNNS usage
4.1.1 Startup
SNNS comes in two guises: it can be used via an X-Windows user interface, or in 'batch mode', that is, without user interaction. To run it with the X GUI, type snns. You obviously need an X terminal. The default setting for SNNS is to use colour screens; if you use a monochrome X terminal, start it up using snns -mono. You will lose no functionality - some things are actually clearer in black and white.
After starting the package a banner will appear which will vanish after you click the left
mouse button in the panel. You are then left with the SNNS manager panel.
[Figure: the SNNS manager panel and the file browser. The manager panel buttons provide file handling, training & testing control, information about single units or weights, network displays, the error graph diagram, network definition, Exit and Help. The file browser shows the current directory, a file name field and a scrollbar, with buttons to load and save.]
The browser lists the files currently available. Double-clicking on one of the filenames, say 'letters', will copy the network name into the file name window. To load the network simply click on LOAD. You can also enter the filename directly into the file name window (top left).
Use TYPE and POS to change the unit type and relative position. The relative position is not used for the first plane of units (there is nothing to position it relative to). The layers will, for instance, be positioned below the previous layers if the 'Rel. Position' has been changed to 'below' by clicking on the POS button.
Here is an example of how to create a simple pattern associator network with a 5x7 matrix of inputs, 10 hidden units and 26 outputs:

    Leave 'Type' as input
    set no 'x' direction to 5
    set no 'y' direction to 7
    and click on ENTER

If the input is acceptable it will be copied to the column on the left. The next step is to define the hidden layer, containing 10 units, positioned to the right of the inputs.
    Change 'Type' from input to hidden by clicking on TYPE once.
    set no 'x' direction to 1
    set no 'y' direction to 10
    change 'Rel.Position' to 'below' by clicking on POS
    and click on ENTER

You are now ready to define the output plane; here you want 26 output units to the right of the input. You may want to save space and arrange the 26 outputs as two columns of 13 units each.
    Change 'Type' from hidden to output by clicking on TYPE again.
    set no 'x' direction to 2
    set no 'y' direction to 13
    and click on ENTER

After defining the layer topology the connections have to be made. Simply click on FULL CONNECTION (bottom left of the lower panel). Then select CREATE NET and DONE. You may have to confirm the destruction of any network already present. Selection of DISPLAY from the SNNS manager panel should result in figure 4.4.
4.1.4.1 Initialization
Many networks have to be initialised before they can be used. To do this, click on INIT (top line of buttons in the control panel). You can change the range of random numbers used in the initialization by entering appropriate values into the fields to the right of "INIT" at the lower end of the control panel.
[Figure: the control panel. Annotations point to the field where the number of cycles is entered, the buttons to start/stop training and to initialize the network, the training control parameters, the button to present patterns in random order, and the button from which the learning function and its parameters are selected.]

[Figure: excerpt of a file of patterns: every new pattern starts with a '# <label>' line (one could also use '# input' and '# target' as labels), followed by the lines of data values.]
Figure 4.7: Manager panel, info panel, control panel and a display.
The complete help text from the file help.hdoc is available in the text section of a help window. Information about a word can be retrieved by marking that word in the text and then clicking LOOK or MORE. A list of keywords can be obtained by a click on TOPICS. This window also allows context-sensitive help when the editor is used with the keyboard.
QUIT is used to leave XGUI. XGUI can also be left by pressing ALT-q in any SNNS
window. Pressing ALT-Q will exit SNNS without asking further questions.
4.3.1 Manager Panel
Figure 4.8 shows the manager panel. From the manager panel all other elements that have a different, independent window assigned to them can be called. Because this window is of such central importance, it is recommended to keep it visible all the time.

Below the buttons to open the SNNS windows are two lines that display the current status of the simulator.
SNNS Status Message:
This line features messages about a current operation or its termination. It is also the place of the command sequence display of the graphical network editor. When a command is activated, a message about the execution of the command is displayed. For a listing of the command sequences see chapter 6.

Status line:
This line shows the current position of the mouse in a display, the number of selected units, and the state of the flags set by the editor.
X:0 Y:0 gives the current position of the mouse in the display in SNNS unit coordinates. The next icon shows a small selected unit. The corresponding value is the number of currently selected units. This is important, because there might be selected units not visible in the displays. The selection of units affects only editor operations (see chapter 6 and section 6.3).

The last icon shows a miniature flag. If safe appears next to the icon, the safety flag was set by the user (see chapter 6). In this case XGUI forces the user to confirm any delete actions.
of the corresponding buttons below. With the setting of picture 4.9 a network file would be selected. A file name beginning with a slash (/) is taken to be an absolute path.

Note: The extension .net for nets, .pat for patterns, .cfg for configurations, and .txt for texts is added automatically and must not be specified. After the name is specified the desired operation is selected by clicking either LOAD or SAVE. In the case of an error the confirmer appears with an appropriate message. These errors might be:

Load: The file does not exist or has the wrong type.
Save: A file with that name already exists.

Depending upon the error and the response to the confirmer, the action is aborted or executed anyway.

NOTE: The directories must be executable in order to be processed properly by the program!
Since the result file has no meaning for the loaded network, a load operation is not useful and therefore not supported.

Messages that document the simulation run can be stored in the log file. The protocol contains file operations, definitions of values set by clicking the SET button in the info panel or the SET FUNC button in the control panel, as well as a teaching protocol (cycles, parameters, errors). In addition, the user can output data about the network to the log file with the help of the INFO button in the control panel. If no log file is loaded, output takes place only on stdout. If no file name is specified when clicking LOAD, a possibly open log file is closed and further output is restricted to stdout.
beginning of every single epoch. To remind you that jogging weights is activated, the JOG button will be displayed inverted as long as this option is enabled.

It is also possible to jog only the weights of highly correlated non-special hidden units of a network by selecting the corresponding button in the panel. For a detailed description of this process please refer to the description of the function jogCorrWeights in chapter 12.
5. INIT : Initialises the network with values according to the function and parameters
given in the initialization line of the panel.
6. RESET : The counter is reset and the units are assigned their initial activation.
7. ERROR: By pressing the error button in the control panel, SNNS will print out several statistics. The formulas were contributed by Warren Sarle from the SAS Institute. Note that these criteria are for linear models; they can sometimes be applied directly to nonlinear models if the sample size is large. A recommended reference for linear model selection criteria is [JGHL80].
Notation:
    n    = number of observations (sample size)
    p    = number of parameters to be estimated (i.e. weights)
    SSE  = the sum of squared errors
    TSS  = the total sum of squares corrected for the mean of the dependent variable

Criteria for adequacy of the estimated model in the sample:

Pearson's R², the proportion of variance explained or accounted for by the model:
    R² := 1 - SSE/TSS

Criteria for adequacy of the true model in the population:

The mean square error [JGHL80] is defined as MSE := SSE/(n - p), the root mean square error as RMSE := sqrt(MSE).

R²adj, the R² [JGHL80] adjusted for degrees of freedom, is defined as:
    R²adj := 1 - ((n - 1)/(n - p)) · (1 - R²)

Criteria for adequacy of the estimated model in the population:

Amemiya's prediction criterion [JGHL80] is similar to R²adj:
    PC := 1 - ((n + p)/(n - p)) · (1 - R²)

The estimated mean square error of prediction (Jp), assuming that the values of the regressors are fixed and that the model is correct, is:
    Jp := (n + p) · MSE / n

The conservative mean square error of prediction [Weh94] is:
    CMSEP := SSE / (n - 2p)

The generalised cross validation (GCV) is given by Wahba [GHW79] as:
    GCV := SSE · n / (n - p)²

The estimated mean square error of prediction, assuming that both independent and dependent variables are multivariate normal, is defined as:
    GMSEP := MSE · (n + 1)(n - 2) / (n(n - p - 1))
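Read as code, all of these criteria can be computed directly from SSE and TSS; the following fragment is only an illustration (function and variable names invented), not part of SNNS.

#include <math.h>
#include <stdio.h>

/* Compute the model selection criteria listed above from precomputed
 * SSE and TSS; n = number of observations, p = number of parameters. */
void print_criteria(double sse, double tss, double n, double p)
{
    double r2    = 1.0 - sse / tss;                       /* Pearson's R^2 */
    double mse   = sse / (n - p);
    double rmse  = sqrt(mse);
    double r2adj = 1.0 - (n - 1.0) / (n - p) * (1.0 - r2);
    double pc    = 1.0 - (n + p) / (n - p) * (1.0 - r2);  /* Amemiya */
    double jp    = (n + p) * mse / n;
    double cmsep = sse / (n - 2.0 * p);
    double gcv   = sse * n / ((n - p) * (n - p));         /* Wahba */
    double gmsep = mse * (n + 1.0) * (n - 2.0) / (n * (n - p - 1.0));

    printf("R2=%g MSE=%g RMSE=%g R2adj=%g PC=%g\n", r2, mse, rmse, r2adj, pc);
    printf("Jp=%g CMSEP=%g GCV=%g GMSEP=%g\n", jp, cmsep, gcv, gmsep);
}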
This panel is also very important for editing, since some operations refer to the displayed TARGET unit or (SOURCE→TARGET) link. A default unit can also be created here, whose values (activation, bias, IO-type, subnet number, layer numbers, activation function, and output function) are copied into all selected units of the net.

The source unit of a link can also be specified in a 2D display by pressing the middle mouse button, the target unit by releasing it. To select a link between two units the user presses the middle mouse button on the source unit in a 2D display, moves the mouse to the target unit while holding down the mouse button, and releases it at the target unit. Now the selected units and their link are displayed in the info panel. If no link exists between two units selected in a 2D display, the TARGET is displayed with its first link, thereby changing SOURCE.
In table 4.2 the various fields are listed. The fields in the second line of the SOURCE or TARGET unit display the name of the activation function, the name of the output function, and the name of the f-type (if available). The fields in the line LINK have the following meaning: weight, site value, site function, name of the site. Most often only a link weight is available. In this case no information about sites is displayed.

Unit number, unit subnet number, site value, and site function cannot be modified. To change attributes of type text, the cursor has to be exactly in the corresponding field.
There are the following buttons for the units (from left to right):

1. Arrow button: The button below TARGET selects the first target unit (of the given source unit); the button below SOURCE selects the first source unit (of the given target unit).
2. Arrow button: The button below TARGET selects the next target unit (of the given source unit); the button below SOURCE selects the next source unit (of the given target unit).
3. FREEZE: The unit is frozen if this button is inverted. Changes become active only after SET is clicked.
4. DEF: The default unit is assigned the displayed values of TARGET and SOURCE (only
There exist the following buttons for links (from left to right):

1. Select the first link of the TARGET unit.
2. Select the next link of the TARGET unit.
3. OPTIONS: Calls the following menu:

    list current site of TARGET    list of all links of the current site
    list all sites of TARGET       list all sites of the TARGET
    list all links from SOURCE     list all links starting at SOURCE
    delete site                    delete the displayed site (note: the f-type gets lost!)
    add site                       add a new site to TARGET (note: the f-type gets lost!)

4. SET: Only after clicking this button is the link weight set.
4.3.5 2D Displays
A 2D display, or simply display, is always part of the user interface. It serves to display the network topology, the units' activations and the weights of the links. Each unit is located on a grid position, which simplifies the positioning of the units. The distance between two grid points (grid width) can be changed from the default 37 pixels to other values in the setup panel.

The current position, i.e. the grid position of the mouse, is also numerically displayed at the bottom of the manager panel. The x-axis is the horizontal line and valid coordinates lie in the range -32736 ... +32735 (short integer).

The current version displays units as boxes, where the size of the box is proportional to the value of the displayed attribute. Possible attributes are activation, initial activation, bias, and output. A black box represents a positive value, an empty box a negative value. The size of the unit varies between 16x16 and 0 pixels according to the value of scaleFactor. The parameter scaleFactor has a default value of 1.0, but may be set to values between 0.0 and 2.0 in the setup panel. Each unit can be displayed with two of several attributes, one above the unit and one below the unit. The attributes to be displayed can be selected in the setup panel.
Links are shown as solid lines, with optional numerical display of the weight in the center of the line and/or an arrow head pointing to the target unit. These features are optional, because they heavily affect the drawing speed of the display window.

A display can also be frozen with the button FREEZE (the button gets inverted). It is afterwards neither updated anymore², nor does it accept further editor commands.

An iconified display is not updated and therefore consumes (almost) no CPU time. If a window is closed, its dimensions and setup parameters are saved on a stack (LIFO). This means that a newly requested display gets the values of the window that was last closed. For better orientation, the window title contains the subnet number which was specified for this display in the setup panel.

² If the network has changed since the freeze, its contents will also have changed!
The numerical attribute selected with the button SHOW at the bottom of the unit (activation, initial activation, output, or bias) also determines the size of the unit in the graphical representation. It is usually not advisable to switch off top (number or name), because this information is needed for reference to the info panel. An unnamed unit is always displayed with its number.

2. Buttons to control the display of link information: The third line consists of three buttons to select the display of link data: ON, 2.35, and an arrow.
   ON determines whether to draw links at all (then ON is inverted),
   2.35 displays link weights at the center of the line representing the link,
   the arrow button displays arrow heads of the links pointing from source to target unit.
3. LAYERS invokes another popup window to select the display of up to eight different layers in the display window. Layers are stacked like transparent sheets of paper and allow for a selective display of units and links. These layers need NOT correspond to layers of units of the network topology (as in multilayer feed-forward networks), but they may do so. Layers are very useful to display only a selected subset of the network. The display of each layer can be switched on or off independently. A unit may belong to several layers at the same time. The assignment of units to layers can be done with the menu assign layers invoked with the button OPTIONS in the main info panel.

4. COLOR sets the 2D display colors. On monochrome terminals, black on white or white on black representation of the network can be selected from a popup menu.
On color displays, a color editing window is opened. This window consists of three
parts: The palette of available colors at the top, the buttons to select the item to
be colored in the lower left region, and the color preview window in the lower right
region.
The maximum number of curves that can be displayed simultaneously is 25. If an attempt is made to draw a 26th curve, the confirmer appears with an error message.

When the curve reaches the right end of the window, an automatic rescale of the x-axis is performed. This way, the whole curve always remains visible.
In the top region of the graph window, several buttons for handling the display are located:

GRID: toggles the printing of a grid in the display. This helps in comparing different curves.

PRINT: Prints the current graph window contents to a PostScript file. If the file already exists, a confirmer window pops up to let the user decide whether to overwrite it or not. The name of the output file is to be specified in the dialog box to the right of the button. If no path is specified as prefix, it will be written into the directory xgui was started from.

CLEAR: Clears the screen of the graph window and sets the cycle counter to zero.

DONE: Closes the graph window and resets the cycle counter.
For both the x- and y-axis the following two buttons are available:

- Reduce the scale in one direction.
- Enlarge the scale in one direction.

SSE: Opens a popup menu to select the value to be plotted. Choices are SSE, MSE, and SSE/out, the SSE divided by the number of output units.

While the simulator is working all buttons are blocked.

The graph window can be resized with the mouse like every X window. Changing the size of the window does not change the size of the scale.
When validation is turned on in the control panel, two curves will be drawn simultaneously in the graph window, one for the training set and one for the validation set. On color terminals the validation error will be plotted as a solid red line, on B/W terminals as a dashed black line.
The projection analysis tool allows the user to display how the output of one unit (e.g. a hidden or an output unit) depends on two input units. It thus realizes a projection onto two input vector axes.

It can be called by clicking the PROJECTION button in the manager panel or by typing Alt-p in any SNNS window. The display of the projection panel is similar to the weights display, from which it is derived.
In the setup panel, two units must be specified whose inputs are varied over the given input value range to give the X resp. Y coordinate of the projection display. The third unit to be specified is the one whose output value determines the color of the points with the given X and Y coordinate values. The range for the color coding can be specified as the output range. For the most common logistic activation function this range is [0, 1].

The use of the other buttons, ZOOM IN, ZOOM OUT and DONE, is analogous to the weight display and should be obvious.

The projection tool is very instructive with the 2-spirals problem, the XOR problem or similar problems with two-dimensional input. Each hidden unit or output unit can be inspected and it can be determined to which part of the input space the neuron is sensitive. Comparing different networks trained for such a problem by visualizing to which part of the input space they are sensitive gives insights about the internal representation of the networks and sometimes also about characteristics of the training algorithms used for training. A display of the projection panel is given in figure 4.19.
4.3.9 Print Panel
The print panel handles the PostScript output. A 2D display can be associated with the printer. All setup options and values of this display will be printed. Color and encapsulated PostScript are also supported. The output device is either a printer or a file. If the output device is a printer, a '.ps' file is generated and spooled in the /tmp directory. It has a unique name starting with the prefix 'snns'. The directory must be writable. When xgui terminates normally, all SNNS spool files are deleted.
NETWORK: Opens the network setup panel. This panel allows the specification of several options to control the way the network is printed. The variables that can be set here include:

1. x-min, y-min, x-max and y-max describe the section to be printed.
2. Unit size: FIXED: All units have the same size. VALUE: The size of a unit depends on its value.
3. Shape: Sets the shape of the units.
4. Text: SOLID: The box around text overwrites the background color and the links. TRANSPARENT: No box around the text.
5. Border: A border is drawn around the network if set to 'ON'.
6. Color: If set, the value is printed color coded.
7. Fill Intens: The fill intensity for units on monochrome printers.
8. Display: Selects the display to be printed.
special demands, like storing information about unit types or patterns. The best approach would be to list all relevant keywords at the end of the file under the headline "* TOPICS", so that the user can select this directory with a click on TOPICS.

The first line reports whether all or only a single pattern is trained. The next lines give the number of specified cycles and the given learning parameters, followed by a brief setup description.
Then the 10-row table of the learning progress is given. If validation is turned on, this table is intermixed with the output of the validation. The first column specifies whether the displayed error is computed on the training or validation pattern set; "Test" is printed in the latter case. The second column gives the number of epochs still to be processed. The third column is the Sum Squared Error (SSE) of the learning function. It is computed with the following formula:

    SSE = Σ_{p ∈ patterns} Σ_{j ∈ output units} (t_pj - o_pj)²

where t_pj is the teaching output (desired output) of output neuron j on pattern p and o_pj is the actual output. The fourth column is the Mean Squared Error (MSE), which is the SSE divided by the number of patterns. The fifth value finally gives the SSE divided by the number of output units.

The second and third values are equal if there are as many patterns as there are output units (e.g. the letters network); the first and third values are identical if the network has only one output unit (e.g. the xor network).
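To make the relation between these three error values explicit, here is a small illustrative fragment (invented names, not SNNS code) that computes SSE, MSE and SSE/out from stored outputs and teaching values.

/* SSE over all patterns and output units, plus the two derived values
 * reported by SNNS: MSE = SSE / #patterns, SSE/out = SSE / #outputs. */
void error_values(const double *out, const double *teach,
                  int n_patterns, int n_outputs,
                  double *sse, double *mse, double *sse_per_out)
{
    int p, j;
    double d, s = 0.0;
    for (p = 0; p < n_patterns; p++)
        for (j = 0; j < n_outputs; j++) {
            d  = teach[p * n_outputs + j] - out[p * n_outputs + j];
            s += d * d;
        }
    *sse         = s;
    *mse         = s / n_patterns;
    *sse_per_out = s / n_outputs;
}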
If the training of the network is interrupted by pressing the STOP button in the control panel, the values for the last completed training cycle are reported.

The shell window also displays output when the INFO button in the control panel is pressed; such an output may look like the following:
#input units: 35
#output units: 26
#patterns : 63
#subpatterns : 63
#sites : 0
#links : 610
#STable entr.: 0
#FTable-Entr.: 0
sizes in bytes:
units : 208000
sites : 0
links : 160000
NTable : 8000
STable : 0
FTable : 0
4.4 Parameters of the Learning Functions

ART2:

1. ρ: vigilance parameter. Specifies the minimal length of the error vector r (units r_i).
2. a: Strength of the influence of the lower level in F1 by the middle level.
3. b: Strength of the influence of the middle level in F1 by the upper level.
4. c: Part of the length of vector p (units p_i) used to compute the error.
5. θ: Threshold for the output function f of units x_i and q_i.
ARTMAP
Backpercolation 1:

1. λ: global error magnification. This is the factor λ in the formula δ = λ(t - o), where δ is the internal activation error of a unit, t is the teaching input and o the output of a unit. Typical values of λ are 1. Bigger values (up to 10) may also be used here.
2. If the error value drops below this threshold value, the adaption according to the Backpercolation algorithm begins. This error is defined as:

    Err = (1 / (p · N)) · Σ Σ |t - o|
BackpropThroughTime (BPTT), BatchBackpropThroughTime (BBPTT):

... and S(t-1) < 0, and 0 else, where η_ij(t) is defined as follows: η_ij(t) = η_ij(t-1) · η+ resp. η_ij(t-1) · η-. Furthermore, the condition 0 < η- < 1 < η+ should not be violated.
Quickprop (in CC):

1. ε1: learning parameter, specifies the step width of the gradient descent when minimizing the net error. A typical value is 0.0001.
2. μ1: maximum growth parameter, realizes a kind of dynamic momentum term. A typical value is 2.0.
3. ν: weight decay term to shrink the weights. A typical value is <= 0.0001.
4. ε2: learning parameter, specifies the step width of the gradient ascent when maximizing the covariance. A typical value is 0.0007.
5. μ2: maximum growth parameter, realizes a kind of dynamic momentum term. A typical value is 2.0.
The formula used is:

    Δw_ij(t) = ε1 · S(t)                                 if Δw_ij(t-1) = 0
    Δw_ij(t) = (S(t) / (S(t-1) - S(t))) · Δw_ij(t-1)     if Δw_ij(t-1) ≠ 0 and S(t)/(S(t-1) - S(t)) < μ1
    Δw_ij(t) = μ1 · Δw_ij(t-1)                           else
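Read as code, the update above could look like the following sketch (purely illustrative, with invented names; S and S_prev denote the current and previous slope for the weight, dw_prev the previous weight change):

/* One Quickprop-style weight change following the piecewise formula
 * above.  S, S_prev: current/previous slope for this weight;
 * dw_prev: previous weight change; eps1, mu1 as in the parameter list.
 * A real implementation would also guard against S_prev == S.         */
double quickprop_delta(double S, double S_prev, double dw_prev,
                       double eps1, double mu1)
{
    double ratio;
    if (dw_prev == 0.0)
        return eps1 * S;            /* plain gradient step */
    ratio = S / (S_prev - S);
    if (ratio < mu1)
        return ratio * dw_prev;     /* quadratic estimate of the minimum */
    return mu1 * dw_prev;           /* growth limited by mu1 */
}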
Counterpropagation:

1. learning parameter: specifies the step width of the gradient descent. Values less than 1/n (1 / number of nodes) are recommended.
2. Wmax: maximum weight strength, specifies the maximum absolute value of a weight allowed in the network. A value of 1.0 is recommended, although this should be lowered if the network experiences explosive growth in the weights and activations. Larger networks will require lower values of Wmax.
3. count: the number of times the network is updated before calculating the error.

NOTE: With this learning rule the update function RM_Synchronous has to be used, which needs as update parameter the number of iterations!
Kohonen
1. h(0): Adaptation height. The initial adaptation height can vary between 0 and
1. It determines the overall adaptation strength.
2. r(0): Adaptation radius. The initial adaptation radius r(0) is the radius of the
neighborhood of the winning unit. All units within this radius are adapted.
Values should range between 1 and the size of the map.
3. mult H: Decrease factor. The adaptation height decreases monotonically after
the presentation of every learning pattern. This decrease is controlled by the
decrease factor mult H: h(t + 1) := h(t) · mult H
4. mult R: Decrease factor. The adaptation radius also decreases monotonically
after the presentation of every learning pattern. This second decrease is con-
trolled by the decrease factor mult R: r(t + 1) := r(t) · mult R
5. h: Horizontal size. Since the internal representation of a network does not allow
the 2-dimensional layout of the grid to be determined, the horizontal size in units
must be provided for the learning function. It is the same value as used for the
creation of the network.
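How the adaptation height and radius shrink during training can be sketched as follows (illustrative Python; the numeric values are arbitrary examples):

# Decay of the Kohonen adaptation height h and radius r, one step per
# presented learning pattern, as described above.
h, r = 0.9, 5.0                  # h(0) in (0, 1], r(0) up to the map size
mult_h, mult_r = 0.999, 0.998

for _ in range(1000):            # 1000 pattern presentations
    # ... adapt all units within radius r around the winner by strength h ...
    h *= mult_h                  # h(t+1) := h(t) * mult_H
    r *= mult_r                  # r(t+1) := r(t) * mult_R
print(round(h, 3), round(r, 3))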
Monte-Carlo:
1. Min: lower limit of weights and biases. Typical values are −10.0 … −1.0.
2. Max: upper limit of weights and biases. Typical values are 1.0 … 10.0.
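The idea behind this learning function can be sketched as a plain random search (illustrative Python only, not the SNNS implementation; net_error is an assumed placeholder that returns the error of the net for a given weight vector):

import random

def monte_carlo_search(n_weights, net_error, trials=1000, w_min=-1.0, w_max=1.0):
    """Draw random weight vectors from [w_min, w_max] and keep the best one."""
    best_w, best_err = None, float("inf")
    for _ in range(trials):
        w = [random.uniform(w_min, w_max) for _ in range(n_weights)]
        err = net_error(w)        # placeholder: e.g. SSE over all training patterns
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err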
Simulated Annealing SS error,
Simulated Annealing WTA error and
Simulated Annealing WWTA error:
1. Min: lower limit of weights and biases. Typical values are −10.0 … −1.0.
2. Max: upper limit of weights and biases. Typical values are 1.0 … 10.0.
3. T0: learning parameter, specifies the Simulated Annealing start temperature.
Typical values of T0 are 1.0 … 10.0.
4. deg: degradation term of the temperature: Tnew = Told · deg. Typical values of
deg are 0.99 … 0.99999.
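The role of T0 and deg can be illustrated with a short sketch (Python; whether SNNS uses exactly this Metropolis-style acceptance rule is not stated here, so treat it as an assumption):

import math
import random

def accept(err_new, err_old, temperature):
    """Accept a random weight change; worse solutions pass with a probability
    that shrinks as the temperature falls (assumed acceptance rule)."""
    if err_new <= err_old:
        return True
    return random.random() < math.exp((err_old - err_new) / temperature)

t, deg = 10.0, 0.999             # T0 and degradation term
for _ in range(5000):
    # ... propose a random change of weights/biases within [Min, Max] and
    #     keep it if accept(...) returns True ...
    t *= deg                     # T_new = T_old * deg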
Quickprop:
1. η: learning parameter, specifies the step width of the gradient descent.
Typical values of η for Quickprop are 0.1 … 0.3.
2. μ: maximum growth parameter, specifies the maximum amount of weight
change (relative to 1) which is added to the current change.
Typical values of μ are 1.75 … 2.25.
3. ν: weight decay term to shrink the weights.
Typical values of ν are 0.0001. Quickprop is rather sensitive to this parameter.
It should not be set too large.
4. dmax: the maximum difference dj = tj − oj between a teaching value tj and an
output oj of an output unit which is tolerated, i.e. which is propagated back as
dj = 0. See above.
QuickpropThroughTime (QPTT):
1. delta0: starting value for all Δij. The default value is 0.1.
2. deltamax: the upper limit for the update values Δij. The default value of Δmax
is 50.0.
3. ν: the weight decay determines the relationship between the output error and
the reduction in the size of the weights. Important: Please note that the weight
decay parameter denotes the exponent, to allow comfortable input of very
small weight decay values. A choice of the third learning parameter ν = 4 corresponds
to a ratio of weight decay term to output error of 1 : 10000 (1 : 10⁴).
Scaled Conjugate Gradient (SCG)
All of the following parameters are non-critical, i.e. they influence only the speed of
convergence, not whether there will be success or not.
1. σ1: should satisfy 0 < σ1 ≤ 10⁻⁴. If 0, it will be set to 10⁻⁴;
2. λ1: should satisfy 0 < λ1 ≤ 10⁻⁶. If 0, it will be set to 10⁻⁶;
3. dmax: see standard backpropagation. Can be set to 0 if you don't know what
to do with it;
4. Floating-point precision parameter: depends on the floating-point precision. Should be set to 10⁻⁸ (single
precision) or to 10⁻¹⁶ (double precision). If 0, it will be set to 10⁻⁸.
Field1..Field5 are the positions in the control panel. For a more detailed description of
the ART parameters see section 9.13: ART Models in SNNS.
Now here is a description of the steps the various update functions perform, and of the
way in which they differ.
ART1 Stable
The ART1 Stable update function updates the neurons' activation and output values until
a stable state is reached. In one propagation step the activation of all non-input units
is calculated, and then the calculation of the output of all neurons follows. The state is
considered stable if the 'classifiable' or the 'not classifiable' neuron is selected. 'Classifiable'
means that the input vector (pattern) is recognized by the net. 'Not classifiable' means
that there is no neuron in the recognition layer which would fit the input pattern.
The required parameter is ρ in field1.
ART1 Synchronous
The algorithm of the ART1 Synchronous update function is the ART1 equivalent to the
algorithm of the Synchronous Order function. The only difference is that the winner of
the ART1 recognition layer is identified. The required parameter is ρ in field1.
ART2 Stable
The first task of this algorithm is to initialize the activation of all units. This is necessary
each time a new pattern is loaded into the network. The ART2 net is now initialized for a new
pattern. The output and activation will be updated with synchronous propagations
until a stable state is reached. One synchronous propagation cycle means that each neuron
calculates its output and then its new activation. The required parameters are ρ, a, b, c,
θ in field1, field2, field3, field4, field5 respectively.
ART2 Synchronous
This function is the ART2 equivalent to the Synchronous Order function. The only dif-
ference is that additionally the winner neuron of the ART2 recognition layer is calculated.
The required parameters are ρ, a, b, c, θ in field1, field2, field3, field4, field5 respectively.
ARTMAP Stable
ARTMAP Stable updates all units until a stable state is reached. The state is considered
stable if the classified or unclassified unit is 'on'. All neurons compute their output and
activation in one propagation step. Propagation steps are repeated until the stable state
is reached. The required parameters are ρa, ρb, ρc in field1, field2, field3 respectively.
ARTMAP Synchronous
The first step is to calculate the output value of the input units (input units of ARTa and
ARTb). Now a complete propagation step takes place, i.e. all units calculate their output
and activation value. The search for the two recognition neurons with the highest activation follows.
The search takes place in both ARTa and ARTb. The required parameters are ρa, ρb, ρc
in field1, field2, field3 respectively.
Auto Synchronous
First the Auto Synchronous function calculates the activation of all neurons. The next
step is to calculate the output of all units. These two steps are repeated n times. For
the iteration parameter n, which has to be provided in field1, a value of 50 has proven
very suitable.
BAM Order
The first step of this update function is to search for the first hidden unit of the network.
The current output is saved and a new output is calculated for all neurons of the hidden
and output layer. Once this is accomplished, the next progression of the hidden and output
units starts: for each neuron of the hidden and output layer the new output is saved
and the old saved output is restored. With this older output the activation of all hidden
and output neurons is calculated. After this task is accomplished, the new saved output
value of all hidden and output neurons is restored.
BBTT Order
The BBTT Order algorithm performs an update on a recurrent network. The recurrent
net can be transformed into a regular feedforward net with an input layer, multiple hidden layers and an
output layer. At the beginning the update procedure checks if there is a zero-input pattern
in the input layer. If there is such a pattern, the so-called i act value buffer
is set to 0 for all neurons. In this case i act can be seen as a buffer for the output value
of the hidden and output neurons. The next step is to copy the i act value to the output
of all hidden and output neurons. The new activation of the hidden and output units will
be calculated. Now the new output for every neuron in the hidden and output layer will
be computed and stored in i act.
CC Order
The CC Order update function propagates a pattern through the net. This means all
neurons calculate their new activation and output in topological order. The CC Order
update function also handles the special units which represent the candidate units.
CounterPropagation
The CounterPropagation update algorithm updates a net that consists of an input, a hidden
and an output layer. In this case the hidden layer is called the Kohonen layer and the output
layer is called the Grossberg layer. At the beginning of the algorithm the output of the
input neurons is equal to the input vector. The input vector is normalized to length one.
Now the progression of the Kohonen layer starts: the neuron with
the highest net input is identified. The activation of this winner neuron is set to 1, the
activation of all other neurons in this layer is set to 0. Now the output of all output
neurons is calculated. There is only one neuron of the hidden layer with activation
and output set to 1. This, and the fact that the activation and the output of each output
neuron is the weighted sum of the outputs of the hidden neurons, implies that the output
of an output neuron is the weight of the link between the winner neuron and that output
neuron. This update function makes sense only in combination with the CPN learning
function.
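The update step just described can be condensed into a short sketch (illustrative Python, not the kernel code; the weight containers are assumed to be plain lists):

def cpn_update(input_vec, w_kohonen, w_grossberg):
    """CounterPropagation update step as described above.

    w_kohonen[j]   -- weight vector of hidden (Kohonen) unit j
    w_grossberg[k] -- weights from all hidden units to output unit k
    """
    # Normalize the input vector to length one.
    norm = sum(x * x for x in input_vec) ** 0.5 or 1.0
    x = [v / norm for v in input_vec]

    # Kohonen layer: the unit with the highest net input wins.
    net = [sum(w * xi for w, xi in zip(w_j, x)) for w_j in w_kohonen]
    winner = max(range(len(net)), key=net.__getitem__)

    # Grossberg layer: with only the winner active, each output equals the
    # weight of the link from the winner to that output unit.
    return [w_k[winner] for w_k in w_grossberg]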
Dynamic LVQ
This update algorithm initializes the output and activation value of all input neurons with
the input vector. Now the progression of the hidden neurons begins. First the activation
and output of each hidden neuron is initialized with 0 and the new activation is
calculated. The hidden neuron with the highest activation is identified. Note
that the activation of this winner unit has to be > −1. The class which the input pattern
belongs to is propagated to the output neuron and stored as the neuron's activation.
This update function is sensible only in combination with the DLVQ learning function.
with 0. Afterwards the output value of all neurons will be calculated. The required
parameter is x in field1.
JE Order
This update function propagates a pattern from the input layer to the first hidden layer,
then to the second hidden layer, etc., and finally to the output layer. After this follows a
synchronous update of all context units. This function makes sense only for JE networks.
JE Special
Using the update function JE Special, input patterns are generated dynamically.
Let n be the number of input units and m the number of output units of the network.
JE Special generates the new input vector from the outputs of the last n − m input units
and the outputs of the m output units. The usage of this update function requires n > m.
The propagation of the newly generated pattern is done as with JE Update. The number
of the actual pattern in the control panel has no meaning for the input pattern when using
JE Special. This update function is used to determine the prediction capabilities of a
trained network.
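A sketch of how the next input vector is assembled (illustrative Python; the exact ordering of the copied values inside the vector is an assumption):

def je_special_next_input(prev_input, prev_output):
    """Build the next JE Special input pattern from the previous step."""
    n, m = len(prev_input), len(prev_output)
    assert n > m, "JE Special requires n > m"
    # outputs of the last n - m input units, followed by the m network outputs
    return prev_input[m:] + prev_output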
Kohonen Order
The Kohonen Order function propagates neurons in topological order. There are two
propagation steps. In the first step all input units are propagated, which means that the
output of all input neurons is calculated. The second step consists of the propagation of all
hidden units. This propagation step calculates every hidden neuron's activation and output.
Please note that the activation and output are normally not required for the Kohonen
algorithm. The activation and output values are used internally for display and evaluation
purposes. The Act Euclid activation function, for example, copies the Euclidean distance
of the unit from the training pattern to the unit's activation.
Random Order
The Random Order update function selects a neuron and calculates its activation and
output value. The selection process is completely random and is repeated n times.
The parameter n is the number of existing neurons. One specific neuron can be selected
more than once while other neurons may be left out. This kind of update function is
rarely used and serves mainly as a theoretical basis to prove the stability of Hopfield nets.
Random Permutation
This update function is similar to the Random Order function. The only difference is that a
random permutation of all neurons is used to select the order of the units. This guarantees
that each neuron is selected exactly once to calculate its output and activation value.
This procedure has two big disadvantages: the computation of the permutation is rather
time consuming, and it takes a long time until a stable output vector has been established.
Serial Order
The Serial Order update function calculates the activation and output value for each
unit. The progression of the neurons is serial, which means the computation process starts
at the first unit and proceeds to the last one.
Synchronous Order
With the synchronous update function all neurons change their values at the same time.
All neurons calculate their activation in one single step. The output of all neurons is
calculated after the activation step. The difference to the Serial Order update function
is that the calculation of the output and activation values requires two progressions over all
neurons. This kind of propagation is very useful for distributed systems (SIMD).
TimeDelay Order
The update function TimeDelay Order is used to propagate patterns through a time delay
network. Its behavior is analogous to the Topological Order function, with recognition of
logical links.
Topological Order
This mode is the most favorable mode for feedforward nets. The neurons calculate their
new activation in topological order, which is given by the net topology.
This means that the first processed layer is the input layer, the next processed layer is
the first hidden layer, and the last layer is the output layer. A learning cycle is defined as
a pass through all neurons of the net. Shortcut connections are allowed.
In order to work with various neural network models and learning algorithms, different
initialization functions that initialize the components of a net are required. Backpropagation,
for example, will not work properly if all weights are initialized to the same value.
To select an initialization function, one must click SEL. FUNC in the INIT line of the
control panel.
The following initialization functions are available:
ART1 Weights for ART1 networks
ART2 Weights for ART2 networks
ARTMAP Weights for ARTMAP networks
CC Weights for Cascade Correlation and TACOMA networks
ClippHebb for Associative Memory networks
CPN Rand Pat for Counterpropagation
CPN Weights v3.2 for Counterpropagation
CPN Weights v3.3 for Counterpropagation
DLVQ Weights for Dynamic Learning Vector Quantization
Hebb for Associative Memory networks
Hebb Fixed Act for Associative Memory networks
JE Weights for Jordan or Elman networks
Kohonen Rand Pat for Self-Organizing Maps (SOMS)
Kohonen Const for Self-Organizing Maps (SOMS)
Kohonen Weights v3.2 for Self-Organizing Maps (SOMS)
Pseudoinv for Associative Memory networks
Randomize Weights for any network, except the ART-family
Random Weights Perc for Backpercolation
RBF Weights for Radial Basis Functions (RBFs)
RBF Weights Kohonen for Radial Basis Functions (RBFs)
RBF Weights Redo for Radial Basis Functions (RBFs)
RM Random Weights for Autoassociative Memory Networks
All these functions receive their input from the five init parameter fields in the control
panel. See figure 4.11.
Here is a short description of the different initialization functions:
ART1 Weights
ART1 Weights is responsible for setting the initial values of the trainable links in an ART1
network. These links are the ones from F1 to F2 and the ones from F2 to F1, respectively.
For more details see chapter 9.13.1.2.
ART2 Weights
For an ART2 network the weights of the top-down links (F2 → F1 links) are set to 0.0
according to the theory ([CG87b]). The choice of the initial bottom-up weights is described
in chapter 9.13.2.2.
ARTMAP Weights
The trainable weights of an ARTMAP network are primarily the ones of the two ART1
networks ARTa and ARTb, therefore the initialization process is similar. For more details
see chapter 9.13.1.2 and chapter 9.13.2.2.
CC Weights
CC Weights calls the Randomize Weights function. See Randomize Weights.
ClippHebb
The ClippHebb algorithm is almost the same as the Hebb algorithm; the only difference
is that all weights can only be set to 1 and 0. After the activation for the neurons is
calculated, all weights > 1 are set to 1. As mentioned in 4.6, the ClippHebb algorithm
is a learning algorithm.
Hebb FixAct
JE Weights
This network consists of two types of neurons: the regular neurons and the so-called context
neurons. In such networks all links leading to context units are considered recurrent
links. The initialization function JE Weights requires the specification of five parameters:
α, β: The weights of the forward connections are randomly chosen from the interval
[α, β]. α and β have to be provided in field1 and field2 of the init panel.
λ: Weights of the self-recurrent links from context units to themselves. Simple Elman
networks use λ = 0. λ has to be provided in field3 of the init panel.
γ: Weights of the other recurrent links to context units. This value is often set to
1.0. γ has to be provided in field4 of the init panel.
ψ: Initial activation of all context units. ψ has to be provided in field5 of the
init panel.
Note that β > α is required. If this is not the case, an error message will appear on
the screen. The context units will be initialized as described above. For all other neurons
the bias and all weights will be randomly chosen from the interval [α, β].
Kohonen Const
Each component wij of each Kohonen weight vector wj is set to the value 1/√n, thus
yielding identical weight vectors wj of length 1. This is no problem, because the
Kohonen algorithm will quickly pull the weight vectors away from this central dot and move
them in the proper directions.
PseudoInv
The PseudoInv initialization function computes all weights with the help of the pseudo-inverse
weight matrix, which is calculated with the algorithm of Greville. The formula
for the weight calculation is W = Q · S⁺, where S⁺ is the pseudoinverse of the input
vectors, Q are the output vectors and W are the desired weights of the net. The bias is
not set and no parameters are necessary. Please note that the calculated weights
are usually odd. As mentioned in 4.6, the PseudoInv algorithm is a learning algorithm.
Randomize Weights
This function initializes all weights and the bias with random values chosen from the
interval [α, β]. α and β have to be provided in field1 and field2
of the init panel. It is required that β > α.
RM Random Weights
The RM Random Weights function initializes the bias and all weights of all units which are
not input units with random values. These values are selected from the interval [α, β]. α
and β have to be provided in field1 and field2 of the init panel. β > α has to hold.
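Both functions boil down to drawing every value from a user-given interval; a minimal sketch (Python, with a hypothetical dictionary of weights; α and β correspond to field1 and field2):

import random

def randomize_weights(weights, alpha, beta):
    """Assign every weight (and bias) a random value from [alpha, beta]."""
    assert beta > alpha, "the init panel requires beta > alpha"
    return {name: random.uniform(alpha, beta) for name in weights}

# Illustrative use: two weights and a bias, initialized in [-1, 1].
print(randomize_weights({"w1": 0.0, "w2": 0.0, "bias": 0.0}, -1.0, 1.0))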
training value, thereby in principle forming classes of patterns for training, where the
composition of the classes can be changed on the fly.
The following remapping functions are available:
None default; does no remapping
Binary remaps to 0 and 1; threshold 0.5
Clip clips the pattern values on upper and lower limit
Inverse remaps to 1 and 0; threshold 0.5
LinearScale performs a linear transformation
Norm normalizes the output patterns to length 1
Threshold remapping to two target values
All these functions receive their input from the five remap parameter fields in the control
panel. See figure 4.11. The result of the remapping function is visible to the user when
pressing the arrow buttons in the control panel. All pattern remapping is completely
transparent (during training, update, and result file generation) except when saving a pattern
file. In pattern files, always the original, unchanged patterns are stored, together with the
name of the remapping function which is to be applied.
Here is a short description of the different pattern remapping functions:
Binary
Maps the values of the output patterns to 0 and 1. The result is a binary classifier.
All values greater than 0.5 will be trained as 1; all others, i.e. also negative values, will
be trained as 0. This function does not need any parameters.
(Figure: the Binary remapping function, plotted as the value used for display and training.)
Inverse
Inverts all the patterns of a binary classifier. All '1's will be trained as '0's and vice versa.
This mapping is also valid for other original output values: in general, values greater than
0.5 will be trained as 0, all others as 1.
Clip
Clips all values above or below the limits to the limit values. Intermediate values remain
unchanged.
Note that this means that the values are cut to the interval [0,1], and not scaled to it!
The upper and lower limit are the two parameters required by this function.
LinearScale
Performs a linear transformation of all output pattern values according to the general line
equation
new_val = par1 · pattern_val + par2
where par1 and par2 are the first and second function parameters, to be specified in the
REMAP line of the control panel. With these two parameters any linear transformation
can be defined.
None
This is the default remapping function. All patterns are trained as is; no remapping takes
place.
If you have a very time-critical application, it might be advisable to bring the patterns
into the correct configuration before training and then use this remapping function, since
it is by far the fastest.
Norm
Here, all the patterns are normalized, i.e. mapped to a pattern of length 1. Using this
remapping function is only possible if there is at least one non-zero value in each pattern!
This function facilitates the use of learning algorithms like DLVQ that require their
output training patterns to be normalized. This function has no parameters.
Threshold
Threshold takes four parameters and is the most flexible of all the predefined remapping
functions. The first two values are the lower and upper threshold values, the third
and fourth parameters the inner and outer training goals respectively. If the first two
values are identical, the 'inner' value will be treated as the lower, while the 'outer' value will
be treated as the upper training goal.
Figure 4.26: The pattern remapping function Threshold with 1st = 2nd parameter on the left and
1st ≠ 2nd parameter on the right
Examples:
A parameter set of "-3.0, 3.0, 0.0, 5.0" will transform all output pattern values in the
interval [-3, 3] to 0, while all other values will be converted to 5.0. A parameter set of
"128.0, 128.0, 255.0, 0.0" will bring all values below 128.0 to 255.0 while the others are
converted to 0. With an image as an output training pattern this would automatically
train on a binary negative of the image.
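The two parameter sets from the examples can be reproduced with a small sketch of the Threshold mapping (illustrative Python; the behaviour exactly at the threshold values is an assumption):

def threshold_remap(value, lower, upper, inner, outer):
    """Threshold pattern remapping as described above."""
    if lower == upper:
        # 'inner' acts as the lower, 'outer' as the upper training goal
        return inner if value < lower else outer
    return inner if lower <= value <= upper else outer

print(threshold_remap(1.5, -3.0, 3.0, 0.0, 5.0))        # inside [-3, 3] -> 0.0
print(threshold_remap(4.0, -3.0, 3.0, 0.0, 5.0))        # outside        -> 5.0
print(threshold_remap(100.0, 128.0, 128.0, 255.0, 0.0)) # below 128      -> 255.0
print(threshold_remap(200.0, 128.0, 128.0, 255.0, 0.0)) # 128 and above  -> 0.0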
Note that the list of available remapping functions can easily be extended. Refer to
section 15.2 for details. Keep in mind that all remapping functions can have a maximum
of 5 parameters.
Figure 4.27: Edit panels for unit prototypes (f-types) and sites
Figure 4.27 shows the panels to edit unit prototypes (f-types) and sites. Both panels are
accessed from the EDITORS button in the control panel. A change of an f-type is
performed on all units of that type. Therefore, the functionality of all units belonging to an f-type
can easily be changed. The elements in the panel have the following meaning:
SELECT : Selects the activation and output function.
CHOOSE : Chooses the f-type to be changed.
SET : Makes the settings/changes permanent. Changes in the site list are not set
(see below).
NEW , DELETE : Creates or deletes an f-type.
ADD , DELETE : F-types also specify the sites of a unit. Therefore these two buttons
are necessary to add/delete a site in the site list.
Note: The number and the selection of sites cannot be changed after the creation of an
f-type.
The elements in the edit panel for sites are almost identical. A site is selected for change
by clicking on it in the site list.
SELECT : Selects the new site function. The change is performed on all sites in the
net with the same name.
SET : Validates changes/settings.
NEW : Creates a new site.
DELETE : Deletes the site marked in the site list.
Chapter 5
Handling Patterns with SNNS
The normal way to use a pattern together with a neural network is to have one pattern
value per input/output unit of the network. The set of activations of all input units is
called the input pattern, the set of activations of all output units is called the output pattern.
An input pattern and its corresponding output pattern are simply called a pattern. This
definition implies that all patterns for a particular network have the same size. These
patterns will be called regular or fixed sized.
SNNS also offers another, much more flexible type of patterns. These patterns will be
called variable sized. Here, the patterns are usually larger than the input/output layers
of the network. To train and recall these patterns, small portions (subsequently called
subpatterns) are systematically cut out of the large pattern and propagated through
the net, one at a time. Only the smaller subpatterns have to have the fixed size fitting the
network. The pattern itself may have an arbitrary size, and different patterns within one
pattern set may have differing sizes. The number of variable dimensions is also variable.
Example applications for one and two variable dimensions include time series patterns for
TDNNs and picture patterns.
A third variation of patterns that can be handled by SNNS are patterns that include
some class information together with the input and output values. This feature makes it
possible to group the patterns according to some property they have, even when no two
patterns have the exact same output. Section 5.4 explains how to use this information in
the pattern file.
Finally, patterns can be trained differently from the way they were specified in the pattern
file. SNNS features pattern remapping functions that allow easy manipulation of the
output patterns on the fly, without the need to rewrite or reload the pattern file. The use
of these functions is described in section 5.5.
All these types of patterns are loaded into SNNS from the same kind of pattern file. For
a detailed description of the structure of this file see sections 5.2 and 5.3. The grammar
is given in appendix A.4.
5.1 Handling Pattern Sets
Although activations can be propagated through the network without patterns defined,
learning can be performed only with patterns present. A set of patterns belonging to the
same task is called a pattern set. Normally there are two dedicated pattern sets when
dealing with a neural network: one for training the network (training pattern set), and
one for testing purposes, to see what the network has learned (test pattern set). In SNNS
both of these (and more) can be kept in the simulator at the same time. They are loaded
with the file browser (see chapter 4.3.2). The pattern set loaded last is made the current
pattern set. All actions performed with the simulator refer only to, and affect only, the
current pattern set. To switch between pattern sets press the button USE in the control
panel (see figure 4.11 on page 44). It opens up a list of loaded pattern sets from which
a new one can be selected. The name of the current pattern set is displayed to the right
of the button. The name equals the name body of the loaded pattern file. If no pattern
set is loaded, "... Pattern File ?" is given as an indication that no associated pattern file is
defined.
Loaded pattern sets can be removed from main memory with the DELETE button in
the control panel. Just like the USE button it opens a list of loaded pattern sets, from
which any set can be deleted. When a pattern set is deleted, the corresponding memory is
freed and again available for other uses. This is especially important with larger pattern
sets, where memory might get scarce.
processing for two variable dimensions. The SNNS pattern definition is very flexible and
allows a great degree of freedom. Unfortunately this also renders the writing of correct
pattern files more difficult and promotes mistakes.
To acquaint the user with the pattern file format, we describe the format with the
help of an example pattern file. The beginning of a pattern file describing a bitmap
picture is given below. For easier reference, line numbers have been added on the left.
0001 SNNS pattern definition file V3.2
0002 generated at Tue Aug 3 00:00:44 1999
0003
0004 No. of patterns : 10
0005 No. of input units : 1
0006 No. of output units : 1
0007 No. of variable input dimensions : 2
0008 Maximum input dimensions : [ 200 200 ]
0009 No. of variable output dimensions : 2
0010 Maximum output dimensions : [ 200 200 ]
0011
0012 # Input pattern 1: pic1
0013 [ 200 190 ]
0014 1 1 1 0 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 1 0 0 1 1 0 1
.
.
.
0214 1 1 1 0 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 1 0 0 1 1 0 1
0215 # Output pattern 1: pic1
0216 [ 200 190 ]
0217 1 1 1 0 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 1 0 0 1 1 0 1
.
.
.
Some of the comments identifying parameter names make no sense when the file describes
a variable pattern. They are kept, however, for reasons of compatibility with the regular
fixed size pattern definitions.
The meaning of the various lines is:
Line 0001 gives the version number of the grammar this file follows. For variable size
pattern files the version V3.2 is mandatory!
Line 0002 is information for the bookkeeping of the user only. Usually the time of
the generation of the pattern file is given here. The string 'generated at' is
mandatory!
Line 0004 gives the number of patterns defined in this file. The number of subpatterns
is not specified, since it depends on the size of the network. Remember: the
same pattern may be used by different sized networks, resulting in varying
numbers of subpatterns!
Line 0005 CAUTION! This variable does NOT give the number of input units but
the size C of the fixed dimension. For TDNNs this would be the (invariant)
number of features; for a picture it would be the number of values per pixel
(i.e. a bitmap picture would have size 1, an RGB picture size 3).
Line 0006 corresponds to line 0005 for the output pattern.
Line 0007 this line specifies the number of variable input dimensions I. With fixed size
patterns, 0 has to be specified.
Line 0008 this line specifies the size of the largest pattern in this pattern set. It is
required for parsing and storage allocation purposes. The number of entries
in the list has to match the number given in line 0007; if 0 was specified there,
an empty list (i.e. "[ ]") has to be given here.
Note: The lines 0007 and 0008 are pairwise mandatory, i.e. if one is given,
the other has to be specified as well. Old pattern files have neither one and
can therefore still be read correctly.
Line 0009 corresponds to line 0007 for the output pattern. It specifies the number of
variable output dimensions O.
Line 0010 corresponds to line 0008 for the output pattern.
Note: The lines 0009 and 0010 are again pairwise mandatory, i.e. if one is
given, the other has to be specified as well. Old pattern files have neither
one and can therefore still be read correctly.
Line 0012 an arbitrary comment. All text following the sign '#' in the same line is
ignored.
Line 0013 this line has to be specified whenever I in line 0007 is ≠ 0. It specifies the
size of the following input pattern and is given as a list of integers separated
by blanks and enclosed in [ ]. The values have to be given in descending
dimension order, i.e. [ dimension 3  dimension 2  dimension 1 ] (here: [200
190]). Note that [200 190] is less than the maximum, which is specified in line
0008.
Line 0014 the first line of ∏_{i=1}^{I} dimension_i · C activation values (i.e. here 1 · 190 = 190)
The additional functionality necessary for dealing with variable size patterns is provided
by the subpat panel depicted in figure 5.1.
(Figure: order in which the subpatterns (1st to 5th) are cut from a variable size pattern along Dimension 1 and Dimension 2.)
be divided into tiles or overlapping pieces. When implementing a filter, for example, whether
for pictures or other data, a tiling style will always be more appropriate, since different units are
not treated concordantly.
It is the sole responsibility of the user to define the step width and the size of the subpatterns
correctly for both input and output. The user has to take care that the input and output subpatterns
correspond to each other. A wrong specification can lead to unpredictable learning behavior.
The best way to check the settings is to press the TEST button, since exactly those
subpatterns are thereby generated that will also be used for the training. By observing
the reported position in the subpattern panel it can be verified whether meaningful values
have been specified.
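The positions at which subpatterns are cut out follow directly from the pattern size, the subpattern size and the step width; a sketch for one variable dimension (illustrative Python, hypothetical helper name):

def subpattern_positions(pattern_size, sub_size, step):
    """Start positions of the fixed-size subpatterns along one dimension."""
    if sub_size > pattern_size or step < 1:
        return []
    return list(range(0, pattern_size - sub_size + 1, step))

print(subpattern_positions(200, 10, 10))      # tiling: 20 non-overlapping pieces
print(len(subpattern_positions(200, 10, 1)))  # overlapping pieces: 191 positions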
Example
The pattern file
SNNS pattern definition file V4.2
generated at Tue Aug 3 00:00:44 1999
No. of patterns : 6
No. of input units : 3
No. of output units : 3
No. of classes : 2
Class redistribution : [ 2 1 ]
# Pattern 1:
0 0 1
1 0 0
# Class:
B
# Pattern 2:
0 1 0
1 0 1
# Class:
B
# Pattern 3:
0 1 1
0 0 0
# Class:
A
# Pattern 4:
1 0 0
0 0 1
# Class:
A
# Pattern 5:
1 0 1
1 1 0
# Class:
B
# Pattern 6:
1 1 0
1 1 1
# Class:
B
would define a virtual pattern set with 12 patterns. There are 4 patterns of class B and 2
patterns of class A. Since the string A is alphanumerically smaller than B, it gets the first
redistribution value ("2") assigned; B gets assigned "1" respectively. Since now for each 1
B there must be 2 As, and each pattern has to be used at least once, this makes for a total
of 2·4 A + 4 B = 12 patterns. Since there are only 6 patterns physically present, some
of the patterns will be trained multiple times in each epoch (here the two A patterns are
used 4 times).
Each group of patterns with the given class redistribution is called a "chunk group". This
term is used during further explanations. For the given example and without pattern
shuffling, the virtual pattern file would look like a pattern file with 12 patterns, occurring
in the following order:
virtual (user visible) pattern number   1 2 3 4 5 6 7 8 9 10 11 12
physical (filed) pattern number         3 1 4 3 2 4 3 5 4 3  6  4
class                                   A B A A B A A B A A  B  A
Within each chunk group the patterns are arranged in such an order that the classes are
intermixed as much as possible.
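The size of the virtual set and an unshuffled chunk order can be derived from the class labels and the redistribution values; the following sketch illustrates the computation (Python; the exact interleaving inside a chunk group may differ from what SNNS produces):

from collections import Counter

def virtual_order(classes, redistribution):
    """Virtual (unshuffled) pattern order for class redistribution."""
    counts = Counter(classes)
    # Every physical pattern must be used at least once, so the class that
    # needs the most chunk groups determines their number.
    groups = max(-(-counts[c] // redistribution[c]) for c in counts)
    per_class = {c: [i for i, cl in enumerate(classes, 1) if cl == c]
                 for c in counts}
    order = []
    for g in range(groups):
        for c in sorted(redistribution):
            pats = per_class[c]
            for k in range(redistribution[c]):
                order.append(pats[(g * redistribution[c] + k) % len(pats)])
    return order

classes = ["B", "B", "A", "A", "B", "B"]        # class of each physical pattern (4 B, 2 A)
print(virtual_order(classes, {"A": 2, "B": 1})) # 12 virtual patterns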
With pattern shuffling enabled, the composition of 2 As and 1 B within one chunk group
remains the same. In addition, the order of all As and Bs is shuffled, which could lead to
the following virtual training order (shuffling is not visible to the user and takes place only
during training):
virtual (user visible) pattern number   1 2 3 4 5 6 7 8 9 10 11 12
physical (filed) pattern number         3 5 4 4 1 3 4 2 3 3  6  4
class                                   A B A A B A A B A A  B  A
Note that even during shuffling, a pattern is never used twice unless all other patterns
within the same class were used at least once. This means that an order like
3 1 3 4 2 4 . . .
A B A A B A . . .
can never occur, because the second A (physical pattern 3) is used twice before pattern 4
is used once.
The unshuffled, virtual pattern order is visible to the user if class redistribution is activated,
either through the optional Class redistribution field in the pattern file or through the
CLASSES panel. Activation of class redistribution results in a dynamic, virtual change
of the pattern set size whenever values in the CLASSES panel are altered. The virtual
pattern order also changes after such an alteration.
All virtualization is transparent to the user interface (e.g. the buttons in the CONTROL
panel), to all learn, update, and init functions of SNNS, as well as to the result file
creation. Saving pattern files, however, results in the physical pattern composition together
with the defined values in the Class redistribution field.
Without the Class redistribution field in the pattern file, or when switching the class usage
off in xgui or batchman, the virtual (visible) pattern set will be identical to the patterns
given in the physical pattern file.
PLEASE NOTE:
At this time, the classical applications for class information, namely Kohonen
and DLVQ learning, do not take advantage of this class information within
the learning algorithm! This is due to the fact that classes were introduced to
SNNS long after those learning schemes were implemented. Look for future
releases of SNNS, where there might be new implementations of these algorithms
with classes.
Currently, class information is used only to define virtual pattern sets where
the size of the virtual set is different from the size of the physical set.
Figure 5.3: The effect of Inverse pattern remapping
Figure 5.4: An example of Threshold pattern remapping
With this remapping it becomes possible to quickly change a continuous output value
pattern set to a binary one. Also, patterns can easily be flipped, i.e. 0s become 1s and
vice versa.
Chapter 6
Graphical Network Editor
The graphical user interface of SNNS has a built-in network editor. With the network
editor it is possible to generate a new network or to modify an existing network in various
ways. There are also commands to change the display style of the network.
As an introduction, operations on networks without sites will be discussed first, since they
are easier to learn and understand. Operations that have a restricted or slightly different
meaning for networks with sites are marked with the extension (Sites!) in the following
overview. These changes are discussed in detail in section 6.5.
As usual with most applications of X-Windows, the mouse must be in the window in
which an input is to appear. This means that the mouse must be in the display window
for editor operations to occur. If the mouse is moved in a display, the status indicator of
the manager panel changes each time a new raster position in the display is reached.
Different displays of a network can be seen as different views of the same object. This
means that all commands in one display may affect objects (units, links) in the other
displays. Objects are moved or copied in a second display window in the same way as
they are moved or copied in the rst display window.
The editor operations are usually invoked by a sequence of 2 to 4 keys on the keyboard.
They only take place when the last key of the command (e.g. deletion of units) is pressed.
We found that for some of us, the fastest way to work with the editor was to move the
mouse with one hand and to type on the keyboard with the other hand. Keyboard actions
and mouse movement may occur at the same time, the mouse position is only relevant
when the last key of the sequence is pressed.
The keys that are sufficient to invoke a part of a command are written in capital letters
in the commands. The message line in the manager panel indicates the completed parts
of the command sequence. Invalid keys are ignored by the editor.
As an example, if one presses the keys U for Units and C for Copy the status line changes
as follows:
status line command comment
> Units operation on units
To the left of the caret the fully expanded input sequence is displayed. At this place
a message is also displayed when a command sequence is accepted and the corresponding
operation is called. This serves as feedback, especially if the operation takes some time.
If the operation completes quickly, only a short flicker of the displayed text can be seen.
Some error messages appear in the confirmer, others in the message line.
The Mode command is useful if several unit or link commands are given in sequence.
Return cancels a command, like Quit does, but also returns to normal mode.
6.2 Selection
6.2.1 Selection of Units
Units are selected by clicking on the unit with the left mouse button. On black-and-white
terminals, selected units are shown with crosses, on color terminals in a special, user-defined
color. The default is yellow. By pressing and holding the mouse button down and
moving the mouse, all units within a rectangular area can be selected, as in a number of
popular drawing programs. It is not significant in which direction the rectangle is opened.
To remove a unit or group of units from a selection, one presses the SHIFT key on the
keyboard while selecting the unit or group of units again. This undoes the previous
selection for the specified unit or group of units. Alternatively, a single unit can be
deselected with the right mouse button.
If the whole selection should be reset, one clicks in an empty raster position. The number
of selected units is displayed at the bottom of the manager panel next to a stylized selection
icon.
Example (setting activations of a group of units):
The activations of a group of units can be set to a specific value as follows: Enter the
value in the activation value field of the target unit in the info panel. Select all units that
should obtain the new value. Then enter the command to set the activation (Units Set
Activation).
left mouse button:
Selects a unit. If the mouse is moved with the button pressed down, a group of
units in a rectangular area is selected. If the SHIFT key is pressed at the same time,
the units are deselected. The direction of movement of the mouse to open the
rectangular area is not significant, i.e. one can open the rectangle from bottom right
to top left, if convenient.
If the left mouse button is pressed together with the CONTROL key, a menu appears
with all alternatives to complete the current command sequence. The menu items
that display a trailing '!' indicate that the mouse position of the last command of
a command sequence is important. The letter 'T' indicates that the target unit in
the info panel plays a role. A (~) denotes that the command sequence is not yet
completed.
right mouse button:
Undo of a selection. Clicking on a selected unit with the right mouse button only
deselects this unit. Clicking on an empty raster position resets the whole selection.
middle mouse button:
Selects the source unit (on pressing the button down) and the target unit (on releasing
the button) and displays them both in the info panel. If there is no connection
between the two units, the target unit is displayed with its first source unit. If the
button is pressed on a source unit and released over an empty target position, the
link between the source and the current (last) target is displayed. If there is no such
link the display remains unchanged. Conversely, if the button is pressed on an empty
source position and released on an existing target unit, the link between the current
(last) source unit and the selected target unit is displayed, if one exists. This is a
convenient way to inspect links.
In order to indicate the position of the mouse even with a small raster size, there is always
a sensitive area of at least 16x16 pixels.
Sites Copy with No links: copies the current site of the Target unit to all
selected units. Links are not copied
Sites Copy with All links: ditto, but with all links
3. Unit Commands:
Units Freeze: freeze all selected units
Units Unfreeze: reset freeze for all selected units
Units Set Name: sets name to the name of Target
Units Set io-Type: sets I/O type to the type of Target
Units Set Activation: sets activation to the activation of Target
Units Set Initial activation: sets initial activation to the initial activa-
tion of Target
Units Set Output: sets output to the output of Target
Units Set Bias: sets bias to the bias of Target
Units Set Function Activation: sets the activation function. Note: all selected
units lose their default type (f-type)
Units Set Function Output: sets the output function. Note: all selected units
lose their default type (f-type)
Units Set Function Ftype: sets default type (f-type)
Units Insert Default: inserts a unit with default values. The unit has no
links
Units Insert Target: inserts a unit with the same values as the Target unit.
The unit has no links
Units Insert Ftype: inserts a unit of a certain default type (f-type) which is
determined in a popup window
Units Delete: deletes all selected units
Units Move: all selected units are moved. The mouse determines the desti-
nation position of the TARGET unit (info-panel). The selected units and their
position after the move are shown as outlines.
Units Copy ...: copies all selected units to a new position. The mouse posi-
tion determines the destination position of the TARGET unit (info-panel).
Units Copy All: copies all selected units with all links
Units Copy Input: copies all selected units with their input links
Units Copy Output: copies all selected units and their output links
Units Copy None: copies all selected units, but no links
Units Copy Structure ...: copies all selected units and the link structure
between these units, i.e. a whole subnet is copied
Units Copy Structure All: copies all selected units, all links between them,
and all input and output links to and from these units
Units Copy Structure Input: copies all selected units, all links between
them, and all input links to these units
Units Copy Structure Output: copies all selected units, all links between
them, and all output links from these units
Units Copy Structure None: copies all selected units and all links between
them
Units Copy Structure Back binding: copies all selected units and all links
between them and inserts additional links from the new to the corresponding
original units (Sites!)
Units Copy Structure Forward binding: copies all selected units and all
links between them and inserts additional links from the original to the corre-
sponding new units (Sites!)
Units Copy Structure Double binding: ditto, but inserts additional links
from the original to the new units and vice versa (Sites!)
4. Mode Commands:
Mode Units: unit mode, shortens command sequence if one wants to work with
unit commands only. All subsequences after the Units command are valid then
Mode Links: analogous to Mode Units, but for link commands
5. Graphics Commands:
Graphics All: redraws the local window
Graphics Complete: redraws all windows
Graphics Direction: draws all links from and to a unit with arrows in the
local window
Graphics Links: redraws all links in the local window
Graphics Move: moves the origin of the local window such that the Target
unit is displayed at the position of the mouse pointer
Graphics Origin: moves the origin of the local window to the position indi-
cated
Graphics Grid: displays a graphic grid at the raster positions in the local
window
Graphics Units: redraws all units in the local window
Connections leading to a site are only reversed, if the original source unit has a site
of the same name. Otherwise they remain as they are.
7. Links Delete Clique (selection : site-popup)
Links Delete from Source unit (selection [unit] : site-popup)
Links Delete to Target unit (selection [unit] : site-popup)
These three operations are the reverse of Links Make in that they delete the con-
nections. If the safety flag is set (the word safe appears behind the flag symbol in
the manager panel), a confirmer window forces the user to confirm the deletion.
8. Links Copy Input (selection [unit] :)
Links Copy Output (selection [unit] :)
Links Copy All (selection [unit] :)
Links Copy Input copies all input links of the selected group of units to the single
unit under the mouse pointer. If sites are used, incoming links are only copied if a
site with the same name as in the original units exists.
Links Copy Output copies all output links of the selected group of units to the
single unit under the mouse pointer.
Links Copy All does both of the two operations above.
9. Links Copy Environment (selection TARGET site-links [unit] :)
This is a rather complex operation: Links Copy Environment tries to duplicate the
links between all selected units and the current TARGET unit in the info panel at
the place of the unit under the mouse pointer. The relative position of the selected
units to the TARGET unit plays an important role: if a unit exists that has the same
relative position to the unit under the mouse cursor as the TARGET unit has to one
of the selected units, then a link between this unit and the unit under the mouse
pointer is created.
The result of this operation is a copy of the structure of links between the selected
units and the TARGET unit at the place of the unit under the mouse pointer. That
is, one obtains the same topological structure at the unit under the mouse pointer.
This is shown in figure 6.1. In this figure the structure of the TARGET unit and
the four Env units is copied to the unit UnderMousePtr. However, only two units
are in the same relative position to the UnderMousePtr as the Env units are to the
Target unit, namely corrEnv3 corresponding to Env3 and corrEnv4 corresponding
to Env4. So only those two links from the units corrEnv3 to UnderMousePtr and
from corrEnv4 to UnderMousePtr are generated.
10. Sites Add (selection : Popup)
A site which is chosen in a popup window is added to all selected units. The command
has no effect on units which already have a site of this name (because the names
of all sites of a unit must be different).
these links would go unnoticed. There are also options specifying which additional links are
to be copied. If only the substructure is to be copied, the command Units Copy
Structure None is used.
Figure 6.2: An Example for Units Copy Structure with Forward binding
The options with binding present a special feature. There, links between original
and copied units are inserted automatically, in addition to the copied structure links.
Back, Forward and Double thereby specify the direction of the links, where "back"
means the direction towards the original unit. An example is shown in figure 6.2.
If sites are used, the connections to the originals are assigned to the site selected
in the popup. If not all originals have a site with that name, not all new units are
linked to their predecessors.
With these various copy options, large, complicated nets with the same or similar
substructures can be created very easily.
20. Mode Units (:)
Mode Links (:)
Switches to the mode Units or Links. All sequences of the normal modes are
available. The keys U and L need not be pressed anymore. This shortens all sequences
by one key.
21. Units . . . Return (:)
Links . . . Return (:)
Returns to normal mode after executing Mode Units.
("hidden")
Links>    specify weight "6.97"
Links>    Make to Target                  create links
Links>    set mouse to unit 4 ("output");
          specify weight "-5.24"
Links>    Make to Target                  create links
Links>    deselect all units and
          select unit 3
Links>    set mouse to unit 4 and
          specify "11.71" as weight.
Links>    Make to Target                  create links
Now the topology is defined. The only actions remaining are to set the IO types and
the four patterns. To set the IO types, one can either use the command Units Set
Default io-type, which sets the types according to the topological position of the units,
or repeatedly use the command Units Set io-Type. The second option can be aborted
by pressing the Done button in the popup window before making a selection.
Chapter 7
Graphical Network Creation Tools
SNNS provides ten tools for the easy creation of large, regular networks. All these tools
carry the common name BigNet. They are called by clicking the button BIGNET in the
manager panel. This invokes the selection menu given below, where the individual tools
can be selected. This chapter gives a short introduction to the handling of each of them.
general
time delay
art 1
art 2
artmap
kohonen
jordan
elman
hopfield
auto assoz
Note that there are other network creation tools which can be called from the Unix command
line. Those tools are described in chapter 13.
Figure 7.1: The BigNet window for feed-forward and recurrent networks
BigNet creates a net in two steps:
1. Edit net: This generates internal data structures in BigNet which describe the network
but does not generate the network yet. This allows for easy modification of the
network parameters before creation of the net.
The net editor consists of two parts:
(a) The plane editing part for editing planes. The input data is stored in the plane
list.
(b) The link editing part for editing links between planes. The input data is stored
in the link list.
2. Generate net in SNNS: This generates the network from the internal data structures
in BigNet.
Both editor parts are subdivided into an input part (Edit plane, Edit link) and a
display part for control purposes (Current plane, Current link). The input data of both
editors is stored, as described above, in the plane list and in the link list. After pressing
ENTER , INSERT , or OVERWRITE the input data is added to the corresponding
editor list. In the control part one list element is always visible. The arrow buttons
enable moving around in the list. The operations DELETE, INSERT, OVERWRITE,
CURRENT PLANE TO EDITOR and CURRENT LINK TO EDITOR refer to
the current element. Input data is only entered in the editor list if it is correct, otherwise
nothing happens.
7.1.2 Buttons of BigNet
ENTER : Input data is entered at the end of the plane or the link list.
INSERT : Input data is inserted in the plane list in front of the current plane.
OVERWRITE : The current element is replaced by the input data.
DELETE : The current element is deleted.
PLANE TO EDIT : The data of the current plane is written to the edit plane.
LINK TO EDIT : The data of the current link is written to the edit link.
TYPE : The type (input, hidden, output) of the units of a plane is determined.
POS : The position of a plane is always described relative (left, right, below) to the
position of the previous plane. The upper left corner of the rst plane is positioned at
the coordinates (1,1) as described in Figure 7.3. BigNet then automatically generates the
coordinates of the units.
FULL CONNECTION : A fully connected feed-forward net is generated. If there are n planes
numbered 1..n, then every unit in plane i is connected with every unit in plane
i + 1, for all 1 ≤ i ≤ n − 1.
SHORTCUT CONNECTION : If there exist n planes 1 … n, then every unit in plane i with
1 ≤ i < n is connected with every unit in all planes j with i < j ≤ n.
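The plane pairs produced by these two buttons can be listed with a couple of lines of Python (a sketch; it only enumerates which planes get connected, every unit of the source plane then links to every unit of the target plane):

def full_connection(n_planes):
    """Plane pairs (i, i+1) created by FULL CONNECTION."""
    return [(i, i + 1) for i in range(1, n_planes)]

def shortcut_connection(n_planes):
    """Plane pairs (i, j), i < j, created by SHORTCUT CONNECTION."""
    return [(i, j) for i in range(1, n_planes)
                   for j in range(i + 1, n_planes + 1)]

print(full_connection(4))      # [(1, 2), (2, 3), (3, 4)]
print(shortcut_connection(4))  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]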
(Figure: example of a plane of size x: 5, y: 5 containing a unit at x: 1, y: 3 and a cluster at x: 2, y: 2 with width 2 and height 2, and of planes positioned right, below and left of each other.)
CREATE NET : The net described by the two editors is generated by SNNS. The default
name of the net is SNNS_NET.net. If a net with this name already exists, a warning is
issued before it is replaced.
CANCEL : All internal data of the editors is deleted.
Figure 7.4 shows the display for the three possible input combinations with (all units of)
a plane as source. The other combinations are similar. Note that both source plane and
target plane must be specified in all cases, even if source or target consists of a cluster of
units or a single unit. If the input data is inconsistent with the above rules it is rejected
with a warning and not entered into the link list after pressing ENTER or OVERWRITE .
With the Move parameters one can declare how many steps a cluster or a unit will be
moved in the x or y direction within a plane after the cluster or the unit is connected with a
target or a source. This facilitates the construction of receptive fields, where all units of
a cluster feed into a single target unit and this connectivity pattern is repeated in both
directions with a displacement of one unit.
The parameter dx (delta-x) defines the step width in the x direction and dy (delta-y)
defines the step width in the y direction. If there is no entry in dx or dy there is no
movement in this direction. Movements within the source plane and the target plane are
independent of each other. Since this feature is very powerful and versatile it will be
illustrated with some examples.
Figure 7.4: Possible input combinations with (all units of) a plane as source, between 1)
a plane and a plane, 2) a plane and a cluster, 3) a plane and a unit. Note that the target
plane is specified in all three cases since it is necessary to indicate the target cluster or
target unit.
Example 1: Receptive Fields in Two Dimensions
This time the net consists of three planes (fig. 7.8). To create the links
source: plane 1 (1,1), (1,2), (1,3) -> target: plane 2 (1,1)
source: plane 1 (2,1), (2,2), (2,3) -> target: plane 2 (1,2)
source: plane 1 (3,1), (3,2), (3,3) -> target: plane 2 (1,3)
source: plane 1 (1,1), (2,1), (3,1) -> target: plane 3 (1,1)
source: plane 1 (1,2), (2,2), (3,2) -> target: plane 3 (1,2)
source: plane 1 (1,3), (2,3), (3,3) -> target: plane 3 (1,3)
between the units one must insert the move data shown in figure 7.7. Every line of plane
1 is a cluster of width 3 and height 1 and is connected with a unit of plane 2, and every
column of plane 1 is a cluster of width 1 and height 3 and is connected with a unit of plane
3. In this special case one can fill the empty input fields of "move" with any data because
a movement in these directions is not possible and this data is therefore ignored.
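As a hedged illustration of what the Move parameters do (not SNNS code; the coordinate
pairs and function name are assumptions for the example), the following sketch reproduces
the six link groups above by stepping the source cluster and the target unit independently:

def moved_links(cluster, target, src_move, tgt_move, steps):
    """Yield (cluster, target) pairs; after each step the source cluster is
    shifted by src_move = (dx, dy) and the target unit by tgt_move."""
    sdx, sdy = src_move
    tdx, tdy = tgt_move
    for _ in range(steps):
        yield cluster, target
        cluster = [(x + sdx, y + sdy) for (x, y) in cluster]
        target = (target[0] + tdx, target[1] + tdy)

# Rows of plane 1 feeding single units of plane 2:
row = [(1, 1), (1, 2), (1, 3)]
for cl, tgt in moved_links(row, (1, 1), src_move=(1, 0), tgt_move=(0, 1), steps=3):
    print(cl, "-> plane 2", tgt)

# Columns of plane 1 feeding single units of plane 3:
col = [(1, 1), (2, 1), (3, 1)]
for cl, tgt in moved_links(col, (1, 1), src_move=(0, 1), tgt_move=(0, 1), steps=3):
    print(cl, "-> plane 3", tgt)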
(Figure labels: delay length, total delay length, coupled weights.)
It is possible to specify separate receptive fields for different feature units. With only one
receptive field for all feature units, a "1" has to be specified in the input window for "1st
feature unit:". For a second receptive field, the first feature unit should be the width of
the first receptive field plus one. Of course, for any number of receptive fields, the sum
of their widths has to equal the number of feature units! An example network with two
receptive fields is depicted in figure 7.12.
Figure 7.14: Example for the generation of an ART1 network. First the BigNet (ART1)
panel is shown with the specified parameters. Next you see the created net as you can see
it when using an SNNS display.
For ARTMAP things are slightly different. Since an ARTMAP network consists of two
ART1 subnets (ARTa and ARTb), the parameters described above have to be specified
for both of them. This is the reason why BigNet (ARTMAP) takes eight instead of four
parameters. For the MAP field the number of units and the number of rows is taken from
the respective values for the Fb2 layer.
The input layer is fully connected to the hidden layer, i.e. every input unit is connected
to every unit of the hidden layer. The hidden layer is fully connected to the
output layer.
Output units are connected to context units by recurrent 1-to-1-connections. Every
context unit is connected to itself and to every hidden unit.
Default activation function for input and context units is the identity function, for
hidden and output units the logistic function.
Default output function for all units is the identity function.
To close the BigNet window for Jordan networks click on the DONE button.
7.6.2 BigNet for Elman Networks
By clicking on the ELMAN button in the BigNet menu, the BigNet window for Elman
networks (see fig. 7.20) is opened.
8.1 Inversion
Very often the user of a neural network asks what properties an input pattern must have in
order to let the net generate a specific output. To help answer this question, the Inversion
algorithm developed by J. Kindermann and A. Linden ([KL90]) was implemented in SNNS.
8.1.1 The Algorithm
The inversion of a neural net tries to find an input pattern that generates a specific output
pattern with the existing connections. To find this input, the deviation of each output
from the desired output is computed as error δ. This error value is used to approach
the target input in input space step by step. Direction and length of this movement are
computed by the inversion algorithm.
The most commonly used error value is the least mean square error, defined as

E_{LMS} = \sum_{p=1}^{n} \Big[ T_p - f\Big( \sum_i w_{ij} o_{pi} \Big) \Big]^2
This error is propagated backwards through the net as in backpropagation training, with
the difference that no weights are adjusted here. When the error signals reach the input
layer, they represent a gradient in input space, which gives the direction for the gradient
descent. Thereby, the new input vector can be computed as

I^{(1)} = I^{(0)} + \eta \, \delta_i^{(0)}

where \eta is the step size in input space, which is set by the variable eta.
This procedure is now repeated with the new input vector until the distance between the
generated output vector and the desired output vector falls below the predefined limit
delta max, at which point the algorithm is halted.
For a more detailed description of the algorithm and its implementation see [Mam92].
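A minimal sketch of this iteration is given below. It assumes a trained net is available
through two hypothetical functions, forward(x) and input_error_signal(x, target), which
stand for the forward pass and the error signal propagated back to the input units; neither
is part of the SNNS interface.

import numpy as np

def invert(forward, input_error_signal, target, x0,
           eta=2.0, delta_max=0.1, max_cycles=10000):
    x = np.asarray(x0, dtype=float)
    for cycle in range(max_cycles):
        out = forward(x)
        # stop when no output unit deviates by more than delta_max
        if np.max(np.abs(out - target)) < delta_max:
            return x, cycle
        # I(n+1) = I(n) + eta * delta(n), as in the formula above
        x = x + eta * input_error_signal(x, target)
    return x, max_cycles   # no convergence within max_cycles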
8.1.2 Inversion Display
The inversion algorithm is called by clicking the INVERSION button in the manager panel.
Picture 8.1 shows an example of the generated display.
where cycle is the number of the current iteration, inversion error is the sum of the
squared error of the output units for the current input pattern, and error units are
all units whose activation differs by more than the value of δ_max from the
target activation.
3. STOP : Interrupts the iteration. The status of the network remains unchanged. The
interrupt causes the current activations of the units to be displayed on the screen. A
click to the STEP button continues the algorithm from its last state. Alternatively
the algorithm can be reset before the restart by a click to the NEW button, or
continued with other parameters after a change in the setup. Since there is no
automatic recognition of infinite loops in the implementation, the STOP button is
also necessary when the algorithm obviously does not converge.
4. NEW : Resets the network to a defined initial status. All variables are assigned the
values in the setup panel. The iteration counter is set to zero.
5. SETUP : Opens a pop-up window to set all variables associated with the inversion.
These variables are:
eta The step size for changing the activations. It should range
from 1.0 to 10.0. Corresponds to the learning factor in
backpropagation.
delta max The maximum activation deviation of an output unit.
Units with higher deviation are called error units.
A typical value of delta max is 0.1.
Input pattern Initial activation of all input units.
2nd approx ratio Influence of the second approximation. Good values range
from 0.2 to 0.8.
A short description of all these variables can be found in an associated help window,
which pops up on pressing HELP in the setup window.
The variable second approximation can be understood as follows: Since the goal is
to get a desired output, the first approximation is to get the network output as close
as possible to the target output. There may be several input patterns generating
the same output. To reduce the number of possible input patterns, the second
approximation specifies a pattern the computed input pattern should approximate
as well as possible. For a setting of 1.0 for the variable Input pattern the algorithm
tries to keep as many input units as possible on a high activation, while a value of
0.0 increases the number of inactive input units. The variable 2nd approx ratio
then defines the importance of this input approximation.
It should be mentioned, however, that the algorithm is very unstable. One inversion
run may converge, while another with only slightly changed variable settings may
run indefinitely. The user therefore may have to try several combinations of variable
values before a satisfying result is achieved. In general, the better the net was
previously trained, the more likely is a positive inversion result.
6. HELP : Opens a window with a short help on handling the inversion display.
The network is displayed in the lower part of the window according to the settings of the
last opened 2D{display window. Size, color, and orientation of the units are read from
that display pointer.
8.1.3 Example Session
The inversion display may be called before or after the network has been trained. A pattern
file for the network has to be loaded prior to calling the inversion. A target output of the
network is defined by selecting one or more units in the 2D-display by clicking the middle
mouse button. After setting the variables in the setup window, the inversion run is started
by clicking the start button. At regular intervals, the inversion gives a status report on
the shell window, where the progress of the algorithm can be observed. When there are no
more error units, the program terminates and the calculated input pattern is displayed.
If the algorithm does not converge, the run can be interrupted with the stop button and
the variables may be changed. The calculated pattern can be tested for correctness by
selecting all input units in the 2D{display and then deselecting them immediately again.
This copies the activation of the units to the display. It can then be defined and tested
with the usual buttons in the control panel. The user is advised to delete the generated
pattern, since its use in subsequent learning cycles alters the behavior of the network which
is generally not desirable.
Figure 8.2 shows an example of a generated input pattern (left). Here the minimum
active units for recognition of the letter 'V' are given. The corresponding original pattern
is shown on the right.
Figure 8.2: An Example of an Inversion Display (left) and the original pattern for the
letter V
GRID : Displays a grid. The number of rows and columns of the grid can be
specified in the network analyzer setup.
CLEAR : This button clears the graph in the display. The time counter will be
reset to 1. If there is an active M-TEST operation, this operation will
be killed.
M-TEST : A click on this button corresponds to several clicks on the TEST button
in the control panel. The number n of TEST operations to be executed
can be specified in the Network Analyzer setup. Once pressed, the
button remains active until all n TEST operations have been executed
or the M-TEST operation has been killed, e.g. by clicking the STOP
button in the control panel.
RECORD : If this button is activated, the points will not only be shown on the display,
but their coordinates will also be saved in a file. The name of this file
can be specified in the setup of the Network Analyzer.
D-CTRL : Opens the display control window of the Network Analyzer. The de-
scription of this window follows below.
SETUP : Opens the Network Analyzer setup window. The description of the
setup follows in the next subsection.
DONE : Closes the network analyzer window. An active M-TEST operation will
be killed.
2 The second part of the setup window is used to specify some attributes of the
axes. The first line contains the values for the horizontal axis, the second line
those for the vertical axis. The columns min and max define the area to
be displayed. The numbers of the units whose activation or output values should
be drawn have to be specified in the column unit. In the last column, grid, the
number of columns and rows of the grid can be varied. The labeling of the axes
depends on these values, too.
3a The selection between showing the activation or the output of a unit along the x-
or y-axis can be made here. To draw the output of a unit click on OUT and to
draw the activation of a unit click on ACT .
3b Different types of error curves can be drawn:

\sum_i |t_i - o_i|      For each output unit the difference between the generated
                        output and the teaching output is computed. The error is
                        computed as the sum of the absolute values of the differences.
                        If AVE is toggled, the result is divided by the number of
                        output units, giving the average error per output unit.

\sum_i |t_i - o_i|^2    The error is computed as above, but the square of the differ-
                        ences is taken instead of the absolute values. With AVE the
                        mean squared deviation is computed.

|t_j - o_j|             Here the deviation of only a single output unit is processed.
                        The number of the unit is specified as unit j.
4 m-test: Specifies the number of TEST operations to be executed
when clicking on the M-TEST button.
time: Sets the time counter to the given value.
5 The name of the file, in which the visualized data can be saved by activating
the RECORD button, can be specified here. The filename will be automatically
extended by the suffix '.rec'. To change the filename, the RECORD button must
not be activated.
Figure 8.4: The Network Analyzer setup windows: the setup window for an x-y graph
(top), the setup window for a t-y graph (middle) and the setup window for a t-e graph
(bottom).
When the setup is left by clicking on CANCEL all the changes made in the setup are lost.
When leaving the setup by pressing the DONE button, the changes will be accepted if no
errors could be detected.
8.2.2 The Display Control Window of the Network Analyzer
The display control window appears when clicking on the D-CTRL button on the right side
of the network analyzer window. This window is used to easily change the area in the
display of the network analyzer.
The following chapter introduces the models and learning functions implemented in SNNS.
A strong emphasis is placed on the models that are less well known. They can not, however,
be explained exhaustively here. We refer interested users to the literature.
\Delta w_{ij} = \eta \, \delta_j \, o_i

\delta_j = \begin{cases} f_j'(net_j) \, (t_j - o_j) & \text{if unit } j \text{ is an output unit} \\ f_j'(net_j) \sum_k \delta_k w_{jk} & \text{if unit } j \text{ is a hidden unit} \end{cases}
This algorithm is also called online backpropagation because it updates the weights after
every training pattern.
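As a hedged illustration (not SNNS code), the following sketch applies this update rule
online to a network with one hidden layer of logistic units; the names W1, W2 and the
learning rate value are assumptions made for the example.

import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def online_backprop_step(W1, W2, x, target, eta=0.2):
    """One online update for a net with one hidden layer.
    W1: (n_hidden, n_in), W2: (n_out, n_hidden)."""
    h = logistic(W1 @ x)                           # hidden activations
    o = logistic(W2 @ h)                           # output activations
    delta_out = o * (1 - o) * (target - o)         # delta_j for output units
    delta_hid = h * (1 - h) * (W2.T @ delta_out)   # delta_j for hidden units
    W2 += eta * np.outer(delta_out, h)             # delta_w = eta * delta_j * o_i
    W1 += eta * np.outer(delta_hid, x)
    return W1, W2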
9.1.2 Enhanced Backpropagation
An enhanced version of backpropagation uses a momentum term and flat spot elimination.
It is listed among the SNNS learning functions as BackpropMomentum.
The momentum term introduces the old weight change as a parameter for the computation
of the new weight change. This avoids oscillation problems common with the regular
backpropagation algorithm when the error surface has a very narrow minimum area. The
new weight change is computed by

\Delta w_{ij}(t+1) = \eta \, \delta_j \, o_i + \mu \, \Delta w_{ij}(t)

where \mu is a constant specifying the influence of the momentum.
The effect of these enhancements is that flat spots of the error surface are traversed
relatively rapidly with a few big steps, while the step size is decreased as the surface gets
rougher. This adaptation of the step size increases learning speed significantly.
Note that the old weight change is lost every time the parameters are modified, new
patterns are loaded, or the network is modified.
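A minimal sketch of this weight change for a single link, with hypothetical variable names
(old_change holds the previous weight change):

def momentum_update(eta, mu, delta_j, o_i, old_change):
    """New weight change for one link, as in the formula above."""
    return eta * delta_j * o_i + mu * old_change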
9.1.3 Batch Backpropagation
Batch backpropagation has a formula similar to vanilla backpropagation. The difference
lies in the time when the update of the links takes place. While in vanilla backpropagation
an update step is performed after each single pattern, in batch backpropagation all weight
changes are summed over a full presentation of all training patterns (one epoch). Only
then, an update with the accumulated weight changes is performed. This update behavior
is especially well suited for training pattern parallel implementations where communication
costs are critical.
9.1.4 Backpropagation with chunkwise update
There is a third form of backpropagation that lies between the online and batch
versions with regard to updating the weights. Here, a chunk is defined as the number of
patterns to be presented to the network before making any alterations to the weights.
This version is very useful for training cases with very large training sets, where batch
update would take too long to converge and online update would be too unstable. We
achieved excellent results with chunk sizes between 10 and 100 patterns.
This algorithm also allows adding random noise to the link weights before the handling
of each chunk. This weights jogging proved to be very useful for complicated training
tasks. Note, however, that it has to be used very carefully! Since this noise is added fairly
frequently, it can destroy all learning progress if the noise limits are chosen too large. We
recommend starting with very small values (e.g. [-0.01, 0.01]) and trying larger values only
when everything looks stable. Note also that this weights jogging is independent of the
one defined in the jog-weights panel. If weights jogging is activated in the jog-weights
panel, it will operate concurrently, but on an epoch basis and not on a chunk basis. See
section 4.3.3 for details on how weights jogging is performed in SNNS.
It should be clear that weights jogging will make it hard to reproduce your exact learning
results!
Another new feature introduced by this learning scheme is the notion of selective updating
of units. This feature can be exploited only with patterns that contain class information.
See chapter 5.4 for details on this pattern type.
9.1. BACKPROPAGATION NETWORKS 147
Using class based pattern sets and a special naming convention for the network units,
this learning algorithm is able to train different parts of the network individually. Given
the example pattern set of page 98, it is possible to design a network which includes
units that are only trained for class A or for class B (independent of whether additional
class redistribution is active or not). To utilise this feature the following points must be
observed.
Within this learning algorithm, different classes are known by the number of their
position according to an alphabetic ordering and not by their class names. E.g.:
if there are pattern classes named alpha, beta, delta, all alpha patterns belong
to class number 0, all beta patterns to number 1, and all delta patterns to class
number 2.
If the name of a unit matches the regular expression class+x[+y]* (x, y ∈ {0, 1, ..., 32}),
it is trained only if the class number of the current pattern matches one of the given
x, y, ... values. E.g.: a unit named class+2 is only trained on patterns with
class number 2, a unit named class+2+0 is only trained on patterns with class
number 0 or 2.
If the name of a unit matches the regular expression class-x[-y]* (x, y ∈ {0, 1, ..., 32}),
it is trained only if the class number of the current pattern does not match any of
the given x, y, ... values. E.g.: a unit named class-2 is trained on all patterns
but those with class number 2, a unit named class-2-0 is only trained on patterns
with class numbers other than 0 and 2.
All other network units are trained as usual. (A sketch of how such unit names can be
interpreted is given below.)
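The following hypothetical helper (not part of SNNS) shows one way such unit names
could be interpreted to decide whether a unit is trained on a pattern of a given class number:

import re

_NAME_RE = re.compile(r"^class([+-])(\d+(?:[+-]\d+)*)$")

def is_trained(unit_name, class_number):
    """Return True if a unit with this name is trained on patterns of
    the given class number (classes are numbered alphabetically)."""
    m = _NAME_RE.match(unit_name)
    if m is None:
        return True                      # ordinary unit: always trained
    numbers = {int(n) for n in re.split(r"[+-]", m.group(2))}
    if m.group(1) == "+":
        return class_number in numbers   # class+...: train only on these classes
    return class_number not in numbers   # class-...: train on all other classes

print(is_trained("class+2+0", 0))  # True
print(is_trained("class-2", 2))    # False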
The notion of training or not training a unit in the above description refers to adding up
weight changes for incoming links and the unit's bias value. After one chunk has been
completed, each link weight is individually trained (or not), based on its own update count.
The learning rate is normalised accordingly.
The parameters this function requires are the following; a sketch of the chunkwise update
scheme is given after the list:
η : learning parameter, specifies the step width of the gradient descent as with
Std Backpropagation. Use the same values as there (0.2 to 0.5).
dmax : the maximum training output differences as with Std Backpropagation. Usu-
ally set to 0.0.
N : chunk size. The number of patterns to be presented during training before an
update of the weights with the accumulated error will take place. Depending on the
overall size of the pattern set used, a value between 10 and 100 is suggested here.
lowerlimit : lower limit for the range of random noise to be added for each chunk.
upperlimit : upper limit for the range of random noise to be added for each chunk.
If both upper and lower limit are 0.0, no weights jogging takes place.
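The sketch below illustrates the chunkwise update scheme with optional weight jogging.
It is not SNNS code; grad(w, pattern) is a placeholder standing for the backpropagated
gradient for one pattern, and the parameter values are the suggested ranges from above.

import numpy as np

def train_chunkwise(w, patterns, grad, eta=0.2, chunk_size=50,
                    lower=0.0, upper=0.0, rng=np.random.default_rng(0)):
    acc = np.zeros_like(w)       # accumulated weight changes
    count = 0
    for p in patterns:
        if count == 0 and (lower != 0.0 or upper != 0.0):
            # jog weights with small random noise before each chunk
            w = w + rng.uniform(lower, upper, size=w.shape)
        acc += -eta * grad(w, p)
        count += 1
        if count == chunk_size:  # apply the accumulated changes
            w = w + acc
            acc[:] = 0.0
            count = 0
    if count > 0:                # flush an incomplete last chunk
        w = w + acc
    return w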
9.2 Quickprop
One method to speed up the learning is to use information about the curvature of the
error surface. This requires the computation of the second order derivatives of the error
function. Quickprop assumes the error surface to be locally quadratic and attempts to
jump in one step from the current position directly into the minimum of the parabola.
Quickprop [Fah88] computes the derivatives in the direction of each weight. After com-
puting the first gradient with regular backpropagation, a direct step to the error minimum
is attempted by

\Delta w_{ij}(t+1) = \frac{S(t+1)}{S(t) - S(t+1)} \, \Delta w_{ij}(t)

where:
w_{ij}                weight between units i and j
\Delta w_{ij}(t+1)    actual weight change
S(t+1)                partial derivative of the error function with respect to w_{ij}
S(t)                  the last partial derivative
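A hedged sketch of this step for a single weight follows. The maximum growth factor mu
and the fallback gradient step for the first iteration are assumptions taken from Fahlman's
original proposal, not from the formula above.

def quickprop_step(s_prev, s_now, dw_prev, eta=0.1, mu=1.75):
    """Return the new weight change dw(t+1)."""
    if dw_prev == 0.0:
        return -eta * s_now          # first step: ordinary gradient step
    denom = s_prev - s_now
    if denom == 0.0 or abs(s_now / denom) > mu:
        return mu * dw_prev          # cap the step at mu times the last one
    return (s_now / denom) * dw_prev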
9.3 RPROP
9.3.1 Changes in Release 3.3
The implementation of Rprop has been changed in two ways: First, the implementation
now follows a slightly modified adaptation scheme. Essentially, the backtracking step is
no longer performed if a jump over a minimum occurred. Second, a weight-decay term is
introduced. The weight-decay parameter (the third learning parameter) determines the
relationship of two goals, namely to reduce the output error (the standard goal) and to
reduce the size of the weights (to improve generalization). The composite error function
is:

E = \sum_i (t_i - o_i)^2 + 10^{-\alpha} \sum_{i,j} w_{ij}^2
Important: Please note that the weight decay parameter denotes the exponent, to allow
comfortable input of very small weight-decay values. A choice of the third learning parameter
\alpha = 4 corresponds to a ratio of weight decay term to output error of 1 : 10000 (1 : 10^4).
The weight update rule of Rprop is

\Delta w_{ij}(t) = \begin{cases} -\Delta_{ij}(t) & \text{if } \frac{\partial E}{\partial w_{ij}}(t) > 0 \\ +\Delta_{ij}(t) & \text{if } \frac{\partial E}{\partial w_{ij}}(t) < 0 \\ 0 & \text{else} \end{cases} \qquad (9.1)

where \frac{\partial E}{\partial w_{ij}}(t) denotes the summed gradient information over all patterns of the pattern
set ('batch learning').
It should be noted that by replacing \Delta_{ij}(t) by a constant update-value \Delta, equation
(9.1) yields the so-called 'Manhattan' update rule.
The second step of Rprop learning is to determine the new update-values \Delta_{ij}(t). This is
based on a sign-dependent adaptation process:

\Delta_{ij}(t) = \begin{cases} \eta^{+} \cdot \Delta_{ij}(t-1) & \text{if } \frac{\partial E}{\partial w_{ij}}(t-1) \cdot \frac{\partial E}{\partial w_{ij}}(t) > 0 \\ \eta^{-} \cdot \Delta_{ij}(t-1) & \text{if } \frac{\partial E}{\partial w_{ij}}(t-1) \cdot \frac{\partial E}{\partial w_{ij}}(t) < 0 \\ \Delta_{ij}(t-1) & \text{else} \end{cases} \qquad (9.2)

In order to reduce the number of freely adjustable parameters, often leading to a tedious
search in parameter space, the increase and decrease factor are set to fixed values
(\eta^{-} := 0.5, \eta^{+} := 1.2).
Since Rprop tries to adapt its learning process to the topology of the error function, it
follows the principle of 'batch learning' or 'learning by epoch'. That means that weight
update and adaptation are performed after the gradient information of the whole pattern
set has been computed.
9.3.3 Parameters
The Rprop algorithm takes three parameters: the initial update-value \Delta_0, a limit for the
maximum step size, \Delta_{max}, and the weight-decay exponent \alpha (see above).
When learning starts, all update-values are set to an initial value \Delta_0. Since \Delta_0 directly
determines the size of the first weight step, it should be chosen according to the initial
values of the weights themselves, for example \Delta_0 = 0.1 (default setting). The choice of
this value is rather uncritical, for it is adapted as learning proceeds.
In order to prevent the weights from becoming too large, the maximum weight step, de-
termined by the size of the update-value, is limited. The upper bound is set by the
second parameter of Rprop, \Delta_{max}. The default upper bound is set somewhat arbitrarily
to \Delta_{max} = 50.0. Usually, convergence is rather insensitive to this parameter as well. Nev-
ertheless, for some problems it can be advantageous to allow only very cautious (namely
small) steps, in order to prevent the algorithm from getting stuck too quickly in suboptimal
local minima. The minimum step size is constantly fixed to \Delta_{min} = 1e-6.
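A hedged sketch of one Rprop step for a single weight, following equations (9.1) and (9.2)
with the fixed factors above; the weight-decay term is omitted for brevity and the function
name is an assumption:

def rprop_step(grad_prev, grad_now, delta, w,
               delta_max=50.0, delta_min=1e-6,
               eta_minus=0.5, eta_plus=1.2):
    """Return the new weight and the new update-value delta."""
    sign_change = grad_prev * grad_now
    if sign_change > 0:
        delta = min(delta * eta_plus, delta_max)    # same sign: grow the step
    elif sign_change < 0:
        delta = max(delta * eta_minus, delta_min)   # sign flip: shrink the step
    # equation (9.1): step against the sign of the current gradient
    if grad_now > 0:
        w -= delta
    elif grad_now < 0:
        w += delta
    return w, delta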
9.4.1 Parameters
To keep the relation to the previous Rprop implementation, the first three parameters still
have the same semantics. However, since tuning of the first two parameters has almost no
positive influence on the generalization error, we recommend keeping them constant, i.e.
the first parameter (initial step size) is set to 0.001 or smaller, and the second parameter
(the maximal step size) is set to 0.1 or smaller. There is no need for larger values, since
the weight-decay regularizer keeps the weights small anyway. Larger values might only
disturb the learning process. The third parameter determines the initial weighting λ of
the weight-decay regularizer, which is updated during the learning process. The fourth
parameter specifies how often the weighting parameter is updated, e.g. every 50 epochs.
The algorithm for determining λ assumes that the network was trained to a local minimum
of the current error function, and then re-estimates λ, thus changing the error function.
The fourth parameter should therefore be set in a way that the network has had the chance
to learn something sensible.
The fifth parameter allows selecting different error functions:
0: Sum-square error for regression problems
1: Cross-entropy error for classification problems with two classes. The output neuron
needs to have a sigmoid activation function, e.g., a range from 0 to 1.
2: Multiple cross-entropy function for classification problems with several classes. The
output neurons need to have the softmax activation function.
For a discussion of error functions see also the book by C. Bishop.
The theorem of Bayes is used within the Bayesian framework to relate the posterior
distribution of weights p(w|D) (i.e. after using the data D) to a prior assumption about
the weights p(w) and the noise in the target data, respectively the likelihood p(D|w), i.e.
to which extent the model is consistent with the observed data:

p(w|D) = \frac{p(D|w) \, p(w)}{p(D)}

One can show that the weight-decay regularizer corresponds to the assumption that the
weights are normally distributed with mean 0. We are minimizing the error function
E = β E_D + α E_W, where E_D is the error of the neural network (e.g. sum square error)
and E_W is a regularization term (e.g. weight-decay). Making use of the MAP approach
(maximum a posteriori) we can adapt α and β from time to time during the learning process.
Under the assumption that the weights have a Gaussian distribution with zero mean and
variance 1/α and that the error also has a Gaussian distribution with variance 1/β, one can
adjust these two hyper-parameters by maximizing the evidence, which is the a-posteriori
probability of α and β. Setting λ = α/β every few epochs, the hyper-parameters are re-
estimated by α_new = W / \sum_i w_i^2 and β_new = N / E_D, where W is the number of weights and
N is the number of patterns. The iterative approach is necessary since we are interested in
the most probable weight vector and the values for α and β. This problem is resolved by
first adjusting the weights, and then re-estimating the hyper-parameters with a fixed weight
vector.
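A short sketch of this periodic re-estimation follows. The symbols α and β follow the
reconstruction above and are an assumption about the original notation; the function is
illustrative, not part of SNNS.

import numpy as np

def reestimate_hyperparameters(weights, sum_squared_error, n_patterns):
    """Return (alpha, beta): alpha_new = W / sum(w_i^2), beta_new = N / E_D."""
    w = np.asarray(weights, dtype=float)
    alpha = w.size / np.sum(w ** 2)        # prior precision of the weights
    beta = n_patterns / sum_squared_error  # noise precision of the data
    return alpha, beta

# The effective weight-decay weighting used during learning would then be alpha / beta.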
Note that the method does not need a validation set, but all parameters are solely deter-
mined during the training process, i.e. there is more data to train and test the model.
In practical applications results are better when the initial guess for the weight decay is
good. This reduces the number of necessary iterations as well as the probability of overfitting
heavily in the beginning. An initial guess can be obtained by dividing the training set into
two sets and determining the weight decay 'by hand' as in the standard case.
See also the Readme file for the rpropMAP network in the examples directory.
9.5 Backpercolation
Backpercolation 1 (Perc1) is a learning algorithm for feedforward networks. Here the
weights are not changed according to the error of the output layer as in backpropagation,
but according to a unit error that is computed separately for each unit. This effectively
reduces the number of training cycles needed.
The algorithm consists of five steps:
1. A pattern is propagated through the network and the global error is computed.
2. The gradient δ is computed and propagated back through the hidden layers as in
backpropagation.
3. The error in the activation of each hidden neuron is computed. This error specifies
the value by which the output of this neuron has to change in order to minimize the
global error Err.
4. All weight parameters are changed according to this error.
5. If necessary, an adaptation of the error magnifying parameter is performed once
every learning epoch.
The third step is divided into two phases: First each neuron receives a message
specifying the proposed change in the activation of the neuron (message creation, MCR).
Then each neuron combines the incoming messages into an optimal compromise, the internal
error of the neuron (message optimization, MOP). The MCR phase is performed in
forward direction (from input to output), the MOP phase backwards.
The internal error of an output unit k is defined as the difference between its desired and
actual activation, multiplied by the global error magnification parameter λ.
Unlike backpropagation, Perc1 does not have a learning parameter. Instead it has the error
magnification parameter λ. This parameter may be adapted after each epoch if the total
mean error of the network falls below a threshold value.
When using backpercolation with a network in SNNS the initialization function Random
Weights Perc and the activation function Act TanH Xdiv2 should be used.
9.6 Counterpropagation
9.6.1 Fundamentals
Counterpropagation was originally proposed as a pattern-lookup system that takes ad-
vantage of the parallel architecture of neural networks. Counterpropagation is useful in
pattern mapping and pattern completion applications and can also serve as a sort of
bidirectional associative memory.
When presented with a pattern, the network classifies that pattern by using a learned
reference vector. The hidden units play a key role in this process, since the hidden layer
performs a competitive classification to group the patterns. Counterpropagation works
best on tightly clustered patterns in distinct groups.
Two types of layers are used: The hidden layer is a Kohonen layer with competitive
units that do unsupervised learning; the output layer is a Grossberg layer, which is fully
connected with the hidden layer and is not competitive.
When trained, the network works as follows. After presentation of a pattern in the input
layer, the units in the hidden layer sum their inputs according to

net_j = \sum_i w_{ij} o_i

and then compete to respond to that input pattern. The unit with the highest net input
wins and its activation is set to 1 while all others are set to 0. After the competition, the
output layer does a weighted sum on the outputs of the hidden layer:

a_k = net_k = \sum_j w_{jk} o_j

Let c be the index of the winning hidden layer neuron. Since o_c is the only nonzero element
in the sum, which in turn is equal to one, this can be reduced to

a_k = w_{ck}
Thus the winning hidden unit activates a pattern in the output layer.
During training, the weights are adapted as follows (a sketch of one training step is given
after this list):
1. A winner of the competition is chosen in response to an input pattern.
2. The weights between the input layer and the winner are adjusted according to

w_{ic}(t+1) = w_{ic}(t) + \alpha (o_i - w_{ic}(t))

All the other weights remain unchanged.
3. The output of the network is computed and compared to the target pattern.
4. The weights between the winner and the output layer are updated according to

w_{ck}(t+1) = w_{ck}(t) + \beta (o_k - w_{ck}(t))

All the other weights remain unchanged.
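The following sketch of one counterpropagation training step is illustrative only; the
matrix names and the values of the learning rates alpha and beta are assumptions, not
SNNS defaults.

import numpy as np

def cpn_train_step(W_in, W_out, x, target, alpha=0.3, beta=0.1):
    """W_in: (n_hidden, n_in) Kohonen weights, W_out: (n_hidden, n_out) Grossberg weights."""
    net = W_in @ x                        # net input of the hidden (Kohonen) units
    c = int(np.argmax(net))               # winner of the competition
    W_in[c] += alpha * (x - W_in[c])      # adapt only the weights leading to the winner
    W_out[c] += beta * (target - W_out[c])  # adapt the winner's weights to the output layer
    return W_in, W_out, W_out[c]          # W_out[c] is the network output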
9.6.2 Initializing Counterpropagation
For Counterpropagation networks three initialization functions are available: CPN Rand
Pat, CPN Weights v3.2, and CPN Weights v3.3. See section 4.6 for a detailed description
of these functions.
Note:
In SNNS versions 3.2 and 3.3 only the initialization function CPN Weights was available.
Although it had the same name, there was a significant difference between the two versions.
The older version, still available now as CPN Weights v3.2, selected its values from the hy-
percube defined by the two initialization parameters. This resulted in an uneven distribu-
tion of these values after they had been normalized, thereby biasing the network towards a
certain (unknown) direction. The newer version, still available now as CPN Weights v3.3,
selected its values from the hypersphere defined by the two initialization parameters. This
resulted in an even distribution of these values after they had been normalized. However, it
had the disadvantage of having an exponential time complexity, thereby making it useless
for networks with more than about 15 input units. The influence of the parameters on
these two functions is given below.
Two parameters are used which represent the minimum (a) and maximum (b) of the range
out of which initial values for the second (Grossberg) layer are selected at random. The
vector w_i of weights leading to unit i of the Kohonen layer is initialized as a normalized
vector (length 1) drawn at random from part of a hypersphere (hypercube). Here, min
and max determine which part of the hyper body is used according to table 9.1.
min (a)    max (b)    vectors out of ...
a >= 0     b >= 0     positive sector
a >= 0     b < 0      whole hyper-sphere
a < 0      b >= 0     whole hyper-sphere
a < 0      b < 0      negative sector

Table 9.1: Influence of minimum and maximum on the initialization of weight vectors for
CPN and SOM.
BPTT: Backpropagation through time.
The error is propagated back through the stored activations of the earlier time steps,
and the weights are adapted using the formula for backpropagation with mo-
mentum term after each pattern. The momentum term uses the weight change
during the previous pattern. Using small learning rates eta, BPTT is especially use-
ful to start adaptation with a large number of patterns since the weights are updated
much more frequently than in batch update.
BBPTT: Batch backpropagation through time.
The gradient for each weight is calculated for each pattern as in BPTT and then
averaged over the whole training set. The momentum term uses update information
closer to the true gradient than in BPTT.
QPTT: Quickprop through time.
The gradient in quickprop through time is calculated as in BBPTT, but the weights
are adapted using the substantially more efficient quickprop update rule.
A recurrent network has to start processing a sequence of patterns with defined activations.
All activities in the network may be set to zero by applying an input pattern containing
only zero values. If such all-zero patterns are part of normal input patterns, an extra input
unit has to be added for reset control. If this reset unit is set to 1, the network is in the
free running mode. If the reset unit and all normal input units are set to 0, all activations
in the network are set to 0 and all stored activations are cleared as well.
The processing of an input pattern I (t) with a set of non-input activations ai(t) is per-
formed as follows:
1. The input pattern I (t) is copied to the input units to become a subset of the existing
unit activations ai(t) of the whole net.
2. If I(t) contains only zero activations, all activations a_i(t+1) and all stored activations
a_i(t), a_i(t-1), ..., a_i(t-backstep) are set to 0.0.
3. All activations ai(t + 1) are calculated synchronously using the activation function
and activation values ai(t).
4. During learning, an output pattern O(t) is always compared with the output subset
of the new activations ai(t + 1).
Therefore there is exactly one synchronous update step between an input and an output
pattern with the same pattern number.
If an input pattern has to be processed with more than one network update, there has
to be a delay between corresponding input and output patterns. If an output pattern oP
is the n-th pattern after an input pattern iP , the input pattern has been processed in
n + 1 update steps by the network. These n + 1 steps may correspond to n hidden layers
processing the pattern or a recurrent processing path through the network with n + 1
steps. Because of this pipelined processing of a pattern sequence, the number of hidden
layers that may develop during training in a fully recurrent network is influenced by the
delay between corresponding input and output patterns. If the network has a defined
hierarchical topology without shortcut connections between n different hidden layers, an
output pattern should be the n-th pattern after its corresponding input pattern in the
pattern file.
An example illustrating this relation is given with the delayed XOR network in the net-
work file xor-rec.net and the pattern files xor-rec1.pat and xor-rec2.pat. With the
patterns xor-rec1.pat, the task is to compute the XOR function of the previous input
pattern. In xor-rec2.pat, there is a delay of 2 patterns for the result of the XOR of
the input pattern. Using a fixed network topology with shortcut connections, the BPTT
learning algorithm develops solutions with a different number of processing steps, using
the shortcut connections from the first hidden layer to the output layer to solve the task
in xor-rec1.pat. To map the patterns in xor-rec2.pat the result is first calculated in
the second hidden layer and copied from there to the output layer during the next update
step.
The update function BPTT-Order performs the synchronous update of the network and
detects reset patterns. If a network is tested using the TEST button in the control panel,
the internal activations and the output activation of the output units are first overwritten
with the values in the target pattern, depending on the setting of the button SHOW .
To provide correct activations on feedback connections leading out of the output units
in the following network update, all output activations are copied to the units' initial
activation values i act after each network update and are copied back from i act to out
before each update. The non-input activation values may therefore be influenced before a
network update by changing the initial activation values i act.
If the network has to be reset by stepping over a reset pattern with the TEST button,
keep in mind that after clicking TEST , the pattern number is increased first, the new
input pattern is copied into the input layer second, and then the update function is called.
So to reset the network, the current pattern must be set to the pattern directly preceding
the reset pattern.
Cascade-Correlation (CC) combines two ideas: The first is the cascade architecture, in
which hidden units are added only one at a time and do not change after they have been
added. The second is the learning algorithm, which creates and installs the new hidden
units. For each new hidden unit, the algorithm tries to maximize the magnitude of the
correlation between the new unit's output and the residual error signal of the net.
The algorithm is realized in the following way:
1. CC starts with a minimal network consisting only of an input and an output layer.
Both layers are fully connected.
2. Train all the connections ending at an output unit with a usual learning algorithm
until the error of the net no longer decreases.
3. Generate the so-called candidate units. Every candidate unit is connected with all
input units and with all existing hidden units. Between the pool of candidate units
and the output units there are no weights.
4. Try to maximize the correlation between the activation of the candidate units and
the residual error of the net by training all the links leading to a candidate unit.
Learning takes place with an ordinary learning algorithm. The training is stopped
when the correlation score no longer improves.
5. Choose the candidate unit with the maximum correlation, freeze its incoming weights
and add it to the net. To change the candidate unit into a hidden unit, generate
links between the selected unit and all the output units. Since the weights leading
to the new hidden unit are frozen, a new permanent feature detector is obtained.
Loop back to step 2.
This algorithm is repeated until the overall error of the net falls below a given value.
Figure 9.2 shows a net after 3 hidden units have been added.
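The control flow of this algorithm can be sketched as follows. All network-specific
operations are passed in as functions, since they depend on the embedded learning
algorithm; none of these names belong to the SNNS API, and the default limits are
illustrative only.

def cascade_correlation(net, patterns, train_outputs, net_error,
                        make_candidates, train_candidates, install,
                        max_hidden=25, max_error=0.1):
    train_outputs(net, patterns)                      # step 2
    for _ in range(max_hidden):
        if net_error(net, patterns) < max_error:      # abort condition
            break
        candidates = make_candidates(net)             # step 3: connected to all
                                                      # inputs and hidden units
        train_candidates(net, candidates, patterns)   # step 4: maximize the correlation
        best = max(candidates, key=lambda c: c.correlation)
        install(net, best)                            # step 5: freeze incoming weights,
                                                      # connect to the output units
        train_outputs(net, patterns)                  # back to step 2
    return net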
The training of the output units tries to minimize the sum-squared error E:

E = \frac{1}{2} \sum_p \sum_o (y_{po} - t_{po})^2

where t_{po} is the desired and y_{po} is the observed output of the output unit o for a pattern
p. The error E is minimized by gradient descent using
Figure 9.2: A neural net trained with cascade-correlation after 3 hidden units have been
added. The vertical lines add all incoming activations. Connections with white boxes are
frozen. The black connections are trained repeatedly.
\frac{\partial E}{\partial w_{io}} = \sum_p (y_{po} - t_{po}) \, f_p' \, I_{ip}

where f_p' is the derivative of the activation function of an output unit o and I_{ip} is the value of
an input unit or a hidden unit i for a pattern p. w_{io} denotes the connection between
an input or hidden unit i and an output unit o.
After the training phase the candidate units are adapted, so that the correlation C between
the value ypo of a candidate unit and the residual error epo of an output unit becomes
maximal. The correlation is given by Fahlman with:
C = \sum_o \Big| \sum_p (y_{po} - \bar{y}_o)(e_{po} - \bar{e}_o) \Big|
  = \sum_o \Big| \sum_p y_{po} e_{po} - \bar{e}_o \sum_p y_{po} \Big|
  = \sum_o \Big| \sum_p y_{po} (e_{po} - \bar{e}_o) \Big|

where \bar{y}_o is the average activation of a candidate unit and \bar{e}_o is the average error of an
output unit over all patterns p. The maximization of C proceeds by gradient ascent using
\frac{\partial C}{\partial w_i} = \sum_{p,o} \sigma_o \, (e_{po} - \bar{e}_o) \, f_p' \, I_{ip}

where \sigma_o is the sign of the correlation between the candidate unit's output and the residual
error at output o.
This is a quite different modification, originally proposed by H. Klagges and M. Soegtrop.
The idea of LFCC is not to reduce the number of layers, but to reduce the Fan-In of
the units. Units with constant and smaller Fan-In are easier to build in hardware or on
massively parallel environments.
Every candidate unit (and thus every hidden unit) has a maximal Fan-In of k. If the number
of input units plus the number of installed hidden units is smaller than or equal to k, that is
no problem: the candidate gets inputs from all of them. If the number of possible
input connections exceeds k, a random set with cardinality k is chosen, which functions
as inputs for the candidate. Since every candidate could have a different set of inputs,
the correlation of the candidate is a measure for the usability of the chosen inputs. If this
modification is used, one should increase the number of candidate units (Klagges suggests
500 candidates).
In this approach the candidates are not trained to maximize the correlation with the
global error function. Only a good correlation with the error of a part of the output units
is necessary. If you want to use this modification there has to be more than one output
unit.
The algorithm works as follows:
Every candidate unit belongs to one of g groups (1 < g <= min(n_o, n_c), where n_o is the
number of output units and n_c the number of candidates). The output units are distributed
to the groups. The candidates are trained to maximize the correlation to the error of the
output units of their group. The best candidate of every group will be installed, so every
layer consists of g units.
As stated in [SB94] and [Gat96] the depth of the net can be reduced down to one hidden
layer with SDCC, RLCC or a static method for many problems. If the number of layers
is smaller than three or four, the number of needed units will increase; for deeper nets
the increase is low. There seems to be little difference between the three algorithms with
regard to generalization and the number of needed units.
LFCC reduces the depth too, but mainly the needed links. It is interesting that, for
example, the 2-spiral problem can be learned with 16 units with a Fan-In of 2 [Gat96]. But
the question remains how the generalization results have to be interpreted.
9.9.3 Pruned-Cascade-Correlation (PCC)
The aim of Pruned-Cascade-Correlation (PCC) is to minimize the expected test set error
instead of the actual training error [Weh94]. PCC tries to determine the optimal number
of hidden units and to remove unneeded weights after a new hidden unit is installed. As
pointed out by Wehrfritz, selection criteria or a hold-out set, as it is used in "stopped
learning", may be applied to prune away unneeded weights. In this release of SNNS,
however, only selection criteria for linear models are implemented.
The algorithm works as follows (CC steps are printed italic):
1. Train the connections to the output layer
2. Compute the selection criterion
3. Train the candidates
4. Install the new hidden neuron
5. Compute the selection criterion
6. Set each weight of the last inserted unit to zero and compute the selection criterion;
if there exists a weight, whose removal would decrease the selection criterion, remove
the link, which decreases the selection criterion most. Goto step 5 until a further
removal would increase the selection criterion.
7. Compute the selection criterion; if it is greater than the one computed before in-
serting the new hidden unit, notify the user that the net is getting too big.
In this release of SNNS, three model selection criteria are implemented: Schwarz's
Bayesian criterion (SBC), Akaike's information criterion (AIC) and the conservative mean
square error of prediction (CMSEP). The SBC, the default criterion, is more conservative
than the AIC. Thus, pruning via the SBC will produce smaller networks than
pruning via the AIC. Be aware that both SBC and AIC are selection criteria for linear
models, whereas the CMSEP does not rely on any statistical theory, but happens to work
pretty well in applications. These selection criteria for linear models can sometimes
be applied directly to nonlinear models if the sample size is large.
The RCC algorithm has been removed from the SNNS repository. It was unstable and
proved to be outperformed by Jordan and Elman networks in all applications tested.
Networks that make use of the cascade correlation architecture can be created in SNNS
in the same way as all other network types. The control of the training phase, however,
is moved from the control panel to the special cascade window described below. The
control panel is still used to specify the learning parameters, while the text eld CYCLE
does not specify as usual the number of learning cycles. This eld is used here to specify
the maximal number of hidden units to be generated during the learning phase. The
number of learning cycles is entered in the cascade window. The learning parameters for
the embedded learning functions Quickprop, Rprop and Backprop are described in chapter
4.4.
If the topology of a net is specified correctly, the program will automatically order the
units and layers from left to right in the following way: input layer, hidden layer, output
layer, and a candidate layer. 4 The hidden layer is generated with 5 units always having
the same x-coordinate (i.e. above each other on the display).
The cascade correlation control panel, the cascade window (see fig. 9.3), is opened by
clicking the Cascade button in the manager panel. The cascade window is needed to
set the parameters of the CC learning algorithm. To start Cascade Correlation, the learning
function CC, the update function CC Order and the init function CC Weights have to be
selected in the corresponding menus. If one of these functions is left out, a confirmer window
with an error message pops up and learning does not start. The init functions of cascade
differ from the normal init functions: upon initialization of a cascade net all hidden units
are deleted.
The cascade window has the following text elds, buttons and menus:
Global parameters:
{ Max. output unit error:
This value is used as abort condition for the CC learning algorithm. If the
error of every single output unit is smaller than the given value learning will
be terminated.
{ Learning function:
Here, the learning function used to maximize the covariance or to minimize the
net error can be selected from a pull-down menu. Available learning functions
are: Quickprop, Rprop, Backprop and Batch-Backprop.
{ Modification:
One of the modifications described in the chapters 9.9.2.1 to 9.9.2.6 can be
chosen. Default is no modification.
{ Print covariance and error:
If the YES button is on, the development of the error and the covariance of
every candidate unit is printed. NO prevents all outputs of the cascade steps.
4
The candidate units are realized as special units in SNNS.
Output Parameters:
{ Error change:
analogous to Min. covariance change
{ Output patience:
analogous to Candidate patience
{ Max. no. of epochs:
analogous to Max. no. of covariance updates
The button DELETE CAND. UNITS was deleted from this window. Now all candidates
are automatically deleted at the end of training.
9.10 Time Delay Networks (TDNNs)
9.10.1 TDNN Fundamentals
Time delay networks (or TDNNs for short), introduced by Alex Waibel ([WHH+ 89]), are
a group of neural networks that have a special topology. They are used for position-
independent recognition of features within a larger pattern. A special convention for
naming different parts of the network is used here (see figure 9.4).
(Figure 9.4 labels: 2nd feature unit, width, receptive field, delay length, total delay
length, coupled weights.)
they would experience if treated separately. Also the unit's bias, which realizes a
special sort of link weight, is duplicated over all delay steps of the current feature unit.
In figure 9.4 only two pairs of coupled links are depicted (out of 54 quadruples) for
simplicity reasons.
The activation of a unit is normally computed by passing the weighted sum of its inputs to
an activation function, usually a threshold or sigmoid function. For TDNNs this behavior
is modified through the introduction of delays. Now all the inputs of a unit are each
multiplied by the N delay steps defined for this layer. So a hidden unit in figure 9.4 would
get 6 undelayed input links from the six feature units, and 7x6 = 48 input links from the
seven delay steps of the 6 feature units, for a total of 54 input connections. Note that
all units in the hidden layer have 54 input links, but only those hidden units activated at
time 0 (at the top most row of the layer) have connections to the actual feature units. All
other hidden units have the same connection pattern, but shifted to the bottom (i.e. to a
later point in time) according to their position in the layer (i.e. delay position in time).
By building a whole network of time delay layers, the TDNN can relate inputs at different
points in time or input space.
Training in this kind of network is performed by a procedure similar to backpropagation,
that takes the special semantics of coupled links into account. To enable the network to
achieve the desired behavior, a sequence of patterns has to be presented to the input layer
with the feature shifted within the patterns. Remember that since each of the feature units
is duplicated for each frame shift in time, the whole history of activations is available at
once. But since the shifted copies of the units are mere duplicates looking for the same
event, weights of the corresponding connections between the time shifted copies have to be
treated as one. First, a regular forward pass of backpropagation is performed, and the error
in the output layer is computed. Then the error derivatives are computed and propagated
backward. This yields different correction values for corresponding connections. Now all
correction values for corresponding links are averaged and the weights are updated with
this value.
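A hedged sketch of this coupled-link update follows; the array names and the learning rate
are assumptions, and "groups" stands for the sets of time-shifted link copies that share one
logical weight.

import numpy as np

def update_coupled_weights(weights, derivatives, groups, eta=0.2):
    """weights, derivatives: flat arrays over all links;
    groups: lists of link indices that share one logical weight."""
    for group in groups:
        mean_deriv = np.mean(derivatives[group])   # average the corrections of all copies
        weights[group] -= eta * mean_deriv         # apply the same change to every copy
    return weights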
This update algorithm forces the network to train on time/position independent detection
of sub-patterns. This important feature of TDNNs makes them independent of error-
prone preprocessing algorithms for time alignment. The drawback is, of course, a rather
long, computationally intensive learning phase.
The original time delay algorithm was slightly modified for implementation in SNNS, since
it requires either variable network sizes or fixed length input patterns. Time delay networks
in SNNS are allowed no delay in the output layer. This has the following consequences:
The input layer has a fixed size.
Not the whole pattern is present at the input layer at once. Therefore one pass
through the network is not enough to compute all necessary weight changes. This
makes learning more computationally intensive.
The coupled links are implemented as one physical (i.e. normal) link and a set of logical
links associated with it. Only the physical links are displayed in the graphical user in-
terface. The bias of all delay units has no eect. Instead, the bias of the corresponding
feature unit is used during propagation and backpropagation.
9.10.2.1 Activation Function
For time delay networks the new activation function Act TD Logistic has been imple-
mented. It is similar to the regular logistic activation function Act Logistic but takes
care of the special coupled links. The mathematical notation is again

a_j(t+1) = \frac{1}{1 + e^{-(\sum_i w_{ij} o_i(t) - \theta_j)}}

where o_i now also includes the predecessor units along logical links.
9.10.2.2 Update Function
The update function TimeDelay Order is used to propagate patterns through a time delay
network. Its behavior is analogous to the Topological Order function but with recognition
of logical links.
9.10.2.3 Learning Function
The learning function TimeDelayBackprop implements the modied backpropagation algo-
rithm discussed above. It uses the same learning parameters as standard backpropagation.
9.10.3 Building and Using a Time Delay Network
In SNNS, TDNNs should be generated only with the tool BIGNET (Time Delay). This
program automatically defines the necessary variables and link structures of TDNNs. The
logical links are not depicted in the displays and can not be modified with the graphical
editor. Any modifications of the units after the creation of the network may result in
undesired behavior or even system failure!
After the creation of the net, the unit activation function Act TD Logistic, the update
function TimeDelay Order, and the learning function TimeDelayBackprop have to be
assigned in the usual way.
NOTE: Only after the special time delay learning function has been assigned, will a save
of the network also save the special logical links! A network saved beforehand will lack
these links and be useless after a later load operation. Also using the TEST and STEP
button will destroy the special time delay information unless the right update function
(TimeDelay Order) has been chosen.
Patterns must fit the input layer. If the application requires variable pattern length, a
tool to segment these patterns into fitting pieces has to be applied. Patterns may also
be generated with the graphical user interface. In this case, it is the responsibility of the
user to supply enough patterns with time shifted features for the same teaching output to
allow a successful training.
f(\vec{x}) = \sum_{i=1}^{K} c_i \, h(|\vec{x} - \vec{t}_i|)

Here h is the radial basis function and \vec{t}_i are the K centers which have to be selected. The
coefficients c_i are also unknown at the moment and have to be computed. \vec{x}_i and \vec{t}_i are
elements of an n-dimensional vector space.
h is applied to the Euclidean distance between each center \vec{t}_i and the given argument \vec{x}.
Usually a function h which has its maximum at a distance of zero is used, most often the
Gaussian function. In this case, values of \vec{x} which are equal to a center \vec{t} yield an output
value of 1.0 for the function h, while the output becomes almost zero for larger distances.
The function f should be an approximation of the N given pairs (\vec{x}_i, y_i) and should
therefore minimize the following error function H:

H[f] = \sum_{i=1}^{N} (y_i - f(\vec{x}_i))^2 + \lambda \|Pf\|^2

The first part of the definition of H (the sum) is the condition which minimizes the total
error of the approximation, i.e. which constrains f to approximate the N given points.
The second part of H (\lambda \|Pf\|^2) is a stabilizer which forces f to become as smooth as
possible. The factor \lambda determines the influence of the stabilizer.
Under certain conditions it is possible to show that a set of coefficients c_i can be calculated
so that H becomes minimal. This calculation depends on the centers \vec{t}_i which have to be
chosen beforehand.
Introducing the following vectors and matrices

\vec{c} = (c_1, \ldots, c_K)^T, \qquad \vec{y} = (y_1, \ldots, y_N)^T

G = \begin{pmatrix} h(|\vec{x}_1 - \vec{t}_1|) & \cdots & h(|\vec{x}_1 - \vec{t}_K|) \\ \vdots & \ddots & \vdots \\ h(|\vec{x}_N - \vec{t}_1|) & \cdots & h(|\vec{x}_N - \vec{t}_K|) \end{pmatrix}, \qquad G_2 = \begin{pmatrix} h(|\vec{t}_1 - \vec{t}_1|) & \cdots & h(|\vec{t}_1 - \vec{t}_K|) \\ \vdots & \ddots & \vdots \\ h(|\vec{t}_K - \vec{t}_1|) & \cdots & h(|\vec{t}_K - \vec{t}_K|) \end{pmatrix}

the set of coefficients which minimizes H can be computed directly as

\vec{c} = (G^T G + \lambda G_2)^{-1} G^T \vec{y}
By setting \lambda to 0 this formula becomes identical to the computation of the Moore-Penrose
inverse matrix, which gives the best solution of an under-determined system of linear
equations. In this case, the linear system is exactly the one which follows directly from
the conditions of an exact interpolation of the given problem:

f(\vec{x}_j) = \sum_{i=1}^{K} c_i \, h(|\vec{x}_j - \vec{t}_i|) \overset{!}{=} y_j, \qquad j = 1, \ldots, N
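A hedged sketch of this direct computation of the coefficients, using a Gaussian basis
function, is given below. The closed form c = (G^T G + lambda G_2)^{-1} G^T y follows
the reconstruction above and is stated here as an assumption; the function names are
illustrative only.

import numpy as np

def gaussian(r, sigma=1.0):
    return np.exp(-(r ** 2) / (2.0 * sigma ** 2))

def rbf_coefficients(X, T, y, lam=0.0, h=gaussian):
    """X: (N, n) inputs, T: (K, n) centers, y: (N,) targets."""
    G = h(np.linalg.norm(X[:, None, :] - T[None, :, :], axis=2))   # N x K
    G2 = h(np.linalg.norm(T[:, None, :] - T[None, :, :], axis=2))  # K x K
    if lam == 0.0:
        # Moore-Penrose solution of the interpolation conditions G c = y
        return np.linalg.pinv(G) @ y
    return np.linalg.solve(G.T @ G + lam * G2, G.T @ y)

def rbf_eval(x, T, c, h=gaussian):
    """Evaluate f(x) = sum_i c_i h(|x - t_i|)."""
    return c @ h(np.linalg.norm(T - x, axis=1))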
The method of radial basis functions can easily be represented by a three layer feedforward
neural network. The input layer consists of n units which represent the elements of the
vector \vec{x}. The K components of the sum in the definition of f are represented by the units
of the hidden layer. The links between input and hidden layer contain the elements of the
vectors \vec{t}_i. The hidden units compute the Euclidean distance between the input pattern
and the vector which is represented by the links leading to this unit. The activation of the
hidden units is computed by applying the Euclidean distance to the function h. Figure 9.5
shows the architecture of this special form of hidden units.
Figure 9.5: Architecture of the special hidden units used for radial basis functions: the input units x_1 … x_n are connected to each hidden unit by the center weights t_{i,j}, each hidden unit computes h(‖~x − ~t_j‖), and the weights c_1 … c_K (together with the shortcut weights d_1 … d_n) lead to the output unit o.
The single output neuron gets its input from all hidden neurons. The links leading to the output neuron hold the coefficients c_i. The activation of the output neuron is determined by the weighted sum of its inputs.
The previously described architecture of a neural net, which realizes an approximation using radial basis functions, can easily be expanded with some useful features: More than one output neuron is possible, which allows the approximation of several functions f around the same set of centers ~t_i. The activation of the output units can be calculated by using a nonlinear invertible function σ (e.g. the sigmoid function). The bias of the output neurons and a direct connection between input and output layer (shortcut connections) can be used to improve the approximation quality. The bias of the hidden units can be used to modify the characteristics of the function h. All in all, such a neural network is able to represent the following set of approximations:
following set of approximations:
0 1
K n
ok (~x) = @ cj;k h j~x t~j j; pj + di;k xi + bk A = (fk (~x)) ; k = 1; : : : ; m
X X
j =1 i=1
This formula describes the behavior of a fully connected feedforward net with n input, K hidden and m output neurons. o_k(~x) is the activation of output neuron k on the input ~x = x_1, x_2, …, x_n to the input units. The coefficients c_{j,k} represent the links between hidden and output layer. The shortcut connections from input to output are realized by the d_{i,k}. b_k is the bias of the output units and p_j is the bias of the hidden neurons, which determines the exact characteristics of the function h. The activation function of the output neurons is represented by σ.
The big advantage of the method of radial basis functions is the possibility of a direct computation of the coefficients c_{j,k} (i.e. the links between hidden and output layer) and the bias b_k. This computation requires a suitable choice of centers ~t_j (i.e. the links between input and hidden layer). Because of the lack of knowledge about the quality of the ~t_j, it is recommended to append some cycles of network training after the direct computation of the weights. Since the weights of the links leading from the input to the output layer can also not be computed directly, there must be a special training procedure for neural networks that use radial basis functions.
The implemented training procedure tries to minimize the error E by using gradient descent. It is recommended to use different learning rates for different groups of trainable parameters. The following set of formulas contains all the information needed by the training procedure:

E = Σ_{k=1}^{m} Σ_{i=1}^{N} (y_{i,k} − o_k(~x_i))²

Δ~t_j = −η1 ∂E/∂~t_j ;    Δp_j = −η2 ∂E/∂p_j

Δc_{j,k} = −η3 ∂E/∂c_{j,k} ;    Δd_{i,k} = −η3 ∂E/∂d_{i,k} ;    Δb_k = −η3 ∂E/∂b_k
It is often helpful to use a momentum term. This term increases the effective learning rate in smooth regions of the error surface and decreases it in rough regions. The next formula describes the effect of a momentum term on the training of a general parameter g, depending on the additional momentum parameter μ. Δg_{t+1} is the change of g during time step t+1, while Δg_t is the change during time step t:

Δg_{t+1} = −η ∂E/∂g + μ Δg_t

Another useful improvement of the training procedure is the definition of a maximum allowed error inside the output neurons. This prevents the network from becoming overtrained, since errors that are smaller than the predefined value are treated as zero. This in turn prevents the corresponding links from being changed.
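The following C fragment is a minimal sketch of the two ideas above, not the SNNS kernel code: a single trainable parameter g is changed by the negative gradient plus a momentum term, and output errors below the delta max cutoff are treated as zero. All type and function names are illustrative.

#include <math.h>

typedef struct {
    double value;       /* current value of the trainable parameter g      */
    double last_delta;  /* delta g of the previous step (for the momentum) */
} Param;

/* Residual used for training: errors below delta_max count as zero. */
static double clipped_error(double target, double output, double delta_max)
{
    double e = target - output;
    return (fabs(e) <= delta_max) ? 0.0 : e;
}

/* One update of g: delta g(t+1) = -eta * dE/dg + mu * delta g(t). */
static void update_param(Param *g, double dE_dg, double eta, double mu)
{
    double delta = -eta * dE_dg + mu * g->last_delta;
    g->value += delta;
    g->last_delta = delta;
}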
9.11.2 RBF Implementation in SNNS
9.11.2.1 Activation Functions
For the use of radial basis functions, three different activation functions h have been implemented. For computational efficiency the square of the distance, r² = ‖~x − ~t‖², is uniformly used as the argument of h. Also, an additional argument p has been defined which represents the bias of the hidden units. The vectors ~x and ~t result from the activation and weights of the links leading to the corresponding unit. The following radial basis functions have been implemented:
1. Act RBF Gaussian, the Gaussian function:
h(r², p) = h(q, p) = e^(−pq) ,  where q = ‖~x − ~t‖²
During the construction of three-layered neural networks based on radial basis functions, it is important to use the three activation functions mentioned above only for neurons inside the hidden layer. Also, only one hidden layer is allowed.
For the output layer two other activation functions are to be used:
1. Act IdentityPlusBias
2. Act Logistic
Act IdentityPlusBias activates the corresponding unit with the weighted sum of all incoming activations and adds the bias of the unit. Act Logistic applies the sigmoid logistic function to the weighted sum, which is computed as in Act IdentityPlusBias. In general, it is necessary to use an activation function which takes the bias of the unit into account.
The last two activation functions converge towards infinity, the first converges towards zero. However, all three functions are useful as base functions. The mathematical preconditions for their use are fulfilled by all three functions, and their use is backed by practical experience. All three functions have been implemented as base functions in SNNS.
The most frequently used base function is the Gaussian function. For large distances r, the Gaussian function becomes almost 0. Therefore, the behavior of the net is easy to predict if the input patterns differ strongly from all teaching patterns. Another advantage of the Gaussian function is that the network is able to produce useful results without the use of shortcut connections between input and output layer.
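As a small illustration of the Gaussian base function, the following C sketch computes the activation of one hidden unit from the squared Euclidean distance between the input vector and the center stored in its incoming link weights. The helper name and interface are made up for this example and do not mirror the SNNS kernel.

#include <math.h>

/* h(q, p) = exp(-p*q) with q = |x - t|^2; p is the bias of the hidden unit. */
static double rbf_gaussian(const double *x, const double *t, int n, double p)
{
    double q = 0.0;                     /* q = squared Euclidean distance */
    for (int i = 0; i < n; i++) {
        double d = x[i] - t[i];
        q += d * d;
    }
    return exp(-p * q);                 /* h(q, p) = e^(-pq) */
}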
9.11.2.2 Initialization Functions
The goal in initializing a radial basis function network is the optimal computation of the link weights between hidden and output layer. Here the problem arises that the centers ~t_j (i.e. the link weights between input and hidden layer) as well as the parameter p (i.e. the bias of the hidden units) must be set properly. Therefore, three different initialization procedures have been implemented which perform different tasks:
1. RBF Weights: This procedure first selects evenly distributed centers ~t_j from the loaded training patterns and assigns them to the links between input and hidden layer. Subsequently the bias of all neurons inside the hidden layer is set to a value determined by the user, and finally the links between hidden and output layer are computed. Parameters and suggested values are: 0 scale (0), 1 scale (1), smoothness (0), bias (0.02), deviation (0).
2. RBF Weights Redo: In contrast to the preceding procedure, only the links between hidden and output layer are computed. All other links and biases remain unchanged.
3. RBF Weights Kohonen: Using the self-organizing method of Kohonen feature maps, appropriate centers are generated on the basis of the teaching patterns. The computed centers are copied into the corresponding links. No other links or biases are changed.
It is necessary that valid patterns are loaded into SNNS before using the initialization. If no patterns are present upon starting any of the three procedures, an alert box will appear showing the error. A detailed description of the procedures and the parameters used is given in the following paragraphs.
RBF Weights  Of the three named procedures, RBF Weights is the most comprehensive one. Here all necessary initialization tasks (setting link weights and bias) for a fully connected three-layer feedforward network (without shortcut connections) can be performed in one single step. Hence, the choice of centers (i.e. the link weights between input and hidden layer) is rather simple: The centers are evenly selected from the loaded teaching patterns and assigned to the links of the hidden neurons. The selection process assigns the first teaching pattern to the first hidden unit and the last pattern to the last hidden unit. The remaining hidden units receive centers which are evenly picked from the set of teaching patterns. If, for example, 13 teaching patterns are loaded and the hidden layer consists of 5 neurons, then the patterns with numbers 1, 4, 7, 10 and 13 are selected as centers.
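The even selection can be sketched in C as follows; the helper is purely illustrative and simply interpolates the pattern index linearly between the first and the last pattern (for 13 patterns and 5 hidden units it yields 1, 4, 7, 10 and 13).

/* Pattern index (1..N) assigned to hidden unit j (1..K). */
static int center_pattern_index(int j, int K, int N)
{
    if (K == 1) return 1;
    /* linear interpolation between pattern 1 and pattern N, rounded */
    return 1 + (int)(((double)(j - 1) * (N - 1)) / (K - 1) + 0.5);
}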
Before a selected teaching pattern is distributed among the corresponding link weights, it can be modified slightly with a random number. For this purpose, an initialization parameter (deviation, parameter 5) is set, which determines the maximum percentage of deviation allowed to occur randomly. To calculate the deviation, an inverse tangent function is used to approximate a normal distribution, so that small deviations are more probable than large deviations. Setting the parameter deviation to 1.0 results in a maximum deviation of 100%. The centers are copied unchanged into the link weights if the deviation is set to 0.
A small modification of the centers is recommended for the following reasons: First, the number of hidden units may exceed the number of teaching patterns. In this case it is necessary to break the symmetry which would result without modification. This symmetry would render the calculation of the Moore-Penrose inverse matrix impossible. The second reason is that there may be a few anomalous patterns inside the set of teaching patterns. These patterns would cause bad initialization results if they were accidentally selected as a center. By adding a small amount of noise, the negative effect caused by anomalous patterns can be lowered. However, if an exact interpolation is to be performed, no modification of centers may be allowed.
The next initialization step is to set the free parameter p of the base function h, i.e. the bias of the hidden neurons. In order to do this, the initialization parameter bias (p) (parameter 4) is directly copied into the bias of all hidden neurons. The setting of the bias is highly related to the base function h used and to the properties of the teaching patterns. When the Gaussian function is used, it is recommended to choose the value of the bias so that 5-10% of all hidden neurons are activated during propagation of every single teaching pattern. If the bias is chosen too small, almost all hidden neurons are uniformly activated during propagation. If the bias is chosen too large, only that hidden neuron is activated whose center vector corresponds to the currently applied teaching pattern.
Now the expensive initialization of the links between hidden and output layer is actually performed. In order to do this, the following formulas, which were already presented above, are applied:

~c = (GᵀG + λ G2)⁻¹ Gᵀ ~y

f_j(~x) = Σ_{i=1}^{K} c_{i,j} h_i(~x) + b_j

The bias of the output neuron(s) is directly set to the calculated value b_j. Therefore, it is necessary to choose an activation function for the output neurons that uses the bias of the neurons. In the current version of SNNS, the functions Act Logistic and Act IdentityPlusBias implement this feature.
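The following C sketch illustrates, under simplifying assumptions, how the link weights of one output unit could be computed directly: the matrix G of hidden unit responses and the matrix G2 of responses at the centers are combined into the regularized normal equations (GᵀG + λ G2) c = Gᵀ y, which are then solved by Gaussian elimination. This is only a sketch of the formula above (the output bias b_j and the scaling of the teaching outputs are not handled here); none of the names correspond to the actual SNNS sources, and λ is assumed to play the role of the smoothness parameter.

#include <math.h>
#include <stdlib.h>

/* Solve A*c = r (A is n x n, row-major) by Gaussian elimination with
 * partial pivoting.  Returns 0 on success, -1 if A is (nearly) singular. */
static int solve_linear(int n, double *A, double *r, double *c)
{
    for (int k = 0; k < n; k++) {
        int piv = k;
        for (int i = k + 1; i < n; i++)
            if (fabs(A[i*n + k]) > fabs(A[piv*n + k])) piv = i;
        if (fabs(A[piv*n + k]) < 1e-12) return -1;
        if (piv != k) {
            for (int j = 0; j < n; j++) {
                double t = A[k*n + j]; A[k*n + j] = A[piv*n + j]; A[piv*n + j] = t;
            }
            double t = r[k]; r[k] = r[piv]; r[piv] = t;
        }
        for (int i = k + 1; i < n; i++) {
            double f = A[i*n + k] / A[k*n + k];
            for (int j = k; j < n; j++) A[i*n + j] -= f * A[k*n + j];
            r[i] -= f * r[k];
        }
    }
    for (int i = n - 1; i >= 0; i--) {          /* back substitution */
        double s = r[i];
        for (int j = i + 1; j < n; j++) s -= A[i*n + j] * c[j];
        c[i] = s / A[i*n + i];
    }
    return 0;
}

/* G: N x K hidden responses on the patterns, G2: K x K responses at the
 * centers, y: N teaching values, lambda: smoothness.  On success c holds
 * the K weights of the links from the hidden units to the output unit. */
static int rbf_output_weights(int N, int K, const double *G, const double *G2,
                              const double *y, double lambda, double *c)
{
    double *A = calloc((size_t)K * K, sizeof(double));
    double *r = calloc((size_t)K, sizeof(double));
    if (!A || !r) { free(A); free(r); return -1; }
    for (int i = 0; i < K; i++)
        for (int j = 0; j < K; j++) {
            double s = 0.0;
            for (int p = 0; p < N; p++) s += G[p*K + i] * G[p*K + j];
            A[i*K + j] = s + lambda * G2[i*K + j];   /* G^T G + lambda G2 */
        }
    for (int i = 0; i < K; i++) {
        double s = 0.0;
        for (int p = 0; p < N; p++) s += G[p*K + i] * y[p];
        r[i] = s;                                    /* G^T y */
    }
    int rc = solve_linear(K, A, r, c);
    free(A); free(r);
    return rc;
}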
The activation functions of the output units lead to the remaining two initialization parameters. The initialization procedure assumes a linear activation of the output units. The link weights are calculated so that the weighted sum of the hidden neurons equals the teaching output. However, if a sigmoid activation function is used, which is recommended for pattern recognition tasks, the activation function has to be considered during initialization. Ideally, the supposed input for the activation function should be computed with the inverse activation function depending on the corresponding teaching output. This input value would be associated with the vector ~y during the calculation of weights. Unfortunately, the inverse activation function is unknown in the general case.
The first and second initialization parameters (0 scale) and (1 scale) are a remedy for this dilemma. They define the two control points of a piecewise linear function which approximates the activation function. 0 scale and 1 scale give the net inputs of the output units which produce the teaching outputs 0 and 1. If, for example, the linear activation function Act IdentityPlusBias is used, the values 0 and 1 have to be used. When using the logistic activation function Act Logistic, the values -4 and 4 are recommended. If the bias is set to 0, these values lead to a final activation of 0.018 (resp. 0.982). These are comparatively good approximations of the desired teaching outputs 0 and 1. The implementation interpolates linearly between the set values of 0 scale and 1 scale. Thus, teaching values which differ from 0 and 1 are also mapped to corresponding input values.
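This interpolation amounts to a simple linear mapping of the teaching output to a target net input, as in the following illustrative C helper (the names scale0 and scale1 stand for the 0 scale and 1 scale parameters and are not SNNS identifiers):

/* Teaching output 0 is mapped to scale0, teaching output 1 to scale1,
 * other teaching values are interpolated linearly in between
 * (e.g. scale0 = -4, scale1 = 4 for Act Logistic). */
static double target_net_input(double teaching_out, double scale0, double scale1)
{
    return scale0 + teaching_out * (scale1 - scale0);
}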
Figure 9.6: Relation between teaching output, input value and logistic activation
Figure 9.6 shows the activation of an output unit under use of the logistic activation function. The scale has been chosen in such a way that the teaching outputs 0 and 1 are mapped to the input values −2 and 2.
The optimal values for 0 scale and 1 scale cannot be given in general. With the logistic activation function, large scaling values lead to good initialization results, but interfere with the subsequent training, since the logistic function is used mainly in its very flat parts. On the other hand, small scaling values lead to bad initialization results, but produce good preconditions for additional training.
RBF Weights Kohonen  One disadvantage of the above initialization procedure is the very simple selection of center vectors from the set of teaching patterns. It would be favorable if the center vectors homogeneously covered the space of teaching patterns. RBF Weights Kohonen allows a self-organizing training of center vectors. Here, just as the name of the procedure already tells, the self-organizing maps of Kohonen are used (see [Was89]). The simplest version of Kohonen's maps has been implemented. It works as follows:
One precondition for the use of Kohonen maps is that the teaching patterns have to be normalized, i.e. they represent vectors of length 1. K patterns have to be selected from the set of n teaching patterns, acting as starting values for the center vectors. Now the scalar product between one teaching pattern and each center vector is computed. If the vectors are normalized to length 1, the scalar product gives a measure of the distance between the two multiplied vectors. Now the center vector is determined whose distance to the current teaching pattern is minimal, i.e. whose scalar product is the largest one. This center vector is moved a little bit in the direction of the current teaching pattern:

~t_c ← ~t_c + η (~x − ~t_c)

where ~t_c is the selected center vector, ~x the current teaching pattern and η the learning rate. This procedure is repeated for all teaching patterns several times. As a result, the center vectors adapt to the statistical properties of the set of teaching patterns.
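A minimal C sketch of this adaptation step could look as follows; it is not the SNNS implementation, and all names are illustrative. The winner is the center with the largest scalar product with the (normalized) pattern, and it is moved towards the pattern by the learning rate eta.

/* centers: K x n matrix (row-major), x: current teaching pattern. */
static void kohonen_step(double *centers, int K, int n,
                         const double *x, double eta)
{
    int best = 0;
    double best_dot = -1e300;
    for (int k = 0; k < K; k++) {                /* winner = largest dot product */
        double dot = 0.0;
        for (int i = 0; i < n; i++) dot += centers[k*n + i] * x[i];
        if (dot > best_dot) { best_dot = dot; best = k; }
    }
    for (int i = 0; i < n; i++)                  /* move winner towards pattern */
        centers[best*n + i] += eta * (x[i] - centers[best*n + i]);
}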
The respective meanings of the three initialization parameters are:
1. learn cycles: determines the number of iterations of the Kohonen training over all teaching patterns. If 0 epochs are specified, only the center vectors are set, but no training is performed. A typical value is 50 cycles.
2. learning rate: should be picked between 0 and 1. A learning rate of 0 leaves the center vectors unchanged; a learning rate of 1 replaces the selected center vector by the current teaching pattern. A typical value is 0.4.
3. shuffle: determines the selection of initial center vectors at the beginning of the procedure. A value of 0 leads to the even selection already described for RBF Weights. Any value other than 0 causes a random selection of center vectors from the set of teaching patterns.
Note that the described initialization procedure initializes only the center vectors (i.e. the link weights between input and hidden layer). The bias values of the neurons have to be set manually using the graphical user interface. To perform the final initialization of the missing link weights, another initialization procedure has been implemented.
RBF Weights Redo  This initialization procedure influences only the link weights between hidden and output layer. It initializes the network as well as possible by taking the bias and the center vectors of the hidden neurons as a starting point. The center vectors can be set by the previously described initialization procedure. Another possibility is to create the center vectors with an external procedure, convert them into an SNNS pattern file and copy the patterns into the corresponding link weights by using the previously described initialization procedure. When doing this, Kohonen training must of course not be performed.
The effect of the procedure RBF Weights Redo differs from RBF Weights only in that the center vectors and the bias remain unchanged. As expected, the last two initialization parameters are omitted. The meaning and effect of the remaining three parameters is identical to the ones described for RBF Weights.
9.11.2.3 Learning Functions
Because of the special activation functions used for radial basis functions, a special learning function is needed. It is impossible to train networks which use the activation functions Act RBF … with standard backpropagation. The learning function for radial basis functions implemented here can only be applied if the neurons which use the special activation functions form the hidden layer of a three-layer feedforward network. Also, the neurons of the output layer have to take their bias into account during activation.
The name of the special learning function is RadialBasisLearning. The required parameters are:
1. η1 (centers): the learning rate used for the modification Δ~t_j of the center vectors according to the formula Δ~t_j = −η1 ∂E/∂~t_j. A common value is 0.01.
2. η2 (bias p): the learning rate used for the modification of the parameters p of the base function. p is stored as the bias of the hidden units and is trained by the formula Δp_j = −η2 ∂E/∂p_j. Usually set to 0.0.
3. η3 (weights): the learning rate which influences the training of all link weights leading to the output layer as well as the bias of all output neurons. A common value is 0.01.

Δc_{j,k} = −η3 ∂E/∂c_{j,k} ;    Δd_{i,k} = −η3 ∂E/∂d_{i,k} ;    Δb_k = −η3 ∂E/∂b_k
4. delta max.: to prevent overtraining of the network, the maximally tolerated error in an output unit can be defined. If the actual error is smaller than delta max., the corresponding weights are not changed. Common values range from 0 to 0.3.
5. momentum: the momentum term μ used during training, following the formula Δg_{t+1} = −η ∂E/∂g + μ Δg_t. The momentum term is usually chosen between 0.8 and 0.9.
The learning rates η1 to η3 have to be selected very carefully. If the values are chosen too large (like the size of values used for backpropagation), the modification of weights will be too extensive and the learning function will become unstable. Tests showed that the learning procedure becomes more stable if only one of the three learning rates is set to a value bigger than 0. Most critical is the parameter bias (p), because the base functions are fundamentally changed by this parameter.
Tests also showed that the learning function is much more stable in batch mode than in online mode. Batch mode means that all changes become active only after all learning patterns have been presented once. This is also the training mode which is recommended in the literature about radial basis functions. The opposite of batch mode is known as online mode, where the weights are changed after the presentation of every single teaching pattern. Which mode is to be used can be defined during compilation of SNNS. The online mode is activated by defining the C macro RBF_INCR_LEARNING during compilation of the simulator kernel, while batch mode is the default.
9.11.3 Building a Radial Basis Function Application
As a first step, a three-layer feedforward network must be constructed with full connectivity between input and hidden layer and between hidden and output layer. Either the graphical editor or the tool BIGNET (both built into SNNS) can be used for this purpose. The output function of all neurons is set to Out Identity. The activation function of all hidden layer neurons is set to one of the three special activation functions Act RBF … (preferably Act RBF Gaussian). For the activation of the output units, a function is needed which takes the bias into consideration. These functions are Act Logistic and Act IdentityPlusBias.
The next step consists of the creation of teaching patterns. They can be generated manually using the graphical editor, or automatically from external data sets by using an appropriate conversion program. If the initialization procedure RBF Weights Kohonen is going to be used, the center vectors should be normalized to length 1, or at least to equal length.
It is necessary to select an appropriate bias for the hidden units before the initialization is continued. Therefore, the link weights between input and hidden layer are set first, using the procedure RBF Weights Kohonen, so that the center vectors which are represented by the link weights form a subset of the available teaching patterns. The necessary initialization parameters are: learn cycles = 0, learning rate = 0.0, shuffle = 0.0. Thereby teaching patterns are used as center vectors without modification.
To set the bias, the activation of the hidden units is checked for different teaching patterns by using the TEST button of the SNNS control panel. When doing this, the bias of the hidden neurons has to be adjusted so that the activations of the hidden units are as diverse as possible. Using the Gaussian function as base function, all hidden units are uniformly highly activated if the bias is chosen too small (the case bias = 0 leads to an activation of 1 for all hidden neurons). If the bias is chosen too large, only the unit is activated whose link weights correspond to the current teaching pattern. A useful procedure to find the right bias is to first set the bias to 1 and then to change it uniformly depending on the behavior of the network. One must take care, however, that the bias does not become negative, since some implemented base functions require the bias to be positive. The optimal choice of the bias depends on the dimension of the input layer and the similarity among the teaching patterns.
After a suitable bias for the hidden units has been determined, the initialization procedure RBF Weights can be started. Depending on the selected activation function for the output layer, the two scale parameters have to be set (see page 178). When Act IdentityPlusBias is used, the two values 0 and 1 should be chosen. For the logistic activation function Act Logistic the values -4 and 4 are recommended (also see figure 9.6). The parameters smoothness and deviation should be set to 0 first. The bias is set to the previously determined value. Depending on the number of teaching patterns and the number of hidden neurons, the initialization procedure may take rather long to execute. Therefore, some progress information is printed to the terminal during initialization.
After the initialization has finished, the result may be checked by using the TEST button. However, the exact network error can only be determined by the teaching function. Therefore, the learning function RadialBasisLearning has to be selected first. All learning parameters are set to 0 and the number of learning cycles (CYCLES) is set to 1. After pressing the ALL button, the learning function is started. Since the learning parameters are set to 0, no changes inside the network will occur. After the presentation of all available teaching patterns, the actual error is printed to the terminal. As usual, the error is defined as the sum of squared errors of all output units (see formula 9.4). Under certain conditions it can happen that the error becomes very large. This is mostly due to numerical problems. A poorly selected bias, for example, has shown to be a difficult starting point for the initialization. A problem also arises if the number of teaching patterns is less than or equal to the number of hidden units. In this case the number of unknown weights plus unknown bias values of output units exceeds the number of teaching patterns, i.e. there are more unknown parameters to be calculated than equations available. Removing one or more neurons from the hidden layer then reduces the error considerably.
After the first initialization it is recommended to save the current network before testing the possibilities of the learning function. It has turned out that the learning function quickly becomes unstable if too large learning rates are used. It is recommended to first set only one of the three learning rates (centers, bias (p), weights) to a value larger than 0 and to check the sensitivity of the learning function to this single learning rate. The use of the parameter bias (p) is exceptionally critical because it causes serious changes of the base function. If the bias of any hidden neuron becomes negative during learning, an appropriate message is printed to the terminal. In that case, a continuing meaningful training is impossible and the network should be reinitialized.
Immediately after initialization it is often useful to train only the link weights between hidden and output layer. Thereby the numerical inaccuracies which appeared during initialization are corrected. However, an optimal total result can only be achieved if the center vectors are also trained, since they might have been selected disadvantageously.
The initialization procedure used for the direct link weight calculation is unable to calculate the weights between input and output layer. If such links are present, the following procedure is recommended: Even before setting the center vectors by using RBF Weights Kohonen, and before searching for an appropriate bias, all weights should be set to random values between −0.1 and 0.1 by using the initialization procedure Randomize Weights. Thereby, all links between input and output layer are preinitialized. Later on, after executing the procedure RBF Weights, the error of the network will still be relatively large, because the above mentioned links have not been considered. Now it is easy to train these weights by using only the teaching parameter weights during learning.
The output layer computes the output for each class as follows:

f(~x) = Σ_{i=1}^{m} A_i R_i(~x)

with m indicating the number of RBFs belonging to the corresponding class and A_i being the weight of each RBF.
An example of a full RBF-DDA network is shown in figure 9.7. Note that there are no shortcut connections between input and output units in an RBF-DDA.
Figure 9.7: Structure of an RBF-DDA network: input nodes, RBF units, and weighted connections to the output units.
Figure 9.8: One RBF unit as used by the DDA-Algorithm. Two thresholds are used to define an area of conflict where no prototype of a conflicting class is allowed to exist. In addition, each training pattern has to be in the inner circle of at least one prototype of the correct class.
Normally, θ+ is set to be greater than θ−, which leads to an area of conflict where neither matching nor conflicting training patterns are allowed to lie⁶. Using these thresholds, the algorithm constructs the network dynamically and adjusts the radii individually.
In short, the main properties of the DDA-Algorithm are:
constructive training: new RBF nodes are added whenever necessary. The network is built from scratch; the number of required hidden units is determined during training. Individual radii are adjusted dynamically during training.
fast training: usually about five epochs are needed to complete training, due to the constructive nature of the algorithm. The end of training is clearly indicated.
guaranteed convergence: the algorithm can be proven to terminate.
two uncritical parameters: only the two parameters θ+ and θ− have to be adjusted manually. Fortunately the values of these two thresholds are not critical to determine. For all tasks used so far, θ+ = 0.4 and θ− = 0.2 was a good choice.
guaranteed properties of the network: it can be shown that after training has terminated, the network satisfies several conditions for all training patterns: wrong classifications are below a certain threshold (θ−) and correct classifications are above another threshold (θ+).
The DDA-Algorithm is based on two steps. During training, whenever a pattern is misclassified, either a new RBF unit with an initial weight of 1 is introduced (called commit) or the weight of an existing RBF (which covers the new pattern) is incremented. In both cases the radii of conflicting RBFs (RBFs belonging to the wrong class) are reduced (called shrink). This guarantees that each of the patterns in the training data is covered by an RBF of the correct class and that none of the RBFs of a conflicting class has an inappropriate response.
Two parameters are introduced at this stage, a positive threshold θ+ and a negative threshold θ−. To commit a new prototype, none of the existing RBFs of the correct class may have an activation above θ+, and during shrinking no RBF of a conflicting class is allowed to have an activation above θ−. Figure 9.9 shows an example that illustrates the first few training steps of the DDA-Algorithm.
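A compact C sketch of one DDA training step, under the assumption of Gaussian prototypes R(~x) = exp(−‖~x − ~z‖²/σ²), might look as follows. The Prototype structure, the handling of the initial radius and all names are illustrative simplifications and do not reproduce the SNNS implementation.

#include <math.h>

#define MAX_PROTO 1024
#define MAX_DIM   16        /* pattern dimension assumed to be <= MAX_DIM */

typedef struct {
    double center[MAX_DIM]; /* copy of the pattern that committed the unit  */
    double sigma2;          /* squared radius of the Gaussian               */
    double weight;          /* A_i, weight of the link to the class output  */
    int    cls;             /* class the prototype belongs to               */
} Prototype;

static double proto_act(const Prototype *p, const double *x, int n)
{
    double d2 = 0.0;
    for (int i = 0; i < n; i++) {
        double d = x[i] - p->center[i];
        d2 += d * d;
    }
    return exp(-d2 / p->sigma2);
}

/* One step for pattern x of class cls: commit or increment, then shrink. */
static void dda_step(Prototype *proto, int *num_proto, int n,
                     const double *x, int cls,
                     double theta_plus, double theta_minus, double init_sigma2)
{
    int covered = 0;
    for (int i = 0; i < *num_proto && !covered; i++)
        if (proto[i].cls == cls && proto_act(&proto[i], x, n) >= theta_plus) {
            proto[i].weight += 1.0;              /* pattern covered: increment weight */
            covered = 1;
        }
    if (!covered && *num_proto < MAX_PROTO) {    /* commit new prototype, weight 1 */
        Prototype *p = &proto[(*num_proto)++];
        for (int i = 0; i < n; i++) p->center[i] = x[i];
        p->sigma2 = init_sigma2;
        p->weight = 1.0;
        p->cls    = cls;
    }
    for (int i = 0; i < *num_proto; i++) {       /* shrink conflicting prototypes */
        if (proto[i].cls == cls) continue;
        double d2 = 0.0;
        for (int j = 0; j < n; j++) {
            double d = x[j] - proto[i].center[j];
            d2 += d * d;
        }
        double max_s2 = d2 / -log(theta_minus);  /* largest sigma^2 with R(x) <= theta- */
        if (proto[i].sigma2 > max_s2) proto[i].sigma2 = max_s2;
    }
}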
After training is finished, two conditions hold for all input-output pairs⁷ (~x, c) of the training data:
at least one prototype of the correct class c has an activation value greater than or equal to θ+:
∃ i : R_i^c(~x) ≥ θ+
all prototypes of conflicting classes have activations less than or equal to θ− (m_k indicates
⁶ The only exception to this rule is the case where a pattern of the same class lies in the area of conflict but is covered by another RBF (of the correct class) with a sufficiently high activation.
⁷ In this case the term "input-class pair" would be more justified, since the DDA-Algorithm trains the network to classify rather than to approximate an input-output mapping.
Figure 9.9: Example of the first few training steps of the DDA-Algorithm for patterns of the classes A and B (panels (1) to (4)).
Figure 9.10: Structure of an ART1 network in SNNS. Thin arrows represent a connection from one unit to another. Fat arrows which go from a layer to a unit indicate that each unit of the layer is connected to the target unit. Similarly, a fat arrow from a unit to a layer means that the source unit is connected to each of the units in the target layer. The two big arrows in the middle represent the full connection between comparison and recognition layer and the one between delay and comparison layer, respectively.
creation tool BigNet has been extended. It now offers an easy way to create ART1, ART2 and ARTMAP networks according to your requirements. For a detailed explanation of the respective features of BigNet see chapter 7.
9.13.1 ART1
The topology of ART1 networks in SNNS has been chosen to perform most of the ART1 algorithm within the network itself. This means that the mathematics is realized in the activation and output functions of the units. The idea was to keep the propagation and training algorithm as simple as possible and to avoid procedural control components.
In figure 9.10 the units and links of ART1 networks in SNNS are displayed.
The F0 or input layer (labeled inp in figure 9.10) is a set of N input units. Each of them has a corresponding unit in the F1 or comparison layer (labeled cmp). The M elements of the F2 layer are split into three levels, so each F2 element consists of three units: one recognition (rec) unit, one delay (del) unit and one local reset (rst) unit. These three parts are necessary for different reasons. The recognition units are known from the theory. The delay units are needed to synchronize the network correctly⁸. Besides, the activated unit in the delay layer shows the winner of F2. The job of the local reset units is to block the actual winner of the recognition layer in case of a reset.
⁸ This is only important for the chosen realization of the ART1 learning algorithm in SNNS.
Finally, there are several special units. The cl unit gets positive activation when the input pattern has been successfully classified. The nc unit, when active, indicates an unclassifiable pattern. The gain units g1 and g2 have their known functions, and finally the units ri (reset input), rc (reset comparison), rg (reset general) and ρ (vigilance) realize the reset function.
For an exact definition of the required topology of ART1 networks in SNNS see section 9.13.4.
9.13.1.2 Using ART1 Networks in SNNS
To use an ART1 network in SNNS, several functions have been implemented: one to initialize the network, one to train it and two different update functions to propagate an input pattern through the net.
ART1 Initialization Function  First the ART1 initialization function ART1 Weights has to be selected from the list of initialization functions.
ART1 Weights is responsible for setting the initial values of the trainable links in an ART1 network. These are the links from F1 to F2 and the ones from F2 to F1, respectively. The F2 → F1 links are all set to 1.0 as described in [CG87a]. The weights of the links from F1 to F2 are a little more difficult to explain. To assure that in an initialized network the F2 units will be used in the order of their index, the weights from F1 to F2 must decrease with increasing index. Another restriction is that each link weight has to be greater than 0 and smaller than 1/N. Defining w_j as the weight of a link from an F1 unit to the j-th F2 unit, this yields

0 < w_M < w_{M−1} < … < w_1 < 1/(β + N).
To get concrete values, we have to decrease the fraction on the right side with increasing index j and assign this value to w_j. For this reason we introduce the value γ and obtain

w_j := 1 / (β + (1 + jγ) N) ,   1 ≤ j ≤ M.
So we have two parameters for ART1 Weights: β and γ. For both of them a value of 1.0 is useful for the initialization. The first parameter of the initialization function is β, the second one is γ. Having chosen β and γ, one must press the INIT button to perform the initialization.
The parameter β is stored in the bias field of the unit structure to be accessible to the learning function when adjusting the weights.
One should always use ART1 Weights to initialize ART1 networks. When using another SNNS initialization function, the behavior of the simulator during learning is not predictable, because not only the trainable links will be initialized, but also the fixed weights of the network.
ART1 Learning Function  To train an ART1 network, select the learning function ART1. To start the training of an ART1 network, choose the vigilance parameter ρ (e.g. 0.1) as the first value in both the LEARN and UPDATE rows of the control panel. The parameter β, which is also needed to adjust the trainable weights between F1 and F2, has already been specified as an initialization parameter. It is stored in the bias field of the unit structure and read out by ART1 when needed.
Like the learning function, both of the update functions take only the vigilance value ρ as parameter. It has to be entered in the control panel, in the line below the parameters for the learning function. The difference between the two update functions is the following:
ART1 Stable propagates a pattern until the network is stable, i.e. either the cl unit or the nc unit is active. To use this update function, you can use the TEST button of the control panel. The next pattern is copied to the input units and propagated completely through the net, until a stable state is reached.
ART1 Synchronous performs just one propagation step with each call. To use this function you have to press the RESET button to reset the net to a defined initial state, where each unit has its initial activation value. Then copy a new pattern into the input layer, using the buttons < and >. Now you can choose the desired number of propagation steps that should be performed when pressing the STEP button (default is 1). With this update function it is very easy to observe how the ART1 learning algorithm does its job.
So use ART1 Synchronous to trace a pattern through a network, and ART1 Stable to propagate the pattern until a stable state is reached.
Figure 9.11: Structure of an ART2 network in SNNS. Thin arrows represent a connection from one unit to another. The two big arrows in the middle represent the full connectivity between comparison and recognition layer and the one between recognition and comparison layer, respectively.
9.13.2 ART2
The realization of ART2 differs from that of ART1 in its basic idea. In this case the network structure would have become too complex if the mathematics had been implemented within the network to the same degree as was done for ART1. So here more of the functionality lies in the control program. In figure 9.11 you can see the topology of an ART2 network as it is implemented in SNNS.
All the units are known from the ART2 theory, except the rst units. They have to do the same job for ART2 as for ART1 networks: they block the actual winner in the recognition layer in case of a reset. Another difference between the ART2 model described in [CG87b] and the realization in SNNS is that originally the units u_i have been used to compute the error vector r, while this implementation takes the input units instead.
For an exact definition of the required topology of ART2 networks in SNNS see section 9.13.4.
As for ART1, there is an initialization function, a learning function and two update functions for ART2. To initialize, train or test an ART2 network, these functions have to be used. The handling is not repeated in detail in this section, since it is the same as for ART1. Only the parameters of the functions are mentioned here.
ART2 Initialization Function  For an ART2 network the weights of the top-down links (F2 → F1 links) are set to 0.0 according to the theory ([CG87b]).
The choice of the initial bottom-up weights is determined as follows: if a pattern has been trained, then the next presentation of the same pattern must not generate a new winning class. On the contrary, the same F2 unit should win, with a higher activation than all the other recognition units.
This implies that the norm of the initial weight vector has to be smaller than the one it has after several training cycles. If J (1 ≤ J ≤ M) is the actual winning unit in F2, then equation 9.4 is given by the theory:

‖~z_J‖ → ‖~u‖ / (1 − d) = 1 / (1 − d)     (9.4)

where ~z_J is the weight vector of the links from the F1 units to the J-th F2 unit and where d is a parameter described below.
If all initial values z_ij(0) are presumed to be equal, this means:

z_ij(0) ≤ 1 / ((1 − d) √N)   ∀ 1 ≤ i ≤ N, 1 ≤ j ≤ M     (9.5)
If equality is chosen in equation 9.5, then ART2 will be as sensitive as possible.
To transform the inequality 9.5 into an equation, in order to compute concrete values, we introduce another parameter γ and get:

z_ij(0) = γ / ((1 − d) √N)   ∀ 1 ≤ i ≤ N, 1 ≤ j ≤ M     (9.6)

where γ ≤ 1.
To initialize an ART2 network, the function ART2 Weights has to be selected. Specify the parameters d and γ as the first and second initialization parameter. (A description of parameter d is given in the subsection on the ART2 learning function.) Finally press the INIT button to initialize the net.
WARNING! You should always use ART2 Weights to initialize ART2 networks. When using another SNNS initialization function, the behavior of the simulator during learning is not predictable, because not only the trainable links will be initialized, but also the fixed weights of the network.
ART2 Learning Function  For the ART2 learning function ART2 there are various parameters to specify. Here is a list of all parameters known from the theory:
ρ Vigilance parameter (first parameter of the learning and update function). ρ is defined on the interval 0 ≤ ρ ≤ 1. For reasons described in [Her92] only the following interval makes sense: 1/√2 ≤ ρ ≤ 1.
a Strength of the influence on the lower level of F1 by the middle level (second parameter of the learning and update function). Parameter a defines the importance of the expectation of F2 propagated to F1: a > 0. Normally a value of a ≫ 1 is chosen to assure quick stabilization in F1.
b Strength of the influence on the middle level of F1 by the upper level (third parameter of the learning and update function). For parameter b things are similar to parameter a. A high value for b is even more important, because otherwise the network could become unstable ([CG87b]). b > 0, normally b ≫ 1.
c Part of the length of vector p (units p1 … pN) used to compute the error (fourth parameter of the learning and update function). Choose c within 0 < c < 1.
d Output value of the F2 winner unit. You won't have to pass d to ART2, because this parameter is already needed for initialization, so you have to enter the value when initializing the network (see the subsection on the initialization function). Choose d within 0 < d < 1. The parameters c and d are dependent on each other. For reasons of quick stabilization, c should be chosen as follows: 0 < c ≪ 1. On the other hand c and d have to fit the following condition: 0 ≤ cd/(1 − d) ≤ 1.
e Prevents division by zero. Since this parameter does not help to solve essential problems, it is implemented as a fixed value within the SNNS source code.
θ A kind of threshold. For 0 ≤ x_i, q_i ≤ θ the activation values of the units x_i and q_i have only a small influence (if any) on the middle level of F1. The output function f of the units x_i and q_i takes θ as its parameter. Since this noise function is continuously differentiable, it is called Out ART2 Noise ContDiff in SNNS. Alternatively a piecewise linear output function may be used; in SNNS the name of this function is Out ART2 Noise PLin. Choose θ within 0 ≤ θ < 1.
To train an ART2 network, make sure you have chosen the learning function ART2. As a first step initialize the network with the initialization function ART2 Weights described above. Then set the five parameters ρ, a, b, c and θ in the parameter windows 1 to 5 in both the LEARN and UPDATE lines of the control panel. Example values are 0.9, 10.0, 10.0, 0.1, and 0.0. Then select the number of learning cycles, and finally use the buttons SINGLE and ALL to train a single pattern or all patterns at a time, respectively.
ART2 Update Functions  Again two update functions for ART2 networks have been implemented:
ART2 Stable
ART2 Synchronous
Meaning and usage of these functions are equal to their equivalents of the ART1 model. For both of them the parameters ρ, a, b, c and θ have to be defined in the row of update parameters in the control panel.
9.13.3 ARTMAP
Since an ARTMAP network is based on two networks of the ART1 model, it is useful to know how ART1 is realized in SNNS. Having taken two ART1 networks (ARTa and ARTb) as they were defined in section 9.13.1, we add several units that represent the MAP field. The connections between ARTa and the MAP field, ARTb and the MAP field, as well as those within the MAP field are shown in figure 9.12. The figure lacks the full connection from the Fa2 layer to the Fab layer and those from each Fb2 unit to its respective Fab unit and vice versa.
ARTMAP Stable is again used to propagate a pattern through the network until a stable state is reached, while ARTMAP Synchronous performs only one propagation step at a time. For both functions the parameters ρa, ρb and ρ have to be specified in the line for update parameters of the control panel. The usage is the same as for ART1 and ART2 networks.
9.13.4 Topology of ART Networks in SNNS
The following tables give an exact description of the topology requirements for the ART models ART1, ART2 and ARTMAP. For ARTMAP the topologies of the two ART1 parts of the net are the same as the one shown in the ART1 table.
The vector¹⁰ W_c most similar to X is the one with the largest dot product with X:

Net_c(t) = max_j { Net_j(t) } = X · W_c
The topological ordering is achieved by using a spatial neighborhood relation between the competitive units during learning, i.e. not only the best-matching vector, with weight W_c, but also its neighborhood¹¹ N_c is adapted, in contrast to a basic competitive learning algorithm like LVQ:

Δw_ij(t) = e_j(t) (x_i(t) − w_ij(t))   for j ∈ N_c
Δw_ij(t) = 0                           for j ∉ N_c

¹⁰ c will be used as the index of the winning unit in the competitive layer throughout this text.
¹¹ The neighborhood is defined as the set of units within a certain radius of the winner. So N(1) would be the eight direct neighbors in the 2D grid; N(2) would be N(1) plus the 16 next closest; etc.
where

e_j(t) = h(t) · e^(−(d_j / r(t))²)    (Gaussian function)

with h(t) the adaptation height and r(t) the adaptation radius, both decreasing over time.
anew, it resumes at the point where it was stopped last. Both mult H and mult R should be in the range (0, 1]. A value of 1 consequently keeps the adaptation values at a constant level.
maps can be displayed using the LAYER buttons in the KOHONEN panel. Again,
green squares represent large, positive weights.
3. Winning Units
The set of units that came out as winners in the learning process can also be displayed in SNNS. This shows the distribution of patterns over the SOM. To proceed, turn on units top in the setup window of the display and select the winner item to be shown. New winning units will be displayed without deleting the existing ones, which enables tracing the temporal development of clusters while learning is in progress. The display of the winning units is refreshed by pressing the WINNER button again.
Note: Since the winner algorithm is part of the KOHONEN learning function, the learning parameters must be set as if learning were to be performed.
where:
net_i is the net input to unit i (external + internal),
E is the excitation parameter (here set to 0.15),
D is the decay parameter (here set to 0.15).
This function is included in SNNS as ACT RM. Other activation functions may be used in its place.
9.16 Partial Recurrent Networks
9.16.1 Models of Partial Recurrent Networks
9.16.1.1 Jordan Networks
Figure: Structure of a Jordan network (output and hidden units with feedback connections of weight 1.0 to the context layer) and of a hierarchical Elman network with the context layers 1 to 3 and self-recurrent weights γ1, γ2 and γ3 above the input layer.
In this subsection, the initialization, learning, and update functions for partial recurrent networks are described. These functions are not restricted to the three network models described in the previous subsection; they can be applied to a broader class of partial recurrent networks. Every partial recurrent network that satisfies the following restrictions can be used:
After the deletion of all context units and the links to and from them, the remaining network must be a simple feedforward architecture with no cycles.
Input units must not receive input from other units.
Output units may only have outgoing connections to context units, but not to other units.
Every unit, except the input units, has to have at least one incoming link. For a context unit this restriction is already fulfilled if there exists only a self-recurrent link. In this case the context unit receives its input only from itself.
In such networks all links leading to context units are considered recurrent links. Thereby the user has a lot of possibilities to experiment with a great variety of partial recurrent networks; e.g. it is allowed to connect context units with other context units.
Note: context units are realized as special hidden units. All units of type special hidden are assumed to be context units and are treated as such.
9.16.2.1 The Initialization Function JE Weights
The initialization function JE Weights requires the specification of five parameters:
α, β: The weights of the forward connections are randomly chosen from the interval [α, β].
λ: The weights of the self-recurrent links from context units to themselves. Simple Elman networks use λ = 0.
γ: The weights of the other recurrent links to context units. This value is often set to 1.0.
ψ: The initial activation of all context units.
These values are to be set in the INIT line of the control panel in the order given above.
9.16.2.2 Learning Functions
By deleting all recurrent links in a partial recurrent network, a simple feedforward network remains. The context units now have the function of input units, i.e. the total network input consists of two components. The first component is the pattern vector, which was the only input of the partial recurrent network. The second component is a state vector, which is given by the next-state function in every step. In this way the behavior of a partial recurrent network can be simulated with a simple feedforward network that receives the state not implicitly through recurrent links, but as an explicit part of the input vector. In this sense, backpropagation algorithms can easily be modified for the training of partial recurrent networks in the following way (step 2(f) is sketched in the code fragment after the list):
1. Initialization of the context units. In the following steps, all recurrent links are assumed not to exist, except in step 2(f).
2. Execute the following steps for each pattern of the training sequence:
(a) input of the pattern and forward propagation through the network
(b) calculation of the error signals of the output units by comparing the computed output and the teaching output
(c) backpropagation of the error signals
(d) calculation of the weight changes
(e) only online training: weight adaptation
(f) calculation of the new state of the context units according to the incoming links
3. Only offline training: weight adaptation
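The following C fragment sketches only step 2(f), assuming that a context unit simply computes the weighted sum of the activations of its source units over the recurrent links (including a possible self-recurrent link) with the identity as activation function. The data layout is invented for this illustration and differs from the SNNS unit and link structures.

/* n_in[c]: number of incoming links of context unit c, src[c][l]: index of
 * the source unit of link l, w[c][l]: its weight, act: old activations of
 * all units, context_act: new context activations (output). */
static void update_context_units(int n_context,
                                 const int *n_in,
                                 const int *const *src,
                                 const double *const *w,
                                 const double *act,
                                 double *context_act)
{
    for (int c = 0; c < n_context; c++) {
        double s = 0.0;
        for (int l = 0; l < n_in[c]; l++)
            s += w[c][l] * act[src[c][l]];
        context_act[c] = s;
    }
}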
In this manner, the following learning functions have been adapted for the training of partial recurrent networks like Jordan and Elman networks:
JE BP: Standard Backpropagation for partial recurrent networks
9.17.1 Monte-Carlo
Monte-Carlo learning is an easy way to determine the weights and biases of a net. In every learning cycle all weights and biases are chosen randomly from the range [Min, Max]. Then the error is calculated as the summed squared error over all patterns. If the error is lower than the previous best error, the weights and biases are stored. This method is not very efficient, but it is useful for finding a good starting point for another learning algorithm.
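A minimal C sketch of this search could look like the following; the error evaluation is assumed to be supplied by the caller, and all names are illustrative rather than the SNNS implementation.

#include <stdlib.h>

/* best: current best weight/bias vector of length n (updated in place),
 * sse: callback that writes the candidate parameters into the net and
 * returns the summed squared error over all patterns. */
static void monte_carlo(double *best, int n, double min, double max, int cycles,
                        double (*sse)(const double *weights, int n_weights))
{
    double best_err = sse(best, n);
    double *cand = malloc((size_t)n * sizeof(double));
    if (cand == NULL) return;
    for (int c = 0; c < cycles; c++) {
        for (int i = 0; i < n; i++)              /* draw all parameters at random */
            cand[i] = min + (max - min) * ((double)rand() / RAND_MAX);
        double err = sse(cand, n);
        if (err < best_err) {                    /* keep only improvements */
            best_err = err;
            for (int i = 0; i < n; i++) best[i] = cand[i];
        }
    }
    free(cand);
}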
Simulated annealing is a more sophisticated method for finding the global minimum of an error surface. In contrast to Monte-Carlo learning, only one weight or bias is changed per learning cycle. Depending on the error development and a system temperature, this change is accepted or rejected. One of the advantages of simulated annealing is that learning does not get stuck in local minima.
At the beginning of learning the temperature T is set to T0. Each training cycle consists of the following four steps:
1. Change one weight or bias randomly within the range [Min, Max].
2. Calculate the net error as the sum of the given error function over all patterns.
3. Accept the change if the error decreased, or, if the error increased by ΔE, with the probability p given by p = exp(−ΔE / T).
4. Decrease the temperature: T = T · deg.
The three implemented simulated annealing functions differ only in the way the net error is calculated. Sim Ann SS calculates a summed squared error like the backpropagation learning functions; Sim Ann WTA calculates a winner-takes-all error; and Sim Ann WWTA calculates a winner-takes-all error and adds a term corresponding to the certainty of the winner-takes-all decision.
as the Hessian is not always positive definite, which prevents the algorithm from achieving good performance, SCG uses a scalar λ_k which is supposed to regulate the indefiniteness of the Hessian. This is a kind of Levenberg-Marquardt method [P+88], and is done by setting:

s_k = (E'(w_k + σ_k p_k) − E'(w_k)) / σ_k + λ_k p_k

and adjusting λ_k at each iteration. This is the main contribution of SCG to both fields of neural learning and optimization theory.
SCG has been shown to be considerably faster than standard backpropagation and than other CGMs [Mol93].
9.18.3 Parameters of SCG
As σ_k and λ_k are computed from their respective values at step k−1, SCG has two parameters, namely the initial values σ_1 and λ_1. Their values are not critical but should respect the conditions 0 < σ_1 ≤ 10^-4 and 0 < λ_1 ≤ 10^-6. Empirically, Møller has shown that bigger values of σ_1 can lead to slower convergence.
The third parameter is the usual quantity Δmax (cf. standard backpropagation).
In SNNS, it is usually the responsibility of the user to determine when the learning process should stop. Unfortunately, the λ_k adaptation mechanism sometimes assigns too large values to λ_k when no more progress is possible. In order to avoid floating-point exceptions, we have added a termination criterion¹³ to SCG. The criterion is taken from the CGMs presented in [P+88]: stop when

2 |E(w_{k+1}) − E(w_k)| ≤ ε1 (|E(w_{k+1})| + |E(w_k)| + ε2)

ε2 is a small number used to handle the special case of converging to a function value of exactly zero. It is set to 10^-10. ε1 is a tolerance depending on the floating-point precision of your machine, and it should be set to the machine precision, which is usually equal to 10^-8 (single precision) or 10^-16 (double precision).
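A direct C transcription of this termination test, with ε2 fixed to 10^-10 as stated above, might look like this (the function name is illustrative):

#include <math.h>

static int scg_should_stop(double E_new, double E_old, double eps1)
{
    const double eps2 = 1e-10;
    return 2.0 * fabs(E_new - E_old) <= eps1 * (fabs(E_new) + fabs(E_old) + eps2);
}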
To summarize, there are four non-critical parameters:
1. σ_1: should satisfy 0 < σ_1 ≤ 10^-4. If 0 is given, it will be set to 10^-4.
2. λ_1: should satisfy 0 < λ_1 ≤ 10^-6. If 0 is given, it will be set to 10^-6.
3. Δmax: see standard backpropagation. Can be set to 0 if you don't know what to do with it.
4. ε1: depends on the floating-point precision. Should be set to 10^-8 (single precision) or to 10^-16 (double precision). If 0 is given, it will be set to 10^-8.
Note: SCG is a batch learning method, so shuffling the patterns has no effect.
h(~x) = exp( − Σ_{i=1}^{|I|} ((x_i − v_i) / r_i)² )
x̄_i is the mean value of the training patterns in dimension i, and the added term is a random number between −0.1 and 0.1. K can be entered in the Max. no. of candidate units field.
2. Train the K points with the following procedure. After this training, the ~v_i should be located at the maxima of the mapping of the residual error in input space.
For N epochs compute for each pattern ~x the ~v_α for which ‖~v_α − ~x‖ < ‖~v_k − ~x‖ holds for all k ≠ α, and update ~v_α by

~v_{α,t+1} = ~v_{α,t} + η(t) Σ_{o=1}^{|O|} |E_{p,o}| (~x − ~v_{α,t})

η(t) decreases with time¹⁴. E_{p,o} is the error of output unit o on pattern p.
3. Let

N_k = { ~x_p ∈ P | ∀ i ≠ k : ‖~x_p − ~v_k‖ < ‖~x_p − ~v_i‖ }

be the set of neighbours of ~v_k. In other words, ~x_p is in N_k iff ~x_p lies in the Voronoi region of ~v_k.
Generate a new hidden unit for every ~v_k for which

g_k = (1 / max(g_k)) Σ_{o=1}^{|O|} Σ_{p ∈ N_k} Σ_i |x_i − v_i| |E_{p,o}|

evaluates to a value lower than the given threshold. Since this threshold must be smaller than 1.0, at least one unit will be installed. The new units work with the TACOMA activation function as mentioned above.
4. Connect the new units with:
(a) the input units. For these links we need the data of the window function. The center of the window is initialized with the ~v_k calculated above. The radii are initialized with

r_{k,i} = sqrt( (d_{k,i})² / (2 ln …) )

where the parameter in the denominator is the value entered in the additional parameter field. For small problems like the two-spirals problem a value of 0.6 is a good choice, but for problems with more input units 0.99 or 0.999 may be chosen.
¹⁴ Actually η(t) = 0.1 (N − n)/N is used, where n is the number of the actual cycle.
(b) formerly installed hidden units. Here we connect only those hidden units whose window functions have a significant overlap. This is done by a connection-routing procedure which uses

Q_{l,m} = ( Σ_{i=1}^{N} h_l(~x_i) h_m(~x_i) ) / sqrt( Σ_{i=1}^{N} h_l(~x_i)² · Σ_{i=1}^{N} h_m(~x_i)² ).

If Q_{l,m} is bigger than the given threshold, the unit l (formerly installed) and the unit m (new unit) are connected.
(c) the output units. Since the output units have a sigmoidal (or Gaussian, sine, ...) activation, no window function parameters have to be set.
5. Training of the new units. Here we use the same parameter settings as in Cascade-Correlation (see chapter 9.9.5). To obtain better results the values for the patience and the number of cycles should be increased. Better generalization values can be achieved by decreasing the value for Max. output unit error, but this leads to a bigger net.
(a) Training of the weights and biases. The units and links are trained with the actual learning function to maximize the correlation S_k. For more details see the similar routines of Cascade-Correlation.
   (b) Training of the centers and radii of the window functions. The training reflects
       two goals:
       i. maximization of S_k, and
       ii. maximization of the anticorrelation between the output of the unit and the
           output of the other units of that layer.
       This leads to an aggregated functional F:

          F = F_Z / F_N,   with   F_Z = Σ_{i=1}^{L} S_i   and   F_N = Σ_{i=1}^{L-1} Σ_{j=i+1}^{L} |R_{i,j}| plus a constant term.
Chapter 10

Pruning Algorithms

This chapter describes the pruning functions which are available in SNNS. The first
section of this chapter introduces the common ideas of pruning functions, the second
takes a closer look at the theory of the implemented algorithms and the last part gives
guidance for the use of the methods. Detailed descriptions can be found in [Bie94] (for
"non-contributing units") and [Sch94] (for the rest).

Now it is necessary to compute the diagonal elements of the Hessian matrix. For a de-
scription of this and to obtain further information read [YLC90].
10.2.3 Optimal Brain Surgeon

Optimal Brain Surgeon (OBS, see [BH93]) was a further development of OBD. It computes
the full Hessian matrix iteratively, which leads to a more exact approximation of the error
function:

   δE = 1/2 δW^T H δW                                                  (10.3)

From equation (10.3) we form a minimization problem with the additional condition that
at least one weight must be set to zero:

   min_(ij) { min_δW 1/2 δW^T H δW  |  E_(ij)^T δW + w_ij = 0 }        (10.4)

and deduce a Lagrangian from that:

   L = 1/2 δW^T H δW + λ ( E_(ij)^T δW + w_ij )                        (10.5)

where λ is a Lagrange multiplier and E_(ij) is the unit vector that selects the weight w_ij.
Setting the derivatives of L with respect to δW and λ to zero leads to

   δW = - ( w_ij / [H^-1]_(ij),(ij) ) H^-1 E_(ij)                      (10.6)
Figure 10.2 illustrates the use of the attentional strength.
Figure 10.2: Neural network with an attentional strength α_i for each input and hidden neuron
Defining the relevance of a unit as the change in the error function when removing that
unit, we get

   ρ_i = E_{α_i = 0} - E_{α_i = 1} ≈ - ∂E/∂α_i |_{α_i = 1}             (10.9)
The pruning algorithms can be customized by changing the parameters in the pruning
panel (see figure 10.3) which can be invoked by pressing PRUNING in the manager panel.
The figure shows the default values of the parameters.
Chapter 11

3D-Visualization of Neural Networks
(Figure: a three-layer network whose layers are assigned z-coordinates; layer 2 is moved by
x = -8 units, layer 1 by x = -4 units, and layer 0 is not moved.)
The effect of the 3D-creation is easily checked by rotating the network in the 3D display by
90° to be able to see the network sideways. It may be useful to display the z-coordinates
in the XGUI display (see 11.2.3.4).
The user is advised to create a 3D network first as a wire-frame model without links for
much faster screen display.
The desired new z-coordinate may be entered in the setup panel of the 2D-display, or in
the z-value panel of the 3D-control panel. The latter is more convenient, since this panel
is always visible. Values between -32768 and +32767 are legal.
With the mouse all units are selected which are to receive the new z-coordinate.
With the key sequence U 3 Z (for Units 3d Z) the units are assigned the new value.
Afterwards all units are deselected.
11.2.3.3 Moving a z-Plane
From the plane to be moved, one unit is selected as a reference unit in the 2D display.
Then the mouse is moved to the unit in the base layer above which the selected unit is to
be located after the move.
With the key sequence U 3 M (for Units 3d Move) all units of the layer are moved to the
current z-plane.
The right mouse button deselects the reference unit.
Figure 11.3: Scaled network (left) and network rotated by 90° (right)
Figure 11.4: Selection of one layer (left) and assigning a z-value (right)
To assign the z-coordinate to the layer, the z-value entry in the 3D-control panel is set
to three. Then one moves the mouse into the 2D-display and enters the key sequence
"U 3 Z". This is shown in figure 11.4 (right).
Figure 11.5: Selection of a reference unit (left) and moving a plane (right)
Now the reference unit must be selected (figure 11.5, left).
To move the units over the zero plane, the mouse is moved in the XGUI display to the
position x=3, y=0 and the keys "U 3 M" are pressed. The result is displayed in figure 11.5
(right).
The output layer, which is assigned the z-value 6, is treated accordingly. Now the network
may be rotated to any position (figure 11.6, left).
Finally the central projection and the illumination may be turned on (figure 11.6, right).
Figure 11.7 (left) shows the links in the wire-frame model; the network with links in the
solid model looks like figure 11.7 (right).

Figure 11.6: Wire-frame model in parallel projection (left) and solid model in central
projection (right)

Figure 11.7: Network with links in the wireframe model (left) and in the solid model
(right)
The fields Diffuse Light determine the parameters for diffuse reflection.
Intensity sets the intensity of the light source.
Note: The 3D-display is only a display window, while the 2D-display windows have a
graphical editor integrated. There is also no possibility to print the 3D-display via the
print panel.
Chapter 12
Batchman
Since training a neural network may require several hours of CPU time, it is advisable
to perform this task as a batch job during low usage times. SNNS offers the program
batchman for this purpose. It is basically an additional interface to the kernel that allows
easy background execution.
12.1 Introduction
This newly implemented batch language is to replace the old snnsbat. Programs which
are written in the old snnsbat language will not run on the newly designed interpreter.
Snnsbat is not supported any longer, but we keep the program for those users who are
comfortable with it and do not want to switch to batchman. The new language supports
all functions which are necessary to train and test neural nets. All non-graphical features
which are offered by the graphical user interface (XGUI) may be accessed with the help
of this language as well.
The new batch language was modeled after languages like AWK, Pascal, Modula-2 and C. It
is an advantage to have some knowledge of one of these languages. The language enables
the user to get the desired result without investing a lot of time in learning its syntactical
structure. For most operators multiple spellings are possible and variables don't have to
be declared before they are used. If an error occurs in a batch program, the user is
informed by a meaningful error message (or warning) together with the corresponding
line number.
For example:

/Unix> batchman -h

This is an instruction which should be entered at the Unix command line, where /Unix> is
the shell prompt which expects input from the user. Its appearance may change depending
on the Unix system installed. The instruction batchman -h starts the interpreter with the
-h help option which tells the interpreter to display a help message. Every form of input
has to be confirmed with Enter (Return). Batch programs or parts of batch programs will
also be displayed in typewriter writing. Batch programs can be written with a conventional
text editor and saved in a file. Commands can also be entered in the interactive mode
of the interpreter. If a file is used as a source to enter instructions, the name of the file
has to be provided when starting the interpreter. Typewriter writing is also used for wild
cards. Those wild cards have to be replaced by real names.

Starting the interpreter without any options, e.g.

/Unix> batchman

produces:

SNNS Batch Interpreter V1.0. Type batchman -h for help.
No input file specified, reading input from stdin.
batchman>
Now the interpreter is ready to accept the user's instructions, which can be entered with
the help of the keyboard. Once the input is completed the interpreter can be put to work
with Ctrl-D. The interpreter can be aborted with Ctrl-C. The instructions entered are
only invoked after Ctrl-D is pressed.
If the user decides to use a file for input, the command line option -f has to be given to
the interpreter together with the name of the file:

/Unix> batchman -f myprog.bat

Once this is completed, the interpreter starts the program contained in the file myprog.bat
and executes its commands.
The standard output is usually the screen, but with the command line option -l the output
can be redirected to a protocol file. The name of the file has to follow the command line
option:

/Unix> batchman -l logfile

Usually the output is redirected in combination with the reading of the program out of a
file:
/Unix> batchman -f myprog.bat -l logfile

The order of the command line options is arbitrary. Note that all output lines of batchman
that are generated automatically (e.g. information about which pattern file is loaded or
saved) are preceded by the hash sign "#". This way any produced log file can be processed
directly by all programs that treat "#" as a comment delimiter, e.g. gnuplot.
The other command line options are:
-p: Programs should only be parsed but not executed. This option tells the
interpreter to check the correctness of the program without executing the
instructions contained in the program. Run time errors can not be detected.
Such a run time error could be an invalid SNNS function call.
-q: No messages should be displayed except those caused by the print()-
function.
-s: No warnings should be displayed.
-h: A help message should be displayed which describes the available command
line options.
All following input will be printed without the shell-text.
The second line begins with an instruction and ends with a comment.
12.2.3 Variables
In order to save values it is possible to use variables in the batch language. A variable is
introduced to the interpreter automatically once it is used for the rst time. No previous
declaration is required. Names of variables must start with a letter or an underscore.
Digits, letters or more underscores could follow. Names could be:
a, num1, test, first net, k17 u, Test buffer 1
The interpreter distinguishes between lower and upper case letters. The type of a variable
is not known until a value is assigned to it. The variable has the same type as the assigned
value:

a = 5
filename := "first.net"
init_flag := TRUE
NET_ERR = 4.7e+11
a := init_flag

The assignment of variables is done by using `=' or `:='. The comparison operator is
`=='. The variable `a' belongs to the type integer and changes its type in line 5 to
boolean. filename belongs to the type string and NET_ERR to the type float.
12.2.4 System Variables
System variables are predened variables that are set by the program and that are read-
only for the user. The following system variables have the same semantics as the displayed
variables in the graphical user interface:
SSE Sum of the squared dierences of each output neuron
MSE SSE divided by the number of training patterns
SSEPU SSE divided by the number of output neurons of the net
CYCLES Number of the cycles trained so far.
Additionally there are three more system variables:
PAT The number of patterns in the current pattern set
EXIT CODE The exit status of an execute call
SIGNAL The integer value of a caught signal during execution
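The system variables can be read like ordinary variables, for example:

print ("cycles = ", CYCLES, " SSE = ", SSE)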
12.2.5 Operators and Expressions
An expression is usually a formula which calculates a value. An expression could be a
complex mathematical formula or just a value. Expressions include:
3
TRUE
3 + 3
17 - 4 * a + (2 * ln 5) / 0.3
The value or the result of an expression can be assigned to a variable. The available
operators and their precedence are given in table 12.1. Higher position in the table means
higher priority of the operator.
If more than one expression occurs in a line the execution of expressions starts at the left
and proceeds towards the right. The order can be changed with parentheses `(' `)'.
The type of an expression is determined at run time and is set by the operands, except
in the case of integer division, the modulo operation, the boolean operations and
the comparison operations.
If two integer values are multiplied, the result will be an integer value. But if an integer
and a float value are multiplied, the result will be a float value. If one operand is of
type string, then all other operands are transformed into strings. Partial expressions are
calculated before the transformation takes place:
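For example, a numeric partial expression is evaluated first and only then converted when
it is concatenated with strings (the variable and file names below are arbitrary):

filename := "net" + 2 * 3 + ".net"
print (filename)

produces:

net6.net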
Operator      Function
+, -          Sign for numbers
not, !        Logical negation for boolean values
sqrt          Square root
ln            Natural logarithm to the base e
log           Logarithm to the base 10
**, ^         Exponential function
*             Multiplication
/             Division
div           Integer division with an integer result
mod, %        Remainder of an integer division
+             Addition
-             Subtraction
<             smaller than
<=, =<        smaller or equal
>             greater than
>=, =>        greater or equal
==            equal
<>, !=        not equal
and, &&       logical AND for boolean values
or, ||        logical OR for boolean values

Table 12.1: The precedence of the batchman operators
Please note that if the user decides to use operators such as sqrt, ln, log or the exponential
operator, no parentheses are required because these operators are not function calls:

Square root:                   sqrt 9
Natural logarithm:             ln 2
Logarithm to the base of 10:   log alpha
Exponential function:          10 ** 4 or a^b

If a variable which has not yet been assigned a value is printed, the print function will
display <undef> instead of a value.
The If Instruction

There are two variants of the if instruction. The first variant is:

if EXPRESSION then BLOCK endif

The block is executed only if the expression has the boolean value TRUE.
EXPRESSION can be replaced by any complex expression as long as it delivers a boolean value:
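For example (assuming the variables a and b have been assigned suitable values):

if a < 10 and b == TRUE then print ("hello world") endif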
produces:
hello world
Please note that the logical operator `and' is executed last due to its low priority. If there
is confusion about the execution order, it is recommended to use parentheses to make sure
the desired result is achieved.
The second variant of the if instruction uses a second block which will be executed as an
alternative to the first one. The structure of the second if variant looks like this:

if EXPRESSION then BLOCK1 else BLOCK2 endif

The first BLOCK, here described as BLOCK1, will be executed only if the resulting
value of EXPRESSION is `TRUE'. If EXPRESSION delivers `FALSE', BLOCK2 will be
executed.
The For Instruction

The for instruction is a control structure to repeat a block a fixed number of times. The
most general form is:

for ASSIGNMENT to EXPRESSION do BLOCK endfor

A counter for the repetitions of the block is needed. This is a variable which counts
the loop iterations. Its value is increased by one when a loop iteration is completed. If
the value of the counter is larger than the value of the EXPRESSION, the BLOCK is not
executed anymore. If the value is already larger at the beginning, the instructions
contained in the block are not executed at all. The counter is a simple variable. A for
instruction could look like this:
for i := 2 to 5 do print (" here we are: ",i) endfor
produces:
here we are: 2
here we are: 3
here we are: 4
here we are: 5
The user has to make sure that the loop terminates at some point. This can be achieved
by making sure that the EXPRESSION eventually delivers the value `TRUE' in case of the
repeat instruction or `FALSE' in case of the while instruction. The for example from
the previous section is equivalent to:
i := 2
while i <= 5 do
print ( "here we are: ",i)
i := i + 1 endwhile
or to:
i := 2
repeat
print ( "here we are: ",i)
i := i + 1
until i > 5
The main difference between repeat and while is that repeat guarantees that the BLOCK
is executed at least once. The break and the continue instructions may also be used
within the BLOCK.
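For example, a loop that is left with break as soon as a counter reaches a limit (the
variable name is arbitrary):

i := 0
while TRUE do
   i := i + 1
   if i >= 3 then break endif
endwhile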
- setSubPattern()
- setShuffle()
- setSubShuffle()
- setClassDistrib()

Functions which refer to neural nets:

- loadNet()
- saveNet()
- saveResult()
- initNet()
- trainNet()
- resetNet()
- jogWeights()
- jogCorrWeights()
- testNet()

Functions which refer to patterns:

- loadPattern()
- setPattern()
- delPattern()

Special functions:

- pruneNet()
- pruneTrainNet()
- pruneNetNow()
- delCandUnits()
- execute()
- print()
- exit()
- setSeed()
The format of such calls is:

function name (parameter1, parameter2, ...)

No parameters, one parameter, or multiple parameters can be placed after the function
name. Unspecified values take on a default value. Note, however, that if the third value
is to be modified, the first two values have to be provided with the function call as well.
The parameters have the same order as in the graphical user interface.
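For instance, if only the third parameter of a learning function is to be changed, the first
two parameters still have to be supplied; the values below are placeholders:

setLearnFunc ("BackpropMomentum", 0.2, 0.5, 0.1)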
12.3.1 Function Calls To Set SNNS Parameters

The following function calls to set SNNS parameters are available. The format and the
usage of the function calls will be discussed now. It is an enormous help to be familiar
with the graphical user interface of SNNS, especially with the chapters "Parameters of
the learning functions", "Update functions", "Initialization functions", "Handling patterns
with SNNS", and "Pruning algorithms".
setInitFunc

This function call selects the function with which the net is initialized. The format is:

setInitFunc (function name, parameter...)

where function name is the initialization function and has to be selected out of:

ART1_Weights DLVQ_Weights Random_Weights_Perc
ART2_Weights Hebb Randomize_Weights
ARTMAP_Weights Hebb_Fixed_Act RBF_Weights
CC_Weights JE_Weights RBF_Weights_Kohonen
ClippHebb Kohonen_Rand_Pat RBF_Weights_Redo
CPN_Weights_v3.2 Kohonen_Weights_v3.2 RM_Random_Weights
CPN_Weights_v3.3 Kohonen_Const
CPN_Rand_Pat PseudoInv

It has to be provided by the user and the name has to be exactly as printed above. The
function name has to be enclosed in "".
After the name of the initialization function is provided, the user can enter the parameters
which influence the initialization process. If no parameters are entered, default values
will be selected. The parameters have to be of type float or integer. Function calls could
look like this:

setInitFunc ("Randomize_Weights")
setInitFunc ("Randomize_Weights", 1.0, -1.0)
where the first call selects the Randomize_Weights function with default parameters. The
second call uses the Randomize_Weights function and sets two parameters. The batch
interpreter displays:

# Init function is now Randomize_Weights
# Parameters are: 1.0 -1.0
setLearnFunc

The function call setLearnFunc is very similar to the setInitFunc call. setLearnFunc
selects the learning function which will be used in the training process of the neural net.
The format is:

setLearnFunc (function name, parameters...)

where function name is the name of the desired learning algorithm. This name is manda-
tory and has to match one of the following strings:

ART1 Counterpropagation Quickprop
ART2 Dynamic_LVQ RadialBasisLearning
ARTMAP Hebbian RBF-DDA
BackPercolation JE_BP RM_delta
BackpropBatch JE_BP_Momentum Rprop
BackpropChunk JE_Quickprop Sim_Ann_SS
BackpropMomentum JE_Rprop Sim_Ann_WTA
BackpropWeightDecay Kohonen Sim_Ann_WWTA
BPTT Monte-Carlo Std_Backpropagation
BBPTT PruningFeedForward TimeDelayBackprop
CC QPTT TACOMA

After the name of the learning algorithm is provided, the user can specify some parameters.
The interpreter uses default values if no parameters are selected. The values have to be
of the type float or integer. A detailed description can be found in the chapter "Parameters
of the learning functions". Function calls could look like this:

setLearnFunc ("Std_Backpropagation")
setLearnFunc ("Std_Backpropagation", 0.1)
The first function call selects the learning algorithm and the second one additionally
provides the first learning parameter. The batch interpreter displays:

# Learning function is now: Std_Backpropagation
# Parameters are: 0.1
setUpdateFunc

This function selects the order in which the neurons are visited. The format is:

setUpdateFunc (function name, parameters...)

where function name is the name of the update function. The name of the update algorithm
has to be selected as shown below.

Topological_Order BAM_Order JE_Special
ART1_Stable BPTT_Order Kohonen_Order
ART1_Synchronous CC_Order Random_Order
ART2_Stable CounterPropagation Random_Permutation
ART2_Synchronous Dynamic_LVQ Serial_Order
ARTMAP_Stable Hopfield_Fixed_Act Synchronous_Order
ARTMAP_Synchronous Hopfield_Synchronous TimeDelay_Order
Auto_Synchronous JE_Order

After the name is provided several parameters can follow. If no parameters are selected,
default values are chosen by the interpreter. The parameters have to be of the type float
or integer. The update functions are described in the chapter "Update functions". A
function call could look like this:

setUpdateFunc ("Topological_Order")
setPruningFunc

This function call is used to select the different pruning algorithms for neural networks
(see chapter "Pruning algorithms"). A function call may look like this:

setPruningFunc (function name1, function name2, parameters)

where function name1 is the name of the pruning function and has to be selected from:

MagPruning OptimalBrainSurgeon OptimalBrainDamage
Noncontributing_Units Skeletonization

Function name2 is the name of the subordinate learning function and has to be selected
out of:

BackpropBatch Quickprop BackpropWeightDecay
BackpropMomentum Rprop Std_Backpropagation

Additionally the parameters described below can be entered. If no parameters are entered,
default values are used by the interpreter. Those values appear in the graphical user
interface in the corresponding widgets of the pruning window.

1. Maximum error increase in % (float)
2. Accepted error (float)
3. Recreate last pruned element (boolean)
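Function calls could, for example, look like this; the parameter values of the second call
correspond to the interpreter output shown below:

setPruningFunc ("MagPruning", "Rprop")
setPruningFunc ("MagPruning", "Rprop", 15.0, 3.5, FALSE, 500, 90, 1.0, 1e-6, TRUE, TRUE)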
In the first function call the pruning function and the subordinate learning function are
selected. In the second function call almost all parameters are specified. Please note that
a function call has to be specified without a carriage return. Long function calls have to
be specified within one line. The following text is displayed by the batch interpreter:

# Pruning function is now MagPruning
# Subordinate learning function is now Rprop
# Parameters are: 15.0 3.5 FALSE 500 90 1.0 1e-6 TRUE TRUE

The regular learning function PruningFeedForward has to be set with the function call
setLearnFunc(). This is not necessary if PruningFeedForward is already set in the
network file.
setRemapFunc

This function call selects the pattern remapping function. The format is:

setRemapFunc (function name, parameter...)

where function name is the pattern remapping function and has to be selected out of:

None Binary Inverse
Norm Threshold

It has to be provided by the user and the name has to be exactly as printed above. The
function name has to be enclosed in "".
After the name of the pattern remapping function is provided, the user can enter the
parameters which influence the remapping process. If no parameters are entered, default
values will be selected. The parameters have to be of type float or integer.
Function calls could look like this:

setRemapFunc ("None")
setRemapFunc ("Threshold", 0.5, 0.5, 0.0, 1.0)
where the first call selects the default function None that does not do any remapping. The
second call uses the Threshold function and sets four parameters. The batch interpreter
displays:

# Remap function is now Threshold
# Parameters are: 0.5 0.5 0.0 1.0
setActFunc

This function call changes the activation function for all units of a specific type in the
network. The format is:

setActFunc (Type, function name)

where function name is the activation function and has to be selected out of the available
unit activation functions:
Act_Logistic Act_Elliott Act_BSB
Act_TanH Act_TanH_Xdiv2 Act_Perceptron
Act_Signum Act_Signum0 Act_Softmax
Act_StepFunc Act_HystStep Act_BAM
Logistic_notInhibit Act_MinOutPlusWeight Act_Identity
Act_IdentityPlusBias Act_LogisticTbl Act_RBF_Gaussian
Act_RBF_MultiQuadratic Act_RBF_ThinPlateSpline Act_less_than_0
Act_at_most_0 Act_at_least_2 Act_at_least_1
Act_exactly_1 Act_Product Act_ART1_NC
Act_ART2_Identity Act_ART2_NormP Act_ART2_NormV
Act_ART2_NormW Act_ART2_NormIP Act_ART2_Rec
Act_ART2_Rst Act_ARTMAP_NCa Act_ARTMAP_NCb
Act_ARTMAP_DRho Act_LogSym Act_CC_Thresh
Act_Sinus Act_Exponential Act_TD_Logistic
Act_TD_Elliott Act_Euclid Act_Component
Act_RM Act_TACOMA
It has to be provided by the user and the name has to be exactly as printed above. The
function name has to be enclosed in "".
Type is the type of the units that are to be assigned the new function. It has to be specified
as an integer with the following meaning:

Type   affected units
0      all units in the network
1      input units only
2      output units only
3      hidden units only
4      dual units only
5      special units only
6      special input units only
7      special output units only
8      special hidden units only
9      special dual units only

See section 3.1.1 and section 6.5 of this manual for details about the various unit types.
setCascadeParams

The function call setCascadeParams defines the additional parameters required for train-
ing a cascade correlation network. The parameters are the same as in the Cascade window
of the graphical user interface. The order is the same as in the window from top to bottom.
The format of the function call is:

setCascadeParams (parameter, ...)
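A call that simply restates the default settings of the Cascade window could, for example,
look like this (the exact spelling of the string parameters is taken from the interpreter
output below and may differ on your installation):

setCascadeParams (0.2, "Quickprop", "no", FALSE, TRUE, FALSE, "SBC", 0.0, 0.0, 0.0, 0.0, 0.0, 0.04, 25, 200, 8, "Act_LogSym", 0.01, 50, 200)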
will display:
# Cascade Correlation
# Parameters are: 0.2 Quickprop no FALSE TRUE FALSE SBC 0.0 0.0 0.0
0.0 0.0 0.04 25 200 8 Act_LogSym 0.01 50 200
Note that (like with the graphical user interface in the learning function widgets) in the
batchman call setLearnFunc() CC has to be specified as learning function, while the
parameters will refer to the subordinate learning function given in this call.
setSubPattern

The function call setSubPattern defines the Subpattern-Shifting-Scheme which is de-
scribed in chapter 5.3. The definition of the Subpattern-Shifting-Scheme has to fit the
used pattern file and the architecture of the net. The format of the function call is:

setSubPattern (InputSize1, InputStep1, OutputSize1, OutputStep1)

The first dimension of the subpatterns is described by the first four parameters. The
order of the parameters is identical to the order in the graphical user interface (see
chapter "Sub Pattern Handling"). All four parameters are needed for one dimension. If
a second dimension exists, the four parameters of that dimension are given after the four
parameters of the first dimension. This applies to all following dimensions. Function calls
could look like this:

setSubPattern (5, 3, 5, 1)
setSubPattern (5, 3, 5, 1, 5, 3, 5, 1)
setShuffle, setSubShuffle

The function calls setShuffle and setSubShuffle enable the user to work with the
shuffle function of SNNS, which selects the next training pattern at random. The
shuffle function can be switched on or off. The format of the function calls is:

setShuffle (mode)
setSubShuffle (mode)

where the parameter mode is a boolean value. The boolean value TRUE switches the
shuffle function on and the boolean value FALSE switches it off. setShuffle relates to
regular patterns and setSubShuffle relates to subpatterns. The function call:

setSubShuffle (TRUE)

will display:

# Subpattern shuffling enabled
setClassDistrib

The function call setClassDistrib defines the composition of the pattern set used for
training. Without this call, or with the first parameter set to FALSE, the distribution
will not be altered and will match the one in the pattern file. The format of the function
call is:

setClassDistrib (flag, parameters...)

The flag is a boolean value which defines whether the distribution defined by the following
parameters is used (== TRUE) or ignored (== FALSE).
The next parameters give the relative amount of patterns of the various classes to be used
in each epoch or chunk. The ordering assumes an alphanumeric ordering of the class names.
Function calls could look like this:

setClassDistrib (TRUE, 5, 3, 5, 1, 2)

Given class names of "alpha", "beta", "gamma", "delta", "epsilon", this would result in
training 5 times the alpha class patterns, 3 times the beta class patterns, 5 times the
delta class patterns, once the epsilon class patterns, and twice the gamma class patterns.
This is due to the alphanumeric ordering of those class names: "alpha", "beta", "delta",
"epsilon", "gamma".
If the learning function BackpropChunk is selected, this would also recommend a chunk size
of 16. However, the chunk size parameter of BackpropChunk is completely independent
of the values given to this function.
The following text is displayed by the batch interpreter:

# Class distribution is now ON
# Parameters are: 5 3 5 1 2
The function calls loadNet and saveNet both have the same format:

loadNet (file name)
saveNet (file name)

where file name is a valid Unix file name enclosed in " ". The function loadNet loads
a net into the simulator kernel and saveNet saves the net which is currently located in the
simulator kernel. The function call loadNet sets the system variable CYCLES to zero.
This variable contains the number of training cycles used by the simulator to train a net.
Examples for such calls could be:

loadNet ("encoder.net")
...
saveNet ("encoder.net")
The function call saveResult saves a SNNS result file and has the following format:

saveResult (file name, start, end, inclIn, inclOut, file mode)

The first parameter (file name) is required. The file name has to be a valid Unix file
name enclosed by " ". All other parameters are optional. Please note that if one specific
parameter is to be entered, all parameters before it have to be provided as well. The
parameter start selects the first pattern which will be handled and end selects the last
one. If the user wants to handle all patterns, the system variable PAT can be entered here.
This system variable contains the number of all patterns. The parameters inclIn and
inclOut decide whether the input patterns and the output patterns should be saved in
the result file or not. Those parameters contain boolean values. If inclIn is TRUE all
input patterns will be saved in the result file. If inclIn is FALSE the patterns will not be
saved. The parameter inclOut is identical except for the fact that it relates to output
patterns. The last parameter file mode, of the type string, decides whether a file should
be created or whether data is just appended to an existing file. The strings "create"
and "append" are accepted for file mode. A saveResult call could look like this:
saveResult ("encoder.res")
saveResult ("encoder.res", 1, PAT, FALSE, TRUE, "create")
In the second case the result file encoder.res is written and contains all output patterns.
The function calls initNet, trainNet and testNet are related to each other. All functions
are called without any parameters:

initNet()
trainNet()
testNet()

initNet() initializes the neural network with the initialization function selected by
setInitFunc. After the net has been reset, the system variable CYCLES is set to zero.
The function call initNet is necessary if an untrained net is to be trained for the first
time or if the user wants to set a trained net back to its untrained state.
initNet()
produces:
# Net initialized
The function call trainNet trains the net for exactly one cycle. Afterwards, the
contents of the system variables SSE, MSE, SSEPU and CYCLES are updated.
The function call testNet is used to display the error of the trained net without
actually training it. This call changes the system variables SSE, MSE and SSEPU but leaves
the net and all its weights unchanged.
Please note that the function calls trainNet, jogWeights, and jogCorrWeights are usu-
ally used in combination with a repetition control structure like for, repeat, or while.
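A typical training loop therefore looks like this:

for i := 1 to 100 do
   trainNet ()
endfor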
Another function call without parameters is
resetNet()
It is used to bring all unit values back to their original settings. This is useful to clean up
gigantic unit activations that sometimes result from large learning rates. It is also
necessary for some special algorithms, e.g. training of Elman networks, that save a history
of the training in certain unit values. These need to be cleared, e.g. when a new pattern
is loaded.
Note that the weights are not changed by this function!
The function call jogWeights is used to apply random noise to the link weights. This
might be useful if the network is stuck in a local minimum. The function is called like

jogWeights(minus, plus)

where minus and plus define the maximum random weight change as a factor of the
current link weight. E.g. jogWeights(-0.05, 0.02) will result in new random link
weights within the range of 95% to 102% of the current weight values.
jogCorrWeights is a more sophisticated version of noise injection to link weights. The idea
is to jog only the weights of non-special hidden units which show a very high correlation
during forward propagation of the patterns. The function call

jogCorrWeights(minus, plus, mincorr)

first propagates all patterns of the current set through the network. During propagation,
statistical parameters are collected for each hidden unit with the goal to compute the
correlation coefficient between any two arbitrary hidden units:

   ρ_{x,y} = cov(X,Y) / (σ_x σ_y) = Σ_{i=1}^{n} (X_i - X̄)(Y_i - Ȳ) / sqrt( Σ_{i=1}^{n} (X_i - X̄)^2 · Σ_{i=1}^{n} (Y_i - Ȳ)^2 )      (12.1)

ρ_{x,y} ∈ [-1.0, 1.0] denotes the correlation coefficient between the hidden units x and y,
while X_i and Y_i equal the activations of these two units during propagation of pattern i.
Now the hidden units x and y are determined which yield the highest correlation (or anti-
correlation) which is also higher than the parameter mincorr: |ρ_{x,y}| > mincorr. If such
hidden units exist, one of them is chosen randomly and its weights are jogged according to
the minus and plus parameters. The computing time for one call to jogCorrWeights() is
about the same as the time consumed by testNet() or half the time used by trainNet().
Reasonable parameters for mincorr are in the range of [0.8, 0.99].
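A call could look like this (the values are merely an illustration):

jogCorrWeights (-0.02, 0.02, 0.9)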
The function call loadPattern loads a pattern file into the simulator kernel; several pattern
sets may be kept in memory at the same time. setPattern selects one of the loaded pattern
files as the one currently in use. The call delPattern deletes the pattern file currently in
use from the kernel. The function calls:

loadPattern ("encoder.pat")
loadPattern ("encoder1.pat")
setPattern ("encoder.pat")
delPattern ("encoder.pat")

produce:

# Patternset encoder.pat loaded; 1 patternset(s) in memory
# Patternset encoder1.pat loaded; 2 patternset(s) in memory
# Patternset is now encoder.pat
# Patternset encoder.pat deleted; 1 patternset(s) in memory
# Patternset is now encoder1.pat
pruneNet
The function call pruneNet() prunes a net equivalent to the pruning in the graphical
user interface. After all functions and parameters have been set with the call setPruningFunc,
the pruneNet() function call can be executed. No parameters are necessary.
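For example (any of the pruning functions listed above may be used):

setPruningFunc ("OptimalBrainDamage", "Std_Backpropagation")
pruneNet ()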
pruneTrainNet
The function call pruneTrainNet() is equivalent to trainNet() but uses the subordi-
nate learning function of pruning. Use it when you want to perform a training step during
your pruning algorithm. It has the same parameter syntax as trainNet().
pruneNetNow
The function call pruneNetNow() performs one pruning step and then calculates the SSE,
MSE, and SSEPU values of the resulting network.
delCandUnits
This function has no functionality. It is kept for backward compatibility reasons. In earlier
SNNS versions Cascade Correlation candidate units had to be deleted manually with this
function. Now they are deleted automatically at the end of training.
execute

An interface to the Unix operating system is provided by the function execute.
This function call enables the user to start a program at the Unix command line and
redirect its output to the batch program. All Unix help programs can be used to make
this special function a very powerful tool. The format is:

execute (instruction, variable1, variable2, ...)

where `instruction' is a Unix instruction or a Unix program. All output generated by the
Unix command has to be separated by blanks and has to be placed in one line. If this is
not done automatically, please use Unix commands like AWK or grep to format the output
as needed. Those commands are able to produce such a format. The output generated
by the program is assigned, according to the order of the output sequences, to the
variables variable1, variable2, ... The data type of the generated output is automatically
set to one of the four data types of the batch interpreter. Additionally the exit state of
the Unix program is saved in the system variable EXIT_CODE. An example for execute is:

execute ("date", one, two, three, four)
print ("It is ", four, " o'clock")

This function call calls the command date and reads the output "Fri May 19 16:28:29
GMT 1995" into the four above named variables. The variable `four' contains the time. The
batch interpreter produces:

It is 16:28:29 o'clock
The execute call could also be used to determine the available free disk space:

execute ("df . | grep dev", dmy, dmy, dmy, freeblocks)
print ("There are ", freeblocks, " Blocks free")

In this example the Unix pipe and the grep command are responsible for reducing the
output and placing it into one line. Only lines that contain dev pass the filter. This
line is read by the batch interpreter and all information is assigned to the named
variables. The first three fields are assigned to the variable dmy. The information about
the available blocks is stored in the variable freeblocks. The following output is
produced:

There are 46102 Blocks free
The examples given above should give the user an idea of how to handle the execute com-
mand. It should be pointed out here that execute could as well call another batch
interpreter which could work on partial solutions of the problem. If the user wants to
accomplish such a task, the command line option -q of the batch interpreter could be used
to suppress output not caused by the print command. This would ease the reading of the
output.
exit

This function call leaves the batch program immediately and terminates the batch inter-
preter. The parameter used in this function is the exit state, which will be returned to
the calling program (usually the Unix shell). If no parameter is used, the batch interpreter
returns zero. The format is:

exit (state)

The integer state ranges from -128 to +127. If the value is not within this range, the
value will be mapped into the valid range and an error message displayed. The following
example shows how this function call could be used:

if freeblocks < 1000 then
   print ("Not enough disk space")
   exit (1)
endif
setSeed

The function setSeed sets a seed value for the random number generator used by the
initialization functions. If setSeed is not called before initializing a network, subsequent
initializations yield the exact same initial network conditions. Thereby it is possible to
make an exact comparison of two training runs with different learning parameters.

setSeed (seed)

setSeed may be called with an integer parameter as a seed value. Without a parameter
it uses the value returned by the shell command `date' as seed value.
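For example:

setSeed (1995)
setSeed ()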
12.4 Batchman Example Programs

12.4.1 Example 1

while SSE > 6.9 and CYCLES < 1000 and SIGNAL == 0 do
   if CYCLES mod 10 == 0 then
      print ("cycles = ", CYCLES, " SSE = ", SSE)
   endif
   trainNet()
endwhile
if SIGNAL != 0 then
   print ("Stopped due to signal reception: signal " + SIGNAL)
endif
12.4.2 Example 2

The following example program reads the output of the network analyzation program
analyze. The output is transformed into a single line with the help of the program
analyze.gawk. The net is trained until all patterns are classified correctly:

loadNet ("encoder.net")
loadPattern ("encoder.pat")
initNet ()
while TRUE do
   for i := 1 to 500 do
      trainNet ()
   endfor
   resfile := "test.res"
   saveResult (resfile, 1, PAT, FALSE, TRUE, "create")
   saveNet ("enc1.net")
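The loop is closed by a classification check; a minimal sketch, assuming that analyze.gawk
reduces the analyze statistics to the numbers of wrongly and not classified patterns on a
single line, is:

   execute ("analyze -s -i test.res | gawk -f analyze.gawk", wrong, unclassified)
   if wrong == 0 and unclassified == 0 then
      break
   endif
endwhile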
12.4.3 Example 3
The last example program shows how the user can validate the training with a second
pattern file. The net is trained with one training pattern file and the error, which is
used to determine when training should be stopped, is measured on a second pattern file.
Thereby it is possible to estimate whether the net is able to classify unknown patterns correctly:
loadNet ("test.net")
loadPattern ("validate.pat")
loadPattern ("training.pat")
initNet ()
repeat
for i := 1 to 20 do
trainNet ()
endfor
saveNet ("test." + CYCLES + "cycles.net")
setPattern ("validate.pat")
testNet ()
valid_error := SSE
setPattern ("training.pat")
until valid_error < 2.5
saveResult ("test.res")
The program trains the net for 20 cycles and saves it under a new name in every iteration
of the repeat instruction. Each time, the program tests the net with the validation pattern
set. This process is repeated until the error on the validation set is smaller than 2.5.
12.5 snnsbat - the Predecessor

The keys of an execution run may be given in arbitrary order, with the exception of
the key 'Type:', which has to be listed only once and as the first key. If a key is omitted,
the corresponding value(s) are assigned a default.
Here is a listing of the tuples and their meaning:
Key                        Value           Meaning
InitFunction:              <string>        Name of the initialization function.
InitParam:                 <float>         'NoOfInitParam' parameters for the initialization function, separated by blanks.
LearnParam:                <float>         'NoOfLearnParam' parameters for the learning function, separated by blanks.
UpdateParam:               <float>         'NoOfUpdateParam' parameters for the update function, separated by blanks.
LearnPatternFile:          <string>        Filename of the learning patterns.
MaxErrorToStop:            <float>         Network error when learning is to be halted.
MaxLearnCycles:            <int>           Maximum number of learning cycles to be executed.
NetworkFile:               <string>        Filename of the net to be trained.
NoOfInitParam:             <int>           Number of parameters for the initialization function.
NoOfLearnParam:            <int>           Number of parameters for the learning function.
NoOfUpdateParam:           <int>           Number of parameters for the update function.
NoOfVarDim:                <int> <int>     Number of variable dimensions of the input and output patterns.
PerformActions:            none            Execution run separator.
PruningMaxRetrainCycles:   <int>           Maximum number of cycles per retraining.
PruningMaxErrorIncrease:   <float>         Percentage to be added to the first net error. The resulting value cannot be exceeded by the net error, unless it is lower than the accepted error.
PruningAcceptedError:      <float>         Maximum accepted error.
PruningRecreate:           [ YES | NO ]    Flag for reestablishing the last state of the net at the end of pruning.
PruningOBSInitParam:       <float>         Initial value for OBS.
PruningInputPruning:       [ YES | NO ]    Flag for input unit pruning.
PruningHiddenPruning:      [ YES | NO ]    Flag for hidden unit pruning.
ResultFile:                <string>        Filename of the result file.
ResultIncludeInput:        [ YES | NO ]    Flag for inclusion of input patterns in the result file.
ResultIncludeOutput:       [ YES | NO ]    Flag for inclusion of output learning patterns in the result file.
SubPatternOSize:           <int>           NoOfVarDim[2] int values that specify the shape of the sub patterns of each output pattern.
SubPatternOStep:           <int>           NoOfVarDim[2] int values that specify the shifting steps for the sub patterns of each output pattern.
TestPatternFile:           <string>        Filename of the test patterns.
TrainedNetworkFile:        <string>        Filename where the net should be stored after training / initialization.
Type:                      <string>        The type of grammar that corresponds to this file. Valid types are: 'SNNSBATCH_1' (performs only one execution run) and 'SNNSBATCH_2' (performs multiple execution runs).
ResultMinMaxPattern:       <int> <int>     Number of the first and last pattern to be used for result file generation.
Shuffle:                   [ YES | NO ]    Flag for pattern shuffling.
ShuffleSubPat:             [ YES | NO ]    Flag for subpattern shuffling.
SubPatternISize:           <int>           NoOfVarDim[1] int values that specify the shape of the sub patterns of each input pattern.
SubPatternIStep:           <int>           NoOfVarDim[1] int values that specify the shifting steps for the sub patterns of each input pattern.
Please note the mandatory colon after each key and the upper case of several letters.
snnsbat may also be used to perform only parts of a regular network training run. If the
network is not to be initialized, training is not to be performed, or no result file is to be
computed, the corresponding entries in the configuration file can be omitted.
For all keywords the string '<OLD>' is also a valid value. If <OLD> is specified, the value
of the previous execution run is kept. For the keys 'NetworkFile:' and 'LearnPatternFile:'
this means that the corresponding files are not read in again. The network (patterns)
already in memory are used instead, thereby saving considerable execution time. This
allows for a continuous logging of network performance. The user may, for example, load
a network and pattern file, train the network for 100 cycles, create a result file, train
another 100 cycles, create a second result file, and so forth. Since the error made by the
current network in classifying the patterns is reported in the result file, the series of result
files documents the improvement of the network performance.
The following table shows the behavior of the program caused by omitted entries:
#
#This execution run continues the training of the already loaded file
#for another 100 cycles before creating a second result file.
#
PerformActions:
#
NetworkFile: <OLD>
#
LearnPatternFile: <OLD>
NoOfLearnParam: <OLD>
LearnParam: 0.2 0.3
MaxLearnCycles: 100
MaxErrorToStop: 0.01
Shuffle: YES
#
ResultFile: letters3.res
ResultMinMaxPattern: <OLD>
ResultIncludeInput: <OLD>
ResultIncludeOutput: <OLD>
TrainedNetworkFile: trained_letters.net
#
#This execution run concludes the training of the already loaded file.
#After another 100 cycles of training with changed learning
#parameters the final network is saved to a file and a third result
#file is created.
#
The file <log file> collects the SNNS kernel messages and contains statistics about running
time and speed of the program.
If the <log file> command line parameter is omitted, snnsbat opens the file `snnsbat.log'
in the current directory. To limit the size of this file, a maximum of 100 learning cycles
are logged. This means that for 1000 learning cycles a message will be written to the file
every 10 cycles.
If the time required for network training exceeds 30 minutes of CPU time, the network is
saved. The log file then shows the message:
##### Temporary network file 'SNNS_Aaaa00457' created. #####
Temporary networks always start with the string `SNNS_'. After 30 more minutes of CPU
time, snnsbat creates a second security copy. Upon normal termination of the program,
these copies are deleted from the current directory. The log file then shows the message:
##### Temporary network file 'SNNS_Aaaa00457' removed. #####
In an emergency (power-down, kill, alarm, etc.), the current network is saved by the pro-
gram. The log file, resp. the mailbox, will later show an entry like:

Signal 15 caught, SNNS V4.2 Batchlearning terminated.
SNNS V4.2 Batchlearning terminated at Tue Mar 23 08:49:04 1995
System: SunOS Node: matisse Machine: sun4m
Networkfile './SNNS_BAAa02686' saved.
Logfile 'snnsbat.log' written.
Chapter 13

Tools for SNNS

13.1 Overview

The following tools are available to ease the use of SNNS:
analyze:          analyzes result files generated by SNNS to test the classification capabilities of the corresponding net
td_bignet:        time-delay network generator
ff_bignet:        feedforward network generator
Convert2snns:     pattern conversion tool for Kohonen networks
feedback-gennet:  generator for network definition files
mkhead:           writes an SNNS pattern file header to stdout
mkout:            writes an SNNS output pattern to stdout
mkpat:            reads an 8 bit raw file and writes an SNNS pattern file to stdout
netlearn:         backpropagation test program
netperf:          benchmark program
pat_sel:          produces a pattern file with selected patterns
snns2c:           compiles an SNNS network file into an executable C source
linknets:         connects two or more SNNS network files into one big net
isnns:            interactive stream interface for online training
13.2 Analyze

The purpose of this tool is to analyze the result files that have been created by SNNS.
The result file which you want to analyze has to contain the teaching output and the
output of the network.

Synopsis: analyze [-options]

It is possible to choose between the following options in any order:

-w                numbers of patterns which were classified wrong are printed
-r                numbers of patterns which were classified right are printed
-u                numbers of patterns which were not classified are printed
-a                same as -w -r -u
-S "t c"          specific: numbers of class t patterns which are classified as class c are printed (-1 = noclass)
-v                verbose output. Each printed number is preceded by one of the words 'wrong', 'right', 'unknown', or 'specific', depending on the result of the classification.
-s                statistics containing wrong, right and not classified patterns. The network error is printed as well.
-c                same as -s, but the statistics for each output unit (class) are displayed.
-m                show confusion matrix (only works with -e 402040 or -e WTA)
-i <file name>    name of the 'result file' which is going to be analyzed
-o <file name>    name of the file which is going to be produced by analyze
-e <function>     defines the name of the 'analyzing function'. Possible names are: 402040, WTA, band (description see below)
-l <real value>   first parameter of the analyzing function
-h <real value>   second parameter of the analyzing function

Starting analyze without any options is equivalent to:

analyze -w -e 402040 -l 0.4 -h 0.6
13.3 ff_bignet

The program ff_bignet can be used to automatically construct complex neural networks.
The synopsis is kind of lengthy, so when networks are to be constructed manually, the
graphical version included in xgui is preferable. If, however, networks are to be con-
structed automatically, e.g. a whole series from within a shell script, this program is the
method of choice.

Synopsis:

ff_bignet <plane definition>... <link definition>... [<output file>]
where:
<plane definition> : -p <x> <y> [<act> [<out> [<type>]]]
<x> : number of units in x-direction
<y> : number of units in y-direction
<act> : optional activation function
e.g.: Act_Logistic
<out> : optional output function, <act> must be given too
e.g.: Out_Identity
<type>: optional layer type, <act> and <out> must be given
too. Valid types: input, hidden, or output
Target section:
<tp> : target plane (1, 2, ...)
<tcx> : x position of target cluster
<tcy> : y position of target cluster
<tcw> : width of target cluster
<tch> : height of target cluster
<tux> : x position of a distinct target unit
<tuy> : y position of a distinct target unit
<tmx> : delta x for multiple target fields
<tmy> : delta y for multiple target fields
defines a network with three layers: a 6x20 input layer, a 1x10 hidden layer, and a single
output unit. The upper 6x10 input units are fully connected to the hidden layer, which
in turn is fully connected to the output unit. The lower 6x10 input units do not have any
connections.

NOTE:
Even though the tool is called bignet, it can construct not only feed-forward but also
recurrent networks.
13.4 td_bignet

The program td_bignet can be used to automatically construct neural networks with the
topology for time-delay learning. As with ff_bignet, the graphical version included in
xgui is preferable if networks are to be constructed manually.

Synopsis:

td_bignet <plane definition>... <link definition>... [<output file>]

where:

<plane definition> : -p <f> <d>
  <f>  : number of feature units
  <d>  : total delay length
<link definition> : -l <sp> <sf> <sw> <d> <tp> <tf> <tw>
  <sp> : source plane (1, 2, ...)
  <sf> : 1st feature unit in source plane
  <sw> : field width in source plane
  <d>  : delay length in source plane
  <tp> : target plane (2, 3, ...)
  <tf> : 1st feature unit in target plane
  <tw> : field width in target plane
<output file> : name of the output file (default SNNS_TD_NET.net)

At least two plane definitions and one link definition are mandatory. There is no upper
limit on the number of planes that can be specified.
13.5 linknets
linknets allows to easily link several independent networks into one combined network.
In general, n so-called input networks (n ranges from 1 to 20) are linked to m so-called
output networks (m ranges from 0 to 20). It is possible to add a new layer of input units
to feed the former input units of the input networks. It is also possible to add a new layer
of output units which is either fed by the former output units of the output networks (if
output networks are given) or by the former output units of the input networks.
Synopsis:
linknets -innets <netfile> ... [ -outnets <netfile> ... ]
-o <output network file> [ options ]
It is possible to choose between the following options:
-inunits use copies of input units
-inconnect <n> fully connect with <n> input units
-direct connect input with output one-to-one
-outconnect <n> fully connect to <n> output units
-inunits and -inconnect may not be used together. -direct is ignored if no output
networks are given.
If no input options are given (-inunits, -inconnect), the resulting network uses the
same input units as the given input networks.
If -inconnect <n> is given, <n> new input units are created. These new input units are
fully connected to the (former) input units of all input networks. The (former) input units
of the input networks are changed to be hidden units in the resulting network. The newly
created network links are initialized with weight 0.0.
To use the option -inunits, all input networks must have the same number of input units.
If -inunits is given, a new layer of input units is created. The number of new input units
is equal to the number of (former) input units of a given input network. The new input
units are connected by a one-to-one scheme to the (former) input units, which means
that every former input unit gets input activation from exactly one new input unit. The
newly created network links are initialized with weight 1.0. The (former) input units of the
input networks are changed to be special hidden units in the resulting network (incoming
weights of special hidden units are not changed during further training). This connection
scheme is useful to feed several networks with similar input structure with equal input
patterns.
Similar to the description of -inconnect, the option -outconnect may be used to create
a new set of output units: if -outconnect <n> is given, <n> new output units are created.
These new output units are fully connected either to the (former) output units of all
output networks (if output networks are given) or to the (former) output units of all input
networks. The (former) output units are changed to be hidden units in the resulting
network. The newly created network links are initialized with weight 0.0.
There exists no option -outunits (similar to -inunits) so far, since it is not clear how
new output units should be activated by a fixed weighting scheme. This heavily depends
on the kind of networks used and the type of application. However, it is possible to create a
similar structure by hand, using the graphical user interface. Doing this, don't forget to
change the unit type of the former output units to hidden.
By default all output units of the input networks are fully connected to all input units of
the output networks. In some cases it is useful not to use a full connection but a one-by-
one connection scheme. This is performed by giving the option -direct. To use the option
-direct, the sum of all (former) output units of the input networks must equal the sum
of all (former) input units of the output networks. Following the given succession of input
and output networks (and the network dependent succession of input and output units),
the one-by-one connections are then established in this order.
13.5.1 Limitations
linknets accepts all types of SNNS networks. But... it is only tested with feedforward
type networks (multi-layered networks, RBF networks, CC networks). It will definitely
not work with DLVQ, ART, recurrent type networks, and networks with DUAL units.
13.5.3 Examples
Figure 13.5: Two input networks one-by-one connected to two output networks
13.6 Convert2snns
In order to work with the KOHONEN tools in SNNS, a pattern file and a network file
with a special format are necessary.
Convert2snns will accomplish three important things:

- Creation of a two-dimensional Kohonen Feature Map with n components
- Weight files are converted into an SNNS compatible .net file
- A file with raw patterns is converted into a .pat file

When working with convert2snns, 3 files are necessary:

1. A control file, containing the configuration of the network
2. A file with weight vectors
3. A file with raw patterns
13.6.1 Setup and Structure of a Control, Weight, Pattern File

Each line of the control file begins with a KEYWORD followed by the respective declaration.
The order of the keywords is arbitrary.
Example of a control file:
PATTERNFILE eddy.in **
WEIGHTFILE eddy.dat
XSIZE 18
YSIZE 18
COMPONENTS 8
PATTERNS 47 **
For the creation of a network file you need at least the statements marked *, and for the
.pat file additionally the statements marked **.
Omitting the WEIGHTFILE will initialize the weights of the network with 0.
The WEIGHTFILE is a simple ASCII file, containing the weight vectors row by row.
The PATTERNFILE contains in each line the components of one pattern.
When convert2snns has finished the conversion, it will ask for the names of the network
and pattern files to be saved.
13.7 Feedback-gennet
The program feedback-gennet generates network definition files for fully recurrent net-
works of any size. This is not possible with bignet.
The networks have the following structure:
- input layer with no intra layer connections
- fully recurrent hidden layer
- output layer: connections from each hidden unit to each output unit
AND
optionally fully recurrent intra layer connections in the output layer
AND
optionally feedback connections from each output unit to each hidden unit.
The activation function of the output units can be set to sigmoidal or linear. All weights
are initialized with 0.0. Other initializations should be performed by the init functions in
SNNS.
Synopsis: feedback-gennet
example:
unix> feedback-gennet
produces
Enter # input units: 2
Enter # hidden units: 3
Enter # output units: 1
INTRA layer connections in the output layer (y/n) :n
feedback connections from output to hidden units (y/n) :n
Linear output activation function (y/n) :n
Enter name of the network file: xor-rec.net
working...
generated xor-rec.net
13.8 Mkhead
This program writes an SNNS pattern file header to stdout. It can be used together with
mkpat and mkout to produce pattern files from raw files in a shell script.
Synopsis: mkhead <pats> <in_units> <out_units>
where:
pats is the number of patterns in the file
in_units is the number of input units in the file
out_units is the number of output units in the file
13.9 Mkout
This program writes an SNNS output pattern to stdout. It can be used together with
mkpat and mkhead to produce pattern files from raw files in a shell script.
Synopsis: mkout <units> <active_unit>
where:
units is the number of output units
active_unit is the unit which has to be activated
13.10 Mkpat
This program reads a binary 8-bit file from stdin and writes an SNNS pattern file entry
to stdout. It can be used together with mkhead and mkout to produce pattern files from
raw files in a shell script.
Synopsis: mkpat <xsize> <ysize>
where:
xsize is the x-size of the raw file
ysize is the y-size of the raw file
13.11 Netlearn
This is an SNNS kernel backpropagation test program. It serves as a demonstration of how
to use the SNNS kernel interface to train networks.
Synopsis: netlearn
example:
unix> netlearn
produces
SNNS 3D-Kernel V 4.2
|Network learning|
Filename of the network file: letters_untrained.net
Loading the network...
Network name: letters
No. of units: 71
No. of input units: 35
No. of output units: 26
No. of sites: 0
No. of links: 610
Learning function: Std_Backpropagation
Update function: Topological_Order
Filename of the pattern file: letters.pat
loading the patterns...
Number of patterns: 26
The learning function Std_Backpropagation needs 2 input parameters:
Parameter [1]: 0.6
Parameter [2]: 0.6
Choose number of cycles: 250
Shuffle patterns (y/n) n
13.12 Netperf
This is a benchmark program for SNNS. Propagation and backpropagation tests are per-
formed.
Synopsis: netperf
example:
unix> netperf
produces
SNNS 3D-Kernel V4.2 | Benchmark Test |
Filename of the network file: nettalk.net
loading the network... Network name: nettalk1
No. of units: 349
No. of input units: 203
No. of output units: 26
No. of sites: 0
No. of links: 27480
Learning function: Std_Backpropagation
Update function: Topological_Order
Do you want to benchmark
Propagation [1] or
Backpropagation [2]
Input: 1
Choose no. of cycles
Begin propagation...
No. of units updated: 34900
No. of sites updated: 0
No. of links updated: 2748000
CPU Time used: 3.05 seconds
No. of connections per second (CPS) : 9.0099e+05
13.13 Pat_sel
Given a pattern file and a file which contains numbers, pat_sel produces a new pattern
file which contains a subset of the first one. This pattern file consists of the patterns
whose numbers are given in the number file.
Synopsis: pat_sel <number file> <input pattern file> <output pattern file>
Parameters:
<number file> ASCII file which contains positive integer
numbers (one per line) in ascending order.
<input pattern file> SNNS pattern file.
<output pattern file> SNNS pattern file which contains the selected
subset (created by pat_sel).
Pat_sel can be used to create a pattern file which contains only the patterns that were
classified 'wrong' by the neural network. For this purpose a 'result file' has to be created
with SNNS. The result file can be analyzed with the tool analyze, which yields the 'number
file'. This 'number file' and the corresponding 'pattern file' are then used by pat_sel to
create the new 'pattern file'.
Note:
Pat_sel is able to handle all SNNS pattern files. However, it becomes increasingly slow
with larger pattern sets. Therefore we also provide a simpler version of this program that
is fairly fast on huge pattern files, but can handle only the most primitive pattern file
format, i.e. files including subpatterns, pattern remapping, or class information can not be
handled. This simpler version of pat_sel is, of course, called pat_sel_simple.
13.14 Snns2c
Synopsis: snns2c <network> [<C-filename> [<function-name>] ]
where:
<network> is the name of the SNNS network file,
<C-filename> is the name of the output file,
<function-name> is the name of the procedure in the application.
This tool compiles an SNNS network file into C source code. It reads a network
file <network.net> and generates a C source named <C-filename>. The network can then
be called as a function named <function-name>. If the parameter <function-name>
is missing, the name of <C-filename> without the ending "*.c" is taken. If this parameter
is also missing, the name of the network file is chosen and fitted with a new ending for the
output file; this name without ending is also used for the function name.
It is not possible to train the generated net; SNNS has to be used for this purpose. After
completion of network training with SNNS, the tool snns2c is used to integrate the trained
network as a C function into a separate application.
This program is also an example of how to use the SNNS kernel interface for loading a net
and transforming the loaded net into another format. All data and all SNNS functions,
except the activation functions, are placed in a single C function.
Note:
Snns2c does not support sites. Any networks created with SNNS that make use of the site
feature cannot be converted to C source by this tool. Output functions are not supported
either.
The program can translate the following network types:
Feedforward networks trained with Backpropagation and all of its variants like Quickprop,
RPROP etc.
Radial Basis Functions
Partially-recurrent Elman and Jordan networks
Time Delay Neural Networks (TDNN)
Dynamic Learning Vector Quantisation (DLVQ)
Backpropagation Through Time (BPTT, QPTT, BBPTT)
Counterpropagation Networks
While the use of SNNS or any parts of it in commercial applications requires a spe-
cial agreement/licensing from the developers, the use of trained networks generated with
snns2c is hereby granted without any fees for any purpose, provided proper academic
credit to the SNNS team is given in the documentation of the application.
Interfaces: All generated networks may be called as C functions. These functions have
the form:
int function-name(float *in, float *out, int init)
where in and out are pointers to the input and output arrays of the network. The init
flag is needed by some network types; its special meaning is explained in section 13.14.3.
The function normally returns the value 0 (OK). Other return values are explained in
section 13.14.3.
The generated C source can be compiled separately. To use the network it is necessary to
include the generated header file (*.h), which is also written by snns2c. This header file
contains a prototype of the generated function as well as a record with the number of
input and output units.
To include the network in your own application, the header file must be included and
two arrays must be provided, one for the input and one for the output of the network.
The number of inputs and outputs can be derived from a record in the header
file. This struct is named like the function which contains the compiled network and has
the suffix REC to mark the record. So the number of input units is determined with
myNetworkREC.NoOfInput and the number of outputs with myNetworkREC.NoOfOutput
in this example. Hence, your own application should contain:
...
#include <stdlib.h>        /* for malloc() */
#include "myNetwork.h"     /* generated by snns2c; declares myNetworkREC */
...
float *netInput, *netOutput; /* input and output arrays of the network */
netInput  = malloc(myNetworkREC.NoOfInput  * sizeof(float));
netOutput = malloc(myNetworkREC.NoOfOutput * sizeof(float));
...
myNetwork(netInput, netOutput, 0);   /* propagate one input pattern */
...
Don't forget to link the object code of the network to your application.
13.15 isnns
isnns is a small program based on the SNNS kernel which allows stream-oriented network
training. It is intended to train a network with patterns that are generated on the fly by
some other process. isnns does not support the whole SNNS functionality; it only offers
some basic operations.
The idea of isnns is to provide a simple mechanism for using an already trained network
within another application, with the possibility of retraining this network during usage.
This can not be done with networks created by snns2c. To use isnns effectively, another
application should fork an isnns process and communicate with it over the standard
input and standard output channels. Please refer to the common literature about UNIX
processes and how to use the fork() and exec() system calls (don't forget to fflush() the
stdout channel after sending data to isnns, otherwise it would hang). We cannot give more
detailed advice on this within this manual; a minimal sketch of such a parent process is,
however, given below.
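The following sketch is not part of the original SNNS distribution; it only illustrates the idea under some assumptions: isnns is on the PATH, the network file xor.net is a placeholder, and all commands are sent before any output is read, so that after quit the parent can simply read until end of file. A truly interactive exchange would have to synchronize on the "ok>" prompts line by line.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int to_isnns[2], from_isnns[2];

    /* one pipe per direction: parent -> isnns and isnns -> parent */
    if (pipe(to_isnns) < 0 || pipe(from_isnns) < 0) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }

    if (pid == 0) {                         /* child: become isnns */
        dup2(to_isnns[0], 0);               /* commands arrive on stdin   */
        dup2(from_isnns[1], 1);             /* answers leave via stdout   */
        close(to_isnns[1]); close(from_isnns[0]);
        execlp("isnns", "isnns", (char *) NULL);
        perror("execlp");                   /* only reached if exec fails */
        _exit(1);
    }

    /* parent */
    close(to_isnns[0]); close(from_isnns[1]);
    FILE *cmd = fdopen(to_isnns[1], "w");   /* command channel to isnns  */
    FILE *ans = fdopen(from_isnns[0], "r"); /* answer channel from isnns */

    fprintf(cmd, "load xor.net\n");         /* placeholder network file    */
    fprintf(cmd, "prop 0 1\n");             /* propagate one input pattern */
    fprintf(cmd, "quit\n");
    fflush(cmd);                            /* isnns would block otherwise */
    /* note: if loading fails, isnns exits with 1 and later writes would
       raise SIGPIPE; a robust application has to handle this case */

    /* print everything isnns answered (prompts, unit counts, activations) */
    char line[256];
    while (fgets(line, sizeof line, ans) != NULL)
        fputs(line, stdout);

    fclose(cmd); fclose(ans);
    waitpid(pid, NULL, 0);
    return 0;
}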
Synopsis of the isnns call:
isnns [ <output pattern file> ]
After starting isnns, the program prints its prompt "ok>" to standard output. This
prompt is printed again whenever an isnns command has been parsed and performed
completely. If there are any input errors (unrecognized commands), the prompt changes
to "notok>", but will change back to "ok>" after the next correct command. If any kernel
error occurs (loading non-existent or illegal networks, etc.), isnns exits immediately with
an exit value of 1.
13.15.1 Commands
The set of commands is restricted to the following list:
load <net file name>
This command loads the given network into the SNNS kernel. After loading the
network, the number of input units n and the number of output units m is printed
to standard output. If an optional <output pattern file> has been given at startup
of isnns, this file will be created now and will log all future training patterns (see
below).
save <net file name>
Saves the network to the given file name.
prop <i1> ... <in>
This command propagates the given input pattern <i1> ... <in> through the
network and prints out the values of the output units of the network. The number
of parameters n must match exactly the number of input units of the network. Since
isnns reads input until enough values have been provided, the input values
may span several lines. No prompt is printed while isnns is waiting for more input
values.
train <lr> <o1> ... <om>
Taking the current activation of the input units into account, this command performs
one single training step based on the training function which is given in the network
description. The first parameter <lr> refers to the first training
parameter of the learning function, which is usually the learning rate. All other
learning parameters are implicitly set to 0.0. Therefore the network must use a
learning function which works well if only the first learning parameter is given (e.g.
Std_Backpropagation). The remaining values <o1> ... <om> define the teaching
output of the network. As for the prop command, the number of values m is derived
from the loaded network. The values may again span several input lines.
Usually the activation of the input units (and therefore the input pattern for this
training step) was set by the command prop. However, since prop also applies one
propagation step, these input activations may change if a recurrent network is used.
This is a special feature of isnns.
After performing the learning step, the summed squared error of all output units is
printed to standard output.
learn <lr> <i1> ... <in> <o1> ... <om>
This command is nearly the same as a combination of prop and train. The only
difference is that it ensures that the input units are set to the given values <i1>
... <in> and not read out of the current network. <o1> ... <om> represents
the teaching output and <lr> again refers to the first training parameter.
After performing the learning step, the summed squared error of all output units is
printed to standard output.
quit
Quits isnns after printing a final "ok>" prompt.
help
Prints help information to the standard error output.
13.15.2 Example
Here is an example session of an isnns run. First the xor network from the examples
directory is loaded. This network has 2 input units and 1 output unit. Then the patterns
(0 0), (0 1), (1 0), and (1 1) are propagated through the network. For each pattern the
activation of all output units (here only one) is printed. The pattern (0 1) seems
not to be trained very well (output: 0.880135). Therefore one learning step is performed
with a learning rate of 0.3, an input pattern of (0 1), and a teaching output of 1. The next
propagation of the pattern (0 1) gives a slightly better result of 0.881693. The pattern
(which is still stored in the input activations) is trained again, this time using the train
command. A last propagation shows the final result before quitting isnns. (The comments
starting with the # character have been added only in this documentation and are not
printed by isnns.)
unix> isnns test.pat
ok> load examples/xor.net
2 1 # 2 input and 1 output units
ok> prop 0 0
0.112542 # output activation
ok> prop 0 1
0.880135 # output activation
ok> prop 1 0
0.91424 # output activation
ok> prop 1 1
0.103772 # output activation
ok> learn 0.3 0 1 1
0.0143675 # summed squared output error
ok> prop 0 1
0.881693 # output activation
ok> train 0.3 1
0.0139966 # summed squared output error
ok> prop 0 1
0.883204 # output activation
ok> quit
ok>
Since the command line defines an output pattern file, after quitting isnns this file contains
a log of all patterns which have been trained. Note that for recurrent networks the input
activation of the second training pattern might have been different from the values given
by the prop command. Since the pattern file is generated while isnns is working, the
number of patterns is not known at the beginning of execution. It must be set by the user
afterwards.
unix> cat test.pat
SNNS pattern definition file V3.0
generated at Wed Mar 18 18:53:26 1998
# 1
0 1
1
# 2
0 1
1
Chapter 14
Kernel Function Interface
14.1 Overview
The simulator kernel offers a variety of functions for the creation and manipulation of
networks. These can roughly be grouped into the following categories:
functions to manipulate the network
functions to determine the structure of the network
functions to define and manipulate cell prototypes
functions to propagate the network
learning functions
functions to manipulate patterns
functions to load and save the network and pattern files
functions for error treatment, search functions for names, functions to change default
values etc.
The following paragraphs explain the interface functions in detail. All functions of this
interface between the kernel and the user interface carry the prefix krui_... (kernel user
interface functions).
Additionally there are some interface functions which are useful to build applications for
ART networks. These functions carry the prefix artui_... (ART user interface functions).
int krui_getNoOfUnits()
determines the number of units in the neural net.
int krui_getNoOfSpecialUnits()
determines the number of special units in the neural net.
int krui_getFirstUnit()
Many interface functions refer to a current unit or site. krui_getFirstUnit() selects the
(chronologically) first unit of the network and makes it current. If this unit has sites, the
chronologically first site becomes current. The function returns 0 if no units are defined.
int krui_getNextUnit()
selects the next unit in the net, as well as its first site (if present); returns 0 if no more
units exist.
krui_err krui_setCurrentUnit( int UnitNo )
makes the unit with number UnitNo the current unit; returns an error code if no unit with
the specified number exists.
int krui_getCurrentUnit()
determines the number of the current unit (0 if not defined).
char *krui_getUnitName( int UnitNo )
krui_err krui_setUnitName( int UnitNo, char *unit_name )
determines/sets the name of the unit. krui_getUnitName returns NULL if no unit with
the specified number exists.
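As a small illustration (not taken from the original manual), the two iteration functions above can be combined with krui_getUnitName() to walk over all units of the currently loaded network; the header name kr_ui.h is an assumption here.

#include <stdio.h>
#include "kr_ui.h"   /* SNNS kernel user interface (assumed header name) */

/* print the number and name of every unit of the loaded network */
void print_all_units(void)
{
    int unit_no = krui_getFirstUnit();       /* 0 if no units are defined  */
    while (unit_no != 0) {
        char *name = krui_getUnitName(unit_no);
        printf("unit %3d: %s\n", unit_no, name ? name : "(unnamed)");
        unit_no = krui_getNextUnit();        /* 0 when no more units exist */
    }
}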
int krui_searchUnitName( char *unit_name )
searches for a unit with the given name. Returns the first unit number if a unit with the
given name was found, 0 otherwise.
int krui_searchNextUnitName( void )
searches for the next unit with the given name. Returns the next unit number if a unit
with the given name was found, 0 otherwise. krui_searchUnitName( unit_name ) has
to be called at least once before, to confirm the unit name. Returns an error code if no
units are defined.
char *krui_getUnitOutFuncName( int UnitNo )
char *krui_getUnitActFuncName( int UnitNo )
determines the output function resp. activation function of the unit.
krui_err krui_setUnitOutFunc( int UnitNo, char *unitOutFuncName )
krui_err krui_setUnitActFunc( int UnitNo, char *unitActFuncName )
sets the output function resp. activation function of the unit. Returns an error code if
the function name is unknown, i.e. if the name does not appear in the function table as
output or activation function. The f-type of the unit is deleted.
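For example, a sketch only (the unit number is a placeholder, and the registered kernel name of the logistic function is assumed to carry the Act_ prefix, i.e. Act_Logistic):

#include <stdio.h>
#include "kr_ui.h"   /* assumed SNNS kernel interface header */

/* give unit 3 the logistic activation function */
void make_unit_logistic(void)
{
    if (krui_setUnitActFunc(3, "Act_Logistic") != 0)
        fprintf(stderr, "unknown activation function name\n");
}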
char *krui_getUnitFTypeName( int UnitNo )
yields the f-type of the unit; returns NULL if the unit has no prototype.
FlintType krui_getUnitActivation( int UnitNo )
krui_err krui_setUnitActivation( int UnitNo,
FlintTypeParam unit_activation )
returns/sets the activation of the unit.
FlintType krui_getUnitInitialActivation( int UnitNo )
void krui_setUnitInitialActivation( int UnitNo,
FlintType unit_i_activation )
returns/sets the initial activation of the unit, i.e. the activation after loading the net. See
also krui_resetNet().
FlintType krui_getUnitOutput( int UnitNo )
krui_err krui_setUnitOutput( int unit_no, FlintTypeParam unit_output )
returns/sets the output value of the unit.
FlintType krui_getUnitBias( int UnitNo )
void krui_setUnitBias( int UnitNo, FlintType unit_bias )
returns/sets the bias (threshold) of the unit.
int krui_getUnitSubnetNo( int UnitNo )
void krui_setUnitSubnetNo( int UnitNo, int subnet_no)
returns/sets the subnet number of the unit (the range of subnet numbers is -32736 to
+32735).
unsigned short krui_getUnitLayerNo( int UnitNo )
void krui_setUnitLayerNo( int UnitNo,int layer_no)
returns/sets the layer number of the unit (16-bit integer).
void krui_getUnitPosition( int UnitNo, struct PosType *position )
void krui_setUnitPosition( int UnitNo, struct PosType *position )
determines/sets the (graphical) position of the unit. See also the include file glob_typ.h for
the definition of PosType.
int krui_getUnitNoAtPosition( struct PosType *position, int subnet_no )
yields the unit number of a unit with the given position and subnet number; returns 0 if
no such unit exists.
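A tiny sketch (the unit number is chosen arbitrarily, and PosType is assumed to expose an x member for the column coordinate) of reading a unit's position, moving it one grid column to the right, and writing it back:

#include "glob_typ.h"   /* struct PosType */
#include "kr_ui.h"      /* assumed SNNS kernel interface header */

/* shift unit 7 one grid position to the right in the display */
void shift_unit_right(void)
{
    struct PosType pos;
    krui_getUnitPosition(7, &pos);
    pos.x += 1;                       /* x assumed to be the column coordinate */
    krui_setUnitPosition(7, &pos);
}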
294 CHAPTER 14. KERNEL FUNCTION INTERFACE
bool krui_deleteSite()
deletes the current site of the current unit and all input connections to that site. The func-
tionality type of the unit is also erased. krui_setFirstSite() or krui_setNextSite()
has to be called at least once before, to confirm the current site/unit. After the deletion
the next available site becomes the current one. The return code is TRUE if further sites
exist, FALSE otherwise. The following program fragment is sufficient to delete all sites of
a unit:
if ( krui_setFirstSite() )
while ( krui_deleteSite() ) { }
krui_err krui_deleteLink()
deletes the current link. To delete a connection between the current unit/site and a
source unit, a sequence of krui_isConnected( source_unit_no ) and krui_deleteLink()
is ideal.
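A sketch of this idiom (it relies on krui_isConnected() making the found link the current link, as the sequence above suggests; the header name is an assumption):

#include "kr_ui.h"   /* assumed SNNS kernel interface header */

/* remove the connection from unit source_no to the current unit, if present */
void remove_link_from(int source_no)
{
    if ( krui_isConnected(source_no) )   /* the found link becomes current */
        krui_deleteLink();
}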
krui_err krui_deleteAllInputLinks()
krui_err krui_deleteAllOutputLinks()
deletes all inputs/outputs at the current unit/site.
void krui_jogWeights( FlintTypeParam minus, FlintTypeParam plus )
adds uniformly distributed random values to the connection weights of the network. minus
must be less than plus. See also krui_setSeedNo(...).
krui_err krui_jogCorrWeights( FlintTypeParam minus, FlintTypeParam plus,
FlintTypeParam mincorr )
adds uniformly distributed random values not to all connection weights, but only to those
of highly correlated, non-special hidden units. minus must be less than plus. The two
hidden units with maximum positive or negative correlation with an absolute value higher
than mincorr are searched for. The incoming weights of one of these units are jogged.
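For instance (the noise interval and the correlation threshold are made-up values; the three-parameter call assumes the mincorr parameter as completed above):

#include "kr_ui.h"   /* assumed SNNS kernel interface header */

/* add uniform noise from [-0.05, 0.05] to every connection weight */
void jog_all_weights(void)
{
    krui_jogWeights(-0.05, 0.05);
}

/* jog only the incoming weights of one of the two most strongly
   correlated hidden units (correlation threshold 0.9) */
void jog_correlated_weights(void)
{
    krui_jogCorrWeights(-0.05, 0.05, 0.9);
}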
pattern sets online, to see how different training targets influence training performance,
without having to modify and reload any pattern files.
krui_err krui_showPattern( int mode )
maps a pattern onto the activation or output values of the input/output units. The
following modes are possible:
OUTPUT_NOTHING: stores the input pattern in the activation of the input units.
OUTPUT_ACT: like OUTPUT_NOTHING, but also stores the output pattern in the activa-
tion of the output units.
OUTPUT_OUT: like OUTPUT_ACT; additionally a new output value of the output units
is computed.
krui_showPattern(...) draws the pattern on the display. It generates an error code if the
number of input and output units does not correspond to the previously loaded patterns.
The constants of the various modes are defined in glob_typ.h.
krui_err krui_newPattern( void )
creates a new pattern (an input/output pair). A pattern can be created by modifying the
activation values of the input/output units. The function returns an error code if there is
insufficient memory or the number of input/output units is incompatible with the pattern.
Note: krui_newPattern() switches pattern shuffling off. For shuffling the new patterns
call:
krui_newPattern(...)
krui_shufflePatterns( TRUE )
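A sketch of this call sequence (error handling reduced to a comment; it assumes that the desired pattern has already been written into the unit activations with krui_setUnitActivation(), and that TRUE is provided by glob_typ.h):

#include "glob_typ.h"   /* TRUE / FALSE (assumed to be defined here) */
#include "kr_ui.h"      /* assumed SNNS kernel interface header */

/* turn the current input/output unit activations into a new pattern */
void add_current_activations_as_pattern(void)
{
    if (krui_newPattern() != 0) {
        /* insufficient memory or unit/pattern mismatch */
        return;
    }
    krui_shufflePatterns(TRUE);   /* newPattern switched shuffling off */
}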
void krui_deleteAllPatterns()
deletes all previously defined patterns in main memory.
krui_err krui_shufflePatterns( bool on_or_off )
shuffles the order of the patterns if on_or_off is TRUE. If on_or_off is FALSE the original
order can be restored. See also krui_setSeedNo(...).
krui_err krui_shuffleSubPatterns( bool on_or_off )
shuffles sub pattern pairs using a pseudo random generator.
krui_shuffleSubPatterns(TRUE) switches shuffling of sub patterns on,
krui_shuffleSubPatterns(FALSE) switches shuffling of sub patterns off.
The default presetting is krui_shuffleSubPatterns(FALSE).
int krui_getNoOfPatterns( void )
returns the actual number of patterns (0 if no patterns have been loaded).
int krui_getTotalNoOfSubPatterns( void )
returns the total number of subpatterns contained in all patterns of the current pattern
set (0 if no patterns have been loaded).
krui_err krui_allocNewPatternSet(int *set_no)
allocates a new pattern set and returns its number in set_no. In case of an error the error
code will be returned.
krui_err krui_setCurrPatSet(int number)
chooses the number of the current pattern set. The number ranges from 0 to n-1.
krui_err krui_deletePatSet(int number)
deletes all patterns of the pattern set with the given number.
krui_err krui_GetPatInfo(pattern_set_info *set_info,
pattern_descriptor *pat_info)
gets all available information concerning the current pattern set and the current pattern.
krui_err krui_DefShowSubPat(int *insize, int *outsize, int *inpos, int *outpos)
defines the sub pattern that will be shown with the next call of krui_showPattern.
krui_err krui_DefTrainSubPat(int *insize, int *outsize, int *instep,
int *outstep, int *max_n_pos)
defines how sub patterns should be generated during training. krui_DefTrainSubPat()
has to be called before any training can take place.
krui_err krui_AlignSubPat(int *inpos, int *outpos, int *no)
aligns the given sub pattern position to a valid position which fits the defined sub pattern
training scheme (krui_DefTrainSubPat).
krui_err krui_GetShapeOfSubPattern(int *insize, int *outsize, int *inpos,
int *outpos, int *n_pos)
gets the shape of the sub pattern using the current set, the current pattern, and the current
training scheme (defined with krui_DefTrainSubPat).
krui_err krui_setClassDistribution( unsigned int *classDist )
defines the composition of the pattern set. The list of integers supplied as parameter
determines how often each class will be represented in every training epoch. This
overrides the distribution implicitly defined in the pattern set by the number of patterns
of each class contained in it. Distribution values are assigned to classes in alphanumerical
order, i.e. the class with the first alphanumerical name is assigned the first value
of the array and so forth. The number of values in the array has to match the number of
classes in the pattern set, or an error code will be returned.
krui_err krui_setClassInfo( char *name )
assigns the string name as class information to the current pattern. This will only work
when all patterns in the pattern set carry class information, or when the current
pattern is the only one in the current pattern set.
krui_err krui_useClassDistribution( bool use_it )
toggles the use of class information during training. When called with FALSE as parameter,
no class information will be used. When switched on, the pattern distribution
defined by krui_setClassDistribution() will be used to determine the composition of
the training pattern set.
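A short sketch of the two calls together (the class counts are made-up values for a pattern set with three classes; the header names are assumptions):

#include "glob_typ.h"   /* TRUE (assumed to be defined here) */
#include "kr_ui.h"      /* assumed SNNS kernel interface header */

/* train every epoch with 20 patterns of the first class (in alphanumerical
   order) and 10 patterns of each of the two remaining classes */
void set_virtual_distribution(void)
{
    static unsigned int class_dist[3] = { 20, 10, 10 };

    krui_setClassDistribution(class_dist);
    krui_useClassDistribution(TRUE);
}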
KRERR_NP_NO_SUCH_PATTERN: No such pattern available
KRERR_NP_NO_CURRENT_PATTERN_SET: No current pattern set defined
KRERR_NP_DOES_NOT_FIT: Pattern (sub pattern) does not fit the
network
KRERR_NP_NO_TRAIN_SCHEME: No sub pattern shifting scheme defined
KRERR_NP_NO_OUTPUT_PATTERN: Pattern contains no output information
KRERR_NP_INCOMPATIBLE_NEW: New Pattern does not fit to existing set
KRERR_NP_PARAM_ZERO: Illegal parameter value <= 0 specified
KRERR_IP_ISNOTINITED: Paragon kernel is not initialized
KRERR_CC_INVALID_ADD_PARAMETERS: Additional Parameters of CC or Tacoma are
not set correctly
KRERR_NP_WORKAROUND: Initialization needs pattern. Please
press TEST and reinitialize
DDA_PARAM_ONE: First RBF-DDA parameter out of range (0,1]
DDA_PARAM_TWO: Second RBF-DDA parameter out of range (0,1]
DDA_PARAM_THREE: Third RBF-DDA parameter must be >=0
DDA_DESIRED_CLASS: More than one desired class in output pattern
DDA_CONN_POINTER: Input-hidden connection pointer problem
DDA_SHORTCUTS: Input-output shortcut connections are
not allowed
DDA_INPUT_ACT_FUNC: Activation function of input units must be
Act_Identity
DDA_HIDDEN_ACT_FUNC: Activation function of hidden units must be
Act_RBF_Gaussian
DDA_OUTPUT_ACT_FUNC: Activation function of output units must be
Act_Identity
KRERR_UPS_ACT_NOT_THRESHOLD: Wrong activation function for this algorithm
KRERR_UPS_LEARN_NOT_BACKPROP: Wrong learning function for this meta algorithm
KRERR_SINGLE_CLASS: No learning possible with only one class
KRERR_REMAP_FUNC: Invalid pattern remap function
KRERR_NO_CLASSES: Patterns don't have class information
KRERR_ILL_CLASS_DISTRIB: Illegal virtual class distribution
KRERR_CANT_NORM: Patterns can not be normalized
Chapter 15
Transfer Functions
Several other site functions have been implemented for the ART models in SNNS:
At_least_2, At_least_1, At_most_0, Reciprocal. These functions are normally
not useful for other networks, so they are only mentioned here and not described in
detail. For more information please refer to the section about the corresponding
ART model in SNNS.
Activation functions:

Function              Formula
BAM                   a_j(t) = 1 if net_j(t) > 0;  a_j(t-1) if net_j(t) = 0;  -1 if net_j(t) < 0
BSB                   a_j(t) = net_j(t) * theta_j
Elliott               a_j(t) = (net_j(t) + theta_j) / (1 + |net_j(t) + theta_j|)
Identity              a_j(t) = net_j(t)
IdentityPlusBias      a_j(t) = net_j(t) + theta_j
Logistic              a_j(t) = 1 / (1 + e^(-(net_j(t) + theta_j)))
Logistic_notInhibit   like Logistic, but skips input from units named "Inhibit"
Logistic_Tbl          like Logistic, but with table lookup instead of computation
MinOutPlusWeight      a_j(t) = min_i (w_ij + o_i)
Perceptron            a_j(t) = 1 if net_j(t) >= theta_j;  0 if net_j(t) < theta_j
Product               a_j(t) = prod_i (w_ij * o_i)
RBF_Gaussian          see the chapter about RBFs in the user manual
RBF_MultiQuadratic    see the chapter about RBFs in the user manual
RBF_ThinPlateSpline   see the chapter about RBFs in the user manual
RM                    a_j(t) = 0.85*a_j(t-1) + 0.15*net_j(t)*(1 - a_j(t-1)) if net_j(t) > 0;
                               0.85*a_j(t-1) + 0.15*net_j(t)*(1 + a_j(t-1)) if net_j(t) <= 0
Signum                a_j(t) = 1 if net_j(t) > 0;  -1 if net_j(t) <= 0
Signum0               a_j(t) = 1 if net_j(t) > 0;  0 if net_j(t) = 0;  -1 if net_j(t) < 0
Softmax               a_j(t) = e^(-(net_j(t) + theta_j))
Several other activation functions have been implemented for the ART models in
SNNS: Less_than_0, At_most_0, At_least_2, At_least_1, Exactly_1,
ART1_NC, ART2_Identity, ART2_Rec, ART2_NormP, ART2_NormV, ART2_NormW,
ART2_NormIP, ART2_Rst, ARTMAP_NCa, ARTMAP_NCb, ARTMAP_DRho.
These functions are normally not useful for other networks, so they are only mentioned
here and not described in detail. For time delay networks the following modified
versions of regular activation functions have been implemented: TD_Logistic and
TD_Elliott. They behave like the ordinary functions with the same name body.
Output functions:

Function     Formula
Identity     o_j(t) = a_j(t)
Clip_0_1     o_j(t) = 0 if a_j(t) <= 0;  a_j(t) if 0 < a_j(t) < 1;  1 if a_j(t) >= 1
Clip_-1_1    o_j(t) = -1 if a_j(t) <= -1;  a_j(t) if -1 < a_j(t) < 1;  1 if a_j(t) >= 1
Threshold05  o_j(t) = 0 if a_j(t) <= 0.5;  1 if a_j(t) > 0.5
Two other output functions have been implemented for ART2 in SNNS:
ART2_Noise_PLin and ART2_Noise_ContDiff. These functions are only useful for
the ART2 implementation, so they are mentioned here, but not described in detail.
Remap functions:

Function   Formula
None       o_j(t) = o_j(t)
Binary     o_j(t) = 0 if o_j(t) <= 0.5;  1 if o_j(t) > 0.5
Clip       o_j(t) = low if o_j(t) < low;  high if o_j(t) > high;  o_j(t) otherwise
Invers     o_j(t) = 0 if o_j(t) > 0.5;  1 if o_j(t) <= 0.5
Norm       o_j(t) = o_j(t) / sqrt( sum_j o_j(t)^2 )
Threshold  o_j(t) = high if threshold1 = threshold2 and o_j(t) > threshold1;
                    high if threshold1 != threshold2 and o_j(t) <= threshold1;
                    high if threshold1 != threshold2 and o_j(t) > threshold2;
                    low otherwise
are separated by a comma and a blank, or by a comma and a newline (see the example in
Appendix B).
The file may contain comment lines. Each line beginning with # is skipped by the SNNS
kernel.
TWO_COLUMN_LINE "-"+"|""-"+
THREE_COLUMN_LINE "-"+"|""-"+"|""-"+
FOUR_COLUMN_LINE "-"+"|""-"+"|""-"+"|""-"+
SIX_COLUMN_LINE "-"+"|""-"+"|""-"+"|""-"+"|""-"+"|""-"+
SEVEN_COLUMN_LINE "-"+"|""-"+"|""-"+"|""-"+"|""-"+"|""-"+"|""-"+
TEN_COLUMN_LINE "-"+"|""-"+"|""-"+"|""-"+"|""-"+"|""-"+"|""-"+
"|""-"+"|""-"+"|""-"+
NO "no."
TYPE_NAME "type name"
UNIT_NAME "unit name"
ACT "act"
BIAS "bias"
ST "st"
POSITION "position"
SUBNET "subnet"
LAYER "layer"
ACT_FUNC "act func"
OUT_FUNC "out func"
SITES "sites"
SITE_NAME "site name"
SITE_FUNCTION "site function"
NAME "name"
TARGET "target"
SITE "site"
SOURCE:WEIGHT "source:weight"
UNIT_NO "unitNo."
DELTA_X "delta x"
DELTA_Y "delta y"
Z "z"
LLN "LLN"
LUN "LUN"
TROFF "Troff"
SOFF "Soff"
CTYPE "Ctype"
A.3.3 Grammar:
/* 3D translation section */
A.4.2 Grammar
The lines in the connection definition section have been truncated to 80 characters per
line for printing purposes.
B.1 Example 1:
SNNS network definition file V3.0
generated at Fri Aug 3 00:28:44 1992
no. | typeName | unitName | act | bias | st | position | act func | out func | sites
----|----------|----------|----------|----------|----|----------|----------|----------|-------
1 | | u11 | 1.00000 | 0.00000 | i | 1, 1, 0 |||
2 | | u12 | 0.00000 | 0.00000 | i | 2, 1, 0 |||
3 | | u13 | 0.00000 | 0.00000 | i | 3, 1, 0 |||
4 | | u14 | 0.00000 | 0.00000 | i | 4, 1, 0 |||
5 | | u15 | 1.00000 | 0.00000 | i | 5, 1, 0 |||
6 | | u21 | 1.00000 | 0.00000 | i | 1, 2, 0 |||
7 | | u22 | 1.00000 | 0.00000 | i | 2, 2, 0 |||
8 | | u23 | 0.00000 | 0.00000 | i | 3, 2, 0 |||
9 | | u24 | 1.00000 | 0.00000 | i | 4, 2, 0 |||
10 | | u25 | 1.00000 | 0.00000 | i | 5, 2, 0 |||
11 | | u31 | 1.00000 | 0.00000 | i | 1, 3, 0 |||
12 | | u32 | 0.00000 | 0.00000 | i | 2, 3, 0 |||
13 | | u33 | 1.00000 | 0.00000 | i | 3, 3, 0 |||
14 | | u34 | 0.00000 | 0.00000 | i | 4, 3, 0 |||
15 | | u35 | 1.00000 | 0.00000 | i | 5, 3, 0 |||
16 | | u41 | 1.00000 | 0.00000 | i | 1, 4, 0 |||
17 | | u42 | 0.00000 | 0.00000 | i | 2, 4, 0 |||
18 | | u43 | 0.00000 | 0.00000 | i | 3, 4, 0 |||
19 | | u44 | 0.00000 | 0.00000 | i | 4, 4, 0 |||
20 | | u45 | 1.00000 | 0.00000 | i | 5, 4, 0 |||
21 | | u51 | 1.00000 | 0.00000 | i | 1, 5, 0 |||
22 | | u52 | 0.00000 | 0.00000 | i | 2, 5, 0 |||
23 | | u53 | 0.00000 | 0.00000 | i | 3, 5, 0 |||
24 | | u54 | 0.00000 | 0.00000 | i | 4, 5, 0 |||
25 | | u55 | 1.00000 | 0.00000 | i | 5, 5, 0 |||
26 | | u61 | 1.00000 | 0.00000 | i | 1, 6, 0 |||
27 | | u62 | 0.00000 | 0.00000 | i | 2, 6, 0 |||
28 | | u63 | 0.00000 | 0.00000 | i | 3, 6, 0 |||
29 | | u64 | 0.00000 | 0.00000 | i | 4, 6, 0 |||
30 | | u65 | 1.00000 | 0.00000 | i | 5, 6, 0 |||
31 | | u71 | 1.00000 | 0.00000 | i | 1, 7, 0 |||
32 | | u72 | 0.00000 | 0.00000 | i | 2, 7, 0 |||
33 | | u73 | 0.00000 | 0.00000 | i | 3, 7, 0 |||
34 | | u74 | 0.00000 | 0.00000 | i | 4, 7, 0 |||
35 | | u75 | 1.00000 | 0.00000 | i | 5, 7, 0 |||
36 | | h1 | 0.99999 | 0.77763 | h | 8, 0, 0 |||
37 | | h2 | 0.19389 | 2.17683 | h | 8, 1, 0 |||
38 | | h3 | 1.00000 | 0.63820 | h | 8, 2, 0 |||
39 | | h4 | 0.99997 | -1.39519 | h | 8, 3, 0 |||
40 | | h5 | 0.00076 | 0.88637 | h | 8, 4, 0 |||
41 | | h6 | 1.00000 | -0.23139 | h | 8, 5, 0 |||
42 | | h7 | 0.94903 | 0.18078 | h | 8, 6, 0 |||
43 | | h8 | 0.00000 | 1.37368 | h | 8, 7, 0 |||
44 | | h9 | 0.99991 | 0.82651 | h | 8, 8, 0 |||
45 | | h10 | 0.00000 | 1.76282 | h | 8, 9, 0 |||
46 | | A | 0.00972 | -1.66540 | o | 11, 1, 0 |||
47 | | B | 0.00072 | -0.29800 | o | 12, 1, 0 |||
48 | | C | 0.00007 | -2.24918 | o | 13, 1, 0 |||
49 | | D | 0.02159 | -5.85148 | o | 14, 1, 0 |||
50 | | E | 0.00225 | -2.33176 | o | 11, 2, 0 |||
51 | | F | 0.00052 | -1.34881 | o | 12, 2, 0 |||
52 | | G | 0.00082 | -1.92413 | o | 13, 2, 0 |||
53 | | H | 0.00766 | -1.82425 | o | 14, 2, 0 |||
54 | | I | 0.00038 | -1.83376 | o | 11, 3, 0 |||
55 | | J | 0.00001 | -0.87552 | o | 12, 3, 0 |||
56 | | K | 0.01608 | -2.20737 | o | 13, 3, 0 |||
57 | | L | 0.01430 | -1.28561 | o | 14, 3, 0 |||
58 | | M | 0.92158 | -1.86763 | o | 11, 4, 0 |||
59 | | N | 0.05265 | -3.52717 | o | 12, 4, 0 |||
60 | | O | 0.00024 | -1.82485 | o | 13, 4, 0 |||
61 | | P | 0.00031 | -0.20401 | o | 14, 4, 0 |||
62 | | Q | 0.00025 | -1.78383 | o | 11, 5, 0 |||
63 | | R | 0.00000 | -1.61928 | o | 12, 5, 0 |||
64 | | S | 0.00000 | -1.59970 | o | 13, 5, 0 |||
65 | | T | 0.00006 | -1.67939 | o | 14, 5, 0 |||
66 | | U | 0.01808 | -1.66126 | o | 11, 6, 0 |||
67 | | V | 0.00025 | -1.53883 | o | 12, 6, 0 |||
68 | | W | 0.01146 | -2.78012 | o | 13, 6, 0 |||
69 | | X | 0.00082 | -2.21905 | o | 14, 6, 0 |||
70 | | Y | 0.00007 | -2.31156 | o | 11, 7, 0 |||
71 | | Z | 0.00002 | -2.88812 | o | 12, 7, 0 |||
----|----------|----------|----------|----------|----|----------|----------|----------|-------
B.2 Example 2:
SNNS network definition file V3.0
generated at Fri Aug 3 00:25:42 1992