SC Unit II
It should be noted that each neuron has an internal state of its own. This internal state is called the activation or activity level of the neuron, which is a function of the inputs the neuron receives. The activation signal of a neuron is transmitted to other neurons. Remember, a neuron can send only one signal at a time, which can be transmitted to several other neurons.
To depict the basic operation of a neural net, consider a set of neurons, say X1 and X2, transmitting signals to another neuron, Y. Here X1 and X2 are input neurons, which transmit signals, and Y is the output neuron, which receives signals. Input neurons X1 and X2 are connected to the output neuron Y over weighted interconnection links (W1 and W2) as shown in Figure 2-1.
For the above simple neuron net architecture, the net input has to be calculated in the following way:
yin = x1w1 + x2w2
where x1 and x2 are the activations of the input neurons X1 and X2, i.e., the output of input signals. The output y of the output neuron Y can be obtained by applying the activation function over the net input, i.e., as a function of the net input:
y = f(yin)
Output = Function(net input calculated)
The function to be applied over the net input is called the activation function. There are various activation functions. The above calculation of the net input is similar to the calculation of the output of a pure linear straight-line equation (y = mx). The neural net of a pure linear equation is shown in Figure 2-2.
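As a minimal illustration of this calculation, the following Python sketch computes the net input and applies a pure linear (identity) activation; the input and weight values are arbitrary assumptions chosen for demonstration.

# Two-input neuron of Figure 2-1 with a pure linear activation (Figure 2-2).
def net_input(x1, x2, w1, w2):
    return x1 * w1 + x2 * w2      # yin = x1w1 + x2w2

def linear_activation(yin):
    return yin                    # pure linear: output equals net input

x1, x2 = 0.6, 0.4                 # activations of input neurons X1 and X2
w1, w2 = 0.3, 0.7                 # weights on the interconnection links
y = linear_activation(net_input(x1, x2, w1, w2))
print(y)                          # 0.46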
The biological neuron depicted in Figure 2-4 consists of three main parts:
1. Soma or cell body- where the cell nucleus is located.
2. Dendrites- where the nerve is connected to the cell body.
3. Axon- which carries the impulses of the neuron.
Dendrites are tree-like networks made of nerve fiber connected to the cell body. An axon is a single, long connection extending from the cell body and carrying signals from the neuron. The end of the axon splits into fine strands. It is found that each strand terminates into a small bulb-like organ called a synapse. It is through the synapse that the neuron introduces its signal to other nearby neurons. The receiving ends of these synapses on the nearby neurons can be found both on the dendrites and on the cell body. There are approximately 10^4 synapses per neuron in the human brain.
Electric impulses are passed between the synapse and the dendrites. This type of signal transmission involves a chemical process in which specific transmitter substances are released from the sending side of the junction. This results in an increase or decrease of the electric potential inside the body of the receiving cell. If the electric potential reaches a threshold then the receiving cell fires and a pulse or action potential of fixed strength and duration is sent out through the axon to the synaptic junctions of the other cells. After firing, a cell has to wait for a period of time called the refractory period before it can fire again. The synapses are said to be inhibitory if they let passing impulses hinder the firing of the receiving cell, or excitatory if they let passing impulses cause the firing of the receiving cell.
Figure 2-5 shows a mathematical representation of the above discussed chemical processing taking place in an artificial neuron. In this model, the net input is calculated as
yin = x1w1 + x2w2 + ... + xnwn = ∑ xiwi (i = 1 to n)
where i represents the ith processing element. The activation function is applied over it to calculate the output. The weight represents the strength of the synapse connecting the input and the output neurons. A positive weight corresponds to an excitatory synapse, and a negative weight corresponds to an inhibitory synapse.
The terms associated with the biological neuron and their counterparts in the artificial neuron are presented in Table 2-1.
Brain vs. Computer - Comparison Between Biological Neuron and Artificial Neuron
A comparison could be made between biological and artificial neurons on the basis of the
following criteria:
1. Speed: The cycle time of execution in the ANN is of a few nanoseconds, whereas in the case of the biological neuron it is of a few milliseconds. Hence, the artificial neuron modeled using a computer is faster.
2. Processing: Basically, the biological neuron can perform massive parallel operations simultaneously. The artificial neuron can also perform several parallel operations simultaneously, but, in general, the artificial neuron network process is faster than that of the brain.
3. Size and complexity: The total number of neurons in the brain is about 10^11 and the total number of interconnections is about 10^15. Hence, it can be noted that the complexity of the brain is comparatively
higher, i.e., the computational work takes place not only in the brain cell body, but also in the axon, synapse, etc. On the other hand, the size and complexity of an ANN is based on the chosen application and the network designer. The size and complexity of a biological neuron is more than that of an artificial neuron.
4. Storage capacity (Memory): The biological neuron stores the information in its interconnections or in synapse strength, but in an artificial neuron it is stored in its contiguous memory locations. In an artificial neuron, the continuous loading of new information may sometimes overload the memory locations. As a result, some of the addresses containing older memory locations may be destroyed. But in case of the brain, new information can be added in the interconnections by adjusting the strength without destroying the older information. A disadvantage related to the brain is that sometimes its memory may fail to recollect the stored information, whereas in an artificial neuron, once the information is stored in its memory locations, it can be retrieved. Owing to these facts, the adaptability is more toward an artificial neuron.
5. Tolerance: The biological neuron possesses fault tolerant capability whereas the artificial neuron has no fault tolerance. The distributed nature of the biological neurons enables them to store and retrieve information even when the interconnections in them get disconnected. Thus biological neurons are fault tolerant. But in case of artificial neurons, the information gets corrupted if the network interconnections are disconnected. Biological neurons can accept redundancies, which is not possible in artificial neurons. Even when some cells die, the human nervous system appears to be performing with the same efficiency.
6. Control mechanism: In an artificial neuron modeled using a computer, there is a control unit present in the Central Processing Unit, which can transfer and control precise scalar values from unit to unit, but there is no such control unit for monitoring in the brain. The strength of a neuron in the brain depends on the active chemicals present and on whether neuron connections are strong or weak as a result of structure rather than individual synapses. However, the ANN possesses simpler interconnections and is free from chemical actions similar to those taking place in the brain (biological neuron). Thus, the control mechanism of an artificial neuron is very simple compared to that of a biological neuron.
The architecture of a competitive layer is shown in Figure 2-8(B), the competitive interconnections having fixed weights of -ε. This net is called Maxnet, and will be discussed in the unsupervised learning network category. Apart from the network architectures discussed so far, there also exists another type of architecture with lateral feedback, which is called the on-center-off-surround or lateral inhibition structure. In this structure, each processing neuron receives two different classes of inputs - "excitatory" input from nearby processing elements and "inhibitory" inputs from more distantly located processing elements. This type of interconnection is shown in Figure 2-11.
Learning
The main property of an ANN is its capability to learn. Learning or training is a process by means of which a neural network adapts itself to a stimulus by making proper adjustments, resulting in the production of the desired response. Broadly, there are two kinds of learning in ANNs:
1. Parameter learning: It updates the connecting weights in a neural net.
2. Structure learning: It focuses on the change in network structure (which includes the number of processing elements as well as their connection types).
During training, the input vector is presented to the network, which results in an output vector. This output vector is the actual output vector. Then the actual output vector is compared with the desired (target) output vector. If there exists a difference between the two output vectors then an error signal is generated by the network. This error signal is used for adjustment of weights until the actual output matches the desired (target) output.
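The training loop just described can be sketched as below; the input vector, target, and learning rate are illustrative assumptions, and the simple proportional adjustment stands in for whichever learning rule a particular net uses.

# One supervised training pass: compare actual output with the target,
# generate an error signal, and adjust the weights in proportion to it.
import numpy as np

x = np.array([1.0, 0.5, -0.5])   # input vector presented to the network
w = np.zeros(3)                  # connection weights (parameter learning)
target = 1.0                     # desired (target) output
alpha = 0.1                      # learning rate

actual = np.dot(x, w)            # actual output produced by the network
error = target - actual         # error signal: difference of the two outputs
w += alpha * error * x           # weights adjusted to reduce the difference
print(w)                         # [ 0.1   0.05 -0.05]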
Unsupervised Learning:
The learning here is performed without the help of a teacher. Consider the learning process of a tadpole: it learns to swim by itself and is not taught by its mother. Thus, its learning process is independent and is not supervised by a teacher. In ANNs following unsupervised learning, the input vectors of similar type are grouped without the use of training data to specify how a member of each group looks or to which group a member belongs. In the training process, the network receives the input patterns and organizes these patterns to form clusters. When a new input pattern is applied, the neural network gives an output response indicating the class to which the input pattern belongs.
If, for an input, a pattern class cannot be found then a new class is generated. The block diagram of unsupervised learning is shown in Figure 2-13.
From Figure 2-13 it is clear that there is no feedback from the environment to inform what the outputs should be or whether the outputs are correct. In this case, the network must itself discover patterns, regularities, features or categories from the input data and relations for the input data over the output. While discovering all these features, the network undergoes changes in its parameters. This process is called self-organizing, in which exact clusters will be formed by discovering similarities and dissimilarities among the objects.
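A minimal winner-take-all sketch of this self-organizing behavior is given below; the two cluster units, the random initialization, and the learning rate are illustrative assumptions, not a prescribed algorithm.

# Winner-take-all clustering: the cluster unit whose weight vector is closest
# to the input is moved toward that input, forming clusters over time.
import numpy as np

weights = np.random.rand(2, 3)    # 2 cluster units, 3-component inputs
alpha = 0.5                       # learning rate

def present(x):
    distances = np.linalg.norm(weights - x, axis=1)
    winner = np.argmin(distances)                     # unit most similar to x
    weights[winner] += alpha * (x - weights[winner])  # move winner toward x
    return winner                                     # class assigned to x

print(present(np.array([1.0, 0.0, 0.0])))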
Reinforcement Learning:
This learning process is similar to supervised learning. In the case of supervised learning, the correct target output values are known for each input pattern. But, in some cases, less information might be available.
For example, the network might be told that its actual output is only "50% correct" or so. Thus, here only critic information is available, not the exact information. The learning based on this critic information is called reinforcement learning and the feedback sent is called the reinforcement signal.
The block diagram of reinforcement learning is shown in Figure 2-14. Reinforcement learning is a form of supervised learning because the network receives some feedback from its environment. However, the feedback obtained here is only evaluative and not instructive. The external reinforcement signals are processed in the critic signal generator, and the obtained critic signals are sent to the ANN for adjustment of weights properly so as to get better critic feedback in future. Reinforcement learning is also called learning with a critic, as opposed to learning with a teacher, which indicates supervised learning.
Certain nonlinear functions are used to achieve the advantages of a multilayer network over a single-layer network. When a signal is fed through a multilayer network with linear activation functions, the output obtained remains the same as that which could be obtained using a single-layer network. Due to this reason, nonlinear functions are widely used in multilayer networks compared to linear functions.
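The following short demonstration shows why this is so: two linear layers W1 and W2 compose into the single matrix W2·W1, so the multilayer net computes nothing a single-layer net could not (the random matrices here are arbitrary).

# With purely linear activations, stacked layers collapse to one layer.
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(4)
W1 = rng.random((3, 4))            # first linear layer
W2 = rng.random((2, 3))            # second linear layer

two_layer = W2 @ (W1 @ x)          # signal fed through two linear layers
one_layer = (W2 @ W1) @ x          # equivalent single-layer network
print(np.allclose(two_layer, one_layer))  # True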
McCulloch-Pitts Neuron
The McCulloch-Pitts neuron was the earliest neural network, discovered in 1943. It is usually called the M-P neuron. The M-P neurons are connected by directed weighted paths. It should be noted that the activation of an M-P neuron is binary, that is, at any time step the neuron may fire or may not fire. The weights associated with the communication links may be excitatory (weight is positive) or inhibitory (weight is negative). All the excitatory connection weights entering into a particular neuron will have the same weight. The threshold plays a major role in the M-P neuron: there is a fixed threshold for each neuron, and if the net input to the neuron is greater than the threshold then the neuron fires. Also, it should be noted that any nonzero inhibitory input would prevent the neuron from firing. The M-P neurons are most widely used in the case of logic functions.
A simple M-P neuron is shown in Figure 2-18. The M-P neuron has both excitatory and inhibitory connections. It is excitatory with weight w (w > 0) or inhibitory with weight -p (p > 0). In Figure 2-18, inputs from X1 to Xn possess excitatory weighted connections and inputs from Xn+1 to Xn+m possess inhibitory weighted interconnections. Since the firing of the output neuron is based upon the threshold, the activation function here is defined as
f(yin) = 1 if yin >= θ, and f(yin) = 0 if yin < θ
For inhibition to be absolute, the threshold θ with the activation function should satisfy the following condition:
θ > nw - p
The output will fire if it receives say "k" or more excitatory inputs but no inhibitory inputs, where
kw >= θ > (k - 1)w
The M-P neuron has no particular training algorithm. An analysis has to be performed to determine the values of the weights and the threshold. Here the weights of the neuron are set along with the threshold to make the neuron perform a simple logic function. The M-P neurons are used as building blocks on which we can model any function or phenomenon that can be represented as a logic function.
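As an illustration, the sketch below realizes the AND function with an M-P neuron, using the commonly analyzed choice of w = 1 for both excitatory weights and threshold θ = 2; these particular values are one standard choice, not the only one.

# M-P neuron for the AND logic function: fires only when both inputs are 1.
def mp_neuron(excitatory, inhibitory, w=1, theta=2):
    if any(inhibitory):           # any nonzero inhibitory input vetoes firing
        return 0
    yin = sum(w * x for x in excitatory)
    return 1 if yin >= theta else 0   # binary threshold activation

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mp_neuron([x1, x2], []))   # fires only for (1, 1)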
Linear Separability:
An ANN does not give an exact solution for a nonlinear problem. However, it provides possible approximate solutions to nonlinear problems. Linear separability is the concept wherein the separation of the input space into regions is based on whether the network response is positive or negative.
A decision line is drawn to separate positive and negative responses. The decision line may also be called the decision-making line or decision-support line or linear-separable line. The necessity of the linear separability concept was felt to classify the patterns based upon their output responses. Generally, the net input calculated to the output unit is given as
yin = b + ∑ xiwi (i = 1 to n)
For example, if a bipolar step activation function is used over the calculated net input (yin) then the value of the function is 1 for a positive net input and -1 for a negative net input. Also, it is clear that there exists a boundary between the regions where yin > 0 and yin < 0. This region may be called the decision boundary and can be determined by the relation
b + ∑ xiwi = 0
The net input for the network shown in Figure 2-19 is given as
yin = b + x1w1 + x2w2
The separating line for which the boundary lies between the values x1 and x2, so that the net gives a positive response on one side and a negative response on the other side, is given as
b + x1w1 + x2w2 = 0
If weight w2 is not equal to 0 then we get
x2 = -(w1/w2)x1 - (b/w2)
During the training process, the values of w1 and w2 have to be determined, so that the net will have a correct response to the training data. For this correct response, the line passes close through the origin. In certain situations, even for correct response, the separating line does not pass through the origin.
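A small sketch of this test is shown below; the weights and bias are illustrative values rather than trained ones, and a point is classified by the sign of its net input relative to the decision line.

# A point (x1, x2) is classified by the sign of yin = b + x1*w1 + x2*w2.
def respond(x1, x2, w1=1.0, w2=1.0, b=-0.5):
    yin = b + x1 * w1 + x2 * w2
    return 1 if yin > 0 else -1       # bipolar step activation

print(respond(1, 1))   # +1: on the positive side of b + x1*w1 + x2*w2 = 0
print(respond(0, 0))   # -1: on the negative side of the decision line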
Hebb Network
For a neural net, the Hebb learning rule is a simple one. Donald Hebb stated in 1949 that in the brain, learning is performed by the change in the synaptic gap. Hebb explained it: "When an axon of cell A is near enough to excite cell B, and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both the cells such that A's efficiency, as one of the cells firing B, is increased."
According to the Hebb rule, the weight vector is found to increase proportionately to the product
of the input and the learning signal. Here the learning signal is equal to the neuron's output. In Hebb
learning, if two interconnected neurons are 'on' simultaneously then the weights associated with these
neurons can be increased by the modification made in their synaptic gap (strength). The weight update in
Hebb rule is given by
wi(new) = wi(old) + xiy
The Hebb rule is more suited for bipolar data than binary data. If binary data is used, the above weight updation formula cannot distinguish two conditions, namely:
1. A training pair in which an input unit is "on" and target value is "off."
2. A training pair in which both the input unit and the target value are "off."
Thus, there are limitations in Hebb rule application over binary data. Hence, the representation using
bipolar data is advantageous.
The training algorithm of Hebb network is given below:
Step 0: First initialize the weights. Basically in this network they may be set to zero, i.e., wi = 0 for i = 1 to n, where "n" is the total number of input neurons.
Step 1: Steps 2-4 have to be performed for each input training vector and target output pair, s : t.
Step 2: Input unit activations are set. Generally, the activation function of the input layer is the identity function:
xi = si for i = 1 to n
Step 3: Output unit activations are set: y = t
Step 4: Weight adjustments and bias adjustments are performed:
wi(new) = wi(old) + xiy
b(new) = b(old) + y
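The algorithm can be sketched as below for the AND function with bipolar inputs and targets, a standard worked example; the data is an assumption used for illustration, not part of the algorithm itself.

# Hebb network training on the bipolar AND function.
import numpy as np

inputs  = [(1, 1), (1, -1), (-1, 1), (-1, -1)]   # bipolar input pairs s
targets = [1, -1, -1, -1]                        # bipolar targets t (AND)

w = np.zeros(2)                         # Step 0: initialize weights
b = 0.0                                 #         and bias

for (x1, x2), t in zip(inputs, targets):  # Step 1: for each pair s : t
    x = np.array([x1, x2])                # Step 2: input activations xi = si
    y = t                                 # Step 3: output activation y = t
    w += x * y                            # Step 4: wi(new) = wi(old) + xi*y
    b += y                                #         b(new) = b(old) + y
print(w, b)                               # [2. 2.] -2.0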
Perceptron Network:
Perceptron networks come under single-layer feed-forward networks and are also called simple perceptrons. Various types of perceptrons were designed by Rosenblatt (1962) and Minsky-Papert (1969, 1988). However, a simple perceptron network was discovered by Block in 1962.
6- The perceptron learning rule is used in the weight updation between the associator unit and the
response unit. For each training input, the net will calculate the response and it will determine
whether or not an error has occurred.
7- The error calculation is based on the comparison of the values of the targets with those of the calculated outputs.
8- The weights on the connections from the units that send the nonzero signal will get adjusted
suitably.
9- The weights will be adjusted on the basis of the learning rule if an error has occurred for a particular training pattern, i.e.,
wi(new) = wi(old) + αtxi
b(new) = b(old) + αt
If no error occurs, there is no weight updation and hence the training process may be stopped. In the above equations, the target value "t" is +1 or -1 and α is the learning rate. In general, these learning rules begin with an initial guess at the weight values and then successive adjustments are made on the basis of the evaluation of the above function. Eventually, the learning rules reach a near-optimal or optimal solution in a finite number of steps.
Consider a finite "n" number of input training vectors, with their associated target (desired) values x(n) and t(n), where "n" ranges from 1 to N. The target is either +1 or -1. The output "y" is obtained on the basis of the net input calculated and the activation function applied over the net input.
The weights can be initialized at any values. The perceptron rule convergence theorem states: "If there is a weight vector W such that f(x(n)W) = t(n) for all n, then for any starting vector w1, the perceptron learning rule will converge to a weight vector that gives the correct response for all training patterns, and this learning takes place within a finite number of steps provided that the solution exists."
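A sketch of the full perceptron training loop follows, again on bipolar AND data; the learning rate α = 1 and the threshold θ = 0.2 are illustrative assumptions.

# Perceptron learning rule: update weights only when an error occurs.
import numpy as np

inputs  = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
targets = np.array([1, -1, -1, -1])
w, b, alpha, theta = np.zeros(2), 0.0, 1.0, 0.2

def activation(yin):                 # bipolar step with threshold theta
    return 1 if yin > theta else (-1 if yin < -theta else 0)

converged = False
while not converged:                 # repeat until no weight changes occur
    converged = True
    for x, t in zip(inputs, targets):
        y = activation(b + np.dot(x, w))
        if y != t:                   # error occurred for this pattern
            w += alpha * t * x       # wi(new) = wi(old) + alpha*t*xi
            b += alpha * t           # b(new) = b(old) + alpha*t
            converged = False
print(w, b)                          # [1. 1.] -1.0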
Architecture of Perceptron Network:
In the original perceptron network, the output obtained from the associator unit is a binary vector,
and hence that output can be taken as input signal to the response unit and classification can be performed.
Here only the weights between the associator unit and the output unit can be adjusted and the weights
between the sensory and associator units are fixed. As a result, the discussion of the network is limited to a
single portion. Thus, the associator unit behaves like the input unit. A simple perceptron network
architecture is shown in Figure 3-2.
In Figure 3-2, there are n input neurons, 1 output neuron and a bias. The input-layer and output-layer neurons are connected through directed communication links, which are associated with weights. The goal of the perceptron net is to classify the input vector as a member or not a member of a particular class.
∆wi = α(t - yin)xi
where ∆wi is the weight change; α the learning rate; x the vector of activations of the input units; yin the net input to the output unit, i.e., yin = ∑ xiwi; and t the target output. The delta rule in case of several output units, for adjusting the weight from the ith input unit to the jth output unit (for each pattern), is
∆wij = α(tj - yinj)xi
Figure 3-8
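The delta rule update can be sketched directly as below; the shapes and values are illustrative assumptions.

# Delta rule for several output units: delta_wij = alpha*(t_j - yin_j)*x_i.
import numpy as np

x = np.array([1.0, -1.0, 0.5])          # activations of the input units
W = np.zeros((3, 2))                    # weights: 3 inputs to 2 output units
t = np.array([1.0, -1.0])               # target outputs
alpha = 0.1                             # learning rate

yin = x @ W                             # net input to each output unit
W += alpha * np.outer(x, t - yin)       # delta rule for all weights at once
print(W)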
2- Learning Rate α
The learning rate (α) affects the convergence of the BPN. A larger value of α may speed up the convergence but might result in overshooting, while a smaller value of α has the vice-versa effect. A range of α from 10^-3 to 10 has been used successfully for several back-propagation algorithmic experiments. Thus, a large learning rate leads to rapid learning but there is oscillation of weights, while a lower learning rate leads to slower learning.
3- Momentum Factor
The gradient descent is very slow if the learning rate α is small and oscillates widely if α is too large. One very efficient and commonly used method that allows a larger learning rate without oscillations is adding a momentum factor to the normal gradient-descent method.
The momentum factor is denoted by η ∈ [0, 1] and a value of 0.9 is often used for the momentum factor. Also, this approach is more useful when some training data are very different from the majority of data. A momentum factor can be used with either pattern-by-pattern updating or batch-mode updating. In case of batch mode, it has the effect of complete averaging over the patterns. Even though the averaging is only partial in the pattern-by-pattern mode, it leaves some useful information for weight updation.
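A minimal sketch of the momentum method follows; the quadratic loss and its gradient are stand-in assumptions used only to show the update.

# Gradient descent with momentum: the previous weight change is carried
# forward, scaled by the momentum factor eta (0.9, as suggested above).
def grad(w):                          # gradient of the illustrative loss (w - 3)^2
    return 2.0 * (w - 3.0)

w, velocity = 0.0, 0.0
alpha, eta = 0.1, 0.9                 # learning rate and momentum factor

for _ in range(200):
    velocity = eta * velocity - alpha * grad(w)   # carry forward previous change
    w += velocity                     # larger effective step without wild oscillation
print(round(w, 3))                    # approaches the minimum at w = 3.0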
4- Generalization
The best network for generalization is the BPN. A network is said to be generalized when it sensibly interpolates with input patterns that are new to the network. When there are many trainable parameters for the given amount of training data, the network learns well but does not generalize well. This is usually called overfitting or overtraining. One solution to this problem is to monitor the error on the test set and terminate the training when the error increases. With a small number of trainable parameters, the network fails to learn the training data and performs very poorly on the test data. For improving the ability of the network to generalize from a training data set to a test data set, it is desirable to make small changes in the input space of a pattern, without changing the output components. This is achieved by introducing variations in the input space of training patterns as part of the training set. However, computationally, this method is very expensive. Also, a net with a large number of nodes is capable of memorizing the training set at the cost of generalization. As a result, smaller nets are preferred over larger ones.
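The monitoring idea can be sketched as below; the simulated test-error curve is an illustrative assumption standing in for a real network's behavior.

# Early stopping: terminate training when the error on the test set rises.
test_errors = [0.9, 0.6, 0.4, 0.3, 0.28, 0.30, 0.35]  # typical overfitting curve

best = float("inf")
for epoch, err in enumerate(test_errors):
    if err > best:            # test error started to rise: overtraining begins
        print("stop at epoch", epoch)
        break
    best = err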
5- Number of Training Data
The training data should be sufficient and proper. There exists a rule of thumb which states that the training data should cover the entire expected input space, and while training, training-vector pairs should be selected randomly from the set. Assume that the input space is linearly separable into "L" disjoint regions with their boundaries being part of hyperplanes. Let "T" be the lower bound on the number of training patterns. Then, choosing T/L >> 1 will allow the network to discriminate pattern classes using fine piecewise hyperplane partitioning. Also, in some cases, scaling or normalization has to be done to help learning.
Architecture of BAM
The architecture of the BAM network is shown in Figure 4-6. It consists of two layers of neurons which are connected by directed weighted path interconnections. The network dynamics involve two layers of interaction. The BAM network iterates by sending the signals back and forth between the two layers until all the neurons reach equilibrium. The weights associated with the network are bidirectional. Thus, BAM can respond to the inputs in either layer. Figure 4-6 shows a single-layer BAM network consisting of n units in the X layer and m units in the Y layer. The layers can be connected in both directions (bidirectionally), with the result that the weight matrix for signals sent from the X layer to the Y layer is W and the weight matrix for signals sent from the Y layer to the X layer is WT. Thus, the weight matrix is calculated in both directions.
1- Determination of Weights:
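The standard construction for the BAM weight matrix is the outer-product (Hebbian) rule over the stored pattern pairs, W = ∑p s(p)T t(p); the sketch below assumes this rule and two illustrative bipolar pattern pairs.

# BAM weights by the bipolar outer-product rule; X->Y uses W, Y->X uses W^T.
import numpy as np

s_patterns = np.array([[1, -1, 1], [-1, 1, -1]])   # X-layer patterns (n = 3)
t_patterns = np.array([[1, -1], [-1, 1]])          # Y-layer patterns (m = 2)

W = sum(np.outer(s, t) for s, t in zip(s_patterns, t_patterns))

y = np.sign(s_patterns[0] @ W)      # forward pass X -> Y
x = np.sign(y @ W.T)                # backward pass Y -> X
print(y, x)                         # recalls the stored pair [1 -1], [1 -1 1]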
2- Continuous BAM:
A continuous BAM transforms the input smoothly and continuously in the range 0-1 using logistic sigmoid functions as the activation functions for all units. The logistic sigmoidal functions may be either binary sigmoidal or bipolar sigmoidal functions. When a bipolar sigmoidal function with a high gain is chosen, the continuous BAM might converge to a state of vectors which approach the vertices of the cube. When that state of the vectors is approached, it acts like a discrete BAM.
Hopfield Networks:
The networks proposed by Hopfield are known as Hopfield networks and it is his work that promoted
construction of the first analog VLSI neural chip. Two types of network are discussed:
Discrete Hopfield Networks
Continuous Hopfield Networks
The weights of a Hopfield net are symmetric, with no self-connections:
wij = wji; wii = 0
The key points to be noted in a Hopfield net are: only one unit updates its activation at a time; also, each unit is found to continuously receive an external signal along with the signals it receives from the other units in the net. When a single-layer recurrent network is performing a sequential updating process, an input pattern is first applied to the network and the network's output is found to be initialized accordingly. Afterwards, the initializing pattern is removed, and the output that is initialized becomes the new updated input through the feedback connections. The first updated input forces the first updated output, which in turn acts as the second updated input through the feedback interconnections and results in the second updated output. This transition process continues until no new, updated responses are produced and the network reaches its equilibrium.
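The sequential updating process just described can be sketched as below for a discrete Hopfield net; the stored pattern and the noisy test input are illustrative assumptions.

# Discrete Hopfield net with asynchronous (one-unit-at-a-time) updating.
import numpy as np

stored = np.array([1, -1, 1, -1])
W = np.outer(stored, stored)
np.fill_diagonal(W, 0)              # wij = wji, wii = 0

x = np.array([1, 1, 1, -1])         # noisy external input (one bit flipped)
y = x.copy()                        # output initialized from the input

changed = True
while changed:                      # iterate until equilibrium is reached
    changed = False
    for i in np.random.permutation(len(y)):   # one unit updates at a time
        yin = x[i] + np.dot(y, W[:, i])       # external signal + other units
        new = 1 if yin > 0 else (-1 if yin < 0 else y[i])
        if new != y[i]:
            y[i], changed = new, True
print(y)                            # [ 1 -1  1 -1]: the stored pattern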
Consider the input of a single node as in Figure 4-9. Applying Kirchhoff's current law (KCL), which states that the total current entering a junction is equal to that leaving the same junction, we get
Consider an m x m nonsingular symmetric matrix having "m" mutually orthogonal eigenvectors. The eigenvectors satisfy the property of orthogonality. A recurrent linear autoassociator network is trained using a set of P orthogonal unit vectors u1, ..., uP, where the number of times each vector is presented is not the same.
When the input vector X is presented, the output response of the net is XW, where W is the weight matrix. From the concepts of linear algebra, we know that we obtain the largest value of ||XW|| when X is the eigenvector for the largest eigenvalue; the next largest value of ||XW|| occurs when X is the eigenvector for the next largest eigenvalue, and so on. Thus, a recurrent linear autoassociator produces its response as the stored vector to which the input vector is most similar. This may perhaps take several iterations. The linear combination of vectors may be used to represent an input pattern. When an input vector is presented, the response of the net is the linear combination of its corresponding eigenvalues. The eigenvector with the largest value in this linear expansion is the one which is most similar to that of the input vectors. Although the net increases its response corresponding to components of the input pattern over which it is trained most extensively, the overall output response of the system may grow without bound.
The main condition of linearity between the associative memories is that the set of input vector pairs and output vector pairs (since it is autoassociative, both are the same) should be mutually orthogonal with each other, i.e., if "Ap" is the input pattern pair, for p = 1 to P, then
Ap · Aq = 0 for all p ≠ q
Brain-in-the-Box Network
An extension to the linear associator is the brain-in-the-box model. This model was described by Anderson, 1972, as follows: an activity pattern inside the box receives positive feedback on certain components, which has the effect of forcing it outward. When its elements start to limit (when the pattern hits the wall of the box), it moves to a corner of the box, where it remains as such. The box resides in the state-space (each neuron occupies one axis) of the network and represents the saturation limits for each state. Each component here is restricted between -1 and +1. The updation of activations of the units in the brain-in-the-box model is done simultaneously.
The brain-in-the-box model consists of n units, each being connected to every other unit. Also, there is a trained weight on the self-connection, i.e., the diagonal elements are set to zero. There also exists a self-connection with weight 1.
Testing Algorithm: