SC Unit II
It should be noted that each neuron has an internal state of its own. This internal state is called the activation or activity level of the neuron, which is a function of the inputs the neuron receives. The activation signal of a neuron is transmitted to other neurons. Remember, a neuron can send only one signal at a time, which can be transmitted to several other neurons.
To depict the basic operation of a neural net, consider a set of neurons, say X1 and X2, transmitting signals to another neuron, Y. Here X1 and X2 are input neurons, which transmit signals, and Y is the output neuron, which receives signals. Input neurons X1 and X2 are connected to the output neuron Y over weighted interconnection links (W1 and W2) as shown in Figure 2-1.
For the above simple neuron net architecture, the net input has to be calculated in the following way:
yin = x1w1 + x2w2
where x1 and x2 are the activations of the input neurons X1 and X2, i.e., the output of input signals. The output y of the output neuron Y can be obtained by applying the activation function over the net input, i.e., as a function of the net input:
y = f(yin)
Output = Function(net input calculated)
The function to be applied over the net input is called the activation function. There are various activation functions. The above calculation of the net input is similar to the calculation of the output of a pure linear straight-line equation (y = mx). The neural net of a pure linear equation is shown in Figure 2-2.
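As a minimal illustration of this calculation, the following Python sketch computes the net input and applies a pure linear (identity) activation; the input and weight values are arbitrary assumptions chosen for demonstration.

# Two-input neuron of Figure 2-1 with a pure linear activation (Figure 2-2).
def net_input(x1, x2, w1, w2):
    return x1 * w1 + x2 * w2      # yin = x1w1 + x2w2

def linear_activation(yin):
    return yin                    # pure linear: output equals net input

x1, x2 = 0.6, 0.4                 # activations of input neurons X1 and X2
w1, w2 = 0.3, 0.7                 # weights on the interconnection links
y = linear_activation(net_input(x1, x2, w1, w2))
print(y)                          # 0.46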
The biological neuron depicted in Figure 2-4 consists of three main parts:
1. Soma or cell body- where the cell nucleus is located.
2. Dendrites- where the nerve is connected to the cell body.
3. Axon- which carries the impulses of the neuron.
Dendrites are tree-like networks made of nerve fiber connected to the cell body. An axon is a single, long connection extending from the cell body and carrying signals from the neuron. The end of the axon splits into fine strands. It is found that each strand terminates into a small bulb-like organ called a synapse. It is through the synapse that the neuron introduces its signal to other nearby neurons. The receiving ends of these synapses on the nearby neurons can be found both on the dendrites and on the cell body. There are approximately 10^4 synapses per neuron in the human brain.
Electric impulses are passed between the synapse and the dendrites. This type of signal transmission involves a chemical process in which specific transmitter substances are released from the sending side of the junction. This results in an increase or decrease of the electric potential inside the body of the receiving cell. If the electric potential reaches a threshold then the receiving cell fires and a pulse or action potential of fixed strength and duration is sent out through the axon to the synaptic junctions of the other cells. After firing, a cell has to wait for a period of time called the refractory period before it can fire again. The synapses are said to be inhibitory if they let passing impulses hinder the firing of the receiving cell, or excitatory if they let passing impulses cause the firing of the receiving cell.
Figure 2-5 shows a mathematical representation of the above discussed chemical processing taking place in an artificial neuron. In this model, the net input is calculated as
yin = x1w1 + x2w2 + ... + xnwn = ∑ xiwi (i = 1 to n)
where i represents the ith processing element. The activation function is applied over it to calculate the output. The weight represents the strength of the synapse connecting the input and the output neurons. A positive weight corresponds to an excitatory synapse, and a negative weight corresponds to an inhibitory synapse.
The terms associated with the biological neuron and their counterparts in the artificial neuron are presented in Table 2-1.
Brain vs. Computer - Comparison Between Biological Neuron and Artificial Neuron
A comparison could be made between biological and artificial neurons on the basis of the
following criteria:
1. Speed: The cycle time of execution in the ANN is of a few nanoseconds, whereas in the case of the biological neuron it is of a few milliseconds. Hence, the artificial neuron modeled using a computer is faster.
2. Processing: Basically, the biological neuron can perform massive parallel operations simultaneously. The artificial neuron can also perform several parallel operations simultaneously, but, in general, the artificial neuron network process is faster than that of the brain.
3. Size and complexity: The total number of neurons in the brain is about 10^11 and the total number of interconnections is about 10^15. Hence, it can be noted that the complexity of the brain is comparatively
higher, i.e., the computational work takes place not only in the brain cell body, but also in the axon, synapse, etc. On the other hand, the size and complexity of an ANN is based on the chosen application and the network designer. The size and complexity of a biological neuron is more than that of an artificial neuron.
4. Storage capacity (Memory): The biological neuron stores the information in its interconnections or in synapse strength, but in an artificial neuron it is stored in its contiguous memory locations. In an artificial neuron, the continuous loading of new information may sometimes overload the memory locations. As a result, some of the addresses containing older memory locations may be destroyed. But in case of the brain, new information can be added in the interconnections by adjusting the strength without destroying the older information. A disadvantage related to the brain is that sometimes its memory may fail to recollect the stored information, whereas in an artificial neuron, once the information is stored in its memory locations, it can be retrieved. Owing to these facts, the adaptability is more toward an artificial neuron.
5. Tolerance: The biological neuron possesses fault tolerant capability whereas the artificial neuron has no fault tolerance. The distributed nature of the biological neurons enables them to store and retrieve information even when the interconnections in them get disconnected. Thus biological neurons are fault tolerant. But in case of artificial neurons, the information gets corrupted if the network interconnections are disconnected. Biological neurons can accept redundancies, which is not possible in artificial neurons. Even when some cells die, the human nervous system appears to be performing with the same efficiency.
6. Control mechanism: In an artificial neuron modeled using a computer, there is a control unit present in the Central Processing Unit, which can transfer and control precise scalar values from unit to unit, but there is no such control unit for monitoring in the brain. The strength of a neuron in the brain depends on the active chemicals present and on whether neuron connections are strong or weak as a result of structure rather than individual synapses. However, the ANN possesses simpler interconnections and is free from chemical actions similar to those taking place in the brain (biological neuron). Thus, the control mechanism of an artificial neuron is very simple compared to that of a biological neuron.
The architecture of a competitive layer is shown in Figure 2-8(B), the competitive interconnections having fixed weights of -ε. This net is called Maxnet, and will be discussed in the unsupervised learning network category. Apart from the network architectures discussed so far, there also exists another type of architecture with lateral feedback, which is called the on-center-off-surround or lateral inhibition structure. In this structure, each processing neuron receives two different classes of inputs - "excitatory" input from nearby processing elements and "inhibitory" inputs from more distantly located processing elements. This type of interconnection is shown in Figure 2-11.
Learning
The main property of an ANN is its capability to learn. Learning or training is a process by means of which a neural network adapts itself to a stimulus by making proper adjustments, resulting in the production of the desired response. Broadly, there are two kinds of learning in ANNs:
1. Parameter learning: It updates the connecting weights in a neural net.
2. Structure learning: It focuses on the change in network structure (which includes the number of processing elements as well as their connection types).
During training, the input vector is presented to the network, which results in an output vector. This output vector is the actual output vector. Then the actual output vector is compared with the desired (target) output vector. If there exists a difference between the two output vectors then an error signal is generated by the network. This error signal is used for adjustment of weights until the actual output matches the desired (target) output.
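The training loop just described can be sketched as below; the input vector, target, and learning rate are illustrative assumptions, and the simple proportional adjustment stands in for whichever learning rule a particular net uses.

# One supervised training pass: compare actual output with the target,
# generate an error signal, and adjust the weights in proportion to it.
import numpy as np

x = np.array([1.0, 0.5, -0.5])   # input vector presented to the network
w = np.zeros(3)                  # connection weights (parameter learning)
target = 1.0                     # desired (target) output
alpha = 0.1                      # learning rate

actual = np.dot(x, w)            # actual output produced by the network
error = target - actual         # error signal: difference of the two outputs
w += alpha * error * x           # weights adjusted to reduce the difference
print(w)                         # [ 0.1   0.05 -0.05]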
Unsupervised Learning:
The learning here is performed without the help of a teacher. Consider the learning process of a tadpole: it learns to swim by itself and is not taught by its mother. Thus, its learning process is independent and is not supervised by a teacher. In ANNs following unsupervised learning, the input vectors of similar type are grouped without the use of training data to specify how a member of each group looks or to which group a member belongs. In the training process, the network receives the input patterns and organizes these patterns to form clusters. When a new input pattern is applied, the neural network gives an output response indicating the class to which the input pattern belongs.
If, for an input, a pattern class cannot be found then a new class is generated. The block diagram of unsupervised learning is shown in Figure 2-13.
From Figure 2-13 it is clear that there is no feedback from the environment to inform what the outputs should be or whether the outputs are correct. In this case, the network must itself discover patterns, regularities, features or categories from the input data and relations for the input data over the output. While discovering all these features, the network undergoes changes in its parameters. This process is called self-organizing, in which exact clusters will be formed by discovering similarities and dissimilarities among the objects.
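A minimal winner-take-all sketch of this self-organizing behavior is given below; the two cluster units, the random initialization, and the learning rate are illustrative assumptions, not a prescribed algorithm.

# Winner-take-all clustering: the cluster unit whose weight vector is closest
# to the input is moved toward that input, forming clusters over time.
import numpy as np

weights = np.random.rand(2, 3)    # 2 cluster units, 3-component inputs
alpha = 0.5                       # learning rate

def present(x):
    distances = np.linalg.norm(weights - x, axis=1)
    winner = np.argmin(distances)                     # unit most similar to x
    weights[winner] += alpha * (x - weights[winner])  # move winner toward x
    return winner                                     # class assigned to x

print(present(np.array([1.0, 0.0, 0.0])))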
Reinforcement Learning:
This learning process is similar to supervised learning. In the case of supervised learning, the correct target output values are known for each input pattern. But, in some cases, less information might be available.
For example, the network might be told that its actual output is only "50% correct" or so. Thus, here only critic information is available, not the exact information. The learning based on this critic information is called reinforcement learning and the feedback sent is called the reinforcement signal.
The block diagram of reinforcement learning is shown in Figure 2-14. Reinforcement learning is a form of supervised learning because the network receives some feedback from its environment. However, the feedback obtained here is only evaluative and not instructive. The external reinforcement signals are processed in the critic signal generator, and the obtained critic signals are sent to the ANN for adjustment of weights properly so as to get better critic feedback in future. Reinforcement learning is also called learning with a critic, as opposed to learning with a teacher, which indicates supervised learning.
Certain nonlinear functions are used to achieve the advantages of a multilayer network over a single-layer network. When a signal is fed through a multilayer network with linear activation functions, the output obtained remains the same as that which could be obtained using a single-layer network. Due to this reason, nonlinear functions are widely used in multilayer networks compared to linear functions.
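The following short demonstration shows why this is so: two linear layers W1 and W2 compose into the single matrix W2·W1, so the multilayer net computes nothing a single-layer net could not (the random matrices here are arbitrary).

# With purely linear activations, stacked layers collapse to one layer.
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(4)
W1 = rng.random((3, 4))            # first linear layer
W2 = rng.random((2, 3))            # second linear layer

two_layer = W2 @ (W1 @ x)          # signal fed through two linear layers
one_layer = (W2 @ W1) @ x          # equivalent single-layer network
print(np.allclose(two_layer, one_layer))  # True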
McCulloch-Pitts Neuron
The McCulloch-Pitts neuron was the earliest neural network, discovered in 1943. It is usually called the M-P neuron. The M-P neurons are connected by directed weighted paths. It should be noted that the activation of an M-P neuron is binary, that is, at any time step the neuron may fire or may not fire. The weights associated with the communication links may be excitatory (weight is positive) or inhibitory (weight is negative). All the excitatory connection weights entering into a particular neuron will have the same weight. The threshold plays a major role in the M-P neuron: there is a fixed threshold for each neuron, and if the net input to the neuron is greater than the threshold then the neuron fires. Also, it should be noted that any nonzero inhibitory input would prevent the neuron from firing. The M-P neurons are most widely used in the case of logic functions.
A simple M-P neuron is shown in Figure 2-18. The M-P neuron has both excitatory and inhibitory connections. It is excitatory with weight w (w > 0) or inhibitory with weight -p (p > 0). In Figure 2-18, inputs from X1 to Xn possess excitatory weighted connections and inputs from Xn+1 to Xn+m possess inhibitory weighted interconnections. Since the firing of the output neuron is based upon the threshold, the activation function here is defined as
f(yin) = 1 if yin >= θ, and f(yin) = 0 if yin < θ
For inhibition to be absolute, the threshold θ with the activation function should satisfy the following condition:
θ > nw - p
The output will fire if it receives say "k" or more excitatory inputs but no inhibitory inputs, where
kw >= θ > (k - 1)w
The M-P neuron has no particular training algorithm. An analysis has to be performed to determine the values of the weights and the threshold. Here the weights of the neuron are set along with the threshold to make the neuron perform a simple logic function. The M-P neurons are used as building blocks on which we can model any function or phenomenon that can be represented as a logic function.
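As an illustration, the sketch below realizes the AND function with an M-P neuron, using the commonly analyzed choice of w = 1 for both excitatory weights and threshold θ = 2; these particular values are one standard choice, not the only one.

# M-P neuron for the AND logic function: fires only when both inputs are 1.
def mp_neuron(excitatory, inhibitory, w=1, theta=2):
    if any(inhibitory):           # any nonzero inhibitory input vetoes firing
        return 0
    yin = sum(w * x for x in excitatory)
    return 1 if yin >= theta else 0   # binary threshold activation

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mp_neuron([x1, x2], []))   # fires only for (1, 1)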
Linear Separability:
An ANN does not give an exact solution for a nonlinear problem. However, it provides possible approximate solutions to nonlinear problems. Linear separability is the concept wherein the separation of the input space into regions is based on whether the network response is positive or negative.
A decision line is drawn to separate positive and negative responses. The decision line may also be called the decision-making line or decision-support line or linear-separable line. The necessity of the linear separability concept was felt to classify the patterns based upon their output responses. Generally, the net input calculated to the output unit is given as
yin = b + ∑ xiwi (i = 1 to n)
For example, if a bipolar step activation function is used over the calculated net input (yin) then the value of the function is 1 for a positive net input and -1 for a negative net input. Also, it is clear that there exists a boundary between the regions where yin > 0 and yin < 0. This region may be called the decision boundary and can be determined by the relation
b + ∑ xiwi = 0
The net input for the network shown in Figure 2-19 is given as
yin = b + x1w1 + x2w2
The separating line for which the boundary lies between the values x1 and x2, so that the net gives a positive response on one side and a negative response on the other side, is given as
b + x1w1 + x2w2 = 0
If weight w2 is not equal to 0 then we get
x2 = -(w1/w2)x1 - (b/w2)
During the training process, the values of w1 and w2 have to be determined, so that the net will have a correct response to the training data. For this correct response, the line passes close through the origin. In certain situations, even for correct response, the separating line does not pass through the origin.
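A small sketch of this test is shown below; the weights and bias are illustrative values rather than trained ones, and a point is classified by the sign of its net input relative to the decision line.

# A point (x1, x2) is classified by the sign of yin = b + x1*w1 + x2*w2.
def respond(x1, x2, w1=1.0, w2=1.0, b=-0.5):
    yin = b + x1 * w1 + x2 * w2
    return 1 if yin > 0 else -1       # bipolar step activation

print(respond(1, 1))   # +1: on the positive side of b + x1*w1 + x2*w2 = 0
print(respond(0, 0))   # -1: on the negative side of the decision line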
Hebb Network
For a neural net, the Hebb learning rule is a simple one. Donald Hebb stated in 1949 that in the brain, learning is performed by the change in the synaptic gap. Hebb explained it: "When an axon of cell A is near enough to excite cell B, and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both the cells such that A's efficiency, as one of the cells firing B, is increased."
According to the Hebb rule, the weight vector is found to increase proportionately to the product
of the input and the learning signal. Here the learning signal is equal to the neuron's output. In Hebb
learning, if two interconnected neurons are 'on' simultaneously then the weights associated with these
neurons can be increased by the modification made in their synaptic gap (strength). The weight update in
Hebb rule is given by
wi(new) = wi(old) + xiy
The Hebb rule is more suited for bipolar data than binary data. If binary data is used, the above weight updation formula cannot distinguish two conditions, namely:
1. A training pair in which an input unit is "on" and target value is "off."
2. A training pair in which both the input unit and the target value are "off."
Thus, there are limitations in Hebb rule application over binary data. Hence, the representation using
bipolar data is advantageous.
The training algorithm of Hebb network is given below:
Step 0: First initialize the weights. Basically in this network they may be set to zero, i.e., wi = 0 for i = 1 to n, where "n" is the total number of input neurons.
Step 1: Steps 2-4 have to be performed for each input training vector and target output pair, s : t.
Step 2: Input unit activations are set. Generally, the activation function of the input layer is the identity function:
xi = si for i = 1 to n
Step 3: Output unit activations are set: y = t
Step 4: Weight adjustments and bias adjustments are performed:
wi(new) = wi(old) + xiy
b(new) = b(old) + y
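The algorithm can be sketched as below for the AND function with bipolar inputs and targets, a standard worked example; the data is an assumption used for illustration, not part of the algorithm itself.

# Hebb network training on the bipolar AND function.
import numpy as np

inputs  = [(1, 1), (1, -1), (-1, 1), (-1, -1)]   # bipolar input pairs s
targets = [1, -1, -1, -1]                        # bipolar targets t (AND)

w = np.zeros(2)                         # Step 0: initialize weights
b = 0.0                                 #         and bias

for (x1, x2), t in zip(inputs, targets):  # Step 1: for each pair s : t
    x = np.array([x1, x2])                # Step 2: input activations xi = si
    y = t                                 # Step 3: output activation y = t
    w += x * y                            # Step 4: wi(new) = wi(old) + xi*y
    b += y                                #         b(new) = b(old) + y
print(w, b)                               # [2. 2.] -2.0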
Perceptron Network:
Perceptron networks come under single-layer feed-forward networks and are also called simple perceptrons. Various types of perceptrons were designed by Rosenblatt (1962) and Minsky-Papert (1969, 1988). However, a simple perceptron network was discovered by Block in 1962.
6- The perceptron learning rule is used in the weight updation between the associator unit and the
response unit. For each training input, the net will calculate the response and it will determine
whether or not an error has occurred.
7- The error calculation is based on the comparison of the values of the targets with those of the calculated outputs.
8- The weights on the connections from the units that send the nonzero signal will get adjusted
suitably.
9- The weights will be adjusted on the basis of the learning rule if an error has occurred for a particular training pattern, i.e.,
wi(new) = wi(old) + αtxi
b(new) = b(old) + αt
If no error occurs, there is no weight updation and hence the training process may be stopped. In the above equations, the target value "t" is +1 or -1 and α is the learning rate. In general, these learning rules begin with an initial guess at the weight values and then successive adjustments are made on the basis of the evaluation of the above function. Eventually, the learning rules reach a near-optimal or optimal solution in a finite number of steps.
Consider a finite "n" number of input training vectors, with their associated target (desired) values x(n) and t(n), where "n" ranges from 1 to N. The target is either +1 or -1. The output "y" is obtained on the basis of the net input calculated and the activation function applied over the net input.
The weights can be initialized at any values. The perceptron rule convergence theorem states: "If there is a weight vector W such that f(x(n)W) = t(n) for all n, then for any starting vector w1, the perceptron learning rule will converge to a weight vector that gives the correct response for all training patterns, and this learning takes place within a finite number of steps provided that the solution exists."
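A sketch of the full perceptron training loop follows, again on bipolar AND data; the learning rate α = 1 and the threshold θ = 0.2 are illustrative assumptions.

# Perceptron learning rule: update weights only when an error occurs.
import numpy as np

inputs  = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
targets = np.array([1, -1, -1, -1])
w, b, alpha, theta = np.zeros(2), 0.0, 1.0, 0.2

def activation(yin):                 # bipolar step with threshold theta
    return 1 if yin > theta else (-1 if yin < -theta else 0)

converged = False
while not converged:                 # repeat until no weight changes occur
    converged = True
    for x, t in zip(inputs, targets):
        y = activation(b + np.dot(x, w))
        if y != t:                   # error occurred for this pattern
            w += alpha * t * x       # wi(new) = wi(old) + alpha*t*xi
            b += alpha * t           # b(new) = b(old) + alpha*t
            converged = False
print(w, b)                          # [1. 1.] -1.0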
Architecture of Perceptron Network:
In the original perceptron network, the output obtained from the associator unit is a binary vector,
and hence that output can be taken as input signal to the response unit and classification can be performed.
Here only the weights between the associator unit and the output unit can be adjusted and the weights
between the sensory and associator units are fixed. As a result, the discussion of the network is limited to a
single portion. Thus, the associator unit behaves like the input unit. A simple perceptron network
architecture is shown in Figure 3-2.
In Figure 3-2, there are n input neurons, 1 output neuron and a bias. The input-layer and output-layer neurons are connected through directed communication links, which are associated with weights. The goal of the perceptron net is to classify the input vector as a member or not a member of a particular class.
∆wi = α(t - yin)xi
where ∆wi is the weight change; α the learning rate; x the vector of activations of the input units; yin the net input to the output unit, i.e., yin = ∑ xiwi; and t the target output. The delta rule in case of several output units, for adjusting the weight from the ith input unit to the jth output unit (for each pattern), is
∆wij = α(tj - yinj)xi
Figure 3-8
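The delta rule update can be sketched directly as below; the shapes and values are illustrative assumptions.

# Delta rule for several output units: delta_wij = alpha*(t_j - yin_j)*x_i.
import numpy as np

x = np.array([1.0, -1.0, 0.5])          # activations of the input units
W = np.zeros((3, 2))                    # weights: 3 inputs to 2 output units
t = np.array([1.0, -1.0])               # target outputs
alpha = 0.1                             # learning rate

yin = x @ W                             # net input to each output unit
W += alpha * np.outer(x, t - yin)       # delta rule for all weights at once
print(W)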
2- Learning Rate α
The learning rate (α) affects the convergence of the BPN. A larger value of α may speed up the convergence but might result in overshooting, while a smaller value of α has the vice-versa effect. A range of α from 10^-3 to 10 has been used successfully for several back-propagation algorithmic experiments. Thus, a large learning rate leads to rapid learning but there is oscillation of weights, while a lower learning rate leads to slower learning.
3- Momentum Factor
The gradient descent is very slow if the learning rate α is small and oscillates widely if α is too large. One very efficient and commonly used method that allows a larger learning rate without oscillations is adding a momentum factor to the normal gradient-descent method.
The momentum factor is denoted by η ∈ [0, 1] and a value of 0.9 is often used for the momentum factor. Also, this approach is more useful when some training data are very different from the majority of data. A momentum factor can be used with either pattern-by-pattern updating or batch-mode updating. In case of batch mode, it has the effect of complete averaging over the patterns. Even though the averaging is only partial in the pattern-by-pattern mode, it leaves some useful information for weight updation.
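A minimal sketch of the momentum method follows; the quadratic loss and its gradient are stand-in assumptions used only to show the update.

# Gradient descent with momentum: the previous weight change is carried
# forward, scaled by the momentum factor eta (0.9, as suggested above).
def grad(w):                          # gradient of the illustrative loss (w - 3)^2
    return 2.0 * (w - 3.0)

w, velocity = 0.0, 0.0
alpha, eta = 0.1, 0.9                 # learning rate and momentum factor

for _ in range(200):
    velocity = eta * velocity - alpha * grad(w)   # carry forward previous change
    w += velocity                     # larger effective step without wild oscillation
print(round(w, 3))                    # approaches the minimum at w = 3.0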
4- Generalization
The best network for generalization is the BPN. A network is said to be generalized when it sensibly interpolates with input patterns that are new to the network. When there are many trainable parameters for the given amount of training data, the network learns well but does not generalize well. This is usually called overfitting or overtraining. One solution to this problem is to monitor the error on the test set and terminate the training when the error increases. With a small number of trainable parameters, the network fails to learn the training data and performs very poorly on the test data. For improving the ability of the network to generalize from a training data set to a test data set, it is desirable to make small changes in the input space of a pattern, without changing the output components. This is achieved by introducing variations in the input space of training patterns as part of the training set. However, computationally, this method is very expensive. Also, a net with a large number of nodes is capable of memorizing the training set at the cost of generalization. As a result, smaller nets are preferred over larger ones.
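The monitoring idea can be sketched as below; the simulated test-error curve is an illustrative assumption standing in for a real network's behavior.

# Early stopping: terminate training when the error on the test set rises.
test_errors = [0.9, 0.6, 0.4, 0.3, 0.28, 0.30, 0.35]  # typical overfitting curve

best = float("inf")
for epoch, err in enumerate(test_errors):
    if err > best:            # test error started to rise: overtraining begins
        print("stop at epoch", epoch)
        break
    best = err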
5- Number of Training Data
The training data should be sufficient and proper. There exists a rule of thumb which states that the training data should cover the entire expected input space, and while training, training-vector pairs should be selected randomly from the set. Assume that the input space is linearly separable into "L" disjoint regions with their boundaries being part of hyperplanes. Let "T" be the lower bound on the number of training patterns. Then, choosing T/L >> 1 will allow the network to discriminate pattern classes using fine piecewise hyperplane partitioning. Also, in some cases, scaling or normalization has to be done to help learning.
Architecture of BAM
The architecture of the BAM network is shown in Figure 4-6. It consists of two layers of neurons which are connected by directed weighted path interconnections. The network dynamics involve two layers of interaction. The BAM network iterates by sending the signals back and forth between the two layers until all the neurons reach equilibrium. The weights associated with the network are bidirectional. Thus, BAM can respond to the inputs in either layer. Figure 4-6 shows a single-layer BAM network consisting of n units in the X layer and m units in the Y layer. The layers can be connected in both directions (bidirectionally), with the result that the weight matrix for signals sent from the X layer to the Y layer is W and the weight matrix for signals sent from the Y layer to the X layer is WT. Thus, the weight matrix is calculated in both directions.
1- Determination of Weights:
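The standard construction for the BAM weight matrix is the outer-product (Hebbian) rule over the stored pattern pairs, W = ∑p s(p)T t(p); the sketch below assumes this rule and two illustrative bipolar pattern pairs.

# BAM weights by the bipolar outer-product rule; X->Y uses W, Y->X uses W^T.
import numpy as np

s_patterns = np.array([[1, -1, 1], [-1, 1, -1]])   # X-layer patterns (n = 3)
t_patterns = np.array([[1, -1], [-1, 1]])          # Y-layer patterns (m = 2)

W = sum(np.outer(s, t) for s, t in zip(s_patterns, t_patterns))

y = np.sign(s_patterns[0] @ W)      # forward pass X -> Y
x = np.sign(y @ W.T)                # backward pass Y -> X
print(y, x)                         # recalls the stored pair [1 -1], [1 -1 1]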
2- Continuous BAM:
A continuous BAM transforms the input smoothly and continuously in the range 0-1 using logistic sigmoid functions as the activation functions for all units. The logistic sigmoidal functions may be either binary sigmoidal or bipolar sigmoidal functions. When a bipolar sigmoidal function with a high gain is chosen, the continuous BAM might converge to a state of vectors which approach the vertices of the cube. When that state of the vectors is approached, it acts like a discrete BAM.
Hopfield Networks:
The networks proposed by Hopfield are known as Hopfield networks and it is his work that promoted
construction of the first analog VLSI neural chip. Two types of network are discussed:
Discrete Hopfield Networks
Continuous Hopfield Networks
The weights of a Hopfield net are symmetric, with no self-connections:
wij = wji; wii = 0
The key points to be noted in a Hopfield net are: only one unit updates its activation at a time; also, each unit is found to continuously receive an external signal along with the signals it receives from the other units in the net. When a single-layer recurrent network is performing a sequential updating process, an input pattern is first applied to the network and the network's output is found to be initialized accordingly. Afterwards, the initializing pattern is removed, and the output that is initialized becomes the new updated input through the feedback connections. The first updated input forces the first updated output, which in turn acts as the second updated input through the feedback interconnections and results in the second updated output. This transition process continues until no new, updated responses are produced and the network reaches its equilibrium.
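The sequential updating process just described can be sketched as below for a discrete Hopfield net; the stored pattern and the noisy test input are illustrative assumptions.

# Discrete Hopfield net with asynchronous (one-unit-at-a-time) updating.
import numpy as np

stored = np.array([1, -1, 1, -1])
W = np.outer(stored, stored)
np.fill_diagonal(W, 0)              # wij = wji, wii = 0

x = np.array([1, 1, 1, -1])         # noisy external input (one bit flipped)
y = x.copy()                        # output initialized from the input

changed = True
while changed:                      # iterate until equilibrium is reached
    changed = False
    for i in np.random.permutation(len(y)):   # one unit updates at a time
        yin = x[i] + np.dot(y, W[:, i])       # external signal + other units
        new = 1 if yin > 0 else (-1 if yin < 0 else y[i])
        if new != y[i]:
            y[i], changed = new, True
print(y)                            # [ 1 -1  1 -1]: the stored pattern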
Consider the input of a single node as in Figure 4-9. Applying Kirchhoff's current law (KCL), which states that the total current entering a junction is equal to that leaving the same junction, we get
Consider an m x m nonsingular symmetric matrix having "m" mutually orthogonal eigenvectors. The eigenvectors satisfy the property of orthogonality. A recurrent linear autoassociator network is trained using a set of P orthogonal unit vectors u1, ..., uP, where the number of times each vector is presented is not the same.
When the input vector X is presented, the output response of the net is XW, where W is the weight matrix. From the concepts of linear algebra, we know that we obtain the largest value of ||XW|| when X is the eigenvector for the largest eigenvalue; the next largest value of ||XW|| occurs when X is the eigenvector for the next largest eigenvalue, and so on. Thus, a recurrent linear autoassociator produces its response as the stored vector to which the input vector is most similar. This may perhaps take several iterations. The linear combination of vectors may be used to represent an input pattern. When an input vector is presented, the response of the net is the linear combination of its corresponding eigenvalues. The eigenvector with the largest value in this linear expansion is the one which is most similar to that of the input vectors. Although the net increases its response corresponding to components of the input pattern over which it is trained most extensively, the overall output response of the system may grow without bound.
The main condition of linearity between the associative memories is that the set of input vector pairs and output vector pairs (since it is autoassociative, both are the same) should be mutually orthogonal with each other, i.e., if "Ap" is the input pattern pair, for p = 1 to P, then
Ap · Aq = 0 for all p ≠ q
Brain-in-the-Box Network
An extension to the linear associator is the brain-in-the-box model. This model was described by Anderson, 1972, as follows: an activity pattern inside the box receives positive feedback on certain components, which has the effect of forcing it outward. When its elements start to limit (when the pattern hits the wall of the box), it moves to a corner of the box, where it remains as such. The box resides in the state-space (each neuron occupies one axis) of the network and represents the saturation limits for each state. Each component here is restricted between -1 and +1. The updation of activations of the units in the brain-in-the-box model is done simultaneously.
The brain-in-the-box model consists of n units, each being connected to every other unit. Also, there is a trained weight on the self-connection, i.e., the diagonal elements are set to zero. There also exists a self-connection with weight 1.
Testing Algorithm: