Soft Computing Notes
SOFT COMPUTING
Soft Computing (CS361) Syllabus
Syllabus
Expected Outcome
Student is able to
1. Learn about soft computing techniques and their applications.
2. Analyze various neural network architectures.
3. Define the fuzzy systems.
4. Understand the genetic algorithm concepts and their applications.
5. Identify and select a suitable Soft Computing technology to solve the problem; construct a
solution and implement a Soft Computing solution.
Text Books
1. S. N. Sivanandam and S. N. Deepa, Principles of Soft Computing, Wiley India.
2. Timothy J. Ross, Fuzzy Logic with Engineering Applications, Wiley India.
References
1. N. K. Sinha and M. M. Gupta, Soft Computing & Intelligent Systems: Theory & Applications, Academic Press/Elsevier, 2009.
2. Simon Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall International, Inc.
3. R. Eberhart and Y. Shi, Computational Intelligence: Concepts to Implementation, Morgan Kaufmann/Elsevier, 2007.
4. T. J. Ross, Fuzzy Logic with Engineering Applications, McGraw Hill.
5. D. Driankov, H. Hellendoorn and M. Reinfrank, An Introduction to Fuzzy Control, Narosa Pub.
6. Bart Kosko, Neural Networks and Fuzzy Systems, Prentice Hall, Inc., Englewood Cliffs.
7. D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley.
Course Plan
Module | Contents | Hours | Sem. Exam Marks (%)
Introduction to Soft Computing
network.
methods.
SECOND INTERNAL EXAM
Truth values and Tables in Fuzzy Logic, Fuzzy propositions,
characteristics – classification.
Introduction to genetic algorithm, operators in genetic
2. Part A
a. Total marks : 12
b. Four questions each having 3 marks, uniformly covering modules I and II; All four
3. Part B
a. Total marks : 18
b. Three questions each having 9 marks, uniformly covering modules I and II; Two
questions have to be answered. Each question can have a maximum of three subparts
4. Part C
a. Total marks : 12
b. Four questions each having 3 marks, uniformly covering modules III and IV
5. Part D
a. Total marks : 18
b. Three questions each having 9 marks, uniformly covering modules III and IV; Two
questions have to be answered. Each question can have a maximum of three subparts
6. Part E
a. Total Marks: 40
b. Six questions each carrying 10 marks, uniformly covering modules V and VI; four
Module – 1
• Hard computing
• Soft computing
Soft computing deals with approximate models to give solutions to complex problems. The term
"soft computing" was introduced by Professor Lotfi Zadeh with the objective of exploiting the
tolerance for imprecision, uncertainty and partial truth to achieve tractability, robustness,
low solution cost and better rapport with reality. The ultimate goal is to be able to emulate
the human mind as closely as possible. Soft computing is a combination of Genetic Algorithms,
Neural Networks and Fuzzy Logic.
Dendrites are tree-like networks of nerve fibres connected to the cell body. An axon is a
single, long connection extending from the cell body and carrying signals from the neuron. The
end of the axon splits into fine strands, and each strand terminates in a small bulb-like organ
called a synapse. It is through the synapses that the neuron passes its signals to other
nearby neurons. The receiving ends of these synapses on the nearby neurons can be found both
on the dendrites and on the cell body. There are approximately $10^4$ synapses per neuron in the
human body. An electric impulse is passed between the synapse and the dendrites. This transmission is a chemical process
which results in an increase or decrease in the electric potential inside the body of the receiving cell.
If the electric potential reaches a threshold value, the receiving cell fires, and a pulse (action
potential) of fixed strength and duration is sent through the axon to the synaptic junctions of the
cell. After firing, the cell has to wait for a period called the refractory period.
Biological neuron | Artificial neuron
Cell | Neuron
Dendrites | Weights or interconnections
Soma | Net input
Axon | Output
$$y_{in} = x_1 w_1 + x_2 w_2 + \cdots + x_n w_n = \sum_{i=1}^{n} x_i w_i$$
where i denotes the i-th processing element. An activation function is applied over the net input to calculate
the output. The weight represents the strength of the synapse connecting the input and output neurons.
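A minimal Python sketch of the net-input sum, assuming the inputs and weights are given as plain lists of equal length (the function name `net_input` is hypothetical, chosen for illustration):

```python
def net_input(x, w):
    """Return y_in = x1*w1 + x2*w2 + ... + xn*wn for one processing element."""
    if len(x) != len(w):
        raise ValueError("each input x_i needs exactly one weight w_i")
    return sum(xi * wi for xi, wi in zip(x, w))


# Three inputs and three weights (illustrative values)
y_in = net_input([1.0, 0.5, -1.0], [0.2, 0.4, 0.1])
print(y_in)  # approximately 0.3
```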
data similar to that of the human brain. The processing elements of an ANN are called neurons or artificial
neurons.
Each neuron has an internal state of its own, called activation or activity level of neuron which
is the function of the inputs the neuron receives. The activation signal of a neuron is
transmitted to other neurons. A neuron can send only one signal at a time which can be
transmitted to several neurons.
Consider Figure 1.5, where X1 and X2 are input neurons, Y is the output neuron, and W1 and W2
are the weights. The net input is calculated as
$$y_{in} = x_1 w_1 + x_2 w_2$$
where x1 and x2 are the activations of the input neurons X1 and X2, i.e., the outputs of the input
neurons. The output y of the output neuron Y is obtained by applying an activation function over the
net input:
$$y = f(y_{in})$$
Output = Function (net input calculated)
The function applied over the net input is called the activation function. The net input
calculation is similar to calculating the output of a pure linear straight-line equation y = mx,
and the weight involved in the ANN is equivalent to the slope m of the straight line.
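The two-input neuron can be sketched as follows. The activation function f is not fixed at this point in the notes, so a binary step with threshold θ is assumed here purely for illustration; `neuron_output` is a hypothetical helper name. With a single input, y_in = w·x is exactly the straight line y = mx with slope w.

```python
def binary_step(y_in, theta):
    """Assumed activation: 1 if y_in >= theta, else 0."""
    return 1 if y_in >= theta else 0


def neuron_output(x1, x2, w1, w2, theta):
    """Net input y_in = x1*w1 + x2*w2, then output y = f(y_in)."""
    y_in = x1 * w1 + x2 * w2
    return y_in, binary_step(y_in, theta)


print(neuron_output(x1=1, x2=0, w1=0.7, w2=0.4, theta=0.5))  # (0.7, 1)
print(neuron_output(x1=0, x2=1, w1=0.7, w2=0.4, theta=0.5))  # (0.4, 0)
```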
Criterion | Biological neuron | Computer
Speed | Execution time is a few milliseconds | Execution time is a few nanoseconds
Processing | Performs massive parallel operations simultaneously | Performs several parallel operations simultaneously; it is faster than the biological neuron
Size and complexity | Number of neurons is $10^{11}$ and number of interconnections is $10^{15}$, so the complexity of the brain is higher than that of the computer | Depends on the chosen application and the network designer
Storage capacity | Information is stored in interconnections or in synapse strength; new information is stored without destroying old information; sometimes fails to recollect information | Stored in continuous memory locations; overloading may destroy older locations; information can be easily retrieved
Tolerance | Fault tolerant; stores and retrieves information even if interconnections fail; accepts redundancies | No fault tolerance; information is corrupted if network connections are disconnected; no redundancies
Control mechanism | Depends on active chemicals and on whether neuron connections are strong or weak | CPU; the control mechanism is very simple
Characteristics of ANN:
Year | Neural network | Designer | Description
1960 | Adaline | Widrow and Hoff | The weights are adjusted to reduce the difference between the net input to the output unit and the desired output
1972 | Kohonen self-organizing feature map | Kohonen | Inputs are clustered to obtain a fired output neuron
1982, 1984, 1985, 1986, 1987 | Hopfield network | John Hopfield and Tank | Based on fixed weights; can act as associative memory nets
1986 | Back-propagation network | Rumelhart, Hinton and Williams | Multilayered; error is propagated backward from the output to the hidden units
1988 | Counter-propagation network | Grossberg | Similar to the Kohonen network
1987-1990 | Adaptive Resonance Theory (ART) | Carpenter and Grossberg | Designed for binary and analog inputs
1988 | Radial basis function network | Broomhead and Lowe | Resembles the back-propagation network, but the activation function used is a Gaussian function
1988 | Neocognitron | Fukushima | For character recognition
1.6.1 Connections
The arrangement of neurons to form layers and the connection pattern formed within and
between layers is called the network architecture. There exist five basic types of connection
architecture.
They are:
1. Single layer feed forward network
2. Multilayer feed-forward network
3. Single node with its own feedback
4. Single-layer recurrent network
5. Multilayer recurrent network
Feed-forward network: no neuron in the output layer is an input to a node in the same
layer or a preceding layer.
Feedback network: outputs are directed back as inputs to the processing elements in the
same layer or a preceding layer.
Lateral feedback: the output is directed back to the input of the same layer.
Recurrent networks: feedback networks with a closed loop.
A layer is formed by taking processing elements and combining them with other processing
elements, with inputs and outputs linked to each other. When a layer of processing nodes is formed,
the inputs can be connected to these nodes with various weights, resulting in a series of outputs,
one per node. This is called a single-layer feed-forward network.
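A single-layer feed-forward network can be sketched as one weighted sum per output node. The weight layout `W[j][i]` (weight from input i to output node j) and the identity activation in the usage line are assumptions made for this sketch.

```python
def single_layer_forward(x, W, f):
    """One output per node: node j outputs f(sum_i x_i * W[j][i])."""
    outputs = []
    for weights_j in W:  # one row of weights per output node
        y_in_j = sum(xi * wji for xi, wji in zip(x, weights_j))
        outputs.append(f(y_in_j))
    return outputs


def identity(v):
    return v


# Two inputs feeding three output nodes
print(single_layer_forward([1.0, 2.0], [[0.5, 0.25], [1.0, -1.0], [0.0, 1.0]], identity))
# [1.0, -1.0, 2.0]
```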
This network is formed by the interconnection of several layers. The input layer receives
the input and buffers the input signal, and the output layer generates the output. A layer between
the input and output layers is called a hidden layer; it is internal to the network. A network may have
zero to several hidden layers. The more hidden layers, the greater the complexity of the
network, but a more efficient output may be produced.
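The layered structure can be sketched by applying one fully connected layer after another. The per-node bias, the binary sigmoid for the hidden layer, and the helper names are illustrative assumptions, not part of the notes.

```python
import math


def layer(x, W, b, f):
    """Fully connected layer: node j outputs f(b[j] + sum_i x_i * W[j][i])."""
    return [f(bj + sum(xi * wji for xi, wji in zip(x, Wj))) for Wj, bj in zip(W, b)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def mlp_forward(x, layers):
    """Feed the input through each (W, b, f) in turn; every layer except the last is hidden."""
    for W, b, f in layers:
        x = layer(x, W, b, f)
    return x


net = [
    ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], sigmoid),  # hidden layer: 2 inputs -> 2 nodes
    ([[1.0, 1.0]], [0.0], lambda v: v),               # output layer: 2 hidden -> 1 node
]
print(mlp_forward([3.0, -2.0], net))  # [1.0], since each hidden node outputs sigmoid(0) = 0.5
```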
It is a simple recurrent neural network having a single neuron with feedback to itself.
In a single-layer recurrent network, the output of a processing element can be fed back to the
processing element itself, to other processing elements, or to both.
Processing element output can be directed back to the nodes in the preceding layer,
forming a multilayer recurrent network.
1.6.2 Learning
Learning or Training is the process by means of which a neural network adapts itself to a
stimulus by making proper parameter adjustments, resulting in the production of desired
response.
i) Supervised learning
ii) Unsupervised learning
iii) Reinforcement learning
i) Supervised learning
The learning here is performed with the help of a teacher. Example: consider the
learning process of a small child. A child who does not yet know how to read or write has every
action supervised by a teacher, and works on the basis of the output that he or she has to produce.
In an ANN, each input vector requires a corresponding target vector, which represents the desired
output. The input vector along with the target vector is called a training pair. The input vector
produces an actual output vector, which is compared with the desired output vector. If there is a
difference, an error signal is generated by the network. This error signal is used to adjust the
weights until the actual output matches the desired output.
ii) Unsupervised learning
Learning is performed without the help of a teacher. Example: a tadpole learns to swim
by itself. In an ANN, during the training process, the network receives input patterns and organizes
them to form clusters.
From Fig. 1.16 above it is observed that no feedback is applied from the environment to
indicate what the output should be or whether it is correct. The network itself discovers
patterns, regularities, features or categories in the input data and relations between the input
data and the output. Clusters are formed by discovering similarities and dissimilarities among the
patterns, so this is called self-organizing.
iii) Reinforcement learning
The external reinforcement signals are processed in the critic signal generator, and the
obtained critic signals are sent to the ANN to adjust the weights properly so as to get better
critic feedback in future.
To make the work more efficient and to obtain an exact output, some force or activation is applied.
Likewise, an activation function is applied over the net input to calculate the output of an ANN.
Information processing of a processing element has two major parts: input and output. An
integration function (f) is associated with the input of a processing element.
$$f(x) = \begin{cases} 1 & \text{if } x \ge \theta \\ 0 & \text{if } x < \theta \end{cases}$$
where θ represents the threshold value. This binary step function is used in single-layer nets to
convert the net input to an output that is binary (0 or 1).
$$f(x) = \begin{cases} 1 & \text{if } x \ge \theta \\ -1 & \text{if } x < \theta \end{cases}$$
where θ represents the threshold value. This bipolar step function is used in single-layer nets to
convert the net input to an output that is bipolar (+1 or −1).
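The two step functions translate directly into code. This is a minimal sketch; θ defaults to 0 here only as an assumption.

```python
def binary_step(x, theta=0.0):
    """Binary step: 1 if x >= theta, else 0."""
    return 1 if x >= theta else 0


def bipolar_step(x, theta=0.0):
    """Bipolar step: +1 if x >= theta, else -1."""
    return 1 if x >= theta else -1


net_inputs = [-1.0, 0.5, 2.0]
print([binary_step(v, theta=0.5) for v in net_inputs])   # [0, 1, 1]
print([bipolar_step(v, theta=0.5) for v in net_inputs])  # [-1, 1, 1]
```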
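Two types:
(i) Binary sigmoid function:
$$f(x) = \frac{1}{1 + e^{-\lambda x}}$$
(ii) Bipolar sigmoid function:
$$f(x) = \frac{2}{1 + e^{-\lambda x}} - 1 = \frac{1 - e^{-\lambda x}}{1 + e^{-\lambda x}}$$
where λ represents the steepness parameter; the range of the bipolar sigmoid is between −1 and +1.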
The derivative of this function is
$$f'(x) = \frac{\lambda}{2}\,[1 + f(x)]\,[1 - f(x)]$$
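The bipolar sigmoid is closely related to the hyperbolic tangent function, which equals the bipolar sigmoid with λ = 2:
$$h(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = \frac{1 - e^{-2x}}{1 + e^{-2x}}$$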
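A sketch of the two sigmoids and the bipolar derivative, with λ passed as `lam` (a name chosen here because `lambda` is a Python keyword). The usage lines check two facts stated above numerically: the bipolar sigmoid with λ = 2 coincides with tanh, and the closed-form derivative agrees with a finite-difference estimate.

```python
import math


def binary_sigmoid(x, lam=1.0):
    """f(x) = 1 / (1 + exp(-lam*x)); range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-lam * x))


def bipolar_sigmoid(x, lam=1.0):
    """f(x) = 2 / (1 + exp(-lam*x)) - 1; range (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-lam * x)) - 1.0


def bipolar_sigmoid_derivative(x, lam=1.0):
    """f'(x) = (lam/2) * (1 + f(x)) * (1 - f(x))."""
    fx = bipolar_sigmoid(x, lam)
    return (lam / 2.0) * (1.0 + fx) * (1.0 - fx)


x = 0.7
print(bipolar_sigmoid(x, lam=2.0), math.tanh(x))  # equal up to rounding
h = 1e-6
numeric = (bipolar_sigmoid(x + h) - bipolar_sigmoid(x - h)) / (2 * h)
print(numeric, bipolar_sigmoid_derivative(x))     # agree to about 6 decimals
```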
$$f(x) = \begin{cases} 1 & \text{if } x > 1 \\ x & \text{if } 0 \le x \le 1 \\ 0 & \text{if } x < 0 \end{cases}$$
The graphical representation of all these functions is given in Figure 1.18.
Figure 1.18: Depiction of activation functions: (A) identity function; (B) binary step function; (C) bipolar step
function; (D) binary sigmoidal function; (E) bipolar sigmoidal function; (F) ramp function.
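The ramp function, together with the identity function from panel (A) of the figure, can be sketched as:

```python
def identity(x):
    """Identity: f(x) = x for all x."""
    return x


def ramp(x):
    """Ramp: 1 if x > 1, x if 0 <= x <= 1, 0 if x < 0."""
    if x > 1:
        return 1.0
    if x >= 0:
        return x
    return 0.0


print([ramp(v) for v in (-0.5, 0.0, 0.3, 1.0, 4.0)])  # [0.0, 0.0, 0.3, 1.0, 1.0]
```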
inputs from $x_1$ to $x_n$ possess excitatory weighted connections and $x_{n+1}$ to $x_{n+m}$ have inhibitory
weighted interconnections.
$$f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} \ge \theta \\ 0 & \text{if } y_{in} < \theta \end{cases}$$
For inhibition to be absolute, the threshold of the activation function should satisfy the
following condition, where w is the excitatory weight and −p is the inhibitory weight:
$$\theta > nw - p$$
The output will fire if it receives k or more excitatory inputs but no inhibitory inputs, where
$$kw \ge \theta > (k-1)w$$
The M-P neuron has no particular training algorithm. An analysis is performed to determine
the weights and the threshold. It is used as a building block where any function or phenomenon
is modeled based on a logic function.
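As an illustration of choosing weights and threshold by analysis, here is a sketch of an M-P neuron with binary inputs. The AND and "x1 AND NOT x2" weight and threshold values are worked out from the two conditions above; the helper name `mp_neuron` is hypothetical.

```python
def mp_neuron(x, weights, theta):
    """M-P neuron with fixed weights: output 1 if y_in >= theta, else 0."""
    y_in = sum(xi * wi for xi, wi in zip(x, weights))
    return 1 if y_in >= theta else 0


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# AND: both inputs excitatory with w = 1; firing needs k = 2 active inputs,
# so k*w >= theta > (k-1)*w gives 2 >= theta > 1, i.e. theta = 2.
and_out = [mp_neuron(x, (1, 1), theta=2) for x in inputs]

# x1 AND NOT x2: x1 excitatory (w = 1), x2 inhibitory (-p = -1).
# theta = 1 satisfies k*w >= theta > (k-1)*w for k = 1, and
# theta > n*w - p = 1*1 - 1 = 0 for absolute inhibition.
andnot_out = [mp_neuron(x, (1, -1), theta=1) for x in inputs]

print(and_out)     # [0, 0, 0, 1]
print(andnot_out)  # [0, 0, 1, 0]
```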
Donald Hebb stated in 1949 that in the brain, learning is performed by the change in the
synaptic gap. When an axon of cell A is near enough to excite cell B, and repeatedly or
persistently takes part in firing it, some growth process or metabolic change takes place in
one or both cells such that A's efficiency, as one of the cells firing B, is increased.
According to the Hebb rule, the weight vector increases proportionately to the product of
the input and the learning signal. In Hebb learning, if two interconnected neurons are 'on'
simultaneously, the weight between them is increased. The weight update in the Hebb rule is given by
$$w_i(\text{new}) = w_i(\text{old}) + x_i y$$
The Hebb network is better suited for bipolar data. If binary data is used, the weight update
formula cannot distinguish between two conditions, namely:
1. A training pair in which an input unit is "on" and the target value is "off".
2. A training pair in which both the input unit and the target value are "off".
In both cases the product $x_i y$ is 0, so the weight is left unchanged.
Training algorithm
The training algorithm is used for the calculation and adjustment of weights. The flowchart for
the training algorithm of Hebb network is given below
Step 0: First initialize the weights. Basically, in this network they may be set to zero, i.e., $w_i = 0$
for i = 1 to n, where n is the total number of input neurons.
Step 1: Steps 2-4 have to be performed for each input training vector and target output pair,
s:t.
Step 2: Input unit activations are set. Generally, the activation function of the input layer is the
identity function: $x_i = s_i$ for i = 1 to n.
Step 3: Output unit activations are set: y = t.
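Step 4: Weight adjustments and bias adjustments are performed:
$$w_i(\text{new}) = w_i(\text{old}) + x_i y$$
$$b(\text{new}) = b(\text{old}) + y$$
In vector form, the weight update can be written as
$$w(\text{new}) = w(\text{old}) + xy$$
so the change in weight is
$$\Delta w = xy$$
As a result,
$$w(\text{new}) = w(\text{old}) + \Delta w$$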
Hebb rule is used for pattern association, pattern categorization, pattern classification and over
a range of other areas.
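A sketch of the Hebb training algorithm above, run on the bipolar AND function (an example chosen here; the notes do not fix a data set). Bipolar data is used because, as noted above, binary zeros would leave the weights unchanged.

```python
def hebb_train(samples):
    """Hebb rule, one pass over the (x, t) training pairs."""
    n = len(samples[0][0])
    w = [0] * n  # Step 0: weights start at zero
    b = 0
    for x, t in samples:  # Step 1: for each pair s:t
        # Step 2: x_i = s_i (identity activation); Step 3: y = t
        y = t
        # Step 4: w_i(new) = w_i(old) + x_i*y and b(new) = b(old) + y
        w = [wi + xi * y for wi, xi in zip(w, x)]
        b = b + y
    return w, b


# Bipolar AND function (inputs and targets in {-1, +1})
and_bipolar = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = hebb_train(and_bipolar)
print(w, b)  # [2, 2] -2

# The learned weights reproduce every target through the sign of the net input
for x, t in and_bipolar:
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    assert (1 if y_in >= 0 else -1) == t
```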
Module – 2
• Perceptron networks
o Learning rule
o Training and testing algorithm
• Adaptive Linear Neuron
• Back propagation Network
o Architecture
o Training algorithm
2.1.1 Theory
Perceptron networks come under single-layer feed-forward networks and are also called
simple perceptrons. Various types of perceptrons were designed by Rosenblatt (1962) and
Minsky-Papert (1969, 1988).
1. The perceptron network consists of three units, namely, sensory unit (input unit),
associator unit (hidden unit), and response unit (output unit).
2. The sensory units are connected to associator units with fixed weights having values 1, 0
or −1, which are assigned at random.
3. The binary activation function is used in the sensory unit and the associator unit.
4. The response unit has an activation of 1, 0 or −1. The binary step with fixed threshold θ is
used as the activation for the associator. The output signals that are sent from the associator unit
to the response unit are only binary.
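5. The output of the perceptron network is given by
$$y = f(y_{in})$$
$$f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \le y_{in} \le \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases}$$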
6. The perceptron learning rule is used in the weight updation between the associator unit
and the response unit. For each training input, the net will calculate the response and it
will determine whether or not an error has occurred.
7. The error calculation is based on the comparison of the values of targets with those of
the calculated outputs.
8. The weights on the connections from the units that send the nonzero signal will get
adjusted suitably.
9. The weights will be adjusted on the basis of the learning rule if an error has occurred for a
particular training pattern, i.e.,
$$w_i(\text{new}) = w_i(\text{old}) + \alpha t x_i$$
$$b(\text{new}) = b(\text{old}) + \alpha t$$
If no error occurs, there is no weight update and hence the training process may be
stopped. In the above equations, the target value t is +1 or −1 and α is the learning rate. In
general, these learning rules begin with an initial guess at the weight values and then
successive adjustments are made on the basis of the evaluation of an objective function.
Eventually, the learning rules reach a near-optimal or optimal solution in a finite number of
steps.
A perceptron network with its three units is shown in the figure above. The sensory unit can be
a two-dimensional matrix of 400 photodetectors upon which a lighted picture with a
geometric black-and-white pattern impinges. These detectors provide a binary (0 or 1) electrical
signal depending on whether the input signal exceeds a certain threshold value. These detectors
are connected randomly to the associator unit. The associator unit consists of a
set of subcircuits called feature predicates. The feature predicates are hardwired to detect
specific features of a pattern and are equivalent to feature detectors. For a particular
feature, each predicate examines a few or all of the responses of the sensory unit. The
results from the predicate units are also binary (0 or 1). The last unit,
i.e., the response unit, contains the pattern recognizers or perceptrons. The weights present in the
input layers are all fixed, while the weights on the response unit are trainable.
Learning signal is the difference between desired and actual response of a neuron. The
perceptron learning rule is explained as follows:
Consider a finite number N of input training vectors x(n), with their associated target (desired)
values t(n), where n ranges from 1 to N. The target is either +1 or −1. The output
y is obtained on the basis of the net input calculated and the activation function applied
over the net input:
$$y = f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \le y_{in} \le \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases}$$
If y ≠ t, then
$$w(\text{new}) = w(\text{old}) + \alpha t x$$
else,
$$w(\text{new}) = w(\text{old})$$
The weights can be initialized at any values in this method. The perceptron rule convergence
theorem states: "If there is a weight vector W such that f(x(n)W) = t(n) for all n,
then for any starting vector w1, the perceptron learning rule will converge to a weight
vector that gives the correct response for all training patterns, and this learning takes place
within a finite number of steps provided that the solution exists."
2.1.3 Architecture
Here only the weights between the associator unit and the output unit can be adjusted, and
the weights between the sensory and associator units are fixed.
Step 0: Initialize the weights and the bias (for easy calculation they can be set to zero). Also
initialize the learning rate α (0 < α ≤ 1). For simplicity α is set to 1.
Step 1: Perform Steps 2-6 until the final stopping condition is satisfied.
Step 2: Perform Steps 3-5 for each training pair indicated by s:t.
Step 3: The input layer containing input units is applied with identity activation functions:
$$x_i = s_i$$
Step 4: Calculate the output of the network. To do so, first obtain the net input:
$$y_{in} = b + \sum_{i=1}^{n} x_i w_i$$
where n is the number of input neurons in the input layer. Then apply activations over the
net input calculated to obtain the output:
$$y = f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \le y_{in} \le \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases}$$
Step 5: Weight and bias adjustment: Compare the value of the actual (calculated) output and
desired (target) output.
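If y ≠ t, then
$$w_i(\text{new}) = w_i(\text{old}) + \alpha t x_i$$
$$b(\text{new}) = b(\text{old}) + \alpha t$$
else,
$$w_i(\text{new}) = w_i(\text{old})$$
$$b(\text{new}) = b(\text{old})$$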
Step 6: Train the network until there is no weight change. This is the stopping condition for
the network. If this condition is not met, then start again from Step 2.
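A sketch of the perceptron training algorithm, with Step numbers marked in comments. The bipolar AND data, θ = 0.2 and the `max_epochs` cap (a safeguard for data that is not linearly separable, where the stopping condition of Step 6 would never be met) are assumptions for illustration.

```python
def perceptron_activation(y_in, theta):
    """Three-valued activation: 1 above theta, -1 below -theta, else 0."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def perceptron_train(samples, alpha=1.0, theta=0.2, max_epochs=100):
    n = len(samples[0][0])
    w = [0.0] * n  # Step 0: weights and bias start at zero
    b = 0.0
    for _ in range(max_epochs):  # Step 1
        changed = False
        for x, t in samples:  # Step 2; Step 3: x_i = s_i
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # Step 4
            y = perceptron_activation(y_in, theta)
            if y != t:  # Step 5: update only when an error occurs
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:  # Step 6: stop when no weight changed in an epoch
            break
    return w, b


and_bipolar = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
print(perceptron_train(and_bipolar))  # ([1.0, 1.0], -1.0)
```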
Step 0: The initial weights and bias to be used here are taken from the training algorithm (the final
weights and bias obtained during training).
Step 1: For each input vector x to be classified, perform Steps 2-3.
Step 2: Set the activations of the input units.
Step 3: Obtain the response of the output unit:
$$y_{in} = b + \sum_{i=1}^{n} x_i w_i$$
$$y = f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \le y_{in} \le \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases}$$
Thus, the testing algorithm tests the performance of the network. In the case of the perceptron
network, it can be used to test linear separability. Here the separating line may be based on the
value of the threshold; that is, the threshold used in the activation function must be a
non-negative value.
The condition for separating the region of positive response from the region of zero response is
$$w_1 x_1 + w_2 x_2 + b > \theta$$
The condition for separating the region of zero response from the region of negative response is
$$w_1 x_1 + w_2 x_2 + b < -\theta$$
The conditions above are stated for a single-layer perceptron network with two input
neurons, one output neuron and one bias.
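A sketch of the testing algorithm and the two boundary lines. The weights w1 = w2 = 1, b = −1 and θ = 0.2 are assumed values (they solve the bipolar AND problem); `boundary_x2` is a hypothetical helper that solves $w_1 x_1 + w_2 x_2 + b = \pm\theta$ for $x_2$.

```python
def perceptron_response(x, w, b, theta):
    """Testing algorithm, Step 3: net input, then the three-valued activation."""
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def boundary_x2(x1, w, b, level):
    """Point (x1, x2) on the line w1*x1 + w2*x2 + b = level; assumes w2 != 0."""
    w1, w2 = w
    return (level - b - w1 * x1) / w2


w, b, theta = (1.0, 1.0), -1.0, 0.2
patterns = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
print([perceptron_response(x, w, b, theta) for x in patterns])  # [1, -1, -1, -1]

# Between these two parallel lines lies the band where the response is 0
print(boundary_x2(0.0, w, b, theta))   # x2 on the positive boundary, about 1.2
print(boundary_x2(0.0, w, b, -theta))  # x2 on the negative boundary, about 0.8
```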
2.2.1 Theory
The units with linear activation function are called linear units. A network with a single
linear unit is called an Adaline (adaptive linear neuron). That is, in an Adaline, the input-
output relationship is linear. Adaline uses bipolar activation for its input signals and its
target output. The weights between the input and the output are adjustable. The bias in
Department of CSE , ICET 32
Soft Computing (CS361) Module 2
Adaline acts like an adjustable weight, whose connection is from a unit with activations
being always 1. Adaline is a net which has only one output unit. The Adaline network may
be trained using delta rule. The delta rule may also be called as least mean square (LMS)
rule or Widrow-Hoff rule. This learning rule is found to minimize the mean squared error
between the activation and the target value.
The perceptron learning rule originates from the Hebbian assumption, while the delta rule is
derived from the gradient-descent method (it can be generalized to more than one layer).
Also, the perceptron learning rule stops after a finite number of learning steps, but the
gradient-descent approach continues forever, converging only asymptotically to the solution.
The delta rule updates the weights between the connections so as to minimize the difference
between the net input to the output unit and the target value. The major aim is to minimize
the error over all training patterns. This is done by reducing the error for each pattern, one at
a time.
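The delta rule for adjusting the weight of the ith input is
$$\Delta w_i = \alpha (t - y_{in}) x_i$$
where $\Delta w_i$ is the weight change; α the learning rate; x the vector of activations of the input units;
$y_{in}$ the net input to the output unit, i.e., $y_{in} = \sum_{i=1}^{n} x_i w_i$; and t the target output. The delta rule in
the case of several output units, for adjusting the weight from the ith input unit to the jth output unit
(for each pattern), is
$$\Delta w_{ij} = \alpha (t_j - y_{inj}) x_i$$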
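A sketch of a single delta-rule step for one linear output unit. The bias is treated as a weight on an input that is always 1, as described for Adaline above; the numbers in the usage lines are illustrative. For several output units the same update would be applied to each output j with its own target $t_j$.

```python
def delta_rule_step(x, w, b, t, alpha):
    """One LMS / Widrow-Hoff update: w_i += alpha*(t - y_in)*x_i, b += alpha*(t - y_in)."""
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    error = t - y_in
    new_w = [wi + alpha * error * xi for wi, xi in zip(w, x)]
    new_b = b + alpha * error
    return new_w, new_b


x, t = (1.0, 1.0), 1.0
w, b = [0.1, 0.1], 0.1
new_w, new_b = delta_rule_step(x, w, b, t, alpha=0.1)
print(new_w, new_b)  # each weight and the bias move from 0.1 to about 0.17

# The squared error for this pattern shrinks after the step
before = (t - (b + sum(xi * wi for xi, wi in zip(x, w)))) ** 2
after = (t - (new_b + sum(xi * wi for xi, wi in zip(x, new_w)))) ** 2
print(before, after)  # about 0.49 and 0.2401
```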
2.2.3 Architecture
Adaline is a single unit neuron, which receives input from several units and also from one
unit called bias. The basic Adaline model consists of trainable weights. Inputs are either of
the two values (+ 1 or -1) and the weights have signs (positive or negative). Initially,
random weights are assigned. The net input calculated is applied to a quantizer transfer
function (possibly activation function) that restores the output to + 1 or -1. The Adaline
model compares the actual output with the target output and on the basis of the training
algorithm, the weights are adjusted.
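Step 0: Weights and bias are set to some random values, but not zero. Set the learning rate
parameter α.
Step 1: Perform Steps 2-6 while the stopping condition is false.
Step 2: Perform Steps 3-5 for each bipolar training pair s:t.
Step 3: Set the activations of the input units, i = 1 to n:
$$x_i = s_i$$
Step 4: Calculate the net input to the output unit:
$$y_{in} = b + \sum_{i=1}^{n} x_i w_i$$
Step 5: Update the weights and bias, i = 1 to n:
$$w_i(\text{new}) = w_i(\text{old}) + \alpha (t - y_{in}) x_i$$
$$b(\text{new}) = b(\text{old}) + \alpha (t - y_{in})$$
Step 6: If the highest weight change that occurred during training is smaller than a specified
tolerance, then stop the training process; else continue. This is the test for the stopping condition
of the network.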
Step 0: Initialize the weights. (The weights are obtained from the training algorithm.)
Step 1: Perform Steps 2-4 for each bipolar input vector x.
Step 2: Set the activations of the input units to x.
Step 3: Calculate the net input to the output unit:
$$y_{in} = b + \sum_{i=1}^{n} x_i w_i$$
Step 4: Apply the activation function over the net input calculated:
$$y = \begin{cases} 1 & \text{if } y_{in} \ge 0 \\ -1 & \text{if } y_{in} < 0 \end{cases}$$
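A sketch of the Adaline training and testing algorithms on the bipolar AND function. The initial weights are fixed small non-zero values (instead of random ones) so the run is reproducible, and α, the tolerance and the `max_epochs` cap are assumptions. With a fixed learning rate and per-pattern updates, the weight changes keep hovering around α times the residual error and may never fall below a small tolerance, so in this run the epoch cap is what ends training; the weights settle near the least-squares solution w1 = w2 = 0.5, b = −0.5.

```python
def adaline_train(samples, w, b, alpha=0.05, tolerance=1e-3, max_epochs=200):
    w = list(w)  # Step 0: initial (non-zero) weights
    for _ in range(max_epochs):  # Step 1
        largest_change = 0.0
        for x, t in samples:  # Step 2; Step 3: x_i = s_i
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # Step 4
            error = t - y_in
            for i, xi in enumerate(x):  # Step 5: delta-rule update
                change = alpha * error * xi
                w[i] += change
                largest_change = max(largest_change, abs(change))
            b += alpha * error
            largest_change = max(largest_change, abs(alpha * error))
        if largest_change < tolerance:  # Step 6: stopping test
            break
    return w, b


def adaline_test(x, w, b):
    """Testing algorithm: bipolar output from the sign of the net input."""
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1 if y_in >= 0 else -1


and_bipolar = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = adaline_train(and_bipolar, w=[0.1, 0.1], b=0.1)
print(w, b)
print([adaline_test(x, w, b) for x, _ in and_bipolar])  # [1, -1, -1, -1]
```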
2.3.1 Theory
The back propagation learning algorithm is one of the most important developments in
neural networks (Bryson and Ho, 1969; Werbos, 1974; Lecun, 1985; Parker, 1985;
The training of the BPN is done in three stages - the feed-forward of the input training
pattern, the calculation and back-propagation of the error, and updation of weights. The
testing of the BPN involves the computation of feed-forward phase only. There can be more
than one hidden layer (more beneficial) but one hidden layer is sufficient. Even though the
training is very slow, once the network is trained it can produce its outputs very rapidly.
2.3.2 Architecture
obtained from the net could be either binary (0, 1) or bipolar (-1, +1). The activation
function could be any function which increases monotonically and is also differentiable.
2.3.3 Flowchart
The terminologies used in the flowchart and in the training algorithm are as follows:
𝑧𝑗 = 𝑓(𝑧𝑖𝑛𝑗 )
𝑦𝑘 = 𝑓(𝑦𝑖𝑛𝑘 )
$\delta_k$ = error correction weight adjustment for $w_{jk}$ that is due to an error in unit $y_k$, which is
back-propagated to the hidden units that feed into unit $y_k$
$\delta_j$ = error correction weight adjustment for $v_{ij}$ that is due to the back-propagation of error to
the hidden unit $z_j$.
The commonly used activation functions are the binary sigmoidal and bipolar sigmoidal
activation functions. These functions are used in the BPN because of the following
characteristics: (i) continuity, (ii) differentiability, (iii) non-decreasing monotonicity.
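Step 0: Initialize weights and learning rate (take some small random values).
Step 1: Perform Steps 2-9 while the stopping condition is false.
Step 2: Perform Steps 3-8 for each training pair.
Feedforward Phase 1
Step 3: Each input unit receives input signal $x_i$ and sends it to the hidden units (i = 1 to n).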
Step 4: Each hidden unit $z_j$ (j = 1 to p) sums its weighted input signals to calculate the net input:
$$z_{inj} = v_{0j} + \sum_{i=1}^{n} x_i v_{ij}$$
Calculate the output of the hidden unit by applying its activation function over $z_{inj}$
(binary or bipolar sigmoidal activation function):
$$z_j = f(z_{inj})$$
and send the output signal from the hidden unit to the input of the output layer units.
Step 5: For each output unit $y_k$ (k = 1 to m), calculate the net input:
$$y_{ink} = w_{0k} + \sum_{j=1}^{p} z_j w_{jk}$$
and apply the activation function to compute the output signal:
$$y_k = f(y_{ink})$$
Step 6: Each output unit $y_k$ (k = 1 to m) receives a target pattern corresponding to the input
training pattern and computes the error correction term:
$$\delta_k = (t_k - y_k)\, f'(y_{ink})$$
The derivative $f'(y_{ink})$ can be calculated as in the activation function section. On the basis of
the calculated error correction term, compute the change in weights and bias:
$$\Delta w_{jk} = \alpha \delta_k z_j; \quad \Delta w_{0k} = \alpha \delta_k$$
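Step 7: Each hidden unit ($z_j$, j = 1 to p) sums its delta inputs from the output units:
$$\delta_{inj} = \sum_{k=1}^{m} \delta_k w_{jk}$$
The term $\delta_{inj}$ gets multiplied by the derivative of $f(z_{inj})$ to calculate the error term:
$$\delta_j = \delta_{inj}\, f'(z_{inj})$$
On the basis of $\delta_j$, compute the change in weights and bias:
$$\Delta v_{ij} = \alpha \delta_j x_i; \quad \Delta v_{0j} = \alpha \delta_j$$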
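Step 8: Each output unit ($y_k$, k = 1 to m) updates the bias and weights:
$$w_{jk}(\text{new}) = w_{jk}(\text{old}) + \Delta w_{jk}; \quad w_{0k}(\text{new}) = w_{0k}(\text{old}) + \Delta w_{0k}$$
Each hidden unit ($z_j$, j = 1 to p) updates its bias and weights:
$$v_{ij}(\text{new}) = v_{ij}(\text{old}) + \Delta v_{ij}; \quad v_{0j}(\text{new}) = v_{0j}(\text{old}) + \Delta v_{0j}$$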
Step 9: Check for the stopping condition. The stopping condition may be certain number of
epochs reached or when the actual output equals the target output.
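A sketch of one BPN training iteration (Steps 3-8) for a network with n inputs, p hidden units and m outputs, using the binary sigmoid, whose derivative can be written from its output as f'(x) = f(x)[1 − f(x)] (the λ = 1 case). The list layouts `V[i][j]`, `v0[j]`, `W[j][k]`, `w0[k]` mirror $v_{ij}$, $v_{0j}$, $w_{jk}$, $w_{0k}$; the initial values and α = 0.25 are illustrative. The usage lines check that one update moves the output toward the target.

```python
import math


def f(x):
    """Binary sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def df_from_output(fx):
    """Derivative of the binary sigmoid, expressed through its output f(x)."""
    return fx * (1.0 - fx)


def forward(x, V, v0, W, w0):
    """Feed-forward phase: hidden outputs z_j = f(z_inj), network outputs y_k = f(y_ink)."""
    z = [f(v0[j] + sum(x[i] * V[i][j] for i in range(len(x)))) for j in range(len(v0))]
    y = [f(w0[k] + sum(z[j] * W[j][k] for j in range(len(z)))) for k in range(len(w0))]
    return z, y


def train_step(x, t, V, v0, W, w0, alpha):
    """Steps 3-8 for one training pair; updates the weight lists in place."""
    z, y = forward(x, V, v0, W, w0)
    # Step 6: output error terms delta_k = (t_k - y_k) f'(y_ink)
    delta_k = [(t[k] - y[k]) * df_from_output(y[k]) for k in range(len(y))]
    # Step 7: hidden error terms delta_j = (sum_k delta_k w_jk) f'(z_inj), using the old w_jk
    delta_j = [sum(delta_k[k] * W[j][k] for k in range(len(y))) * df_from_output(z[j])
               for j in range(len(z))]
    # Step 8: update output-layer, then hidden-layer weights and biases
    for j in range(len(z)):
        for k in range(len(y)):
            W[j][k] += alpha * delta_k[k] * z[j]
    for k in range(len(y)):
        w0[k] += alpha * delta_k[k]
    for i in range(len(x)):
        for j in range(len(z)):
            V[i][j] += alpha * delta_j[j] * x[i]
    for j in range(len(z)):
        v0[j] += alpha * delta_j[j]


x, t = [0.0, 1.0], [1.0]
V, v0 = [[0.6, -0.1], [-0.3, 0.4]], [0.3, 0.5]
W, w0 = [[0.4], [0.1]], [-0.2]
y_before = forward(x, V, v0, W, w0)[1][0]
train_step(x, t, V, v0, W, w0, alpha=0.25)
y_after = forward(x, V, v0, W, w0)[1][0]
print(y_before, y_after)  # the output moves from about 0.518 toward the target 1.0
```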
2.4 Madaline
1. Automobile and other vehicle subsystems, such as automatic transmissions, ABS and cruise
control (e.g. Tokyo monorail).
2. Air conditioners.
3. Auto focus on cameras.
4. Digital image processing, such as edge detection.
5. Rice cookers.
6. Dishwashers.
7. Elevators.
Module – 3
• Fuzzy logic
• Fuzzy sets
o Properties
o Operations on fuzzy sets
• Fuzzy relations
o Operations on fuzzy relations
Figure 3.1: A fuzzy logic system accepting imprecise data and providing a decision
In 1965 Lotfi Zadeh published his famous paper "Fuzzy sets". This new logic for representing
and manipulating fuzzy terms was called fuzzy logic, and Zadeh came to be regarded as the father of
fuzzy logic.
Fuzzy logic is the logic underlying approximate, rather than exact, modes of reasoning. It
operates on the concept of membership. The membership was extended to possess various
"degrees of membership" on the real continuous interval [0, 1].
In fuzzy systems, values are indicated by a number (called a truth value) ranging from 0 to 1,
where 0.0 represents absolute falseness and 1.0 represents absolute truth.
A classical set is a collection of objects with certain characteristics. For example, the user may
define a classical set of negative integers, a set of persons with height less than 6 feet, and a set
of students with passing grades. Each individual entity in a set is called a member or an element
of the set.
There are several ways for defining a set. A set may be defined using one of the following:
xi + 1
A = {xi = , i = 1 to 10, where xi = 1}
5
4. The set may be defined on the basis of the results of a logical operation.
Example A = {x|x is an element belonging to P AND Q}
5. There exists a membership function, which may also be used to define a set. The
membership is denoted by the letter χ, and the membership function for a set A is given by
(for all values of x)
$$\chi_A(x) = \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{if } x \notin A \end{cases}$$
The set with no elements is defined as an empty set or null set. It is denoted by symbol Ø.
The set which consists of all possible subsets of a given set A is called the power set:
$$P(A) = \{x \mid x \subseteq A\}$$
2.14.1 Properties
1. Commutativity
𝐴 ∪ 𝐵 = 𝐵 ∪ 𝐴; 𝐴 ∩ 𝐵 = 𝐵 ∩ 𝐴
2. Associativity
𝐴 ∪ (𝐵 ∪ 𝐶) = (𝐴 ∪ 𝐵) ∪ 𝐶; 𝐴 ∩ (𝐵 ∩ 𝐶) = (𝐴 ∩ 𝐵) ∩ 𝐶
3. Distributivity
𝐴 ∪ (𝐵 ∩ 𝐶) = (𝐴 ∪ 𝐵) ∩ (𝐴 ∪ 𝐶)
𝐴 ∩ (𝐵 ∪ 𝐶) = (𝐴 ∩ 𝐵) ∪ (𝐴 ∩ 𝐶)
4. Idempotency
𝐴 ∪ 𝐴 = 𝐴; 𝐴 ∩ 𝐴 = 𝐴
5. Transitivity
𝐼𝑓 𝐴 ⊆ 𝐵 ⊆ 𝐶, 𝑡ℎ𝑒𝑛 𝐴 ⊆ 𝐶
6. Identity
𝐴 ∪ ∅ = 𝐴, 𝐴 ∩ ∅ = ∅
𝐴 ∪ 𝑋 = 𝑋, 𝐴 ∩ 𝑋 = 𝐴
7. Involution (double negation)
𝐴̿ = 𝐴
8. Law of excluded middle
𝐴 ∪ 𝐴̅ = 𝑋
9. Law of contradiction
𝐴 ∩ 𝐴̅ = ∅
10. De Morgan's law
$$\overline{A \cap B} = \bar{A} \cup \bar{B}; \quad \overline{A \cup B} = \bar{A} \cap \bar{B}$$
1. Union
The union between two sets gives all those elements in the universe that belong to either
set A or set B or both sets A and B. The union operation can be termed as a logical OR
operation. The union of two sets A and B is given as
$$A \cup B = \{x \mid x \in A \text{ or } x \in B\}$$
The union of sets A and B is illustrated by the Venn diagram shown below
2. Intersection
The intersection between two sets represents all those elements in the universe that
simultaneously belong to both the sets. The intersection operation can be termed as a
logical AND operation. The intersection of sets A and B is given by
$$A \cap B = \{x \mid x \in A \text{ and } x \in B\}$$
The intersection of sets A and B is illustrated by the Venn diagram shown below
3. Complement
The complement of set A is defined as the collection of all elements in universe X that
do not reside in set A, i.e., the entities that do not belong to A. It is denoted by $\bar{A}$ and is
defined as
$$\bar{A} = \{x \mid x \notin A, x \in X\}$$
where X is the universal set and A is a given set formed from universe X. The
complement operation of set A is shown below.
4. Difference (Subtraction)
The difference of set A with respect to set B is the collection of all elements in the
universe that belong to A but do not belong to B. It is denoted by A|B or A − B and is
given by
$$A|B = A - B = \{x \mid x \in A \text{ and } x \notin B\} = A \cap \bar{B}$$
Figure 3.6: (A) Difference A|B or (A−B); (B) Difference B|A or (B−A)
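Python's built-in `set` type supports these four operations directly. The universe X and the sets A and B below are illustrative, and the complement is taken relative to X. The last two lines also check De Morgan's law from the list of properties.

```python
X = set(range(1, 11))  # universe of discourse: 1..10
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

union = A | B          # A ∪ B: in A or in B (or both)
intersection = A & B   # A ∩ B: in both A and B
complement_A = X - A   # complement of A: in X but not in A
difference = A - B     # A|B: in A but not in B

print(union, intersection, complement_A, difference)

# De Morgan's law, with complements taken relative to X
assert X - (A & B) == (X - A) | (X - B)
assert X - (A | B) == (X - A) & (X - B)
```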
$$\chi_A(x) = \begin{cases} 1, & x \in A \\ 0, & x \notin A \end{cases}$$
where 𝜒𝐴 is the membership in set A for element x in the universe. The membership concept
represents mapping from an element x in universe X to one of the two elements in universe
Y (either to element 0 or 1).
Let A and B be two sets in universe X. The function-theoretic forms of operations performed
between these two sets are given as follows:
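1. Union
$$\chi_{A \cup B}(x) = \chi_A(x) \vee \chi_B(x) = \max[\chi_A(x), \chi_B(x)]$$
2. Intersection
$$\chi_{A \cap B}(x) = \chi_A(x) \wedge \chi_B(x) = \min[\chi_A(x), \chi_B(x)]$$
3. Complement
$$\chi_{\bar{A}}(x) = 1 - \chi_A(x)$$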
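The function-theoretic view can be checked pointwise: represent each crisp set by its characteristic function χ and combine values with max, min and 1 − χ. A sketch with an illustrative universe and sets; `chi` is a hypothetical helper name.

```python
def chi(S):
    """Characteristic function of a crisp set S: 1 if x is in S, else 0."""
    return lambda x: 1 if x in S else 0


X = range(1, 11)
A, B = {1, 2, 3, 4, 5}, {4, 5, 6, 7}
chi_A, chi_B = chi(A), chi(B)

union = {x for x in X if max(chi_A(x), chi_B(x)) == 1}
intersection = {x for x in X if min(chi_A(x), chi_B(x)) == 1}
complement_A = {x for x in X if 1 - chi_A(x) == 1}

# The pointwise forms agree with the ordinary set operations
assert union == A | B
assert intersection == A & B
assert complement_A == set(X) - A
```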
$$\tilde{A} = \{(x, \mu_{\tilde{A}}(x)) \mid x \in X\}$$
where $\mu_{\tilde{A}}(x)$ is the degree of membership of x in $\tilde{A}$ and indicates the degree to which x belongs to
$\tilde{A}$. In fuzzy theory, a fuzzy set $\tilde{A}$ of universe X is defined by a function $\mu_{\tilde{A}}(x)$ called the
membership function of set $\tilde{A}$.
$\mu_{\tilde{A}}(x) = 0$ if x is not in $\tilde{A}$;
This set allows a continuum of possible choices. For any element x of universe X, the membership function $\mu_{\tilde{A}}(x)$ equals the degree to which x is an element of set $\tilde{A}$. This degree, a value between 0 and 1, represents the degree of membership, also called the membership value, of element x in set $\tilde{A}$.
From figure 3.7 it can be noted that "a" is clearly a member of fuzzy set P, "c" is clearly not a member of fuzzy set P, and the membership of "b" is vague. Hence "a" can take membership value 1, "c" can take membership value 0, and "b" can take a membership value between 0 and 1, say 0.4 or 0.7. This is called partial membership in fuzzy set P.
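There are other ways of representing fuzzy sets; all of them allow partial membership to be expressed. When the universe of discourse U is discrete and finite, fuzzy set $\tilde{A}$ is given as follows:
$\tilde{A} = \left\{ \frac{\mu_{\tilde{A}}(x_1)}{x_1} + \frac{\mu_{\tilde{A}}(x_2)}{x_2} + \frac{\mu_{\tilde{A}}(x_3)}{x_3} + \cdots \right\} = \left\{ \sum_{i=1}^{n} \frac{\mu_{\tilde{A}}(x_i)}{x_i} \right\}$
Here the horizontal bar is not a division but a separator between a membership value and its element, and "+" denotes collection (union), not arithmetic addition.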
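A discrete fuzzy set maps naturally onto a dictionary from element to membership value. The sketch below (membership values are hypothetical) checks that every value lies in [0, 1] and prints the set in the µ/x notation above.

```python
# Hypothetical membership values on a discrete universe {x1, x2, x3}
A = {"x1": 0.2, "x2": 0.7, "x3": 1.0}


def is_fuzzy_set(fs):
    """Every membership value must lie in the closed interval [0, 1]."""
    return all(0.0 <= mu <= 1.0 for mu in fs.values())


def zadeh_notation(fs):
    """Render { mu1/x1 + mu2/x2 + ... }; '/' and '+' are separators, not arithmetic."""
    return "{ " + " + ".join(f"{mu}/{x}" for x, mu in fs.items()) + " }"


print(is_fuzzy_set(A))    # True
print(zadeh_notation(A))  # { 0.2/x1 + 0.7/x2 + 1.0/x3 }
```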
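2.15.1 Properties
Fuzzy sets follow the same properties as crisp sets except for the law of excluded middle and the law of contradiction; that is, for a fuzzy set $\tilde{A}$, in general
$\tilde{A} \cup \bar{\tilde{A}} \neq U; \quad \tilde{A} \cap \bar{\tilde{A}} \neq \emptyset$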
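1. Commutativity
$\tilde{A} \cup \tilde{B} = \tilde{B} \cup \tilde{A}; \quad \tilde{A} \cap \tilde{B} = \tilde{B} \cap \tilde{A}$
2. Associativity
$\tilde{A} \cup (\tilde{B} \cup \tilde{C}) = (\tilde{A} \cup \tilde{B}) \cup \tilde{C}$
$\tilde{A} \cap (\tilde{B} \cap \tilde{C}) = (\tilde{A} \cap \tilde{B}) \cap \tilde{C}$
3. Distributivity
$\tilde{A} \cup (\tilde{B} \cap \tilde{C}) = (\tilde{A} \cup \tilde{B}) \cap (\tilde{A} \cup \tilde{C})$
$\tilde{A} \cap (\tilde{B} \cup \tilde{C}) = (\tilde{A} \cap \tilde{B}) \cup (\tilde{A} \cap \tilde{C})$
4. Idempotency
$\tilde{A} \cup \tilde{A} = \tilde{A}; \quad \tilde{A} \cap \tilde{A} = \tilde{A}$
5. Transitivity
If $\tilde{A} \subseteq \tilde{B} \subseteq \tilde{C}$, then $\tilde{A} \subseteq \tilde{C}$
6. Identity
$\tilde{A} \cup \emptyset = \tilde{A}$ and $\tilde{A} \cup U = U$
$\tilde{A} \cap \emptyset = \emptyset$ and $\tilde{A} \cap U = \tilde{A}$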
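7. Involution
$\bar{\bar{\tilde{A}}} = \tilde{A}$
8. De Morgan's laws
$\overline{\tilde{A} \cap \tilde{B}} = \bar{\tilde{A}} \cup \bar{\tilde{B}}; \quad \overline{\tilde{A} \cup \tilde{B}} = \bar{\tilde{A}} \cap \bar{\tilde{B}}$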
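1. Union
$\mu_{\tilde{A} \cup \tilde{B}}(x) = \max[\mu_{\tilde{A}}(x), \mu_{\tilde{B}}(x)] = \mu_{\tilde{A}}(x) \vee \mu_{\tilde{B}}(x)$ for all $x \in U$
where ∨ indicates the max operator. The Venn diagram for the union operation of fuzzy sets $\tilde{A}$ and $\tilde{B}$ is shown in the figure below.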
2. Intersection
$\mu_{\tilde{A} \cap \tilde{B}}(x) = \min[\mu_{\tilde{A}}(x), \mu_{\tilde{B}}(x)] = \mu_{\tilde{A}}(x) \wedge \mu_{\tilde{B}}(x)$ for all $x \in U$
where ∧ indicates the min operator. The Venn diagram for the intersection operation of fuzzy sets $\tilde{A}$ and $\tilde{B}$ is shown in the figure below.
3. Complement
$\mu_{\bar{\tilde{A}}}(x) = 1 - \mu_{\tilde{A}}(x)$ for all $x \in U$
The Venn diagram for the complement operation of fuzzy set $\tilde{A}$ is shown in the figure below.
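a. Algebraic sum
$\mu_{\tilde{A} + \tilde{B}}(x) = \mu_{\tilde{A}}(x) + \mu_{\tilde{B}}(x) - \mu_{\tilde{A}}(x) \cdot \mu_{\tilde{B}}(x)$
b. Algebraic product
The algebraic product $(\tilde{A} \cdot \tilde{B})$ of fuzzy sets $\tilde{A}$ and $\tilde{B}$ is defined as
$\mu_{\tilde{A} \cdot \tilde{B}}(x) = \mu_{\tilde{A}}(x) \cdot \mu_{\tilde{B}}(x)$
c. Bounded sum
$\mu_{\tilde{A} \oplus \tilde{B}}(x) = \min[1, \mu_{\tilde{A}}(x) + \mu_{\tilde{B}}(x)]$
d. Bounded difference
$\mu_{\tilde{A} \odot \tilde{B}}(x) = \max[0, \mu_{\tilde{A}}(x) - \mu_{\tilde{B}}(x)]$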
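The operations above are all pointwise on membership values, so they can be sketched in a few lines. The two fuzzy sets are hypothetical; the algebraic-sum, bounded-sum and bounded-difference formulas used are the standard textbook definitions. The last line also shows the failure of the law of excluded middle noted in the properties: $\tilde{A} \cup \bar{\tilde{A}}$ is not 1 everywhere.

```python
# Hypothetical fuzzy sets on the discrete universe {x1, x2, x3}
A = {"x1": 0.4, "x2": 0.7, "x3": 1.0}
B = {"x1": 0.5, "x2": 0.3, "x3": 0.6}


def pointwise(op, *sets):
    """Apply op to the membership values of each element (rounded for display)."""
    return {x: round(op(*(s[x] for s in sets)), 4) for x in sets[0]}


union = pointwise(max, A, B)                                   # max
intersection = pointwise(min, A, B)                            # min
complement_A = pointwise(lambda a: 1 - a, A)                   # 1 - µ
algebraic_sum = pointwise(lambda a, b: a + b - a * b, A, B)
algebraic_product = pointwise(lambda a, b: a * b, A, B)
bounded_sum = pointwise(lambda a, b: min(1.0, a + b), A, B)
bounded_difference = pointwise(lambda a, b: max(0.0, a - b), A, B)

print(union)                            # {'x1': 0.5, 'x2': 0.7, 'x3': 1.0}
print(pointwise(max, A, complement_A))  # {'x1': 0.6, 'x2': 0.7, 'x3': 1.0}
```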
An ordered r-tuple is an ordered sequence of r elements expressed in the form (a1, a2, a3, ..., ar). An unordered tuple is a collection of r elements without any restriction on order. For r = 2, the r-tuple is called an ordered pair. For crisp sets A1, A2, ..., Ar, the set of all r-tuples (a1, a2, a3, ..., ar), where a1 ∈ A1, a2 ∈ A2, ..., ar ∈ Ar, is called the Cartesian product of A1, A2, ..., Ar and is denoted by A1 × A2 × ... × Ar.
𝑋 × 𝑌 = {(𝑥, 𝑦)| 𝑥 ∈ 𝑋, 𝑦 ∈ 𝑌}
Here the Cartesian product forms an ordered pair of every 𝑥 ∈ 𝑋 with every 𝑦 ∈ 𝑌. Every
element in X is completely related to every element in Y. The characteristic function,
denoted by χ, gives the strength of the relationship between ordered pair of elements in each
universe. If it takes unity as its value, then complete relationship is found; if the value is
zero, then there is no relationship, i.e.,
$\chi_{X \times Y}(x, y) = \begin{cases} 1, & (x, y) \in X \times Y \\ 0, & (x, y) \notin X \times Y \end{cases}$
When the universes or sets are finite, then the relation is represented by a matrix called
relation matrix. An r-dimensional relation matrix represents an r-ary relation. Thus, binary
relations are represented by two-dimensional matrices.
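For example, let X = {2, 4, 6} and Y = {p, q, r}. Then
X × Y = {(2, p), (2, q), (2, r), (4, p), (4, q), (4, r), (6, p), (6, q), (6, r)}
A relation R ⊂ X × Y with R = {(2, p), (4, q), (4, r), (6, r)} has the relation matrix
R | p  q  r
2 | 1  0  0
4 | 0  1  1
6 | 0  0  1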
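The Cartesian product and the relation matrix of the example can be built mechanically. In this sketch R is the relation whose matrix is given above; `itertools.product` generates the ordered pairs (x, y).

```python
from itertools import product

X = [2, 4, 6]
Y = ["p", "q", "r"]

XY = list(product(X, Y))  # every x paired with every y, as ordered pairs (x, y)

# The relation R ⊂ X × Y whose relation matrix is given above
R = {(2, "p"), (4, "q"), (4, "r"), (6, "r")}

# Relation matrix: rows indexed by X, columns by Y, entry = χ_R(x, y)
matrix = [[1 if (x, y) in R else 0 for y in Y] for x in X]
for x, row in zip(X, matrix):
    print(x, row)
# 2 [1, 0, 0]
# 4 [0, 1, 1]
# 6 [0, 0, 1]
```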
The relation between sets X and Y may also be expressed by mapping representations, as shown in the figure below.
A binary relation in which each element of set X is mapped to no more than one element of the second set Y is called a function and is expressed as
$R: X \rightarrow Y$
The characteristic function is used to assign values of relationship in the mapping of the Cartesian space X × Y to the binary values (0, 1) and is given by
$\chi_R(x, y) = \begin{cases} 1, & (x, y) \in R \\ 0, & (x, y) \notin R \end{cases}$
The constrained Cartesian product for sets when r = 2 (i.e., A × A = A²) is called the identity relation, and the unconstrained Cartesian product for sets when r = 2 is called the universal relation.
For example, for A = {2, 4, 6}, the universal relation (UA) and the identity relation (IA) are given as follows:
UA = {(2,2), (2,4), (2,6), (4,2), (4,4), (4,6), (6,2), (6,4), (6,6)}
IA = {(2,2), (4,4), (6,6)}
Consider n elements of universe X being related to m elements of universe Y. When the cardinality of X is $n_X$ and the cardinality of Y is $n_Y$, the cardinality of the relation between the two universes (the full Cartesian product X × Y) is
$n_{X \times Y} = n_X \times n_Y$
The cardinality of the power set P(X × Y), i.e., the number of possible relations between the two universes, is given by
$n_{P(X \times Y)} = 2^{(n_X n_Y)}$
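A quick check of the universal and identity relations for A = {2, 4, 6}, together with the two cardinality formulas (here X = Y = A, so $n_X = n_Y = 3$):

```python
from itertools import product

A = [2, 4, 6]

universal = set(product(A, A))  # unconstrained A × A
identity = {(a, a) for a in A}  # only the pairs (a, a)

n_X = n_Y = len(A)
n_pairs = n_X * n_Y             # cardinality of X × Y
n_relations = 2 ** n_pairs      # cardinality of the power set P(X × Y)

print(len(universal), len(identity), n_pairs, n_relations)  # 9 3 9 512
```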
Let R and S be two separate relations on the Cartesian universe X ×Y. The null relation and
the complete relation are defined by the relation matrices ØR and ER. An example of a 3 X 3
form of the ØR and ER matrices is given below:
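$\emptyset_R = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ and $E_R = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$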
1. Union
$R \cup S \rightarrow \chi_{R \cup S}(x, y) = \max[\chi_R(x, y), \chi_S(x, y)]$
2. Intersection
$R \cap S \rightarrow \chi_{R \cap S}(x, y) = \min[\chi_R(x, y), \chi_S(x, y)]$
3. Complement
$\bar{R} \rightarrow \chi_{\bar{R}}(x, y) = 1 - \chi_R(x, y)$
4. Containment
$R \subset S \rightarrow \chi_R(x, y) \le \chi_S(x, y)$
5. Identity
$\emptyset \rightarrow \emptyset_R$ and $X \rightarrow E_R$
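On relation matrices these operations are entry-by-entry max, min and 1 − χ. In this sketch R is the matrix of the X = {2, 4, 6}, Y = {p, q, r} example, and S is a second, hypothetical relation on the same X × Y.

```python
# R: relation matrix from the example (rows x = 2, 4, 6; columns p, q, r).
# S: a second, hypothetical relation on the same X × Y.
R = [[1, 0, 0],
     [0, 1, 1],
     [0, 0, 1]]
S = [[1, 1, 0],
     [0, 1, 0],
     [0, 0, 1]]


def elementwise(op, *matrices):
    """Apply op entry by entry to equally sized relation matrices."""
    return [[op(*entries) for entries in zip(*rows)] for rows in zip(*matrices)]


def contained(P, Q):
    """P ⊂ Q holds when χ_P(x, y) <= χ_Q(x, y) for every pair."""
    return all(p <= q for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


union = elementwise(max, R, S)                  # [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
intersection = elementwise(min, R, S)           # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
complement_R = elementwise(lambda r: 1 - r, R)  # [[0, 1, 1], [1, 0, 0], [1, 1, 0]]
print(contained(R, S), contained(intersection, R))  # False True
```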
Let R be a relation that maps elements from universe X to universe Y, and let S be a relation that maps elements from universe Y to universe Z.
1. Max-min composition
The max-min composition is defined by the function-theoretic expression
$T = R \circ S$
$\chi_T(x, z) = \bigvee_{y \in Y} [\chi_R(x, y) \wedge \chi_S(y, z)]$
2. Max-product composition
The max-product composition is defined by the function-theoretic expression
$T = R \circ S$
$\chi_T(x, z) = \bigvee_{y \in Y} [\chi_R(x, y) \cdot \chi_S(y, z)]$
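Both compositions are "matrix products" in which the sum is replaced by max and the product by min (or kept as a product). The relations below are hypothetical; for crisp 0/1 entries, min and product coincide, so both compositions give the same result.

```python
def max_min(R, S):
    """χ_T(x, z) = max over y of min(χ_R(x, y), χ_S(y, z))."""
    return [[max(min(R[i][k], S[k][j]) for k in range(len(S)))
             for j in range(len(S[0]))] for i in range(len(R))]


def max_product(R, S):
    """χ_T(x, z) = max over y of χ_R(x, y) * χ_S(y, z)."""
    return [[max(R[i][k] * S[k][j] for k in range(len(S)))
             for j in range(len(S[0]))] for i in range(len(R))]


# Hypothetical crisp relations: R on X × Y (2 × 3), S on Y × Z (3 × 2)
R = [[1, 0, 1],
     [0, 1, 0]]
S = [[0, 1],
     [1, 0],
     [0, 0]]
print(max_min(R, S))      # [[0, 1], [1, 0]]
print(max_product(R, S))  # [[0, 1], [1, 0]]
```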
A fuzzy relation is a fuzzy set defined on the Cartesian product of classical sets {X1, X2, ..., Xn}, where tuples (x1, x2, ..., xn) may have varying degrees of membership $\mu_R(x_1, x_2, \ldots, x_n)$ within the relation.
A fuzzy relation between two sets X and Y is called a binary fuzzy relation and is denoted by R(X, Y). A binary relation R(X, Y) is referred to as a bipartite graph when X ≠ Y. A binary relation on a single set X is called a directed graph or digraph. This relation occurs when X = Y and is denoted as R(X, X) or R(X²).
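Let $\tilde{R}$ and $\tilde{S}$ be fuzzy relations on the Cartesian space X × Y. Then the following operations apply:
1. Union
$\mu_{\tilde{R} \cup \tilde{S}}(x, y) = \max[\mu_{\tilde{R}}(x, y), \mu_{\tilde{S}}(x, y)]$
2. Intersection
$\mu_{\tilde{R} \cap \tilde{S}}(x, y) = \min[\mu_{\tilde{R}}(x, y), \mu_{\tilde{S}}(x, y)]$
3. Complement
$\mu_{\bar{\tilde{R}}}(x, y) = 1 - \mu_{\tilde{R}}(x, y)$
4. Containment
$\tilde{R} \subset \tilde{S} \rightarrow \mu_{\tilde{R}}(x, y) \le \mu_{\tilde{S}}(x, y)$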
5. Inverse
The inverse of a fuzzy relation R on X × Y is denoted by R⁻¹. It is a relation on Y × X defined by $R^{-1}(y, x) = R(x, y)$ for all pairs $(y, x) \in Y \times X$.
6. Projection
For a fuzzy relation R(X, Y), let [R ↓ Y] denote the projection of R onto Y. Then [R ↓ Y] is a fuzzy relation in Y whose membership function is defined by
$\mu_{[R \downarrow Y]}(y) = \max_{x} \mu_{\tilde{R}}(x, y)$
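With a fuzzy relation stored as a matrix (rows = X, columns = Y), the complement is 1 − µ entry by entry, the inverse is the transpose, and the projection onto Y is the column-wise maximum. The relation values below are hypothetical.

```python
# Hypothetical fuzzy relation R on X × Y; rows are x1, x2 and columns y1, y2, y3
X = ["x1", "x2"]
Y = ["y1", "y2", "y3"]
R = [[0.2, 0.9, 0.0],
     [0.6, 0.4, 1.0]]

complement = [[round(1 - v, 4) for v in row] for row in R]            # 1 - µ(x, y)
inverse = [list(col) for col in zip(*R)]                               # R^-1(y, x) = R(x, y)
projection_Y = {y: max(row[j] for row in R) for j, y in enumerate(Y)}  # max over x

print(complement)    # [[0.8, 0.1, 1.0], [0.4, 0.6, 0.0]]
print(inverse)       # [[0.2, 0.6], [0.9, 0.4], [0.0, 1.0]]
print(projection_Y)  # {'y1': 0.6, 'y2': 0.9, 'y3': 1.0}
```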
• Fuzzy Composition
Let $\tilde{A}$ be a fuzzy set on universe X and $\tilde{B}$ be a fuzzy set on universe Y. The Cartesian product over $\tilde{A}$ and $\tilde{B}$ results in a fuzzy relation $\tilde{R}$, which is contained within the entire Cartesian space:
$\tilde{A} \times \tilde{B} = \tilde{R}$
where
$\tilde{R} \subset X \times Y$
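The notes do not spell out the membership of $\tilde{R} = \tilde{A} \times \tilde{B}$. This sketch assumes the usual min-based definition, $\mu_{\tilde{R}}(x, y) = \min[\mu_{\tilde{A}}(x), \mu_{\tilde{B}}(y)]$, and uses hypothetical fuzzy sets.

```python
# Hypothetical fuzzy sets A on X = {x1, x2} and B on Y = {y1, y2, y3}
A = {"x1": 0.2, "x2": 1.0}
B = {"y1": 0.5, "y2": 0.9, "y3": 0.0}

# Assumed min-based Cartesian product: µ_R(x, y) = min(µ_A(x), µ_B(y))
R = {(x, y): min(mu_a, mu_b) for x, mu_a in A.items() for y, mu_b in B.items()}

for x in A:
    print(x, [R[(x, y)] for y in B])
# x1 [0.2, 0.2, 0.0]
# x2 [0.5, 0.9, 0.0]
```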
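There also exists a fuzzy min-max composition method, but the most commonly used technique is fuzzy max-min composition. Let $\tilde{R}$ be a fuzzy relation on the Cartesian space X × Y and $\tilde{S}$ be a fuzzy relation on Y × Z. The max-min composition $\tilde{T} = \tilde{R} \circ \tilde{S}$ is defined by
$\mu_{\tilde{T}}(x, z) = \mu_{\tilde{R} \circ \tilde{S}}(x, z) = \max_{y \in Y} \{\min[\mu_{\tilde{R}}(x, y), \mu_{\tilde{S}}(y, z)]\}$
The min-max composition of R(X, Y) and S(Y, Z), denoted by R(X, Y) ∘ S(Y, Z), is defined by T(X, Z) as
$\mu_{\tilde{T}}(x, z) = \mu_{\tilde{R} \circ \tilde{S}}(x, z) = \min_{y \in Y} \{\max[\mu_{\tilde{R}}(x, y), \mu_{\tilde{S}}(y, z)]\}$
The max-min composition is associative:
$(\tilde{R} \circ \tilde{S}) \circ \tilde{M} = \tilde{R} \circ (\tilde{S} \circ \tilde{M})$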
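A minimal sketch of the two fuzzy compositions on hypothetical relation matrices (R on X × Y, S on Y × Z, M on Z × W). It also checks the associativity property numerically for this example. Only max and min are used, so the comparison is exact.

```python
def max_min(R, S):
    """µ_T(x, z) = max over y of min(µ_R(x, y), µ_S(y, z))."""
    return [[max(min(R[i][k], S[k][j]) for k in range(len(S)))
             for j in range(len(S[0]))] for i in range(len(R))]


def min_max(R, S):
    """µ_T(x, z) = min over y of max(µ_R(x, y), µ_S(y, z))."""
    return [[min(max(R[i][k], S[k][j]) for k in range(len(S)))
             for j in range(len(S[0]))] for i in range(len(R))]


# Hypothetical fuzzy relations
R = [[0.7, 0.5],
     [0.8, 0.4]]
S = [[0.9, 0.6, 0.2],
     [0.1, 0.7, 0.5]]
M = [[0.3],
     [0.8],
     [0.6]]

print(max_min(R, S))  # [[0.7, 0.6, 0.5], [0.8, 0.6, 0.4]]
print(min_max(R, S))  # [[0.5, 0.7, 0.5], [0.4, 0.7, 0.5]]
print(max_min(max_min(R, S), M) == max_min(R, max_min(S, M)))  # True
```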
9. Fails safely.
10. Modified and tweaked easily.
8. Automobile and other vehicle subsystems, such as automatic transmissions, ABS and cruise
control (e.g. Tokyo monorail).
9. Air conditioners.
10. Auto focus on cameras.
11. Digital image processing, such as edge detection.
12. Rice cookers.
13. Dishwashers.
14. Elevators.
15. Washing machines and other home appliances.
16. Video game artificial intelligence.
17. Language filters on message boards and chat rooms for filtering out offensive text.
18. Pattern recognition in Remote Sensing.
19. Fuzzy logic has also been incorporated into some microcontrollers and microprocessors.
20. Bus Time Tables.
21. Predicting genetic traits. (Genetic traits are a fuzzy situation for more than one reason).
22. Temperature control (heating/cooling).
23. Medical diagnoses.
24. Predicting travel time.
25. Antilock Braking System.
Module – 4
A membership function defines the fuzziness in a fuzzy set, irrespective of whether the elements in the set are discrete or continuous. A fuzzy set $\tilde{A}$ in the universe of discourse X can be defined as
$\tilde{A} = \{(x, \mu_{\tilde{A}}(x)) \mid x \in X\}$
where $\mu_{\tilde{A}}(\cdot)$ is called the membership function of $\tilde{A}$. The membership function $\mu_{\tilde{A}}(\cdot)$ maps X to the membership space M, i.e., $\mu_{\tilde{A}} : X \rightarrow M$. The membership value ranges in the interval [0, 1], i.e., the range of the membership function is a subset of the non-negative real numbers whose supremum is finite.
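The three main basic features involved in characterizing a membership function are the following.
1. Core
The core of a membership function for some fuzzy set $\tilde{A}$ is defined as that region of the universe that is characterized by complete membership in the set $\tilde{A}$. The core has elements x whose membership function is unity, i.e.,
$\mu_{\tilde{A}}(x) = 1$
2. Support
The support of a membership function for a fuzzy set $\tilde{A}$ is defined as that region of the universe that is characterized by nonzero membership in the set $\tilde{A}$. The support has elements x whose membership function is greater than zero, i.e.,
$\mu_{\tilde{A}}(x) > 0$
A fuzzy set whose support is a single element in X with $\mu_{\tilde{A}}(x) = 1$ is referred to as a fuzzy singleton.
3. Boundary
The boundary elements are those which possess partial membership in the fuzzy set $\tilde{A}$, i.e.,
$0 < \mu_{\tilde{A}}(x) < 1$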
A fuzzy set whose membership function has at least one element x in the universe with membership value equal to unity is called a normal fuzzy set. An element whose membership is equal to 1 is called a prototypical element. A fuzzy set in which no element has a membership value equal to 1 is called a subnormal fuzzy set.
Figure 4.2: (A) Normal fuzzy set and (B) subnormal fuzzy set
A convex fuzzy set has a membership function whose membership values are strictly monotonically increasing, or strictly monotonically decreasing, or strictly monotonically increasing then strictly monotonically decreasing, with increasing elements in the universe. A fuzzy set possessing characteristics opposite to those of a convex fuzzy set is called a nonconvex fuzzy set.
Figure 4.3: (A) Convex normal fuzzy set and (B) Nonconvex normal fuzzy set
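The convex normal fuzzy set can be defined in the following way. For elements x1, x2 and x3 in a fuzzy set $\tilde{A}$ with $x_1 < x_2 < x_3$, if the relation
$\mu_{\tilde{A}}(x_2) \ge \min[\mu_{\tilde{A}}(x_1), \mu_{\tilde{A}}(x_3)]$
always holds, then $\tilde{A}$ is said to be convex.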
The element in the universe for which a particular fuzzy set $\tilde{A}$ has its value equal to 0.5 is called the crossover point of the membership function. The membership value of a crossover point of a fuzzy set is equal to 0.5, i.e., $\mu_{\tilde{A}}(x) = 0.5$. There can be more than one crossover point in a fuzzy set. The maximum value of the membership function in a fuzzy set $\tilde{A}$ is called the height of the fuzzy set. For a normal fuzzy set, the height is equal to 1, because the maximum value of the membership function allowed is 1. Thus, if the height of a fuzzy set is less than 1, the fuzzy set is called a subnormal fuzzy set.
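For a discrete fuzzy set, each of these features is a simple filter over the membership values. The set below is hypothetical. The convexity check is the discrete version of the three-point condition $\mu(x_2) \ge \min[\mu(x_1), \mu(x_3)]$ for all $x_1 < x_2 < x_3$.

```python
from itertools import combinations

# Hypothetical discrete fuzzy set: element -> membership value
mu = {1: 0.0, 2: 0.5, 3: 1.0, 4: 1.0, 5: 0.5, 6: 0.2}

core = {x for x, m in mu.items() if m == 1.0}
support = {x for x, m in mu.items() if m > 0.0}
boundary = {x for x, m in mu.items() if 0.0 < m < 1.0}
crossover = {x for x, m in mu.items() if m == 0.5}
height = max(mu.values())
is_normal = height == 1.0


def is_convex(fs):
    """Discrete check: µ(x2) >= min(µ(x1), µ(x3)) for every x1 < x2 < x3."""
    xs = sorted(fs)
    return all(fs[b] >= min(fs[a], fs[c]) for a, b, c in combinations(xs, 3))


print(sorted(core), sorted(support), sorted(boundary))      # [3, 4] [2, 3, 4, 5, 6] [2, 5, 6]
print(sorted(crossover), height, is_normal, is_convex(mu))  # [2, 5] 1.0 True True
```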
Fuzzification is the process of transforming a crisp set to a fuzzy set or a fuzzy set to a fuzzier set. For a fuzzy set $\tilde{A} = \{\mu_i / x_i \mid x_i \in X\}$, a common fuzzification algorithm is performed by keeping $\mu_i$ constant and transforming $x_i$ into a fuzzy set $Q(x_i)$ depicting the expression about $x_i$. The fuzzy set $Q(x_i)$ is referred to as the kernel of fuzzification. The fuzzified set $\tilde{A}$ can be expressed as
$\tilde{A} = \mu_1 Q(x_1) + \mu_2 Q(x_2) + \cdots + \mu_n Q(x_n)$
where the symbol ~ means fuzzified. This process of fuzzification is called support fuzzification (s-fuzzification). There is another method of fuzzification called grade fuzzification (g-fuzzification), in which $x_i$ is kept constant and $\mu_i$ is expressed as a fuzzy set. Thus, using these methods, fuzzification is carried out.
4.3.1 Intuition
The intuition method is based upon the common intelligence of humans. It is the capacity of humans to develop membership functions on the basis of their own intelligence and understanding. There should be an in-depth knowledge of the application to which membership value assignment is to be made.
Figure 4.5 shows various shapes of membership functions for the weights of people, measured in kilograms, in the universe. Each curve is a membership function corresponding to a fuzzy (linguistic) variable, such as very light, light, normal, heavy and very heavy. The curves are based on context functions and on the human developing them. For example, if the weights refer to the range of thin persons we get one set of curves, if they refer to the range of normal-weight persons we get another set, and so on.
4.3.2 Inference
The inference method uses knowledge to perform deductive reasoning. Deduction reaches a
conclusion by means of inference. There are various methods for performing deductive
reasoning. Here the knowledge of geometrical shapes and geometry is used for defining
membership values. The membership functions may be defined by various shapes:
triangular, trapezoidal, bell-shaped, Gaussian and so on. The inference method here is
discussed via triangular shape.
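Consider a triangle, where X, Y and Z are angles such that X ≥ Y ≥ Z ≥ 0, and let U be the universe of triangles, i.e.,
$U = \{(X, Y, Z) \mid X \ge Y \ge Z \ge 0;\ X + Y + Z = 180^\circ\}$
The triangle classes considered are: $\tilde{I}$ = approximate isosceles triangle, $\tilde{R}$ = approximate right-angle triangle, $\tilde{IR}$ = approximate isosceles right-angle triangle, $\tilde{E}$ = approximate equilateral triangle, and $\tilde{T}$ = other triangles.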
$\mu_{\tilde{I}}(X, Y, Z) = 1 - \frac{1}{60^\circ} \min(X - Y, Y - Z)$
$\mu_{\tilde{R}}(X, Y, Z) = 1 - \frac{1}{90^\circ} |X - 90^\circ|$
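The membership value of an approximate isosceles right-angle triangle is obtained by taking the logical intersection of the approximate isosceles and approximate right-angle triangle membership functions, i.e.,
$\tilde{IR} = \tilde{I} \cap \tilde{R}$
and it is given by
$\mu_{\tilde{IR}}(X, Y, Z) = \min[\mu_{\tilde{I}}(X, Y, Z), \mu_{\tilde{R}}(X, Y, Z)] = 1 - \max\left[\frac{1}{60^\circ}\min(X - Y, Y - Z), \frac{1}{90^\circ}|X - 90^\circ|\right]$
The membership function for an approximate equilateral triangle is
$\mu_{\tilde{E}}(X, Y, Z) = 1 - \frac{1}{180^\circ} |X - Z|$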
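The membership function of other triangles, denoted by $\tilde{T}$, is the complement of the logical union of $\tilde{I}$, $\tilde{R}$ and $\tilde{E}$:
$\tilde{T} = \overline{\tilde{I} \cup \tilde{R} \cup \tilde{E}}$
By De Morgan's law,
$\tilde{T} = \bar{\tilde{I}} \cap \bar{\tilde{R}} \cap \bar{\tilde{E}}$
so that
$\mu_{\tilde{T}}(X, Y, Z) = \min\{1 - \mu_{\tilde{I}}(X, Y, Z), 1 - \mu_{\tilde{R}}(X, Y, Z), 1 - \mu_{\tilde{E}}(X, Y, Z)\}$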
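The triangle membership functions translate directly into code. The sketch below evaluates all five classes for one triangle given as angles in degrees, with the intersection taken as min and the complement as 1 − µ. The example angles are hypothetical.

```python
def triangle_memberships(X, Y, Z):
    """Membership of a triangle with angles X >= Y >= Z (degrees) in each class."""
    if not (X >= Y >= Z >= 0 and abs(X + Y + Z - 180) < 1e-9):
        raise ValueError("need X >= Y >= Z >= 0 and X + Y + Z = 180")
    I = 1 - min(X - Y, Y - Z) / 60  # approximate isosceles
    R = 1 - abs(X - 90) / 90        # approximate right-angle
    IR = min(I, R)                  # isosceles right-angle: I ∩ R
    E = 1 - abs(X - Z) / 180        # approximate equilateral
    T = min(1 - I, 1 - R, 1 - E)    # other: complement of I ∪ R ∪ E
    return {"I": I, "R": R, "IR": IR, "E": E, "T": T}


for name, value in triangle_memberships(85, 50, 45).items():
    print(name, round(value, 4))
# I 0.9167 / R 0.9444 / IR 0.9167 / E 0.7778 / T 0.0556 (one per line)
```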
The formation of government is based on the polling concept; to identify the best student, ranking may be performed; to buy a car, one can ask for several opinions, and so on.
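Consider a fuzzy set $\tilde{A}$. The set $A_\lambda$ (0 < λ < 1), called the lambda (λ)-cut (or alpha [α]-cut) set, is a crisp set of the fuzzy set $\tilde{A}$ and is defined as
$A_\lambda = \{x \mid \mu_{\tilde{A}}(x) \ge \lambda\}$
The set $A_\lambda$ is called a weak lambda-cut set if it consists of all the elements of a fuzzy set whose membership values are greater than or equal to the specified value. The set is called a strong lambda-cut set if it consists of all the elements of a fuzzy set whose membership values are strictly greater than the specified value. A strong λ-cut set is given by
$A'_\lambda = \{x \mid \mu_{\tilde{A}}(x) > \lambda\}$
Two properties of λ-cut sets are:
1. $(\tilde{A} \cup \tilde{B})_\lambda = A_\lambda \cup B_\lambda$
2. $(\tilde{A} \cap \tilde{B})_\lambda = A_\lambda \cap B_\lambda$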
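Weak and strong λ-cuts are threshold filters, and the two properties can be checked on an example. The fuzzy sets below are hypothetical; union and intersection are the max and min operations defined earlier.

```python
# Hypothetical fuzzy sets on {a, b, c, d}
A = {"a": 0.2, "b": 0.5, "c": 0.8, "d": 1.0}
B = {"a": 0.6, "b": 0.4, "c": 0.3, "d": 0.9}


def weak_cut(fs, lam):
    """Elements with membership >= lambda."""
    return {x for x, mu in fs.items() if mu >= lam}


def strong_cut(fs, lam):
    """Elements with membership strictly > lambda."""
    return {x for x, mu in fs.items() if mu > lam}


union = {x: max(A[x], B[x]) for x in A}
inter = {x: min(A[x], B[x]) for x in A}

lam = 0.5
print(sorted(weak_cut(A, lam)), sorted(strong_cut(A, lam)))     # ['b', 'c', 'd'] ['c', 'd']
print(weak_cut(union, lam) == weak_cut(A, lam) | weak_cut(B, lam))  # True
print(weak_cut(inter, lam) == weak_cut(A, lam) & weak_cut(B, lam))  # True
```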
Defuzzification is the process of conversion of a fuzzy quantity into a precise quantity. The
output of a fuzzy set process may be union of two or more fuzzy membership functions defined
on the universe of discourse of the output variable.
Figure 4.7: (A) First part of fuzzy output, (B) second part of fuzzy output (C) union of parts (A) and (B)
This method (the max-membership principle) is also known as the height method and is limited to peak output functions. It is given by the algebraic expression
$\mu_{\tilde{C}}(x^*) \ge \mu_{\tilde{C}}(x)$ for all $x \in X$
This method is also known as the center of mass, center of area or center of gravity method. It is the most commonly used defuzzification method. The defuzzified output x* is defined as
$x^* = \frac{\int \mu_{\tilde{C}}(x) \cdot x \, dx}{\int \mu_{\tilde{C}}(x) \, dx}$
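The centroid integral can be approximated numerically by sampling the output universe. The sketch below uses the trapezoidal rule on a grid and, as an assumed example, an output formed as the union (max) of two triangular sets clipped at 0.5 and 0.8, similar in spirit to figure 4.7. The set shapes are hypothetical.

```python
def triangular(a, b, c):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def centroid(mu, lo, hi, n=10_000):
    """Approximate x* = ∫ µ(x) x dx / ∫ µ(x) dx with the trapezoidal rule on [lo, hi]."""
    step = (hi - lo) / n
    xs = [lo + i * step for i in range(n + 1)]
    weights = [0.5 if i in (0, n) else 1.0 for i in range(n + 1)]
    num = sum(w * mu(x) * x for w, x in zip(weights, xs))
    den = sum(w * mu(x) for w, x in zip(weights, xs))
    return num / den  # the common factor `step` cancels


low, high = triangular(0, 3, 6), triangular(4, 7, 10)


def union_output(x):
    """Union (max) of the two output sets clipped at 0.5 and 0.8."""
    return max(min(0.5, low(x)), min(0.8, high(x)))


print(round(centroid(union_output, 0, 10), 2))  # 5.26
```

The result lies to the right of the midpoint 5 because the right-hand set is clipped at the higher level and so carries more area.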
This method is valid for symmetrical output membership functions only. Each membership function is weighted by its maximum membership value. The output is given by
$x^* = \frac{\sum \mu_{\tilde{C}}(\bar{x}_i) \cdot \bar{x}_i}{\sum \mu_{\tilde{C}}(\bar{x}_i)}$
where ∑ denotes the algebraic sum and $\bar{x}_i$ is the element at which the i-th membership function attains its maximum. The method is illustrated in figure 4.10, where two fuzzy sets are considered. From the figure, the defuzzified output is given by
$x^* = \frac{0.5a + 0.8b}{0.5 + 0.8}$
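The weighted average is a short computation once each symmetric set is reduced to its peak location and membership. In this sketch the centres a and b of figure 4.10 are not given in the notes, so a = 2.0 and b = 6.0 are assumed values.

```python
def weighted_average(peaks):
    """peaks: (x_bar_i, mu_i) pairs, where x_bar_i is the peak location of the
    i-th symmetric output set and mu_i its maximum (possibly clipped) membership."""
    numerator = sum(mu * x for x, mu in peaks)
    denominator = sum(mu for _, mu in peaks)
    return numerator / denominator


# Assumed centres a = 2.0 and b = 6.0 for the two sets of figure 4.10
a, b = 2.0, 6.0
x_star = weighted_average([(a, 0.5), (b, 0.8)])
print(round(x_star, 4))  # 4.4615
```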
This method is also known as the middle of maxima. It is closely related to the max-membership method, except that the locations of the maximum membership can be nonunique. The output here is given by
$x^* = \frac{\sum_{i=1}^{n} \bar{x}_i}{n}$
where $\bar{x}_i$ are the n locations at which the membership is maximal. The method is illustrated in figure 4.11, where two fuzzy sets are considered. From the figure, the defuzzified output is given by
$x^* = \frac{a + b}{2}$
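On a sampled output, mean-max membership simply averages every sample location that attains the peak value. The sample points below are hypothetical and have a flat top between x = 2 and x = 4, so the result is the midpoint (a + b)/2 of that plateau.

```python
def mean_of_max(samples):
    """samples: (x, mu) pairs of a sampled output set.
    Returns the mean of all x at which mu attains its maximum."""
    peak = max(mu for _, mu in samples)
    maxima = [x for x, mu in samples if mu == peak]
    return sum(maxima) / len(maxima)


# Hypothetical sampled output with a flat top between x = 2 and x = 4
output = [(1, 0.2), (2, 0.8), (3, 0.8), (4, 0.8), (5, 0.3)]
print(mean_of_max(output))  # 3.0
```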
This method employs the sum of the individual fuzzy subsets instead of their union. The calculations here are very fast, but the main drawback is that intersecting areas are added twice. The defuzzified value x* is given by
$x^* = \frac{\int_x x \sum_{i=1}^{n} \mu_{\tilde{C}_i}(x) \, dx}{\int_x \sum_{i=1}^{n} \mu_{\tilde{C}_i}(x) \, dx}$
Figure 4.12 illustrates the center of sums method. In the center of sums method, the weights are the areas of the respective membership functions, whereas in the weighted average method the weights are individual membership values.
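A sampled sketch of the center of sums, using two hypothetical clipped triangular sets. The only change from a centroid of the union is that memberships are summed rather than combined with max, so the overlap between the sets counts twice.

```python
def triangular(a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def center_of_sums(mus, lo, hi, n=10_000):
    """Discrete approximation of x* = ∫ x Σ µ_i(x) dx / ∫ Σ µ_i(x) dx on [lo, hi]."""
    step = (hi - lo) / n
    xs = [lo + i * step for i in range(n + 1)]
    totals = [sum(mu(x) for mu in mus) for x in xs]  # sum of sets, not their max
    return sum(t * x for t, x in zip(totals, xs)) / sum(totals)


low, high = triangular(0, 3, 6), triangular(4, 7, 10)


def c1(x):
    return min(0.5, low(x))   # first output set, clipped at 0.5


def c2(x):
    return min(0.8, high(x))  # second output set, clipped at 0.8


print(round(center_of_sums([c1, c2], 0, 10), 2))  # 5.25
```

Analytically this equals (2.25 × 3 + 2.88 × 7) / (2.25 + 2.88) ≈ 5.246, i.e., each set's centroid weighted by its area.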
This method is adopted when the output consists of at least two convex fuzzy subsets which are not overlapping. The output is biased towards one side of one membership function. When the output fuzzy set has at least two convex regions, the center of gravity of the convex fuzzy subregion having the largest area is used to obtain the defuzzified value x*:
$x^* = \frac{\int \mu_{\tilde{C}_m}(x) \cdot x \, dx}{\int \mu_{\tilde{C}_m}(x) \, dx}$
where $\tilde{C}_m$ is the convex subregion that has the largest area among the subregions making up $\tilde{C}_i$. Figure 4.13 illustrates this method.
This method uses the overall output, or union of all individual output fuzzy sets $\tilde{C}_j$, for determining the smallest value of the domain with maximized membership in $\tilde{C}_j$. The steps
Module – 5
Fuzzy logic uses linguistic variables. The values of a linguistic variable are words or
sentences in a natural or artificial language. For example, height is a linguistic variable if it
takes values such as tall, medium, short and so on. The statement “John is tall”
implies that the linguistic variable John takes the linguistic value tall. The linguistic variable
provides approximate characterization of a complex problem. The name of the variable, the
universe of discourse and a fuzzy subset of universe of discourse characterize a fuzzy
variable. The range of possible values of a linguistic variable represents the universe of
discourse of that variable. For example, the universe of discourse of the linguistic variable
speed might have the range between 0 and 220 km/h and may include such fuzzy subsets as
very slow, slow, medium, fast, and very fast.
A linguistic variable is a variable of a higher order than a fuzzy variable and its values are
taken to be fuzzy variables. A linguistic variable is characterized by
A linguistic variable carries with it the concept of fuzzy set qualifiers, called hedges. Hedges
are terms that modify the shape of fuzzy sets. In the fuzzy set "very tall", the word "very" is a
linguistic hedge. A few popular linguistic hedges include: very, highly, slightly, moderately,
plus, minus, fairly, rather.
If the hedge is "not", take the complement of the membership value. For example, for "not very short", take the complement of "very short".
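Hedges act on membership values. The notes only fix "not" as the complement. This sketch assumes the common Zadeh conventions "very" = µ² (concentration) and "fairly" = µ^0.5 (dilation), and uses a hypothetical fuzzy set "short" over heights in cm to evaluate "not very short".

```python
# Hypothetical fuzzy set "short" over heights in cm
short = {150: 1.0, 160: 0.8, 170: 0.4, 180: 0.0}


def very(fs):
    """Concentration (assumed convention): µ_very A(x) = µ_A(x) ** 2."""
    return {x: round(mu ** 2, 4) for x, mu in fs.items()}


def fairly(fs):
    """Dilation (assumed convention): µ_fairly A(x) = µ_A(x) ** 0.5."""
    return {x: round(mu ** 0.5, 4) for x, mu in fs.items()}


def not_(fs):
    """Complement: µ_not A(x) = 1 - µ_A(x)."""
    return {x: round(1 - mu, 4) for x, mu in fs.items()}


print(very(short))        # {150: 1.0, 160: 0.64, 170: 0.16, 180: 0.0}
print(not_(very(short)))  # {150: 0.0, 160: 0.36, 170: 0.84, 180: 1.0}
```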
Truth tables define logic functions of two propositions. Let X and Y be two propositions,
either of which can be true or false. The basic logic operations performed over the
propositions are the following:
On the basis of these operations on propositions, inference rules can be formulated. A few inference rules are as follows:
$[X \wedge (X \Rightarrow Y)] \Rightarrow Y$ (modus ponens)
$[\bar{Y} \wedge (X \Rightarrow Y)] \Rightarrow \bar{X}$ (modus tollens)
$[(X \Rightarrow Y) \wedge (Y \Rightarrow Z)] \Rightarrow (X \Rightarrow Z)$ (hypothetical syllogism)
The above rules produce propositions that are always true irrespective of the truth values of propositions X, Y and Z. Such propositions are called tautologies.
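Whether a formula is a tautology can be checked by enumerating its truth table. The sketch below confirms the three rules above and, for contrast, shows that "affirming the consequent" is not a tautology. Implication is taken as material implication, (not p) or q.

```python
from itertools import product


def implies(p, q):
    """Material implication p => q."""
    return (not p) or q


def is_tautology(formula, n_vars):
    """True if the formula holds for every row of its truth table."""
    return all(formula(*row) for row in product([False, True], repeat=n_vars))


def modus_ponens(x, y):
    return implies(x and implies(x, y), y)


def modus_tollens(x, y):
    return implies((not y) and implies(x, y), not x)


def hypothetical_syllogism(x, y, z):
    return implies(implies(x, y) and implies(y, z), implies(x, z))


def affirming_consequent(x, y):
    """A common fallacy, included as a non-tautology for contrast."""
    return implies(y and implies(x, y), x)


print(is_tautology(modus_ponens, 2), is_tautology(modus_tollens, 2),
      is_tautology(hypothetical_syllogism, 3), is_tautology(affirming_consequent, 2))
# True True True False
```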
The truth values of propositions in fuzzy logic are allowed to range over the unit interval [0, 1]. The truth value of the proposition "Z is A," or simply the truth value of A, denoted by
1. Fuzzy predicates
In fuzzy logic the predicates can be fuzzy, for example, tall, short, quick. Hence, we have
proposition like "Peter is tall." It is obvious that most of the predicates in natural language
are fuzzy rather than crisp.
2. Fuzzy-predicate modifiers
In fuzzy logic, there exists a wide range of predicate modifiers that act as hedges, for
example, very, fairly, moderately, rather, slightly. These predicate modifiers are necessary
for generating the values of a linguistic variable. An example can be the proposition
"Climate is moderately cool," where "moderately" is the fuzzy predicate modifier.
3. Fuzzy quantifiers: The fuzzy quantifiers such as most, several, many, frequently are used
in fuzzy logic. Employing these we can have proposition like "Many people are educated."
A fuzzy quantifier can be interpreted as a fuzzy number or a fuzzy proposition.
4. Fuzzy qualifiers: There are four modes of qualification in fuzzy logic, which are as follows:
• Fuzzy truth qualification
It is expressed as "x is τ," in which τ is a fuzzy truth value. A fuzzy truth value claims the degree of truth of a fuzzy proposition. Consider the example,
(Paul is Young) is NOT Very True.
Here the qualified proposition is (Paul is Young) and the qualifying fuzzy truth value is "NOT Very True."
• Fuzzy possibility qualification
It is expressed as "x is π," where π is a fuzzy possibility and can be of the following forms: possible, quite possible, almost impossible. These values can be interpreted as labels of fuzzy subsets of the real line. Consider the example,
The general way of representing human knowledge is by forming natural language expressions given by
IF antecedent THEN consequent
The above expression is referred to as the IF-THEN rule-based form. There are three general forms that exist for any linguistic variable. They are: (a) assignment statements; (b) conditional statements; (c) unconditional statements.
2. Conditional statements
The following are some examples.
IF y is very cool THEN stop.
IF A is high THEN B is low ELSE B is not low.
IF temperature is high THEN climate is hot.
The conditional statements use the "IF-THEN" rule-based form.
3. Unconditional statements
They can be of the form
Goto sum.
Stop.
Divide by a.
Turn the pressure low.
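A compound rule is a collection of many simple rules combined together. Any compound rule structure may be decomposed and reduced to a number of simple canonical rule forms. The rules are generally based on natural language representations.
For a rule with multiple conjunctive antecedents, IF x is $\tilde{A}_1$ AND x is $\tilde{A}_2$ ... AND x is $\tilde{A}_n$ THEN y is $\tilde{B}_m$, a new fuzzy subset is defined as
$\tilde{A}_m = \tilde{A}_1 \cap \tilde{A}_2 \cap \cdots \cap \tilde{A}_n$
In view of the fuzzy intersection operation, the compound rule may be rewritten as
IF $\tilde{A}_m$ THEN $\tilde{B}_m$
For multiple disjunctive antecedents (connected by OR), the fuzzy union is used instead:
$\tilde{A}_m = \tilde{A}_1 \cup \tilde{A}_2 \cup \tilde{A}_3 \cup \cdots \cup \tilde{A}_n$
and the rule is rewritten in the same form.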
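A conditional statement with an ELSE clause,
IF $\tilde{A}_1$ THEN ($\tilde{B}_1$ ELSE $\tilde{B}_2$)
can be decomposed into two simple canonical rule forms, connected by "OR":
IF $\tilde{A}_1$ THEN $\tilde{B}_1$
OR
IF NOT $\tilde{A}_1$ THEN $\tilde{B}_2$
Similarly, the unless rule IF $\tilde{A}_1$ THEN $\tilde{B}_1$ UNLESS $\tilde{A}_2$ can be decomposed as
IF $\tilde{A}_1$ THEN $\tilde{B}_1$
OR
IF $\tilde{A}_2$ THEN NOT $\tilde{B}_1$
A nested rule of the form "IF $\tilde{A}_1$ THEN [IF $\tilde{A}_2$ THEN ($\tilde{B}_1$)]" can be rewritten as
IF $\tilde{A}_1$ AND $\tilde{A}_2$ THEN $\tilde{B}_1$
Thus, based on all the above methods, compound rules can be decomposed into a series of simple canonical rules.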
Aggregation of rules is the process of obtaining the overall consequents from the individual
consequents provided by each rule. The following two methods are used for aggregation of
fuzzy rules:
Fuzzy rule-based systems, fuzzy models, and fuzzy expert systems are generally known as fuzzy inference systems (FIS). The FIS is the key unit of a fuzzy logic system, and its primary work is decision making. An FIS uses "IF ... THEN" rules along with the connectors "OR" or "AND" for making the necessary decision rules. The input to an FIS may be fuzzy or crisp, but the output from the FIS is always a fuzzy set.
Initially, in the fuzzification unit, the crisp input is converted into a fuzzy input. Various
fuzzification methods are employed for this. After this process, rule base is formed. Database
and rule base are collectively called the knowledge base. Finally, defuzzification process is
carried out to produce crisp output. Mainly, the fuzzy rules are formed in the rule base and
suitable decisions are made in the decision-making unit.
Ebrahim Mamdani proposed this system in the year 1975 to control a steam engine and
boiler combination by synthesizing a set of fuzzy rules obtained from people working on the
system. In this case, the output membership functions are expected to be fuzzy sets. After
aggregation process, each output variable is a fuzzy set, hence defuzzification is important at
the output stage. The steps include:
The fuzzy rules are formed using "IF-THEN" statements and "AND/OR'' connectives. The
consequence of the rule can be obtained in two steps:
1. By computing the rule strength completely using the fuzzified inputs from the fuzzy
combination.
2. By clipping the output membership function at the rule strength
The outputs of all the fuzzy rules are combined to obtain one fuzzy output distribution. From
FIS, it is desired to get only one crisp output. This crisp output may be obtained from
defuzzification process. The common techniques of defuzzification used are center of mass
and mean of maximum.
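A minimal sketch of the Mamdani steps just described: fuzzify the crisp inputs, compute each rule's strength (AND taken as min), clip each output set at its rule strength, aggregate with max, and defuzzify with the center of mass on a sampled output universe. The two rules, the linguistic terms and their triangular shapes are all hypothetical.

```python
def tri(a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


# Hypothetical terms: temperature in °C, humidity in %, fan speed in %
temp_warm, temp_hot = tri(15, 25, 35), tri(25, 35, 45)
hum_high = tri(40, 70, 100)
speed_medium, speed_fast = tri(20, 50, 80), tri(50, 80, 110)


def mamdani(temp, hum, n=1000):
    # Rule strengths from the fuzzified crisp inputs (AND taken as min)
    w1 = temp_warm(temp)                      # IF temp is warm THEN speed is medium
    w2 = min(temp_hot(temp), hum_high(hum))   # IF temp is hot AND hum is high THEN speed is fast
    xs = [i * 110 / n for i in range(n + 1)]  # sampled output universe 0..110
    # Clip each consequent at its rule strength, then aggregate with max
    agg = [max(min(w1, speed_medium(x)), min(w2, speed_fast(x))) for x in xs]
    total = sum(agg)
    if total == 0:
        return None  # no rule fired
    return sum(a * x for a, x in zip(agg, xs)) / total  # center of mass


print(round(mamdani(30, 70), 1))  # roughly 65: the two clipped sets mirror each other about 65
```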
The Sugeno fuzzy method was proposed by Takagi, Sugeno and Kang in the year 1985. The format of the fuzzy rule of a Sugeno fuzzy model is given by
IF x is A and y is B THEN z = f(x, y)
where A and B are fuzzy sets in the antecedents and z = f(x, y) is a crisp function in the consequent.
Sugeno's method can act as an interpolating supervisor for multiple linear controllers, which
are to be applied, because of the linear dependence of each rule on the input variables of a
system. A Sugeno model is suited for smooth interpolation of linear gains that would be
applied across the input space and for modeling nonlinear systems by interpolating between
multiple linear models. The Sugeno system uses adaptive techniques for constructing fuzzy
models. The adaptive techniques are used to customize the membership functions.
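In contrast to Mamdani, each Sugeno rule outputs a crisp value z = f(x, y), and the overall output is the firing-strength-weighted average of those values, so no output fuzzy set has to be defuzzified. The sketch below is a first-order example with two hypothetical rules and triangular antecedents (AND taken as min).

```python
def tri(a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


# Hypothetical antecedent sets for inputs x and y
x_low, x_high = tri(-10, 0, 10), tri(0, 10, 20)
y_low, y_high = tri(-10, 0, 10), tri(0, 10, 20)


def sugeno(x, y):
    """First-order Sugeno inference with two hypothetical rules."""
    rules = [
        # Rule 1: IF x is low AND y is low THEN z = 0.5x + 0.2y + 1
        (min(x_low(x), y_low(y)), 0.5 * x + 0.2 * y + 1.0),
        # Rule 2: IF x is high AND y is high THEN z = 1.5x - 0.3y + 4
        (min(x_high(x), y_high(y)), 1.5 * x - 0.3 * y + 4.0),
    ]
    total_weight = sum(w for w, _ in rules)
    if total_weight == 0:
        return None  # no rule fires
    # Crisp output: weighted average of the rule outputs
    return sum(w * z for w, z in rules) / total_weight


print(sugeno(0, 0))  # 1.0
print(sugeno(5, 5))  # 7.25
```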
The main difference between Mamdani and Sugeno methods lies in the output membership
functions. The Sugeno output membership functions are either linear or constant. The
difference also lies in the consequents of their fuzzy rules as a result their aggregation and
defuzzification procedures differ suitably. A large number of fuzzy rules must be employed
in Sugeno method for approximating periodic or highly oscillatory functions. The
configuration of Sugeno fuzzy systems can be reduced and it becomes smaller than that of
Mamdani fuzzy systems if nontriangular or nontrapezoidal fuzzy input sets are used. Sugeno
controllers have more adjustable parameters in the rule consequent and the number of
parameters grows exponentially with the increase of the number of input variables. More
mathematical results exist for Sugeno fuzzy controllers than for Mamdani
controllers. Formation of Mamdani FIS is easier than Sugeno FIS.
It is a learning mechanism that utilizes the training and learning algorithms from neural
networks to find parameters of a fuzzy system (i.e., fuzzy sets, fuzzy rules, fuzzy numbers, and
so on). The neuro-fuzzy is divided into two areas:
In the case of neural networks, they can be used only if the problem is expressed by a sufficient number of observed examples. These observations are used to train the black box. Though no prior knowledge about the problem is needed, extracting comprehensible rules from a neural network's structure is very difficult.
A fuzzy system, on the other hand, does not need learning examples as prior knowledge,
rather linguistic rules are required. Moreover, linguistic description of the input and output
variables should be given. If the knowledge is incomplete, wrong or contradictory, then the
fuzzy system must be tuned. This is a time consuming process.
5.7.2 Characteristics
It can be represented by a three-layer feedforward neural network model. The first layer corresponds to the input variables, and the second and third layers correspond to the fuzzy rules and output variables, respectively. The fuzzy sets are converted to (fuzzy) connection weights. An NFS can also be considered as a system of fuzzy rules wherein the system can be initialized in the form of fuzzy rules based on the prior knowledge available. Some researchers use five layers, with the fuzzy sets being encoded in the units of the second and the fourth layer, respectively.
5.7.3 Classification
1. Cooperative NFSs.
In this type of system, both artificial neural network (ANN) and fuzzy system work
independently from each other. Four different kinds of cooperative fuzzy neural networks
are shown in figure 5.5.
The FNN in figure 5.5(A) learns fuzzy sets from the given training data. This is usually done by fitting membership functions with a neural network; the fuzzy sets are then determined offline. They are then utilized to form the fuzzy system by fuzzy rules that are given, not learned. The NFS in figure 5.5(B) determines the fuzzy rules from the training data by a neural network. Here again, the neural network learns offline before the fuzzy system is initialized. The rule learning usually happens by clustering on self-organizing feature maps. It is also possible to apply fuzzy clustering methods to obtain rules.
For the neuro-fuzzy model shown in figure 5.5 (C), the parameters of membership function
are learnt online, while the fuzzy system is applied. This means that, initially, fuzzy rules
and membership functions must be defined beforehand. Also, in order to improve and guide
the learning step, the error has to be measured. The model shown in figure 5.5 (D)
determines the rule weights for all fuzzy rules by a neural network. A rule is determined by its rule weight, interpreted as the influence of the rule; the rule weights are then multiplied with the rule outputs.
Membership functions expressing the linguistic terms of the inference rules should be
formulated for building a fuzzy controller. However, in fuzzy systems, no formal approach
exists to define these functions. Any shape, such as Gaussian, triangular, bell-shaped or
trapezoidal, can be considered as a membership function with an arbitrary set of parameters.
Thus for fuzzy systems, the optimization of these functions in terms of generalizing the data
is very important; this problem can be solved by using neural networks. Using learning
rules, the neural network must optimize the parameters by fixing a distinct shape of the
membership functions; for example, triangular. But regardless of the shape of the
membership functions, training data should also be available.
The neuro fuzzy hybrid systems can also be modeled in another method. In this case, the
training data is grouped into several clusters and each cluster is designed to represent a
particular rule. These rules are defined by the crisp data points and are not defined
linguistically. The testing can be carried out by presenting a random testing sample to the
trained neural network.
Module – 6
Genetic algorithm (GA) is reminiscent of sexual reproduction, in which the genes of two parents combine to form those of their children. When it is applied to problem solving, the basic premise is that we can create an initial population of individuals representing possible solutions to a problem we are trying to solve. Each of these individuals has certain characteristics that make them more or less fit as members of the population. The more fit members will have a higher probability than the less fit members of mating and producing offspring that have a significant chance of retaining the desirable characteristics of their parents. This method is very effective at finding optimal or near-optimal solutions in a wide variety of problems.
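The premise above (a population of candidate solutions, fitter members more likely to reproduce, offspring combining parental genes) can be sketched as a tiny GA. Everything concrete here is an assumption for illustration: bit-string individuals, the "OneMax" toy fitness (count of 1-bits), tournament selection, one-point crossover and bit-flip mutation. The notes have not yet introduced specific operators.

```python
import random


def onemax(bits):
    """Toy fitness (assumed): number of 1-bits in the string."""
    return sum(bits)


def tournament(pop, k, rng):
    """Pick k random individuals and return the fittest, so fitter members mate more often."""
    return max(rng.sample(pop, k), key=onemax)


def crossover(p1, p2, rng):
    """One-point crossover: the child takes genes from both parents."""
    point = rng.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def run_ga(n_bits=20, pop_size=30, generations=60, rate=0.02, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=onemax)
    history = [onemax(best)]  # best fitness seen so far, recorded each generation
    for _ in range(generations):
        pop = [mutate(crossover(tournament(pop, 3, rng), tournament(pop, 3, rng), rng), rate, rng)
               for _ in range(pop_size)]
        best = max(pop + [best], key=onemax)
        history.append(onemax(best))
    return best, history


best, history = run_ga()
print(onemax(best), history[0], history[-1])
```

Because `best` is carried across generations, the recorded best fitness can never decrease. The exact values depend on the random seed.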
The science that deals with the mechanisms responsible for similarities and differences in a
species is called Genetics. The word "genetics" is derived from the Greek word "genesis"
meaning "to grow" or "to become."
• Cell
Every animal/human cell is a complex of many "small" factories that work together. The
center of all this is the cell nucleus. The genetic information is contained in the cell
nucleus.
• Chromosomes
All the genetic information gets stored in the chromosomes. Each chromosome is built of
deoxyribonucleic acid (DNA). In humans, chromosomes exist in pairs (23 pairs found).
The chromosomes are divided into several parts called genes. Genes code the properties of
species, i.e., the characteristics of an individual. The possibilities of combination of the
genes for one property are called alleles, and a gene can take different alleles. For example,
there is a gene for eye color, and all the different possible alleles are black, brown, blue and
green (since no one has red or violet eyes!). The set of all possible alleles present in a
particular population forms a gene pool. This gene pool can determine all the different
possible variations for the future generations. The size of the gene pool helps in
determining the diversity of the individuals in the population. The set of all the genes of a
specific species is called genome. Each and every gene has a unique position on the
genome called locus.
• Genetics
For a particular individual, the entire combination of genes is called the genotype. The
phenotype is the physical expression obtained by decoding the genotype.
Cells whose chromosomes contain two sets of genes are known as diploids.
• Reproduction
1. Mitosis: In mitosis the same genetic information is copied to the new offspring; there is
no exchange of information. This is the normal way in which multicellular structures, such
as organs, grow.
2. Meiosis: Meiosis forms the basis of sexual reproduction. When meiotic division takes
place, two gametes appear in the process. When reproduction occurs, these two gametes
conjugate to form a zygote, which becomes the new individual.
Natural evolution    Genetic algorithm
Chromosome           String
Gene                 Feature or character
Allele               Feature value
Locus                String position
Genotype             Structure or coded string
Phenotype            Parameter set, a decoded structure
• Natural Selection
For example, giraffes with long necks can feed from tall trees as well as from the ground,
whereas goats and deer, which have shorter necks, can feed only from the ground. Natural
selection therefore plays a major role in deciding which individuals survive.
• Individuals
• Genes
Genes are the basic "instructions" for building a GA. A chromosome is a sequence of
genes. Genes may describe a possible solution to a problem, without actually being the
solution.
• Fitness
• Populations
Chromosomes with a higher fitness value are more likely to reproduce
offspring (which can mutate after reproduction). The offspring is a product of the
father and mother, and its composition consists of a combination of genes from the
two (this process is known as "crossing over").
If the new generation contains a solution that produces an output close enough to, or
equal to, the desired output, then the problem has been solved. If not, the new generation
goes through the same process as its parents did. This continues until a solution is
reached.
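The generational cycle just described (fitness-biased mating, crossing over, mutation, and a stop once the desired output appears) can be sketched as a short Python loop. This is only an illustration under assumptions that are not in the notes: the toy "OneMax" problem (fitness is the number of 1 bits), a population of 20, parents chosen as the fitter of two random individuals, one-point crossover and a per-bit mutation rate of 0.01. The names `onemax` and `run_ga` are hypothetical.

```python
import random

def onemax(chrom):
    """Toy fitness (assumed for illustration): the number of 1 bits."""
    return sum(chrom)

def run_ga(length=20, pop_size=20, p_mut=0.01, max_gens=200, seed=0):
    rng = random.Random(seed)
    # Initial population: random bit strings, each a candidate solution.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for gen in range(max_gens):
        best = max(pop, key=onemax)
        # Solved: some individual produces the desired output (all ones).
        if onemax(best) == length:
            return gen, best
        new_pop = []
        while len(new_pop) < pop_size:
            # Fitter chromosomes are more likely to be chosen as parents.
            father = max(rng.sample(pop, 2), key=onemax)
            mother = max(rng.sample(pop, 2), key=onemax)
            # Crossing over: the child combines genes from both parents.
            cut = rng.randint(1, length - 1)
            child = father[:cut] + mother[cut:]
            # Mutation after reproduction: flip each bit with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop  # the new generation repeats the same process
    return max_gens, max(pop, key=onemax)

generations, best = run_ga()
print(generations, onemax(best))
```

Each piece (selection, crossover, mutation) is deliberately the simplest choice; the operators described in the following sections can be swapped in.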
The basic operators include encoding, selection, recombination and mutation operators. The
operators and their various types are explained below with the necessary examples.
6.2.1 Encoding
Encoding is the process of representing individual genes. It can be performed using
bits, numbers, trees, arrays, lists or any other object.
• Binary Encoding
Each chromosome is encoded as a binary (bit) string. Each bit in the string can represent some
characteristic of the solution. Every bit string is therefore a solution, but not necessarily
the best solution. Alternatively, the whole string can represent a number.
The way bit strings code information differs from problem to problem.
Binary encoding gives many possible chromosomes with a small number of alleles (only 0 and 1).
Binary coded strings of 1s and 0s are the most commonly used. The length of the string
depends on the required accuracy.
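To show how string length follows from the required accuracy, here is a minimal sketch that encodes a real variable x in [lo, hi] as an unsigned binary integer k, using the common linear mapping x = lo + (hi - lo) * k / (2^L - 1). The mapping and the function names `bits_needed`, `encode` and `decode` are assumptions for illustration, not the notes' own scheme.

```python
import math

def bits_needed(lo, hi, precision):
    """Smallest L such that the step (hi - lo) / (2**L - 1) is <= precision."""
    return math.ceil(math.log2((hi - lo) / precision + 1))

def encode(x, lo, hi, n_bits):
    """Map x in [lo, hi] to the nearest n_bits-long bit string."""
    k = round((x - lo) / (hi - lo) * (2 ** n_bits - 1))
    return format(k, f"0{n_bits}b")

def decode(bits, lo, hi):
    """Map a bit string back to a real value in [lo, hi]."""
    k = int(bits, 2)
    return lo + (hi - lo) * k / (2 ** len(bits) - 1)

L = bits_needed(0.0, 10.0, 0.01)   # 10 bits: step 10/1023 ≈ 0.0098
s = encode(3.14, 0.0, 10.0, L)
print(L, s, decode(s, 0.0, 10.0))
```

Asking for one more decimal digit of accuracy (0.001) raises the required length to 14 bits, which is the trade-off the text refers to.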
• Octal Encoding
• Hexadecimal Encoding
• Value Encoding
Every chromosome is a string of values, and the values can be anything connected to the
problem, from integers, real numbers or characters to complicated objects.
• Tree Encoding
This encoding is mainly used for evolving program expressions in genetic
programming. Every chromosome is a tree of objects such as functions and
commands of a programming language.
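A tree-encoded chromosome can be sketched with nested tuples: inner nodes are functions and leaves are terminals (a variable or a constant). The function set {+, -, *}, the tuple layout and the name `evaluate` are assumptions chosen for illustration.

```python
import operator

# Hypothetical function set; leaves are the variable "x" or numeric constants.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, x):
    """Recursively evaluate a tree-encoded chromosome at input x."""
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, x), evaluate(right, x))

# Chromosome for the program expression x * x + 3
chromosome = ("+", ("*", "x", "x"), 3)
print(evaluate(chromosome, 2))  # 7
```

Crossover on such chromosomes swaps subtrees rather than bit ranges, which is why tree encoding is paired with genetic programming.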
Selection is the process of choosing two parents from the population for crossing. After
deciding on an encoding, the next step is to decide how to perform selection, i.e., how to
choose individuals in the population that will create offspring for the next generation and
how many offspring each will create. The purpose of selection is to emphasize fitter
individuals in the population in the hope that their offspring have higher fitness. According
to Darwin's theory of evolution, the best ones survive to create new offspring.
Selection is a method that randomly picks chromosomes out of the population according to
their evaluation function. The higher the fitness value, the better the chance that an
individual will be selected. Selection pressure is defined as the degree to which the better
individuals are favored: the higher the selection pressure, the more the better individuals
are favored. This selection pressure drives the GA to improve the population fitness over
successive generations.
• Proportionate-based selection
Proportionate-based selection picks out individuals based upon their fitness values
relative to the fitness of the other individuals in the population.
• Ordinal-based selection
Ordinal-based selection schemes select individuals not upon their raw fitness, but upon
their rank within the population. This makes the selection pressure independent of the
fitness distribution of the population; it is based solely upon the relative ordering
(ranking) of the population.
The commonly used reproduction operator is the proportionate reproductive operator, where
a string is selected from the mating pool with a probability proportional to its fitness. The
principle of Roulette selection is a linear search through a Roulette wheel whose slots are
weighted in proportion to the individuals' fitness values. A target value is set, which is a
random proportion of the sum of the fitnesses in the population. The population is
stepped through until the target value is reached. This is only a moderately strong selection
technique, since fit individuals are not guaranteed to be selected but only have a
greater chance. A fit individual will contribute more to the target value, but if it does not
exceed it, the next chromosome in line has a chance, and it may be weak. It is essential that
the population not be sorted by fitness, since this would dramatically bias the selection.
The Roulette process can also be explained as follows. The expected value of an individual
is the individual's fitness divided by the average fitness of the population. Each individual is
assigned a slice of the Roulette wheel, the size of the slice being proportional to the
individual's fitness. The wheel is spun N times, where N is the number of individuals in the
population. On each spin, the individual under the wheel's marker is selected to be in the
pool of parents for the next generation.
1. Sum the total expected value of the individuals in the population. Let it be T.
2. Repeat N times:
i. Choose a random integer "r" between 0 and T.
ii. Loop through the individuals in the population, summing the expected values,
until the sum is greater than or equal to "r". The individual whose expected value
puts the sum over this limit is the one selected.
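The two steps above translate almost line for line into Python. In this sketch `r` is drawn as a real number in [0, T] rather than an integer (an assumption that avoids quantizing small slices), fitness values are assumed non-negative and not all zero, and `roulette_wheel` is a hypothetical name.

```python
import random

def roulette_wheel(population, fitness, rng=random):
    """Select len(population) parents by roulette-wheel selection."""
    values = [fitness(ind) for ind in population]
    avg = sum(values) / len(values)
    expected = [v / avg for v in values]      # expected value of each individual
    total = sum(expected)                     # step 1: T
    parents = []
    for _ in range(len(population)):          # step 2: spin the wheel N times
        r = rng.uniform(0, total)
        running = 0.0
        for ind, e in zip(population, expected):
            running += e
            if running >= r:                  # this individual pushes the sum past r
                parents.append(ind)
                break
        else:
            parents.append(population[-1])    # guard against float round-off
    return parents

rng = random.Random(1)
pop = ["A", "B", "C", "D"]
fit = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
parents = roulette_wheel(pop, fit.get, rng)
print(parents)
```

Over many spins, "D" (fitness 4) is picked about four times as often as "A" (fitness 1), but any single spin can still pick a weak individual.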
Random selection picks a parent from the population at random. In terms of disruption of
genetic codes, random selection is a little more disruptive, on average, than Roulette wheel
selection.
Rank selection ranks the population, and every chromosome receives its fitness from this
ranking: the worst has fitness 1 and the best has fitness N. It also keeps up selection
pressure when the fitness variance is low.
In tournament selection, potential parents are selected and a tournament is held to decide
which of the individuals will be the parent.
There are many ways this can be achieved and two suggestions are:
The best individual in a tournament of Nu competitors is the one with the highest fitness;
it is the winner and is inserted into the mating pool. The tournament competition is repeated
until the mating pool for generating new offspring is filled. The mating pool, comprising the
tournament winners, has a higher average population fitness. The fitness difference provides
the selection pressure, which drives the GA to improve the fitness of succeeding generations.
This method is more efficient and leads toward an optimal solution.
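Rank selection and tournament selection can be sketched side by side. For rank selection, ranks 1 (worst) to N (best) replace the raw fitness and drive a fitness-proportionate draw; for tournament selection, `nu` individuals are drawn and the fittest wins a place in the mating pool. Reading Nu as the tournament size, drawing competitors with replacement, and the function names are assumptions for illustration.

```python
import random

def rank_selection(population, fitness, n_select, rng=random):
    """Worst gets rank 1, best gets rank N; pick with probability proportional to rank."""
    ordered = sorted(population, key=fitness)          # worst ... best
    ranks = list(range(1, len(ordered) + 1))           # 1 ... N
    return rng.choices(ordered, weights=ranks, k=n_select)

def tournament_selection(population, fitness, n_select, nu=2, rng=random):
    """Fill a mating pool by repeatedly holding tournaments of size nu."""
    pool = []
    while len(pool) < n_select:
        competitors = [rng.choice(population) for _ in range(nu)]
        pool.append(max(competitors, key=fitness))     # winner enters the pool
    return pool

rng = random.Random(0)
pop = [3, 9, 1, 7, 5]          # each individual here is its own fitness value
print(rank_selection(pop, lambda v: v, 5, rng))
print(tournament_selection(pop, lambda v: v, 5, nu=3, rng=rng))
```

Because rank selection only uses the ordering, a fitness of 9 versus 7 gives the same advantage as 900 versus 7, which is how it keeps selection pressure steady when fitness values are close together.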
Stochastic universal sampling provides zero bias and minimum spread. The individuals are
mapped to contiguous segments of a line, such that each individual's segment is equal in size
to its fitness, exactly as in Roulette wheel selection. Here equally spaced pointers are placed
over the line, as many as there are individuals to be selected.
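A minimal sketch of stochastic universal sampling, assuming non-negative fitness values: one random offset places all the equally spaced pointers at once, and a single walk along the fitness line collects the individual under each pointer. The function name is hypothetical.

```python
import random

def stochastic_universal_sampling(population, fitness, n_select, rng=random):
    """Pick n_select individuals with equally spaced pointers over one fitness line."""
    values = [fitness(ind) for ind in population]
    spacing = sum(values) / n_select              # distance between adjacent pointers
    start = rng.uniform(0, spacing)               # one random offset for all pointers
    selected, idx, cumulative = [], 0, values[0]
    for k in range(n_select):
        pointer = start + k * spacing
        # Advance to the segment containing this pointer (guard against round-off).
        while cumulative < pointer and idx < len(values) - 1:
            idx += 1
            cumulative += values[idx]
        selected.append(population[idx])
    return selected

pop = ["A", "B", "C", "D"]
fit = {"A": 1, "B": 2, "C": 3, "D": 4}
picks = stochastic_universal_sampling(pop, fit.get, 10, random.Random(3))
print({name: picks.count(name) for name in pop})
```

With a total fitness of 10 and 10 pointers, each individual receives exactly as many copies as its fitness, which is the "minimum spread" property; roulette-wheel spins would only match this on average.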
Crossover is the process of taking two parent solutions and producing from them a child.
After the selection (reproduction) process, the population is enriched with better individuals.
Reproduction makes clones of good strings but does not create new ones. The crossover
operator is applied to the mating pool in the hope that it creates better offspring.
Crossover is a recombination operator that proceeds in three steps:
1. The reproduction operator selects at random a pair of individual strings for
mating.
2. A cross site is selected at random along the string length.
3. Finally, the position values are swapped between the two strings following the cross
site.
In single-point crossover, the two mating chromosomes are cut once at corresponding points
and the sections after the cuts are exchanged. Here, a cross site or crossover point is
selected randomly along the length of the mated strings, and the bits next to the cross site
are exchanged. If an appropriate site is chosen, better children can be obtained by combining
good parents; otherwise, string quality can be severely hampered.
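The three steps above reduce to a few slicing operations. In this sketch the cross site can be passed in explicitly (a convenience assumed here so the example is reproducible) or drawn at random strictly inside the string; `single_point_crossover` is a hypothetical name.

```python
import random

def single_point_crossover(parent1, parent2, site=None, rng=random):
    """Cut both parents at one cross site and swap everything after it."""
    if site is None:
        site = rng.randint(1, len(parent1) - 1)   # random cut strictly inside the string
    child1 = parent1[:site] + parent2[site:]
    child2 = parent2[:site] + parent1[site:]
    return child1, child2

print(single_point_crossover("10110010", "01111100", site=3))
# ('10111100', '01110010')
```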
In two-point crossover, two crossover points are chosen and the contents between these
points are exchanged between two mated parents.
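Two-point crossover differs only in exchanging the middle segment. The optional `sites` argument is an assumption added so the example is reproducible; otherwise two distinct cut points are drawn at random.

```python
import random

def two_point_crossover(parent1, parent2, sites=None, rng=random):
    """Exchange the segment lying between two cross sites."""
    if sites is None:
        a, b = sorted(rng.sample(range(1, len(parent1)), 2))
    else:
        a, b = sites
    child1 = parent1[:a] + parent2[a:b] + parent1[b:]
    child2 = parent2[:a] + parent1[a:b] + parent2[b:]
    return child1, child2

print(two_point_crossover("11011001", "00100111", sites=(2, 5)))
# ('11100001', '00011111')
```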
Originally, GAs used one-point crossover, which cuts two chromosomes at one point
and splices the two halves to create new ones. But with this one-point crossover, the head
In multipoint crossover there are two cases: an even number of cross sites and an odd
number of cross sites. With an even number of cross sites, the cross sites are
selected randomly around a circle and information is exchanged. With an odd
number of cross sites, a different cross point is always assumed at the beginning of the string.
In uniform crossover, each gene in the offspring is created by copying the corresponding gene
from one or the other parent, chosen according to a randomly generated binary crossover mask
of the same length as the chromosomes. Where there is a 1 in the crossover mask, the gene is
copied from the first parent, and where there is a 0 in the mask, the gene is copied from the
second parent. A new crossover mask is randomly generated for each pair of parents. In
the figure below, when producing child 1, where there is a 1 in the mask the gene is
copied from parent 1, otherwise it is copied from parent 2. When producing child 2, where
there is a 1 in the mask the gene is copied from parent 2, and where there is a 0 in the mask
the gene is copied from parent 1.
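The mask rule for both children can be sketched as follows. The parents and the mask are illustrative values, not the ones in the figure; when no mask is supplied, a fresh random one is generated for the pair, as the text requires.

```python
import random

def uniform_crossover(parent1, parent2, mask=None, rng=random):
    """Child 1 copies parent1 where the mask is 1, else parent2; child 2 does the opposite."""
    if mask is None:
        mask = [rng.randint(0, 1) for _ in parent1]   # new mask for each pair of parents
    child1 = [a if m == 1 else b for a, b, m in zip(parent1, parent2, mask)]
    child2 = [b if m == 1 else a for a, b, m in zip(parent1, parent2, mask)]
    return child1, child2

p1 = [1, 0, 1, 1, 0, 0, 1, 1]
p2 = [0, 0, 0, 1, 1, 1, 1, 0]
mask = [1, 1, 0, 1, 0, 1, 1, 0]
print(uniform_crossover(p1, p2, mask))
# ([1, 0, 0, 1, 1, 0, 1, 0], [0, 0, 1, 1, 0, 1, 1, 1])
```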
In three-parent crossover, three parents are randomly chosen. Each bit of the first parent is
compared with the corresponding bit of the second parent. If both are the same, that bit is
taken for the offspring; otherwise, the bit from the third parent is taken for the offspring.
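The comparison rule is a one-line comprehension. The three parents below are illustrative; in a GA they would be drawn at random from the population.

```python
def three_parent_crossover(p1, p2, p3):
    """Keep the bit where p1 and p2 agree; elsewhere take the bit from p3."""
    return [a if a == b else c for a, b, c in zip(p1, p2, p3)]

print(three_parent_crossover([1, 1, 0, 1, 0, 0, 0, 1],
                             [0, 1, 1, 1, 1, 0, 0, 1],
                             [0, 1, 1, 0, 1, 1, 0, 0]))
# [0, 1, 1, 1, 1, 0, 0, 1]
```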
The reduced surrogate operator constrains crossover to always produce new individuals
wherever possible. This is implemented by restricting the location of crossover points such
that crossover points only occur where gene values differ.
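A sketch of the reduced surrogate idea applied to one-point crossover (the choice of one-point crossover as the base operator is an assumption): the cut is placed only just before a position where the parents' genes differ, so child 1 always differs from parent 1 and child 2 from parent 2.

```python
import random

def reduced_surrogate_crossover(parent1, parent2, rng=random):
    """One-point crossover restricted to cut points where the gene values differ."""
    sites = [k for k in range(1, len(parent1)) if parent1[k] != parent2[k]]
    if not sites:
        # Tails are identical: any one-point cut would just copy the parents.
        return parent1[:], parent2[:]
    k = rng.choice(sites)
    return parent1[:k] + parent2[k:], parent2[:k] + parent1[k:]

p1, p2 = "11011000", "11001011"   # genes differ at positions 3, 6 and 7
print(reduced_surrogate_crossover(p1, p2, random.Random(0)))
```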
Ordered two-point crossover is used when the problem is order based, for example in
assembly line balancing. Given two parent chromosomes, two random crossover points are
selected, partitioning them into left, middle and right portions. The ordered crossover
behaves in the following way: child 1 inherits its left and right sections from parent 1, and its
middle section is determined by the genes in the middle section of parent 1 in the order in
which the values appear in parent 2. A similar process is applied to determine child 2.
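The rule above can be sketched for permutation chromosomes (each gene appears once). The crossover points `a` and `b` are passed explicitly so the example is reproducible, and the parents are illustrative; child 2 is obtained by swapping the roles of the parents.

```python
def ordered_two_point_crossover(parent1, parent2, a, b):
    """Keep parent1's left [0:a] and right [b:] sections; fill the middle with
    parent1's middle genes, re-ordered as they appear in parent2."""
    middle = set(parent1[a:b])
    reordered = [g for g in parent2 if g in middle]
    return parent1[:a] + reordered + parent1[b:]

p1 = [4, 2, 1, 3, 6, 5]
p2 = [2, 3, 1, 4, 5, 6]
child1 = ordered_two_point_crossover(p1, p2, 2, 4)   # middle {1, 3} ordered as in p2
child2 = ordered_two_point_crossover(p2, p1, 2, 4)   # middle {1, 4} ordered as in p1
print(child1, child2)
# [4, 2, 3, 1, 6, 5] [2, 3, 4, 1, 5, 6]
```

Because only the order of the middle genes changes, each child is still a valid permutation, which is what order-based problems need.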
Partially matched crossover (PMX) can be applied usefully in the TSP. Indeed, TSP
chromosomes are simply sequences of integers, where each integer represents a different
city and the order represents the time at which a city is visited. Under this representation,
known as permutation encoding, we are only interested in labels and not alleles. PMX
proceeds as follows:
The matching section defines the position-wise exchanges that must take place in both
parents to produce the offspring. The exchanges are read from the matching section of one
chromosome to that of the other. In the example illustrated in Figure 6.23, the numbers that
exchange places are 5 and 2, 6 and 3, and 7 and 10.
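The PMX steps are not listed in these notes, so the sketch below uses the usual formulation: child 1 receives parent 2's matching section, and parent 1's remaining genes are copied in place, with any duplicate resolved by following the position-wise exchanges defined by the matching section. The parents are chosen (an assumption, not a copy of the figure) so that the matching section at positions 3 to 5 yields exactly the exchanges 5 and 2, 6 and 3, 7 and 10 mentioned above.

```python
def pmx(parent1, parent2, a, b):
    """Partially matched crossover; the matching section is positions [a:b]."""
    child = [None] * len(parent1)
    child[a:b] = parent2[a:b]            # take the other parent's matching section
    # Position-wise exchanges: gene in parent2's section -> gene in parent1's section.
    mapping = {parent2[i]: parent1[i] for i in range(a, b)}
    for i in list(range(a)) + list(range(b, len(parent1))):
        gene = parent1[i]
        while gene in mapping:           # gene already used: follow the exchanges
            gene = mapping[gene]
        child[i] = gene
    return child

A = [9, 8, 4, 5, 6, 7, 1, 3, 2, 10]
B = [8, 7, 1, 2, 3, 10, 9, 5, 4, 6]
print(pmx(A, B, 3, 6))   # [9, 8, 4, 2, 3, 10, 1, 6, 5, 7]
print(pmx(B, A, 3, 6))   # [8, 10, 1, 5, 6, 7, 9, 2, 4, 3]
```

Each child is again a valid tour: every city appears exactly once.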
After crossover, the strings are subjected to mutation. Mutation prevents the algorithm from
being trapped in a local minimum. Mutation plays the role of recovering lost genetic material
as well as randomly distributing genetic information. Mutation is viewed as a background
operator to maintain genetic diversity in the population. It introduces new genetic structures
into the population by randomly modifying some of its building blocks. Mutation helps the
search escape the trap of local minima and maintains diversity in the population. It also keeps
the gene pool well stocked, thus ensuring ergodicity. A search space is said to be ergodic if
there is a non-zero probability of generating any solution from any population state.
6.2.4.1 Flipping
6.2.4.2 Interchanging
Two random positions of the string are chosen and the bits corresponding to those positions
are interchanged.
6.2.4.3 Reversing
A random position is chosen and the bits next to that position are reversed to produce the
child chromosome.
The mutation probability decides how often parts of a chromosome will be mutated. If there is
no mutation, offspring are generated immediately after crossover (or directly copied) without
any change. If mutation is performed, one or more parts of a chromosome are changed. If the
mutation probability is 100%, the whole chromosome is changed; if it is 0%, nothing is changed.
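Flipping and interchanging can be sketched on bit lists. The flipping version inverts each bit independently with mutation probability `p_m`, a common form assumed here because the notes do not spell out the flipping procedure; it reproduces the 100% and 0% cases described above. Function names are hypothetical.

```python
import random

def flip_mutation(chrom, p_m, rng=random):
    """Flipping: invert each bit (0 <-> 1) independently with probability p_m."""
    return [1 - bit if rng.random() < p_m else bit for bit in chrom]

def interchange_mutation(chrom, rng=random):
    """Interchanging: swap the bits at two randomly chosen positions."""
    child = list(chrom)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

parent = [1, 0, 1, 1, 0, 1, 0, 1]
print(flip_mutation(parent, 1.0))   # p_m = 100 %: [0, 1, 0, 0, 1, 0, 1, 0]
print(flip_mutation(parent, 0.0))   # p_m = 0 %: unchanged
print(interchange_mutation(parent, random.Random(4)))
```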
1. Maximum generations: The GA stops when the specified number of generations has
evolved.
2. Elapsed time: The genetic process will end when a specified time has elapsed.
Note: If the maximum number of generations has been reached before the specified time has
elapsed, the process will end.
3. No change in fitness: The genetic process will end if there is no change to the population's
best fitness for a specified number of generations.
4. Stall generations: The algorithm stops if there is no improvement in the objective function
for a sequence of consecutive generations of length "Stall generations."
5. Stall time limit: The algorithm stops if there is no improvement in the objective function
during an interval of time in seconds equal to "Stall time limit."
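The stopping criteria listed above can be combined in one small helper that is called once per generation. This is a sketch assuming a maximization problem; the class name `Terminator`, its default thresholds and the injectable `clock` are hypothetical. In this form criteria 3 and 4 coincide (both count generations without a better best fitness), and the generation limit is checked first, matching the note under criterion 2.

```python
import time

class Terminator:
    """Reports which stopping criterion (if any) has fired. Assumes maximization."""

    def __init__(self, max_generations=100, time_limit=60.0,
                 stall_generations=20, stall_time_limit=30.0, clock=time.monotonic):
        self.max_generations = max_generations
        self.time_limit = time_limit
        self.stall_generations = stall_generations
        self.stall_time_limit = stall_time_limit
        self.clock = clock
        self.start = clock()
        self.best = float("-inf")
        self.last_improved_gen = 0
        self.last_improved_time = self.start

    def update(self, generation, best_fitness):
        """Record this generation's best fitness; return a reason string or None."""
        now = self.clock()
        if best_fitness > self.best:
            self.best = best_fitness
            self.last_improved_gen = generation
            self.last_improved_time = now
        if generation >= self.max_generations:
            return "maximum generations"
        if now - self.start >= self.time_limit:
            return "elapsed time"
        if generation - self.last_improved_gen >= self.stall_generations:
            return "stall generations"
        if now - self.last_improved_time >= self.stall_time_limit:
            return "stall time limit"
        return None

fake_now = [0.0]   # frozen clock so the example is deterministic
rule = Terminator(max_generations=50, stall_generations=5, clock=lambda: fake_now[0])
for gen in range(1, 51):
    reason = rule.update(gen, best_fitness=min(gen, 3))   # stops improving at gen 3
    if reason:
        print(gen, reason)   # 8 stall generations
        break
```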
A best individual convergence criterion stops the search once the minimum fitness in the
population drops below the convergence value. This brings the search to a faster conclusion,
guaranteeing at least one good solution.
The worst individual criterion terminates the search when the least fit individuals in the
population have fitness less than the convergence criterion. This guarantees that the entire
population is of a minimum standard, although the best individual may not be significantly
better than the worst. In this case, a stringent convergence value may never be met, in which
case the search will terminate after the maximum number of generations has been exceeded.
In this termination scheme, the search is considered to have satisfactorily converged when the
sum of the fitness in the entire population is less than or equal to the convergence value in
the population record. This guarantees that virtually all individuals in the population will be
within a particular fitness range, although it is better to pair this convergence criterion with
weakest gene replacement; otherwise, a few unfit individuals in the population will blow out
the fitness sum. The population size has to be considered while setting the convergence
value.
Here at least half of the individuals will be better than or equal to the convergence value,
which should give a good range of solutions to choose from.
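The four convergence tests above can be sketched as predicates over the population's fitness values, read as costs to be minimized (as the text's "drops below the convergence value" implies). Treating the last criterion as a median test and the function names are assumptions for illustration.

```python
import statistics

def best_individual_converged(costs, threshold):
    """Best individual: stop once the minimum value drops below the threshold."""
    return min(costs) < threshold

def worst_individual_converged(costs, threshold):
    """Worst individual: stop once even the least fit (largest) value is below it."""
    return max(costs) < threshold

def sum_of_fitness_converged(costs, threshold):
    """Sum of fitness: stop once the population total is <= the threshold."""
    return sum(costs) <= threshold

def median_fitness_converged(costs, threshold):
    """At least half of the individuals are at or below the threshold."""
    return statistics.median_low(costs) <= threshold

costs = [0.5, 2.0, 3.0, 4.0]
print(best_individual_converged(costs, 1.0),    # True
      worst_individual_converged(costs, 1.0),   # False
      sum_of_fitness_converged(costs, 10.0),    # True
      median_fitness_converged(costs, 2.0))     # True
```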
BPN (back-propagation network) training is a method of teaching multi-layer neural networks
how to perform a given task. Learning occurs during this training phase.
1. BPNs do not have the ability to recognize new patterns; they can recognize only patterns
similar to those they have learnt.
2. They must be sufficiently trained so that enough general features applicable to both seen
and unseen instances can be extracted; there may be undesirable effects due to
overtraining the network.
6.3.2.1 Coding
Assume a BPN configuration n-l-m, where n is the number of neurons in the input layer, l is
the number of neurons in the hidden layer and m is the number of output layer neurons. The
number of weights to be determined is given by
$(n + m)\,l$
Each weight (which is a gene here) is a real number. Let d be the number of digits (gene
length) in a weight.
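As a sketch of this coding, the snippet below counts the weights of an n-l-m network, builds a chromosome of (n + m)·l genes of d digits each, and decodes every gene into a weight. The notes do not reproduce the digit-to-weight formula, so the decoding assumes one sign-digit scheme: a first digit of 5 to 9 means positive and 0 to 4 negative, and the remaining d - 1 digits are read as a number with d - 2 digits after the decimal point. Check this against the textbook before relying on it; all names are hypothetical.

```python
def count_weights(n, l, m):
    """Weights in an n-l-m BPN (biases ignored): (n + m) * l."""
    return (n + m) * l

def decode_weights(chromosome, d):
    """Split a digit string into genes of length d and decode each into a weight.

    Assumed scheme: first digit 5-9 -> positive, 0-4 -> negative; the remaining
    d - 1 digits are read as an integer and divided by 10**(d - 2).
    """
    weights = []
    for p in range(len(chromosome) // d):
        gene = chromosome[p * d:(p + 1) * d]     # digits a_{pd+1} ... a_{(p+1)d}
        sign = 1 if int(gene[0]) >= 5 else -1
        weights.append(sign * int(gene[1:]) / 10 ** (d - 2))
    return weights

n, l, m, d = 2, 2, 1, 5
L = count_weights(n, l, m) * d                    # chromosome length in digits: 30
chromosome = "84321" "03250" "59999" "40010" "71000" "12345"
print(L, decode_weights(chromosome, d))
```

With d = 5 each weight lies in (-10, 10) with three decimal places, so d directly controls the precision and range of the weights.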
In order to determine the fitness values, weights are extracted from each chromosome. Let
$a_1, a_2, \ldots, a_d, \ldots, a_L$ represent a chromosome, where $L = (n+m)\,l\,d$ is the
chromosome length, and let $a_{pd+1}, a_{pd+2}, \ldots, a_{(p+1)d}$ represent the $p$th gene
($p \ge 0$) in the chromosome.
where X and Y are the inputs and targets, respectively. Compute an initial population $I_0$ of
size $j$. Let $O_1^0, O_2^0, \ldots, O_j^0$ represent the $j$ chromosomes of the initial
population $I_0$. Let the weights extracted for each of the chromosomes up to the $j$th
chromosome be $w_1^0, w_2^0, w_3^0, \ldots, w_j^0$.
For n inputs and m outputs, let the calculated output of the considered
BPN be
The fitness function is then derived from this root mean square error as
$FF_n = \dfrac{1}{E_{rmse}}$
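The fitness FF = 1/E_rmse can be sketched as follows. Averaging the squared error over all training patterns and all output neurons is an assumption, since the notes do not reproduce the exact normalization; the function names are hypothetical, and a perfect fit (zero error) would need special handling.

```python
import math

def rmse(targets, outputs):
    """Root mean square error over all training patterns and output neurons."""
    errors = [(t - o) ** 2
              for t_row, o_row in zip(targets, outputs)
              for t, o in zip(t_row, o_row)]
    return math.sqrt(sum(errors) / len(errors))

def bpn_fitness(targets, outputs):
    """FF = 1 / E_rmse: a smaller error gives a fitter chromosome."""
    return 1.0 / rmse(targets, outputs)

# Two training patterns, one output neuron each.
print(bpn_fitness([[1.0], [0.0]], [[0.5], [0.5]]))   # 2.0
```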
In this process, before the parents produce offspring with better fitness, the mating pool
has to be formed. This is accomplished by discarding the chromosome with minimum
fitness and replacing it with a chromosome having maximum fitness. In other words, the
fittest individuals among the chromosomes are given more chances to participate in the
generations, and the worst individuals are eliminated.
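The mating-pool step can be sketched directly: the least fit chromosome is replaced by a copy of the fittest one. Replacing exactly one chromosome per generation follows the description above; the function name is hypothetical.

```python
def form_mating_pool(population, fitness):
    """Replace the least fit chromosome with a copy of the fittest one."""
    pool = list(population)
    worst = min(range(len(pool)), key=lambda i: fitness(pool[i]))
    best = max(range(len(pool)), key=lambda i: fitness(pool[i]))
    pool[worst] = pool[best]
    return pool

print(form_mating_pool(["a", "b", "c"], {"a": 0.2, "b": 0.9, "c": 0.5}.get))
# ['b', 'b', 'c']
```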
6.3.2.5 Convergence
Convergence of a genetic algorithm refers to the number of generations over which the
fitness value increases towards the global optimum. Convergence is the progression towards
increasing uniformity: when about 95% of the individuals in the population share the same
fitness value, we say that the population has converged.
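The 95% rule can be checked by counting the most common fitness value. This sketch compares fitness values for exact equality, which suits discrete fitness; with real-valued fitness one might round first. Using an integer percentage avoids floating-point trouble at the boundary; the function name is hypothetical.

```python
from collections import Counter

def has_converged(fitness_values, share_percent=95):
    """True when at least share_percent % of the population has the same fitness."""
    most_common_count = Counter(fitness_values).most_common(1)[0][1]
    return 100 * most_common_count >= share_percent * len(fitness_values)

print(has_converged([0.8] * 19 + [0.5]))        # True: 19 of 20 is 95 %
print(has_converged([0.8] * 18 + [0.5, 0.4]))   # False: 18 of 20 is 90 %
```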
For modeling complex systems in which classical tools are unsuccessful because the systems
are complex or imprecise, fuzzy rule based systems have been identified as an important tool.
The main objectives of optimization in fuzzy rule based systems are as follows:
1. The task of finding an appropriate knowledge base (KB) for a particular problem. This is
equivalent to parameterizing the fuzzy KB (rules and membership functions).
2. To find those parameter values that are optimal with respect to the design criteria.
Considering a GFRBS (genetic fuzzy rule based system), one has to decide which parts of the
knowledge base (KB) are subject to optimization by the GA. The KB of a fuzzy system is not a
homogeneous structure but the union of qualitatively different components.
The task of tuning the scaling functions and fuzzy membership functions is important in
FRBS design.
The universes of discourse on which the fuzzy membership functions are defined are
normalized by scaling functions applied to the input and output variables of FRBSs.
In the linear case, the scaling functions are parameterized by a single scaling factor or
by specifying a lower and an upper bound. In the non-linear case, the scaling
functions are parameterized by one or several contraction/dilation parameters.
These parameters are adapted such that the scaled universe of discourse matches the
underlying variable range.
For FRBSs of the descriptive (using linguistic variables) or the approximate (using
fuzzy variables) type, the structure of the chromosome is different. In the process of
tuning the membership functions in a linguistic model, the entire fuzzy partitions are
encoded into the chromosome, and they are adapted globally in order to maintain the
global semantics of the RB (rule base).
When considering a rule based system and focusing on learning rules, there are three main
approaches that have been applied in the literature:
1. Pittsburgh approach.
2. Michigan approach.
3. Iterative rule learning approach
The Pittsburgh approach is characterized by representing an entire rule set as a genetic code
(chromosome), maintaining a population of candidate rule sets and using selection and
genetic operators to produce new generations of rule sets. The Michigan approach considers
a different model, where the members of the population are individual rules and a rule set is
represented by the entire population. In the third approach, the iterative one, chromosomes
code individual rules, and a new rule is adapted and added to the rule set, in an iterative
fashion, in every run of the genetic algorithm.
Problems