Soft Computing Notes (1)

Uploaded by

vidhi goyal
Copyright
© © All Rights Reserved

Soft Computing Notes

Unit -1

McCulloch Pitt's Model of Neuron

 The McCulloch-Pitts model of a neuron was developed by Warren McCulloch and Walter
Pitts in 1943.
 The McCulloch-Pitts model accepts only Boolean (0 or 1) inputs.
 In the original McCulloch-Pitts model the inputs are not individually weighted and nothing
is learned, which means that this model is not flexible.

Architecture
 Inputs: The neuron receives multiple input signals (like 1s and 0s).
 Weights: In the generalized form used in the example below, each input has a fixed weight that sets the importance of that input.
 Summation: All the weighted inputs are added together to get a total value.
 Threshold: The neuron has a fixed number called a threshold.
 Activation: If the total value is equal to or greater than the threshold, the neuron fires
(outputs 1). If it is less, the neuron does not fire (outputs 0).
 Output: The result is either 1 (active) or 0 (inactive), like a yes/no decision.

Example
John carries an umbrella if it is sunny or if it is raining. There are four given situations. I need to
decide when John will carry the umbrella. The situations are as follows:
 First scenario: It is neither raining nor sunny
 Second scenario: It is not raining, but it is sunny
 Third scenario: It is raining, and it is not sunny
 Fourth scenario: It is raining as well as it is sunny
To analyse the situations using the McCulloch-Pitts neural model, I can consider the input
signals as follows:
 X1: Is it raining?
 X2: Is it sunny?
So, the value of each input can be either 0 or 1. We can set the weights of both X1 and X2 to 1
and the threshold to 1. The neuron then computes ysum = x1 + x2 and outputs yout = 1 when
ysum ≥ 1, otherwise 0.

Truth Table for this case will be:

Situation | x1 | x2 | ysum | yout
1 | 0 | 0 | 0 | 0
2 | 0 | 1 | 1 | 1
3 | 1 | 0 | 1 | 1
4 | 1 | 1 | 2 | 1

From the truth table, we can conclude that John needs to carry an umbrella in the situations
where the value of yout is 1, that is, in scenarios 2, 3 and 4.
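
The umbrella example can be checked with a few lines of code. This is a minimal sketch, assuming weights of 1 and a threshold of 1 as chosen above; the function name mp_neuron is only an illustrative choice.

```python
def mp_neuron(inputs, weights, threshold):
    """Return (ysum, yout) for a McCulloch-Pitts style threshold unit."""
    y_sum = sum(w * x for w, x in zip(weights, inputs))
    y_out = 1 if y_sum >= threshold else 0  # fire only if the sum reaches the threshold
    return y_sum, y_out

# The four scenarios as (x1 = raining?, x2 = sunny?)
scenarios = [(0, 0), (0, 1), (1, 0), (1, 1)]
results = [mp_neuron(x, weights=(1, 1), threshold=1) for x in scenarios]

for (x1, x2), (y_sum, y_out) in zip(scenarios, results):
    print(x1, x2, y_sum, y_out)
```

Raising the threshold to 2 in the same function would turn the OR behaviour into AND, which is how the model is used to design simple logic networks.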

Hebbian Learning Rule


 It is one of the earliest learning rules, proposed by Donald Hebb in 1949.
 It is an unsupervised learning rule.
 It states that if two neurons fire together, the connection between them becomes stronger.
 It specifies how to modify the weights between the nodes of a network.
Mathematical Representation:
ΔW=η×xi×yj
Where:
 ΔW = Change in synaptic weight
 η = Learning rate (a small positive constant)
 xi = Input from neuron i
 yj = Output from neuron j
The rule implies the following:
 If two neighbor neurons are operating in the same phase at the same time, then the weight
between them should increase.
 For neurons operating in the opposite phase, the weight between them should decrease.
 If there is no signal correlation, the weight does not change.
 When inputs of both the nodes are either positive or negative, it results in a strong
positive weight.
 If the input of one node is positive and negative for the other, a strong negative weight is
present.
Example: Suppose neuron A is responsible for detecting light, and neuron B is responsible
for detecting a bell sound. If both neurons are activated at the same time repeatedly (like a light
flashing with a bell sound), their connection will become stronger. After some time, if the light
flashes, neuron B (which responds to the bell sound) might also activate because of the
strengthened connection, leading to an association between light and sound.
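
The update ΔW=η×xi×yj can be sketched for a single connection as follows. The bipolar signal values (+1, −1, 0) and the learning rate 0.1 are assumptions chosen only for illustration.

```python
def hebbian_update(w, x, y, eta=0.1):
    """Apply one Hebbian step: w <- w + eta * x * y."""
    return w + eta * x * y

# Same phase (both +1): the weight increases.
same_phase = hebbian_update(0.0, x=1, y=1)
# Opposite phase (+1 and -1): the weight decreases.
opposite_phase = hebbian_update(0.0, x=1, y=-1)
# No input signal: the weight does not change.
no_signal = hebbian_update(0.5, x=0, y=1)

# Repeated co-activation (light together with bell) keeps strengthening the link.
w = 0.0
for _ in range(5):
    w = hebbian_update(w, x=1, y=1)

print(same_phase, opposite_phase, no_signal, round(w, 2))
```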

Delta learning rule


 It was developed by Bernard Widrow and Marcian Hoff.
 It is a supervised learning rule.
 It states that the modification in the synaptic weight of a node is equal to the product of
the error and the input, scaled by the learning rate.
 This rule is used to minimize the difference (or error) between the actual output and the
desired output of a neuron.
Mathematical Representation:
ΔW=η×(d−y)×x
Where:
1. ΔW = Change in synaptic weight
2. η = Learning rate (a small positive constant)
3. d = Desired (target) output
4. y = Actual output
5. x = Input value
Working Principle:
 If the actual output y is different from the desired output d, the weights are adjusted in
the direction that reduces the error.
 The amount of change is proportional to the difference between d and y (i.e., the
error) and the input x.

Example: Suppose a neural network is trying to predict whether an image is a cat or not (binary
output, 1 or 0). The actual output y of the neuron might be 0.8, but the desired output d is 1. The
Delta rule will adjust the weights so that the network learns to output a value closer to 1 the next
time the same input is given.
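
One update step for the cat example can be sketched as below. The single linear neuron (y = w × x), the input x = 1 and the learning rate 0.1 are assumptions made so that the starting output is the 0.8 used in the text; they are not part of the original notes.

```python
def delta_rule_update(w, x, d, y, eta=0.1):
    """Apply one Delta-rule step: w <- w + eta * (d - y) * x."""
    return w + eta * (d - y) * x

w, x, d = 0.8, 1.0, 1.0      # hypothetical starting weight, input and target
y = w * x                    # actual output before the update (0.8)

new_w = delta_rule_update(w, x, d, y)
new_y = new_w * x            # output for the same input after the update

print(round(new_w, 4), round(new_y, 4))
```

The error shrinks from 0.2 to about 0.18 after one step; repeating the update moves the output progressively closer to the target.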

Types of Learning
Supervised Learning:
 It happens in the presence of a supervisor (teacher).
 Here the model is trained on labeled data, where both the input and the corresponding
correct output are provided.
 The network learns to map inputs to outputs by adjusting weights based on the error
between predicted outputs and actual outputs.
Advantages:
 It is a fast learning mechanism.
 It helps to predict outputs based on prior experience.
 Can provide high accuracy when enough labeled data is available.
Disadvantages:
 You might not be able to solve complex problems.
 Requires a lot of computational time to train the algorithm.
Example: Image classification, where the model is trained with images of cats and dogs labeled
as "cat" or "dog."
Other examples are image segmentation, medical diagnosis, spam detection, fraud detection,
speech recognition.

Unsupervised Learning:
 It happens in the absence of a supervisor.
 Here the model is trained on data without any labels. The algorithm tries to find patterns
or structures in the input data.
 The network learns to group similar data points or reduce dimensionality without
guidance on what the output should be.

Advantages:
 It reduces effort of labelling data.
 It provides accurate results.
Disadvantages:
 Requires a lot of computational time to train algorithms.
 May be difficult to predict quality of model output.
Example: Clustering customer data for market segmentation. The model identifies groups of
similar customers based on purchasing behavior without any predefined categories. Other
applications are – network analysis, recommendation system etc.

Reinforcement Learning
 It learns from feedback and past experiences.
 It is a long-term iterative process.
 It is usually modeled as a Markov Decision Process (MDP).
 It does not learn from a fixed labelled or unlabelled dataset; the agent gathers its own
experience by interacting with the environment.
Advantages:
 Helps us to solve complex problems in the real world.
 Can give very good results on sequential decision-making problems.
Disadvantages:
 It requires a huge amount of data and computation.
 It is not preferred for simpler problems.
Example: Training a robot to navigate a maze. The robot receives positive rewards for reaching
the goal and negative penalties for hitting walls. It learns the best path through trial and error.

Artificial Neural Network (ANN)


ANN is a system of connected nodes (neurons) that mimics how the human brain processes
information.

Architecture
 Input Layer: This is the first layer of ANN. It takes raw data directly, such as image pixels
or numerical data.
 Hidden Layer: There are one or more layers present between input and output layer. They
are responsible for processing and extracting features from data.
 Output Layer: It is the final layer that provides the prediction or result. The number of
neurons in this layer depends on the type of problem.
Characteristics
 Learning Ability: Can learn from data and improve over time.
 Adaptability: Can adjust to new data by changing weights.
 Parallel Processing: Can process many calculations at the same time.
 Non-linear: Can handle complex relationships between inputs and outputs.
 Black Box: Hard to understand exactly how decisions are made.
Merits (Advantages)
 Good at finding patterns and making predictions.
 Works for many types of problems (e.g., images, text).
 Learns without needing a lot of human instructions.
Demerits (Disadvantages)
 It takes a lot of time and data to train.
 Hard to understand how it makes decisions.
 Need powerful computers and a lot of energy.
Applications
Image Recognition, Speech Recognition, Medical Diagnosis

Difference Between Biological Neuron and Artificial Neuron


Feature | Biological Neuron | Artificial Neuron
Origin | Found in the human brain and nervous system. | Created as a part of computer science and AI.
Structure | Complex with dendrites, soma (cell body), axon, and synapses. | Simplified structure with input nodes, weights, and output.
Communication | Uses electrical and chemical signals for communication. | Uses numerical values and mathematical functions for processing.
Signal Processing | Processes signals using neurotransmitters and ion channels. | Processes inputs using weighted sums and activation functions.
Adaptability | Can grow new connections and change structure over time. | Can only update weights and biases during training but cannot change structure automatically.
Energy Source | Powered by biochemical energy from the body. | Powered by electricity in computer systems.
Speed | Processes at a slower rate, in milliseconds. | Processes at a much faster rate (in microseconds).
Learning | Learns and adapts through experiences and synaptic plasticity. | Learns from data using algorithms like backpropagation.
Complexity | Highly complex, capable of very sophisticated processing and decision-making. | Simpler and limited to tasks it has been trained for.
Connections | Has thousands of connections with other neurons (synapses). | Limited number of connections, defined by the network design.
Non-linearity | Naturally handles complex, non-linear interactions. | Uses artificial activation functions to model non-linear interactions.

Difference Between Artificial Neural Network (ANN) and Human Brain


Feature | Human Brain | Artificial Neural Network (ANN)
Structure | Composed of billions of neurons interconnected with trillions of synapses. | Composed of artificial neurons organized into layers with defined connections.
Learning Method | Learns through experiences, complex feedback mechanisms, and synaptic plasticity. | Learns from data using algorithms like backpropagation and gradient descent.
Adaptability | Highly adaptable with the ability to grow new connections and reorganize itself. | Limited to adjusting weights and biases during training; structure stays fixed unless reprogrammed.
Complexity | Extremely complex with many layers of interconnected processing and self-regulation. | Simplified model of the brain, limited by the number of layers and neurons defined by the programmer.
Energy Efficiency | Highly energy-efficient, using only about 20 watts for all brain functions. | Consumes significant power for training and running, especially for deep learning models.
Parallel Processing | Capable of true parallel processing on a massive scale. | Simulates parallel processing but is restricted by hardware and software capabilities.
Speed | Processes at a slower rate for individual neuron firing but can handle multiple tasks simultaneously. | Can process data very quickly due to computer hardware but is focused on one task at a time.
Learning Capacity | Learns from minimal examples, generalizes easily, and can infer from incomplete data. | Often requires large amounts of data to learn effectively and struggles with generalization without training.
Memory Storage | Stores information through a complex combination of neuron connections and biochemical processes. | Stores information as numerical weights in its architecture.
Flexibility | Can switch between various cognitive tasks seamlessly (e.g., reasoning, emotional response, creativity). | Designed for specific tasks (e.g., image recognition, language processing) and lacks versatility without reprogramming.
Error Tolerance | Can adapt and work effectively even with damaged parts, showing resilience and plasticity. | Performance degrades if certain parts of the network are damaged or misconfigured; not self-repairing.
Emotion and Consciousness | Has emotions, consciousness, and self-awareness. | Lacks any form of emotion, consciousness, or self-awareness.

Feedforward Network
A feedforward neural network is a type of neural network where information flows in one
direction, from the input layer to the output layer, without cycles or loops.
In a Feedforward Neural Network, data is passed through a series of layers:
 Input Layer: Receives the initial data.
 Hidden Layers: Process the data received from the input layer. These layers can be one
or more, each consisting of neurons that apply activation functions to their inputs.
 Output Layer: Produces the final output.
The data flows in one direction, from the input layer to the output layer, without any feedback
loops.

Types of feedforward network


Single layer feedforward network
 It is a neural network with only one layer of connections between the input and output.
 Can only solve linearly separable problems (e.g., simple classification tasks).
 Structure:
 Input Layer: Receives data inputs but does not process them.
 Output Layer: Processes inputs and produces the final output.

Multilayer feedforward network


 It is a network with one or more hidden layers between the input and output layers.
 Can solve both linearly and non-linearly separable problems.
 Structure:
 Input Layer: Receives input data.
 Hidden Layers: One or more layers that process inputs using weights, biases, and
activation functions to capture complex patterns.
 Output Layer: Produces the final output.
Feedback Neural Networks
 These networks have connections that loop back, allowing information to be fed
back into the network.
 This structure enables them to handle sequential data and temporal dependencies.
 They have the ability to maintain a state that captures information about previous
inputs.
Architecture:
It consists of three kinds of layers: an input layer, hidden layers and an output layer.
 Input Layer: Receives the input data.
 Hidden Layers: Contain neurons with recurrent connections that maintain a state
over time.
 Output Layer: Produces the final output based on the processed information.

The recurrent connections allow feedback networks, also called recurrent neural networks
(RNNs), to maintain a memory of previous inputs, which is crucial for tasks involving
sequential data.
Applications:
NLP, Time series prediction, Speech recognition, handwriting recognition.
Advantages:
 Can handle dynamic systems
 It is good for sequential tasks.
 Suitable for problems like speech recognition, time series prediction, and control
systems.
Disadvantages:
 Requires a lot of time to train the network.
 It is more complex in nature.
 Requires more computational resources.

Linear Separability and XOR Problem


Linear Separability
 Definition: Linear separability refers to the ability of a dataset to be perfectly divided
into classes using a straight line (in 2D), a plane (in 3D), or a hyperplane (in higher
dimensions).
 Example: Consider a 2D dataset with points labeled as Class A and Class B:
 If you can draw a single straight line that separates all points of Class A from all
points of Class B, then the dataset is linearly separable.
 Use Case: Simple classification tasks where data can be divided into distinct classes
using a linear boundary (e.g., classifying points in a plane as either above or below a
certain line).
Visual Representation
 Linearly Separable: Data points that can be separated with a straight line.
 Not Linearly Separable: Data points that require a more complex boundary (e.g., a
curve) to separate them.

XOR Problem
 Definition: The XOR (exclusive OR) problem is a classic example in machine learning
that demonstrates the limitation of single-layer perceptrons.
 Explanation:
 The XOR function outputs 1 if the inputs are different (e.g., (0, 1) or (1, 0)),
and 0 if the inputs are the same (e.g., (0, 0) or (1, 1)).
 The XOR problem is not linearly separable because you cannot draw a single
straight line to separate the 1s from the 0s in a 2D plane.
XOR Problem Example
Input 1 | Input 2 | XOR Output
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
 Visualization:
 Points (0, 0) and (1, 1) represent output 0.
 Points (0, 1) and (1, 0) represent output 1.
 You cannot draw a straight line to separate points with output 1 from those with
output 0.
Why XOR is Important
 Limitations of Single-Layer Perceptron:
 A single-layer perceptron (linear classifier) cannot solve the XOR problem as it
cannot model non-linear relationships.
 Solution with Multi-Layer Networks:
 Multi-Layer Perceptrons (MLPs), with at least one hidden layer, can solve the
XOR problem.
 Activation Functions in hidden layers allow MLPs to model non-linear decision
boundaries, enabling them to classify data like XOR accurately.
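
A two-layer threshold network that reproduces the XOR table can be sketched as follows. The weights are hand-picked for illustration (hidden unit 1 acts as OR, hidden unit 2 as NAND, and the output unit as AND); they are one possible choice, not values from the notes and not learned by training.

```python
def step(z):
    """Threshold activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def xor_mlp(x1, x2):
    """XOR built from one hidden layer of two threshold units."""
    h1 = step(x1 + x2 - 0.5)      # OR:   fires unless both inputs are 0
    h2 = step(-x1 - x2 + 1.5)     # NAND: fires unless both inputs are 1
    return step(h1 + h2 - 1.5)    # AND of the two hidden units

outputs = [xor_mlp(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)
```

Each hidden unit draws one straight line in the input plane; the output unit keeps only the region between the two lines, which is exactly where the XOR output is 1.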

Explain the various activation functions used in neural network


Activation functions are essential in neural networks as they determine how the input data is
transformed and whether a neuron should be activated (or fired). Here’s a simple explanation of
various activation functions commonly used in neural networks:

1. Step Function

 Definition: This function outputs either 0 or 1 based on whether the input is above or
below a certain threshold.
 Formula: $f(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$
 Use: Simple binary classification tasks.
 Example: Deciding if an email is spam (1) or not spam (0).

2. Sigmoid Function

 Definition: This function maps any real-valued number to a value between 0 and 1,
creating an "S" shaped curve.
 Formula: $f(x) = \frac{1}{1 + e^{-x}}$
 Use: Good for models where we want to predict probabilities.
 Example: Output of a neuron representing the likelihood of a class.

3. Hyperbolic Tangent (Tanh) Function

 Definition: Similar to the sigmoid function, but it maps values to a range between -1 and
1.
 Formula: $f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
 Use: Preferred over sigmoid when outputs need to be zero-centered.
 Example: Used in hidden layers of neural networks for better performance.

4. ReLU (Rectified Linear Unit)


 Definition: This function outputs the input directly if it is positive; otherwise, it outputs
zero.
 Formula: $f(x) = \max(0, x)$
 Use: Commonly used in hidden layers of deep networks.
 Example: Helps models learn complex patterns without saturating like sigmoid.

5. Leaky ReLU

 Definition: A variation of ReLU that allows a small, non-zero gradient when the input is
negative.
 Formula: $f(x) = \begin{cases} x & \text{if } x \ge 0 \\ \alpha x & \text{if } x < 0 \end{cases}$ (where $\alpha$ is a small constant, like 0.01)
 Use: Addresses the "dying ReLU" problem where neurons become inactive.
 Example: Helps maintain some output even for negative inputs.

6. Softmax Function

 Definition: This function converts a vector of raw scores (logits) into probabilities that
sum to 1.
 Formula: $f(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$ (for each element $x_i$ in the input vector)
 Use: Used in the output layer for multi-class classification problems.
 Example: Classifying images into multiple categories (like cat, dog, bird).
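
The six functions above can be written out directly from their formulas. This is a minimal sketch using only the standard library; subtracting the maximum inside softmax is a common numerical-stability measure added here as an assumption, and it does not change the result.

```python
import math

def step(x):
    return 1 if x >= 0 else 0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    return x if x >= 0 else alpha * x

def softmax(xs):
    m = max(xs)                               # shift for numerical stability
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])              # e.g. scores for cat, dog, bird
print(step(-1), sigmoid(0), tanh(0), relu(-2), leaky_relu(-2), probs)
```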

McCulloch Pitts is a neuron model to design logic networks for AND and OR Logic
functions.
Unit -2

Perceptron Training Algorithm


A perceptron is a neural network unit (an artificial neuron) that performs computations to
detect features in the input data.
The perceptron learning algorithm adjusts weights based on errors from each training
example to minimize those errors over time.
Perceptron Training Algorithm with Bias
 Initialize the weights w1, w2, ..., wn and the bias b to small random
values or zeros.
 Set a learning rate η (a small positive constant, e.g., 0.1).
 For each training example (x, y):
 Compute the weighted sum z = w1x1 + w2x2 + ... + wnxn + b
 Apply the step activation function to obtain the prediction y^.
 Update weights and bias if the prediction y^ is incorrect:
wi = wi + η×(y−y^)×xi for each i, and b = b + η×(y−y^).
 Repeat the process for a number of epochs or until the weights converge (i.e., the
error rate is sufficiently low).

Perceptron Training Algorithm Without Bias


 Initialize the weights w1,w2,...,wn without a bias term to small random values or
zeros.
 Set a learning rate η.
 For each training example (x,y):
 Compute the weighted sum z=w1x1+w2x2+...+wnxn
 Apply the step activation function
 Update weights if the prediction y^ is incorrect:
wi = wi + η×(y−y^)×xi for each i
 Continue this process for multiple epochs or until the weights stabilize.

Example with Bias


Training Data (for an AND gate):
x1 | x2 | y (Target)
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1
Initialization:
 Weights w1=0.1, w2=0.1, bias b=0.1, learning rate η=0.1.
Training Step:
 For x=(1,1), y=1:
 Compute z=0.1⋅1+0.1⋅1+0.1=0.3 → y^=1 (correct).
 For x=(0,1), y=0:
 Compute z=0.1⋅0+0.1⋅1+0.1=0.2→ y^=1 (incorrect).
 Update: w2=0.1+0.1×(0−1)×1=0, b=0.1+0.1×(0−1)=0.
Repeat this process for each example until weights and bias converge.
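
The full training loop for this AND example can be sketched as below, starting from the same initial values (w1 = w2 = 0.1, b = 0.1, η = 0.1). The order in which examples are visited and the cap of 100 epochs are assumptions; the intermediate weights therefore need not match the two hand-computed steps above, which visit the examples in a different order.

```python
def step(z):
    return 1 if z >= 0 else 0

def predict(w, b, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_perceptron(data, w, b, eta=0.1, max_epochs=100):
    """Perceptron rule with bias; stops after an epoch with no mistakes."""
    w = list(w)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            y_hat = predict(w, b, x)
            if y_hat != y:
                mistakes += 1
                w = [wi + eta * (y - y_hat) * xi for wi, xi in zip(w, x)]
                b = b + eta * (y - y_hat)
        if mistakes == 0:
            break
    return w, b

and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_data, w=(0.1, 0.1), b=0.1)
predictions = [predict(w, b, x) for x, _ in and_data]
print(w, b, predictions)
```

Because the AND data is linearly separable, the loop ends with every example classified correctly.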

Example Without Bias


Training Data (for an AND gate, same as above):
 The process is the same, but the computation excludes the bias term. This may lead to a
different set of weights or require additional modifications, such as adding constant
features, to work without bias effectively.

Radial Basis Function (RBF) Neural Networks


 A Radial Basis Function (RBF) neural network has one input layer, exactly one hidden
layer and one output layer, and is used primarily for function approximation tasks.
 RBF networks are a special category of feed-forward neural networks.
 This neural network is used for classification, regression, interpolation, function
approximation, time series prediction.

Typical Radial Basis functions are:


 Gaussian RBF: It monotonically decreases with the distance from the center.
 $H(x) = e^{-(x-c)^2 / r^2}$, where c is the center and r is the radius.
 Multiquadric RBF: It monotonically increases with the distance from the center.
 $H(x) = \frac{\sqrt{r^2 + (x-c)^2}}{r}$

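The two functions can be evaluated at increasing distances from the center to see how each behaves. This is a minimal one-dimensional sketch; the center c = 0 and radius r = 1 are arbitrary illustrative values.

```python
import math

def gaussian_rbf(x, c, r):
    """exp(-(x - c)^2 / r^2)"""
    return math.exp(-((x - c) ** 2) / r ** 2)

def multiquadric_rbf(x, c, r):
    """sqrt(r^2 + (x - c)^2) / r"""
    return math.sqrt(r ** 2 + (x - c) ** 2) / r

c, r = 0.0, 1.0
distances = [0.0, 1.0, 2.0, 3.0]
gauss = [gaussian_rbf(x, c, r) for x in distances]
multi = [multiquadric_rbf(x, c, r) for x in distances]

for x, g, m in zip(distances, gauss, multi):
    print(x, round(g, 4), round(m, 4))
```

Both functions equal 1 at the center; the Gaussian values fall towards 0 as the input moves away, while the multiquadric values grow.
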
Working of RBF Neural Network:


1. Step 1: Input Layer: The input data is fed into the network. Each feature of the data is
passed to the neurons in the hidden layer.
2. Step 2: Hidden Layer: This layer uses radial basis functions (usually a Gaussian
function). Each neuron in this layer calculates the distance between the input and a center
(a specific point). The closer the input is to the center, the higher the activation.
3. Step 3: Output Layer: The activated values from the hidden layer are passed to the output
layer, which produces the final result (e.g., classification or prediction).

Advantages:
 RBF networks are typically faster to train than multilayer networks trained with backpropagation.
 They are effective for problems where local patterns are important.

Disadvantages:
 The performance can degrade if there are outliers in the data.
 If every training point is used as a center when computing distances, the network can be
memory-intensive.

Example: Suppose you're trying to classify whether an email is spam or not. An RBF network
can learn the patterns in email content, where it focuses on the local features of the text (like the
frequency of specific words). The closer an email's features match a typical spam email, the
higher the activation, and the network can classify it as spam.

Adaline (Adaptive Linear Neuron):


 A single-layer neural network that works similarly to the perceptron but uses a linear
activation function instead of a step function.
 Adaline uses the Delta Rule (or Least Mean Squares rule)
 Adaline updates weights continuously based on how far its prediction is from the target
value.

Algorithm:
1. Initialize the weights w randomly and set the learning rate η.
2. Input the training data x and desired output d.
3. Calculate output: Compute the net input y=∑(wi×xi).
4. Compute error: Find the error e=d−y.
5. Update weights: Adjust the weights wi=wi+η×e×xi .
6. Repeat steps 3-5 for all training samples until the error is minimized (or a stopping
condition is met).
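
The steps above can be sketched as a short training loop. The three training samples are made up for illustration and follow the linear rule d = 2×x1 − 1×x2; weights start at zero instead of random values so that the run is reproducible.

```python
def train_adaline(samples, eta=0.1, epochs=200):
    """Delta (LMS) rule with a linear output: w <- w + eta * (d - y) * x."""
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x, d in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))   # linear net input, no step
            e = d - y                                  # error against the target
            w = [wi + eta * e * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical data generated by d = 2*x1 - 1*x2
samples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
w = train_adaline(samples)
print([round(wi, 3) for wi in w])
```

Unlike the perceptron, the weights are adjusted on every sample in proportion to the size of the error, not only when a thresholded prediction is wrong.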
Merits of Adaline:
 Simple and easy to implement.
 Effective for problems with linearly separable data.
Demerits of Adaline:
 Limited to linear classification tasks; cannot handle non-linear data well.
Example:
 Adaline: Think of Adaline as a model predicting the price of a house based on features
like size and location. It adjusts its prediction error gradually to get closer to the real
price.

Madaline (Multiple Adaline):


 A multi-layer version of Adaline that can handle more complex tasks by using multiple
neurons and layers.
 Madaline can solve more complex problems than single-layer Adaline because it
processes inputs in multiple layers.
 It can be used for pattern recognition and classification problems and uses threshold
logic at the output layer to make decisions.


Algorithm:
1. Initialize the weights for all layers and set the learning rate η.
2. Input the training data x.
3. Forward pass: Calculate the output for each Adaline unit in the hidden layer.
4. Threshold logic: Apply thresholding to determine the final output.
5. Compute error: Compare the final output with the desired output.
6. Update weights: Adjust the weights for all layers based on the error using the Madaline
Rule.
7. Repeat until the error is minimized or the network converges.
Merits of Madaline:
 Can solve more complex, non-linear problems.
 Suitable for tasks requiring multi-layer processing.
Demerits of Madaline:
 Training can be slower and more computationally expensive due to its multi-layer
structure.
 More complex than single-layer networks like Adaline or perceptron.
Example: predicting both house price and sales timing by considering more features and using
multiple layers of neurons to reach the final prediction.

Single Layer Perceptron Learning Algorithm


 A Single Layer Perceptron is a simple type of neural network used for binary
classification tasks.
 A single layer perceptron (SLP) is a feed-forward network based on a threshold
transfer function.
 It consists of one layer of output nodes connected directly to input features.

Algorithm:
 Initialize weights w and bias b randomly and set the learning rate η.
 Input training data x and desired output d.
 For each training example:
o Calculate net input: z = ∑(wi×xi) + b
o Apply activation function (step function): $y = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$
 Compute error: e = d − y
 If error exists (e ≠ 0):
o Update weights: wi = wi + η×e×xi
o Update bias: b = b + η×e
 Repeat the per-example steps (net input, activation, error, update) for all training
examples and continue for multiple epochs until convergence.
Advantages:
 Easy to understand and implement.
 Requires less computational power and time compared to more complex models.
Disadvantages:
 Can only solve problems that are linearly separable.
 Cannot model complex relationships between inputs and outputs

Example: Suppose you want to classify whether an email is spam (1) or not spam (0). The
perceptron learns from past emails (features like keywords and sender) and adjusts weights
based on whether its predictions are correct. Over time, it becomes better at making the right
predictions.
Multi-Layer Perceptron (MLP) Learning Algorithm
A Multi-Layer Perceptron (MLP) is a type of artificial neural network that has one or more
hidden layers between the input and output layers.
It can model complex relationships by learning non-linear patterns through backpropagation and
activation functions.

 Structure:
 Input Layer: Takes input data.
 Hidden Layers: One or more layers that process data using weights, biases, and
activation functions.
 Output Layer: Produces the final prediction or classification.
 Activation Functions: Common ones include ReLU, sigmoid, and tanh for hidden
layers to introduce non-linearity.
Algorithm for Training MLP
 Initialize weights and biases randomly.
 Pass input data through each layer to compute the output.
 Apply activation functions to introduce non-linearity.
 Compute the loss using a loss function.
 Calculate the gradient of the loss with respect to each weight and bias using the
chain rule.
 Propagate errors backward from the output layer to the input layer.
 Update weights and biases using a gradient descent optimizer.
 Repeat the forward and backward propagation steps for multiple epochs until the
model converges or reaches an acceptable level of accuracy.
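
The training steps can be sketched for a tiny network with 2 inputs, 2 hidden neurons and 1 output. Everything specific here is an assumption for illustration: sigmoid activations, the loss 0.5×(d − y)², the fixed starting weights, the learning rate 0.5 and the single training example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Forward pass: returns hidden activations h and output y."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(W1[j][i] * x[i] for i in range(2)) + b1[j]) for j in range(2)]
    y = sigmoid(sum(W2[j] * h[j] for j in range(2)) + b2)
    return h, y

def loss(params, x, d):
    return 0.5 * (d - forward(params, x)[1]) ** 2

def backward(params, x, d):
    """Gradients of the loss for every weight and bias, via the chain rule."""
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    delta_out = (y - d) * y * (1 - y)                        # error at the output
    delta_hid = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(2)]  # propagated back
    gW1 = [[delta_hid[j] * x[i] for i in range(2)] for j in range(2)]
    gW2 = [delta_out * h[j] for j in range(2)]
    return gW1, delta_hid, gW2, delta_out

def sgd_step(params, grads, eta):
    """Gradient descent update: parameter <- parameter - eta * gradient."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = grads
    return (
        [[W1[j][i] - eta * gW1[j][i] for i in range(2)] for j in range(2)],
        [b1[j] - eta * gb1[j] for j in range(2)],
        [W2[j] - eta * gW2[j] for j in range(2)],
        b2 - eta * gb2,
    )

initial = ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2], [0.7, -0.6], 0.05)
x, d = (1.0, 0.0), 1.0

params = initial
loss_before = loss(params, x, d)
for _ in range(100):                                         # repeat forward + backward
    params = sgd_step(params, backward(params, x, d), eta=0.5)
loss_after = loss(params, x, d)
print(loss_before, loss_after)
```

A real training run would loop over many examples and usually use a library optimizer; the sketch only shows how the forward pass, the backward pass and the update fit together.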
Advantages of MLP
 Can learn non-linear relationships and solve problems that are not linearly separable.
 Can be used for various tasks, including classification, regression, and pattern
recognition.
 Theoretically, an MLP with one hidden layer containing sufficient neurons can
approximate any continuous function.
Disadvantages of MLP
 Requires a lot of computational power and time.
 Requires a lot of labeled data.
 Hard to interpret the internal workings and decision-making process of the network.

Error Back Propagation

 Backpropagation (short for "Backward Propagation of Errors") is a method used to train
artificial neural networks.
 Its goal is to reduce the difference between the model’s predicted output and the actual
output by adjusting the weights and biases in the network.
Architecture
1. Input Layer: Receives input data in the form of feature vectors and passes data to the
first hidden layer without any computation.
2. Hidden Layers: It consists of one or more layers where computations take place. Each
neuron in a hidden layer receives inputs from all neurons in the previous layer, computes
a weighted sum, adds a bias, and applies an activation function (e.g., sigmoid, ReLU).
3. Output Layer: Consists of one or more neurons that produce the final output of the
network. The output is compared to the target output to compute the error.

Characteristics:
 It trains multi-layer neural networks.
 It reduces the difference between predicted and actual outputs by adjusting weights.
 It updates weights from the output layer back to the input layer.
 Needs sufficient training data for good performance.
Advantages:
 Requires no prior neural network knowledge.
 It is simple and flexible.
 The algorithm scales efficiently with larger datasets and more complex problems.
Disadvantages:
 It is a complex process.
 Difficult to train the network.
 Requires a lot of computational time to train the network.
Applications:
Image Recognition, Speech recognition, NLP, Medical Diagnosis etc.

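Algorithm: The steps are those listed under "Algorithm for Training MLP" above: forward pass,
loss computation, backward propagation of the error, and weight update.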

Multi-Layer Perceptron (MLP) with Linear Activation Function


A Multi-Layer Perceptron (MLP) is a type of artificial neural network that consists of multiple
layers: an input layer, one or more hidden layers, and an output layer.
Linear Activation Function:
A linear activation function is defined as:
f(x)=ax
where a is a constant (often a=1).
Proof:
MLP with Linear Activation Functions is Equivalent to a Single-Layer Perceptron
If all neurons in an MLP use a linear activation function, the network's output becomes a linear
transformation of the input, regardless of the number of layers.
1. Forward Propagation in MLP:
In a single-layer perceptron with input vector x and weight vector w: y=w⋅x+b,
where y is the output, and b is the bias.
In an MLP with multiple layers, each layer computes: z(l)=W(l)⋅a(l−1)+b(l), where:
z(l) is the pre-activation output of layer l,
W(l) and b(l) are the weights and biases for layer l,
a(l−1) is the activation from the previous layer.
2. Impact of Linear Activation:
 If each activation function in the MLP is linear (i.e., f(x)=x), then:
a(l)=z(l)=W(l)⋅a(l−1)+b(l)
 The output of the network after n layers is a nested series of matrix
multiplications and bias additions:
y = W(n)⋅(W(n−1)⋅(…(W(1)⋅x + b(1))…) + b(n−1)) + b(n)
3. Combining Linear Transformations:
 A composition of linear (affine) transformations is itself a linear (affine)
transformation: y = Wcombined⋅x + bcombined
 The combined weight matrix Wcombined and bias bcombined are the result of
multiplying and adding the weight matrices and biases from each layer. For two
layers, Wcombined = W(2)⋅W(1) and bcombined = W(2)⋅b(1) + b(2).
4. Conclusion:
 A deep MLP with only linear activation functions collapses to a single linear
transformation from input to output.
 Therefore, the network behaves like a single-layer perceptron and cannot model
non-linear relationships.
Conclusion:
Using linear activation functions in an MLP results in a model that is effectively the same as a
single-layer perceptron.
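The collapse can also be checked numerically. The following is a minimal sketch, assuming NumPy and hypothetical layer sizes (4, 5, 3, 2) with random weights; none of these values come from the notes. It folds the layers into one matrix and one bias and compares the two outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 4 inputs -> 5 hidden -> 3 hidden -> 2 outputs.
sizes = [4, 5, 3, 2]
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n_out) for n_out in sizes[1:]]


def linear_mlp(x):
    """Forward pass where every layer uses the identity activation f(x) = x."""
    a = x
    for W, b in zip(weights, biases):
        a = W @ a + b  # a(l) = z(l) = W(l) a(l-1) + b(l)
    return a


# Fold all layers into one affine map y = W_combined @ x + b_combined.
W_combined = np.eye(sizes[0])
b_combined = np.zeros(sizes[0])
for W, b in zip(weights, biases):
    W_combined = W @ W_combined
    b_combined = W @ b_combined + b

x = rng.normal(size=sizes[0])
same = np.allclose(linear_mlp(x), W_combined @ x + b_combined)
print(same)  # True: the three-layer network equals one affine map
```

Replacing the identity with a non-linear activation such as a sigmoid breaks this equality, which is why non-linear activations are needed to model non-linear relationships.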
Unit -3

Kohonen’s Self Organizing Feature Map


 Self-Organizing Map (or Kohonen Map or SOM) is a type of Artificial Neural Network
which was introduced by Teuvo Kohonen in the early 1980s.
 It follows an unsupervised learning approach and trains its network through a competitive
learning algorithm.
 SOM is used for clustering and for mapping multidimensional data onto a
lower-dimensional grid.
Architecture of SOM
SOM has two layers one is input layer and another one is output layer. It also includes weight
vectors.

 Input Layer: The input layer consists of neurons that receive the input data directly. The
input is n-dimensional vector representing a data point with n features.
 Output Layer: The output layer is a grid of neurons arranged in a 1D, 2D, or sometimes 3D
space. Each neuron in this grid acts as a cluster representative and is fully connected to the
input layer.
 Weight Vectors: Each neuron in the output layer has an associated weight vector of the same
dimension as the input data. These weights are adjusted during training to represent
clusters of similar input data points.

Training Algorithm for SOM:


 Set up a 2D grid of neurons with random initial weights.
 Provide the input data to the SOM one sample at a time.
 Calculate the distance between the input sample and each neuron’s weight. Select the neuron
with the smallest distance as the winning neuron (Best Matching Unit, BMU).
 Adjust the weights of the winning neuron and its neighboring neurons to move them closer to
the input vector.
 The size of the neighborhood of the winning neuron reduces over time to allow more precise
weight updates.
 Repeat the process for a set number of epochs or until convergence.
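The training steps above can be sketched as follows. This is a minimal illustration, assuming NumPy, a Gaussian neighborhood function, and linearly decaying learning rate and neighborhood radius; the function names and default parameters (`train_som`, `lr0`, `sigma0`, grid size 5x5) are hypothetical choices, not part of the notes.

```python
import numpy as np


def train_som(data, grid_shape=(5, 5), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a 2D SOM; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    weights = rng.random((rows, cols, data.shape[1]))  # random initial weights
    # Grid coordinates of every neuron, used to measure neighborhood distance.
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):  # one sample at a time
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays
            sigma = sigma0 * (1.0 - frac) + 1e-3     # neighborhood shrinks
            # Best Matching Unit: neuron whose weight vector is closest to x.
            dists = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighborhood centered on the BMU.
            grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=2)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            # Move the BMU and its neighbors closer to the input vector.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


def best_matching_unit(weights, x):
    """Grid position of the neuron closest to x."""
    dists = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(dists), dists.shape)


# Usage: two synthetic clusters in the unit square (made-up data).
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.2, 0.03, size=(50, 2)),
                  rng.normal(0.8, 0.03, size=(50, 2))]).clip(0.0, 1.0)
som = train_som(data)
print(best_matching_unit(som, data[0]), best_matching_unit(som, data[-1]))
```

After training, samples from the same cluster tend to map to the same or neighboring grid positions, which is the topology-preserving behavior described above.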
Advantages:
 Simplifies high-dimensional data visualization
 Effectively clusters similar data without supervision.
Disadvantages:
 Requires careful tuning of parameters.
 Grid size must be set in advance.
Applications:
Data Visualization, Image Compression, Robotics.

Flowchart of Kohonen Self Organizing Map


Compare and Contrast Counter Propagation Network and Kohonen Self
Organizing Map

Aspect | Counter Propagation Network (CPN) | Kohonen Self-Organizing Map (SOM)
Architecture | Three layers: input layer, Kohonen layer, and Grossberg layer. | Single-layer grid of neurons (Kohonen layer) fed by the input layer.
Training Type | Supervised and unsupervised phases. | Only unsupervised learning.
Input Processing | Processes inputs through both Kohonen and Grossberg layers. | Processes inputs through the Kohonen layer only.
Output Layer | Has an additional Grossberg output layer. | No separate output layer; the Kohonen grid itself maps inputs to neurons.
Learning Mechanism | Combines competitive learning (Kohonen) and supervised learning (Grossberg). | Uses competitive learning for mapping inputs.
Flowchart Steps | Input → Kohonen Layer → Grossberg Layer → Output. | Input → Kohonen Layer → Position mapped in neuron grid.
Main Goal | Maps input to target outputs with training. | Creates a topological representation of input data.
Application Focus | Used in classification and data association. | Used for clustering and visualization of data.
Complexity | More complex due to dual learning phases. | Simpler, only requires unsupervised training.
Data Structure | Input-output mapping structure. | Data clustering and pattern recognition structure.
Adaptation Ability | Can adapt to both classification and associative memory. | Adapts input data into clusters, preserving neighborhood.

Counter Propagation Network


 Counter propagation networks were proposed by Robert Hecht-Nielsen in 1987.
 They are multilayer networks based on the combinations of the input, output and
clustering layers.
 A Counter Propagation Network (CPN) combines features of both supervised and
unsupervised learning.

Architecture:
1. Input Layer: Takes in the input data vector. Each input neuron represents one feature of
the input.
2. Kohonen Layer (SOM): The first layer is an unsupervised competitive layer. Neurons in
this layer learn to cluster the input data and identify the winning neuron (Best Matching
Unit).
3. Grossberg Layer (Output Layer): This is a supervised learning layer that maps the
clustered representation from the Kohonen layer to the target outputs. It adjusts weights
based on the expected output for a given input.
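The three layers can be sketched in a few lines. This is a simplified, forward-only illustration, assuming NumPy, a winner-take-all Kohonen layer, and hypothetical learning rates `alpha` and `beta`; the function names are made up for this example and the update rules are one common textbook form, not necessarily the exact variant intended in these notes.

```python
import numpy as np


def train_cpn(X, Y, n_clusters=4, epochs=30, alpha=0.3, beta=0.3, seed=0):
    """Two-phase training: unsupervised Kohonen layer, then supervised Grossberg layer."""
    rng = np.random.default_rng(seed)
    # Kohonen weights start from randomly chosen training samples.
    idx = rng.choice(len(X), size=n_clusters, replace=False)
    kohonen = X[idx].astype(float)
    grossberg = np.zeros((n_clusters, Y.shape[1]))

    # Phase 1 (unsupervised): move the winning neuron toward each input.
    for _ in range(epochs):
        for x in X:
            j = np.argmin(np.linalg.norm(kohonen - x, axis=1))
            kohonen[j] += alpha * (x - kohonen[j])

    # Phase 2 (supervised): move the winner's output weights toward the target.
    for _ in range(epochs):
        for x, y in zip(X, Y):
            j = np.argmin(np.linalg.norm(kohonen - x, axis=1))
            grossberg[j] += beta * (y - grossberg[j])
    return kohonen, grossberg


def predict_cpn(kohonen, grossberg, x):
    """Input -> winning Kohonen neuron -> its Grossberg output weights."""
    j = np.argmin(np.linalg.norm(kohonen - x, axis=1))
    return grossberg[j]


# Usage: associate the four binary input patterns with XOR targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([[0.0], [1.0], [1.0], [0.0]])
kohonen, grossberg = train_cpn(X, Y)
for x in X:
    print(x, predict_cpn(kohonen, grossberg, x).round(3))
```

With one Kohonen neuron per input pattern, each pattern wins its own neuron and the Grossberg weights converge toward the matching target, so the network acts as a lookup table between clusters and outputs.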

Flowchart of Counter Propagation:

Advantages:
 It is well suited for networks that require a lot of training data and have multiple layers.
 Easy to implement due to its structured learning approach.
Disadvantages:
 It is very expensive.
 It can take a lot of computational time to train a network.
Applications:
 Data compression, especially for images and audio
 Function approximation
 Pattern association

Convolution Neural Networks


 Convolutional Neural Network (CNN) is the extended version of artificial neural
networks (ANN).
 It is predominantly used to extract features from grid-like data such as images.
CNN Architecture
Convolutional Neural Network consists of multiple layers like the input layer, Convolutional
layer, Pooling layer, and fully connected layers.

 Convolutional layer: The convolutional layer applies filters to the input image to create
feature maps. These filters help the network detect features like edges, shapes, and
textures.
 Pooling Layer: The pooling layer reduces the size of the feature maps, making the
network faster and reducing the amount of computation while retaining important
information.
 Fully Connected Layer: The fully connected layer at the end takes the learned features
and uses them to make a final prediction or classification.
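The three layer types can be illustrated on a tiny image. This is a minimal sketch, assuming NumPy, a hand-picked 1x2 edge filter, 2x2 max pooling, and random (untrained) fully connected weights; the helper names `conv2d_valid` and `max_pool2d` are hypothetical.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    This is cross-correlation, which is what CNN layers usually compute.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))


# 6x6 image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

kernel = np.array([[-1.0, 1.0]])             # responds to left-to-right brightening
feature_map = conv2d_valid(image, kernel)    # shape (6, 5), 1.0 along the edge
pooled = max_pool2d(feature_map)             # shape (3, 2), edge response kept

# Fully connected layer: flatten and multiply by (here random) weights.
rng = np.random.default_rng(0)
fc_weights = rng.normal(size=(2, pooled.size))
scores = fc_weights @ pooled.ravel()         # one score per hypothetical class
print(pooled)
```

The feature map is non-zero only where the filter straddles the edge, and pooling halves each dimension while keeping that response.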
Training Process:
1. Feed the input data (e.g., an image) into the network.
2. Pass the input through the convolutional, pooling, and fully connected layers to generate
an output (prediction).
3. Compare the predicted output to the actual target value using a loss function to determine
the error.
4. Propagate the error backward through the network (backpropagation).
5. This yields the gradients of the loss function with respect to the weights of the filters
and connections.
6. Adjust the weights in the network using the calculated gradients, a learning rate, and an
optimization algorithm such as gradient descent.
7. Continue feeding new input data and repeat the process for multiple iterations until the
model's performance improves and the error is minimized.
8. Stop training when a stopping condition is met.
Advantages of CNNs:
 Good at detecting patterns and features in images, videos, and audio signals.
 Can handle large amounts of data and achieve high accuracy.
Disadvantages of CNNs:
 Computationally expensive to train and require a lot of memory.
 Requires large amounts of labeled data.
Applications:
Image classification, image segmentation, object detection, speech recognition.
CNNs are well suited to large images because weight sharing and pooling reduce the number of
parameters and the dimensionality, focus on important features, and minimize computational
costs while preserving the ability to learn complex patterns.

Recurrent Neural Networks


 Recurrent Neural Networks (RNNs) were introduced in the 1980s by researchers David
Rumelhart, Geoffrey Hinton, and Ronald J. Williams.
 Recurrent Neural Networks introduce a mechanism where the output from one step is fed
back as input to the next, allowing them to retain information from previous inputs.
 The main and most important feature of RNN is Hidden state, which remembers some
information about a sequence.
Architecture of a Recurrent Neural Network:
 The nodes represent the “Neurons” of the network.
 The neurons are spread over the temporal scale (i.e., sequence) separated into three
layers.

The layers are:


 Input layer represents information to be processed.
 Hidden layer holds the hidden state and performs the computation at each time step.
 Output layer shows the result of the operation.
Hidden layer contains a temporal loop that enables the algorithm not only to produce an output
but to feed it back to itself.
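The temporal loop can be written as a short forward pass. This is a minimal sketch, assuming NumPy, a tanh activation, and hypothetical weight names (`W_xh`, `W_hh`, `W_hy`); it covers only the forward computation, not training.

```python
import numpy as np


def rnn_forward(inputs, W_xh, W_hh, W_hy, b_h, b_y):
    """Run a simple RNN over a sequence; returns all outputs and the final hidden state."""
    h = np.zeros(W_hh.shape[0])  # hidden state starts empty
    outputs = []
    for x_t in inputs:
        # New hidden state mixes the current input with the previous hidden state.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        outputs.append(W_hy @ h + b_y)
    return np.array(outputs), h


# Usage with random (untrained) weights: 3 input features, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.5, size=(4, 3))
W_hh = rng.normal(scale=0.5, size=(4, 4))
W_hy = rng.normal(scale=0.5, size=(2, 4))
b_h, b_y = np.zeros(4), np.zeros(2)

sequence = rng.normal(size=(5, 3))           # 5 time steps
outputs, last_hidden = rnn_forward(sequence, W_xh, W_hh, W_hy, b_h, b_y)
print(outputs.shape, last_hidden.shape)      # (5, 2) (4,)
```

Because the same weights are reused at every time step, the function accepts sequences of any length, and each output depends on the current input together with everything carried in the hidden state.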

Training Process:

1. Feed sequential data (e.g., words in a sentence) into the network.


2. Process the input at each time step and pass hidden state information to the next step.
3. Produce an output based on the current input and previous hidden states.
4. Compute the difference between the predicted output and actual output using a loss
function.
5. Adjust weights using Backpropagation Through Time (BPTT), a modified version of
backpropagation that considers each time step. The network calculates gradients for each
time step and updates weights accordingly.
6. Use optimization algorithms (e.g., stochastic gradient descent) to update weights.
7. Continue processing data through the network for multiple epochs until performance
improves.
Advantages:
 Ideal for tasks involving sequences like speech, text, and time-series analysis.
 Can handle variable-length input sequences.
Disadvantages:
 More challenging to train than feedforward networks due to the sequential nature.
 It can be computationally expensive and slower to train.
Applications:
Speech Recognition, Natural Language Processing, Video Analysis.

Applications of Neural Network


Image Compression in Neural Networks
Image Compression refers to reducing the size of image data while retaining important features,
making it efficient for storage and transmission.

Neural networks, particularly autoencoders, are widely used for this purpose.
Process of Image Compression in Neural Networks:
1. Autoencoder Architecture:
 The network is divided into two parts: Encoder and Decoder.
 The encoder compresses the input image into a smaller representation (latent
space).
 The decoder reconstructs the image from this compressed representation.
2. Training:
 Train the network using image datasets to minimize the reconstruction error
between the original and decoded images.
 Loss functions like Mean Squared Error (MSE) are commonly used.
3. Compressed Representation:
 The latent space stores the image in a compact form, enabling compression.
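The encoder, decoder, and training loop can be sketched on toy data. This is a simplified illustration, assuming NumPy, a purely linear autoencoder, synthetic 8-value "images" that vary along 2 directions, and plain gradient descent on the squared reconstruction error; real image autoencoders use non-linear (often convolutional) layers, and all sizes and names here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "images": 200 samples of 8 values that really vary along only 2 directions.
n, d, k = 200, 8, 2
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))
X /= np.linalg.norm(X, 2) / np.sqrt(n)  # rescale so a fixed learning rate stays stable

W_enc = 0.1 * rng.normal(size=(k, d))   # encoder: 8 values -> 2 latent values
W_dec = 0.1 * rng.normal(size=(d, k))   # decoder: 2 latent values -> 8 values
lr = 0.05
losses = []
for _ in range(2000):
    latent = X @ W_enc.T                # compressed representation (latent space)
    recon = latent @ W_dec.T            # reconstruction
    err = recon - X
    losses.append(np.mean(np.sum(err ** 2, axis=1)))  # mean squared reconstruction error
    # Gradients of the loss with respect to decoder and encoder weights.
    grad_dec = (2.0 / n) * err.T @ latent
    grad_enc = (2.0 / n) * (err @ W_dec).T @ X
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
print(f"compression ratio {d // k}:1")
```

Each sample is stored as 2 numbers instead of 8, and the reconstruction error shrinks during training because the data really has only 2 degrees of freedom.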
Applications:

 Reducing storage requirements for high-resolution images.


 Efficient image transmission for real-time applications like video streaming.
Example:

 Compressing a 256x256 image into a 32-dimensional latent vector, reducing the data size
significantly while retaining key visual details.

Data Compression in Neural Networks


Data Compression focuses on reducing the size of any type of data (not just images) while
preserving its essential information. Neural networks, particularly Deep Belief Networks
(DBNs) and Variational Autoencoders (VAEs), can compress structured or unstructured data.
Process of Data Compression in Neural Networks:
1. Representation Learning:
 Neural networks extract key features and discard redundant information.
2. Latent Space Representation:
 Data is compressed into a lower-dimensional space, much smaller than the
original input.
3. Reconstruction:
 A decoder reconstructs the original data from the compressed format during the
process.
Applications:
 Compressing textual data for NLP tasks.
 Reducing dimensions in large datasets for computational efficiency.
 Storing and transmitting large-scale IoT data.
Example:
 Compressing a 1GB text file into a 50MB vector representation while retaining key
details for further analysis.

Unit -4

Fuzzy Sets
 A fuzzy set is a type of set where each element can have a degree of
membership between 0 and 1.
 This means an element can partially belong to the set.
 Fuzzy sets are represented with tilde character(~).
 A fuzzy set A~ in the universe of discourse, U, can be defined as a set of ordered pairs
and it is given by A~ = {(x, μA~(x)) | x ∈ U}, where μA~(x) is the degree of membership of x in A~.

Difference Between Fuzzy Sets and Classical Sets:


 Classical Sets: Elements are either fully in the set (membership = 1) or not in
the set at all (membership = 0). There is no partial membership.
 Fuzzy Sets: Elements can have a partial membership, with values ranging from
0 to 1. This allows for representing situations with uncertainty or vagueness.
Terminologies
1. Universe of Discourse:
The total set of all possible values for the fuzzy set.
Example: For "Height," the universe might be all heights (e.g., 100 cm to 250
cm).
2. Membership Function (µ):
A function that assigns each element in the universe a membership value (0 to
1).
Example:
For the fuzzy set "Warm Temperature,"
 20°C → Membership = 0.3
 25°C → Membership = 0.6
3. Support:
All elements of the fuzzy set with membership greater than 0.
Example: In "Warm Temperature," temperatures like 20°C and 25°C are part of
the support.
4. Core:
The element(s) with the highest membership (value = 1).
Example: If 30°C → Membership = 1, it is part of the core.
5. Crossover Point:
The element where the membership value equals 0.5.
Example: In "Warm Temperature," 23°C might have membership 0.5.
6. Fuzziness:
The level of vagueness or uncertainty in the fuzzy set.
Key Properties of Fuzzy Relations:
1. Reflexivity: Every element is fully related to itself.
μR(x,x)=1 for any element x
2. Symmetry: If A relates to B, then B also relates to A.
μR(x,y)=μR(y,x) for any elements x,y
3. Transitivity: If A relates to B, and B relates to C, then A should relate to C.
μR(x,z)≥min(μR(x,y),μR(y,z)) for any elements x,y,z.
4. Normalization: At least one element has a membership value of 1.

Properties of operations with crisp sets and fuzzy sets


Relations in Fuzzy Sets
A fuzzy relation is a set of ordered pairs where each pair has a membership value.
Example: Relation between "students" and their "grades."
 (John,85):µ=0.8
 (Jane,70):µ=0.6

Set Theoretic Operations


1. Union: Combines two fuzzy sets by taking the maximum membership value for
each element.
µA∪B(x)=max(µA(x),µB(x))

2. Intersection: Finds the common part by taking the minimum membership value.
µA∩B(x)=min(µA(x),µB(x))

3. Complement: Calculates the opposite membership.


µA′(x)=1−µA(x)

4. Difference: Subtracts one fuzzy set from another.


µA−B(x)=min(µA(x),1−µB(x))

5. Cartesian Product: The Cartesian product A×B is the set of all ordered pairs (x,y),
with membership value equal to the minimum of the two memberships:
µA×B(x,y)=min(µA(x),µB(y)). It represents relationships between two sets.
Example of Fuzzy Set
Fuzzy Set A: "Warm Temperature"
 20°C: µ=0.4
 25°C: µ=0.7
 30°C: µ=0.9
Operations:
1. Complement:
 20°C: 1−0.4=0.6
 25°C: 1−0.7=0.3
2. Union with another set "Hot Temperature":
 25°C (Warm µ=0.7, Hot µ=0.5): Union µ=0.7
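The set-theoretic operations and the example above can be reproduced with plain dictionaries. This is a minimal sketch in which each fuzzy set maps an element to its membership value; the "Hot" memberships other than 0.5 at 25°C are assumed values added only for illustration.

```python
def fuzzy_union(a, b):
    """Maximum membership per element."""
    return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in a.keys() | b.keys()}


def fuzzy_intersection(a, b):
    """Minimum membership per element."""
    return {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in a.keys() | b.keys()}


def fuzzy_complement(a):
    """One minus the membership (rounded to hide floating-point noise)."""
    return {x: round(1.0 - mu, 10) for x, mu in a.items()}


def fuzzy_difference(a, b):
    """A - B = min(mu_A, 1 - mu_B)."""
    return {x: round(min(mu, 1.0 - b.get(x, 0.0)), 10) for x, mu in a.items()}


def cartesian_product(a, b):
    """Membership of each pair (x, y) is min(mu_A(x), mu_B(y))."""
    return {(x, y): min(mu_a, mu_b)
            for x, mu_a in a.items() for y, mu_b in b.items()}


warm = {20: 0.4, 25: 0.7, 30: 0.9}   # from the example above
hot = {20: 0.1, 25: 0.5, 30: 0.8}    # only the 25°C value is from the notes

print(fuzzy_complement(warm))         # {20: 0.6, 25: 0.3, 30: 0.1}
print(fuzzy_union(warm, hot)[25])     # 0.7
print(fuzzy_intersection(warm, hot)[25])  # 0.5
```

The pair (25, 30) in the Cartesian product of "Warm" and "Hot" gets membership min(0.7, 0.8) = 0.7.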

Applications of Fuzzy Sets


Decision Making, Image Processing, Artificial Intelligence

Fuzzy Rules and Reasoning


A fuzzy rule is a conditional statement used within fuzzy logic systems to infer an output
based on input variables.
They allow us to handle uncertainty and vagueness by using if-then rules and
inferencing based on fuzzy logic.
Fuzzy Rule
A fuzzy rule is written in the form of:
IF (condition) THEN (result)
 IF part: Known as the antecedent; it describes a condition or situation using
fuzzy sets.
 THEN part: Known as the consequent; it describes the output or result.
Each fuzzy rule uses membership functions to evaluate the degree to which the input
data satisfies the rule.
Example of a Fuzzy Rule
1. Problem: Controlling a fan speed based on temperature.
 Rule 1: IF temperature is low THEN fan speed is slow.
 Rule 2: IF temperature is medium THEN fan speed is moderate.
 Rule 3: IF temperature is high THEN fan speed is fast.
Here, "low," "medium," and "high" are fuzzy sets for temperature, and "slow,"
"moderate," and "fast" are fuzzy sets for fan speed.
Fuzzy Reasoning
Fuzzy reasoning is the process of making decisions using fuzzy rules
It involves the following steps:
1. Fuzzification:
Convert crisp input values into fuzzy values using membership functions.
2. Rule Evaluation:
Apply the fuzzy rules and calculate the degree of truth for each rule using
the membership value of the inputs.
3. Aggregation:
Combine the outputs of all fuzzy rules into a single fuzzy set.
4. Defuzzification:
Convert the fuzzy output back into a crisp value.
Types of Fuzzy Reasoning in Simple Words
1. Categorical Reasoning:
 Uses clear fuzzy rules without complex terms.
 Example: "If temperature is high, then fan speed is fast."
2. Qualitative Reasoning:
 Uses fuzzy "if-then" rules with words like "hot" or "cold."
 Example: "If room is warm, then reduce heater power."
 Often used in control systems.
3. Syllogistic Reasoning:
 Combines multiple fuzzy statements to make a conclusion.
 Example:
 "Most apples are sweet."
 "Most sweet things are liked."
 "So, most apples are liked."
4. Dispositional Reasoning:
 Uses words like "usually" to handle uncertainty.
 Example: "Usually, if it rains, the ground becomes wet."

Applications:
Washing machines, air conditioners, medical diagnosis.

Fuzzy IF-THEN Rules Explained in Detail


Fuzzy IF-THEN rules are a way to represent human reasoning and decision-making in a fuzzy
logic system. These rules connect input conditions to output actions using fuzzy sets and
membership functions.
 IF part (Antecedent): Specifies the condition using fuzzy sets.
 THEN part (Consequent): Specifies the resulting action or decision using fuzzy sets.
Structure of a Fuzzy Rule
The general format is:
IF (condition 1) AND/OR (condition 2)... THEN (result)
1. IF: Defines the input fuzzy set (e.g., "temperature is high").
2. THEN: Defines the output fuzzy set (e.g., "fan speed is fast").
3. Logical Connectives: Combines conditions using AND, OR, etc.

Example of Fuzzy IF-THEN Rules


Let’s control the speed of a fan based on room temperature.
1. Rule 1: IF temperature is low, THEN fan speed is slow.
2. Rule 2: IF temperature is medium, THEN fan speed is moderate.
3. Rule 3: IF temperature is high, THEN fan speed is fast.

How Fuzzy IF-THEN Rules Work?


1. Input Variable: Crisp input like "temperature = 25°C."
2. Fuzzification: Convert the crisp input into fuzzy values using membership functions.
 For example, temperature = 25°C may belong partially to:
 Low temperature (membership = 0.3).
 Medium temperature (membership = 0.7).
3. Rule Evaluation:
 Rule 1: "Temperature is low → Slow fan speed."
 Membership of low temperature = 0.3.
 Rule 2: "Temperature is medium → Moderate fan speed."
 Membership of medium temperature = 0.7.
4. Aggregation: Combine the results from all rules.
 Slow fan speed = 0.3.
 Moderate fan speed = 0.7.
5. Defuzzification: Convert the fuzzy result into a crisp value (e.g., fan speed = 50%).
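The five steps can be traced in code. This is a minimal sketch under stated assumptions: the membership function breakpoints (18, 28, 38 °C) are chosen so that 25°C gives Low = 0.3 and Medium = 0.7 as in the example, each output term is represented by a single assumed speed (20, 50, 80), and defuzzification uses a weighted average. None of these numbers are prescribed by the notes, and a different choice gives a different crisp speed.

```python
def left_shoulder(x, full, zero):
    """Membership 1 up to `full`, falling linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def right_shoulder(x, zero, full):
    """Membership 0 up to `zero`, rising linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def triangular(x, left, peak, right):
    """Membership 0 outside (left, right), 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(temp_c):
    """Step 2: crisp temperature -> membership in each temperature set."""
    return {
        "low": left_shoulder(temp_c, 18, 28),
        "medium": triangular(temp_c, 18, 28, 38),
        "high": right_shoulder(temp_c, 28, 38),
    }


RULES = {"low": "slow", "medium": "moderate", "high": "fast"}   # IF ... THEN ...
SPEED = {"slow": 20.0, "moderate": 50.0, "fast": 80.0}          # assumed representative speeds


def infer_fan_speed(temp_c):
    memberships = fuzzify(temp_c)
    # Steps 3 and 4: each rule fires with the membership of its antecedent.
    strengths = {RULES[term]: mu for term, mu in memberships.items()}
    # Step 5: weighted-average defuzzification.
    total = sum(strengths.values())
    return sum(mu * SPEED[s] for s, mu in strengths.items()) / total


print(fuzzify(25))
print(round(infer_fan_speed(25), 2))
```

With these assumptions, 25°C fires "slow" at 0.3 and "moderate" at 0.7, and the weighted average is (0.3⋅20 + 0.7⋅50) / 1.0 = 41.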
Applications:
Control Systems, Medical Diagnosis, Decision making tools

Membership Functions
 Membership functions were first introduced in 1965 by Lotfi A. Zadeh
 Membership functions can be defined as a technique to solve practical problems by
experience rather than knowledge.
 Membership functions are represented by graphical forms.
Mathematical Notation of membership functions
Features of Membership Functions
Core: For any fuzzy set A˜, the core of a membership function is that region of universe that is
characterized by full membership in the set.
μA˜(y)=1
Support: For any fuzzy set A˜, the support of a membership function is the region of universe
that is characterized by a nonzero membership in the set.
μA˜(y)>0
Boundary: For any fuzzy set A˜, the boundary of a membership function is the region of
universe that is characterized by a nonzero but incomplete membership in the set.
1>μA˜(y)>0
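For a discrete fuzzy set these three features are simple filters on the membership values. A minimal sketch, using a hypothetical "Warm Temperature" set stored as a dictionary (the membership values are illustrative):

```python
def core(fuzzy_set):
    """Elements with full membership (mu == 1)."""
    return {x for x, mu in fuzzy_set.items() if mu == 1.0}


def support(fuzzy_set):
    """Elements with nonzero membership (mu > 0)."""
    return {x for x, mu in fuzzy_set.items() if mu > 0.0}


def boundary(fuzzy_set):
    """Elements with partial membership (0 < mu < 1)."""
    return {x for x, mu in fuzzy_set.items() if 0.0 < mu < 1.0}


# Hypothetical membership values for "Warm Temperature" (degrees Celsius).
warm = {15: 0.0, 20: 0.3, 25: 0.6, 30: 1.0, 35: 0.0}

print(core(warm))      # {30}
print(support(warm))   # {20, 25, 30}
print(boundary(warm))  # {20, 25}
```

The boundary is always the support minus the core.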

Importance of Membership Functions


 Define how each element belongs to a fuzzy set.
 Represent uncertainty in real-world problems.
 Provide a mathematical basis for decision-making in fuzzy systems.

Fuzzy Inference System


 It is the key unit of a fuzzy logic system having decision making as its primary work.
 It uses the “IF…THEN” rules along with connectors “OR” or “AND” for drawing
essential decision rules.
Characteristics of Fuzzy Inference System
 The output from FIS is always a fuzzy set irrespective of its input which can be fuzzy or
crisp.
 It is necessary to have fuzzy output when it is used as a controller.
 A defuzzification unit would be there with FIS to convert fuzzy variables into crisp
variables.
Functional Blocks of FIS
 Rule Base − It contains fuzzy IF-THEN rules.
 Database − It defines the membership functions of fuzzy sets used in fuzzy rules.
 Decision-making Unit − It performs operation on rules.
 Fuzzification Interface Unit − It converts the crisp quantities into fuzzy quantities.
 Defuzzification Interface Unit − It converts the fuzzy quantities into crisp quantities.
Following is a block diagram of fuzzy inference system.

Working of FIS
 A fuzzification unit supports the application of numerous fuzzification methods, and
converts the crisp input into fuzzy input.
 A knowledge base (the collection of rule base and database) is formed upon the conversion of
crisp input into fuzzy input.
 In the defuzzification unit, the fuzzy result is finally converted into crisp output.

Defuzzification
 Defuzzification is the inverse process of fuzzification where the mapping is done to
convert the fuzzy results into crisp results.
 It commonly uses the center of gravity method, which finds the centroid of the output fuzzy set.
 Here Imprecise data is converted into precise data.
Importance of Defuzzification
 Converts fuzzy outputs into crisp values for real-world use.
 Bridges the gap between fuzzy reasoning and practical applications.
 Helps in decision-making by providing actionable results.
 Ensures compatibility of fuzzy logic systems with traditional systems.
 Provides a clear output for control systems like air conditioners or robots.
Example of Defuzzification
Problem:
Control fan speed based on temperature using fuzzy logic.
Fuzzy Rules:
1. IF temperature is low, THEN fan speed is slow.
2. IF temperature is medium, THEN fan speed is moderate.
3. IF temperature is high, THEN fan speed is fast.
Inputs:
 Temperature = 25°C.
 Fuzzy membership:
 Low = 0.3, Medium = 0.7.
Defuzzification Process:
After rule evaluation, the fan speed outputs are:
 Slow (membership = 0.3).
 Moderate (membership = 0.7).
Using a weighted average of the representative speeds (a discrete form of the centroid method),
the crisp fan speed can be calculated as:
Fan Speed = ∑(μ(x)⋅x) / ∑μ(x)
For example, if:
 Slow = 20 RPM, Moderate = 50 RPM:
Fan Speed = ((0.3⋅20)+(0.7⋅50)) / (0.3+0.7) = 41 / 1 = 41 RPM.
The defuzzified result, 41 RPM, is the fan speed used in real-world systems.

Common Methods of Defuzzification


1. Centroid of Area (COA):
Calculates the center of the area under the membership function curve.
 Example: For a symmetric triangular fuzzy set with vertices at 10, 20, and 30, the crisp
value is 20 (the center of the triangle).
2. Mean of Maximum (MOM):
Averages the points at which the membership value is maximal.
 Example: If max membership occurs at 15 and 25, the defuzzified value
is (15+25)/2=20
3. Smallest of Maximum (SOM):
Selects the smallest value with the highest membership.
 Example: If maximum membership is at 10 and 20, the defuzzified value is 10.
4. Largest of Maximum (LOM):
Selects the largest value with the highest membership.
 Example: If maximum membership is at 10 and 20, the defuzzified value is 20.
5. Weighted Average Method (WAM):
Calculates the weighted average based on membership values.
 Example: If memberships are 0.4 at 10 and 0.6 at 30, the defuzzified value is:
Value=((0.4 * 10)+(0.6 * 30)) / (0.4+0.6)=22
6. Bisector of Area (BOA):
Divides the area under the membership function into two equal halves.
 Example: For a triangular membership curve, the defuzzified value may fall at a
point where the left and right areas are equal.
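The six methods can be compared on one sampled membership function. This is a minimal sketch, assuming NumPy and a hypothetical output set: a triangle with vertices 10, 20, 30 clipped at 0.5, which has a flat top between 15 and 25. The universe is sampled at integer points, so centroid and bisector are discrete approximations.

```python
import numpy as np

x = np.arange(0.0, 41.0)  # universe sampled at 0, 1, ..., 40
triangle = np.maximum(0.0, np.minimum((x - 10) / 10, (30 - x) / 10))
mu = np.minimum(triangle, 0.5)  # clipped at 0.5: maximum reached from 15 to 25


def centroid(x, mu):
    """Centroid of Area: membership-weighted mean of the universe."""
    return float(np.sum(x * mu) / np.sum(mu))


def maxima(x, mu):
    """All points where the membership is maximal."""
    return x[np.isclose(mu, mu.max())]


def mean_of_maximum(x, mu):
    return float(maxima(x, mu).mean())


def smallest_of_maximum(x, mu):
    return float(maxima(x, mu).min())


def largest_of_maximum(x, mu):
    return float(maxima(x, mu).max())


def bisector(x, mu):
    """First point where the accumulated area reaches half of the total."""
    cumulative = np.cumsum(mu)
    return float(x[np.searchsorted(cumulative, cumulative[-1] / 2.0)])


def weighted_average(values, memberships):
    values, memberships = np.asarray(values), np.asarray(memberships)
    return float(np.sum(values * memberships) / np.sum(memberships))


print(centroid(x, mu))                           # 20.0 up to rounding (symmetric shape)
print(mean_of_maximum(x, mu))                    # 20.0
print(smallest_of_maximum(x, mu))                # 15.0
print(largest_of_maximum(x, mu))                 # 25.0
print(bisector(x, mu))                           # 20.0
print(weighted_average([10, 30], [0.4, 0.6]))    # 22.0 up to rounding
```

For a symmetric shape like this one, centroid, mean of maximum, and bisector coincide; they differ once the output set is skewed.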

Applications of fuzzy logic


1. Control Systems: Fuzzy logic controllers are used in the automatic transmission systems of
vehicles to decide the optimal gear shift points based on speed, throttle position, and other
factors.
2. Expert Systems: Medical diagnosis systems use fuzzy logic to evaluate symptoms and
medical test results, providing a diagnosis that accounts for the uncertainty and variability in
human health.
3. Image Processing: In edge detection, fuzzy logic can be used to classify pixels as part of an
edge or not, based on their intensity gradients, providing more accurate results than traditional
methods.
4. Decision Support Systems: In financial decision-making, fuzzy logic can be used to evaluate
the risk and return of investment options under uncertain market conditions.
5. Natural Language Processing (NLP): In sentiment analysis, fuzzy logic can be used to
evaluate the sentiment of a text based on the degrees of positive, negative, and neutral sentiments
expressed in the words and phrases.
6. Robotics: Autonomous robots use fuzzy logic to navigate through environments by making
real-time decisions about speed, direction, and obstacle avoidance.
7. Industrial Automation: In chemical manufacturing, fuzzy logic controllers are used to
maintain optimal temperature and pressure conditions in reactors, ensuring consistent product
quality.

Numerical
Unit -5
Genetic Algorithm
 Genetic Algorithm is one of the heuristic algorithms.
 They are used to solve optimization problems.
 They are inspired by Darwin’s Theory of Evolution.
 They are an intelligent exploitation of a random search.
 Although randomized, Genetic Algorithms are by no means random.
1. Selection (Reproduction)-
 It is the first operator applied on the population.
 It selects the chromosomes from the population of parents to cross over and produce
offspring.
 It is based on evolution theory of “Survival of the fittest” given by Darwin.
There are many techniques for reproduction or selection operator such as-
 Tournament selection
 Ranked position selection
 Steady state selection etc.

2. Cross Over-
 The crossover operator is then applied to the mating pool to create better strings.
 Selection only makes clones of good strings; crossover creates new strings by recombining
parts of the parents.
 By recombining good individuals, the process is likely to create even better individuals.

3. Mutation-
 Mutation is a background operator.
 Mutation of a bit includes flipping it by changing 0 to 1 and vice-versa.
 After crossover, the mutation operator subjects the strings to mutation.
 It facilitates a sudden change in a gene within a chromosome.
 Thus, it allows the algorithm to search for solutions far away from the current ones.
 It helps keep the search algorithm from being trapped in a local optimum.
 Its purpose is to prevent premature convergence and maintain diversity within the
population.

Algorithm-
Step-01:
 Randomly generate a set of possible solutions to a problem.
 Represent each solution as a fixed length character string.
Step-02:
Using a fitness function, test each possible solution against the problem to evaluate them.
Step-03:
 Keep the best solutions.
 Use the best solutions to generate new possible solutions.
Step-04:
Repeat the previous two steps until-
 Either an acceptable solution is found
 Or until the algorithm has completed its iterations through a given number of cycles /
generations.
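Steps 01 to 04 can be sketched on a toy problem. This is a minimal illustration, assuming binary strings, tournament selection of size 2, single-point crossover, bit-flip mutation, and elitism (the best string is copied unchanged into the next generation). The function name `run_ga`, its default parameters, and the "count the ones" fitness function are hypothetical choices for this example.

```python
import random


def run_ga(fitness, length=20, pop_size=30, generations=100,
           p_cross=0.9, p_mut=0.02, target=None, seed=0):
    """Return the best bit string found and the best fitness per generation."""
    rng = random.Random(seed)
    # Step 01: random fixed-length strings.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    # Step 02: evaluate with the fitness function.
    best = max(population, key=fitness)
    history = [fitness(best)]

    def select():
        """Tournament selection: the fitter of two random individuals."""
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        # Step 04: stop early once an acceptable solution is found.
        if target is not None and fitness(best) >= target:
            break
        # Step 03: keep the best solution and breed new ones from good parents.
        children = [best[:]]
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < p_cross:
                cut = rng.randint(1, length - 1)   # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation with a small probability per gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


# Usage: maximize the number of 1s in a 20-bit string.
best, history = run_ga(fitness=sum, target=20)
print(best, history[-1])
```

Because of elitism the best fitness never decreases from one generation to the next, although reaching the optimum within the generation limit is not guaranteed.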
Flow Chart-
The following flowchart represents how a genetic algorithm works-

Advantages:
 Genetic Algorithms are more robust than many conventional search techniques.
 They do not break easily when the inputs change slightly or contain noise.
Application of Genetic Algorithms: Recurrent Neural Network, Code breaking, Filtering and
signal processing, Learning fuzzy rule base.

Definitions:
 Population- It is a subset of all the possible (encoded) solutions to the given problem.
The population for a GA is analogous to the population for human beings except that
instead of human beings, we have Candidate Solutions representing human beings.

 Chromosomes − A chromosome is one such solution to the given problem.


Chromosomes are strings of DNA and consist of genes, blocks of DNA. Each gene
encodes a trait, for example, the color of the eye.

 Gene − A gene is one element position of a chromosome.

 Reproduction: During reproduction, combination (or crossover) occurs first. Genes from
parents combine to form a whole new chromosome. The newly created offspring can then
be mutated.
 Allele − It is the value a gene takes for a particular chromosome.

 Genotype − Genotype is the population in the computation space. In the computation
space, the solutions are represented in a way which can be easily understood and
manipulated using a computing system.

 Phenotype − Phenotype is the population in the actual real world solution space in which
solutions are represented in a way they are represented in real world situations.

 Decoding - Decoding is a process of transforming a solution from the genotype to the
phenotype space, while encoding is a process of transforming from the phenotype to
genotype space.
 Encoding - Encoding is a process of representing individual genes. The process can be
performed using bits, numbers, trees, arrays, lists or any other objects. The choice of
encoding depends mainly on the problem being solved.

 Fitness Function − A fitness function simply defined is a function which takes the
solution as input and produces the suitability of the solution as the output. In some cases,
the fitness function and the objective function may be the same, while in others it might be
different based on the problem.
 Genetic Operators − These alter the genetic composition of the offspring. These include
crossover, mutation, selection, etc.
Crossover types
Crossover is a genetic operator used to vary the programming of a chromosome or chromosomes
from one generation to the next. Crossover is analogous to sexual reproduction.

Different types of crossover:


Single Point Crossover: A crossover point on the parent organism string is selected. All data
beyond that point in the organism string is swapped between the two parent organisms. Strings
are characterized by Positional Bias.

Two-Point Crossover: This is a specific case of the N-point crossover technique. Two random
points are chosen on the individual chromosomes (strings) and the genetic material is exchanged
at these points.

Uniform Crossover: Each gene (bit) is selected randomly from one of the corresponding genes
of the parent chromosomes.
For example, a coin toss can decide which parent supplies each gene.
The crossover between two good solutions may not always yield a better or as good a solution.
Since parents are good, the probability of the child being good is high. If offspring is not good
(poor solution), it will be removed in the next iteration during “Selection”.
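The three crossover types can be sketched on list-based chromosomes. A minimal illustration, assuming equal-length parents and Python's standard `random` module; the function names are hypothetical, and each function returns two children.

```python
import random


def single_point_crossover(p1, p2, rng):
    """Swap everything beyond one random cut point."""
    cut = rng.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def two_point_crossover(p1, p2, rng):
    """Swap the segment between two random cut points."""
    i, j = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]


def uniform_crossover(p1, p2, rng):
    """Toss a coin for every gene to decide which parent supplies it."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < 0.5:
            c1.append(g1)
            c2.append(g2)
        else:
            c1.append(g2)
            c2.append(g1)
    return c1, c2


# Usage: all-zero and all-one parents make the exchanged genes easy to see.
rng = random.Random(0)
parent_a, parent_b = [0] * 8, [1] * 8
print(single_point_crossover(parent_a, parent_b, rng))
print(two_point_crossover(parent_a, parent_b, rng))
print(uniform_crossover(parent_a, parent_b, rng))
```

With these parents, a single-point child is a run of 0s followed by a run of 1s, a two-point child has one block of 1s inside the 0s, and a uniform child mixes them gene by gene.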

Problems with Crossover:


 Depending on coding, simple crossovers can have a high chance to produce illegal
offspring.
 Uniform crossover can often be modified to avoid this problem

Mutation techniques
Mutation may be defined as a small random tweak in the chromosome, to get a new solution. It is
used to maintain and introduce diversity in the genetic population.
Mutation Operators
Mutation Operator is a unary operator and it needs only one parent to work on.
Bit Flip Mutation
In this bit flip mutation, we select one or more random bits and flip them. This is used for binary
encoded GAs.

Random Resetting
Random Resetting is an extension of the bit flip for the integer representation. In this, a random
value from the set of permissible values is assigned to a randomly chosen gene.
Swap Mutation
In swap mutation, we select two positions on the chromosome at random, and interchange the
values. This is common in permutation based encodings.

Scramble Mutation
Scramble mutation is also popular with permutation representations. In this, from the entire
chromosome, a subset of genes is chosen and their values are scrambled or shuffled randomly.
Inversion Mutation
In inversion mutation, we select a subset of genes like in scramble mutation, but instead of
shuffling the subset, we merely invert the entire string in the subset.
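Each mutation operator is a few lines of code. A minimal sketch, assuming list-based chromosomes and Python's standard `random` module; the function names and the mutation probability `p` are hypothetical. Every function returns a mutated copy and leaves the parent unchanged.

```python
import random


def bit_flip_mutation(chrom, rng, p=0.1):
    """Flip each bit independently with probability p (binary encoding)."""
    return [1 - g if rng.random() < p else g for g in chrom]


def random_resetting(chrom, rng, allowed):
    """Assign a random permissible value to one randomly chosen gene."""
    child = chrom[:]
    child[rng.randrange(len(child))] = rng.choice(allowed)
    return child


def swap_mutation(chrom, rng):
    """Interchange the values at two random positions."""
    child = chrom[:]
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def scramble_mutation(chrom, rng):
    """Shuffle the genes inside a random segment."""
    child = chrom[:]
    i, j = sorted(rng.sample(range(len(child) + 1), 2))
    segment = child[i:j]
    rng.shuffle(segment)
    child[i:j] = segment
    return child


def inversion_mutation(chrom, rng):
    """Reverse the order of the genes inside a random segment."""
    child = chrom[:]
    i, j = sorted(rng.sample(range(len(child) + 1), 2))
    child[i:j] = child[i:j][::-1]
    return child


# Usage: a tour over 8 cities as a permutation-encoded chromosome.
rng = random.Random(0)
tour = list(range(8))
print(swap_mutation(tour, rng))
print(scramble_mutation(tour, rng))
print(inversion_mutation(tour, rng))
```

Swap, scramble, and inversion only rearrange genes, so a valid permutation stays a valid permutation, which is why they suit permutation-based encodings.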

Schema Theorem in GA
 It is also called the fundamental theorem of genetic algorithms.
 The theorem was proposed by John Holland in the 1970s.
 The objective of the theorem is to provide a formal model for the effectiveness of the GA
search process.
Schema (H): A schema is a template that identifies a subset of strings
with similarities at certain string positions.
Order of Schema o(H): The order of a schema is defined as the number of fixed positions in
the template.
Schema Defining Length δ(H): It is the distance between the first and last fixed positions.
For example, if * matches either bit, the schema 1*0* has order 2 and defining length 2.
The Theorem
The schema theorem states that short, low-order schemata with above-
average fitness increase exponentially in successive generations.
Expressed as an equation:
$$E[m(H, t+1)] \ge \frac{m(H, t)\, f(H)}{a_t}\,[1 - p]$$
Where,
• m(H, t) is the number of strings belonging to schema H at generation t.
• f(H) is the observed average fitness of schema H.
• a_t is the observed average fitness at generation t.
• The probability of disruption p is the probability that crossover or mutation will destroy the
schema.
Assuming Pm is much smaller than 1, the probability p can be expressed as
$$p = \frac{\delta(H)}{l - 1}\, P_c + o(H)\, P_m$$
Where,
• o(H) is the order of the schema.
• δ(H) is the defining length of the schema.
• l is the length of the code.
• Pm is the probability of mutation.
• Pc is the probability of crossover.
So, a schema with a shorter defining length δ(H) is less likely to be disrupted.
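
A small worked example may help here. The sketch below assumes a schema is written as a string in which * marks a free position, and that the disruption probability takes the common approximate form p = δ(H)/(l − 1) · Pc + o(H) · Pm. The function names and the numeric values are hypothetical.

```python
def order(schema):
    """o(H): number of fixed (non-*) positions."""
    return sum(1 for s in schema if s != "*")

def defining_length(schema):
    """delta(H): distance between the first and last fixed positions."""
    fixed = [i for i, s in enumerate(schema) if s != "*"]
    return fixed[-1] - fixed[0] if fixed else 0

def disruption_probability(schema, pc, pm):
    """Approximate p = delta(H) / (l - 1) * Pc + o(H) * Pm (assumes Pm << 1)."""
    l = len(schema)
    return defining_length(schema) / (l - 1) * pc + order(schema) * pm

def expected_count_lower_bound(m, f_h, a_t, schema, pc, pm):
    """Lower bound on E[m(H, t+1)] given m(H, t), f(H) and a_t."""
    return m * f_h / a_t * (1 - disruption_probability(schema, pc, pm))

short_schema = "*10***"   # order 2, defining length 1
long_schema = "1**0*1"    # order 3, defining length 5
for h in (short_schema, long_schema):
    print(h, order(h), defining_length(h),
          round(disruption_probability(h, pc=0.7, pm=0.01), 2))
```

With Pc = 0.7 and Pm = 0.01, the short schema has p = 0.16 while the long one has p = 0.73, so the long schema is far more likely to be broken up even if its fitness is the same.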
Limitations:
 Assumes gene positions are independent, which may not hold in real problems.
 Accounts only for the disruptive effects of crossover and mutation, not for the schemata they
may create, so it gives only a lower bound.
 Overemphasizes short, low-order, high-fitness schemata.
 Lacks clarity in complex fitness landscapes.

Travelling salesman problem using GA


The problem says that a salesman is given a set of cities and must find the shortest route that
visits each city exactly once and returns to the starting city.
Approach:
 In the following implementation, cities are taken as genes.
 A string generated from these genes is called a chromosome.
 The fitness score, which is equal to the total path length through all the cities in the
chromosome, is used to rank the population (a shorter path is fitter).
 The number of iterations depends upon the value of a cooling variable.
 The value of the cooling variable keeps on decreasing with each iteration and reaches a
threshold after a certain number of iterations.
Algorithm
1. Initialize the population randomly.
2. Determine the fitness of the chromosomes.
3. Until done, repeat:
   1. Select parents.
   2. Perform crossover and mutation.
   3. Calculate the fitness of the new population.
   4. Append it to the gene pool.
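
The loop above, together with the cooling variable from the approach, can be sketched as follows. This is a simplified illustration under stated assumptions: the distance matrix, the parameter values, and all function names are hypothetical; only swap mutation is used (no crossover) to keep the sketch short; and a missing road is represented by math.inf.

```python
import math
import random

INF = math.inf
# Hypothetical symmetric distance matrix for 5 cities; INF means "no road".
DIST = [
    [0,   2,   INF, 12,  5],
    [2,   0,   4,   8,   INF],
    [INF, 4,   0,   3,   3],
    [12,  8,   3,   0,   10],
    [5,   INF, 3,   10,  0],
]

def path_length(route):
    """Fitness: total length of the tour (smaller is better)."""
    return sum(DIST[a][b] for a, b in zip(route, route[1:]))

def random_route(n, rng):
    """A random tour that starts and ends at city 0."""
    middle = list(range(1, n))
    rng.shuffle(middle)
    return [0] + middle + [0]

def mutate(route, rng):
    """Swap two cities, leaving the start and end city untouched."""
    child = route[:]
    i, j = rng.sample(range(1, len(child) - 1), 2)
    child[i], child[j] = child[j], child[i]
    return child

def run_ga(pop_size=10, temperature=10000.0, cooling=0.9, threshold=1.0, seed=0):
    rng = random.Random(seed)
    population = [random_route(len(DIST), rng) for _ in range(pop_size)]
    while temperature > threshold:               # "until done"
        population.sort(key=path_length)         # evaluate fitness
        parents = population[: pop_size // 2]    # select the fitter half
        children = [mutate(p, rng) for p in parents]
        population = parents + children          # append to the gene pool
        temperature *= cooling                   # cooling variable decreases
    best = min(population, key=path_length)
    return best, path_length(best)

best_route, best_length = run_ga()
print(best_route, best_length)
```

Because the fitter half of the population is always kept, the best tour found never gets worse from one iteration to the next.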
How does the mutation work?
Suppose there are 5 cities: 0, 1, 2, 3, 4. The salesman is in city 0 and has to find the shortest
route that travels through all the cities and back to city 0. A chromosome representing the path
chosen can be represented as:

This chromosome undergoes mutation. During mutation, the positions of two cities in the
chromosome are swapped to form a new configuration. The first and last cells are left unchanged,
as they represent the start and end points.
The original chromosome had a path length equal to INT_MAX for the input used in this example,
since no path exists between city 1 and city 4. After mutation, the new child has a path length
equal to 21, which is a much better solution than the original.
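
The effect of a single swap can be reproduced with a small sketch. The distance matrix and both chromosomes below are hypothetical values chosen for illustration (they are not taken from the notes); the matrix has no road between city 1 and city 4, and math.inf stands in for INT_MAX.

```python
import math
import random

INF = math.inf
# Hypothetical distances; cities 1 and 4 (and 0 and 2) are not connected.
DIST = [
    [0,   2,   INF, 12,  5],
    [2,   0,   4,   8,   INF],
    [INF, 4,   0,   3,   3],
    [12,  8,   3,   0,   10],
    [5,   INF, 3,   10,  0],
]

def path_length(route):
    """Sum of the distances between consecutive cities on the route."""
    return sum(DIST[a][b] for a, b in zip(route, route[1:]))

def swap_mutation(route, rng):
    """Swap two random cities, never touching the first or last cell."""
    child = route[:]
    i, j = rng.sample(range(1, len(child) - 1), 2)
    child[i], child[j] = child[j], child[i]
    return child

original = [0, 1, 4, 2, 3, 0]           # uses the missing road 1 -> 4
mutated = original[:]
mutated[2], mutated[4] = mutated[4], mutated[2]   # swap cities 4 and 3

print(original, path_length(original))  # infinite: the route is infeasible
print(mutated, path_length(mutated))    # 2 + 8 + 3 + 3 + 5 = 21

random_child = swap_mutation(original, random.Random(0))
print(random_child, path_length(random_child))
```

A random swap is not guaranteed to repair the route; selection in the next iteration is what keeps the improved children and discards the rest.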
Benefit: GA helps find near-optimal solutions efficiently in large and complex search
spaces like TSP.
Time complexity: O(n^2) as it uses nested loops to calculate the fitness value of each
chromosome in the population.
Auxiliary Space: O(n)

Ant Colony Optimization


Ant Colony Optimization is a method inspired by how ants find the shortest path to food.
Ants leave a chemical trail called pheromone on paths they travel.
Other ants follow this trail, and better paths get more pheromones, guiding more ants to take
them.
Stage 1:

In this stage, there is no pheromone on the paths, and the paths between the food and the ant
colony are empty.
Stage 2:

In this stage, ants are divided into two groups following two different paths with a probability of
0.5. So we have four ants on the longer path and four on the shorter path.
Stage 3:
Now, the ants which follow the shorter path will reach the food first, so the pheromone
concentration will be higher on this path, and more ants from the colony will follow the shorter
path.
Stage 4:

Now more ants will return by the shortest path, and its concentration of pheromones will be
higher. Also, the pheromone on the longer path will evaporate faster than it is replaced, as fewer
ants are using that path. Now more ants from the colony will use the shortest path.
Algorithm
 The above behavior of the ants can be used to design an algorithm to find the
shortest path.
 We can consider the ant colony and the food source as nodes (vertices) of a graph and the
paths as edges between these vertices.
 So we can represent it as a graph G(V, E), where V represents the vertices and E represents
the edges of the graph.
 Let's suppose there are only two paths, P1 and P2. C1 and C2 are the weights, that is,
the pheromone concentrations along the two paths, respectively.
 Initially, for the ith path, the probability of choosing it is:
$$P_i = \frac{C_i}{C_1 + C_2}, \quad i = 1, 2$$
 If C1 > C2, then the probability of choosing path 1 is higher than that of path 2. If C1 < C2,
then path 2 will be more favorable.
 1. Change in concentration according to the length of the path:
$$C_i \leftarrow C_i + \frac{K}{L_i}$$
 Where Li is the length of the path and K is a constant. The shorter the path, the more
pheromone is added to its existing concentration.
 2. Change in concentration according to the rate of evaporation:
$$C_i \leftarrow (1 - v)\, C_i$$
 Here the parameter v varies from 0 to 1. The higher v is, the lower the remaining
concentration.
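
The two-path model can be simulated in a few lines. This sketch assumes the usual forms of the three rules: choice probability C_i / (C1 + C2), deposit K / L_i per ant, and evaporation (1 − v) · C_i. The function names and parameter values are hypothetical, and the expected (average) number of ants per path is used instead of random choices so that the result is repeatable.

```python
def choose_probabilities(c1, c2):
    """Probability of taking each path, proportional to its pheromone."""
    total = c1 + c2
    return c1 / total, c2 / total

def deposit(c, length, k=1.0):
    """Add pheromone K / L: shorter paths receive more."""
    return c + k / length

def evaporate(c, v):
    """Remove a fraction v (between 0 and 1) of the pheromone."""
    return (1.0 - v) * c

def simulate(l1=1.0, l2=2.0, n_ants=8, k=1.0, v=0.1, rounds=20):
    c1 = c2 = 1.0                       # equal pheromone: both paths at 0.5
    for _ in range(rounds):
        p1, p2 = choose_probabilities(c1, c2)
        # Expected ants on each path this round: n_ants * p_i.
        c1 = deposit(evaporate(c1, v), l1, k * n_ants * p1)
        c2 = deposit(evaporate(c2, v), l2, k * n_ants * p2)
    return choose_probabilities(c1, c2)

p_short, p_long = simulate()
print(round(p_short, 3), round(p_long, 3))
```

Starting from a 50/50 split, the shorter path (length 1) gains pheromone faster than the longer one (length 2) in every round, so its choice probability keeps rising.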

Particle Swarm Optimization


 Particle Swarm Optimization (PSO) is a powerful meta-heuristic optimization algorithm.
 It is inspired by swarm behavior observed in nature, such as fish schooling and bird flocking.
Mathematical model
 Each particle in particle swarm optimization has an associated position, velocity, and fitness
value.
 Each particle keeps track of its particle_bestFitness_value and particle_bestFitness_position.
 A record of global_bestFitness_position and global_bestFitness_value is maintained.

Algorithm:
 Place a group of particles (solutions) randomly in the search space.
 Assign each particle a velocity and a position.
 Calculate the fitness (quality) of each particle based on the problem's objective function.
 Track the best position each particle has visited (personal best).
 Track the best position visited by any particle in the swarm (global best).
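 Adjust each particle's velocity based on:
   Its personal best position.
   The global best position.
   Random factors that balance exploration and exploitation.
Formula for velocity (the standard form with an inertia weight):
$$v_i(t+1) = w\, v_i(t) + c_1 r_1\, (p_i - x_i(t)) + c_2 r_2\, (g - x_i(t))$$
Where w is the inertia weight, c1 and c2 are acceleration coefficients, r1 and r2 are random
numbers drawn uniformly from [0, 1], p_i is the personal best position of particle i, and g is the
global best position.
 Move each particle based on its updated velocity:
$$x_i(t+1) = x_i(t) + v_i(t+1)$$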
 Continue updating fitness, velocity, and position until a stopping condition is met (e.g., a
set number of iterations or convergence).
 Return the global best position as the optimal solution.
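
The steps above can be sketched as a short program. This is a minimal illustration, assuming the standard velocity update with an inertia weight w and acceleration coefficients c1 and c2; the sphere objective, the parameter values, and the function names are hypothetical choices made for the example.

```python
import random

def sphere(x):
    """Hypothetical objective to minimise: sum of squares, minimum at 0."""
    return sum(xi * xi for xi in x)

def pso(objective, dim=2, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    # Place particles randomly; start with zero velocity.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests and the global best.
    pbest_pos = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest_pos, gbest_val = pbest_pos[g][:], pbest_val[g]

    for _ in range(iters):                      # stopping condition
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]          # move the particle
            val = objective(pos[i])
            if val < pbest_val[i]:              # update personal best
                pbest_pos[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:             # update global best
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val

best_pos, best_val = pso(sphere)
print(best_pos, best_val)
```

No gradient of the objective is ever computed, which illustrates why PSO is described as derivative free.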

Advantages of PSO:
1. Derivative free.
2. Very few algorithm parameters.
3. Very efficient global search algorithm.
Disadvantages of PSO:
1. Slow convergence in the refined search stage (Weak local search ability).

Numerical
