Soft Computing Notes (1)
Unit -1
The McCulloch-Pitts model of the neuron was developed by Warren McCulloch and Walter
Pitts in 1943.
The McCulloch-Pitts model accepts only Boolean inputs.
In the McCulloch-Pitts model the inputs do not carry adjustable (learned) weights, which
means that this model is not flexible.
Architecture
Inputs: The neuron receives multiple input signals (like 1s and 0s).
Weights: Each input has a fixed weight that sets the importance of that input.
Summation: All the weighted inputs are added together to get a total value.
Threshold: The neuron has a fixed number called a threshold.
Activation: If the total value is equal to or greater than the threshold, the neuron fires
(outputs 1). If it is less, the neuron does not fire (outputs 0).
Output: The result is either 1 (active) or 0 (inactive), like a yes/no decision.
Example
John carries an umbrella if it is sunny or if it is raining. There are four given situations. I need to
decide when John will carry the umbrella. The situations are as follows:
First scenario: It is neither raining nor sunny
Second scenario: It is not raining, but it is sunny
Third scenario: It is raining, and it is not sunny
Fourth scenario: It is raining as well as it is sunny
To analyse the situations using the McCulloch-Pitts neural model, I can consider the input
signals as follows:
X1: Is it raining?
X2 : Is it sunny?
So, the value of each input can be either 0 or 1. We can set the weights of both X1 and
X2 to 1 and the threshold to 1. The resulting truth table is:
Scenario   X1   X2   Sum   yout
1          0    0    0     0
2          0    1    1     1
3          1    0    1     1
4          1    1    2     1
From the truth table, we can conclude that in the situations where the value of yout is 1, John
needs to carry an umbrella. Hence, he will need to carry an umbrella in scenarios 2, 3 and 4.
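The umbrella example can be checked with a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `mcculloch_pitts` is made up for illustration, and it assumes the weights (1, 1) and threshold 1 used in the example above.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum reaches the threshold, else return 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# (X1, X2) = (is it raining?, is it sunny?) for scenarios 1 to 4
scenarios = [(0, 0), (0, 1), (1, 0), (1, 1)]
umbrella = [mcculloch_pitts(s, weights=(1, 1), threshold=1) for s in scenarios]
print(umbrella)  # [0, 1, 1, 1]
```

Raising the threshold to 2 with the same weights turns the neuron into an AND gate, which is how the same model covers both logic functions.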
Example: Suppose a neural network is trying to predict whether an image is a cat or not (binary
output, 1 or 0). The actual output y of the neuron might be 0.8, but the desired output d is 1. The
Delta rule will adjust the weights so that the network learns to output a value closer to 1 the next
time the same input is given.
Types of Learning
Supervised Learning:
It happens in the presence of a supervisor.
Here the model is trained on labeled data, where both the input and the corresponding
correct output are provided.
The network learns to map inputs to outputs by adjusting weights based on the error
between predicted outputs and actual outputs.
Advantages:
It is a fast learning mechanism.
It helps to predict output on a prior basis.
Provides high accuracy.
Disadvantages:
You might not be able to solve complex problems.
Requires a lot of computational time to train the algorithm.
Example: Image classification, where the model is trained with images of cats and dogs labeled
as "cat" or "dog."
Other examples are image segmentation, medical diagnosis, spam detection, fraud detection,
speech recognition.
Unsupervised Learning:
It happens in the absence of a supervisor.
Here the model is trained on data without any labels. The algorithm tries to find patterns
or structures in the input data.
The network learns to group similar data points or reduce dimensionality without
guidance on what the output should be.
Advantages:
It reduces the effort of labelling data.
It provides accurate results.
Disadvantages:
Requires a lot of computational time to train algorithms.
May be difficult to predict quality of model output.
Example: Clustering customer data for market segmentation. The model identifies groups of
similar customers based on purchasing behavior without any predefined categories. Other
applications are – network analysis, recommendation system etc.
Reinforcement Learning
It learns from feedback and past experiences.
It is a long-term iterative process.
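It is commonly modeled as a Markov Decision Process (MDP).
It does not use a fixed labelled or unlabelled dataset; the agent learns from the rewards it
receives while interacting with the environment.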
Advantages:
Helps us to solve complex problems in the real world.
Gives the most accurate results.
Disadvantages:
It requires a huge amount of data and computation.
It is not preferred for simpler problems.
Example: Training a robot to navigate a maze. The robot receives positive rewards for reaching
the goal and negative penalties for hitting walls. It learns the best path through trial and error.
Architecture
Input Layer: This is the first layer of an ANN. It takes raw data directly, such as image
pixels or numerical data.
Hidden Layer: One or more layers sit between the input and output layers. They
are responsible for processing and extracting features from data.
Output Layer: It is the final layer that provides the prediction or result. The number of
neurons in this layer depends on the type of problem.
Characteristics
Learning Ability: Can learn from data and improve over time.
Adaptability: Can adjust to new data by changing weights.
Parallel Processing: Can process many calculations at the same time.
Non-linear: Can handle complex relationships between inputs and outputs.
Black Box: Hard to understand exactly how decisions are made.
Merits (Advantages)
Good at finding patterns and making predictions.
Works for many types of problems (e.g., images, text).
Learns without needing a lot of human instructions.
Demerits (Disadvantages)
It takes a lot of time and data to train.
Hard to understand how it makes decisions.
Need powerful computers and a lot of energy.
Applications
Image Recognition, Speech Recognition, Medical Diagnosis
Feedforward Network
A feedforward neural network is a type of neural network where information flows in one
direction, from the input to the output layers, without cycles or loops.
In a Feedforward Neural Network, data is passed through a series of layers:
Input Layer: Receives the initial data.
Hidden Layers: Process the data received from the input layer. These layers can be one
or more, each consisting of neurons that apply activation functions to their inputs.
Output Layer: Produces the final output.
The data flows in one direction, from the input layer to the output layer, without any feedback
loops.
The recurrent connections allow RNNs to maintain a memory of previous inputs, which
is crucial for tasks involving sequential data.
Applications:
NLP, Time series prediction, Speech recognition, handwriting recognition.
Advantages:
Can handle dynamic systems
It is good for sequential tasks.
Suitable for problems like speech recognition, time series prediction, and control
systems.
Disadvantages:
Requires a lot of time to train the network.
It is more complex in nature.
Requires more computational resources.
XOR Problem
Definition: The XOR (exclusive OR) problem is a classic example in machine learning
that demonstrates the limitation of single-layer perceptrons.
Explanation:
The XOR function outputs 1 if the inputs are different (e.g., (0, 1) or (1, 0)),
and 0 if the inputs are the same (e.g., (0, 0) or (1, 1)).
The XOR problem is not linearly separable because you cannot draw a single
straight line to separate the 1s from the 0s in a 2D plane.
XOR Problem Example
Input 1   Input 2   XOR Output
0         0         0
0         1         1
1         0         1
1         1         0
Visualization:
Points (0, 0) and (1, 1) represent output 0.
Points (0, 1) and (1, 0) represent output 1.
You cannot draw a straight line to separate points with output 1 from those with
output 0.
Why XOR is Important
Limitations of Single-Layer Perceptron:
A single-layer perceptron (linear classifier) cannot solve the XOR problem as it
cannot model non-linear relationships.
Solution with Multi-Layer Networks:
Multi-Layer Perceptrons (MLPs), with at least one hidden layer, can solve the
XOR problem.
Activation Functions in hidden layers allow MLPs to model non-linear decision
boundaries, enabling them to classify data like XOR accurately.
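To make the multi-layer solution concrete, here is a small sketch of a network with one hidden layer that computes XOR. The weights and thresholds are hand-picked for illustration (they are an assumption, not values from these notes): one hidden neuron acts as OR, the other as AND, and the output fires when OR is on but AND is off.

```python
def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def xor_mlp(x1, x2):
    h_or = step(x1 + x2 - 0.5)        # hidden neuron 1: fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)       # hidden neuron 2: fires only if both inputs are 1
    return step(h_or - h_and - 0.5)   # output: OR but not AND

outputs = [xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)  # [0, 1, 1, 0]
```

Each hidden neuron draws one straight line in the input plane; the output neuron combines the two lines, which a single-layer perceptron cannot do.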
1. Step Function
Definition: This function outputs either 0 or 1 based on whether the input is above or
below a certain threshold.
Formula: $f(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$
Use: Simple binary classification tasks.
Example: Deciding if an email is spam (1) or not spam (0).
2. Sigmoid Function
Definition: This function maps any real-valued number to a value between 0 and 1,
creating an "S" shaped curve.
Formula: $f(x) = \frac{1}{1 + e^{-x}}$
Use: Good for models where we want to predict probabilities.
Example: Output of a neuron representing the likelihood of a class.
3. Tanh Function
Definition: Similar to the sigmoid function, but it maps values to a range between -1 and
1.
Formula: $f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
Use: Preferred over sigmoid when outputs need to be zero-centered.
Example: Used in hidden layers of neural networks for better performance.
5. Leaky ReLU
Definition: A variation of ReLU that allows a small, non-zero gradient when the input is
negative.
Formula: $f(x) = \begin{cases} x & \text{if } x \ge 0 \\ \alpha x & \text{if } x < 0 \end{cases}$ (where $\alpha$ is a small constant, like 0.01)
Use: Addresses the "dying ReLU" problem where neurons become inactive.
Example: Helps maintain some output even for negative inputs.
6. Softmax Function
Definition: This function converts a vector of raw scores (logits) into probabilities that
sum to 1.
Formula: $f(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$ (for each element $x_i$ in the input vector)
Use: Used in the output layer for multi-class classification problems.
Example: Classifying images into multiple categories (like cat, dog, bird).
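The activation functions listed above can be written directly from their formulas. The sketch below uses only the standard library; the function names are illustrative, and subtracting the maximum inside `softmax` is an added numerical-stability step that does not change the result.

```python
import math

def step(x):
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def leaky_relu(x, alpha=0.01):
    return x if x >= 0 else alpha * x

def softmax(xs):
    m = max(xs)                              # shift by the max to avoid overflow
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(sigmoid(0.0), tanh(0.0), leaky_relu(-2.0))
print(probs, sum(probs))
```

The softmax output keeps the ordering of the raw scores, so the largest logit always receives the largest probability.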
McCulloch Pitts is a neuron model to design logic networks for AND and OR Logic
functions.
Unit -2
Advantages:
RBF networks are faster to train compared to other neural networks
They are effective for problems where local patterns are important.
Disadvantages:
The performance can degrade if there are outliers in the data.
Since the network uses all training data to compute distances, it can be memory-
intensive.
Example: Suppose you're trying to classify whether an email is spam or not. An RBF network
can learn the patterns in email content, where it focuses on the local features of the text (like the
frequency of specific words). The closer an email's features match a typical spam email, the
higher the activation, and the network can classify it as spam.
Algorithm:
1. Initialize the weights w randomly and set the learning rate η.
2. Input the training data x and desired output d.
3. Calculate output: Compute the net input y=∑(wi×xi).
4. Compute error: Find the error e=d−y.
5. Update weights: Adjust the weights wi=wi+η×e×xi .
6. Repeat steps 3-5 for all training samples until the error is minimized (or a stopping
condition is met).
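The Adaline steps above can be sketched as follows. This is an illustrative version under stated assumptions: the toy data is generated from d = 2·x1 − 1·x2, and the learning rate, epoch count and seed are arbitrary choices, not values from these notes.

```python
import random

def train_adaline(samples, eta=0.05, epochs=200, seed=0):
    """Delta-rule (LMS) training: w_i <- w_i + eta * e * x_i for each sample."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(n)]        # step 1: random weights
    for _ in range(epochs):                               # step 6: repeat
        for x, d in samples:                              # step 2: input x and target d
            y = sum(wi * xi for wi, xi in zip(w, x))      # step 3: net input
            e = d - y                                     # step 4: error
            w = [wi + eta * e * xi for wi, xi in zip(w, x)]  # step 5: update
    return w

# Assumed toy data: targets follow d = 2*x1 - 1*x2 exactly
samples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0), ((2.0, 1.0), 3.0)]
w = train_adaline(samples)
print([round(wi, 3) for wi in w])
```

Because the error is computed on the linear output y (before any thresholding), the weights move smoothly toward the values that generated the data.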
Merits of Adaline:
Simple and easy to implement.
Effective for problems with linearly separable data.
Demerits of Adaline:
Limited to linear classification tasks; cannot handle non-linear data well.
Example:
Adaline: Think of Adaline as a model predicting the price of a house based on features
like size and location. It adjusts its prediction error gradually to get closer to the real
price.
Algorithm:
1. Initialize the weights for all layers and set the learning rate η.
2. Input the training data x.
3. Forward pass: Calculate the output for each Adaline unit in the hidden layer.
4. Threshold logic: Apply thresholding to determine the final output.
5. Compute error: Compare the final output with the desired output.
6. Update weights: Adjust the weights for all layers based on the error using the Madaline
Rule.
7. Repeat until the error is minimized or the network converges.
Merits of Madaline:
Can solve more complex, non-linear problems.
Suitable for tasks requiring multi-layer processing.
Demerits of Madaline:
Training can be slower and more computationally expensive due to its multi-layer
structure.
More complex than single-layer networks like Adaline or perceptron.
Example: predicting both house price and sales timing by considering more features and using
multiple layers of neurons to reach the final prediction.
Algorithm:
1. Initialize weights w and bias b randomly and set the learning rate η.
2. Input training data x and desired output d.
3. For each training example:
o Calculate net input: $z = \sum_i w_i x_i + b$
o Apply activation function (step function): $y = 1$ if $z \ge 0$, otherwise $y = 0$
4. Compute error: $e = d - y$
5. If error exists ($e \ne 0$):
o Update weights: $w_i = w_i + \eta e x_i$
o Update bias: $b = b + \eta e$
6. Repeat steps 3 to 5 for all training examples and continue for multiple epochs
until convergence.
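A minimal sketch of the perceptron rule above, trained on the AND function. Two choices here are assumptions made for reproducibility rather than part of the algorithm as stated: the weights start at zero instead of random values, and the learning rate is 1.

```python
def train_perceptron(samples, eta=1.0, epochs=20):
    n = len(samples[0][0])
    w = [0.0] * n          # assumption: zero start instead of random initialization
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, d in samples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b   # net input
            y = 1 if z >= 0 else 0                         # step activation
            e = d - y                                      # error
            if e != 0:
                w = [wi + eta * e * xi for wi, xi in zip(w, x)]
                b = b + eta * e
                mistakes += 1
        if mistakes == 0:  # converged: a full pass with no errors
            break
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0

and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_data)
print([predict(w, b, x) for x, _ in and_data])  # [0, 0, 0, 1]
```

AND is linearly separable, so the loop stops after a finite number of corrections; running the same code on XOR data would never reach a mistake-free pass.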
Advantages:
Easy to understand and implement.
Requires less computational power and time compared to more complex models.
Disadvantages:
Can only solve problems that are linearly separable.
Cannot model complex relationships between inputs and outputs
Example: Suppose you want to classify whether an email is spam (1) or not spam (0). The
perceptron learns from past emails (features like keywords and sender) and adjusts weights
based on whether its predictions are correct. Over time, it becomes better at making the right
predictions.
Multi-Layer Perceptron (MLP) Learning Algorithm
A Multi-Layer Perceptron (MLP) is a type of artificial neural network that has one or more
hidden layers between the input and output layers.
It can model complex relationships by learning non-linear patterns through backpropagation and
activation functions.
Structure:
Input Layer: Takes input data.
Hidden Layers: One or more layers that process data using weights, biases, and
activation functions.
Output Layer: Produces the final prediction or classification.
Activation Functions: Common ones include ReLU, sigmoid, and tanh for hidden
layers to introduce non-linearity.
Algorithm for Training MLP
Initialize weights and biases randomly.
Pass input data through each layer to compute the output.
Apply activation functions to introduce non-linearity.
Compute the loss using a loss function.
Calculate the gradient of the loss with respect to each weight and bias using the
chain rule.
Propagate errors backward from the output layer to the input layer.
Update weights and biases using a gradient descent optimizer.
Repeat the forward and backward propagation steps for multiple epochs until the
model converges or reaches an acceptable level of accuracy.
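The training steps above can be sketched for a tiny network with one hidden layer. This is an illustration under assumptions, not a definitive implementation: it uses sigmoid activations, a squared-error loss, full-batch gradient descent, and arbitrary settings (3 hidden neurons, learning rate 0.5, 2000 epochs, fixed seed) on the XOR data.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_mlp(data, hidden=3, eta=0.5, epochs=2000, seed=1):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = rng.uniform(-1, 1)
    losses = []
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(hidden)]
        gb1, gW2, gb2, loss = [0.0] * hidden, [0.0] * hidden, 0.0, 0.0
        for x, d in data:
            # Forward pass through hidden and output layers
            h = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(hidden)]
            y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
            loss += 0.5 * (y - d) ** 2
            # Backward pass: chain rule from the output back to the hidden layer
            delta_out = (y - d) * y * (1 - y)
            gb2 += delta_out
            for j in range(hidden):
                delta_h = delta_out * W2[j] * h[j] * (1 - h[j])
                gW2[j] += delta_out * h[j]
                gb1[j] += delta_h
                for i in range(n_in):
                    gW1[j][i] += delta_h * x[i]
        # Gradient descent update using the mean gradient over the batch
        m = len(data)
        for j in range(hidden):
            W2[j] -= eta * gW2[j] / m
            b1[j] -= eta * gb1[j] / m
            for i in range(n_in):
                W1[j][i] -= eta * gW1[j][i] / m
        b2 -= eta * gb2 / m
        losses.append(loss / m)
    return losses

xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
losses = train_mlp(xor_data)
print(losses[0], losses[-1])
```

How far the loss falls depends on the seed and settings; the point of the sketch is the order of operations: forward pass, loss, backward pass, update.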
Advantages of MLP
Can learn non-linear relationships and solve problems that are not linearly separable.
Can be used for various tasks, including classification, regression, and pattern
recognition.
Theoretically, an MLP with one hidden layer containing sufficient neurons can
approximate any continuous function.
Disadvantages of MLP
Requires a lot of computational power and time.
Requires a lot of labeled data.
Hard to interpret the internal workings and decision-making process of the network.
Characteristics:
It trains multi-layer neural networks.
It reduces the difference between predicted and actual outputs by adjusting weights.
It updates weights from the output layer back to the input layer.
Needs sufficient training data for good performance.
Advantages:
Requires no prior neural network knowledge.
It is simple and flexible.
The algorithm scales efficiently with larger datasets and more complex problems.
Disadvantages:
It is a complex process.
Difficult to train the network.
Requires a lot of computational time to train the network.
Applications:
Image Recognition, Speech recognition, NLP, Medical Diagnosis etc.
Algorithm:
Input Layer: The input layer consists of neurons that receive the input data directly. The
input is n-dimensional vector representing a data point with n features.
Output Layer: The output layer is a grid of neurons arranged in a 1D, 2D, or sometimes 3D
space. Each neuron in this grid acts as a cluster representative and is fully connected to the
input layer.
Weight Vectors: Each neuron in the output layer has an associated weight vector of the same
dimension as the input data. Those weights are being adjusted during training to represent
clusters of similar input data points.
Architecture:
1. Input Layer: Takes in the input data vector. Each input neuron represents one feature of
the input.
2. Kohonen Layer (SOM): The first layer is an unsupervised competitive layer. Neurons in
this layer learn to cluster the input data and identify the winning neuron (Best Matching
Unit).
3. Grossberg Layer (Output Layer): This is a supervised learning layer that maps the
clustered representation from the Kohonen layer to the target outputs. It adjusts weights
based on the expected output for a given input.
Advantages:
It is well suited for networks that require a lot of training data and have multiple layers.
Easy to implement due to its structured learning approach.
Disadvantages:
It is very expensive.
It can take a lot of computational time to train a network.
Applications:
Data compression, especially for images and audio
Function approximation
Pattern association
Convolutional layer: The convolutional layer applies filters to the input image to create
feature maps. These filters help the network detect features like edges, shapes, and
textures.
Pooling Layer: The pooling layer reduces the size of the feature maps, making the
network faster and reducing the amount of computation while retaining important
information.
Fully Connected Layer: The fully connected layer at the end takes the learned features
and uses them to make a final prediction or classification.
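The convolutional and pooling layers can be illustrated on a tiny image. This is a sketch under assumptions: a 5x5 image with a dark left half and bright right half, one hand-written 2x2 vertical-edge filter, stride 1 with no padding, and 2x2 max pooling. A real CNN learns its filter values during training.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def max_pool(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

image = [[0, 0, 1, 1, 1] for _ in range(5)]   # dark left, bright right
edge_filter = [[-1, 1], [-1, 1]]              # responds to a dark-to-bright vertical edge

feature_map = conv2d(image, edge_filter)      # 4x4 feature map
pooled = max_pool(feature_map)                # 2x2 after pooling
print(feature_map)  # each row is [0, 2, 0, 0]
print(pooled)       # [[2, 0], [2, 0]]
```

The feature map is non-zero only where the edge sits, and pooling shrinks it from 4x4 to 2x2 while keeping that response.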
Training Process:
1. Feed the input data (e.g., an image) into the network.
2. Pass the input through the convolutional, pooling, and fully connected layers to generate
an output (prediction).
3. Compare the predicted output to the actual target value using a loss function to determine
the error.
4. Propagate the error backward through the network (backpropagation), from the output
toward the filters and connections.
5. Compute the gradients of the loss function with respect to the weights using the chain
rule.
6. Adjust the weights in the network using the calculated gradients and a learning rate
(for example, with gradient descent).
7. Continue feeding new input data and repeat the process for multiple iterations until the
model's performance improves and the error is minimized.
8. Stop training when a stopping condition is met.
Advantages of CNNs:
Good at detecting patterns and features in images, videos, and audio signals.
Can handle large amounts of data and achieve high accuracy.
Disadvantages of CNNs:
Computationally expensive to train and require a lot of memory.
Requires large amounts of labeled data.
Applications:
Image classification, image segmentation, object detection, speech recognition.
CNNs are required for large images because they effectively reduce the dimensionality, focus on
important features, and minimize computational costs while preserving the ability to learn
complex patterns.
Training Process:
Neural networks, particularly autoencoders, are widely used for this purpose.
Process of Image Compression in Neural Networks:
1. Autoencoder Architecture:
The network is divided into two parts: Encoder and Decoder.
The encoder compresses the input image into a smaller representation (latent
space).
The decoder reconstructs the image from this compressed representation.
2. Training:
Train the network using image datasets to minimize the reconstruction error
between the original and decoded images.
Loss functions like Mean Squared Error (MSE) are commonly used.
3. Compressed Representation:
The latent space stores the image in a compact form, enabling compression.
Applications:
Compressing a 256x256 image into a 32-dimensional latent vector, reducing the data size
significantly while retaining key visual details.
Unit -4
Fuzzy Sets
A fuzzy set is a type of set where each element can have a degree of
membership between 0 and 1.
This means an element can partially belong to the set.
Fuzzy sets are represented with tilde character(~).
A fuzzy set A~ in the universe of discourse, U, can be defined as a set of ordered pairs
and it is given by $\tilde{A} = \{(x, \mu_{\tilde{A}}(x)) \mid x \in U\}$
2. Intersection: Finds the common part by taking the minimum membership value.
µA∩B(x)=min(µA(x),µB(x))
5. Cartesian Product: The Cartesian product A×B is the set of all ordered pairs
with membership values as the minimum of the two sets. Represents
relationships between two sets.
Example of Fuzzy Set
Fuzzy Set A: "Warm Temperature"
20°C: µ=0.4
25°C: µ=0.7
30°C: µ=0.9
Operations:
1. Complement:
20°C: 1 − 0.4 = 0.6
25°C: 1 − 0.7 = 0.3
2. Union with another set "Hot Temperature":
25°C (Warm µ=0.7, Hot µ=0.5): Union µ=0.7
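The operations in this example can be reproduced with dictionaries that map each temperature to its membership value. Only the "Warm" values and the "Hot" value at 25°C (0.5) come from the example; the other "Hot" memberships below are made-up placeholders.

```python
warm = {20: 0.4, 25: 0.7, 30: 0.9}
hot = {20: 0.1, 25: 0.5, 30: 0.8}   # 20 and 30 are assumed values for illustration

def fuzzy_union(a, b):
    return {x: max(a[x], b[x]) for x in a}          # maximum membership

def fuzzy_intersection(a, b):
    return {x: min(a[x], b[x]) for x in a}          # minimum membership

def fuzzy_complement(a):
    return {x: round(1 - a[x], 10) for x in a}      # 1 minus membership

print(fuzzy_union(warm, hot)[25])         # 0.7
print(fuzzy_intersection(warm, hot)[25])  # 0.5
print(fuzzy_complement(warm))             # {20: 0.6, 25: 0.3, 30: 0.1}
```

The rounding in `fuzzy_complement` only hides floating-point noise such as 0.30000000000000004.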
Applications:
Washing machines, air conditioners, medical diagnosis.
Membership Functions
Membership functions were first introduced in 1965 by Lotfi A. Zadeh.
Membership functions can be defined as a technique to solve practical problems by
experience rather than knowledge.
Membership functions are represented by graphical forms.
Mathematical Notation of membership functions
Features of Membership Functions
Core: For any fuzzy set A˜, the core of a membership function is the region of the universe
that is characterized by full membership in the set.
μA˜(y)=1
Support: For any fuzzy set A˜, the support of a membership function is the region of the
universe that is characterized by a nonzero membership in the set.
μA˜(y)>0
Boundary: For any fuzzy set A˜, the boundary of a membership function is the region of
universe that is characterized by a nonzero but incomplete membership in the set.
1>μA˜(y)>0
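For a discrete fuzzy set, the three features above reduce to simple filters on the membership values. The set "tall" and its numbers below are hypothetical, chosen only to show the idea.

```python
# Hypothetical fuzzy set "tall": height in cm -> membership value
tall = {150: 0.0, 160: 0.2, 170: 0.6, 180: 1.0, 190: 1.0}

def core(mu):
    return sorted(x for x, m in mu.items() if m == 1.0)       # full membership

def support(mu):
    return sorted(x for x, m in mu.items() if m > 0.0)        # nonzero membership

def boundary(mu):
    return sorted(x for x, m in mu.items() if 0.0 < m < 1.0)  # partial membership

print(core(tall))      # [180, 190]
print(support(tall))   # [160, 170, 180, 190]
print(boundary(tall))  # [160, 170]
```

The support is always the core plus the boundary, which the output confirms.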
Working of FIS
A fuzzification unit supports the application of numerous fuzzification methods, and
converts the crisp input into fuzzy input.
A knowledge base, a collection of the rule base and the database, is applied once the
crisp input has been converted into fuzzy input.
In the defuzzification unit, the fuzzy output is finally converted into crisp output.
Defuzzification
Defuzzification is the inverse process of fuzzification where the mapping is done to
convert the fuzzy results into crisp results.
It uses the center of gravity methods to find the centroid of the sets.
Here, imprecise data is converted into precise data.
Importance of Defuzzification
Converts fuzzy outputs into crisp values for real-world use.
Bridges the gap between fuzzy reasoning and practical applications.
Helps in decision-making by providing actionable results.
Ensures compatibility of fuzzy logic systems with traditional systems.
Provides a clear output for control systems like air conditioners or robots.
Example of Defuzzification
Problem:
Control fan speed based on temperature using fuzzy logic.
Fuzzy Rules:
1. IF temperature is low, THEN fan speed is slow.
2. IF temperature is medium, THEN fan speed is moderate.
3. IF temperature is high, THEN fan speed is fast.
Inputs:
Temperature = 25°C.
Fuzzy membership:
Low = 0.3, Medium = 0.7.
Defuzzification Process:
After rule evaluation, the fan speed outputs are:
Slow (membership = 0.3).
Moderate (membership = 0.7).
Using the centroid method, the crisp fan speed might be calculated as:
Fan Speed = $\frac{\sum \mu(x) \cdot x}{\sum \mu(x)}$
For example, if:
Slow = 20 RPM, Moderate = 50 RPM:
Fan Speed = $\frac{(0.3 \cdot 20) + (0.7 \cdot 50)}{0.3 + 0.7} = 41$ RPM.
The defuzzified result, 41 RPM, is the fan speed used in real-world systems.
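The fan-speed calculation can be checked with a short function. The name `defuzzify` is illustrative; it implements the weighted-average formula from this example, treating each output set as a single representative value (20 RPM and 50 RPM).

```python
def defuzzify(memberships, values):
    """Weighted average: sum(mu * x) / sum(mu)."""
    numerator = sum(m * v for m, v in zip(memberships, values))
    denominator = sum(memberships)
    return numerator / denominator

fan_speed = defuzzify([0.3, 0.7], [20, 50])
print(round(fan_speed, 2))  # 41.0
```

A full centroid calculation would integrate over the whole output membership curve; the weighted average is the discrete shortcut used in the example.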
Numerical
Unit -5
Genetic Algorithm
Genetic Algorithm is one of the heuristic algorithms.
They are used to solve optimization problems.
They are inspired by Darwin’s Theory of Evolution.
They are an intelligent exploitation of a random search.
Although randomized, Genetic Algorithms are by no means random.
1. Selection (Reproduction)-
It is the first operator applied on the population.
It selects the chromosomes from the population of parents to cross over and produce
offspring.
It is based on evolution theory of “Survival of the fittest” given by Darwin.
There are many techniques for reproduction or selection operator such as-
Tournament selection
Ranked position selection
Steady state selection etc.
2. Cross Over-
Then the crossover operator is applied to the mating pool to create better strings.
Selection alone only makes clones of good strings; crossover recombines them to create new ones.
By recombining good individuals, the process is likely to create even better individuals.
3. Mutation-
Mutation is a background operator.
Mutation of a bit includes flipping it by changing 0 to 1 and vice-versa.
After crossover, the mutation operator subjects the strings to mutation.
It facilitates a sudden change in a gene within a chromosome.
Thus, it allows the algorithm to search for solutions far away from the current ones.
It helps prevent the search algorithm from being trapped in a local optimum.
Its purpose is to prevent premature convergence and maintain diversity within the
population.
Algorithm-
Step-01:
Randomly generate a set of possible solutions to a problem.
Represent each solution as a fixed length character string.
Step-02:
Using a fitness function, test each possible solution against the problem to evaluate them.
Step-03:
Keep the best solutions.
Use the best solutions to generate new possible solutions.
Step-04:
Repeat the previous two steps until-
Either an acceptable solution is found
Or until the algorithm has completed its iterations through a given number of cycles /
generations.
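The four steps above can be sketched on a toy problem. Everything specific here is an assumption for illustration: the "OneMax" fitness (count of 1 bits), truncation selection of the top half, one-point crossover, a 2% mutation rate, and keeping one unchanged copy of the best solution (elitism).

```python
import random

def fitness(bits):
    return sum(bits)   # toy objective: number of 1s in the string

def run_ga(n_bits=20, pop_size=30, generations=40, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    # Step 1: random fixed-length bit strings
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Step 2: evaluate every solution with the fitness function
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        # Step 3: keep the best solutions and breed new ones from them
        parents = pop[:pop_size // 2]
        children = [pop[0][:]]                      # elitism: best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_bits - 1)        # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # mutation
            children.append(child)
        pop = children                              # Step 4: repeat
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = run_ga()
print(history[0], history[-1])
```

Because the best string is copied unchanged into each new generation, the best fitness recorded in `history` never decreases.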
Flow Chart-
The following flowchart represents how a genetic algorithm works-
Advantages:
Genetic Algorithms are more robust than conventional AI methods.
Unlike older AI systems, they do not break easily when the inputs change slightly or contain noise.
Application of Genetic Algorithms: Recurrent Neural Network, Code breaking, Filtering and
signal processing, Learning fuzzy rule base.
Definitions:
Population- It is a subset of all the possible (encoded) solutions to the given problem.
The population for a GA is analogous to the population for human beings except that
instead of human beings, we have Candidate Solutions representing human beings.
Reproduction: During reproduction, combination (or crossover) occurs first. Genes from
parents combine to form a whole new chromosome. The newly created offspring can then
be mutated.
Allele − It is the value a gene takes for a particular chromosome.
Phenotype − The population in the actual real-world solution space, in which
solutions are represented the way they appear in real-world situations.
Fitness Function − A fitness function simply defined is a function which takes the
solution as input and produces the suitability of the solution as the output. In some cases,
the fitness function and the objective function may be the same, while in others it might be
different based on the problem.
Genetic Operators − These alter the genetic composition of the offspring. These include
crossover, mutation, selection, etc.
Crossover types
Crossover is a genetic operator used to vary the programming of a chromosome or chromosomes
from one generation to the next. Crossover is analogous to sexual reproduction.
Two-Point Crossover: This is a specific case of the N-point crossover technique. Two random
points are chosen on the individual chromosomes (strings) and the genetic material is exchanged
at these points.
Uniform Crossover: Each gene (bit) is selected randomly from one of the corresponding genes
of the parent chromosomes.
Use tossing of a coin as an example technique.
The crossover between two good solutions may not always yield a better or as good a solution.
Since parents are good, the probability of the child being good is high. If offspring is not good
(poor solution), it will be removed in the next iteration during “Selection”.
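The two crossover types described above can be sketched as follows. The function names are illustrative, and the parents (all 0s and all 1s) are chosen only so that it is easy to see which genes came from which parent.

```python
import random

def two_point_crossover(p1, p2, rng):
    """Pick two cut points and swap the segment between them."""
    i, j = sorted(rng.sample(range(1, len(p1)), 2))
    child1 = p1[:i] + p2[i:j] + p1[j:]
    child2 = p2[:i] + p1[i:j] + p2[j:]
    return child1, child2

def uniform_crossover(p1, p2, rng):
    """Toss a coin for every gene to decide which parent it comes from."""
    child1, child2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < 0.5:
            child1.append(g1)
            child2.append(g2)
        else:
            child1.append(g2)
            child2.append(g1)
    return child1, child2

rng = random.Random(42)
parent_a = [0] * 8
parent_b = [1] * 8
tp1, tp2 = two_point_crossover(parent_a, parent_b, rng)
u1, u2 = uniform_crossover(parent_a, parent_b, rng)
print(tp1, tp2)
print(u1, u2)
```

In both cases every gene position is shared out between the two children, so no genetic material is lost or invented by crossover itself.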
Mutation techniques
Mutation may be defined as a small random tweak in the chromosome, to get a new solution. It is
used to maintain and introduce diversity in the genetic population.
Mutation Operators
Mutation Operator is a unary operator and it needs only one parent to work on.
Bit Flip Mutation
In this bit flip mutation, we select one or more random bits and flip them. This is used for binary
encoded GAs.
Random Resetting
Random Resetting is an extension of the bit flip for the integer representation. In this, a random
value from the set of permissible values is assigned to a randomly chosen gene.
Swap Mutation
In swap mutation, we select two positions on the chromosome at random, and interchange the
values. This is common in permutation based encodings.
Scramble Mutation
Scramble mutation is also popular with permutation representations. In this, from the entire
chromosome, a subset of genes is chosen and their values are scrambled or shuffled randomly.
Inversion Mutation
In inversion mutation, we select a subset of genes like in scramble mutation, but instead of
shuffling the subset, we merely invert the entire string in the subset.
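The mutation operators above can be sketched as small functions that return a mutated copy of the chromosome. Names, the flip probability, and the example chromosomes are illustrative assumptions.

```python
import random

def bit_flip(chrom, rng, p=0.1):
    """Flip each bit with probability p (binary encoding)."""
    return [1 - g if rng.random() < p else g for g in chrom]

def random_resetting(chrom, allowed, rng):
    """Assign a random permissible value to one randomly chosen gene."""
    out = chrom[:]
    out[rng.randrange(len(out))] = rng.choice(allowed)
    return out

def swap_mutation(chrom, rng):
    """Interchange the values at two random positions."""
    out = chrom[:]
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out

def scramble_mutation(chrom, rng):
    """Shuffle the genes inside a randomly chosen segment."""
    i, j = sorted(rng.sample(range(len(chrom) + 1), 2))
    segment = chrom[i:j]
    rng.shuffle(segment)
    return chrom[:i] + segment + chrom[j:]

def inversion_mutation(chrom, rng):
    """Reverse the genes inside a randomly chosen segment."""
    i, j = sorted(rng.sample(range(len(chrom) + 1), 2))
    return chrom[:i] + chrom[i:j][::-1] + chrom[j:]

rng = random.Random(7)
tour = [1, 2, 3, 4, 5, 6]                 # permutation encoding
bits = [0, 1, 0, 1, 1, 0, 0, 1]           # binary encoding
print(bit_flip(bits, rng))
print(random_resetting([3, 1, 4, 1], allowed=[0, 1, 2, 3, 4], rng=rng))
print(swap_mutation(tour, rng), scramble_mutation(tour, rng), inversion_mutation(tour, rng))
```

Swap, scramble and inversion only rearrange genes, so a valid permutation stays a valid permutation, which is why they suit permutation-based encodings.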
Schema Theorem in GA
It is also called fundamental theorem of genetic algorithm.
The theorem was proposed by John Holland in the 1970s.
Objective of the theorem is to provide a formal model for the effectiveness of the GA
search process.
Schema (H)- A schema is a template that identifies a subset of strings
with similarities at certain string positions.
Order of Schema o(H)- The order of a schema is defined as the number of fixed positions in
the template.
Schema Defining Length δ(H)- It is the distance between the first and last fixed positions.
The Theorem
The schema theorem states that short, low-order schemata with above-
average fitness increase exponentially in successive generations.
Expressed as an equation:
Where,
• m(H, t) is the number of strings belonging to schema H at generation t.
• f(H) is the observed average fitness of schema H.
• a_t is the observed average fitness at generation t.
• The probability of disruption p is the probability that crossover or mutation will destroy the
schema
The probability p can be expressed as
Where,
• o(H) is the order of the schema.
• l is the length of the code.
• Pm is the probability of mutation.
• Pc is the probability of crossover.
So, a schema with a shorter defining length δ(H) is less likely to be disrupted.
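The schema quantities can be computed directly from a template string. The sketch below assumes '*' marks a "don't care" position and uses the commonly quoted form of the disruption probability, p = δ(H)/(l − 1) · Pc + o(H) · Pm; that formula is stated here as an assumption because the equation itself is not reproduced in these notes.

```python
def order(schema):
    """o(H): number of fixed (non-'*') positions."""
    return sum(1 for c in schema if c != '*')

def defining_length(schema):
    """delta(H): distance between the first and last fixed positions."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0] if fixed else 0

def disruption_probability(schema, p_c, p_m):
    """Assumed form: delta(H) / (l - 1) * Pc + o(H) * Pm."""
    l = len(schema)
    return defining_length(schema) / (l - 1) * p_c + order(schema) * p_m

for h in ["1**0*1", "*1*0**", "11****"]:
    print(h, order(h), defining_length(h), round(disruption_probability(h, 0.8, 0.01), 3))
```

The short schema "11****" gets the lowest disruption probability of the three, matching the statement that short, low-order schemata are the ones most likely to survive.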
Limitations:
Assumes gene positions are independent, which may not hold in real problems.
Ignores disruptive effects of crossover and mutation.
Overemphasizes short, low-order, high-fitness schemata.
Lacks clarity in complex fitness landscapes.
This chromosome undergoes mutation. During mutation, the position of two cities in the
chromosome is swapped to form a new configuration, except the first and the last cell, as they
represent the start and endpoint.
Original chromosome had a path length equal to INT_MAX, according to the input defined
below, since the path between city 1 and city 4 didn’t exist. After mutation, the new child formed
has a path length equal to 21, which is a much better answer than the original one.
Benefit: GA helps find near-optimal solutions efficiently in large and complex search
spaces like TSP.
Time complexity: O(n^2) as it uses nested loops to calculate the fitness value of each
gnome in the population.
Auxiliary Space: O(n)
In this stage, there is no pheromone in the path, and there are empty paths from food to the ant
colony.
Stage 2:
In this stage, ants are divided into two groups following two different paths with a probability of
0.5. So we have four ants on the longer path and four on the shorter path.
Stage 3:
Now, the ants which follow the shorter path will reach the food first, and then the pheromone
concentration will be higher on this path as more ants from the colony will follow the shorter
path.
Stage 4:
Now more ants will return from the shortest path, and the concentration of pheromones will be
higher. Also, the rate of evaporation from the longer path will be higher as fewer ants are using
that path. Now more ants from the colony will use the shortest path.
Algorithm
Now the above behavior of the ants can be used to design the algorithm to find the
shortest path.
We can consider the ant colony and food source as the node or vertex of the graph and the
path as the edges to these vertices.
Let's suppose there are only two paths which are P1 and P2. C1 and C2 are the weight or
the pheromone concentration along the path, respectively.
So we can represent it as graph G(V, E) where V represents the Vertex and E represents
the Edge of the graph.
Initially, for the ith path, the probability of choosing is:
If C1 > C2, then the probability of choosing path 1 is more than path 2. If C1 < C2, then
Path 2 will be more favorable.
1. Concentration of pheromone according to the length of the path:
Where Li is the length of the path and K is the constant depending upon the length of the
path. If the path is shorter, concentration will be added more to the existing pheromone
concentration.
2. Change in concentration according to the rate of evaporation:
Here parameter v varies from 0 to 1. If v is higher, then the concentration will be less.
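The two-path description above can be simulated in a few lines. The update rules used here are assumptions consistent with the text, since the equations themselves are not reproduced in these notes: the choice probability is P_i = C_i / (C1 + C2), the deposit is C_i ← C_i + K / L_i, and evaporation is C_i ← (1 − v) · C_i. The path lengths, K, v and the number of rounds are arbitrary, and every path receives a deposit each round for simplicity.

```python
def choose_probabilities(c):
    """P_i = C_i / sum of all concentrations."""
    total = sum(c)
    return [ci / total for ci in c]

def deposit(c, lengths, k=1.0):
    """Shorter paths receive more pheromone: C_i + K / L_i."""
    return [ci + k / li for ci, li in zip(c, lengths)]

def evaporate(c, v):
    """A fraction v of the pheromone evaporates: (1 - v) * C_i."""
    return [(1 - v) * ci for ci in c]

lengths = [2.0, 4.0]          # assumed: P1 is the shorter path
c = [1.0, 1.0]                # equal pheromone at the start
start_probs = choose_probabilities(c)
for _ in range(10):
    c = evaporate(deposit(c, lengths), v=0.1)
probs = choose_probabilities(c)
print(start_probs)
print([round(p, 3) for p in probs])
```

Both paths start with probability 0.5, and the shorter path pulls ahead as its larger deposits accumulate.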
Algorithm:
Place a group of particles (solutions) randomly in the search space.
Assign each particle a velocity and a position.
Calculate the fitness (quality) of each particle based on the problem's objective function.
Track the best position each particle has visited (personal best).
Track the best position visited by any particle in the swarm (global best).
Adjust each particle's velocity based on:
Its personal best position.
The global best position.
Random factors for exploration and exploitation.
Formula for velocity:
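A commonly used form of the velocity update, given here as an assumption since the formula is not reproduced in these notes, is v ← w·v + c1·r1·(pbest − x) + c2·r2·(gbest − x), followed by x ← x + v. The sketch below applies it to a toy objective (the sphere function, minimum 0 at the origin); the inertia weight w, the coefficients c1 and c2, the swarm size and the seed are all illustrative choices.

```python
import random

def sphere(p):
    return sum(v * v for v in p)   # toy objective to minimize

def pso(objective, dim=2, n_particles=15, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                       # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]      # global best
    initial_best = gbest_val
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()   # random factors
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val, initial_best

gbest, gbest_val, initial_best = pso(sphere)
print(initial_best, gbest_val)
```

The global best is only replaced by a strictly better position, so the final value can never be worse than the best particle in the initial swarm.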
Advantages of PSO:
1. Derivative free.
2. Very few algorithm parameters.
3. Very efficient global search algorithm.
Disadvantages of PSO:
1. Slow convergence in the refined search stage (Weak local search ability).
Numerical