NN Unit 1 Complete Notes
R18 B.Tech. CSE (Computer Networks) III & IV Year, JNTU Hyderabad (Jawaharlal Nehru
Technological University, Hyderabad)
Prepared by
Dr K Madan Mohan
Asst. Professor
Department of CSE (AI&ML)
Sreyas Institute of Engineering and Technology,
Nagole, Bandlaguda, Hyderabad
Course Outcomes:
Ability to understand the concepts of Neural Networks
Ability to select the Learning Networks in modeling real world systems
Ability to use an efficient algorithm for Deep Models
Ability to apply optimization strategies for large scale applications
UNIT-I
Artificial Neural Networks Introduction, Basic models of ANN, important terminologies, Supervised
Learning Networks, Perceptron Networks, Adaptive Linear Neuron, Back-propagation Network.
Associative Memory Networks. Training Algorithms for pattern association, BAM and Hopfield
Networks.
UNIT-II
Unsupervised Learning Network- Introduction, Fixed Weight Competitive Nets, Maxnet, Hamming
Network, Kohonen Self-Organizing Feature Maps, Learning Vector Quantization, Counter Propagation
Networks, Adaptive Resonance Theory Networks. Special Networks-Introduction to various networks.
UNIT - III
Introduction to Deep Learning, Historical Trends in Deep learning, Deep Feed - forward networks,
Gradient-Based learning, Hidden Units, Architecture Design, Back-Propagation and Other
Differentiation Algorithms
UNIT - IV
Regularization for Deep Learning: Parameter norm Penalties, Norm Penalties as Constrained
Optimization, Regularization and Under-Constrained Problems, Dataset Augmentation, Noise
Robustness, Semi-Supervised Learning, Multi-task Learning, Early Stopping, Parameter Tying and
Parameter Sharing, Sparse Representations, Bagging and Other Ensemble Methods, Dropout,
Adversarial Training, Tangent Distance, Tangent Prop and Manifold Tangent Classifier
UNIT - V
Optimization for Training Deep Models: Challenges in Neural Network Optimization, Basic Algorithms,
Parameter Initialization Strategies, Algorithms with Adaptive Learning Rates, Approximate Second-
Order Methods, Optimization Strategies and Meta-Algorithms
Applications: Large-Scale Deep Learning, Computer Vision, Speech Recognition, Natural Language
Processing
TEXT BOOKS:
1. Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, MIT Press.
2. Neural Networks and Learning Machines, Simon Haykin, 3rd Edition, Pearson Prentice Hall.
UNIT-1
Artificial Neural Networks
Topic 1:
The term "Artificial Neural Network" is derived from Biological neural networks that
develop the structure of a human brain.
Similar to the human brain that has neurons interconnected to one another, artificial neural
networks also have neurons that are interconnected to one another in various layers of the
networks.
These neurons are known as nodes.
The given figure illustrates the typical diagram of Biological Neural Network.
The typical Artificial Neural Network looks something like the given figure.
Dendrites in a Biological Neural Network correspond to inputs in an Artificial Neural Network,
the cell nucleus corresponds to nodes, synapses correspond to weights, and the axon corresponds to the output.
1.1 Relationship between Biological Neural Network and Artificial Neural Network:
1. Artificial Neural Network: Artificial Neural Network (ANN) is a type of neural network
that is based on a Feed-Forward strategy. It is called this because they pass information
through the nodes continuously till it reaches the output node. This is also known as the
simplest type of neural network. Some advantages of ANN :
Ability to learn irrespective of the type of data (Linear or Non-Linear).
ANN can handle highly volatile data, which makes it well suited to financial time series forecasting.
Some disadvantages of ANN :
The simplest architecture makes it difficult to explain the behavior of the network.
This network is dependent on hardware.
2. Biological Neural Network: Biological Neural Network (BNN) is a structure that consists
of Synapse, dendrites, cell body, and axon. In this neural network, the processing is carried
out by neurons. Dendrites receive signals from other neurons, Soma sums all the incoming
signals and axon transmits the signals to other cells.
Some advantages of BNN :
The synapses are the input processing element.
It is able to process highly complex parallel inputs.
Some disadvantages of BNN :
There is no controlling mechanism.
The speed of processing is slow because the network is complex.
Differences between ANN and BNN :
Biological Neural Networks (BNNs) and Artificial Neural Networks (ANNs) are both
composed of similar basic components, but there are some differences between them.
Neurons:
In both BNNs and ANNs, neurons are the basic building blocks that process and
transmit information.
However, neurons in BNNs are more complex and diverse than those in ANNs.
In BNNs, neurons have multiple dendrites that receive input from multiple sources,
and the axons transmit signals to other neurons, while in ANNs, neurons are
simplified and usually only have a single output.
Synapses:
In both BNNs and ANNs, synapses are the points of connection between neurons,
where information is transmitted.
However, in ANNs, the connections between neurons are usually fixed, and the
strength of the connections is determined by a set of weights, while in BNNs, the
connections between neurons are more flexible, and the strength of the connections
can be modified by a variety of factors, including learning and experience.
Neural Pathways:
In both BNNs and ANNs, neural pathways are the connections between neurons that
allow information to be transmitted throughout the network.
However, in BNNs, neural pathways are highly complex and diverse, and the
connections between neurons can be modified by experience and learning.
In ANNs, neural pathways are usually simpler and predetermined by the architecture
of the network.
Memory:
In ANNs, memory is separate from the processor, localized, and non-content-addressable, whereas
in BNNs, memory is integrated into the processor, distributed, and content-addressable.
Control:
In ANNs, control is centralized, whereas in BNNs, control is distributed.
Overall, while BNNs and ANNs share many basic components, there are significant
differences in their complexity, flexibility, and adaptability.
BNNs are highly complex and adaptable systems that can process information in parallel,
and their plasticity allows them to learn and adapt over time.
In contrast, ANNs are simpler systems that are designed to perform specific tasks, and
their connections are usually fixed, with the network architecture determined by the
designer.
Some other points:
1. An artificial neuron receives signals, then processes them and can signal neurons connected
to it.
2. The "signal" at a connection is a real number, and the output of each neuron is computed
by some non-linear function of the sum of its inputs.
3. The connections are called edges. Neurons and edges typically have a weight that adjusts
as learning proceeds.
4. The weight increases or decreases the strength of the signal at a connection. Neurons may
have a threshold such that a signal is sent only if the aggregate signal crosses that threshold.
5. Typically, neurons are aggregated into layers.
6. Different layers may perform different transformations on their inputs.
7. Signals travel from the first layer (the input layer) to the last layer (the output layer),
possibly after traversing the layers multiple times.
8. An artificial neural network is an interconnected group of nodes, inspired by a
simplification of neurons in a brain.
9. Here, each circular node represents an artificial neuron and an arrow represents a
connection from the output of one artificial neuron to the input of another.
1. Artificial Neural Network can be best represented as a weighted directed graph, where
the artificial neurons form the nodes.
2. The association between the neurons outputs and neuron inputs can be viewed as the
directed edges with weights.
3. The Artificial Neural Network receives the input signal from the external source in the
form of a pattern and image in the form of a vector.
4. These inputs are then denoted mathematically as x(1), x(2), ..., x(n) for n inputs.
5. Afterward, each of the input is multiplied by its corresponding weights ( these weights
are the details utilized by the artificial neural networks to solve a specific problem ).
6. In general terms, these weights normally represent the strength of the interconnection
between neurons inside the artificial neural network.
7. All the weighted inputs are summed inside the computing unit.
8. If the weighted sum is zero, a bias is added to make the output non-zero, or to scale up
the system's response. The bias can be viewed as an extra input fixed at 1 with its own
adjustable weight.
9. Here the total of weighted inputs can be in the range of 0 to positive infinity.
10. Here, to keep the response in the limits of the desired value, a certain maximum value
is benchmarked, and the total of weighted inputs is passed through the activation
function.
11. The activation function refers to the set of transfer functions used to achieve the desired
output.
12. There are different kinds of activation functions, but they are primarily either linear or
non-linear sets of functions.
13. Some of the commonly used activation functions are the binary (step), linear, sigmoid, and
hyperbolic tangent (tanh) functions.
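As a minimal illustration of steps 5-13 above (multiply the inputs by their weights, sum them, add a bias, and apply an activation function), the following Python sketch computes the output of a single artificial neuron. The input values, weights, and bias are made-up example numbers, not taken from these notes.

```python
import math

def neuron_output(inputs, weights, bias, activation="binary"):
    """Weighted sum of the inputs plus a bias, passed through an activation function."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias  # steps 5-9: weighted sum + bias
    if activation == "binary":        # binary (step) activation
        return 1 if net >= 0 else 0
    if activation == "linear":        # linear activation
        return net
    if activation == "tanh":          # hyperbolic tangent activation
        return math.tanh(net)
    raise ValueError("unknown activation")

# Hypothetical example values
x = [0.5, -1.0, 0.25]
w = [0.4, 0.3, -0.2]
b = 0.1
print(neuron_output(x, w, b, "binary"))  # -> 0, because the net input is -0.05 (negative)
```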
1.3 Training
1) Neural networks learn (or are trained) by processing examples, each of which contains a
known "input" and "result", forming probability-weighted associations between the two,
which are stored within the data structure of the net itself.
2) The training of a neural network from a given example is usually conducted by determining
the difference between the processed output of the network (often a prediction) and a target
output.
3) This difference is the error. The network then adjusts its weighted associations according
to a learning rule and using this error value.
4) Successive adjustments will cause the neural network to produce output that is increasingly
similar to the target output.
5) After a sufficient number of these adjustments, the training can be terminated based on
certain criteria. This is a form of supervised learning.
6) Such systems "learn" to perform tasks by considering examples, generally without being
programmed with task-specific rules.
7) For example, in image recognition, they might learn to identify images that contain cats by
analyzing example images that have been manually labeled as "cat" or "no cat" and using
the results to identify cats in other images.
8) They do this without any prior knowledge of cats, for example, that they have fur, tails,
whiskers, and cat-like faces.
9) Instead, they automatically generate identifying characteristics from the examples that they
process.
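A minimal sketch of the training procedure just described: compute the network's output, compare it with the target, and adjust the weights using the error. A single threshold neuron, a perceptron-style update rule, and the logical AND function as toy data are assumed purely for illustration; they are not a specific procedure prescribed by these notes.

```python
def step(net):
    return 1 if net >= 0 else 0

def train(samples, lr=0.1, epochs=10):
    """samples: list of (inputs, target) pairs with binary targets."""
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = step(sum(xi * wi for xi, wi in zip(x, w)) + b)  # processed output
            error = target - y                                  # difference from the target output
            # adjust the weighted associations according to the error (learning rule)
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b = b + lr * error
    return w, b

# Hypothetical data: learn the logical AND of two inputs
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(data)
print([step(x1 * w[0] + x2 * w[1] + b) for (x1, x2), _ in data])  # expected [0, 0, 0, 1]
```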
1.4 How does a simple neuron work?
A given neuron receives hundreds of inputs, almost exclusively on its dendrites and
cell body.
These inputs add and subtract in a constantly evolving pattern, depending on what the
brain is thinking.
3. For the above neuron architecture, the net input is calculated as:
I = xA + yB
where x and y are the activations of the input neurons X and Y, and A and B are the weights on the connections from X and Y to the output neuron Z.
4. The output z of the output neuron Z can be obtained by applying activations over the net
input.
O = f(I)
Output = Function ( net input calculated )
5. The function applied over the net input is called the activation function. There are
various activation functions possible for this.
1.5 Artificial Neural Networks Architecture
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer:
The hidden layer lies between the input and output layers. It performs all the
calculations to find hidden features and patterns.
Output Layer:
1. The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.
2. The artificial neural network takes input and computes the weighted sum of the inputs and
includes a bias. This computation is represented in the form of a transfer function.
1. There are three layers in the network architecture: the input layer, the hidden layer (there
can be more than one), and the output layer. Because of the numerous layers, such a network is
sometimes referred to as a Multi-Layer Perceptron (MLP).
2. It is possible to think of the hidden layer as a "distillation layer," which extracts some of the
most relevant patterns from the inputs and sends them on to the next layer for further analysis.
It accelerates and improves the efficiency of the network by recognizing just the most important
information from the inputs and discarding the redundant information.
3. The activation function is important for two reasons: first, it captures the presence of
non-linear relationships between the inputs; second, it helps convert the inputs into a more
useful output.
4. Finding the "optimal values of W — weights" that minimize prediction error is critical to
building a successful model. The "backpropagation algorithm" does this, and it makes the ANN a
learning algorithm, because the model improves by learning from its errors.
5. The optimization approach uses a "gradient descent" technique to quantify prediction errors.
To find the optimum value for W, small adjustments in W are tried, and the impact on
prediction errors is examined. Finally, those W values are chosen as ideal, since further
changes in W no longer reduce the error.
ANNs offer many key benefits that make them particularly well-suited to specific issues and
situations:
1. ANNs can learn and model non-linear and complicated interactions, which is critical since
many of the relationships between inputs and outputs in real life are non-linear and complex.
2. ANNs can generalize – after learning from the original inputs and their associations, the
model can also infer unseen relationships from new data, allowing it to generalize and
predict on data it has not encountered before.
3. ANN does not impose any constraints on the input variables, unlike many other prediction
techniques (for example, on how they should be distributed).
Furthermore, numerous studies have demonstrated that ANNs can better simulate
heteroskedasticity, or data with high volatility and non-constant variance, because of their
capacity to discover latent correlations in the data without imposing any preset associations.
This is particularly helpful in financial time series forecasting (for example, stock prices), where the data is highly volatile.
1. Image and Character Recognition:
ANNs play a significant part in image and character recognition because of their capacity
to take in many inputs, process them, and infer hidden and complicated, non-linear
correlations.
Character recognition, such as handwriting recognition, has many applications in fraud
detection (for example, bank fraud) and even national security assessments.
Image recognition is a rapidly evolving discipline with several applications, ranging from
facial recognition in social media and cancer detection in medicine to satellite imagery
processing for agricultural and defense use.
Deep neural networks, which form the core of "deep learning," have opened up many new and
transformative advances in computer vision, speech recognition, and natural language
processing research.
2. Forecasting:
Forecasting is required extensively in everyday business decisions (for example, sales, the
allocation between goods, and capacity utilization), in economic and monetary policy,
and in finance and the stock market.
Forecasting issues are frequently complex; for example, predicting stock prices involves
many underlying factors, some known and some unseen.
Traditional forecasting models have flaws when it comes to accounting for these complex,
non-linear relationships.
Given its capacity to model and extract previously unknown characteristics and
correlations, ANNs can provide a reliable alternative when used correctly. ANN also
has no restrictions on the input and residual distributions, unlike conventional models.
14. Neural Network can be used to predict targets with the help of echo patterns we get
from sonar, radar, seismic and magnetic instruments.
15. It can be used efficiently in employee hiring, so that a company can hire the right
employee depending upon the skills the employee has and what their likely future
productivity will be.
16. It has a large application in Medical Research.
17. It can be used for fraud detection regarding credit cards, insurance or taxes by
analyzing past records.
1.8 Advantages of Artificial Neural Networks
1. Non-linearity: ANNs can capture non-linear relationships between inputs and outputs,
making them suitable for modeling complex data.
2. Adaptability: ANNs can learn from data and adjust their internal parameters to
improve their performance over time, making them adaptable to changing
environments and tasks.
3. Parallel Processing: ANNs can perform multiple computations simultaneously,
allowing for efficient processing of large-scale data.
4. Fault Tolerance: ANNs are robust against noisy or incomplete data due to their
distributed and interconnected nature.
5. Attribute-value pairs are used to represent problems in ANN.
6. The output of ANNs can be discrete-valued, real-valued, or a vector of multiple real or
discrete-valued characteristics, while the target function can be discrete-valued, real-
valued, or a vector of numerous real or discrete-valued attributes.
7. ANN learning techniques are relatively robust to noise in the training data: there may be
mistakes in individual training samples, but they usually do not severely affect the final result.
8. It is utilized when a fast evaluation of the learned target function is required.
9. The number of weights in the network, the number of training instances evaluated, and
the settings of different learning algorithm parameters can all contribute to extended
training periods for ANNs.
10. Parallel processing capability: Artificial neural networks can perform more than one task
simultaneously.
11. Storing data on the entire network:
Unlike traditional programming, where data is stored in a database, the information in an
ANN is stored across the whole network. The disappearance of a few pieces of data in one
place does not prevent the network from working.
12. Capability to work with incomplete knowledge: After ANN training, the information
may produce output even with inadequate data. The loss of performance here relies
upon the significance of missing data.
13. Having a memory distribution:
For an ANN to be able to adapt, it is important to determine suitable examples and to
train the network on these examples toward the desired output. If an event cannot be
presented to the network in all its aspects, the network can produce false output.
14. Having fault tolerance:
Corruption of one or more cells of an ANN does not prevent it from generating output;
this feature makes the network fault tolerant.
Disadvantages of Artificial Neural Networks:
1. Hardware dependence:
The construction of Artificial Neural Networks necessitates the use of parallel
processors.
As a result, the realization of the network depends on the available equipment.
2. Understanding the network’s operation:
This is the most serious issue with ANN.
When ANN provides a probing answer, it does not explain why or how it was
chosen.
As a result, trust in the network is reduced.
3. Determining the proper network structure:
Any precise rule does not determine the structure of artificial neural networks.
Experience and trial and error are used to develop a suitable network structure.
4. Difficulty in presenting the issue to the network:
ANNs are capable of working with numerical data.
Before being introduced to ANN, problems must be converted into numerical
values.
The display method that is chosen will have a direct impact on the network’s
performance.
The user’s skill is a factor here.
1.11 Types of Artificial Neural Networks:
There are various types of Artificial Neural Networks (ANNs). Depending upon the human
brain's neuron and network functions, an artificial neural network performs tasks in a similar way.
The majority of artificial neural networks have some similarities with their more complex
biological counterparts and are very effective at their intended tasks,
for example, segmentation or classification.
1.11.1 Feedback ANN:
In this type of ANN, the output returns into the network to accomplish the best-evolved
results internally.
As per the University of Massachusetts Lowell Centre for Atmospheric Research, feedback
networks feed information back into themselves and are well suited to solving optimization
problems. Internal system error corrections utilize feedback ANNs.
1.11.2 Feed-Forward ANN:
A feed-forward network is a basic neural network consisting of an input layer, an output
layer, and at least one layer of neurons.
Through assessment of its output by reviewing its input, the intensity of the network can
be noticed based on group behavior of the associated neurons, and the output is decided.
The primary advantage of this network is that it figures out how to evaluate and recognize
input patterns.
1.12. Types of Neural Networks:
This is one of the fastest-growing areas of Artificial Intelligence. Important types include:
1. Feedforward Neural Network – Artificial Neuron.
2. Radial basis function Neural Network.
3. Kohonen Self Organizing Neural Network.
4. Recurrent Neural Network (RNN)
5. Convolutional Neural Network (CNN)
6. Long Short-Term Memory (LSTM)
Feedforward Neural Network (FNN) - Artificial Neuron:
A Feedforward Neural Network, also known as an Artificial Neural Network, is the
most basic form of neural networks.
It consists of input, hidden, and output layers of artificial neurons.
The information flows only in one direction, from the input layer through the hidden
layers to the output layer.
Each neuron in the network processes the input data and passes the output to the next
layer without any feedback loop.
FNNs are commonly used for tasks such as classification and regression.
Radial Basis Function Neural Network (RBFNN):
The Radial Basis Function Neural Network is a type of feedforward neural network that
uses radial basis functions as activation functions.
These functions evaluate the distance between the input data and a set of learned centers
in a multidimensional space.
RBFNNs are often employed for tasks like function approximation, interpolation, and
pattern recognition.
Neural networks can automatically learn relevant features and representations from the
input data, reducing the need for manual feature engineering.
This capability can streamline the data preprocessing phase and improve overall
efficiency.
10. In supervised learning, the network is presented with labeled training examples, where it
learns to map inputs to desired outputs.
11. In contrast, unsupervised learning involves training the network on unlabeled data, where
it learns to find patterns and structure in the data without explicit guidance.
12. There are various types of neural networks, each designed for different tasks and data types.
13. Some common types include feedforward neural networks, convolutional neural networks
(CNNs) for image analysis, recurrent neural networks (RNNs) for sequential data analysis,
and generative adversarial networks (GANs) for generating new data samples.
14. ANNs have found applications in a wide range of fields, including image and speech
recognition, natural language processing, recommendation systems, financial analysis, and
medical diagnosis, among others.
15. Their ability to automatically learn and adapt to complex patterns makes them a valuable
tool in solving real-world problems.
16. While ANNs have demonstrated remarkable success in many domains, they are not without
limitations.
17. Training large networks can be computationally intensive and requires substantial amounts
of labeled data.
18. Additionally, interpreting and understanding the inner workings of neural networks, often
referred to as the "black box" problem, can be challenging. Researchers continue to work
on addressing these limitations and advancing the field of artificial neural networks.
19. Artificial neural networks (ANNs, also shortened to neural networks (NNs) or neural nets)
are a branch of machine learning models that are built using principles of neuronal
organization discovered by connectionism in the biological neural networks constituting
animal brains.
20. An ANN is based on a collection of connected units or nodes called artificial neurons,
which loosely model the neurons in a biological brain.
21. Each connection, like the synapses in a biological brain, can transmit a signal to other
neurons.
1.16.1 How do neural networks find good approximate solutions to complex (large-scale)
problems?
To find good approximate solutions to complex (large-scale) problems in neural networks,
techniques like,
a) gradient-based optimization,
b) regularization, and
c) architecture design
are used to fine-tune the network parameters, enabling it to learn and generalize well from
the available data.
Additionally, advanced methods like
a) transfer learning,
b) ensembling, and
c) hyperparameter tuning
can further enhance the model's performance on challenging tasks.
1.16.2 Why can neural networks not always provide the solution by working individually?
Neural networks cannot always provide the best solutions by working individually
because complex real-world problems often require the combination of diverse
knowledge and expertise.
Ensembling multiple neural networks or using collaborative approaches allows
leveraging diverse insights and strengths, leading to better overall performance and
more robust solutions.
1.16.3 To Solve Complex Problems:
In solving a complex problem, neural networks are divided into specialized groups, each
assigned to handle simpler tasks that align with their inherent abilities, contributing to an
efficient and effective problem-solving process.
1.16.4 Conclusion:
1. Analytical neural networks (ANNs) are powerful models that can be applied in many
scenarios.
2. Several noteworthy uses of ANNs have been mentioned above, although they have
Receptors:
Receptors are specialized cells or structures in the human body that detect and
convert stimuli from the external environment or internal body processes into
electrical impulses.
These stimuli can be anything from light, sound, touch, temperature, chemicals, or
even internal signals like pain or pressure.
For example, in the eyes, there are photoreceptor cells that convert light into
electrical signals, enabling us to see.
In the ears, there are hair cells that respond to sound vibrations, allowing us to hear.
Similarly, touch receptors in the skin respond to pressure, pain, temperature, and
other tactile sensations.
Effectors:
Effectors are organs or structures that receive signals from the neural net (brain) and
convert these electrical impulses back into discernible responses or actions.
Effectors play a crucial role in carrying out the instructions generated by the brain,
resulting in various bodily actions and responses.
When the brain sends electrical signals to specific muscles, they contract or relax,
enabling movement. Similarly, glands are effectors for secretion responses.
When the brain instructs certain glands to release hormones or other substances,
they respond by releasing these chemical messengers into the bloodstream.
The arrows pointing from left to right in the system diagram represent the forward
transmission of information.
It indicates how electrical impulses carrying sensory information travel from the
receptors to the neural net (brain).
This forward transmission ensures that sensory information reaches the brain for
processing and interpretation.
The forward transmission of information ensures that sensory input reaches the
brain.
The arrows pointing from right to left and shown in red represent feedback in the
system.
In the context of the human nervous system, feedback can refer to the information
that travels back from the neural net (brain) to influence or modify the signals from
the receptors.
Feedback loops enable the brain to influence and modulate the body's responses for
adaptive and coordinated actions.
For example, when you touch a hot object, the feedback loop allows your brain to
send signals to your muscles, causing you to quickly withdraw your hand to avoid
injury.
1.20.1 How these components fit into the structural organization of levels in the brain:
1. Neurons:
Neurons are the fundamental building blocks of the brain and nervous system.
They are specialized cells that receive, process, and transmit information through
electrical and chemical signals. Neurons form the cellular level of brain organization.
2. Dendritic Trees:
Dendrites are branched extensions of neurons that receive incoming signals from
other neurons.
Dendritic trees play a crucial role at the cellular level as they collect and integrate
information from multiple sources.
3. Synapses:
Synapses are the elementary junctions between neurons where signals are transmitted from
one neuron to another, usually by chemical means.
4. Neural Microcircuits:
Neural microcircuits are assemblies of synapses organized into patterns of connectivity
that together perform a functional operation of interest.
5. Local Circuits:
Local circuits refer to interconnected neurons within a particular brain region that
work together to perform specific functions.
These circuits are part of the regional level of brain organization.
6. Interregional Circuits:
Interregional circuits are pathways that link multiple regions of the brain, allowing
different areas to work together on complex functions.
7. Central Nervous System (CNS):
The CNS includes the brain and spinal cord, which are the central processing
centers of the nervous system.
It represents the highest level of brain organization, coordinating all functions and
responses.
b) Information flows in a unidirectional manner from the input layer through the hidden
layers to the output layer. FNNs are suitable for tasks such as classification and
regression.
c) As the name suggests, a Feedforward artificial neural network is when data moves in
one direction between the input and output nodes.
d) Data moves forward through layers of nodes, and won’t cycle backwards through the
same layers.
e) Although there may be many different layers with many different nodes, the one-way
movement of data makes Feedforward neural networks relatively simple.
f) Feedforward artificial neural network models are mainly used for simplistic
classification problems.
g) Models will perform beyond the scope of a traditional machine learning model, but
don’t meet the level of abstraction found in a deep learning model.
2. The perceptron is primarily used for binary classification tasks, where the input data is fed into the
network, and it produces a binary output (e.g., yes/no, 0/1).
3. Perceptron is a neural network with only one neuron, and can only understand linear
relationships between the input and output data provided.
4. However, with the Multilayer Perceptron, horizons are expanded: this neural network
can have many layers of neurons and is able to learn more complex patterns.
6. A Perceptron model is a binary classifier, separating data into two different classifications.
7. As a linear model it is one of the simplest examples of a type of artificial neural network.
8. Multilayer Perceptron artificial neural networks add complexity and density, with the
capacity for many hidden layers between the input and output layer.
9. Each individual node on a specific layer is connected to every node on the next layer.
10. This means Multilayer Perceptron models are fully connected networks, and can be
leveraged for deep learning.
11. They’re used for more complex problems and tasks such as complex classification or voice
recognition.
12. Because of the model’s depth and complexity, processing and model maintenance can be
resource and time-consuming.
a. Input Layer: It receives the input data, where each input is represented by a feature or
attribute.
b. Weights: Each input is associated with a weight, which determines the strength of the
connection between the input and the neuron.
c. Activation Function: The weighted sum of inputs is passed through an activation function,
which determines the output of the perceptron.
The Multilayer Perceptron (MLP) is an extension of the perceptron and is also known as a
feedforward neural network.
Unlike the perceptron, MLP consists of multiple layers, including an input layer, one or
more hidden layers, and an output layer.
a. Input Layer: As with the perceptron, the input layer receives the input data.
b. Hidden Layers: Hidden layers are intermediate layers between the input and output layers.
Each neuron in the hidden layers uses an activation function to process the input and produce
an output.
c. Output Layer: The output layer produces the final output of the network, which is typically
used for making predictions or classifications.
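A minimal sketch of a forward pass through the layers just described (input layer, one hidden layer, output layer), assuming sigmoid activations and randomly initialized weights; the layer sizes and the input values are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    """Input layer -> hidden layer -> output layer (fully connected)."""
    h = sigmoid(W1 @ x + b1)   # hidden layer: weighted sum + bias, then activation
    y = sigmoid(W2 @ h + b2)   # output layer
    return y

# Hypothetical sizes: 3 inputs, 4 hidden neurons, 2 outputs
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
x = np.array([0.2, -0.5, 0.9])
print(mlp_forward(x, W1, b1, W2, b2))  # two output values, each between 0 and 1
```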
3. Radial basis function networks are distinguished from other neural networks due to their
universal approximation and faster learning speed.
4. An RBF network is a type of feed forward neural network composed of three layers, namely
the input layer, the hidden layer and the output layer.
5. Radial basis function neural networks usually have an input layer, a layer with radial basis
function nodes with different parameters, and an output layer.
6. Models can be used to perform classification, regression for time series, and to control
systems.
7. Radial basis functions calculate the distance between a centre point and a given input point.
8. In the case of classification, a radial basis function calculates the distance between an input
and a learned classification.
10. A common use for radial basis function neural networks is in system control, such as
systems that control power restoration after a power cut.
11. The artificial neural network can learn the priority order for restoring power,
prioritising repairs that benefit the greatest number of people or core services.
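The distance-based activation described in points 7-8 above can be sketched as follows, assuming Gaussian radial basis functions and a linear output layer; the centres, width, and output weights are made-up illustrative values rather than learned ones.

```python
import numpy as np

def rbf_layer(x, centres, width):
    """Gaussian RBF: the response depends on the distance between x and each centre."""
    d2 = np.sum((centres - x) ** 2, axis=1)     # squared distance to every centre
    return np.exp(-d2 / (2.0 * width ** 2))     # closer centre -> activation nearer 1

# Hypothetical centres (normally learned from data) and hypothetical output weights
centres = np.array([[0.0, 0.0], [1.0, 1.0]])
w_out = np.array([0.7, -0.3])

x = np.array([0.9, 1.1])
hidden = rbf_layer(x, centres, width=0.5)
print(hidden)            # activation of each radial basis unit
print(hidden @ w_out)    # linear combination in the output layer
```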
b) They have connections that form cycles, allowing information to be stored and propagated
across different time steps.
c) RNNs have a "memory" of previous inputs, making them suitable for tasks such as natural
language processing and speech recognition.
d) Recurrent neural networks are powerful tools when a model is designed to process
sequential data.
e) The model will move data forward and loop it backwards to previous steps in the artificial
neural network to best achieve a task and improve predictions.
f) The layers between the input and output layers are recurrent, in that relevant information is
looped back and retained.
g) Memory of outputs from a layer is looped back to the input where it is held to improve the
process for the next input.
h) The flow of data is similar to Feedforward artificial neural networks, but each node will
retain information needed to improve each step.
i) Because of this, models can better understand the context of an input and refine the
prediction of an output.
For example, a predictive text system may use memory of a previous word in a string of
words to better predict the outcome of the next word.
j) A recurrent artificial neural network would be better suited to understand the sentiment
behind a whole sentence compared to more traditional machine learning models.
k) Recurrent neural networks are also used within sequence-to-sequence models, which are
used for natural language processing.
l) Two recurrent neural networks are used within these models, acting as a simultaneous
encoder and decoder.
m) These models are used for reactive chatbots, translating language, or to summarise
documents.
7. If applied to data processing or the computing process, the speed of the processing will
be increased as smaller components can work in tandem.
9. This type of artificial neural network is beneficial as it can make complex processes
more efficient, and can be applied to a range of environments.
1. Although there is huge potential for leveraging artificial neural networks in machine
learning, the approach comes with some challenges.
2. Models are complex, and it can be difficult to explain the reasoning behind a decision in
what in many cases is a black box operation.
3. This makes the issue of explainability a significant challenge and consideration.
4. With all types of machine learning models, the accuracy of the final model depends heavily
on the quantity and quality of training data available.
5. A model built with an artificial neural network needs even more data and resources to train
than a traditional machine learning model.
6. This means millions of data points in contrast to the hundreds of thousands needed by a
traditional machine learning model.
7. The most complex artificial neural networks are often referred to as deep neural networks,
referencing the multi-layered network architecture.
8. Deep learning models are usually trained using labelled training data, which is data with a
defined input and output.
10. The model will learn the features and patterns within the labelled training data, and learn
to perform an intended task through the examples in the training data.
11. Artificial neural networks need a huge amount of training data, more so than
traditional machine learning algorithms.
12. This is in the realm of big data, so many millions of data points may be required.
13. The need for such a large array of labelled, quality data is a limiting factor to being able to
develop artificial neural network models.
14. Organisations are therefore limited to those that have access to the required big data.
15. The most powerful artificial neural network models have complex, multi-layered
architecture.
16. These models require a huge amount of resources and power to process datasets.
17. This requires powerful, resource-intensive GPU units and system architecture.
18. Again, the level of resources required is a limiting factor and challenge for organisations.
19. The method of transfer learning is often used to lower the resource intensity.
20. In this process, existing knowledge from other models and existing artificial neural
networks can be transferred or adapted when developing a new model.
21. This streamlines development as models aren’t built from scratch each time, but can be
built from elements of existing models.
b) McCulloch-Pitts Model
c) Simple McCulloch-Pitts neurons can be used to design logical operations. For that
purpose, the connection weights need to be correctly decided along with the threshold
function (rather than the threshold value of the activation function).
For a better understanding, let me consider an example:
John carries an umbrella if it is sunny or if it is raining. There are four given situations. I need
to decide when John will carry the umbrella. The situations are as follows:
First scenario: It is not raining, nor is it sunny
Second scenario: It is not raining, but it is sunny
Third scenario: It is raining, and it is not sunny
Fourth scenario: It is raining as well as it is sunny
To analyse the situations using the McCulloch-Pitts neural model, consider the input signals
as follows:
X1: Is it raining?
X2 : Is it sunny?
So, the value of each input can be either 0 or 1. We can take the weights on both X1 and X2
as 1 and the threshold value as 1.
The truth table built with respect to the problem is depicted above.
From the truth table, I can conclude that in the situations where the value of yout is 1, John
needs to carry an umbrella.
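The umbrella example can be checked with a short sketch of a McCulloch-Pitts neuron: with both weights equal to 1 and a threshold of 1, the neuron computes the logical OR of the two inputs. The function name and the printed wording are illustrative.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Fires (outputs 1) only if the weighted sum reaches the threshold."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

# x1 = "Is it raining?", x2 = "Is it sunny?"; both weights = 1, threshold = 1
for x1 in (0, 1):
    for x2 in (0, 1):
        y = mcculloch_pitts([x1, x2], [1, 1], threshold=1)
        print(f"raining={x1} sunny={x2} -> carry umbrella: {y}")
# Only the (0, 0) case gives 0; John carries the umbrella in the other three situations.
```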
Rosenblatt’s Perceptron
The perceptron receives a set of inputs x1, x2, ..., xn. The linear combiner or adder
node computes the linear combination of the inputs applied to the synapses, with synaptic
weights w1, w2, ..., wn.
Then, the hard limiter checks whether the resulting sum is positive or negative. If the input
of the hard limiter node is positive, the output is +1, and if the input is negative, the output
is -1.
Mathematically, the hard limiter input is the weighted sum v = w1x1 + w2x2 + ... + wnxn.
The objective of the perceptron is to classify the set of inputs into two classes, c1 and c2.
This can be done using a very simple decision rule – assign the inputs to c1 if the output of
the perceptron i.e. yout is +1 and c2 if yout is -1.
So, for an n-dimensional signal space, i.e. a space for 'n' input signals, the simplest form of
perceptron will have two decision regions, corresponding to the two classes, separated by a hyperplane
defined by:
w1x1 + w2x2 + ... + wnxn = 0
Thus, we see that for a data set with linearly separable classes, perceptrons can always be
employed to solve classification problems using decision lines (for 2-dimensional space),
decision planes (for 3-dimensional space) or decision hyperplanes (for n-dimensional
space).
Appropriate values of the synaptic weights can be obtained by training a perceptron.
However, one assumption for perceptron to work properly is that the two classes should
be linearly separable i.e. the classes should be sufficiently separated from each other.
Otherwise, if the classes are non-linearly separable, then the classification problem
cannot be solved by perceptron.
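A minimal sketch of the perceptron training just described, assuming linearly separable classes and the ±1 hard limiter from above. The two-dimensional data set, the learning rate, and the added bias term are illustrative assumptions, not values from these notes.

```python
import numpy as np

def hard_limiter(v):
    return 1 if v >= 0 else -1           # +1 -> class c1, -1 -> class c2

def train_perceptron(X, labels, lr=0.2, epochs=20):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, d in zip(X, labels):
            y = hard_limiter(w @ x + b)
            if y != d:                   # misclassified: move the separating hyperplane
                w += lr * d * x
                b += lr * d
    return w, b

# Hypothetical linearly separable points: class +1 in the upper-right, class -1 in the lower-left
X = np.array([[2.0, 2.5], [3.0, 2.0], [1.5, 3.0], [-1.0, -1.5], [-2.0, -0.5], [-1.5, -2.0]])
d = np.array([1, 1, 1, -1, -1, -1])
w, b = train_perceptron(X, d)
print([hard_limiter(w @ x + b) for x in X])   # expected to match d: [1, 1, 1, -1, -1, -1]
```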
If the data is not linearly separable, only a curved decision boundary can separate the classes
properly. To address this issue, one option is to use two decision boundary lines in place
of one.
This is the philosophy used to design the multi-layer perceptron model. The major highlights
of this model are as follows:
The neural network contains one or more intermediate layers between the input
and output nodes, which are hidden from both input and output nodes
Each neuron in the network includes a non-linear activation function that is
differentiable.
The neurons in each layer are connected with some or all the neurons in the
previous layer.
As depicted in the below diagram, it has only output neurons. The output value can
be +1 or -1.
The activation function is such that if the weighted sum is positive or 0, the output is 1;
otherwise it is -1.
EXTRA INFORMATION
How should the neurons be connected together? If a network is to be of any use, there
must be inputs and outputs.
However, there also can be hidden neurons that play an internal role in the network.
The input, hidden and output neurons need to be connected together.
The units each perform a biased weighted sum of their inputs and pass this activation level
through a transfer function to produce their output, and the units are arranged in a layered
feedforward topology.
3.2 ADALINE
Adaptive Linear Neuron or later Adaptive Linear Element (Fig. 2) is an early single-layer
artificial neural network and the name of the physical device that implemented this
network.
It was developed by Bernard Widrow and Ted Hoff of Stanford University in 1960.
The difference between Adaline and the standard (McCulloch–Pitts) perceptron is that in
the learning phase the weights are adjusted according to the weighted sum of the inputs
(the net).
In the standard perceptron, the net is passed to the activation (transfer) function and the
function’s output is used for adjusting the weights.
3.3 ART
1. The primary intuition behind the ART model (Fig. 3) is that object identification and
recognition generally occur as a result of the interaction of ‘top-down’ observer
expectations with ‘bottom-up’ sensory information.
2. The model postulates that ‘top-down’ expectations take the form of a memory template or
prototype that is then compared with the actual features of an object as detected by the
senses.
3. This comparison gives rise to a measure of category belongingness.
4. As long as this difference between sensation and expectation does not exceed a set
threshold called the ‘vigilance parameter’, the sensed object will be considered a member
of the expected class.
5. The system thus offers a solution to the ‘plasticity/stability’ problem, i.e. the problem of
acquiring new knowledge without disrupting existing knowledge.
Two-dimensional CNN
Convolutional Neural Network
1. A convolutional neural network (Fig. 8) is a type of feed-forward artificial neural network
whose individual neurons are arranged in such a way that they respond to overlapping regions
tiling the visual field.
2. Convolutional neural networks consist of multiple layers of small neuron collections which
process portions of the input image.
3. The outputs of these collections are then tiled so that their input regions overlap, to obtain a
better representation of the original image; this is repeated for every such layer.
The ANN learns through various learning algorithms that are described as supervised
or unsupervised learning.
In supervised learning algorithms, the target values are labeled. Its goal is to try
to reduce the error between the desired output (target) and the actual output for
optimization. Here, a supervisor is present.
In unsupervised learning algorithms, the target values are not labeled and the
network learns by itself by identifying the patterns through repeated trials and
experiments.
ANN Terminology:
Weights: each neuron is linked to the other neurons through connection links that
carry weight.
The weight has information and data about the input signal. The output depends
solely on the weights and input signal.
The weights can be presented in a matrix form that is known as the Connection
matrix.
if there are “n” nodes, with each node having “m” weights, then the connection matrix is an n × m matrix whose entry in row i and column j is the j-th weight of node i.
Bias: Bias is a constant that is added to the product of the inputs and weights to compute
the net input.
It is used to shift the result to the positive or negative side.
The net input is increased by a positive bias, while it is decreased by a negative bias.
Here, {1, x1, ..., xn} are the inputs, and the output Y of the neuron is computed by the
function g(x), which sums up all the inputs and adds the bias to it:
g(x) = ∑ xi + b, for i = 1 to n
     = x1 + x2 + ... + xn + b
and the role of the activation is to provide the output depending on the results of the
summation function:
Y=1 if g(x)>=0
Y=0 else
Threshold: A threshold value is a constant value that is compared to the net input
to get the output.
The activation function is defined based on the threshold value to calculate the
output.
For Example:
Y=1 if net input>=threshold
Y=0 else
Learning Rate: The learning rate is denoted α. It ranges from 0 to 1. It is used
for balancing weights during the learning of ANN.
Target value: Target values are Correct values of the output variable and are also
known as just targets.
Error: It is the inaccuracy of predicted output values compared to Target Values.
Supervised Learning Algorithms:
Delta Learning: It was introduced by Bernard Widrow and Marcian Hoff and is
also known as the Least Mean Square (LMS) method. It reduces the error over the entire
learning and training process. In order to minimize error, it follows the gradient
descent approach, in which the activation function must be continuous and differentiable.
Outstar Learning: It was first proposed by Grossberg in 1976, using the
concept that a neural network is arranged in layers, and the weights connected
through a particular node should be equal to the desired outputs of the neurons
connected through those weights.
Unsupervised Learning Algorithms:
Hebbian Learning: It was proposed by Hebb in 1949 to improve the weights of
nodes in a network. The change in weight is based on the input, the output, and the learning
rate; the transpose of the output is needed for the weight adjustment.
Competitive Learning: It is a winner-takes-all strategy. Here, when an input
pattern is sent to the network, all the neurons in the layer compete with each other
to represent the input pattern; the winner gets an output of 1 and all the others 0,
and only the winning neuron's weights are adjusted.
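A minimal sketch of the Hebbian update just described: the weight change is the product of the learning rate, the input, and the neuron's own output, with no teacher signal. The bipolar input patterns, the initial weights, and the use of a sign output are illustrative assumptions.

```python
import numpy as np

def hebbian_update(w, x, lr=0.1):
    """Hebb rule: delta_w = lr * x * y, where y is the neuron's own output."""
    y = np.sign(w @ x)        # simple bipolar output of a single neuron
    return w + lr * x * y     # connections that fire together are strengthened

# Hypothetical bipolar input patterns
patterns = [np.array([1, -1, 1]), np.array([1, 1, 1]), np.array([1, -1, -1])]
w = np.array([0.1, 0.0, 0.0])   # small initial weights
for x in patterns:
    w = hebbian_update(w, x)
print(w)   # weights grow along input directions that co-occur with the neuron's output
```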
1) Neuron: A fundamental unit of a neural network that receives input, applies an activation
function, and produces an output.
2) Input Layer: The first layer of a neural network that receives the initial input data.
3) Hidden Layer: Intermediate layers between the input and output layers that perform
computations and feature extraction.
4) Output Layer: The final layer of a neural network that produces the desired output or
prediction.
5) Activation Function: A mathematical function applied to the output of a neuron to
introduce non-linearity and control the neuron's firing behavior.
6) Weight: A parameter associated with each connection between neurons, determining the
strength or importance of the connection.
7) Bias: An additional parameter added to each neuron that allows for shifting the activation
function.
8) Forward Propagation: The process of passing input data through a neural network to
compute the output.
9) Backpropagation: An algorithm for updating the weights and biases of a neural network
by propagating the error from the output layer back to the input layer.
10) Loss Function: A function that quantifies the difference between the predicted output of a
neural network and the true output, used to guide the training process.
11) Gradient Descent: An optimization algorithm used to minimize the loss function by
iteratively adjusting the weights and biases of the neural network.
12) Epoch: One complete pass through the entire training dataset during the training phase of
a neural network.
13) Batch Size: The number of training examples used in each iteration of gradient descent
during training.
14) Learning Rate: A hyperparameter that determines the step size at each iteration of gradient
descent, influencing the rate at which the neural network learns.
15) Dropout: A regularization technique that randomly drops out a certain percentage of
neurons during training to prevent overfitting.
16) Overfitting: A condition where a neural network performs well on the training data but
fails to generalize to unseen data due to excessively fitting the training data.
17) Activation Layer: A layer in a neural network that applies an activation function to its
inputs.
18) Convolutional Neural Network (CNN): A specialized type of neural network commonly
used for image and video processing, featuring convolutional layers for local feature
extraction.
19) Recurrent Neural Network (RNN): A type of neural network designed for sequential data
processing, capable of capturing dependencies and patterns over time.
20) Long Short-Term Memory (LSTM): A variant of RNN that addresses the vanishing
gradient problem and is well-suited for learning long-term dependencies.
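Item 11 in the list above defines gradient descent. The following minimal sketch minimizes a mean-squared-error loss for a one-parameter linear model; the data, learning rate, and number of iterations are made up for illustration.

```python
# Fit y = w * x to made-up data by gradient descent on the MSE loss
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]          # roughly y = 2x

w = 0.0                            # initial weight
lr = 0.01                          # learning rate (step size)
for _ in range(200):
    # dL/dw for L = (1/N) * sum (w*x - y)^2  ->  (2/N) * sum (w*x - y) * x
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad                 # step against the gradient to reduce the loss
print(round(w, 3))                 # converges close to 2.0
```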
Example: We have not achieved AGI yet; it would be an AI system that could perform tasks
as diverse as cooking, painting, reasoning, and playing sports with the same level of
competence as a human.
These are just a few essential terms in the vast field of Artificial Intelligence, but they provide
a good starting point for understanding the fundamental concepts.
Example: Using a linear regression model for a highly nonlinear problem, which results in
poor predictions.
9. Hyperparameters:
Hyperparameters are settings or configurations that are set before the training process and
affect how the machine learning model learns.
Example: The learning rate in gradient descent or the number of hidden layers in a neural
network.
10. Validation Data:
Validation data is a separate set used during the training process to tune hyperparameters
and assess the model's performance while avoiding overfitting.
Example: A portion of the training dataset is kept as validation data to check the model's
performance after each training epoch.
These terminologies form the foundation of Machine Learning and are essential to
understand while working with ML models.
11. Social Media Analysis: ML is used to analyze social media data for sentiment analysis,
trend prediction, and targeted advertising.
12. Weather Prediction: ML models are applied to weather data to forecast temperature,
rainfall, and other weather patterns.
13. Robotics: Machine learning enables robots to learn from their interactions with the
environment, making them more adaptable and capable of performing complex tasks.
14. Industrial Predictive Maintenance: ML is used to predict equipment failures in industrial
settings, reducing downtime and maintenance costs.
15. Customer Churn Prediction: Companies use ML models to predict and prevent customer
churn by identifying at-risk customers and taking proactive measures.
7. Backpropagation:
Backpropagation is an optimization algorithm used to update the weights and biases of
a neural network during training by computing gradients in reverse order. It allows the
network to learn from the training data and minimize the loss function.
Example: During each training iteration, backpropagation calculates how much each
weight contributed to the error and adjusts the weights accordingly.
8. Batch Size:
The Batch Size refers to the number of training examples processed together in one
iteration of training. It affects the speed of training and the memory requirements.
Example: If the batch size is set to 32, the model updates its weights and biases after
processing 32 training examples.
9. Epoch:
An Epoch is a complete pass through the entire training dataset during the training
process. Multiple epochs are usually needed to optimize the model effectively.
Example: If a model goes through the entire dataset of 1000 images five times during
training, it has completed 5 epochs.
These are some of the key terminologies in Deep Learning that are crucial to understanding
and working with deep neural networks effectively.
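Tying together backpropagation, batch size, and epochs from items 7-9 above, here is a minimal sketch of backpropagation for a network with one hidden layer, assuming sigmoid activations, a squared-error loss, and the XOR problem as toy data. The layer sizes, learning rate, random seed, and epoch count are illustrative choices, not prescribed by these notes.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR toy data (one batch of 4 examples)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # input -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> output
lr = 1.0

for epoch in range(10000):
    # forward pass
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    # backward pass: propagate the error from the output layer toward the input layer
    dY = (Y - T) * Y * (1 - Y)        # squared-error gradient at the output (sigmoid derivative included)
    dH = (dY @ W2.T) * H * (1 - H)    # error signal at the hidden layer
    # gradient-descent weight and bias updates
    W2 -= lr * H.T @ dY
    b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH
    b1 -= lr * dH.sum(axis=0)

predictions = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print(np.round(predictions, 2).ravel())  # typically close to [0, 1, 1, 0] (depends on the random initialization)
```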
Important Deep Learning examples
1. Deep learning has had a significant impact on various fields, and many important
examples have demonstrated its effectiveness. Here are some notable examples of deep
learning applications:
2. Image Classification: Deep learning models like Convolutional Neural Networks
(CNNs) have achieved remarkable success in image classification tasks. The most
famous example is the ImageNet competition, where deep learning models surpassed
human-level performance in identifying objects in images.
3. Natural Language Processing (NLP): Deep learning has revolutionized NLP tasks,
such as machine translation, sentiment analysis, and text generation. Transformers,
particularly models like BERT and GPT (including GPT-3), have achieved state-of-the-
art performance on numerous NLP benchmarks.
4. Speech Recognition: Deep learning has significantly improved speech recognition
systems, such as voice assistants (e.g., Amazon Alexa, Google Assistant) and
transcription services. Recurrent Neural Networks (RNNs) and Attention-based models
have been widely used in this domain.
5. Object Detection: Deep learning models, especially region-based CNNs like Faster R-
CNN and one-stage detectors like YOLO (You Only Look Once), have demonstrated
outstanding performance in detecting and localizing multiple objects within images.
6. Autonomous Vehicles: Deep learning plays a crucial role in the development of self-
driving cars. Deep neural networks are used for perception tasks, like detecting
pedestrians, traffic signs, and other vehicles, enabling autonomous decision-making.
7. Medical Imaging: Deep learning has shown great promise in medical image analysis,
aiding in the detection of diseases from X-rays, MRIs, and CT scans. It can assist
radiologists in diagnosing conditions like cancer and identifying abnormalities in
medical images.
8. Game Playing: DeepMind's AlphaGo, based on deep neural networks, defeated the
world champion in the ancient Chinese board game Go, demonstrating the ability of
deep learning to handle complex decision-making tasks.
9. Style Transfer: Deep learning models, particularly generative models like GANs
(Generative Adversarial Networks), can transform the style of images, such as
converting photographs into the style of famous artworks.
10. Drug Discovery: Deep learning has been employed in drug discovery, where it can
analyze chemical structures, predict drug-protein interactions, and assist in the
identification of potential new drug candidates.
11. Music Generation: Deep learning models like LSTM (Long Short-Term Memory)
networks have been used to generate music and create compositions that imitate the
style of famous composers or produce entirely novel music pieces.
These examples showcase the versatility and transformative power of deep learning across
diverse fields and applications. As research and technology progress, we can expect even more
groundbreaking applications of deep learning in the future.
10. Learning Rate: The Learning Rate is a hyperparameter that controls the step size at which
the neural network adjusts its weights during training. It affects how quickly or slowly the
model learns.
Example: A high learning rate may cause the model to make large weight updates,
potentially overshooting the optimal values, while a low learning rate may slow down
convergence.
These terminologies form the foundation of Neural Networks and are essential to understand
their functioning and training process.
Important Neural Networks with examples
1. Feedforward Neural Networks (FNN):
Example: A basic neural network used for binary classification tasks like spam email
detection.
2. Convolutional Neural Networks (CNN): Example: Image classification tasks, such as
identifying objects in images (e.g., ImageNet competition).
3. Recurrent Neural Networks (RNN): Example: Language modeling, machine translation,
and sentiment analysis, where the order of input data matters (e.g., predicting the next word
in a sentence).
4. Long Short-Term Memory (LSTM) Networks: Example: Language generation, text
summarization, and speech recognition, where the model needs to remember important
information from the past (e.g., generating coherent paragraphs or understanding spoken
sentences).
5. Generative Adversarial Networks (GAN): Example: Image synthesis, such as generating
realistic-looking faces or creating artwork in the style of famous painters.
6. Transformer Networks: Example: Natural Language Processing tasks, including machine
translation and language understanding (e.g., BERT for pre-training language
representations).
7. Autoencoders: Example: Dimensionality reduction and feature learning, such as
compressing image data or denoising images.
8. Siamese Neural Networks: Example: Face recognition, where the network learns to
compare and verify whether two facial images belong to the same person.
9. Deep Reinforcement Learning Networks: Example: Game playing, such as AlphaGo
playing the board game Go or agents learning to play video games using techniques like
Deep Q-Networks (DQN).
10. Residual Neural Networks (ResNet): Example: Image classification tasks with very deep
architectures, where the network can be trained more effectively by using skip connections
to avoid the vanishing gradient problem.
These are just a few examples of important neural networks and their applications. Neural
networks are versatile and can be adapted to solve a wide range of problems across various
domains. As the field of deep learning advances, new architectures and techniques will continue
to emerge, further expanding the possibilities of neural networks.
1. Supervised learning (SL) is a machine learning paradigm for problems where the
available data consists of labeled examples, meaning that each data point contains features
(covariates) and an associated label.
2. The goal of supervised learning algorithms is learning a function that maps feature vectors
(inputs) to labels (output), based on example input-output pairs.
3. It infers a function from labeled training data consisting of a set of training examples.
4. In supervised learning, each example is a pair consisting of an input object (typically a
vector) and a desired output value (also called the supervisory signal).
5. A supervised learning algorithm analyzes the training data and produces an inferred
function, which can be used for mapping new examples.
6. An optimal scenario will allow for the algorithm to correctly determine the class labels for
unseen instances.
7. This requires the learning algorithm to generalize from the training data to unseen situations
in a "reasonable" way (see inductive bias).
8. This statistical quality of an algorithm is measured through the so-called generalization
error.
Steps to follow
To solve a given problem of supervised learning, one has to perform the following steps:
1. Determine the type of training examples. Before doing anything else, the user should
decide what kind of data is to be used as a training set.
2. In the case of handwriting analysis, for example, this might be a single handwritten
character, an entire handwritten word, an entire sentence of handwriting or perhaps a
full paragraph of handwriting.
3. Gather a training set. The training set needs to be representative of the real-world use
of the function.
4. Thus, a set of input objects is gathered and corresponding outputs are also gathered,
either from human experts or from measurements.
5. Determine the input feature representation of the learned function. The accuracy of the
learned function depends strongly on how the input object is represented.
6. Typically, the input object is transformed into a feature vector, which contains a number
of features that are descriptive of the object.
7. The number of features should not be too large, because of the curse of dimensionality;
but should contain enough information to accurately predict the output.
8. Determine the structure of the learned function and corresponding learning algorithm.
For example, the engineer may choose to use support-vector machines or decision trees.
9. Complete the design. Run the learning algorithm on the gathered training set.
10. Some supervised learning algorithms require the user to determine certain control
parameters.
11. These parameters may be adjusted by optimizing performance on a subset (called
a validation set) of the training set, or via cross-validation.
12. Evaluate the accuracy of the learned function.
13. After parameter adjustment and learning, the performance of the resulting function
should be measured on a test set that is separate from the training set.
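The steps listed above can be illustrated end to end with a small, hedged sketch using scikit-learn (assumed to be available); the digits dataset, the SVM classifier, and the values of the control parameter C are chosen only for this example.

```python
# Sketch of the supervised-learning workflow: gather labeled data, split it,
# choose a learning algorithm, tune a control parameter on a validation set,
# and evaluate the final function on a separate test set.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)                   # feature vectors and labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

best_model, best_acc = None, 0.0
for C in (0.1, 1.0, 10.0):                            # control parameter tuned on the validation set
    model = SVC(C=C, kernel="rbf").fit(X_train, y_train)
    acc = accuracy_score(y_val, model.predict(X_val))
    if acc > best_acc:
        best_model, best_acc = model, acc

print("Test accuracy:", accuracy_score(y_test, best_model.predict(X_test)))
```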
Algorithm choice
A wide range of supervised learning algorithms are available, each with its strengths
and weaknesses.
There is no single learning algorithm that works best on all supervised learning
problems (see the No free lunch theorem).
a) Bias-variance tradeoff
1. A first issue is the tradeoff between bias and variance.
2. Imagine that we have available several different, but equally good, training data
sets.
3. A learning algorithm is biased for a particular input x if, when trained on each of
these data sets, it is systematically incorrect when predicting the correct output for
x.
4. A learning algorithm has high variance for a particular input x if it predicts different
output values when trained on different training sets.
5. The prediction error of a learned classifier is related to the sum of the bias and the
variance of the learning algorithm.
6. Generally, there is a tradeoff between bias and variance.
7. A learning algorithm with low bias must be "flexible" so that it can fit the data well.
8. But if the learning algorithm is too flexible, it will fit each training data set
differently, and hence have high variance.
9. A key aspect of many supervised learning methods is that they are able to adjust this
tradeoff between bias and variance (either automatically or by providing a
bias/variance parameter that the user can adjust).
b) Function complexity and amount of training data
1) The second issue is of the amount of training data available relative to the complexity
of the "true" function (classifier or regression function).
2) If the true function is simple, then an "inflexible" learning algorithm with high bias and
low variance will be able to learn it from a small amount of data.
3) But if the true function is highly complex (e.g., because it involves complex interactions
among many different input features and behaves differently in different parts of the
input space), then it can only be learned from a large amount of training data paired
with a "flexible" learning algorithm with low bias and high variance.
c) Dimensionality of the input space
1. A third issue is the dimensionality of the input space.
2. If the input feature vectors have large dimensions, learning the function can be
difficult even if the true function only depends on a small number of those features.
3. This is because the many "extra" dimensions can confuse the learning algorithm and
cause it to have high variance.
4. Hence, input data of large dimensions typically requires tuning the classifier to have
low variance and high bias.
5. In practice, if the engineer can manually remove irrelevant features from the input
data, it will likely improve the accuracy of the learned function.
6. In addition, there are many algorithms for feature selection that seek to identify the
relevant features and discard the irrelevant ones.
7. This is an instance of the more general strategy of dimensionality reduction, which
seeks to map the input data into a lower-dimensional space prior to running the
supervised learning algorithm.
d) Noise in the output values
1. A fourth issue is the degree of noise in the desired output values (the supervisory
target variables).
2. If the desired output values are often incorrect (because of human error or sensor
errors), then the learning algorithm should not attempt to find a function that exactly
matches the training examples.
3. Attempting to fit the data too carefully leads to overfitting.
4. You can overfit even when there are no measurement errors (stochastic noise) if the
function you are trying to learn is too complex for your learning model.
5. In such a situation, the part of the target function that cannot be modeled "corrupts"
your training data - this phenomenon has been called deterministic noise.
6. When either type of noise is present, it is better to go with a higher bias, lower
variance estimator.
7. In practice, there are several approaches to alleviate noise in the output values such
as early stopping to prevent overfitting as well as detecting and removing the noisy
training examples prior to training the supervised learning algorithm.
8. Several algorithms identify noisy training examples, and removing the suspected noisy
examples prior to training has been shown to decrease generalization error with
statistical significance.
Other factors to consider when choosing and applying a learning algorithm include
the following:
1. Heterogeneity of the data. If the feature vectors include features of many different kinds
(discrete, discrete ordered, counts, continuous values), some algorithms are easier to
apply than others.
2. Many algorithms, including support-vector machines, linear regression, logistic
regression, neural networks, and nearest neighbor methods, require that the input
features be numerical and scaled to similar ranges (e.g., to the [-1,1] interval).
3. Methods that employ a distance function, such as nearest neighbor methods and
support-vector machines with Gaussian kernels, are particularly sensitive to this.
4. An advantage of decision trees is that they easily handle heterogeneous data.
5. Redundancy in the data. If the input features contain redundant information (e.g., highly
correlated features), some learning algorithms (e.g., linear regression, logistic
regression, and distance based methods) will perform poorly because of numerical
instabilities.
6. These problems can often be solved by imposing some form of regularization.
7. Presence of interactions and non-linearities. If each of the features makes an
independent contribution to the output, then algorithms based on linear functions (e.g.,
linear regression, logistic regression, support-vector machines, naive Bayes) and
distance functions (e.g., nearest-neighbor methods, support-vector machines with
Gaussian kernels) generally perform well. However, if there are complex interactions
among the features, then algorithms such as decision trees and neural networks work
better, because they are specifically designed to discover these interactions.
Algorithms:
The most widely used learning algorithms are a diverse set of methods used in machine learning
to solve various types of problems.
Here is a single definition that encompasses these algorithms:
1. Machine learning algorithms are computational models and techniques that enable
computers to learn patterns and relationships in data without being explicitly programmed.
They use statistical and mathematical principles to generalize from known examples
(training data) and make predictions or decisions about new, unseen data.
2. Each of the listed algorithms serves different purposes and is suitable for specific types of
tasks.
Here's a brief overview of each algorithm:
3. Support Vector Machines (SVM): A supervised learning algorithm used for classification
and regression tasks. It finds a hyperplane that best separates different classes in the data
space.
4. Linear Regression: A simple and widely used supervised learning algorithm for regression
tasks. It models the relationship between independent variables and a dependent variable
using a linear equation.
5. Logistic Regression: Another supervised learning algorithm used for binary classification
tasks. It models the probability that an instance belongs to a particular class.
6. Naive Bayes: A probabilistic supervised learning algorithm used for classification tasks. It
relies on Bayes' theorem and assumes independence between features.
7. Linear Discriminant Analysis (LDA): A dimensionality reduction technique and a
classifier used in supervised learning. It projects data into a lower-dimensional space while
maximizing class separability.
8. Decision Trees: A popular supervised learning algorithm for classification and regression
tasks. It recursively splits the data based on feature values to create a tree-like structure for
decision-making.
9. K-Nearest Neighbor (KNN) Algorithm: A simple and intuitive supervised learning
algorithm used for classification and regression tasks. It classifies data points based on the
majority class among their k nearest neighbors.
10. Neural Networks (Multilayer Perceptron): A powerful class of models used for various
machine learning tasks, including classification, regression, and more complex problems.
They are inspired by the structure and functioning of biological neural networks.
11. Similarity Learning: A type of unsupervised or supervised learning, where the algorithm
learns to measure similarity or distance between data points.
These algorithms play a crucial role in machine learning and data analysis, and their choice
depends on the nature of the problem, the amount and quality of available data, and other
specific requirements of the task at hand.
Applications
1. Bioinformatics:
Bioinformatics is the interdisciplinary field that combines biology, computer science,
and statistics to analyze and interpret biological data.
It involves the development and application of computational tools and methods to
study biological systems, genes, proteins, and other biomolecules.
2. Cheminformatics:
Cheminformatics is the application of informatics methods to the field of chemistry.
It involves the storage, analysis, retrieval, and manipulation of chemical data, especially
in the context of drug discovery and chemical compound design.
3. Quantitative Structure-Activity Relationship (QSAR): QSAR is a method used in
cheminformatics and pharmaceutical research to predict the biological activity or property
of a chemical compound based on its structure and molecular properties.
4. Database Marketing: Database marketing involves the use of customer data, such as
purchase history and demographic information, to create targeted marketing campaigns and
personalized communication with customers.
5. Handwriting Recognition: Handwriting recognition, also known as Handwritten Text
Recognition (HTR), is the technology that enables computers to interpret and convert
handwritten text into machine-readable text.
6. Information Retrieval: Information retrieval is the process of searching for and retrieving
relevant information from a collection of unstructured or structured data, such as text
documents, web pages, or databases.
7. Learning to Rank: Learning to Rank is a machine learning approach that focuses on
training algorithms to rank a set of items or documents based on their relevance to a given
query or user preference.
8. Information Extraction: Information Extraction involves automatically extracting
structured information from unstructured data sources, such as text documents, to create a
more organized and structured dataset.
9. Object Recognition in Computer Vision: Object recognition is the task of identifying and
localizing specific objects or patterns within an image or video using computer vision
techniques.
10. Optical Character Recognition (OCR): OCR is the technology that converts scanned
documents or images containing text into machine-encoded text, making it searchable and
editable.
11. Spam Detection: Spam detection is the process of identifying and filtering out unwanted
or unsolicited messages, often found in emails or online communication.
12. Pattern Recognition: Pattern recognition is the process of identifying recurring patterns
or regularities within data and using these patterns to make predictions or categorize new
data.
13. Speech Recognition: Speech recognition is the technology that converts spoken language
into written text or other machine-readable formats, enabling computers to understand and
process human speech.
14. Supervised Learning and Downward Causation: this pairing mixes concepts from different
fields; supervised learning itself is simply the machine learning paradigm in which an
algorithm is trained on labeled data to make predictions or decisions.
EXTRA INFORMATION
Supervised Learning Neural Networks: Key Points and Examples
d. Training: Feed training data through the network, adjust weights using optimization
techniques (e.g., gradient descent).
e. Validation: Fine-tune hyperparameters, like learning rate, using a validation dataset
to prevent overfitting.
f. Testing: Evaluate the trained model on a separate test dataset to assess its
generalization performance.
g. Deployment: Deploy the model to make predictions on new, unseen data.
7. Challenges:
a) Overfitting: Model may perform well on training data but poorly on new data due
to excessive complexity.
b) Underfitting: Model may not capture underlying patterns in data due to insufficient
complexity.
c) Bias and Fairness: Models can learn biases present in the training data, leading to
unfair predictions.
8. Advantages:
a) Predictive Power: Supervised learning can make accurate predictions when
provided with high-quality labeled data.
b) Versatility: Applicable in various domains, from image analysis to natural
language processing.
9. Limitations:
a. Labeling Effort: Requires extensive labeled data, which can be time-consuming
and expensive to create.
b. Dependency on Data Quality: Model performance heavily relies on the quality
and representativeness of the labeled data.
Supervised learning neural networks are powerful tools for pattern recognition and
prediction tasks, making them fundamental in many real-world applications.
Supervised Learning
As the name suggests, supervised learning takes place under the supervision of a teacher; the
learning process is dependent on the teacher's feedback. During the training of an ANN under supervised learning, the input
vector is presented to the network, which will produce an output vector. This output vector is
compared with the desired/target output vector. An error signal is generated if there is a
difference between the actual output and the desired/target output vector. On the basis of this
error signal, the weights would be adjusted until the actual output is matched with the desired
output.
Perceptron
Developed by Frank Rosenblatt using the McCulloch and Pitts model, the perceptron is the basic
operational unit of artificial neural networks. It employs a supervised learning rule and is able to
classify the data into two classes.
Links − It has a set of connection links, each of which carries a weight; the bias link
always has a weight of 1.
Adder − It adds the inputs after they are multiplied by their respective weights.
Activation function − It limits the output of the neuron. The most basic activation function is a
Heaviside step function that has two possible outputs. This function returns 1 if the input
is positive, and 0 for any negative input.
Training Algorithm
Perceptron network can be trained for single output unit as well as multiple output units.
Training Algorithm for Single Output Unit
Step 1 − Initialize the following to start the training −
Weights
Bias
Learning rate α
For easy calculation and simplicity, weights and bias must be set equal to 0 and the learning
rate must be set equal to 1.
Step 2 − Continue step 3-8 when the stopping condition is not true.
xi = si (i = 1 to n)
Step 5 − Now obtain the net input with the following relation −
yin = b + ∑(i=1 to n) xi·wi
Here ‘b’ is bias and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output.
f(yin) = 1 if yin > θ; 0 if −θ ≤ yin ≤ θ; −1 if yin < −θ
Step 7 − Adjust the weight and bias as follows −
Case 1 − if y ≠ t then,
wi(new) = wi(old) + αtxi
b(new) = b(old) + αt
Case 2 − if y = t then,
wi (new) = wi (old)
b(new) = b(old)
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
Step 8 − Test for the stopping condition, which would happen when there is no change in
weight.
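A minimal NumPy sketch of the single-output training loop above is given below. It is an illustration, not part of the original notes: the bipolar targets, the threshold θ, and the AND-function data are assumptions chosen for the example.

```python
import numpy as np

def perceptron_train(X, T, alpha=1.0, theta=0.2, max_epochs=100):
    """Single-output perceptron with bipolar targets (+1/-1), per the steps above."""
    w = np.zeros(X.shape[1])          # Step 1: weights = 0
    b = 0.0                           # bias = 0
    for _ in range(max_epochs):       # Step 2: repeat while weights keep changing
        changed = False
        for x, t in zip(X, T):        # Steps 3-4: for each training pair s:t
            y_in = b + np.dot(x, w)   # Step 5: net input
            y = 1 if y_in > theta else (-1 if y_in < -theta else 0)  # Step 6
            if y != t:                # Step 7, Case 1: update weights and bias
                w = w + alpha * t * x
                b = b + alpha * t
                changed = True
        if not changed:               # Step 8: stop when nothing changed
            break
    return w, b

# Example: learn the AND function with bipolar inputs and targets.
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
T = np.array([1, -1, -1, -1])
print(perceptron_train(X, T))
```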
Training Algorithm for Multiple Output Units
Step 1 − Initialize the following to start the training −
Weights
Bias
Learning rate α
For easy calculation and simplicity, weights and bias must be set equal to 0 and the learning
rate must be set equal to 1.
Step 2 − Continue step 3-8 when the stopping condition is not true.
xi = si (i = 1 to n)
yinj = bj + ∑(i=1 to n) xi·wij
Here ‘b’ is bias and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output for each output unit j = 1 to m −
f(yinj) = 1 if yinj > θ; 0 if −θ ≤ yinj ≤ θ; −1 if yinj < −θ
Step 7 − Adjust the weight and bias as follows −
Case 1 − if yj ≠ tj then,
wij(new) = wij(old) + αtjxi
bj(new) = bj(old) + αtj
Case 2 − if yj = tj then,
wij(new) = wij(old)
bj(new) = bj(old)
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
Step 8 − Test for the stopping condition, which will happen when there is no change in weight.
Adaptive Linear Neuron (Adaline)
Adaline, which stands for Adaptive Linear Neuron, is a network having a single linear unit.
It uses the delta rule for training to minimize the Mean-Squared Error (MSE) between the
actual output and the desired/target output.
Architecture
The basic structure of Adaline is similar to perceptron having an extra feedback loop with the
help of which the actual output is compared with the desired/target output. After comparison
on the basis of training algorithm, the weights and bias will be updated.
Training Algorithm
Step 1 − Initialize the following to start the training −
Weights
Bias
Learning rate α
For easy calculation and simplicity, weights and bias must be set equal to 0 and the learning
rate must be set equal to 1.
Step 2 − Continue step 3-8 when the stopping condition is not true.
Step 3 − Continue step 4-6 for every bipolar training pair s:t.
xi = si (i = 1 to n)
yin = b + ∑(i=1 to n) xi·wi
Here ‘b’ is bias and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output −
f(yin) = 1 if yin ≥ 0; −1 if yin < 0
Step 7 − Adjust the weight and bias as follows −
Case 1 − if y ≠ t then,
wi(new) = wi(old) + α(t − yin)xi
b(new) = b(old) + α(t − yin)
Case 2 − if y = t then,
wi(new) = wi(old)
b(new) = b(old)
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
Step 8 − Test for the stopping condition, which will happen when there is no change in weight
or the highest weight change occurred during training is smaller than the specified tolerance.
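As a hedged illustration of the Adaline (delta-rule) training described above, the NumPy sketch below updates each weight by α(t − yin)xi and stops when the largest weight change falls below a tolerance; the function names, the default values, and the bipolar prediction helper are assumptions for this example.

```python
import numpy as np

def adaline_train(X, T, alpha=0.1, tol=1e-4, max_epochs=100):
    """Adaline trained with the delta (LMS) rule on bipolar targets."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        max_change = 0.0
        for x, t in zip(X, T):
            y_in = b + np.dot(x, w)             # net input of the linear unit
            err = t - y_in                      # delta-rule error term (t - yin)
            w = w + alpha * err * x             # wi(new) = wi(old) + alpha*(t - yin)*xi
            b = b + alpha * err                 # b(new)  = b(old)  + alpha*(t - yin)
            max_change = max(max_change, np.max(np.abs(alpha * err * x)))
        if max_change < tol:                    # stop when weight changes are tiny
            break
    return w, b

def adaline_predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)      # bipolar output after the step function
```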
Multiple Adaptive Linear Neuron (Madaline)
Madaline, which stands for Multiple Adaptive Linear Neuron, is a network consisting of many
Adalines in parallel. Some important points about Madaline are as follows:
It is just like a multilayer perceptron, where Adaline acts as a hidden unit between the
input and the Madaline layer.
The weights and the bias between the input and Adaline layers, as we see in the
Adaline architecture, are adjustable.
The Adaline and Madaline layers have fixed weights and a bias of 1.
Architecture
The architecture of Madaline consists of “n” neurons of the input layer, “m” neurons of the
Adaline layer, and 1 neuron of the Madaline layer. The Adaline layer can be considered as the
hidden layer as it is between the input layer and the output layer, i.e. the Madaline layer.
Training Algorithm
By now we know that only the weights and bias between the input and the Adaline layer are to
be adjusted, and the weights and bias between the Adaline and the Madaline layer are fixed.
Step 1 − Initialize the following to start the training −
Weights
Bias
Learning rate α
For easy calculation and simplicity, weights and bias must be set equal to 0 and the learning
rate must be set equal to 1.
Step 2 − Continue step 3-8 when the stopping condition is not true.
Step 3 − Continue step 4-7 for every bipolar training pair s:t.
xi = si (i = 1 to n)
Step 5 − Obtain the net input at each hidden layer, i.e. the Adaline layer with the following
relation −
Qinj = bj + ∑(i=1 to n) xi·wij, j = 1 to m
Here ‘b’ is bias and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output at the Adaline and
the Madaline layer −
f(x) = 1 if x ≥ 0; −1 if x < 0
Qj = f (Qinj )
y = f (yin )
Step 7 − Calculate the error and adjust the weights as follows −
Case 1 − if y ≠ t and t = 1, then the weights are updated on Qj, the Adaline unit whose net
input is closest to 0, because t = 1:
wij(new) = wij(old) + α(1 − Qinj)xi
bj(new) = bj(old) + α(1 − Qinj)
Case 2 − if y ≠ t and t = −1, then the weights are updated on every Qk whose net input is
positive, because t = −1:
wik(new) = wik(old) + α(−1 − Qink)xi
bk(new) = bk(old) + α(−1 − Qink)
Here 'y' is the actual output and 't' is the desired/target output.
Case 3 − if y = t, then there is no change in the weights.
Step 8 − Test for the stopping condition, which will happen when there is no change in weight
or the highest weight change occurred during training is smaller than the specified tolerance.
Back Propagation Network (BPN)
Architecture
As shown in the diagram, the architecture of BPN has three interconnected layers having
weights on them. The hidden layer as well as the output layer also has bias, whose weight is
always 1, on them. As is clear from the diagram, the working of BPN is in two phases. One
phase sends the signal from the input layer to the output layer, and the other phase back
propagates the error from the output layer to the input layer.
Training Algorithm
For training, BPN will use binary sigmoid activation function. The training of BPN will have the
following three phases.
Step 1 − Initialize the following to start the training −
Weights
Learning rate α
For easy calculation and simplicity, take some small random values.
Step 2 − Continue step 3-11 when the stopping condition is not true.
Phase 1
Step 4 − Each input unit receives input signal xi and sends it to the hidden unit for all i = 1 to n
Step 5 − Calculate the net input at the hidden unit using the following relation −
Qinj = b0j + ∑(i=1 to n) xi·vij, j = 1 to p
Here b0j is the bias on hidden unit, vij is the weight on j unit of the hidden layer coming from i
unit of the input layer.
Now calculate the net output by applying the following activation function
Qj = f (Qinj )
Send these output signals of the hidden layer units to the output layer units.
Step 6 − Calculate the net input at the output layer unit using the following relation −
yink = b0k + ∑(j=1 to p) Qj·wjk, k = 1 to m
Here b0k is the bias on output unit, wjk is the weight on k unit of the output layer coming from j
unit of the hidden layer.
yk = f (yink )
Phase 2
Step 7 − Compute the error correcting term, in correspondence with the target pattern received
at each output unit, as follows −
δk = (tk − yk)·f′(yink)
Then, the weight and bias correction terms are −
Δwjk = αδkQj
Δb0k = αδk
Step 8 − Now each hidden unit sums its delta inputs from the output units −
δinj = ∑(k=1 to m) δk·wjk
δj = δinj·f′(Qinj)
Δvij = αδjxi
Δb0j = αδj
Phase 3
Step 9 − Each output unit (yk, k = 1 to m) updates the weight and bias as follows −
wjk(new) = wjk(old) + Δwjk
b0k(new) = b0k(old) + Δb0k
Step 10 − Each hidden unit (zj, j = 1 to p) updates the weight and bias as follows −
vij(new) = vij(old) + Δvij
b0j(new) = b0j(old) + Δb0j
Step 11 − Check for the stopping condition, which may be either the number of epochs reached
or the target output matches the actual output.
Mathematical Formulation
For the activation function yk = f(yink), the net input at the output layer (and similarly at the
hidden layer) is given by
yink = ∑j zj·wjk
and the error that has to be minimized is
E = (1/2)·∑k [tk − yk]²
∂E/∂wjk = ∂/∂wjk ( (1/2)·∑k [tk − yk]² )
= ∂/∂wjk ( (1/2)·[tk − f(yink)]² )
= −[tk − yk]·∂/∂wjk f(yink)
= −[tk − yk]·f′(yink)·∂/∂wjk (yink)
= −[tk − yk]·f′(yink)·zj
Similarly, for the hidden-layer weights,
∂E/∂vij = −∑k δk·∂(yink)/∂vij = −[∑k δk·wjk·f′(zinj)]·xi
and with the hidden-layer error term
δj = ∑k δk·wjk·f′(zinj)
this gives ∂E/∂vij = −δj·xi.
Δwjk = −α·(∂E/∂wjk) = αδkzj
Δvij = −α·(∂E/∂vij) = αδjxi
EXTRA NOTES
What is a Perceptron: A Beginner's Guide to the Perceptron
1. A Perceptron is a neural network unit that performs computations on the input data
to detect features; it is a basic building block of Artificial Intelligence systems.
2. It links artificial neurons that behave like simple logic gates with binary
outputs.
3. An artificial neuron invokes the mathematical function and has node, input,
weights, and output equivalent to the cell nucleus, dendrites, synapse, and
axon, respectively, compared to a biological neuron.
What is a Binary Classifier in Machine Learning?
a) A binary classifier in machine learning is a type of model that is trained to
classify data into one of two possible categories, typically represented as
binary labels such as 0 or 1, true or false, or positive or negative.
a. For example, a binary classifier may be trained to distinguish between
spam and non-spam emails, or to predict whether a credit card
transaction is fraudulent or legitimate.
b) Binary classifiers are a fundamental building block of many machine learning
applications, and there are numerous algorithms that can be used to build
them, including logistic regression, support vector machines (SVMs),
decision trees, random forests, and neural networks.
c) These models are typically trained using labeled data, where the correct label
or category for each example in the training set is known, and then used to
predict the category of new, unseen examples.
d) The performance of a binary classifier is typically evaluated using metrics
such as accuracy, precision, recall, and F1 score, which measure how well
the model is able to correctly identify positive and negative examples in the
data.
e) High-quality binary classifiers are essential for a wide range of applications,
including natural language processing, computer vision, fraud detection, and
medical diagnosis, among many others.
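As a small, hedged illustration of the points above, the scikit-learn sketch below trains a logistic-regression binary classifier on a labeled dataset and reports the four evaluation metrics mentioned; the dataset and settings are chosen only for this example.

```python
# Sketch: a logistic-regression binary classifier evaluated with
# accuracy, precision, recall, and F1 score.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X, y = load_breast_cancer(return_X_y=True)           # labeled binary data (0 / 1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
pred = clf.predict(X_te)

print("accuracy :", accuracy_score(y_te, pred))
print("precision:", precision_score(y_te, pred))
print("recall   :", recall_score(y_te, pred))
print("F1 score :", f1_score(y_te, pred))
```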
Biological Neuron
A human brain has billions of neurons. Neurons are interconnected nerve
cells in the human brain that are involved in processing and transmitting
chemical and electrical signals.
Biological Neuron vs. Artificial Neuron
The biological neuron is analogous to artificial neurons in the following terms:
S.No Biological Neuron Artificial Neuron
1 Cell Nucleus (Soma) Node
2 Dendrites Input
3 Synapse Weights or Interconnections
4 Axon Output
Biological Neuron
a) Researchers Warren McCulloch and Walter Pitts published their first concept
of a simplified brain cell in 1943.
b) This was called the McCulloch-Pitts (MCP) neuron.
c) They described such a nerve cell as a simple logic gate with binary outputs.
d) Multiple signals arrive at the dendrites and are then integrated into the cell
body, and, if the accumulated signal exceeds a certain threshold, an output
signal is generated that will be passed on by the axon. In the next section, let
us talk about the artificial neuron.
What is Artificial Neuron
An artificial neuron is a mathematical function based on a model of biological
neurons, where each neuron takes inputs, weighs them separately, sums them up and
passes this sum through a nonlinear function to produce output.
Perceptron
1. Input Layer: The input layer consists of one or more input neurons,
which receive input signals from the external world or from other layers
of the neural network.
3. Bias: A bias term is added to the input layer to provide the perceptron with
additional flexibility in modeling complex patterns in the input data.
6. Training Algorithm:
Types of Perceptron:
1. Single layer: Single layer perceptron can learn only linearly separable
patterns.
2. Multilayer: Multilayer perceptrons can learn about two or more layers
having a greater processing power.
The Perceptron algorithm learns the weights for the input signals in order to draw a
linear decision boundary.
Note: Supervised Learning is a type of Machine Learning used to learn models from
labeled training data.
It enables output prediction for future or unseen data.
Let us focus on the Perceptron Learning Rule in the next section.
This step function or Activation function is vital in ensuring that output is mapped
between (0,1) or (-1,1).
Take note that the weight of input indicates a node’s strength.
Similarly, the bias value gives the ability to shift the activation function curve up
or down.
Step 1: Multiply all input values with corresponding weight values and then add to
calculate the weighted sum. The following is the mathematical expression of it:
∑wi*xi = x1*w1 + x2*w2 + x3*w3 + …… + xn*wn
Add a term called bias ‘b’ to this weighted sum to improve the model’s performance.
Step 2: An activation function is applied with the above-mentioned weighted sum
giving us an output either in binary form or a continuous value as follows:
Y=f(∑wi*xi + b)
Types of Perceptron models
We have already discussed the types of Perceptron models in the Introduction. Here,
we shall give a more profound look at this:
1. Single Layer Perceptron model:
a) One of the simplest types of ANN (Artificial Neural Network); it consists of
a feed-forward network and includes a threshold transfer function inside the
model.
b) The main objective of the single-layer perceptron model is to analyze
the linearly separable objects with binary outcomes.
A Single-layer perceptron can learn only linearly separable patterns.
2. Multi-Layered Perceptron model: It is mainly similar to a single-layer
perceptron model but has more hidden layers.
3. Forward Stage: Activation starts from the input layer and terminates on the
output layer.
4. Backward Stage:
a) In the backward stage, weight and bias values are modified as per the
model's requirement.
b) The backward stage propagates the error between the actual output and
the desired output backward from the output layer.
c) A multilayer perceptron model has a greater processing power and can
process linear and non-linear patterns.
d) Further, it also implements logic gates such as AND, OR, XOR,
XNOR, and NOR.
Advantages:
A multi-layered perceptron model can solve complex non-linear
problems.
It works well with both small and large input data.
Helps us to obtain quick predictions after the training.
Helps us obtain the same accuracy ratio with big and small data.
Disadvantages:
In a multi-layered perceptron model, computations are time-consuming and
complex.
It is tough to predict how much each independent variable affects the
dependent variable.
The model functioning depends on the quality of training.
Characteristics of the Perceptron Model
The following are the characteristics of a Perceptron Model:
1. It is a machine learning algorithm that uses supervised learning of binary
classifiers.
2. In Perceptron, the weight coefficient is automatically learned.
3. Initially, weights are multiplied with input features, and then the decision
is made whether the neuron is fired or not.
4. The activation function applies a step rule to check whether the weighted
sum of the inputs is greater than zero.
5. The linear decision boundary is drawn, enabling the distinction between
the two linearly separable classes +1 and -1.
6. If the added sum of all input values is more than the threshold value, it
must have an output signal; otherwise, no output will be shown.
Limitation of Perceptron Model
The following are the limitation of a Perceptron model:
1. The output of a perceptron can only be a binary number (0 or 1) due to the
hard-edge transfer function.
2. It can only be used to classify the linearly separable sets of input vectors.
If the input vectors are non-linear, it is not easy to classify them correctly.
Perceptron Learning Rule
a) Perceptron Learning Rule states that the algorithm would automatically learn
the optimal weight coefficients.
b) The input features are then multiplied with these weights to determine if a
neuron fires or not.
The Perceptron receives multiple input signals, and if the sum of the input signals
exceeds a certain threshold, it either outputs a signal or does not return an output.
In the context of supervised learning and classification, this can then be used to
predict the class of a sample.
Perceptron Function
A Perceptron is a function that maps its input "x", which is multiplied by the learned
weight coefficients, to an output value "f(x)".
“b” = bias (an element that adjusts the boundary away from origin without
any dependence on the input value)
The output can be represented as “1” or “0.” It can also be represented as “1” or “-
1” depending on which activation function is used.
Inputs of a Perceptron
A Perceptron accepts inputs, moderates them with certain weight values, then applies
the transformation function to output the final result. The image below shows a
Perceptron with a Boolean output.
A Boolean output is based on inputs such as salaried, married, age, past credit profile,
etc.
It has only two values: Yes and No or True and False. The summation function “∑”
multiplies all inputs of “x” by weights “w” and then adds them up as follows:
For example:
If ∑ wixi> 0 => then final output “o” = 1 (issue bank loan)
Else, final output “o” = -1 (deny bank loan)
Step function gets triggered above a certain value of the neuron output; else it outputs
zero. Sign Function outputs +1 or -1 depending on whether neuron output is greater
than zero or not. Sigmoid is the S-curve and outputs a value between 0 and 1.
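As a toy illustration of the loan decision above, the snippet below computes the weighted sum of Boolean inputs and applies a sign-style decision; the weight values and the bias are invented purely for this example.

```python
# Toy loan decision: weighted sum of Boolean inputs followed by a sign-style decision.
inputs  = {"salaried": 1, "married": 0, "age_ok": 1, "good_credit": 1}
weights = {"salaried": 0.3, "married": 0.1, "age_ok": 0.2, "good_credit": 0.5}   # assumed values
bias = -0.6                                     # assumed threshold shift

s = sum(weights[k] * inputs[k] for k in inputs) + bias
o = 1 if s > 0 else -1                          # 1 = issue bank loan, -1 = deny bank loan
print(s, o)                                     # s is about 0.4, so o = 1 (issue the loan)
```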
Output of Perceptron
Perceptron with a Boolean output:
Inputs: x1…xn
Output: o(x1….xn)
Bias Unit
For simplicity, the threshold θ can be brought to the left and represented as w0x0,
where w0= -θ and x0= 1.
Output:
The figure shows how the decision function squashes wTx to either +1 or -1 and how
it can be used to discriminate between two linearly separable classes.
Perceptron at a Glance
Perceptron has the following characteristics:
Perceptron is an algorithm for Supervised Learning of single layer binary
linear classifiers.
Optimal weight coefficients are automatically learned.
Weights are multiplied with the input features and decision is made if the
neuron is fired or not.
Activation function applies a step rule to check if the output of the
weighting function is greater than zero.
Observation:
In Fig(a) above, examples can be clearly separated into positive and
negative values; hence, they are linearly separable. This can include logic
gates like AND, OR, NOR, NAND.
Fig (b) shows examples that are not linearly separable (as in an XOR gate).
Diagram (a) is a set of training examples and the decision surface of a
Perceptron that classifies them correctly.
Diagram (b) is a set of training examples that are not linearly separable,
that is, they cannot be correctly classified by any straight line.
X1 and X2 are the Perceptron inputs.
3. XOR
An XOR gate, also called an Exclusive OR gate, has two inputs and one output.
The gate returns TRUE as the output if and only if exactly one of the inputs is true.
Input (A B) | Output
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0
EXTRA NOTES
A network with a single linear unit is called Adaline (Adaptive Linear Neural). A unit with
a linear activation function is called a linear unit.
In Adaline, there is only one output unit and output values are bipolar (+1,-1).
ADALINE (Adaptive Linear Neuron or later Adaptive Linear Element)
It is an early single-layer artificial neural network and the name of the physical device that
implemented this network. The network uses memistors.
It was developed by professor Bernard Widrow and his doctoral student Ted Hoff at
Stanford University in 1960.
It is based on the perceptron. It consists of a weight, a bias and a summation function.
The difference between Adaline and the standard (McCulloch–Pitts) perceptron is in how
they learn.
Adaline unit weights are adjusted to match a teacher signal, before applying the Heaviside
function (see figure), but the standard perceptron unit weights are adjusted to match the
correct output, after applying the Heaviside function.
A multilayer network of ADALINE units is a MADALINE.
MADALINE
The three-layer network uses memistors. Three different training algorithms for MADALINE
networks, which cannot be learned using backpropagation because the sign function is not
differentiable, have been suggested, called Rule I, Rule II and Rule III.
Rule-1: MADALINE Rule 1 (MRI) - The first of these dates back to 1962 and cannot adapt
the weights of the hidden-output connection.
Rule-2: MADALINE Rule 2 (MRII) - The second training algorithm improved on Rule I and
was described in 1988. It is based on a principle called "minimal disturbance".
Rule-3: MADALINE Rule 3 (MRIII) - The third rule applies to a modified network with sigmoid
activations instead of the sign function; it was later found to be essentially equivalent to
backpropagation.
Introduction to Backpropagation:
1. Backpropagation does not require any parameters to be set, except the number of inputs
required.
2. It iteratively adjusts the model's parameters, aiming to minimize the mean squared error
(MSE). The basic procedure is:
a. Traverse the network from the input to the output by computing the activations of the
hidden layers and of the output layer.
b. In the output layer, calculate the derivative of the cost function with respect to each
unit's input, and propagate these derivatives backward through the hidden layers.
c. Repeatedly update the weights until they converge or the model has undergone enough
iterations.
3. The idea of training multilayer networks goes back to Frank Rosenblatt, but he did not
know how to implement it, even though Henry J. Kelley had already derived a continuous
precursor of backpropagation in 1960 in the context of control theory.
4. Backpropagation computes the gradient of a loss function with respect to the weights
of the network for a single input–output example, and does so efficiently, computing
the gradient one layer at a time, iterating backward from the last layer to avoid
redundant calculations of intermediate terms in the chain rule; this can be derived
through dynamic programming.
5. Gradient descent, or variants such as stochastic gradient descent, are commonly used
to apply the computed gradient.
6. Strictly, the term backpropagation refers only to the algorithm for computing the
gradient, not how the gradient is used; but the term is often used loosely to refer to the
entire learning algorithm – including how the gradient is used, such as by stochastic
gradient descent.
7. In 1986, David E. Rumelhart et al. published an experimental analysis of the technique.
8. Informally, backpropagation is just a way of propagating the total loss back into the
neural network to know how much of the loss every node is responsible for, and
subsequently updating the weights in a way that minimizes the loss by giving the nodes
with higher error rates lower weights, and vice versa.
9. The elements of the weight vector w are ordered by layer (starting from the first hidden
layer), then by neurons in a layer, and then by the number of a synapse within a neuron.
What is Backpropagation?
Backpropagation:
1. It is the essence of neural network training. It is the method of fine-tuning the weights of a
neural network based on the error rate obtained in the previous epoch (i.e., iteration).
2. Proper tuning of the weights allows you to reduce error rates and make the model reliable
by increasing its generalization.
3. Backpropagation in neural network is a short form for “backward propagation of errors.”
It is a standard method of training artificial neural networks.
4. This method helps calculate the gradient of a loss function with respect to all the weights
in the network.
How Backpropagation Algorithm Works
The Back propagation algorithm in neural network computes the gradient of the loss
function for a single weight by the chain rule.
It efficiently computes one layer at a time, unlike a naive direct computation.
It computes the gradient, but it does not define how the gradient is used. It generalizes
the computation in the delta rule.
Consider the following Back propagation neural network example diagram to
understand:
A feedforward neural network is an artificial neural network where the nodes never
form a cycle.
This kind of neural network has an input layer, hidden layers, and an output layer.
It is the first and simplest type of artificial neural network.
There are two types of backpropagation networks:
Static Back-propagation
Recurrent Backpropagation
Static back-propagation:
It is one kind of backpropagation network which produces a mapping of a static input
for static output.
It is useful to solve static classification issues like optical character recognition.
Recurrent Backpropagation:
Recurrent Backpropagation in data mining is fed forward until a fixed value is achieved.
After that, the error is computed and propagated backward.
Associative Memory Networks
These kinds of neural networks work on the basis of pattern association, which means they can
store different patterns, and at the time of giving an output they can produce one of the stored
patterns by matching it with the given input pattern. These types of memories are also called
Content-Addressable Memory (CAM).
Auto Associative Memory
Architecture
As shown in the following figure, the architecture of an Auto Associative memory network has 'n'
number of input training vectors and similar ‘n’ number of output target vectors.
Training Algorithm
For training, this network is using the Hebb or Delta learning rule.
xi = si (i = 1 to n)
yj = sj (j = 1 to n)
Testing Algorithm
Step 1 − Set the weights obtained during training for Hebb’s rule.
Step 3 − Set the activation of the input units equal to that of the input vector.
yinj = ∑(i=1 to n) xi·wij
Hetero Associative Memory
Architecture
As shown in the following figure, the architecture of a Hetero Associative Memory network has 'n'
number of input training vectors and ‘m’ number of output target vectors.
Training Algorithm
For training, this network is using the Hebb or Delta learning rule.
xi = si (i = 1 to n)
yj = sj (j = 1 to m)
Testing Algorithm
Step 1 − Set the weights obtained during training for Hebb’s rule.
Step 3 − Set the activation of the input units equal to that of the input vector.
yinj = ∑(i=1 to n) xi·wij
yj = f(yinj) = +1 if yinj > 0; 0 if yinj = 0; −1 if yinj < 0
Pattern Association
Associative memory neural nets are single-layer nets in which the
weights are determined in such a way that the net can store a set of
pattern associations.
- Each association is an input-output vector pair, s: t.
- If each vector t is the same as the vectors with which it is associated,
then the net is called an autoassociative memory.
- If the t's are different from the s's, the net is called a heteroassociative
memory.
- In each of these cases, the net not only learns the specific pattern pairs
that were used for training, but also is able to recall the desired response
pattern when given an input stimulus that is similar, but not identical, to
the training input.
Before training an associative memory neural net, the original patterns
must be converted to an appropriate representation for computation.
In a simple example, the original pattern might consist of "on" and
"off" signals, and the conversion could be "on" = (+1), "off" = (0)
(binary representation) or "on" = (+1), "off" =(-1) (bipolar
representation).
TRAINING ALGORITHMS FOR PATTERN ASSOCIATION
1- Hebb Rule for Pattern Association:
- The Hebb rule is the simplest and most common method of
determining the weights for an associative memory neural net.
- we denote our training vector pairs (input training-target output
vectors) as s: t. We then denote our testing input vector as x, which
may or may not be the same as one of the training input vectors.
- In the training algorithm of the Hebb rule, the weights are initially set to 0 and then
updated using the following formula:
wij(new) = wij(old) + sitj ; i.e., ∆wij = sitj
Outer products:
The weights found by using the Hebb rule (with all weights initially 0)
can also be described in terms of outer products of the input vector-output
vector pairs s:t. The outer product of two vectors
s = (s1, ……., si, ……., sn) ; t = (t1, ……., tj, ……., tm)
W = sᵀt
To store a set of associations s(p) : t(p), p = 1, . . . , P, where
s(p) = (s1(p), …., si(p), …., sn(p)) ;
t(p) = (t1(p), ……., tj(p), ……., tm(p))
wij = ∑(p=1 to P) si(p)·tj(p)
This is the sum of the outer product matrices required to store each
association separately. In general, we shall use the preceding formula or
the more concise vector matrix form,
W = ∑(p=1 to P) s(p)ᵀ·t(p)
yJ = netJ = ∑(i=1 to n) xi·wiJ
Example − Use the Hebb rule to store the following four pattern pairs s : t in a heteroassociative net:
P    s1 s2 s3 s4    t1 t2
1    s( 1  0  0  0)  t( 1  0)
2    s( 1  1  0  0)  t( 1  0)
3    s( 0  0  0  1)  t( 0  1)
4    s( 0  0  1  1)  t( 0  1)
Sol:
The training is accomplished by the Hebb rule, which is defined as:
wij(new) = wij(old)+ xiyj ; i.e., ∆wij = xiyj
xi = si
yj = t j
Training:
W=0
(Note: only the weights that change at each step of the process are shown.)
1. For the first pattern p=1, s: t pair (1, 0, 0, 0):(1, 0):
xl = 1; x2 = x3 = x4 = 0.; yl = 1; y2 = 0.
w11(new) = w11(old)+ x1y1 = 0 + 1 = 1
(all other weights remain 0)
2. For the second pattern p=2, s: t pair (1, 1, 0, 0):(1, 0):
xl = x2 = 1 ; x3 = x4 = 0.; yl = 1; y2 = 0.
w11(new) = w11(old)+ x1y1 = 1 + 1 = 2
w21(new) = w21(old)+ x2y1 = 0 + 1 = 1
(all other weights remain 0)
Now let us find the weight vector using outer products instead of the
algorithm for the Hebb rule.
The weight matrix to store the pattern pair (p) is given by the outer
product of the vector s(p) and t(p):
W(p) = s(p)Tt(p)
For p = 1 ; s = [1, 0, 0, 0] and t = [1, 0], the weight matrix
is
The weight matrix to store all four pattern pairs is the sum of the weight
matrices to store each pattern pair separately, namely,
We can also find the weight matrix to store all four patterns directly using
the outer product
W = SᵀT, where S has the input vectors s(p) as its rows and T has the target vectors t(p) as its rows:

Sᵀ =
1 1 0 0
0 1 0 0
0 0 0 1
0 0 1 1

T =
1 0
1 0
0 1
0 1

W = SᵀT =
2 0
1 0
0 1
0 2
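The weight matrix obtained above can be verified with a few lines of NumPy; this is only a checking sketch of the outer-product formula W = ∑p s(p)ᵀ t(p) applied to the four pattern pairs of the example.

```python
import numpy as np

S = np.array([[1, 0, 0, 0],      # input vectors s(1) ... s(4) as rows
              [1, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 1]])
T = np.array([[1, 0],            # target vectors t(1) ... t(4) as rows
              [1, 0],
              [0, 1],
              [0, 1]])

W = sum(np.outer(s, t) for s, t in zip(S, T))   # W = sum_p s(p)^T t(p)
print(W)                                        # [[2 0] [1 0] [0 1] [0 2]]
print(np.array_equal(W, S.T @ T))               # True: the matrix form S^T T agrees
```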
Bidirectional Associative Memory (BAM)
For memory associations of one type of object with another, a recurrent structure is needed:
the network receives a pattern on one set of neurons as an input and generates a related, but
different, output pattern on another set of neurons.
BAM Architecture:
When BAM accepts an input of n-dimensional vector X from set A then the model recalls m-
dimensional vector Y from set B. Similarly when Y is treated as input, the BAM recalls X.
Algorithm:
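The notes do not spell out the algorithm here, so the following is only a standard illustrative sketch (with invented bipolar pattern pairs): the correlation matrix W = ∑p XpᵀYp stores the associations, and recall in either direction is a matrix product followed by a bipolar threshold.

```python
import numpy as np

sign = lambda v: np.where(v >= 0, 1, -1)       # bipolar threshold

# Two invented bipolar pattern pairs (X_p, Y_p) to be associated in both directions.
X = np.array([[ 1,  1,  1, -1],
              [-1,  1, -1,  1]])
Y = np.array([[ 1, -1,  1],
              [ 1,  1, -1]])

W = X.T @ Y                                    # correlation (Hebbian) storage: W = sum_p X_p^T Y_p

x = X[0]                                       # present an X-pattern ...
y = sign(x @ W)                                # ... and recall the associated Y-pattern
x_back = sign(y @ W.T)                         # recall X back from Y (bidirectional)
print(y, x_back)                               # recovers Y[0] and X[0]
```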
Limitations of BAM:
Hopfield Network
The Hopfield neural network was invented by Dr. John J. Hopfield in 1982. It consists of a single
layer which contains one or more fully connected recurrent neurons. The Hopfield network is
commonly used for auto-association and optimization tasks.
Discrete Hopfield Network
Its input and output patterns are discrete vectors, which can be either binary (0, 1) or bipolar
(+1, −1) in nature. The network has symmetrical weights with no self-connections, i.e., wij = wji
and wii = 0.
Architecture
Following are some important points to keep in mind about the discrete Hopfield network −
This model consists of neurons with one inverting and one non-inverting output.
The output of each neuron should be the input of other neurons but not the input of self.
Weight/connection strength is represented by wij.
The output from Y1 going to Y2, Yi and Yn have the weights w12, w1i and w1n respectively.
Similarly, other arcs have the weights on them.
Training Algorithm
During training of the discrete Hopfield network, weights will be updated. As we know, we can
have binary input vectors as well as bipolar input vectors. Hence, in both cases, weight
updates can be done with the following relation −
Case 1 − Binary input patterns:
wij = ∑(p=1 to P) [2si(p) − 1][2sj(p) − 1] for i ≠ j
Case 2 − Bipolar input patterns:
wij = ∑(p=1 to P) si(p)·sj(p) for i ≠ j
Testing Algorithm
Step 1 − Initialize the weights, which are obtained from training algorithm by using Hebbian
principle.
Step 2 − Perform steps 3-9, if the activations of the network are not consolidated.
Step 4 − Make initial activation of the network equal to the external input vector X as follows −
yi = xi for i = 1 to n
yini = xi + ∑j yj·wji
Step 7 − Apply the activation as follows over the net input to calculate the output −
yi = 1 if yini > θi; yi (unchanged) if yini = θi; 0 if yini < θi
Energy function Ef, also called the Lyapunov function, determines the stability of the discrete
Hopfield network, and is characterized as follows −
Ef = −(1/2)·∑(i=1 to n) ∑(j=1 to n) yi·yj·wij − ∑(i=1 to n) xi·yi + ∑(i=1 to n) θi·yi
Condition − In a stable network, whenever the state of node changes, the above energy function
will decrease.
ΔEf = Ef(yi(k+1)) − Ef(yi(k))
= −( ∑(j=1 to n) wij·yj(k) + xi − θi )·( yi(k+1) − yi(k) )
= −(neti)·Δyi
Here Δyi = yi(k+1) − yi(k).
The change in energy depends on the fact that only one unit can update its activation at a time.
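As a hedged, self-contained sketch of the discrete Hopfield network described above (Hebbian storage of bipolar patterns, asynchronous recall, and the energy function Ef with all thresholds θi = 0), the NumPy code below stores two invented patterns and recovers one of them from a noisy probe.

```python
import numpy as np

# Store two invented bipolar patterns with the Hebbian rule (w_ii = 0),
# then recall the first one from a noisy probe using asynchronous updates.
patterns = np.array([[ 1, -1,  1, -1,  1, -1],
                     [ 1,  1, -1, -1,  1,  1]])
n = patterns.shape[1]
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)                           # no self-connections

def energy(y, x):
    # Ef = -1/2 sum_ij y_i y_j w_ij - sum_i x_i y_i   (all thresholds theta_i = 0)
    return -0.5 * y @ W @ y - x @ y

x = np.array([ 1, -1,  1, -1, -1, -1])           # first pattern with one flipped bit
y = x.copy()
for _ in range(5):                               # a few sweeps of asynchronous updates
    for i in np.random.permutation(n):
        y_in = x[i] + W[i] @ y                   # net input: y_ini = x_i + sum_j y_j w_ji
        y[i] = 1 if y_in > 0 else (y[i] if y_in == 0 else -1)
print(y, energy(y, x))                           # converges to the stored pattern
```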
Continuous Hopfield Network
Model − The model or architecture can be built up by adding electrical components such as
amplifiers which can map the input voltage to the output voltage over a sigmoid activation
function.
Ef = (1/2)·∑(i=1 to n) ∑(j=1 to n, j≠i) yi·yj·wij − ∑(i=1 to n) xi·yi + (1/λ)·∑(i=1 to n) ∑(j=1 to n, j≠i) wij·gri ∫(0 to yi) a⁻¹(y) dy