Unit 4 Notes
Chapter-1 Artificial Neural Networks
The term "Artificial Neural Network" is derived from the biological neural networks that make up the structure of the human brain. Just as the human brain has neurons interconnected with one another, artificial neural networks have neurons interconnected with one another in the various layers of the network. These neurons are known as nodes.
The given figure illustrates the typical diagram of Biological Neural Network.
The typical Artificial Neural Network looks something like the given figure.
The dendrites of a biological neural network represent the inputs in an artificial neural network, the cell nucleus represents the nodes, the synapses represent the weights, and the axon represents the output.
To understand the architecture of an artificial neural network, we must understand what a neural network consists of: a large number of artificial neurons, termed units, arranged in a sequence of layers. Let us look at the various types of layers available in an artificial neural network.
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden Layer:
The hidden layer lies between the input and output layers. It performs all the calculations needed to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally results in output that is
conveyed using this layer.
The artificial neural network takes the input, computes the weighted sum of the inputs, and includes a bias. This computation is represented in the form of a transfer function.
The weighted total is then passed as input to an activation function, which produces the output. Activation functions decide whether a node should fire or not. Only the nodes that fire pass a signal on to the output layer. There are distinctive activation functions available, chosen according to the sort of task we are performing.
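As a minimal sketch of this computation in Python (the input values, weights, and bias are illustrative assumptions, not values from the notes):

```python
import numpy as np

# One artificial node: weighted sum of inputs plus bias (the transfer
# function), followed by a step activation that decides whether to fire.

def step(z):
    # The node "fires" (outputs 1) only when the weighted total exceeds 0.
    return 1 if z > 0 else 0

inputs = np.array([0.5, 0.3, 0.2])    # x1, x2, x3
weights = np.array([0.4, 0.7, 0.2])   # w1, w2, w3
bias = -0.5

z = np.dot(inputs, weights) + bias    # weighted sum: sum(x_i * w_i) + b
print(z, step(z))                     # -0.05 -> 0 (the node does not fire)
```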
Unexplained behavior of the network:
This is the most significant issue of ANNs. When an ANN produces a solution, it does not provide insight into why and how that solution was reached. This decreases trust in the network.
Hardware dependence:
Artificial neural networks need processors with parallel processing power, in line with their structure. The realization of the network therefore depends on suitable hardware.
Types of Artificial Neural Network
A neural network works much as the human nervous system functions. There are several types of neural networks. Their implementations are based on the set of parameters and mathematical operations required for determining the output.
Feedforward Neural Network
The FNN is the purest form of ANN, in which input and data travel in only one direction. Data flows in the forward direction only; that is why it is known as the Feedforward Neural Network.
The data passes through the input nodes and exits from the output nodes. The nodes are not connected cyclically. An FNN does not need to have a hidden layer, nor multiple layers; it may have a single layer.
It has a forward-propagating wave that is achieved by using a classifying activation function. All other types of neural networks use backpropagation, but the FNN does not.
In an FNN, the sum of the products of the inputs and their weights is calculated and then fed to the output. Technologies such as face recognition and computer vision use FNNs.
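A minimal sketch of such a forward-only pass, assuming a small illustrative network with one hidden layer and random weights:

```python
import numpy as np

# Feedforward pass: data flows input -> hidden -> output in one direction
# only, with no cycles. Sizes and weights are illustrative assumptions.

rng = np.random.default_rng(0)
x = rng.normal(size=4)            # 4 input features
W1 = rng.normal(size=(4, 3))      # input -> hidden weights
W2 = rng.normal(size=(3, 1))      # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = sigmoid(x @ W1)               # hidden activations
y = sigmoid(h @ W2)               # network output
print(y)
```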
Radial Basis Function Neural Network
An RBFNN considers the distance of a point from the centre and is designed to work smoothly. There are two layers in the RBF Neural Network. In the inner layer, the features are combined with the radial basis function, and this combined output is used in the final computation. Measures other than the Euclidean distance can also be used.
o We define a receptor t.
o Contour maps are drawn around the receptor.
o For an RBF, Gaussian functions are generally used, so we can define the radial distance r = ||X − t|| (see the sketch below).
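A minimal sketch of a Gaussian radial basis function (the receptor t, the width parameter sigma, and the sample points are illustrative assumptions):

```python
import numpy as np

# Gaussian RBF: the unit's response depends only on the radial distance
# r = ||x - t|| from the input x to the receptor (centre) t.

def rbf_gaussian(x, t, sigma=1.0):
    r = np.linalg.norm(x - t)                  # radial distance to the receptor
    return np.exp(-(r ** 2) / (2 * sigma ** 2))

t = np.array([0.0, 0.0])                       # receptor t
print(rbf_gaussian(np.array([0.0, 0.0]), t))   # 1.0 at the centre
print(rbf_gaussian(np.array([2.0, 0.0]), t))   # response decays with distance
```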
This neural network is used in power restoration systems. In the present era, power systems have increased in size and complexity; both factors increase the risk of major power outages. Power needs to be restored as quickly and reliably as possible after a blackout.
Multilayer Perceptron
A Multilayer Perceptron has three or more layers. It is used to classify data that cannot be separated linearly.
This network is fully connected, meaning every single node is connected to all the nodes in the next layer.
A nonlinear activation function is used in a Multilayer Perceptron. Its input and output layer nodes are connected as a directed graph.
It is a deep learning method, so it uses backpropagation for training the network. It is extensively applied in speech recognition and machine translation technologies.
Convolutional Neural Network
A Convolutional Neural Network plays a vital role in image classification and image recognition; we can say it is the main category of network for those tasks.
Face recognition, object detection, etc., are some areas where CNNs are widely used. A CNN is similar to an FNN: learnable weights and biases are available in the neurons.
A CNN takes an image as input, which is processed and classified under a certain category such as dog, cat, lion, tiger, etc.
As we know, the computer sees an image as pixels, depending on the resolution of the picture. Based on the image resolution, it sees h * w * d, where h = height, w = width, and d = depth (the number of channels). For example, an RGB image is a 6 * 6 * 3 array of the matrix, and a grayscale image is a 4 * 4 * 1 array.
In a CNN, each input image passes through a sequence of convolution layers with filters (also known as kernels), pooling layers, and fully connected layers, and a softmax function is applied to classify the object with probabilistic values between 0 and 1.
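A minimal sketch of the core convolution step only (the image values and the edge-detecting kernel are illustrative assumptions; a real CNN stacks many such filters with pooling and fully connected layers):

```python
import numpy as np

# Slide a 3x3 filter (kernel) over an h x w grayscale image to produce a
# feature map, the basic building block of a convolution layer.

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Elementwise product of the window with the kernel, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)    # a 6 x 6 grayscale image
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)        # vertical-edge filter
print(conv2d(image, kernel).shape)                  # (4, 4) feature map
```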
Recurrent Neural Network
A Recurrent Neural Network is based on prediction. In this neural network, the output of a particular layer is saved and fed back to the input, which helps to predict the outcome of the layer.
In a Recurrent Neural Network, the first layer is formed in the same way as an FNN layer, and the recurrent process begins in the subsequent layers.
Normally, inputs and outputs are independent of each other; but in some cases, such as predicting the next word of a sentence,
the prediction depends on the previous words of the sentence. The RNN is known for its primary and most important feature, the hidden state, which remembers information about a sequence.
An RNN has a memory that stores the results of calculations. It uses the same parameters on each input, performing the same task on all the hidden layers or data to produce the output. Unlike other neural networks, this makes the RNN's parameter complexity lower.
Modular Neural Network
In a Modular Neural Network, several different networks function independently. The task is divided into sub-tasks, each performed by a separate network.
During the computational process, the networks do not communicate directly with each other; they all work independently towards achieving the output.
Combined networks are more powerful than flat, unrestricted ones. An intermediary takes the output of each network and processes them to produce the final output.
A Perceptron, on the other hand, is a single layer of LTUs (Linear Threshold Units). An LTU is similar to an artificial neuron, with the only difference that its inputs and output are not necessarily binary; they can be any number.
As shown in the figure below, the LTU applies a function f(x) to the combination of the inputs and their respective weights.
A Linear Threshold Unit (LTU), as shown above, computes the linear combination of these inputs and weights:

Z = x1*w1 + x2*w2
After this, it applies the function f(x) to Z, which gives the resulting output of the LTU.
But for an LTU to give an output, it needs to know the values of the weights w1 and w2. This is where training comes in: the LTU is trained first to obtain the values of w1 and w2.
A Perceptron is composed of a single layer of LTUs, each of which is connected to every LTU of the previous layer, in other words to the previous Perceptron.
The above combination of neurons receives two inputs and gives one output after the whole computation process.
Perceptrons do not output a class probability; rather, they make predictions based on a hard threshold.
Training a Perceptron
A Perceptron is fed one training instance at a time. For every output neuron that produces a wrong prediction, it reinforces the connection weights from the inputs that would have contributed to the correct prediction:

w(i, j) := w(i, j) + η * (y_j − ŷ_j) * x_i

w(i, j) — connection weight between the ith input neuron and the jth output neuron
η — learning rate
ŷ_j — output of the jth output neuron for the current training instance
y_j — target output of the jth output neuron for the current training instance
x_i — ith input value of the current training instance
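A minimal sketch of this rule in action, assuming the AND function as a toy training set and an illustrative learning rate and epoch count:

```python
import numpy as np

# Train a single-output perceptron with the update rule
# w := w + eta * (target - output) * x on the AND truth table.

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = np.array([0, 0, 0, 1])                   # AND outputs

w, b, eta = np.zeros(2), 0.0, 0.1
for epoch in range(10):
    for x, y in zip(X, targets):
        y_hat = 1 if np.dot(w, x) + b > 0 else 0   # hard-threshold output
        w = w + eta * (y - y_hat) * x              # reinforce toward target
        b = b + eta * (y - y_hat)

print([1 if np.dot(w, x) + b > 0 else 0 for x in X])   # [0, 0, 0, 1]
```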
Chapter-2 Perceptrons
1) Introduction on Perceptrons
A Perceptron consists of one or more inputs, a processor, and a single output.
The Perceptron is a Machine Learning algorithm for the supervised learning of various binary classification tasks. A Perceptron can also be understood as an artificial neuron, or a neural network unit, that helps detect certain computations on input data in business intelligence.
The perceptron model is also regarded as one of the best and simplest types of artificial neural network. It is a supervised learning algorithm for binary classifiers; hence, we can consider it a single-layer neural network with four main parameters: input values, weights and bias, net sum, and an activation function.
In Machine Learning, binary classifiers are defined as functions that decide whether input data, represented as a vector of numbers, belongs to some specific class. Binary classifiers can be considered linear classifiers: in simple words, classification algorithms that predict using a linear predictor function combining the weights with the feature vector.
Frank Rosenblatt invented the perceptron model as a binary classifier containing three main components. These are as follows:
o Input Nodes or Input Layer:
This is the primary component of Perceptron which accepts the initial data into the system for further
processing. Each input node contains a real numerical value.
o Weight and Bias:
The weight parameter represents the strength of the connection between units. This is another most important parameter of the Perceptron's components: weight is directly proportional to the strength of the associated input neuron in deciding the output. The bias can be considered as the intercept term in a linear equation.
o Activation Function:
This is the final and important component, which helps determine whether the neuron will fire or not. The activation function can be considered primarily as a step function; common choices are the Sign, Step, and Sigmoid functions.
The data scientist uses the activation function to make decisions based on the problem statement and the desired outputs. The activation function chosen for a perceptron model may differ (e.g., Sign, Step, or Sigmoid) depending on whether the learning process is slow or has vanishing or exploding gradients.
Based on the layers, Perceptron models are divided into two types. These are as follows:
Single Layer Perceptron Model:
This is one of the simplest types of artificial neural network (ANN). A single-layered perceptron model consists of a feed-forward network and includes a threshold transfer function inside the model. The main objective of the single-layer perceptron model is to analyze linearly separable objects with binary outcomes.
In a single-layer perceptron model, the algorithm does not start from recorded data; it begins with randomly allocated weight parameters. If the outcome matches the pre-determined threshold value, the performance of the model is deemed satisfactory, and the weights are not changed. A single-layer perceptron can learn only linearly separable patterns.
Multi-Layer Perceptron Model:
Like a single-layer perceptron model, a multi-layer perceptron model has the same structure but a greater number of hidden layers.
The multi-layer perceptron model is also known as the backpropagation algorithm, which executes in two stages, as follows:
o Forward Stage: Activations start from the input layer in the forward stage and terminate at the output layer.
o Backward Stage: In the backward stage, the weight and bias values are modified as per the model's requirement. The error between the actual and the desired output is propagated backward, starting at the output layer and ending at the input layer.
A multi-layer perceptron model has greater processing power and can process both linear and non-linear patterns. Further, it can also implement logic gates such as AND, OR, XOR, NAND, NOT, XNOR, and NOR.
Perceptron Function
The perceptron function f(x) is achieved as output by multiplying the input x with the learned weight coefficient w and adding the bias b:

f(x) = 1 if w · x + b > 0
f(x) = 0 otherwise
Characteristics of Perceptron
1. Initially, the weights are multiplied with the input features, and a decision is made whether the neuron is fired or not.
2. The activation function applies a step rule to check whether the weighted sum is greater than zero.
3. A linear decision boundary is drawn, enabling the distinction between the two linearly separable classes +1 and -1.
4. If the added sum of all input values is more than the threshold value, there is an output signal; otherwise, no output is shown.
o The output of a perceptron can only be a binary number (0 or 1), due to the hard-limit transfer function.
o A perceptron can only be used to classify linearly separable sets of input vectors. If the input vectors are not linearly separable, it is not easy to classify them properly.
Gradient Descent
The cost function (C), or loss function, measures the difference between the actual output and the predicted output. We randomly initialize all the weights of the neural network to values close to zero, but not zero, and then calculate the gradient ∂C/∂w, the partial derivative of the cost with respect to each weight, to update the weights:

w := w − α · (∂C/∂w)

where w is the weight of a neuron, α is the learning rate, which helps adjust the weights with respect to the gradient, C is the cost, and ∂C/∂w is the gradient.
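A minimal sketch of this update rule on a toy one-weight cost (the quadratic cost, starting value, learning rate, and iteration count are illustrative assumptions):

```python
# Gradient descent on C(w) = (w - 3)^2, whose gradient is dC/dw = 2*(w - 3).

w = 0.1           # weight initialized close to zero, but not zero
alpha = 0.1       # learning rate

for step in range(50):
    grad = 2 * (w - 3)        # partial derivative of the cost w.r.t. w
    w = w - alpha * grad      # move against the gradient

print(w)   # approaches 3.0, the minimizer of the cost
```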
Learning Rate
The learning rate controls how much we adjust the weights with respect to the loss gradient. The lower the value of the learning rate, the slower the convergence to the global minimum; too high a value will not allow gradient descent to converge at all.
Since our goal is to minimize the cost function to find the optimized values for the weights, we run multiple iterations with different weights and calculate the cost at each step to arrive at the minimum cost. This search is computationally intensive.
Batch Gradient Descent
Batch gradient descent uses the entire dataset to calculate each iteration of gradient descent.
o Theoretical analysis of the weights and convergence rates is easy to understand.
o It performs redundant computations for the same training examples in large datasets.
o It can be very slow and intractable, as large datasets may not fit in memory.
o Since the entire dataset is used for each computation, the weights cannot be conveniently updated for new data on the fly.
Stochastic Gradient Descent
In stochastic gradient descent we use a single data point, or example, to calculate the gradient and update the weights at every iteration. We first need to shuffle the dataset so that we get completely randomized examples. As the dataset is randomized and the weights are updated for each single example, the updates of the weights and the cost function will be noisy.
o The random samples help the search arrive at a global minimum and avoid getting stuck at local minima.
o Learning is much faster, and convergence is quick, for very large datasets.
o As we update the weights frequently, the cost function fluctuates heavily.
Mini-Batch Gradient Descent
Mini-batch gradient descent is widely used; it converges faster and is more stable than the other two variants. As we take a batch of different samples for each update, it reduces the noise, i.e. the variance of the weight updates.
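A minimal sketch contrasting the three variants on a toy linear model y = w*x (the dataset, learning rates, batch size, and epoch count are illustrative assumptions):

```python
import numpy as np

# Fit y = w*x by squared error; only the amount of data per update differs.
rng = np.random.default_rng(1)
X = rng.normal(size=200)
y = 4.0 * X + rng.normal(scale=0.1, size=200)   # true weight is 4.0

def grad(w, xb, yb):
    return np.mean(2 * (w * xb - yb) * xb)      # d/dw of mean squared error

w_batch = w_sgd = w_mini = 0.0
for epoch in range(50):
    idx = rng.permutation(len(X))               # shuffle each epoch
    w_batch -= 0.1 * grad(w_batch, X, y)        # batch: whole dataset per step
    for i in idx:                               # stochastic: one example
        w_sgd -= 0.005 * grad(w_sgd, X[i:i + 1], y[i:i + 1])
    for s in range(0, len(X), 32):              # mini-batch: 32 examples
        b = idx[s:s + 32]
        w_mini -= 0.02 * grad(w_mini, X[b], y[b])

print(w_batch, w_sgd, w_mini)   # each estimate approaches the true weight 4.0
```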
2) Multilayer networks and Back propagation
Multi-Layer Perceptron
A Neural Network model which has more than one layer of perceptrons is known as a Multi-Layer Perceptron.
It comprises an input layer, one or more layers of LTUs, and one output layer.
The layers other than the input and output layers are known as hidden layers. When there are two or more hidden layers, the Neural Network is known as a Deep Neural Network.
Each layer of the Neural Network except the output layer includes a neuron that always outputs 1. This neuron is known as the Bias Neuron.
Back propagation
What is Back propagation?
Back propagation is the essence of neural network training. It is the method of fine-tuning the weights of a neural network based on the error rate obtained in the previous epoch (i.e., iteration). Proper tuning of the weights reduces error rates and makes the model more reliable by increasing its generalization.
Backpropagation is short for "backward propagation of errors." It is a standard method of training artificial neural networks, and it helps calculate the gradient of the loss function with respect to all the weights in the network.
How the Backpropagation Algorithm Works
The backpropagation algorithm computes the gradient of the loss function for a single weight by the chain rule. It efficiently computes one layer at a time, unlike a naive direct computation. It computes the gradient, but it does not define how the gradient is used. It generalizes the computation in the delta rule.
Consider a simple backpropagation neural network example: after the forward pass produces an output and the error is measured, we travel back from the output layer to the hidden layers, adjusting the weights such that the error is decreased.
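A minimal sketch of these steps on the XOR problem (the one-hidden-layer architecture, squared-error loss, and hyperparameters are illustrative assumptions, not the example diagram from the notes):

```python
import numpy as np

# Forward pass, then propagate the error backward with the chain rule,
# one layer at a time, and take a gradient step on every weight and bias.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 8)), np.zeros((1, 8))
W2, b2 = rng.normal(size=(8, 1)), np.zeros((1, 1))
eta = 1.0

for epoch in range(4000):
    h = sigmoid(X @ W1 + b1)                 # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)      # error signal at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)       # error propagated to hidden layer
    W2 -= eta * h.T @ d_out                  # weight updates, layer by layer
    b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X.T @ d_h
    b1 -= eta * d_h.sum(axis=0)

print(out.round(2))   # typically converges close to the XOR targets 0,1,1,0
```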
A feedforward neural network is an artificial neural network where the nodes never form a cycle. This kind of
neural network has an input layer, hidden layers, and an output layer. It is the first and simplest type of artificial
neural network.
There are two types of backpropagation networks:
o Static back-propagation
o Recurrent backpropagation
Static back-propagation:
This kind of backpropagation network produces a mapping of a static input to a static output. It is useful for solving static classification problems such as optical character recognition.
Recurrent backpropagation:
In recurrent backpropagation, activations are fed forward until a fixed value is achieved. After that, the error is computed and propagated backward.
Disadvantages of using Backpropagation
o The actual performance of backpropagation on a specific problem depends on the input data.
o The backpropagation algorithm can be quite sensitive to noisy data.
o You need to use the matrix-based approach for backpropagation instead of the mini-batch approach.
3) Number of Hidden Layers and Neurons
Introduction
ANN is inspired by the biological neural network. For simplicity, in computer science, it is represented as a set
of layers. These layers are categorized into three classes which are input, hidden, and output.
Knowing the number of input and output layers and the number of their neurons is the easiest part. Every
network has a single input layer and a single output layer. The number of neurons in the input layer equals the
number of input variables in the data being processed. The number of neurons in the output layer equals the
number of outputs associated with each input. But the challenge is knowing the number of hidden layers and
their neurons.
Here are some guidelines to know the number of hidden layers and neurons per each hidden layer in a
classification problem:
1. Based on the data, draw an expected decision boundary to separate the classes.
2. Express the decision boundary as a set of lines. Note that the combination of such lines must yield the decision boundary.
3. The number of selected lines represents the number of hidden neurons in the first hidden layer.
4. To connect the lines created by the previous layer, a new hidden layer is added. Note that a new hidden
layer is added each time you need to create connections among the lines in the previous hidden layer.
5. The number of hidden neurons in each new hidden layer equals the number of connections to be made.
To make things clearer, let’s apply the previous guidelines for a number of examples.
Example 1
Let's start with a simple example of a classification problem with two classes, as shown in figure 1. Each sample has two inputs and one output that represents the class label. It is quite similar to the XOR problem.
Figure 1
The first question to answer is whether hidden layers are required or not. A rule to follow in order to determine
whether hidden layers are required or not is as follows:
In artificial neural networks, hidden layers are required if and only if the data must be separated non-
linearly.
Looking at figure 2, it seems that the classes must be separated non-linearly: a single line will not work. As a result, we must use hidden layers in order to get the best decision boundary. We could still choose not to use hidden layers, but that would limit the classification accuracy, so it is better to use them.
In order to add hidden layers, we need to answer the following two questions:
1. What is the required number of hidden layers?
2. What is the number of hidden neurons in each hidden layer?
Following the previous procedure, the first step is to draw the decision boundary that splits the two classes.
There is more than one possible decision boundary that splits the data correctly as shown in figure 2. The one we
will use for further discussion is in figure 2(a).
Figure 2
Following the guidelines, the next step is to express the decision boundary by a set of lines.
The idea of representing the decision boundary using a set of lines comes from the fact that any ANN is built
using the single layer perceptron as a building block. The single layer perceptron is a linear classifier which
separates the classes using a line created according to the following equation:

y = w_1 x_1 + w_2 x_2 + ... + w_n x_n + b

where x_i is the input, w_i is its weight, b is the bias, and y is the output. Because each hidden neuron added increases the number of weights, it is recommended to use the least number of hidden neurons that accomplish the task. Using more hidden neurons than required adds more complexity.
4) Distributed representations
The concept of distributed representations is often central to deep learning, particularly as it applies
to natural language tasks.
Those beginning in the field may quickly understand this as simply a vector that represents some
piece of data. While this is true, understanding distributed representations at a more conceptual level
increases our appreciation of the role they play in making deep learning so effective.
To examine different types of representation, we can do a simple thought exercise. Let’s say we
have a bunch of “memory units” to store information about shapes. We can choose to represent each
individual shape with a single memory unit, as demonstrated in Figure 1.
Figure 2 shows a distributed representation of this same set of shapes, where information about each shape is represented with multiple "memory units" for concepts related to orientation and form. Now the "memory units" contain information both about an individual shape and about how the shapes relate to each other.
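A minimal sketch of the two schemes (the shapes and the chosen feature units are illustrative assumptions):

```python
import numpy as np

# Local representation: one "memory unit" per shape (one-hot), so no unit
# is shared and no similarity between shapes is visible.
local = np.eye(4)

# Distributed representation: units = [vertical, horizontal, bar, ellipse],
# so related shapes share active units.
distributed = np.array([
    [1, 0, 1, 0],   # vertical bar
    [0, 1, 1, 0],   # horizontal bar
    [1, 0, 0, 1],   # vertical ellipse
    [0, 1, 0, 1],   # horizontal ellipse
])

# Overlap (dot product) between "vertical bar" and "vertical ellipse":
print(local[0] @ local[2])              # 0.0 -> no notion of relatedness
print(distributed[0] @ distributed[2])  # 1 -> they share the "vertical" unit
```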
5) Overfitting
What is Overfitting?
When a model performs very well for training data but has poor performance with test data (new data), it is
known as overfitting. In this case, the machine learning model learns the details and noise in the training data
such that it negatively affects the performance of the model on test data. Overfitting can happen due to low bias
and high variance.
Reasons for Underfitting
Data used for training is not cleaned and contains noise (garbage values) in it
The model has a high bias
The size of the training dataset used is not enough
The model is too simple
Ways to Tackle Underfitting
Increase the number of features in the dataset
Increase model complexity
Reduce noise in the data
Increase the duration of training the data
Now that you have understood what overfitting and underfitting are, let's see what makes a good-fit model.
6) Artificial Neural Networks
An ANN is intended to simulate the behavior of biological systems composed of "neurons". ANNs are computational models inspired by an animal's central nervous system, capable of machine learning and pattern recognition. They are usually presented as systems of interconnected "neurons" which can compute values from inputs.
An ANN consists of nodes, which in the biological analogy represent neurons, connected by arcs that correspond to dendrites and synapses. Each arc is associated with a weight; at each node, an activation function is applied to the values received as input along the incoming arcs.
Structure of a Biological Neural Network
A neural network is a machine learning algorithm based on the model of a human neuron. The human brain consists of millions of neurons, which send and process signals in the form of electrical and chemical signals. These neurons are connected to one another by special structures known as synapses, which allow them to pass signals.
An ANN works the way the human brain processes information. It includes a large number of connected processing units that work together to process information and generate meaningful results.
We can apply neural networks not only to classification but also to the regression of continuous target variables.
Neural networks find great application in data mining across many sectors, for example economics, forensics, etc., and in pattern recognition. They can also be used for data classification over large amounts of data after careful training.
Artificial Neural Network Layers
a. Input layer
The purpose of the input layer is to receive as input the values of the explanatory attributes for each observation. Usually, the number of input nodes in an input layer is equal to the number of explanatory variables. The input layer presents the patterns to the network, which communicates them to one or more hidden layers.
The nodes of the input layer are passive, meaning they do not change the data. They receive a single value on their input and duplicate the value to their many outputs, passing a copy of each value on to the hidden nodes.
b. Hidden Layer
The hidden layers apply given transformations to the input values inside the network. Each hidden node has incoming arcs from input nodes or other hidden nodes, and outgoing arcs to output nodes or other hidden nodes. The actual processing is done in the hidden layers. There may be one or more hidden layers. The values entering a hidden node are multiplied by weights, a set of predetermined numbers stored in the program. The weighted inputs are then added to produce a single number.
c. Output layer
The hidden layers then link to an output layer. The output layer receives connections from the hidden layers or from the input layer. It returns an output value that corresponds to the prediction of the response variable. In classification problems, there is usually only one output node. The active nodes of the output layer combine and change the data to produce the output values.
The choice of the structure determines the results which are going to be obtained; it is the most critical part of building a neural network.
The simplest structure is the one in which the units are distributed in two layers: an input layer and an output layer. Each unit in the input layer has a single input and a single output, which is equal to the input. The output unit has all the units of the input layer connected to its input, with a weight associated with each connection.
There may be more than one output unit. In this case, the resulting model is a linear or logistic regression, depending on whether the transfer function is linear or logistic. The weights of the network are the regression coefficients.
By adding one or more hidden layers between the input and output layers, with units in these layers, the modeling power of the network increases. But the number of hidden layers should be as small as possible. This ensures that the neural network does not merely store all the information from the learning set but can generalize from it, avoiding overfitting.
Overfitting can occur when the weights make the system learn the details of the learning set instead of discovering its structures. This happens when the size of the learning set is too small in relation to the complexity of the model.
Whether or not a hidden layer is present, the output layer of the network can sometimes have many units, for example when the response variable has many classes.
7) Recurrent networks
A Recurrent Neural Network (RNN) is a type of Neural Network where the output from the previous step is fed as input to the current step.
In traditional neural networks, all the inputs and outputs are independent of each other; but in cases such as predicting the next word of a sentence, the previous words are required, and hence there is a need to remember them.
Thus the RNN came into existence, solving this issue with the help of a hidden layer. The main and most important feature of an RNN is its hidden state, which remembers some information about a sequence.
An RNN has a "memory" which remembers all the information about what has been calculated. It uses the same parameters for each input, as it performs the same task on all the inputs or hidden layers to produce the output. This reduces the complexity of the parameters, unlike other neural networks.
The working of an RNN can be understood with the help of the example below.
Example:
Suppose there is a deeper network with one input layer, three hidden layers, and one output layer. Like other neural networks, each hidden layer would then have its own set of weights and biases; say the weights and biases for hidden layer 1 are (w1, b1), (w2, b2) for the second hidden layer, and (w3, b3) for the third. This would mean that each of these layers is independent of the others, i.e. they do not memorize the previous outputs.
The RNN instead converts these independent activations into dependent ones by providing the same weights and biases to all the layers, thus reducing the complexity of the parameters and memorizing each previous output by giving it as input to the next hidden layer.
Hence these three layers can be joined together into a single recurrent layer, such that the weights and biases of all the hidden layers are the same.
Formula for calculating the current state:

h_t = f(h_{t-1}, x_t)

where h_t is the current state, h_{t-1} is the previous state, and x_t is the input state. With a tanh activation function this becomes:

h_t = tanh(W_hh · h_{t-1} + W_xh · x_t)

where W_hh is the weight at the recurrent neuron and W_xh is the weight at the input neuron.

Formula for calculating the output:

y_t = W_hy · h_t

where:
y_t -> output
W_hy -> weight at output layer
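A minimal sketch of this recurrence (all dimensions, the random weights, and the input sequence are illustrative assumptions):

```python
import numpy as np

# The same three weight matrices are reused at every time step; the hidden
# state h carries information forward through the sequence.

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 5))        # input  -> hidden
W_hh = rng.normal(size=(5, 5))        # hidden -> hidden (the recurrence)
W_hy = rng.normal(size=(5, 2))        # hidden -> output

h = np.zeros(5)                       # initial hidden state
sequence = rng.normal(size=(4, 3))    # 4 time steps, 3 features each

for x_t in sequence:
    h = np.tanh(x_t @ W_xh + h @ W_hh)    # h_t = tanh(W_xh·x_t + W_hh·h_{t-1})
    y_t = h @ W_hy                        # y_t = W_hy·h_t
    print(y_t)
```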
Advantages of Recurrent Neural Network
1. An RNN remembers every piece of information through time. This is useful in time series prediction because of its ability to remember previous inputs as well. This is called Long Short Term Memory.
2. Recurrent neural networks are even used with convolutional layers to extend the effective pixel neighborhood.
Disadvantages of Recurrent Neural Network
1. Gradient vanishing and exploding problems.
2. Training an RNN is a very difficult task.
3. It cannot process very long sequences when using tanh or ReLU as the activation function.
Support Vector Machines
"The support vector machine (SVM) is a supervised learning method that generates input-output mapping functions from a set of labeled training data." A Support Vector Machine performs classification by finding the hyperplane that maximizes the margin between the two classes. The vectors (cases) that define the hyperplane are the support vectors.
Algorithm:
For the maximum margin hyperplane only examples on the margin matter (only these affect the distances).
These are called support vectors. The objective of the support vector machine algorithm is to find a hyperplane
in an N-dimensional space (N — the number of features) that distinctly classifies the data points.
To separate the two classes of data points, there are many possible hyperplanes that could be chosen. Our objective is to find the plane that has the maximum margin, i.e., the maximum distance between the data points of both classes. Maximizing the margin distance provides some reinforcement so that future data points can be classified with more confidence.
Hyperplanes
Hyperplanes are decision boundaries that help classify the data points. Data points falling on either side of the
hyperplane can be attributed to different classes.
Also, the dimension of the hyperplane depends upon the number of features. If the number of input features is
2, then the hyperplane is just a line. If the number of input features is 3, then the hyperplane becomes a two-
dimensional plane. It becomes difficult to imagine when the number of features exceeds 3.
Support Vectors
Support vectors are data points that are closer to the hyperplane and influence the position and orientation of the
hyperplane. Using these support vectors, we maximize the margin of the classifier. Deleting the support vectors
will change the position of the hyperplane.
These are the points that help us build our SVM. It will be useful computationally if only a small fraction of the
datapoints are support vectors, because we use the support vectors to decide which side of the separator a test
case is on.
The support vectors are indicated by the circles around them.
To find the maximum-margin separator, we have to solve the following optimization problem:

minimize ||w||^2 / 2
subject to y_i (w · x_i + b) ≥ 1 for every training point (x_i, y_i)
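A minimal sketch of a maximum-margin classifier in practice, using scikit-learn (the toy data points are illustrative assumptions; a large C approximates the hard-margin problem above):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2],      # class -1
              [4, 4], [5, 4], [4, 5]])     # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)          # hard-margin-like linear SVM
clf.fit(X, y)

print(clf.support_vectors_)                # the points that define the margin
print(clf.coef_, clf.intercept_)           # w and b of the separating hyperplane
print(clf.predict([[0, 0], [5, 5]]))       # -> [-1  1]
```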
Linear models are nice and interpretable but have limitations: they can't learn "difficult" nonlinear patterns.
How Does a Kernel Work?
Kernel machines are a class of pattern-analysis algorithms, the most well-known member of which is
the support vector machine (SVM).
The general objective of pattern analysis is to discover and investigate various sorts of relationships (for example clusters, rankings, principal components, correlations, and classifications) in datasets.
Kernel methods are approaches for dealing with linearly inseparable data or non-linear data sets like
those presented in fig-1. The concept is to use a mapping function to project nonlinear combinations
of the original features onto a higher-dimensional space, where the data becomes linearly separable.
The two-dimensional dataset (X1, X2) is projected into a new three-dimensional feature space (Z1,
Z2, Z3) in the diagram above, where the classes become separable.
It appears that we will have to operate on the higher dimensional vectors in the modified feature
space in order to train a support vector classifier and maximize our objective function.
In real-world applications, data may contain numerous features, and transformations using multiple
polynomial combinations of these features will result in extremely large and prohibitive processing
costs.
Types of Kernel Functions
The kernel function is a function that may be expressed as the dot product of the mapping function φ (the kernel method) and looks like this:

K(x_i, x_j) = φ(x_i) · φ(x_j)

The kernel function simplifies the process of determining the mapping function. As a result, the kernel function defines the inner product in the transformed space.
Polynomial Kernel
The polynomial kernel is a kernel function that allows the learning of non-linear models by
representing the similarity of vectors (training samples) in a feature space over polynomials of the
original variables. It is often used with support vector machines (SVMs) and other kernelized
models.
F(x, x_j) = (x · x_j + 1)^d
Sigmoid Kernel
It is primarily used in neural networks. This kernel function is similar to the activation function for
neurons in a two-layer perceptron model of a neural network.
F(x, x_j) = tanh(α (x · x_j) + c)
Linear Kernel
It is the most fundamental sort of kernel and is usually one-dimensional in structure. When there are
numerous characteristics, it proves to be the best function. The linear kernel is commonly used for
text classification issues since most of these problems can be linearly split. Other functions are
slower than linear kernel functions.
F(x, x_j) = x · x_j (i.e., the sum of the products of the corresponding components)
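The three kernels above can be written directly as functions; a minimal sketch (the parameter values d, α, and c are illustrative assumptions):

```python
import numpy as np

def linear_kernel(x, xj):
    return np.dot(x, xj)                       # F(x, xj) = x · xj

def polynomial_kernel(x, xj, d=2):
    return (np.dot(x, xj) + 1) ** d            # F(x, xj) = (x · xj + 1)^d

def sigmoid_kernel(x, xj, alpha=0.1, c=0.0):
    return np.tanh(alpha * np.dot(x, xj) + c)  # F(x, xj) = tanh(α(x · xj) + c)

x = np.array([1.0, 2.0])
xj = np.array([0.5, -1.0])
print(linear_kernel(x, xj), polynomial_kernel(x, xj), sigmoid_kernel(x, xj))
```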
ADDITIONAL MATERIAL
a. Aerospace
Generally, we use ANNs for aircraft autopilots. They are also used for aircraft fault detection.
b. Military
We use ANNs in the military in various ways, such as weapon orientation and steering, and target tracking.
c. Electronics
Basically, we use artificial neural networks in electronics in many ways, such as code sequence prediction, IC chip layout, and chip failure analysis.
d. Medical
Medicine has many machines that use ANNs in various ways, such as cancer cell analysis and EEG and ECG analysis.
e. Speech
ANNs are applied to speech tasks such as speech recognition and speech classification.
f. Telecommunications
Generally, telecommunications has different applications for ANNs, such as image and data compression and automated information services.
g. Transportation
Generally, we use artificial neural networks in transportation in many ways, such as truck brake system diagnosis, vehicle scheduling, and routing systems.
h. Software
Software also uses ANNs in pattern recognition, such as facial recognition, optical character recognition, etc.
We also use artificial neural networks for time series prediction, such as making predictions on stocks and natural calamities.