Artificial Neural Network
The process of creating a neural network begins with the perceptron. In simple terms, the perceptron receives
inputs, multiplies them by weights, and then passes the result through an activation function (such as logistic,
ReLU, tanh, or identity) to produce an output.
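As a quick illustration, here is a minimal perceptron sketch in Python with NumPy (the function name and the example values are hypothetical, chosen only for demonstration):

    import numpy as np

    def perceptron(x, w, b):
        """Weighted sum of inputs plus bias, passed through a step activation."""
        z = np.dot(w, x) + b       # multiply inputs by weights and add the bias
        return 1 if z > 0 else 0   # step (threshold) activation

    x = np.array([0.5, 0.2, 0.1])  # example inputs
    w = np.array([0.4, 0.6, 0.9])  # example weights
    b = -0.3                       # example bias
    print(perceptron(x, w, b))     # weighted sum is 0.11 > 0, so the output is 1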
Neural networks are created by stacking layers of these perceptrons together, which is known as a multi-layer
perceptron model. A neural network has three kinds of layers: the input, hidden, and output layers.
The input layer directly receives the data, whereas the output layer creates the required output. The layers in
between are known as hidden layers where the intermediate computation takes place.
A neural network algorithm can be used for both classification and regression problems.
Artificial Neural Network (ANN)
An Artificial Neural Network (ANN) is a machine learning model inspired by the human brain’s neural
structure. It comprises interconnected nodes (neurons) organized into layers. Data flows through these nodes,
adjusting the weights of connections to learn patterns and make predictions. ANNs excel in tasks like image
recognition, language processing, and decision-making, revolutionizing various fields.
The primary function of an ANN is to process and learn from data in a way that enables it to recognize
patterns, make predictions, and solve complex problems. During training, the network adjusts the strengths
(weights) of its connections to improve its ability to generalize to new data.
Biological Neural Network → Artificial Neural Network
Dendrites → Inputs
Synapse → Weights
Axon → Output
Architecture of an artificial neural network
To understand the architecture of an artificial neural network, we first have to understand what a neural network
consists of: a large number of artificial neurons, termed units, arranged in a sequence of layers.
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden Layer:
The hidden layer sits between the input and output layers. It performs all the calculations needed to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations in the hidden layers, which finally produce the output that is conveyed through this
layer.
The artificial neural network takes the inputs, computes their weighted sum, and includes a bias. This computation is
represented in the form of a transfer function.
Note: The weighted total is passed as an input to an activation function, which decides whether a node should fire or not. Only the
nodes that fire make it to the output layer. There are distinctive activation functions available that can be applied depending on the sort
of task we are performing.
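As an illustrative example (with arbitrarily chosen values): for inputs x1 = 0.5 and x2 = 0.2, weights w1 = 0.4 and w2 = 0.6, and bias b = 0.1, the weighted sum is 0.5·0.4 + 0.2·0.6 + 0.1 = 0.42, and this value is what gets passed to the activation function.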
Advantages of Artificial Neural Network
Parallel Processing capability:
Artificial neural networks can perform more than one task simultaneously.
Storing data on the entire network:
Data that is used in traditional programming is stored on the whole network, not in a database. The disappearance of a couple of
pieces of data in one place doesn't prevent the network from working.
Capability to work with incomplete knowledge:
After training, an ANN may produce output even with inadequate data. The loss of performance here depends on the
significance of the missing data.
Having a memory distribution:
For an ANN to be able to adapt, it is important to determine the examples and to train the network according to the desired output
by demonstrating these examples to it. The success of the network is directly proportional to the chosen instances, and if
the event can't be shown to the network in all its aspects, the network can produce false output.
Having fault tolerance:
Corruption of one or more cells of an ANN does not prevent it from generating output, and this feature makes the network fault-tolerant.
Disadvantages of Artificial Neural Network
Assurance of proper network structure:
There is no particular guideline for determining the structure of artificial neural networks. An appropriate network structure is
arrived at through experience and trial and error.
Unrecognized behavior of the network:
This is the most significant issue with ANNs. When an ANN produces a solution, it does not provide insight into why and how, which
decreases trust in the network.
Hardware dependence:
Artificial neural networks need processors with parallel processing power, in accordance with their structure. The realization of the
network is therefore dependent on suitable hardware.
Difficulty of showing the problem to the network:
ANNs can work only with numerical data. Problems must be converted into numerical values before being introduced to the ANN. The
representation mechanism chosen here will directly impact the performance of the network, and it relies on the user's abilities.
The duration of the network is unknown:
Training is stopped when the network's error is reduced to a specific value, but this value does not guarantee optimal results.
Working of Artificial Neural Networks
An artificial neural network can be best represented as a weighted directed graph, where the artificial neurons form the nodes and
the associations between neuron outputs and neuron inputs can be viewed as directed edges with weights. The
network receives the input signal from an external source in the form of a pattern or image, represented as
a vector. These inputs are then mathematically denoted by x(n) for each of the n inputs.
Afterward, each input is multiplied by its corresponding weight (these weights are the details utilized by the artificial neural
network to solve a specific problem). In general terms, the weights represent the strength of the interconnection between
neurons inside the network. All the weighted inputs are summed inside the computing unit.
If the weighted sum is zero, a bias is added to make the output non-zero, or to otherwise scale up the system's
response. The bias has a fixed input of 1 and its own weight. The total of weighted inputs can lie in the range of 0 to positive infinity.
To keep the response within the limits of the desired value, a certain maximum value is benchmarked, and the total of weighted inputs
is passed through the activation function.
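A minimal sketch of this computation in Python with NumPy (the values and the choice of tanh as the bounding activation are my own, for illustration only):

    import numpy as np

    def neuron_output(x, w, b, activation):
        """Sum the weighted inputs, add the bias, then apply the activation."""
        z = np.dot(w, x) + b      # weighted sum plus bias (bias input fixed at 1)
        return activation(z)

    x = np.array([1.0, 2.0])      # input vector x(n)
    w = np.array([0.3, -0.1])     # interconnection strengths (weights)
    b = 0.5                       # bias weight
    print(neuron_output(x, w, b, np.tanh))  # tanh keeps the response within (-1, 1)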
The activation function refers to the set of transfer functions used to achieve the desired output. There are different kinds of
activation functions, but primarily linear or non-linear sets of functions. Some of the commonly used activation functions
are the binary, linear, and tan hyperbolic sigmoidal activation functions.
Binary:
In a binary activation function, the output is either a one or a 0. To accomplish this, a threshold value is set up. If the net
weighted input of the neuron exceeds the threshold, the activation function returns one; otherwise the output is returned
as 0.
Sigmoidal Hyperbolic:
The sigmoidal hyperbola function is generally seen as an "S"-shaped curve. Here the tan hyperbolic function is used to approximate
the output from the actual net input. The function is defined as: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)).
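Both functions can be sketched in a few lines of Python with NumPy (a hypothetical illustration; the default threshold value is an assumption):

    import numpy as np

    def binary_step(z, threshold=0.0):
        """Return 1 where the net weighted input exceeds the threshold, else 0."""
        return np.where(z > threshold, 1, 0)

    def tanh(z):
        """Tan hyperbolic function: (e^z - e^-z) / (e^z + e^-z)."""
        return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

    z = np.array([-1.0, 0.2, 3.0])
    print(binary_step(z))  # [0 1 1]
    print(tanh(z))         # approximately [-0.762  0.197  0.995]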
Feedback ANN:
In this type of ANN, the output is fed back into the network to achieve the best-evolved results internally. As per the University of
Massachusetts Lowell Centre for Atmospheric Research, feedback networks feed information back into themselves and are well suited to
solving optimization problems. Internal system error corrections utilize feedback ANNs.
Feed-Forward ANN:
A feed-forward network is a basic neural network comprising an input layer, an output layer, and at least one layer of neurons.
Through assessment of its output by reviewing its input, the intensity of the network can be noticed based on the group behavior of the
associated neurons, and the output is decided. The primary advantage of this network is that it learns to evaluate and
recognize input patterns.
Types of Neural Networks
Recurrent Neural Networks (RNNs): These networks have a "memory" component, where information can flow in cycles through the
network. This allows the network to process sequences of data, such as time series or speech.
Convolutional Neural Networks (CNNs): These networks are designed to process data with a grid-like topology, such as images. The
layers consist of convolutional layers, which learn to detect specific features in the data, and pooling layers, which reduce the spatial
dimensions of the data.
Autoencoders: These are neural networks that are used for unsupervised learning. They consist of an encoder that maps the input
data to a lower-dimensional representation and a decoder that maps the representation back to the original data.
Generative Adversarial Networks (GANs): These are neural networks that are used for generative modeling. They consist of two
parts: a generator that learns to generate new data samples, and a discriminator that learns to distinguish between real and generated
data.
Interconnections:
Interconnection can be defined as the way processing elements (neurons) in an ANN are connected to each other. The
arrangement of these processing elements and the geometry of their interconnections are therefore essential in an ANN.
These arrangements always have two layers that are common to all network architectures: the input layer and the output layer, where the
input layer buffers the input signal and the output layer generates the output of the network. The third kind of layer is the hidden layer,
whose neurons belong to neither the input layer nor the output layer. These neurons are hidden from the people interfacing
with the system and act as a black box to them.
There exist five basic types of neuron connection architecture:
Single-layer feed-forward network
In this type of network, we have only two layers, the input layer and the output layer, but the input layer does not count because no
computation is performed in it. The output layer is formed when different weights are applied to the input nodes and the cumulative
effect per node is taken. After this, the neurons collectively compute the output signals.
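A sketch of this single-layer computation in Python with NumPy (shapes and values are made up for illustration):

    import numpy as np

    def single_layer_forward(x, W, b):
        """Apply one weight row per output node and take the cumulative effect."""
        return np.dot(W, x) + b

    x = np.array([0.5, 0.8, 0.2])        # 3 input nodes (no computation here)
    W = np.array([[0.1, -0.2, 0.4],      # weights feeding output node 1
                  [0.3, 0.1, -0.5]])     # weights feeding output node 2
    b = np.zeros(2)
    print(single_layer_forward(x, W, b)) # output signals of the 2 output nodes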
Multilayer feed-forward network
This type of network also has a hidden layer that is internal to the network and has no direct contact with the external layer. The existence of one
or more hidden layers makes the network computationally stronger. It is a feed-forward network because information flows through
the input function and the intermediate computations used to determine the output Z; there are no feedback connections in which
outputs of the model are fed back into itself.
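A minimal multilayer feed-forward pass in Python with NumPy (layer sizes, weights, and the tanh hidden activation are assumptions for illustration):

    import numpy as np

    def mlp_forward(x, W1, b1, W2, b2):
        """Input -> hidden layer (tanh) -> output Z, with no feedback connections."""
        h = np.tanh(np.dot(W1, x) + b1)  # hidden layer, no contact with the outside
        return np.dot(W2, h) + b2        # output Z

    rng = np.random.default_rng(0)
    x = np.array([1.0, -0.5])
    W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)  # 4 hidden units
    W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # 1 output unit
    print(mlp_forward(x, W1, b1, W2, b2))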
Single node with its own feedback
When outputs can be directed back as inputs to nodes in the same layer or a preceding layer, the result is a feedback network.
Recurrent networks are feedback networks with closed loops. The simplest case is a single recurrent network consisting of a single
neuron with feedback to itself.
Single-layer recurrent network
A single-layer recurrent network has a feedback connection in which a processing element's output can be directed back
to itself, to another processing element, or to both. A recurrent neural network is a class of artificial neural networks where connections
between nodes form a directed graph along a sequence. This allows it to exhibit dynamic temporal behavior for a time sequence.
Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequences of inputs.
Multilayer recurrent network
In this type of network, a processing element's output can be directed to processing elements in the same layer and in the preceding
layer, forming a multilayer recurrent network. These networks perform the same task for every element of a sequence, with the output being
dependent on the previous computations. Inputs are not needed at each time step. The main feature of a recurrent neural network is
its hidden state, which captures some information about a sequence.
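One recurrent step can be sketched as follows in Python with NumPy (the weight shapes and tanh activation are illustrative assumptions):

    import numpy as np

    def rnn_step(x_t, h_prev, W_x, W_h, b):
        """New hidden state depends on the current input and the previous state."""
        return np.tanh(np.dot(W_x, x_t) + np.dot(W_h, h_prev) + b)

    rng = np.random.default_rng(1)
    W_x, W_h, b = rng.normal(size=(3, 2)), rng.normal(size=(3, 3)), np.zeros(3)
    h = np.zeros(3)                        # hidden state ("memory") starts empty
    for x_t in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
        h = rnn_step(x_t, h, W_x, W_h, b)  # the same weights are reused each step
    print(h)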
Activation Function
In artificial neural networks, an activation function is one that outputs a smaller value for small inputs and a larger value once its inputs
exceed a threshold. An activation function "fires" if the inputs are big enough; otherwise, nothing happens. An activation function,
then, is a gate that checks whether an incoming value is higher than a threshold value.
Activation functions are helpful because they introduce non-linearities into neural networks and enable them to learn
powerful operations. If the activation functions were removed, a feedforward neural network could be refactored into a
straightforward linear function or matrix transformation of its input.
Given the weighted total plus bias, the activation function determines whether a neuron should
be turned on. The activation function seeks to make a neuron's output non-linear.
Explanation: As we are aware, neurons in neural networks operate in accordance with weights, biases, and their corresponding
activation functions. Based on the error, the values inside a neural network are modified; this
process is known as back-propagation. Back-propagation is made possible by activation functions, since they provide the
gradients, together with the error, required to adjust the biases and weights.
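For example, the sigmoid function has a convenient derivative that back-propagation can use; a small Python sketch (the function names are my own):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_grad(z):
        """Derivative sigma'(z) = sigma(z) * (1 - sigma(z)), used to form gradients."""
        s = sigmoid(z)
        return s * (1.0 - s)

    z = np.array([-2.0, 0.0, 2.0])
    print(sigmoid(z))       # approximately [0.119 0.5   0.881]
    print(sigmoid_grad(z))  # approximately [0.105 0.25  0.105]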
The two main categories of activation functions are:
Linear Activation Function
As can be observed, the function is linear, so no range restricts the function's output.
The normal data input to neural networks is unaffected by the complexity or other factors.
Non-linear Activation Function
A neural network without an activation function is essentially just a linear regression model. The
activation function applies a non-linear transformation to the input, making the network capable of
learning and performing more complex tasks.
○ Linear Function
Equation: A linear function's equation is similar to that of a straight line, i.e., y = x.
No matter how many layers we have, if all of them are linear in nature, the final activation function of the last layer is
nothing more than a linear function of the input of the first layer. Range: -inf to +inf.
Uses: Linear activation is applied only at the output layer.
Issues: If we differentiate a linear function to add non-linearity, the result will no longer depend on the input "x"; the function
becomes constant, so our algorithm won't exhibit any novel behaviour.
A good example of a regression problem is determining the cost of a house. We can use linear activation at the output layer
since the price of a house may take any large or small value. Even in this circumstance, the neural network's hidden layers
must perform some sort of non-linear function.
○ Sigmoid Function
It is non-linear in nature. Observe that for X values between -2 and 2, the Y values are very steep; in other words, small
changes in x cause significant shifts in the value of Y. The output spans from 0 to 1.
Uses: The sigmoid function is typically employed in the output nodes of a classification where the result may only be either 0 or
1. Since the value of the sigmoid function ranges only from 0 to 1, the result can easily be predicted to be 1 if the value is
greater than 0.5 and 0 if it is not.
○ Tanh Function
The activation that almost always outperforms the sigmoid function is the tangent hyperbolic (tanh) function. It is actually a
mathematically shifted sigmoid function; the two are comparable and derivable from one another.
Uses: Since its values range from -1 to 1, the mean of a hidden layer of a neural network will be 0 or very near
to it. This helps centre the data by bringing the mean close to 0, which greatly facilitates learning for the following layer.
Equation: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
○ ReLU Function
Equation: A(x) = max(0, x). It outputs x if x is positive and 0 otherwise.
Uses: Because ReLU involves simpler mathematical operations than tanh and sigmoid, it requires less computation time to run.
The network is sparse and efficient for computation since only a limited number of neurons are activated at any given time.
Simply said, ReLU picks up information considerably more quickly than the sigmoid and tanh functions.
Currently, ReLU is the most widely employed activation function globally, since practically all convolutional neural
networks and deep learning systems employ it.
However, the problem is that all negative values instantly become zero, which reduces the model's capacity to fit
or learn from the data effectively. Any negative input to a ReLU activation function is immediately mapped to zero,
which affects the resulting function by improperly mapping the negative values.
○ Softmax Function
Although it is a generalization of the sigmoid function, the softmax function comes in handy when dealing with multiclass
classification issues.
It is used frequently when managing several classes and is typically present in the output nodes of image classification
problems. The softmax function divides by the sum of the outputs and squeezes the output for each category
between 0 and 1.
The softmax function is best applied in the output unit of the classifier, where we are actually attempting to obtain the
probabilities that determine the class of each input.
The usual rule of thumb, if we really are unsure of which activation function to apply, is to use ReLU, which is a usual choice
for hidden layers and is employed in the majority of cases these days.
A very logical choice for the output layer is the sigmoid function if the task is binary classification. If our output
involves multiple classes, softmax can be quite helpful in predicting the probabilities for each class.
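The activation functions discussed above can be summarized in a short Python sketch (a minimal illustration; the max-subtraction step in softmax is a common numerical-stability convention, not something from the source):

    import numpy as np

    def linear(z):  return z                     # y = x, range -inf to +inf
    def sigmoid(z): return 1 / (1 + np.exp(-z))  # range 0 to 1
    def tanh(z):    return np.tanh(z)            # range -1 to 1, zero-centred
    def relu(z):    return np.maximum(0, z)      # max(0, x); negatives become 0

    def softmax(z):
        """Squeeze outputs between 0 and 1 and divide by their sum."""
        e = np.exp(z - np.max(z))  # subtract the max for numerical stability
        return e / e.sum()

    z = np.array([2.0, 1.0, -1.0])
    print(softmax(z))  # approximately [0.705 0.259 0.035] -- class probabilities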
Gradient Descent
Gradient Descent is an optimization algorithm for finding a local minimum of a differentiable function. Gradient descent in
machine learning is simply used to find the values of a function's parameters (coefficients) that minimize a cost function as
far as possible.
"A gradient measures how much the output of a function changes if you
change the inputs a little bit." — Lex Fridman (MIT)
A gradient simply measures the change in all weights with regard to the change in error. You can also think of a
gradient as the slope of a function. The higher the gradient, the steeper the slope and the faster a model can learn.
But if the slope is zero, the model stops learning. In mathematical terms, a gradient is a partial derivative with
respect to its inputs.
Training with gradient descent involves a large amount of data. In the case of supervised learning, this is labeled data,
i.e., input data paired with its respective label or target value.
Gradient Descent stands as a cornerstone in the realm of deep learning, orchestrating the intricate dance
of model optimization. At its core, it is a numerical optimization algorithm that aims to find the optimal
parameters—weights and biases—of a neural network by minimizing a defined cost function.
The learning happens during back-propagation while training the neural network-based model.
Gradient descent is used to optimize the weights and biases based on the cost function,
which evaluates the difference between the actual and predicted outputs.
Mathematical formula
y = β + θ₁x₁ + θ₂x₂ + … + θₙxₙ,
where the xᵢ are the input features (i goes from 1 to n), β is the bias, and the θᵢ are the weights.
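Here is a minimal gradient descent sketch in Python for a one-feature version of this model, y = β + θx, minimizing mean squared error (the data, learning rate, and iteration count are made-up illustrations):

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0])
    Y = np.array([3.0, 5.0, 7.0, 9.0])       # generated from y = 1 + 2x

    theta, beta = 0.0, 0.0
    lr = 0.01                                # learning rate (step size)
    for _ in range(2000):
        y_hat = beta + theta * X             # predicted outputs
        error = y_hat - Y                    # difference from actual outputs
        grad_theta = 2 * np.mean(error * X)  # partial derivative w.r.t. theta
        grad_beta = 2 * np.mean(error)       # partial derivative w.r.t. beta
        theta -= lr * grad_theta             # step against the gradient (slope)
        beta -= lr * grad_beta

    print(theta, beta)                       # converges toward 2 and 1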
References
https://www.analyticsvidhya.com/blog/2021/09/introduction-to-artificial-neural-networks/
https://www.geeksforgeeks.org/introduction-to-ann-set-4-network-architectures/
https://www.geeksforgeeks.org/activation-functions-neural-networks/
https://towardsdatascience.com/implementing-gradient-descent-in-python-from-scratch-760a8556c31f