Unit 1 Deep Learning
Our brain is a biological neural network: it is full of neurons, and each neuron is connected to many other neurons.
The dendrites collect the input signals, which are summed up in the cell body and later transmitted to the next neuron through the axon.
A neuron comprises three major parts: the cell body (also called Soma), the
dendrites, and the axon. The dendrites are like fibers branched in different
directions and are connected to many cells in that cluster.
Dendrites receive the signals from surrounding neurons, and the axon transmits the
signal to the other neurons. At the ending terminal of the axon, the contact with the
dendrite is made through a synapse.
The axon is a long fibre that transports the output signal as electrical impulses along its length. Each neuron has one axon. Axons pass impulses from one neuron to another like a domino effect.
Input Layer
Information from the outside world enters the artificial neural network through the input layer. Input nodes process the data, analyze or categorize it, and pass it on to the next layer.
Hidden Layer
Hidden layers take their input from the input layer or other hidden layers. Artificial
neural networks can have a large number of hidden layers. Each hidden layer analyses
the output from the previous layer, processes it further, and passes it on to the next
layer.
Output Layer
The output layer gives the final result of all the data processing by the artificial neural
network. It can have single or multiple nodes. For instance, if we have a binary
(yes/no) classification problem, the output layer will have one output node, which will
give the result as 1 or 0. However, if we have a multi-class classification problem, the
output layer might consist of more than one output node.
A feedforward network processes the data in one direction, from the input nodes to the output nodes. Every node in one layer is connected to every node in the next layer.
McCulloch-Pitts Neuron
In the McCulloch-Pitts (M-P) neuron, a summation junction aggregates all the weighted inputs and then passes the result to the activation function.
The activation function is a threshold function: it gives 1 as the output if the sum of the weighted inputs is equal to or above the threshold value, and 0 otherwise.
If X ≥ θ (threshold value)
Output = 1
Else
Output = 0
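As an illustrative sketch, this threshold rule can be written directly in Python (the function name below is chosen for illustration, not taken from any library):

```python
def threshold_activation(x, theta):
    """Return 1 if the aggregated input x reaches the threshold theta, else 0."""
    return 1 if x >= theta else 0

print(threshold_activation(2, 1))  # 1: sum of weighted inputs meets the threshold
print(threshold_activation(0, 1))  # 0: sum falls below the threshold
```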
X1 (High Salary) X2 (Good Credit Score) Sum Output
1 1 2 1
1 0 1 1
0 1 1 1
0 0 0 0
The truth table shows when the loan should be approved under all the possible scenarios. In this case, with a threshold of 1, the loan is approved whenever the salary is high or the credit score is good, since either condition pushes the sum to the threshold. The McCulloch-Pitts model of the neuron was mankind’s first attempt at mimicking the human brain, and it was a fairly simple one too. It is no surprise that it had many limitations:
1. The model failed to capture and compute cases of non-binary inputs; it was limited to computing every case with 0 and 1 only.
2. The threshold had to be decided beforehand and required manual computation, instead of the model learning it by itself.
3. Functions that are not linearly separable could not be computed.
OR Function
The inputs to the OR function are Boolean, so only 4 combinations are possible: (0,0), (0,1), (1,0) and (1,1).
Plotting them on a 2D graph and using the OR function’s aggregation equation, i.e., x_1 + x_2 ≥ 1, we can draw the decision boundary as shown in the graph below.
X1 X2 Output
0 0 0
0 1 1
1 0 1
1 1 1
We just used the aggregation equation, i.e., x_1 + x_2 = 1, to show graphically that all the inputs which output 1 when passed through the OR-function M-P neuron lie ON or ABOVE that line, and all the input points that lie BELOW that line output 0. The M-P neuron thus learns a linear decision boundary! It splits the inputs into two classes: positive ones (which output 1) are those that lie ON or ABOVE the decision boundary, and negative ones (which output 0) are those that lie BELOW it.
AND Function
X1 X2 Output
0 0 0
0 1 0
1 0 0
1 1 1
Similar to the OR function, we can plot the graph for the AND function, for which the aggregation equation is x_1 + x_2 ≥ 2.
In this case, the decision boundary equation is x_1 + x_2 = 2. Here, the only input point that lies ON or ABOVE the boundary is (1,1), and it is exactly the point that outputs 1 when passed through the AND-function M-P neuron. It fits! The decision boundary works!
From these examples, we can see that as the number of inputs increases, the dimension of the space in which the decision boundary is drawn also increases.
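To make the M-P neuron concrete, here is a minimal Python sketch (the names are illustrative) that reproduces the OR and AND truth tables above by changing only the threshold:

```python
def mp_neuron(inputs, threshold):
    """McCulloch-Pitts neuron: all binary inputs are weighted equally,
    and the neuron fires (outputs 1) only if their sum reaches the threshold."""
    return 1 if sum(inputs) >= threshold else 0

# Threshold 1 reproduces the OR truth table; threshold 2 reproduces AND.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "OR:", mp_neuron([x1, x2], 1), "AND:", mp_neuron([x1, x2], 2))
```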
1.4-Perceptrons
A human neuron collects the input from other neurons using its dendrites and sums all the inputs. If the total is greater than a threshold value, it produces an output.
A perceptron is a mathematical model of a neuron. It receives weighted inputs, which are added together and passed to an activation function that determines whether the neuron should fire and produce an output.
There are many activation functions with different properties, but one of the simplest is the step function. A step function outputs 1 if the input is higher than the threshold value, and 0 otherwise.
Input Nodes or Input Layer:
This is the primary component of Perceptron which accepts the initial data into the system for
further processing. Each input node contains a real numerical value.
o Weight and Bias:
The weight parameter represents the strength of the connection between units. It is another important component of the Perceptron: the weight is directly proportional to the influence of the associated input neuron on the output. The bias can be considered as the intercept term in a linear equation.
o Activation Function:
This is the final and important component that helps to determine whether the neuron will fire or not. The activation function is primarily a step-like function; common choices are:
o Sign function
o Step function, and
o Sigmoid function
Example of Perceptrons
Inputs
X1=0.9, X2=0.7
Weights
W1=0.2, W2=0.9
X1W1 + X2W2 = 0.9*0.2 + 0.7*0.9 = 0.18 + 0.63 = 0.81
This step function or Activation function plays a vital role in ensuring that output
is mapped between required values (0,1) or (-1,1).
It is important to note that the weight of an input is indicative of the strength of a node. Similarly, the bias value gives the ability to shift the activation function curve up or down.
The Perceptron model works in two important steps, as follows:
Step-1
First, multiply all input values by their corresponding weight values and then add them up to determine the weighted sum. A special term called the bias 'b' is added to this weighted sum to improve the model's performance. Mathematically, the weighted sum is:
∑ wi*xi + b
Step-2
Apply an activation function f to the weighted sum to obtain the output:
Y = f(∑ wi*xi + b)
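A minimal Python sketch of these two steps (the bias and the threshold below are assumed to be 0, since the worked example above does not state them):

```python
def perceptron(x, w, b=0.0, threshold=0.0):
    """Step 1: weighted sum of inputs plus bias.  Step 2: step activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b   # sum(wi*xi) + b
    return 1 if z >= threshold else 0

# Reusing the numbers from the worked example above:
print(perceptron([0.9, 0.7], [0.2, 0.9]))   # weighted sum 0.81 >= 0, so output 1
```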
A vector is anything that sits anywhere in space and has a magnitude and a direction. Imagine you have two vectors of size n+1, w and x; the dot product of these vectors (w·x) can be computed as
w·x = ∑ wi*xi
The same dot product can be computed differently if you know the angle between the vectors and their individual magnitudes:
w·x = |w| |x| cos(α)
The other way around, you can get the angle between two vectors from the vectors themselves, given that you know how to calculate vector magnitudes and their dot product:
cos(α) = (w·x) / (|w| |x|)
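As a quick numerical check (the vector values below simply reuse the numbers from the earlier perceptron example for illustration), both ways of computing the dot product, and the recovered angle, can be verified in a few lines:

```python
import numpy as np

w = np.array([0.2, 0.9])   # example vectors (values assumed for illustration)
x = np.array([0.9, 0.7])

dot = np.sum(w * x)                                        # component-wise definition
cos_alpha = dot / (np.linalg.norm(w) * np.linalg.norm(x))  # geometric definition rearranged
alpha = np.degrees(np.arccos(cos_alpha))                   # angle between w and x

print(dot, cos_alpha, alpha)
```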
Our goal is to find the w vector that can perfectly classify positive inputs and negative inputs
in our data.
We initialize w with some random vector. We then iterate over all the examples in the data (P ∪ N), both positive and negative.
Now, if an input x belongs to P, ideally what should the dot product w·x be? It should be greater than or equal to 0, because that is exactly what our perceptron wants at the end of the day, so let's give it that. And if x belongs to N, the dot product MUST be less than 0.
We have already established that when x belongs to P, we want w·x > 0; this is the basic perceptron rule. What we also mean by that is that w should make an angle of less than 90 degrees with the positive example data vectors (x ∈ P) and an angle of more than 90 degrees with the negative example data vectors (x ∈ N). So, we now strongly believe that the angle between w and x should be less than 90 degrees when x belongs to class P, and more than 90 degrees when x belongs to class N.
So, when we add x to w, which we do when x belongs to P and w·x < 0 (Case 1), we are essentially increasing the cos(α) value; that is, we are decreasing α, the angle between w and x, which is what we desire. A similar intuition works for the case when x belongs to N and w·x ≥ 0 (Case 2), where we subtract x from w to increase the angle.
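Putting the whole intuition together, here is a sketch of the perceptron learning algorithm, assuming the inputs are given as two lists P and N of NumPy vectors (with the bias folded into an extra input component, as is usual for this algorithm); it converges only when the data are linearly separable:

```python
import numpy as np

def train_perceptron(P, N, max_epochs=100):
    """Perceptron learning rule: add misclassified positive points to w and
    subtract misclassified negative points, until all points are classified."""
    w = np.random.randn(len(P[0]))        # initialize w with some random vector
    for _ in range(max_epochs):
        converged = True
        for x in P:                        # positive examples: we want w.x >= 0
            if np.dot(w, x) < 0:
                w = w + x                  # Case 1: decrease the angle between w and x
                converged = False
        for x in N:                        # negative examples: we want w.x < 0
            if np.dot(w, x) >= 0:
                w = w - x                  # Case 2: increase the angle between w and x
                converged = False
        if converged:
            break
    return w
```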
1.6-Sigmoid Neuron
The sigmoid neuron is similar to the perceptron, but it is slightly modified so that its output is much smoother than the step-function output of the perceptron.
In the sigmoid neuron, a small change in the input causes only a small change in the output, as opposed to the stepped output of the perceptron.
There are many functions with the characteristic "S"-shaped curve, known as sigmoid functions. The most commonly used one is the logistic function.
We no longer see a sharp transition at the threshold "b". The output from the sigmoid neuron is not 0 or 1; instead, it is a real value between 0 and 1, which can be interpreted as a probability.
In a sigmoid neuron, every input Xi has a weight Wi associated with it. The weight depicts the importance of the input in the decision-making process. The output of the sigmoid neuron ranges between 0 and 1, which can be interpreted as a probability, rather than being exactly 0 or 1 as in the perceptron model.
In the case of a 1-dimensional input x, the sigmoid (logistic) function that describes the relationship between the input and the output is given by
y = 1 / (1 + e^-(w*x + b))
In the case of a 2-dimensional input, i.e., two input features, the sigmoid function that describes the input-output relation is given by
y = 1 / (1 + e^-(w1*x1 + w2*x2 + b))
In the case of a high-dimensional input with many input features, the sigmoid function is given by
y = 1 / (1 + e^-(∑ wi*xi + b))
Advantages
Unlike the perceptron and the M-P neuron, which have binary outputs, the sigmoid function's output lies between 0 and 1.
Another advantage of the sigmoid function concerns data that are not linearly separable: a single sigmoid neuron still cannot completely separate the positive points from the negative points, but its smooth, real-valued output degrades gracefully instead of switching abruptly.
Loss function: the sum of the squared differences between the true output and the predicted output,
L = ∑ (Yi - Ŷi)^2
where
Yi = True output
Ŷi = Predicted output
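A short sketch of a sigmoid neuron together with this squared-error loss (the weights, bias and inputs below are assumed values, used only for illustration):

```python
import numpy as np

def sigmoid_neuron(x, w, b):
    """Logistic (sigmoid) output: a real value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

def squared_error_loss(y_true, y_pred):
    """Sum of squared differences between true and predicted outputs."""
    return float(np.sum((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Tiny illustration with assumed weights and bias:
y_hat = sigmoid_neuron(np.array([0.9, 0.7]), np.array([0.2, 0.9]), b=-0.5)
print(y_hat, squared_error_loss([1.0], [y_hat]))
```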
Multilayer Perceptron (MLP)
A multilayer perceptron has one input layer with one neuron for each input. It has one output layer with a single node for each output, and it can have any number of hidden layers, each of which can have any number of nodes.
MLP networks are used in a supervised learning setting. A typical learning algorithm for MLP networks is the backpropagation algorithm.
An MLP is a feedforward neural network, which means that the data flows from the input layer to the output layer in the forward direction.
In the example network, there are three inputs and thus three input nodes, and the hidden layer has three nodes. The output layer gives two outputs, therefore there are two output nodes. The nodes in the input layer take the input and forward it for further processing.
Every node in the multilayer perceptron uses a sigmoid activation function. The sigmoid activation function takes a real value as input and converts it into a number between 0 and 1.
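A minimal sketch of one forward pass through such an MLP, with 3 input nodes, 3 hidden nodes and 2 output nodes as in the example above (the weights here are random placeholders, not trained values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    """Fully connected feedforward pass: every input node feeds every hidden
    node, and every hidden node feeds every output node."""
    h = sigmoid(W1 @ x + b1)   # hidden-layer activations
    y = sigmoid(W2 @ h + b2)   # output-layer activations
    return y

rng = np.random.default_rng(0)
out = mlp_forward(np.array([1.0, 0.5, -0.3]),
                  rng.normal(size=(3, 3)), np.zeros(3),
                  rng.normal(size=(2, 3)), np.zeros(2))
print(out)   # two values between 0 and 1
```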
Backpropagation
1. After calculating the output of the MLP neural network, calculate the error.
2. This error is the difference between the output generated by the neural network and the actual output. The calculated error is fed back into the network, from the output layer towards the hidden layers.
3. This error signal now becomes an input to the network.
4. The model reduces the error by adjusting the weights in the hidden layers.
5. Calculate the predicted output with the adjusted weights and check the error again. This process is repeated until the error is minimized or eliminated.
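To make the weight-adjustment step concrete, here is a sketch of a single gradient-descent update for one sigmoid neuron with a squared-error loss; this is a simplification of full MLP backpropagation, and the learning rate and the ½·squared-error convention are assumptions made for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y_true, w, b, lr=0.1):
    """One gradient-descent update for a single sigmoid neuron:
    compute the output, compare it with the actual output, and feed
    the error back to adjust the weights and bias."""
    y_pred = sigmoid(np.dot(w, x) + b)        # forward pass
    error = y_pred - y_true                   # gradient of 0.5*(y_pred - y_true)**2
    grad_z = error * y_pred * (1.0 - y_pred)  # chain rule through the sigmoid
    w = w - lr * grad_z * x                   # adjust weights to reduce the error
    b = b - lr * grad_z                       # adjust bias as well
    return w, b
```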
XOR function
The XOR function takes two binary inputs (0 or 1) and returns 1 if the inputs are
different (one is 0 and the other is 1); otherwise, it returns 0.
The XOR function cannot be implemented using a single perceptron, since it is not a linearly separable function. However, we can achieve the XOR functionality by combining multiple perceptrons in a network.
A perceptron makes decisions based on a linear combination of its inputs, followed by
applying an activation function.
The decision boundary of a perceptron is a hyperplane, which is a straight line in two-
dimensional space.
However, the XOR function requires a non-linear decision boundary, which cannot be
achieved by a single perceptron.
To implement the XOR function, we need a network of perceptrons, specifically a
multilayer perceptron (MLP) with at least one hidden layer.
The hidden layer(s) allows the network to learn non-linear mappings between the
inputs and outputs.
By combining multiple perceptrons and using non-linear activation functions, an MLP
can model complex relationships and achieve the necessary non-linear decision
boundaries to implement the XOR function.
X1 X2 Y
0 0 0
0 1 1
1 0 1
1 1 0
Conditions for the implementation of XOR (w0 is the threshold of the output neuron, and w1-w4 are the weights from the hidden units H1-H4 to the output):
w1<w0
w2≥w0
w3≥w0
w4<w0
X1 X2 XOR H1 H2 H3 H4 Weight summed at output
0 0 0 1 0 0 0 W1
0 1 1 0 1 0 0 W2
1 0 1 0 0 1 0 W3
1 1 0 0 0 0 1 W4
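Following the table above, the XOR network can be sketched in Python by giving the hidden layer one unit per input pattern; the specific weight values below are one assumed choice that satisfies the four conditions (w1 < w0, w2 ≥ w0, w3 ≥ w0, w4 < w0):

```python
def xor_network(x1, x2, w=(-1, 1, 1, -1), w0=1):
    """XOR via a hidden layer with one unit per input pattern."""
    # Each hidden unit fires for exactly one of the four input combinations.
    h = [int((x1, x2) == p) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]]
    # Output neuron: weighted sum of the hidden activations against threshold w0.
    z = sum(wi * hi for wi, hi in zip(w, h))
    return 1 if z >= w0 else 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))   # reproduces the XOR truth table
```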