DL Class1
• The cost function (C), or loss function, measures the difference between the actual
output and the output predicted by the model (a common example is the mean squared
error, sketched after this list).
• The higher the gradient, the steeper the slope and the faster a model
can learn.
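A minimal Python sketch of such a cost function, using the mean squared error purely as an illustration (the slides do not commit to a particular loss):

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error: average squared difference between
    the actual outputs and the model's predictions."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # actual outputs
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # predicted outputs
print(mse_loss(y_true, y_pred))            # 0.375
```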
A global minimum is the lowest point of the cost function over its entire domain,
while a local minimum is a sub-optimal point that is lower than the points around it
but is not the lowest point overall.
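To make the distinction concrete, here is a small sketch of a curve with one local and one global minimum (the function itself is an arbitrary example, not from the slides):

```python
import numpy as np

def f(x):
    return (x**2 - 1)**2 + 0.3 * x   # two valleys of different depth

xs = np.linspace(-2, 2, 4001)
print(xs[np.argmin(f(xs))])          # prints a value near -1: the deeper valley, i.e. the global minimum
# The shallower valley near x = +1 is only a local minimum: gradient descent
# started on that side of the curve would settle there instead.
```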
Learning Rate
The learning rate controls how much we adjust the weights with respect to the loss
gradient. It is typically initialised to a small value and tuned during training.
The lower the learning rate, the slower the convergence to the global minimum.
Too high a learning rate may prevent gradient descent from converging at all, as
illustrated in the sketch below.
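A rough sketch of the effect of the learning rate on the update rule w ← w − η·dC/dw, minimising the toy cost C(w) = w² (the cost function and the learning-rate values are illustrative assumptions):

```python
def gradient_descent(lr, steps=20, w=5.0):
    """Minimise C(w) = w**2, whose gradient is dC/dw = 2*w."""
    for _ in range(steps):
        grad = 2 * w          # gradient of the cost at the current weight
        w = w - lr * grad     # move the weight opposite to the gradient
    return w

print(gradient_descent(lr=0.01))  # too small: still far from the minimum at 0
print(gradient_descent(lr=0.1))   # moderate: close to the minimum
print(gradient_descent(lr=1.1))   # too large: |w| grows each step and diverges
```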
Limitations of gradient descent learning algorithm
Learning Rate:
If we set the learning rate to a very small value, gradient descent will eventually
reach the local minimum, but that may take a while.
If the steps are too big, gradient descent may never reach the local minimum because
it bounces back and forth across the convex cost function.
Limitations of gradient descent learning algorithm
For gradient descent to work, we must set the learning rate to an appropriate value.
Plotting the value of the cost function after each iteration of gradient descent
provides a way to easily spot how appropriate the learning rate is.
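One way to do this, reusing the same toy quadratic cost as an assumption, is to record the cost after every update and plot it against the iteration number:

```python
import matplotlib.pyplot as plt

w, lr = 5.0, 0.1
costs = []
for _ in range(50):
    costs.append(w ** 2)       # cost C(w) = w**2 before the update
    w -= lr * 2 * w            # gradient descent step

# A smoothly decreasing curve suggests the learning rate is appropriate;
# a flat curve suggests it is too small, an oscillating or rising curve too large.
plt.plot(costs)
plt.xlabel("iteration")
plt.ylabel("cost")
plt.show()
```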
Limitations of gradient descent learning algorithm
Local Minima and Saddle Point:
For convex problems, gradient descent can find the global minimum easily, while for
non-convex problems it can be difficult to find the global minimum.
Whenever the slope of the cost function is zero or very close to zero, the model
stops learning.
A local minimum has a shape similar to the global minimum: the slope of the cost
function increases on both sides of the current point.
At a saddle point, in contrast, the cost decreases on one side of the point and
increases on the other, so it behaves like a local maximum along one direction and
a local minimum along another.
Limitations of gradient descent learning algorithm
A saddle point on the surface of the loss function is a critical point that looks
like a local minimum from one perspective and like a local maximum from another.
Saddle points inject confusion into the learning process: learning stops (or becomes
extremely slow) at such a point because the slope is zero (or very close to zero),
so the model behaves as if a minimum has been reached.
https://www.offconvex.org/2016/03/22/saddlepoints/
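A classic textbook example of a saddle point (not taken from the slides) is f(x, y) = x² − y² at the origin, where the gradient is exactly zero even though the point is neither a minimum nor a maximum:

```python
import numpy as np

def f(x, y):
    return x**2 - y**2

def grad_f(x, y):
    return np.array([2 * x, -2 * y])

print(grad_f(0.0, 0.0))           # [0. -0.]: zero gradient, so plain gradient descent stalls here
print(f(0.1, 0.0), f(0.0, 0.1))   # 0.01 and -0.01: the surface rises along x but falls along y, a saddle
```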
Limitations of gradient descent learning algorithm
Vanishing and Exploding Gradient
In a deep neural network trained with gradient descent and backpropagation, two
further issues can occur besides local minima and saddle points.
Vanishing Gradients:
A vanishing gradient occurs when the gradient becomes much smaller than expected.
During backpropagation the gradient shrinks as it is propagated backwards, so the
earlier layers of the network learn more slowly than the later layers. Once this
happens, the weight updates in those early layers become insignificant.
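A rough sketch of why this can happen with a saturating activation such as the sigmoid, whose derivative never exceeds 0.25: backpropagation multiplies roughly one such factor per layer, so the gradient reaching the earliest layers shrinks geometrically (the ten-layer setup is an illustrative assumption):

```python
import numpy as np

def sigmoid_derivative(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)          # maximum value is 0.25, at z = 0

grad = 1.0
for layer in range(10):           # propagate back through 10 layers, best case z = 0
    grad *= sigmoid_derivative(0.0)
print(grad)                       # 0.25**10 ≈ 9.5e-07: the earliest layers barely update
```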
Exploding Gradient:
An exploding gradient is the opposite of a vanishing gradient: it occurs when the
gradient is too large, creating an unstable model. In this scenario the model weights
grow so large that they may eventually be represented as NaN. The problem can be
mitigated by reducing the complexity of the model, for example through dimensionality
reduction.
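Conversely, if the factor contributed per layer is greater than 1, the same product grows without bound. A minimal sketch with an assumed per-layer factor of 1.5; gradient clipping, shown at the end, is a common practical remedy in addition to reducing model complexity:

```python
grad = 1.0
for layer in range(100):
    grad *= 1.5                    # assumed per-layer factor greater than 1
print(grad)                        # ~4e17: weight updates blow up and can overflow to NaN

# Gradient clipping caps the magnitude of the gradient before the weight update:
max_norm = 5.0
clipped_grad = min(grad, max_norm)
print(clipped_grad)                # 5.0
```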
Mini-batch gradient descent algorithm
Mini-batches while training
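In mini-batch gradient descent, the training set is split into small batches and the weights are updated once per batch, rather than once per full pass over the data or once per individual sample. A minimal sketch for linear regression, in which the data, batch size, and learning rate are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                  # 1000 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)
lr, batch_size, epochs = 0.1, 32, 5

for epoch in range(epochs):
    idx = rng.permutation(len(X))               # shuffle the data once per epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]   # indices of one mini-batch
        Xb, yb = X[batch], y[batch]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch)   # MSE gradient on this mini-batch only
        w -= lr * grad                          # one weight update per mini-batch

print(w)   # close to [2.0, -1.0, 0.5]
```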