Unit 2
An activation function in a neural network defines how the weighted sum of the input is
transformed into an output from a node or nodes in a layer of the network.
Activation functions convert the linear input signals of a node into non-linear output signals. This non-linearity is what allows deep networks to learn functions more complex than a first-degree (linear) mapping; the basic goal of activation functions is therefore to give the neural network non-linear qualities.
Activation functions can be broadly divided into two types:
1. Linear Activation Function
2. Non-linear Activation Functions
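As a small illustration, here is a minimal NumPy sketch of a linear activation alongside three common non-linear activations (sigmoid, tanh, ReLU); the example input values are arbitrary.

```python
# Minimal NumPy sketch of common activation functions (illustrative only).
import numpy as np

def linear(x):
    # Linear (identity) activation: output is proportional to the input.
    return x

def sigmoid(x):
    # Squashes the input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes the input into the range (-1, 1).
    return np.tanh(x)

def relu(x):
    # Rectified Linear Unit: passes positive values, zeroes out negatives.
    return np.maximum(0.0, x)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # example weighted sums
print(linear(z))
print(sigmoid(z))    # values between 0 and 1
print(relu(z))       # [0.  0.  0.  0.5 2. ]
```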
Backpropagation
Backpropagation is the method of fine-tuning the weights of a neural network based on the error rate obtained in the previous epoch (i.e., iteration). Proper tuning of the weights reduces error rates and makes the model more reliable by improving its generalization.
Backpropagation is short for "backward propagation of errors." It is a standard method of training artificial neural networks, and it works by calculating the gradient of a loss function with respect to all the weights in the network.
Backpropagation (backward propagation) is a supervised learning algorithm for training multilayer artificial neural networks (ANNs), and it is an important mathematical tool for improving the accuracy of predictions in machine learning.
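The sketch below shows backpropagation written out by hand on a tiny 2-4-1 sigmoid network trained on XOR with NumPy; the layer sizes, learning rate, and epoch count are illustrative assumptions, not a prescribed configuration.

```python
# Minimal backpropagation sketch: a 2-4-1 sigmoid network trained on XOR
# with mean squared error (layer sizes and hyperparameters are arbitrary).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))
lr = 0.5

for epoch in range(10000):
    # Forward pass: weighted sums, then activations.
    h = sigmoid(X @ W1 + b1)                   # hidden layer output
    o = sigmoid(h @ W2 + b2)                   # network output

    # Backward pass: propagate the error from the output layer backwards.
    err = o - y                                # derivative of 0.5 * MSE
    delta_o = err * o * (1 - o)                # output-layer error signal
    delta_h = (delta_o @ W2.T) * h * (1 - h)   # hidden-layer error signal

    # Gradient-descent weight updates based on this epoch's error.
    W2 -= lr * h.T @ delta_o; b2 -= lr * delta_o.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ delta_h; b1 -= lr * delta_h.sum(axis=0, keepdims=True)

print(np.round(o, 2))  # predictions typically approach [0, 1, 1, 0]
```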
Batch Normalization
Batch normalization is an important feature we add to our models: it acts as a regularizer, normalizes the inputs to each layer, stabilizes the backpropagation process, and can be adapted to most models to help them converge better.
Batch Normalization is a technique that converts the interlayer outputs of a neural network into a standard format; this is called normalizing. It effectively 'resets' the distribution of the previous layer's output so that it can be processed more efficiently by the subsequent layer.
Batch normalization is a feature that we add between the layers of the neural network and
it continuously takes the output from the previous layer and normalizes it before sending it
to the next layer. This has the effect of stabilizing the neural network. Batch normalization
is also used to maintain the distribution of the data.
Since normalization guarantees that no activation value is too high or too low, and since it enables each layer to learn somewhat independently of the others, this strategy leads to faster learning.
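A minimal sketch of the batch-normalization computation on one mini-batch is shown below; the gamma, beta, and epsilon values are assumptions for illustration (in a real network, gamma and beta are learned parameters).

```python
# Sketch of the batch-normalization computation over one mini-batch.
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch dimension...
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # ...then scale and shift with the (learnable) parameters gamma and beta.
    return gamma * x_hat + beta

batch = np.random.randn(32, 4) * 10 + 3      # activations with a large scale and offset
out = batch_norm(batch, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ~0 mean, ~1 std per feature
```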
Gradient Descent
What is gradient descent?
Gradient descent is the most popular optimization strategy used in machine learning and deep learning at the moment. It is used when training models, can be combined with almost every learning algorithm, and is easy to understand and implement.
Gradient Descent is an optimization algorithm for finding a local minimum of a
differentiable function. Gradient descent is simply used in machine learning to find the
values of a function's parameters (coefficients) that minimize a cost function as far as
possible.
Gradient descent is an optimization algorithm that's used when training a machine learning model. It is typically described using a convex cost function. Training data helps these models learn over time, and the cost function within gradient descent measures the model's error at each step. Until the cost is close to or equal to zero, the model will continue to adjust its parameters to yield the smallest possible error.
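A minimal gradient-descent loop on the convex function f(w) = (w − 3)² is sketched below; the starting point, learning rate, and step count are arbitrary illustrative choices.

```python
# Minimal gradient descent on the convex function f(w) = (w - 3)^2.
def f(w):
    return (w - 3.0) ** 2

def grad_f(w):
    return 2.0 * (w - 3.0)   # derivative of f with respect to w

w = 0.0          # initial parameter value
lr = 0.1         # learning rate (step size)
for step in range(50):
    w -= lr * grad_f(w)      # move against the gradient

print(round(w, 4))  # close to the minimizer w = 3, where f is (near) zero
```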
What is a Gradient?
A gradient simply measures the change in all weights with regard to the change in error.
You can also think of a gradient as the slope of a function. The higher the gradient, the
steeper the slope and the faster a model can learn. But if the slope is zero, the model stops
learning. In mathematical terms, a gradient is a partial derivative with respect to its inputs.
\Delta w_i = \eta \sum_{d \in D} (t_d - o_d)\, x_{id}
In machine learning, a gradient is a derivative of a function that has more than one input
variable. Known as the slope of a function in mathematical terms, the gradient simply
measures the change in all weights with regard to the change in error.
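The weight-update rule above can be sketched for a single linear unit as follows; the training data and the learning rate η are made-up illustrative values.

```python
# Sketch of the update Δw_i = η Σ_d (t_d − o_d) x_{id} for a single linear unit.
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 1.0], [0.0, 1.0]])  # one row x_d per training example
t = np.array([5.0, 4.0, 1.0])                       # target outputs t_d
w = np.zeros(2)                                     # weights w_i
eta = 0.05                                          # learning rate η

for _ in range(200):
    o = X @ w                        # outputs o_d of the linear unit
    delta_w = eta * X.T @ (t - o)    # Δw_i = η Σ_d (t_d − o_d) x_{id}
    w += delta_w

print(w.round(2))   # approaches the least-squares weights for this data
```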
Multilayer Network
Notation of Nodes
TUNING HYPERPARAMETERS
Hyperparameters
Hyperparameters are parameters whose values control the learning process and determine the values of the model parameters that a learning algorithm ends up learning. The prefix 'hyper' suggests that they are 'top-level' parameters that control the learning process and the model parameters that result from it.
Random Search
In the random search method, we create a grid of possible values for hyperparameters. Each
iteration tries a random combination of hyperparameters from this grid, records the
performance, and lastly returns the combination of hyperparameters that provided the best
performance.
Grid Search
In the grid search method, we create a grid of possible values for hyperparameters. Each
iteration tries a combination of hyperparameters in a specific order. It fits the model on each
and every combination of hyperparameters possible and records the model performance.
Finally, it returns the best model with the best hyperparameters.
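A hedged sketch of both search methods using scikit-learn's GridSearchCV and RandomizedSearchCV is shown below; the model, parameter values, and cross-validation setting are arbitrary illustrative choices.

```python
# Grid search vs. random search with scikit-learn (illustrative settings only).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
param_grid = {
    "hidden_layer_sizes": [(8,), (16,), (32,)],
    "alpha": [1e-4, 1e-3, 1e-2],
}
model = MLPClassifier(max_iter=2000, random_state=0)

# Grid search: fit every combination in the grid and keep the best one.
grid = GridSearchCV(model, param_grid, cv=3).fit(X, y)

# Random search: sample a fixed number of random combinations from the grid.
rand = RandomizedSearchCV(model, param_grid, n_iter=5, cv=3, random_state=0).fit(X, y)

print(grid.best_params_, rand.best_params_)
```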
Bayesian Optimization
Tuning and finding the right hyperparameters for your model is an optimization problem.
We want to minimize the loss function of our model by changing model parameters.
Bayesian optimization helps us find the minimal point in the minimum number of
steps. Bayesian optimization also uses an acquisition function that directs sampling to
areas where an improvement is possible over the current best observation.
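A minimal sketch with the scikit-optimize package (assuming it is installed as skopt) is shown below; the one-dimensional objective and search range are stand-ins for a real validation loss over a hyperparameter.

```python
# Bayesian optimization sketch using scikit-optimize (assumed available as `skopt`).
from skopt import gp_minimize

def objective(params):
    # Stand-in validation loss as a function of one hyperparameter (e.g. learning rate).
    lr = params[0]
    return (lr - 0.1) ** 2          # minimized at lr = 0.1

# "EI" (expected improvement) is an acquisition function that directs sampling
# toward regions likely to improve on the current best observation.
result = gp_minimize(objective, [(0.001, 1.0)], n_calls=20,
                     acq_func="EI", random_state=0)
print(result.x, result.fun)
```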
Hyperband
Hyperband is a variation of random search, but with some explore-exploit theory to find the
best time allocation for each of the configurations.
Population Based Training (PBT)
This technique is a hybrid of the two most commonly used search techniques: random search and manual tuning, applied to neural network models.
PBT starts by training many neural networks in parallel with random hyperparameters, but these networks are not fully independent of each other.
Each network uses information from the rest of the population to refine its own hyperparameters and to decide which hyperparameter values to try next.
BOHB
BOHB (Bayesian Optimization and HyperBand) mixes the Hyperband algorithm and
Bayesian optimization.
What is a Gradient?
The gradient is simply the derivative of the loss function with respect to the weights. During backpropagation in neural networks, these gradients are used to update the weights in order to minimize the loss function.
With backpropagation there are two types of unstable gradient problems:
1. Vanishing gradients
2. Exploding gradients
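A small numerical sketch of why these problems occur is given below: during backpropagation the error signal is multiplied by one factor per layer, so repeated factors below 1 shrink it (vanishing) and repeated factors above 1 blow it up (exploding). The factor values and layer count are illustrative only.

```python
# Illustration of vanishing and exploding gradients across many layers.
import numpy as np

layers = 30
vanishing = np.prod(np.full(layers, 0.25))   # 0.25 is the maximum slope of the sigmoid
exploding = np.prod(np.full(layers, 1.5))    # repeated per-layer factors above 1

print(f"after {layers} layers: vanishing factor ~ {vanishing:.2e}, "
      f"exploding factor ~ {exploding:.2e}")
```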
AUTOENCODER
What is an Autoencoder?
An Autoencoder is a type of neural network that can learn to reconstruct images, text, and other data from
compressed versions of themselves.
The aim of an autoencoder is to learn a lower-dimensional representation (encoding) of higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input.
5 types of autoencoders
The idea of autoencoders for neural networks isn't new. In fact, the first applications date back to the 1980s.
Here are five popular autoencoders that we will discuss:
1. Undercomplete autoencoders
2. Sparse autoencoders
3. Contractive autoencoders
4. Denoising autoencoders
5. Variational Autoencoders (for generative modelling)
1. Undercomplete Autoencoders
An undercomplete autoencoder is one of the simplest types of autoencoder. It takes an image and tries to predict the same image as output, reconstructing the image from the compressed bottleneck region.
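A minimal Keras sketch of an undercomplete autoencoder on MNIST is shown below, assuming TensorFlow/Keras is available; the bottleneck size, optimizer, and training settings are arbitrary choices.

```python
# Undercomplete autoencoder sketch: the 32-unit bottleneck is much smaller
# than the 784-pixel input, forcing a compressed representation.
import tensorflow as tf

(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

inputs = tf.keras.Input(shape=(784,))
code = tf.keras.layers.Dense(32, activation="relu")(inputs)        # encoder / bottleneck
outputs = tf.keras.layers.Dense(784, activation="sigmoid")(code)   # decoder

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.fit(x_train, x_train, epochs=5, batch_size=256)        # target == input
```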
2. Sparse Autoencoders
Sparse autoencoders are not controlled by changing the number of nodes at each hidden layer. Instead, they offer an alternative method for introducing an information bottleneck without requiring a reduction in the number of hidden nodes: we construct the loss function so that it imposes a penalty on the activations within a layer.
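A hedged Keras sketch of this penalty is shown below, assuming TensorFlow/Keras: an L1 activity regularizer on the hidden layer discourages large activations, so only a few units respond strongly to any given input. The layer sizes and penalty weight are assumptions.

```python
# Sparse autoencoder sketch: a wide hidden layer with an L1 activity penalty.
import tensorflow as tf

inputs = tf.keras.Input(shape=(784,))
code = tf.keras.layers.Dense(
    128, activation="relu",
    activity_regularizer=tf.keras.regularizers.l1(1e-5))(inputs)   # sparsity penalty
outputs = tf.keras.layers.Dense(784, activation="sigmoid")(code)

sparse_autoencoder = tf.keras.Model(inputs, outputs)
sparse_autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
```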
3. Contractive Autoencoders
Similar to other autoencoders, contractive autoencoders perform the task of learning a representation of the image while passing it through a bottleneck and reconstructing it in the decoder.
4. Denoising Autoencoders
Denoising autoencoders, as the name suggests, are autoencoders that remove noise from an image.
As opposed to the autoencoders we've already covered, this is the first of its kind that does not use the input image as its ground truth.
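A short sketch of denoising-autoencoder training is shown below; it assumes the autoencoder model and x_train array from the undercomplete sketch above, and the noise level is an arbitrary choice. The key point is that the noisy image is the input while the clean image is the target.

```python
# Denoising autoencoder sketch (reuses `autoencoder` and `x_train` from the
# undercomplete example above; the noise level 0.3 is arbitrary).
import numpy as np

noise = 0.3 * np.random.normal(size=x_train.shape)
x_noisy = np.clip(x_train + noise, 0.0, 1.0)

# Train to map noisy inputs back to the clean targets.
autoencoder.fit(x_noisy, x_train, epochs=5, batch_size=256)
```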
5. Variational Autoencoders
A variational autoencoder (VAE) provides a probabilistic manner for describing an observation in latent
space. Thus, rather than building an encoder which outputs a single value to describe each latent state
attribute, we'll formulate our encoder to describe a probability distribution for each latent attribute.
A variational autoencoder can be defined as an autoencoder whose training is regularised to avoid overfitting and to ensure that the latent space has good properties that enable a generative process.
Standard and variational autoencoders learn to represent the input just in a compressed form called the latent space or
the bottleneck.
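The sketch below illustrates the key encoder idea, assuming TensorFlow/Keras: the encoder outputs a mean and a log-variance per latent attribute, and a latent vector is sampled from that distribution via the reparameterization trick. The dimensions are arbitrary, and the full VAE loss (reconstruction plus KL divergence) is omitted.

```python
# VAE encoder sketch: describe each latent attribute with a distribution
# (mean and log-variance) rather than a single value, then sample from it.
import tensorflow as tf

latent_dim = 2
inputs = tf.keras.Input(shape=(784,))
h = tf.keras.layers.Dense(128, activation="relu")(inputs)
z_mean = tf.keras.layers.Dense(latent_dim)(h)      # mean of each latent attribute
z_log_var = tf.keras.layers.Dense(latent_dim)(h)   # log-variance of each latent attribute

def sample(args):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).
    z_mean, z_log_var = args
    eps = tf.random.normal(shape=tf.shape(z_mean))
    return z_mean + tf.exp(0.5 * z_log_var) * eps

z = tf.keras.layers.Lambda(sample)([z_mean, z_log_var])
encoder = tf.keras.Model(inputs, [z_mean, z_log_var, z])
```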
Applications of autoencoders
Now that you understand various types of autoencoders, let’s summarize some of their most common use cases.
1. Dimensionality reduction
2. Image denoising
3. Generation of image and time series data
4. Anomaly Detection
5. Watermark Removal from an Image