DL Unit 4
AUTOENCODERS:
Denoising Autoencoder
Sparse Autoencoder
Deep Autoencoder
Contractive Autoencoder
Undercomplete Autoencoder
Convolutional Autoencoder
Variational Autoencoder
DENOISING AUTOENCODER:
Autoencoders are neural networks that are commonly used for feature
selection and extraction. However, when there are more nodes in the
hidden layer than there are inputs, the network risks learning the
so-called "identity function", meaning that the output simply equals the
input, which makes the autoencoder useless.
Denoising autoencoders solve this problem by corrupting the data on
purpose, randomly turning some of the input values to zero. In general,
about 50% of the input nodes are set to zero; other sources suggest a
lower fraction, such as 30%. The right amount depends on how much data
and how many input nodes you have.
Specifically, if the autoencoder has too much capacity, it can simply
memorize the data, so that the output equals the input and no useful
representation learning or dimensionality reduction takes place.
When calculating the loss function, it is important to compare the output
values with the original input, not with the corrupted input. That way,
the network cannot succeed by learning the identity function and is
forced to extract meaningful features instead.
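A minimal sketch of this training step in PyTorch (the layer sizes, the
30% corruption rate, and all variable names are illustrative assumptions,
not values fixed by these notes):

    import torch
    import torch.nn as nn

    encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
    decoder = nn.Sequential(nn.Linear(128, 784), nn.Sigmoid())
    model = nn.Sequential(encoder, decoder)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    def train_step(x):                             # x: batch of clean inputs
        mask = (torch.rand_like(x) > 0.3).float()  # zero out ~30% of inputs
        x_corrupted = x * mask
        x_reconstructed = model(x_corrupted)
        loss = loss_fn(x_reconstructed, x)         # compare with the ORIGINAL input
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Note that the loss on the last line of train_step is computed against the
clean x, exactly as the paragraph above requires.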
Advantages-
Because the zeroed-out values can only be recovered from the remaining
inputs, the network is forced to learn relationships between the input
features rather than the identity function, which yields a more robust
representation.
SPARSE AUTOENCODER:
Sparse autoencoders have more hidden nodes than input nodes, yet they can
still discover important features in the data.
In the usual visualization of a generic sparse autoencoder, the shading
of a node corresponds to its level of activation.
A sparsity constraint is introduced on the hidden layer to prevent the
output layer from simply copying the input data.
Sparsity may be obtained by additional terms in the loss function during
the training process, either by comparing the probability distribution of
the hidden unit activations with some low desired value, or by manually
zeroing all but the strongest hidden unit activations.
Some of the most powerful AIs of the 2010s involved sparse autoencoders
stacked inside deep neural networks.
The structure of an SAE, and what makes it different from an
undercomplete AE:
By sparsity, we mean that fewer neurons are allowed to be active at the
same time, creating an information bottleneck similar to that of an
undercomplete AE.
Advantages-
Sparse autoencoders apply a sparsity penalty on the hidden layer in
addition to the reconstruction error, keeping the hidden activations
close to zero but not exactly zero. This helps prevent overfitting.
They keep the highest activation values in the hidden layer and zero out
the rest of the hidden nodes. This prevents the autoencoder from using
all of the hidden nodes at once and forces only a reduced number of
hidden nodes to be used, as in the sketch below.
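A minimal sketch of the penalty term described above, using an L1 penalty
on the hidden activations (a KL-divergence penalty toward a low target
activation probability is the common alternative). The layer sizes and
the value of sparsity_weight are assumptions for illustration:

    import torch
    import torch.nn as nn

    encoder = nn.Linear(784, 1024)   # overcomplete: more hidden nodes than inputs
    decoder = nn.Linear(1024, 784)
    sparsity_weight = 1e-4           # strength of the sparsity penalty

    def sparse_loss(x):
        h = torch.relu(encoder(x))                  # hidden activations
        x_hat = torch.sigmoid(decoder(h))
        reconstruction = ((x_hat - x) ** 2).mean()  # reconstruction error
        sparsity = h.abs().mean()                   # pushes activations toward zero
        return reconstruction + sparsity_weight * sparsity

The returned loss is backpropagated as usual; the L1 term drives most
hidden activations to zero, so only a few nodes stay active per input.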
Drawbacks-
The desired sparsity level and the weight of the penalty term are
additional hyperparameters that have to be tuned.
CONTRACTIVE AUTOENCODER:
The objective of a contractive autoencoder is to learn a robust
representation that is less sensitive to small variations in the data.
This robustness is obtained by adding a penalty term to the loss
function.
The contractive autoencoder is another regularization technique, just
like sparse and denoising autoencoders. However, its regularizer
corresponds to the Frobenius norm of the Jacobian matrix of the encoder
activations with respect to the input.
The Frobenius norm of the Jacobian of the hidden layer is calculated with
respect to the input; the penalty is the sum of the squares of all
elements of this matrix (the squared Frobenius norm). A sketch of this
penalty follows the reference below.
https://www.geeksforgeeks.org/contractive-autoencoder-cae/
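A minimal sketch of this penalty in PyTorch, assuming a single sigmoid
hidden layer h = sigmoid(Wx + b). For that case the squared Frobenius
norm of the Jacobian dh/dx has the well-known closed form
sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2. The sizes and the weight lam are
illustrative assumptions:

    import torch
    import torch.nn as nn

    W = nn.Parameter(torch.randn(128, 784) * 0.01)
    b = nn.Parameter(torch.zeros(128))
    decoder = nn.Linear(128, 784)
    lam = 1e-4                                # weight of the contractive penalty

    def contractive_loss(x):                  # x: (batch, 784)
        h = torch.sigmoid(x @ W.T + b)        # hidden activations, (batch, 128)
        x_hat = decoder(h)
        reconstruction = ((x_hat - x) ** 2).mean()
        dh = h * (1 - h)                      # sigmoid derivative, (batch, 128)
        w_sq = (W ** 2).sum(dim=1)            # sum of squared weights per hidden unit
        jac_fro = ((dh ** 2) * w_sq).sum(dim=1).mean()  # squared Frobenius norm
        return reconstruction + lam * jac_fro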
Advantages-
The learned representation is explicitly trained to be insensitive to
small perturbations of the input, which makes contractive autoencoders
well suited for robust feature extraction.
Drawbacks-
The penalty requires computing the Jacobian of the hidden activations
with respect to the input for every training example, which makes each
training step more expensive.
Denoising Autoencoders (from Srihari's Deep Learning slides)
• A denoising autoencoder (DAE) receives a corrupted data point as input
and is trained to predict the original, uncorrupted data point as its
output.
• Traditional autoencoders minimize L(x, g(f(x))), where L is a loss
function penalizing g(f(x)) for being dissimilar from x, such as the
squared L2 norm of the difference (mean squared error).
• A DAE instead minimizes L(x, g(f(x̃))), where x̃ is a copy of x that has
been corrupted by some form of noise.
• The autoencoder must undo this corruption rather than simply copy its
input.
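Written out compactly, the two objectives above are (a restatement in
standard textbook notation, where C denotes the corruption process):

    \min_{f,g}\; L\bigl(x,\, g(f(x))\bigr), \qquad
        L(x,\hat{x}) = \lVert x - \hat{x}\rVert_2^2
        \quad \text{(traditional autoencoder)}

    \min_{f,g}\; \mathbb{E}_{\tilde{x}\sim C(\tilde{x}\mid x)}\,
        L\bigl(x,\, g(f(\tilde{x}))\bigr)
        \quad \text{(denoising autoencoder)}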
• A DAE learns a vector field: the reconstruction g(f(x̃)) − x̃ points
from corrupted points back toward the data manifold.