DL Unit 5
5 Autoencoders and Generative Models
Contents
5.1 Autoencoders
5.2 Regularized Autoencoders
5.3 Stochastic Encoders and Decoders
5.4 Deep Generative Models
5.5 Generative Adversarial Networks
5.6 Variational Autoencoders
5.7 Two Marks Questions with Answers
5.1 Autoencoders
• Autoencoders play a fundamental role in unsupervised learning and in deep architectures for transfer learning and other tasks. An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs.
• An autoencoder is a special type of neural network that is trained to copy its input to its output. Internally, it has a hidden layer h that describes a code used to represent the input. The network may be viewed as consisting of two parts : an encoder function h = f(x) and a decoder that produces a reconstruction r = g(h).
• For example, given an image of a handwritten digit, an autoencoder first encodes the image into a lower dimensional latent representation, then decodes the latent representation back to an image.
• An autoencoder learns to compress the data while minimizing the reconstruction error.
• An autoencoder is an unsupervised learning technique. It is an artificial neural network used to learn data encodings of unlabeled data, i.e. the task of representation learning.
Properties of autoencoder :
1. Data-specific : Autoencoders are only able to meaningfully compress data similar to what they have been trained on. Since they learn features specific to the given training data, they are different from a standard data compression algorithm like gzip.
2. Lossy : The output of the autoencoder will not be exactly the same as the input; it will be a close but degraded representation.
3. Unsupervised : Autoencoders are considered an unsupervised learning technique since they don't need explicit labels to train on.
5.1.1 Architecture of Autoencoder
• Fig. 5.1.1 shows architecture of autoencoder.
• Autoencoders are a specific type of feedforward neural network trained to copy its input to its output. A bottleneck is imposed in the network to represent a compressed knowledge of the original input. The input is compressed into a lower-dimensional latent-space representation, and the output is then reconstructed from this compressed representation.
• The code is also called a compact "summary" or "compression" of the input.
Fig. 5.1.1 Architecture of autoencoder (original input → encoder → code (compressed representation) → decoder → reconstructed input)
• An autoencoder consists of three components : encoder, code and decoder. The encoder compresses the input and produces the code; the decoder then reconstructs the input using only this code.
• Autoencoders encode the input values x using a function f. They then decode the encoded values f(x), using a function g, to create output values identical to the input values.
• As autoencoder is a special case of feedforward networks, training techniques similar to feedforward neural networks, such as minibatch gradient descent following gradients computed by back-propagation, can be used for training.
• Both the encoder and decoder are fully-connected feedforward neural networks. The code is a single layer of an ANN with a dimensionality of user choice. The number of nodes in the code layer is a hyperparameter that we set before training the autoencoder.
• An autoencoder learns to copy its inputs to its outputs under some constraints : for example, limiting the dimensionality of the latent feature space or adding noise to the inputs.
• How does an autoencoder work ? We take the input and encode it to identify a latent feature representation. We decode the latent feature representation to recreate the input. We calculate the loss by comparing the input and output. To reduce the reconstruction error we back propagate and update the weights. Weights are updated based on how much they are responsible for the error.
" There are four hyperparametersthat must be set before training the autoencoders. They
are as follows :
1. Code size : It is the number of nodes in the middle-layer. Smaller the size more is the
compression.
2. Number of layers: The autoencoder can be as deep as we like without considering
the input and output.
3. Number of nodes per layer: The number of nodes per layer decreases with each
subsequent layer of the encoder and increases back in the decoder. Also the layer
structure of decoder is symmetric to the encoder.
4. Loss function : Mean squared error or binary cross-entropy can be
used as loss
function. Cross-entropy is used if the input values are in the range [0,,1] else mean
squared error is used.
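• As a concrete illustration of these hyperparameters, the sketch below builds a small fully-connected autoencoder in PyTorch (the framework, the 784-dimensional input and all layer sizes are assumptions made for this example, not prescribed by the text) and runs a single training step.

import torch
import torch.nn as nn

code_size = 32        # 1. Code size : number of nodes in the middle layer
hidden = [256, 64]    # 2./3. Number of layers and nodes per layer on the encoder side

encoder = nn.Sequential(
    nn.Linear(784, hidden[0]), nn.ReLU(),
    nn.Linear(hidden[0], hidden[1]), nn.ReLU(),
    nn.Linear(hidden[1], code_size),
)
# The decoder mirrors the encoder (symmetric layer structure).
decoder = nn.Sequential(
    nn.Linear(code_size, hidden[1]), nn.ReLU(),
    nn.Linear(hidden[1], hidden[0]), nn.ReLU(),
    nn.Linear(hidden[0], 784), nn.Sigmoid(),   # inputs assumed scaled to [0, 1]
)

# 4. Loss function : binary cross-entropy because inputs are in [0, 1]; use nn.MSELoss() otherwise.
loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(16, 784)            # dummy minibatch standing in for real data
optimizer.zero_grad()
code = encoder(x)                  # h = f(x)
reconstruction = decoder(code)     # r = g(h)
loss = loss_fn(reconstruction, x)  # reconstruction error L(x, g(f(x)))
loss.backward()                    # back propagate
optimizer.step()                   # update the weights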
" The different types of autoencoders are as
follows :
1. Undercomplete autoencoders 2. Sparse
autoencoders
3. Contractive autoencoders
4. Denoising autoencoders
5. Variational autoencoders.
• The primary use for autoencoders like these is generating a latent space or bottleneck, which forms a compressed substitute of the input data and can be easily decompressed back with the help of the network when needed.
" Fig. 5.1.2shows undercomplete autoencoders. Output layer
Input layer Hidden layers
Decoder
autoencoders
Fig. 5.1.2 Undercomplete
• An undercomplete autoencoder has no explicit regularization term - we simply train our model according to the reconstruction loss. Thus, our only way to ensure that the model is not memorizing the input data is to sufficiently restrict the size of the code (bottleneck).
• Learning an undercomplete representation forces the autoencoder to capture the most salient features of the training data. The learning process is described simply as minimizing a loss function L(x, g(f(x))), where L is a loss function penalizing g(f(x)) for being dissimilar from x, such as the mean squared error.
• When the decoder is linear and L is the mean squared error, an undercomplete autoencoder learns to span the same subspace as Principal Component Analysis (PCA). In this case, an autoencoder trained to perform the copying task has learned the principal subspace of the training data as a side-effect.
• When we think of dimensionality reduction, we tend to think of methods like PCA that form a lower-dimensional hyperplane to represent data in a higher-dimensional form without losing information.
• However, PCA can only build linear relationships. As a result, it is put at a disadvantage compared with methods like undercomplete autoencoders that can learn non-linear relationships and therefore perform better in dimensionality reduction.
• This form of nonlinear dimensionality reduction, where the autoencoder learns a non-linear manifold, is also termed manifold learning.
• Effectively, if we remove all non-linear activations from an undercomplete autoencoder and use only linear layers, we reduce the undercomplete autoencoder to something that works on an equal footing with PCA.
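• The short PyTorch sketch below illustrates this equivalence (the framework, data dimensions and training settings are assumptions for illustration) : a purely linear undercomplete autoencoder trained with mean squared error recovers, approximately, the same principal subspace that PCA would find.

import torch
import torch.nn as nn

# Linear (no activation) undercomplete autoencoder : 20-dimensional data, 5-dimensional code.
encoder = nn.Linear(20, 5, bias=False)
decoder = nn.Linear(5, 20, bias=False)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)
loss_fn = nn.MSELoss()

X = torch.randn(512, 20)                     # toy data; in practice, centre the data first
for _ in range(2000):
    opt.zero_grad()
    loss = loss_fn(decoder(encoder(X)), X)   # plain reconstruction loss
    loss.backward()
    opt.step()

# After training, decoder(encoder(X)) lies (approximately) in the same 5-dimensional
# principal subspace that PCA with 5 components would identify.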
• The loss function used to train an undercomplete autoencoder is called reconstruction loss, as it is a check of how well the image has been reconstructed from the input.
Disadvantages :
1. For it to work, it is essential that the individual nodes of a trained model which activate are data dependent, and that different inputs result in activations of different nodes through the network.
5.2.2 Denoising Autoencoders
• Denoising Autoencoders (DAE) are a stochastic version of standard autoencoders that reduces the risk of learning the identity function. Denoising autoencoders attempt to get around this risk of identity-function affiliation by introducing noise, i.e. randomly corrupting the input so that the autoencoder must then "denoise" or reconstruct the original input.
• Denoising autoencoder is a type of autoencoder, which is a type of neural network used for unsupervised learning. Denoising refers to intentionally adding noise to the raw input before providing it to the network. Denoising can be achieved using stochastic mapping.
• Keeping the code layer small forced our autoencoder to learn an intelligent representation of the data. There is another way to force the autoencoder to learn useful features, which is adding random noise to its inputs and making it recover the original noise-free data. This way the autoencoder can't simply copy the input to its output because the input also contains random noise. We are asking it to subtract the noise and produce the underlying meaningful data. This is called a denoising autoencoder.
• Denoising autoencoders are a stochastic version of standard autoencoders that reduces the risk of learning the identity function.
• In general, the more hidden layers in an autoencoder, the more refined this dimensional reduction can be. However, if an autoencoder has more hidden layers than inputs, there is a risk that the algorithm only learns the identity function during training, the point where the output simply equals the input and it then becomes useless.
• Fig. 5.2.2 shows denoising autoencoder.
Fig. 5.2.2 Denoising autoencoder (encoder → decoder)
• Denoising autoencoders attempt to get around this risk of identity-function affiliation by introducing noise, i.e. randomly corrupting the input so that the autoencoder must then "denoise" or reconstruct the original input.
• The denoising autoencoder gets rid of noise by learning a representation of the input where the noise can be filtered out easily.
• While removing noise directly from the image seems difficult, the autoencoder performs this by mapping the input data into a lower-dimensional manifold (like in undercomplete autoencoders), where filtering of noise becomes much easier.
• Essentially, denoising autoencoders work with the help of non-linear dimensionality reduction. The loss function generally used in these types of networks is L1 or L2 loss.
• Denoising autoencoder helps :
1. The hidden layers of the autoencoder learn more robust filters
2. Reduce the risk of overfitting in the autoencoder
3. Prevent the autoencoder from learning a simple identity function.
• The DAE is trained using a supervised learning algorithm and can be used for a variety of tasks, such as image and speech denoising, anomaly detection and data compression.
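• A minimal sketch of this idea in PyTorch (the framework, layer sizes and noise level are assumptions for illustration) : Gaussian noise is added to the input, but the reconstruction loss is computed against the original clean input, so the network cannot get away with learning the identity function.

import torch
import torch.nn as nn

# Small dense denoising autoencoder for 784-dimensional inputs (illustrative sizes).
model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 32), nn.ReLU(),      # bottleneck
    nn.Linear(32, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid(),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                   # the text mentions L1 or L2 loss; L2 is shown here

clean = torch.rand(16, 784)                       # dummy clean batch
noisy = clean + 0.2 * torch.randn_like(clean)     # randomly corrupt the input
noisy = noisy.clamp(0.0, 1.0)

opt.zero_grad()
reconstruction = model(noisy)            # the network sees only the corrupted input
loss = loss_fn(reconstruction, clean)    # ...but is penalised against the clean input
loss.backward()
opt.step()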
5.3 Stochastic Encoders and Decoders
• Autoencoders are feedforward networks and use the same loss functions and output units that are used in traditional feedforward networks. For designing the output units and the loss function of a feedforward network, an output distribution p(y | x) is defined and the negative log-likelihood - log p(y | x) is minimized, where y is a vector of targets, e.g. class labels.
• But in an autoencoder, the target as well as the input is x, and still the same strategy can be applied. So by using the same strategy as in a feedforward network, we can assume that for a given code h, the decoder is providing a conditional distribution p_decoder(x | h).
• The autoencoder can then be trained by minimizing - log p_decoder(x | h), where the exact form of the loss function depends on the form of p_decoder.
• Similar to traditional feedforward networks, linear output units are used to parameterize the mean of a Gaussian distribution for real valued x. The negative log-likelihood yields a mean squared error criterion in this case. Binary x values correspond to a Bernoulli distribution whose parameters are given by a sigmoid output unit, discrete x values correspond to a softmax distribution, and so on.
• Given h, the output variables are treated as conditionally independent, so that evaluation of the probability distribution is inexpensive. For modeling outputs with correlations, mixture density outputs can be used.
• In a stochastic autoencoder, the encoder and decoder are not simple functions but instead involve some noise injection, meaning that their output can be seen as sampled from a distribution : p_encoder(h | x) for the encoder and p_decoder(x | h) for the decoder.
• Any latent variable model p_model(h, x) defines a stochastic encoder p_encoder(h | x) = p_model(h | x) and a stochastic decoder p_decoder(x | h) = p_model(x | h).
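• As a small sketch of how the form of p_decoder determines the training loss (shown in PyTorch, which is an assumption of this example) : with a Gaussian decoder for real-valued x the negative log-likelihood reduces to mean squared error, while with a Bernoulli decoder parameterized by a sigmoid unit it reduces to binary cross-entropy.

import torch
import torch.nn as nn

decoder_outputs = torch.randn(16, 784)                 # raw decoder outputs for a dummy batch
x_real = torch.randn(16, 784)                          # real-valued targets
x_binary = torch.randint(0, 2, (16, 784)).float()      # binary targets

# Gaussian p_decoder(x | h) with fixed variance : -log p reduces to mean squared error.
gaussian_nll = nn.MSELoss()(decoder_outputs, x_real)

# Bernoulli p_decoder(x | h) parameterized by a sigmoid unit : -log p is binary cross-entropy.
bernoulli_nll = nn.BCEWithLogitsLoss()(decoder_outputs, x_binary)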
5.4 Deep Generative Models
• Generative models learn the distribution of the training data and help in generating new data points from the learned distribution by sampling those distributions. In most cases, a Gaussian distribution of the data is assumed.
• In short, a deep generative model is an unsupervised learning technique that learns the distribution of the training data while optimizing the loss function of the model network.
• There are several variants of the deep generative models and most of them are used to perform a dual function, viz. abstraction and generation. These models are also classified depending on whether the network is learning an explicit or an implicit probability distribution.
• Deep Generative Models (DGM) are neural networks with many hidden layers trained to approximate complicated, high-dimensional probability distributions using a large number of samples. When trained successfully, we can use the DGMs to estimate the likelihood of each observation and to create new samples from the underlying distribution.
• Deep generative models are Variational AutoEncoder (VAE) and Generative Adversarial Nets (GANs).
• VAEs are generative models consisting of an encoder and a decoder. VAEs can perform both abstraction and generation and assume a Gaussian distribution for the data. Most commonly, a trained VAE can be used for generating new data samples from the learnt distribution space.
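• The sketch below (PyTorch; the decoder architecture and latent size are assumptions made for illustration, not taken from the text) shows the generation step : latent codes are sampled from the assumed Gaussian distribution and decoded into new data samples.

import torch
import torch.nn as nn

latent_dim = 16
# A toy decoder standing in for the decoder of a trained VAE.
decoder = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid(),
)

# Generation : draw z from the standard Gaussian prior and decode it.
z = torch.randn(8, latent_dim)          # 8 latent codes sampled from N(0, I)
new_samples = decoder(z)                # 8 new data points from the learnt distribution space

# During training, the encoder outputs a mean and variance for each input, and the code is
# sampled as z = mu + sigma * eps with eps ~ N(0, I) (the "reparameterization trick").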
• GANs are generative models consisting of a generator and a discriminator. GANs are trained using an adversarial learning technique, where the generator and the discriminator compete with each other.
• During this process of training, the generator improves itself and tries to generate more realistic samples which the discriminator cannot distinguish from the real dataset.
5.4.1 Boltzmann Machine
• Boltzmann Machine is a kind of recurrent neural network where the nodes make binary decisions and are present with certain biases. Several Boltzmann machines can be combined to build deeper architectures such as Deep Belief Networks (DBN).
Fig. 5.4.1 Architecture of Boltzmann machine (visible nodes and hidden nodes)
(Figure : Deep belief network - input vectors at the bottom layer, stacked RBM layers, LR classifier at the top layer)
Fig. 5.5.1 Basic architecture of GAN (the discriminator classifies samples as real or fake, and the result is used to update the networks)
" The veneratoy ereates an image from arandom seed Ihe discriminator evalates he
imagc hased on its training to see if it can telH reai trom fake. The result gres hack to fhe
generator anddiscriminator so that they improve.
The genertor model takes a fixed-length random vector as input and
in the domain.
generates a 14nole
.The vector is dravwn from randomly from aGaussian distribution, and the
vector is sed
to seed the generative process. After training, points in this
multidimensional vector
space will correspond to points in the problem domain. forming a cornpressed
representation of the data distribution.
" This vector space is refered to as a latent space, ot a vector space comprised of latent
variables. Latent variables or hidden variables are those variables that are important for
adomain but are not directly observable.
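• A compressed sketch of this adversarial loop in PyTorch (the networks, layer sizes and loss choices are illustrative assumptions) : the generator maps a fixed-length Gaussian vector to a sample, the discriminator is trained to tell real from fake, and the generator is trained to fool it.

import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(32, data_dim)                 # dummy batch standing in for real data
z = torch.randn(32, latent_dim)                  # fixed-length vectors drawn from a Gaussian
fake = G(z)

# Discriminator step : real samples labelled 1, generated samples labelled 0.
opt_d.zero_grad()
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
d_loss.backward()
opt_d.step()

# Generator step : try to make the discriminator label the fakes as real.
opt_g.zero_grad()
g_loss = bce(D(fake), torch.ones(32, 1))
g_loss.backward()
opt_g.step()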
4. CycleGAN
• This GAN is designed for mapping one image to another image, i.e. image-to-image translation.
• For instance, if summer and winter images are subjected to the image-to-image translation process, we discover a mapping function that could transform summer images into winter images and vice versa, by adding or removing features in accordance with the mapping function, such that the predicted output and actual output have the least amount of loss.
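• A sketch of the cycle-consistency idea behind such a mapping (PyTorch; the two mapping networks and their sizes are assumptions for illustration) : translating an image to the other domain and back should reproduce the original image, and the L1 difference between the two is added to the training loss.

import torch
import torch.nn as nn

# Toy mapping networks : G translates summer -> winter, F translates winter -> summer.
G = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 784))
F = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 784))
l1 = nn.L1Loss()

summer = torch.rand(8, 784)    # dummy batches standing in for images of each domain
winter = torch.rand(8, 784)

# Cycle-consistency : summer -> winter -> summer (and the reverse) should return the input.
cycle_loss = l1(F(G(summer)), summer) + l1(G(F(winter)), winter)
# In a full CycleGAN this term is added to the adversarial losses of both mappings.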
5.7 Two Marks Questions with Answers
Q. What is the aim of an autoencoder ?
Ans. : An autoencoder aims to learn a lower-dimensional representation (encoding) for higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input image.
Q. What is regularization in autoencoder ?
Ans. : Regularized autoencoders use a loss function that encourages the model to have other properties besides copying its input to its output.
Q. Is autoencoder supervised or unsupervised ?
Ans. : An autoencoder is a neural network model that seeks to learn a compressed representation of the input. They are an unsupervised learning method, although technically, they are trained using supervised learning methods, referred to as self-supervised.
Q.5 Why do we use autoencoder ?
Ans. : An autoencoder aims to learn a lower-dimensional representation (encoding) for higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input image.
Q.6 What is a deep belief network used for ?
Ans. : Deep Belief Networks (DBNs) have been used to address the problems associated with classic neural networks, such as slow learning, becoming stuck in local minima owing to poor parameter selection, and requiring many training datasets.
Q.7 Is the deep belief network supervised or unsupervised ?
Ans. : Deep Belief Networks (DBN) is an unsupervised learning algorithm consisting of two different types of neural networks - Belief Networks and Restricted Boltzmann Machines. In contrast to perceptron and backpropagation neural networks, DBN is also a multi-layer belief network.