
Chapter 6

Autoencoders
6.1. Introduction to Autoencoders
• Autoencoders are an unsupervised learning technique in which we
leverage neural networks for the task of representation learning.
• We design a neural network architecture such that we impose a
bottleneck in the network which forces a compressed knowledge
representation of the original input.
• If the input features were each independent of one another, this
compression and subsequent reconstruction would be a very
difficult task.
• However, if some sort of structure exists in the data (i.e.
correlations between input features), this structure can be learned
and consequently leveraged when forcing the input through the
network's bottleneck.
6.1. Introduction to Autoencoders…
• Autoencoders are artificial neural networks capable of learning
dense representations of the input data, called latent
representations or codings, without any supervision (i.e., the
training set is unlabeled).
• These codings typically have a much lower dimensionality than
the input data, making autoencoders useful for dimensionality
reduction, especially for visualization purposes.
• An autoencoder first takes the input and compresses it into a
low-dimensional vector.
• This part of the network is called the encoder because it is
responsible for producing the low-dimensional embedding or
code.
6.1. Introduction to Autoencoders…
• The second part of the network tries to invert the computation of
the first half of the network and reconstruct the original input.
• This piece is known as the decoder.
• The overall architecture is illustrated in figure below.
• The autoencoder architecture attempts to compress a
high-dimensional input into a low-dimensional embedding.
• It then uses that low-dimensional embedding to reconstruct the
input.
6.1. Introduction to Autoencoders…
• An autoencoder consists of three components:
• i. Encoder: An encoder is a feedforward, fully connected neural
network that compresses the input into a latent space representation
• It encodes the input image as a compressed representation in a
reduced dimension.
• The compressed representation is a distorted, lossy version of the original image.
• ii. Code: This part of the network contains the reduced representation
of the input that is fed into the decoder.
• It is also called bottleneck.
• iii. Decoder: Decoder is also a feedforward network like the encoder
and has a similar structure to the encoder.
• The decoder layer decodes the encoded image back to the original
dimension.
• It is responsible for reconstructing the input back to the original
dimensions from the code.
• The decoded image is reconstructed from the latent space representation
and is a lossy reconstruction of the original image.
6.1. Introduction to Autoencoders…
• First, the input goes through the encoder where it is compressed and
stored in the layer called Code, then the decoder decompresses the
original input from the code.
• The main objective of the autoencoder is to get an output identical to
the input.
• Note that the decoder architecture is the mirror image of the encoder.
• This is not a requirement but it’s typically the case.
• The only requirement is the dimensionality of the input and output
must be the same.
6.1. Introduction to Autoencoders…
• The architecture as a whole looks something like this:
6.1. Introduction to Autoencoders…
• An autoencoder will encode the input distribution into a
low-dimensional tensor, which usually takes the form of a vector.
• This will approximate the hidden structure that is commonly
referred to as the latent representation, code, or vector.
• This process constitutes the encoding part.
• The latent vector will then be decoded by the decoder part to
recover the original input.
• As a result of the latent vector being a low-dimensional
compressed representation of the input distribution, it should be
expected that the output recovered by the decoder can only
approximate the input.
• The dissimilarity between the input and the output can be
measured by a loss function.
6.1. Introduction to Autoencoders…

Figure autoencoder
• Autoencoders are mainly a dimensionality reduction (or
compression) algorithm with a couple of important properties:
• Data-specific: Autoencoders are only able to meaningfully
compress data similar to what they have been trained on.
• Since they learn features specific to the given training data, they
are different from a standard data compression algorithm like gzip.
• So, we can’t expect an autoencoder trained on handwritten digits
to compress landscape photos.
• Lossy: The output of the autoencoder will not be exactly the same
as the input; it will be a close but degraded representation.
• If you want lossless compression, autoencoders are not the way to
go.
• Unsupervised: To train an autoencoder we don’t need to label the
data; we simply feed it the raw input data.
• Autoencoders are considered an unsupervised learning technique
since they don’t need explicit labels to train on.
• But to be more precise they are self-supervised because they
generate their own labels from the training data.
6.1. Introduction to Autoencoders…
Stacked Autoencoders
• Just like other neural networks, autoencoders can have multiple
hidden layers.
• Such autoencoders are called stacked autoencoders (or deep
autoencoders).
• Adding more layers helps the autoencoder learn more complex
codings.
• That said, one must be careful not to make the autoencoder too deep.
• Imagine an encoder so powerful that it just learns to map each input
to a single arbitrary number (and the decoder learns the reverse
mapping).
• Obviously such an autoencoder will reconstruct the training data
perfectly, but it will not have learned any useful data representation
in the process and it is unlikely to generalize well to new instances.
• The architecture of a stacked autoencoder is typically symmetrical
with regard to the central hidden layer (the coding layer).
• To put it simply, it looks like a sandwich.
• For example, an autoencoder for MNIST may have 784 inputs (28
x 28), followed by a hidden layer with 100 neurons, then a central
hidden layer of 30 neurons, then another hidden layer with 100
neurons, and an output layer with 784 neurons (a code sketch of this
architecture follows the figure below).

Figure Stacked autoencoder
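• The following is a minimal PyTorch sketch of the 784-100-30-100-784 stacked autoencoder described above; the layer sizes come from the MNIST example, while the activations, optimizer, and the random stand-in batch are illustrative assumptions rather than something prescribed by the slides.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Encoder: 784 -> 100 -> 30 (the 30-dimensional code is the bottleneck).
encoder = nn.Sequential(
    nn.Linear(784, 100), nn.ReLU(),
    nn.Linear(100, 30), nn.ReLU(),
)
# Decoder mirrors the encoder: 30 -> 100 -> 784, outputs in [0, 1].
decoder = nn.Sequential(
    nn.Linear(30, 100), nn.ReLU(),
    nn.Linear(100, 784), nn.Sigmoid(),
)
autoencoder = nn.Sequential(encoder, decoder)

# One training step on a dummy batch of flattened 28x28 images.
x = torch.rand(64, 784)                    # stand-in for real MNIST pixels in [0, 1]
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss = F.mse_loss(autoencoder(x), x)       # the target is the input itself
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

• In practice the dummy batch would be replaced by real flattened MNIST images scaled to [0, 1].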


6.2. Examples of Autoencoders
• There are many types of autoencoders.
• The most popular are:
– Undercomplete autoencoders
– Convolutional autoencoders
– Sparse autoencoders
– Contractive autoencoders
– Denoising autoencoders
– Variational Autoencoders
6.2. Examples of Autoencoders…
i. Undercomplete Autoencoder
• An undercomplete autoencoder takes in an image and tries to predict
the same image as output, thus reconstructing the image from the
compressed bottleneck region.
• Undercomplete autoencoders are truly unsupervised as they do not
take any form of label, the target being the same as the input.
• The objective of an undercomplete autoencoder is to capture the
most important features present in the data.
• Undercomplete autoencoders have a smaller dimension for the hidden
layer than for the input layer.
• This helps to obtain important features from the data.
• It minimizes the loss function by penalizing the output y for being
different from the input x.
6.2. Examples of Autoencoders…
• Undercomplete autoencoders do not need any regularization as
they maximize the probability of data rather than copying the
input to the output.

• The primary use of such autoencoders is the generation of the
latent space or the bottleneck, which forms a compressed
substitute of the input data and can be easily decompressed back
with the help of the network when needed.
6.2. Examples of Autoencoders…
ii. Sparse Autoencoders
• Sparse autoencoders are controlled by changing the number of
nodes at each hidden layer.
• Since it is impossible to design a neural network with a flexible
number of nodes at its hidden layers, sparse autoencoders work by
penalizing the activation of some neurons in hidden layers.
• It means that a penalty directly proportional to the number of
neurons activated is applied to the loss function.
• As a means of regularizing the neural network, the sparsity
penalty prevents too many neurons from being activated at once.
• The sparse autoencoder training criterion therefore involves a sparsity penalty.
• In most cases, we would construct our loss function by penalizing
activations of hidden layers so that only a few nodes are
encouraged to activate when an input is fed into the network.
• A sparse autoencoder is a type of autoencoder that is trained to
produce sparse representations of its inputs.
• Sparse representations are representations in which most of the
activation values are zero.
• This is achieved by adding a sparsity penalty term to the
autoencoder's loss function that encourages the encoder to
produce sparser latent representations (a sketch of such a penalty
follows the figure below).

Figure sparse autoencoder
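• As a concrete illustration of the sparsity penalty described above, the sketch below adds an L1 penalty on the code activations to the reconstruction loss; the layer sizes and the penalty weight lam are assumptions, and a KL-based sparsity penalty is an equally common alternative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
decoder = nn.Sequential(nn.Linear(256, 784), nn.Sigmoid())

def sparse_ae_loss(x, lam=1e-3):
    code = encoder(x)                      # hidden activations (the code)
    recon = decoder(code)
    reconstruction = F.mse_loss(recon, x)
    sparsity = code.abs().mean()           # L1 activity penalty: pushes activations toward 0
    return reconstruction + lam * sparsity

x = torch.rand(64, 784)                    # dummy batch of flattened images
loss = sparse_ae_loss(x)
loss.backward()
```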


iii. Convolutional Autoencoders
• Autoencoders in their traditional formulation do not take into
account the fact that a signal can be seen as a sum of other signals.
• Convolutional Autoencoders use the convolution operator to
exploit this observation.
• CNNs are particularly well-suited for learning spatial
representations of data, such as images and videos.
• They learn to encode the input as a set of simple signals and then
try to reconstruct the input from them, modifying the geometry or the
reflectance of the image.
• They are state-of-the-art tools for unsupervised learning of
convolutional filters.
• Once these filters have been learned, they can be applied to any
input in order to extract features.
• These features, then, can be used to do any task that requires a
compact representation of the input, like classification.
• Advantages:
– Due to their convolutional nature, they scale well to
realistic-sized, high-dimensional images.
– They can remove noise from pictures or reconstruct missing parts.
• Drawbacks:
– The reconstruction of the input image is often blurry and of
lower quality due to compression, during which information is
lost.
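• A minimal sketch of a convolutional autoencoder for 1 x 28 x 28 images follows; the channel counts and the strided/transposed convolutions are illustrative choices, not an architecture prescribed by the slides.

```python
import torch
import torch.nn as nn

# Encoder downsamples with strided convolutions: 28x28 -> 14x14 -> 7x7.
conv_encoder = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
)
# Decoder mirrors it with transposed convolutions: 7x7 -> 14x14 -> 28x28.
conv_decoder = nn.Sequential(
    nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2,
                       padding=1, output_padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2,
                       padding=1, output_padding=1), nn.Sigmoid(),
)

x = torch.rand(8, 1, 28, 28)               # dummy batch of grayscale images
recon = conv_decoder(conv_encoder(x))
assert recon.shape == x.shape              # reconstruction matches the input size
```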
6.2. Examples of Autoencoders…
iv. Denoising autoencoder
• The aim of denoising autoencoders is to remove noise from an
image.
• Denoising autoencoders create a noisy copy of the input image by
introducing some noise.
• This helps the autoencoders to avoid copying the input to the
output without learning features about the data.
• These autoencoders take a partially corrupted input while training
to recover the original undistorted input.
• The model learns a vector field for mapping the input data
towards a lower dimensional manifold which describes the natural
data to cancel out the added noise.
• Another example where denoising autoencoders could be used is
to remove the watermarks from an image.
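• The sketch below illustrates the training setup just described: noise is added to the input, but the reconstruction target is the clean original; the Gaussian noise level and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),        # encoder
    nn.Linear(128, 784), nn.Sigmoid(),     # decoder
)

x_clean = torch.rand(64, 784)                                      # clean images (stand-in)
x_noisy = (x_clean + 0.3 * torch.randn_like(x_clean)).clamp(0, 1)  # corrupted copy

recon = model(x_noisy)                     # the network sees the noisy version...
loss = F.mse_loss(recon, x_clean)          # ...but must reconstruct the clean original
loss.backward()
```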
6.2. Examples of Autoencoders…
v. Variational Autoencoders
• Variational autoencoders (VAEs) are models that address a
specific problem with standard autoencoders.
• When you train an autoencoder, it learns to represent the input just
in a compressed form called the latent space or the bottleneck.
• However, this latent space formed after training is not necessarily
continuous and, in effect, might not be easy to interpolate.
• Variational autoencoders deal with this specific topic and express
their latent attributes as a probability distribution, forming a
continuous latent space that can be easily sampled and
interpolated.
6.2. Examples of Autoencoders…
vi. Contractive Autoencoders
• Contractive autoencoders work on the basis that similar inputs
should have similar encodings and a similar latent space
representation.
• It means that the latent space should not vary by a huge amount
for minor variations in the input.
• The goal of contractive autoencoder is to reduce the
representation’s sensitivity towards the training input data.
• It aims to learn representations that are invariant to unimportant
transformations of the given data.
• In order to achieve this, we must add a regularizer or penalty term
to the cost function that the autoencoder is trying to minimize.
• To train a model that works along with this constraint, we have to
ensure that the derivatives of the hidden layer activations are
small with respect to the input.
6.2. Examples of Autoencoders…
• A contractive autoencoder is less sensitive to slight variations in
the training dataset.
• We can achieve this by adding a penalty term or regularizer to
whatever cost or objective function the algorithm is trying to
minimize.
• The result reduces the learned representation's sensitivity towards
the training input.
• This regularizer corresponds to the Frobenius norm of the
Jacobian matrix of the encoder activations with respect to
the input.
• If this value is 0, we don't observe any change in the learned
hidden representations as we change input values.
• But if the value is very large, then the learned model is unstable as the
input values change.
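• The sketch below illustrates the contractive penalty for a one-layer sigmoid encoder, where the squared Frobenius norm of the Jacobian has a simple closed form; the layer sizes and penalty weight lam are assumptions, and other encoder architectures require a different Jacobian computation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Linear(784, 64)                   # one-layer encoder with sigmoid activation
dec = nn.Linear(64, 784)

def contractive_loss(x, lam=1e-4):
    h = torch.sigmoid(enc(x))              # encoder activations, shape (batch, 64)
    recon = torch.sigmoid(dec(h))
    reconstruction = F.mse_loss(recon, x)
    # For h = sigmoid(Wx + b):  ||dh/dx||_F^2 = sum_i (h_i(1-h_i))^2 * sum_j W_ij^2
    dh = h * (1.0 - h)                     # derivative of the sigmoid
    w_sq = (enc.weight ** 2).sum(dim=1)    # sum_j W_ij^2, shape (64,)
    jacobian_penalty = ((dh ** 2) * w_sq).sum(dim=1).mean()
    return reconstruction + lam * jacobian_penalty

loss = contractive_loss(torch.rand(32, 784))
loss.backward()
```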
6.3. Architectures of Autoencoders
• In autoencoders, both the encoder and decoder are fully-connected
feedforward neural networks.
• The code is a single layer of an ANN with the dimensionality of our choice.
• The number of nodes in the code layer (code size) is a hyperparameter
that we set before training the autoencoder.
6.3. Architectures of Autoencoders…
• This is a more detailed visualization of an autoencoder.
• First the input passes through the encoder, which is a
fully-connected ANN, to produce the code.
• The decoder, which has a similar ANN structure, then produces
the output using only the code.
• The goal is to get an output identical to the input.
• Note that the decoder architecture is the mirror image of the
encoder.
• This is not a requirement but it’s typically the case.
• The only requirement is the dimensionality of the input and output
needs to be the same.
• Anything in the middle can be played with.
• There are 4 hyperparameters that we need to set before training an
autoencoder:
• Code size: number of nodes in the middle layer. Smaller size
results in more compression.
• Number of layers: the autoencoder can be as deep as we like. In
the figure above we have 2 layers in both the encoder and
decoder, without considering the input and output.
• Number of nodes per layer: the autoencoder architecture we’re
working on is called a stacked autoencoder since the layers are
stacked one after another.
• The number of nodes per layer decreases with each subsequent
layer of the encoder, and increases back in the decoder.
• Also, the decoder is symmetric to the encoder in terms of layer
structure.
• Loss function: we either use mean squared error (mse) or binary
crossentropy. If the input values are in the range [0, 1] then we
typically use crossentropy, otherwise we use the mean squared
error.
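• A minimal sketch of this loss-function choice in PyTorch (the flag name is purely illustrative):

```python
import torch.nn as nn

inputs_scaled_to_unit_range = True         # hypothetical flag for this sketch
criterion = nn.BCELoss() if inputs_scaled_to_unit_range else nn.MSELoss()
# loss = criterion(reconstruction, original_input)
```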
6.3. Architectures of Autoencoders…

Training One Autoencoder at a Time
• Rather than training the whole stacked autoencoder in one go, it is
possible to train one shallow autoencoder at a time, and then stack
all of them into a single autoencoder.
• This is why it is called a stacked autoencoder.
• During the first phase of training, the first autoencoder learns to
reconstruct the inputs.
• Then we encode the whole training set using this first
autoencoder, and this gives us a new (compressed) training set.
• We then train a second autoencoder on this new dataset.
• This is the second phase of training.
• Finally, we build a big network using all these autoencoders.
• This gives us the final stacked autoencoder.
• We could easily train more autoencoders this way, building a very
deep stacked autoencoder (a code sketch of the two-phase procedure
follows the figure below).
6.3. Architectures of Autoencoders…

Figure Training one autoencoder at a time
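• A sketch of the two-phase procedure in PyTorch follows; the layer sizes match the earlier 784-100-30 example, while the epoch count, optimizer, and random stand-in data are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_shallow_ae(data, in_dim, code_dim, epochs=20):
    enc = nn.Sequential(nn.Linear(in_dim, code_dim), nn.ReLU())
    dec = nn.Linear(code_dim, in_dim)      # linear output so it also suits code targets
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(epochs):
        loss = F.mse_loss(dec(enc(data)), data)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return enc, dec

x = torch.rand(256, 784)                   # stand-in training set

# Phase 1: train the first shallow autoencoder on the raw inputs.
enc1, dec1 = train_shallow_ae(x, 784, 100)

# Phase 2: encode the data with the first encoder, then train a second
# shallow autoencoder on the resulting codes.
with torch.no_grad():
    codes = enc1(x)
enc2, dec2 = train_shallow_ae(codes, 100, 30)

# Finally, assemble the trained pieces into one stacked autoencoder.
stacked = nn.Sequential(enc1, enc2, dec2, dec1)
```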


6.3. Architectures of Autoencoders…
• One limitation of autoencoders is that they cannot be used to
generate more data for us.
• This is because we don't know how to create new latent vectors to
feed to the decoder; the only way is to use the encoder on input
data.
• We will now look at a modification of the autoencoder that helps to
solve this issue.
6.4. Variational Autoencoders
• Generative models in machine learning are capable of looking at a
set of data points (e.g. images), capturing some inner structure in
them and producing new data points (e.g. new images), which
bear the properties of the training data set.
• Variational Autoencoders (VAEs) belong to the family of
generative models.
• The generator of VAE is able to produce meaningful outputs
while navigating its continuous latent space.
• The possible attributes of the decoder outputs are explored
through the latent vector.
Figure Variational autoencoder, with encoder gɸ and decoder fθ; the autoencoder can be turned into an image generator by removing the encoder part and keeping only the decoder.
6.4. Variational Autoencoders…
• A classical autoencoder takes an image, maps it to a latent vector
space via an encoder module and then, decodes it back via a
decoder module.
• The latent space z produced by the encoder is sparsely populated,
meaning that it is difficult to predict the distribution of values in
the latent space.
• Values are scattered and space will appear to be well utilized in a
2D representation.
• This is a very good property for compression systems.
• However, for generating new images, this sparsity is an issue.
• Finding a latent value for which the decoder knows how to
produce a valid image is almost impossible.
6.4. Variational Autoencoders…
• Furthermore, if the space has gaps between clusters and the decoder
receives a code from one of those gaps, it will lack the knowledge to
generate something useful.
• Standard autoencoders learn to generate compact representations
and reconstruct their inputs well, but aside from a few
applications like denoising autoencoders, they are fairly limited.
• The fundamental problem with autoencoders, for generation, is
that the latent space they convert their inputs to and where their
encoded vectors lie, may not be continuous, or allow easy
interpolation.
Figure Optimizing purely for reconstruction loss
6.4. Variational Autoencoders…
• For example, training an autoencoder on the MNIST dataset, and
visualizing the encodings from a 2D latent space reveals the
formation of distinct clusters.
• This makes sense, as distinct encodings for each image type
make it far easier for the decoder to decode them.
• This is fine if you’re just replicating the same images.
• But when you are building a generative model, you don’t want to
simply replicate the same image you put in.
• We want to randomly sample from the latent space, or generate
variations of an input image from a continuous latent space.
• It is difficult to generate new images using standard autoencoders.
• This is where variational autoencoders come in to serve as
generators.
6.4. Variational Autoencoders…
• To understand how a variational autoencoder model differs from
standard autoencoder architectures, it is useful to examine the
latent space.
• The main benefit of a variational autoencoder is that it is capable
of learning smooth latent space representations of the input data.
• A standard autoencoder simply needs to learn an encoding
that allows it to reproduce the input.
• Focusing only on reconstruction loss does allow us to separate out
the classes, which should give our decoder model the ability to
reproduce the original handwritten digit.
• But there is an uneven distribution of data within the latent space.
• In other words, there are areas in latent space which don't
represent any of our observed data.
6.4. Variational Autoencoders…
• When the two terms, reconstruction loss and KL divergence, are
optimized simultaneously, the VAE is encouraged to describe the
latent state for an input with distributions close to the prior but
deviating when necessary to describe salient features of the input.
• A variational autoencoder works by making the latent space more
predictable, more continuous, and less sparse.
• By forcing latent variables to become normally distributed, VAEs
gain control over the latent space.
• To provide an example, let us suppose we have trained an
autoencoder model on a large dataset of faces with an encoding
dimension of 6.
• An ideal autoencoder will learn descriptive attributes of faces
such as skin color, beard, gender, whether or not the person is
wearing glasses, etc. in an attempt to describe an observation in
some compressed representation.
6.4. Variational Autoencoders…
• In the above example, we have described the input image in
terms of its latent attributes using a single value to describe each
attribute.
• However, we may prefer to represent each latent attribute as a
range of possible values.
• For instance, what single value would you assign for the smile
attribute if you feed in a photo of the Mona Lisa?
• Using a variational autoencoder, we can describe latent attributes
in probabilistic terms.
• With this approach, we will now represent each latent attribute for
a given input as a probability distribution.
• When decoding from the latent state, we will randomly sample
from each latent state distribution to generate a vector as input for
our decoder model.
• VAEs have one fundamentally unique property that separates
them from vanilla autoencoders.
• This property that makes them so useful for generative modeling:
– their latent spaces are continuous, allowing easy random
sampling and interpolation.
• It achieves this by making its encoder output not a single encoding
vector of size n, but two vectors of size n:
– a vector of means, μ, and
– a vector of standard deviations, σ.
• Instead of forwarding the latent values to the decoder directly,
VAEs use them to calculate a mean and a standard deviation.
• The input to the decoder is then sampled from the corresponding
normal distribution.
• During training, VAEs force this normal distribution to be as
close as possible to the standard normal distribution by including
the Kullback–Leibler divergence in the loss function.
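• A minimal sketch of such an encoder head follows; predicting the log-variance instead of σ directly is a common numerical convenience assumed here, and the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class VAEEncoder(nn.Module):
    def __init__(self, in_dim=784, hidden=256, n=20):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, n)       # vector of means, size n
        self.logvar = nn.Linear(hidden, n)   # vector of log-variances, size n

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.logvar(h)    # two vectors instead of one code

encoder = VAEEncoder()
mu, logvar = encoder(torch.rand(8, 784))     # both have shape (8, 20)
```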
6.4. Variational Autoencoders…

Figure Variational autoencoder architecture


6.4. Variational Autoencoders…
• They form the parameters of a vector of random variables of
length n, with the ith element of μ and σ being the mean and
standard deviation of the ith random variable, Xi, from which we
sample, to obtain the sampled encoding that we pass onward to
the decoder.

Figure Stochastically generating encoding vectors


6.4. Variational Autoencoders…
• This stochastic generation means that, even for the same input,
while the mean and standard deviation remain the same, the
actual encoding will vary somewhat on every single pass, simply
due to sampling.

Figure standard vs variational autoencoder


6.4. Variational Autoencoders…
• Intuitively, the mean vector controls where the encoding of an
input should be centered around.
• The standard deviation controls the “area”: how much the
encoding can vary around the mean.
• As encodings are generated at random from anywhere inside the
distribution, the decoder learns that not only does a single point in
latent space refer to a sample of that class, but all nearby
points refer to the same class as well.
• This allows the decoder to not just decode single, specific
encodings in the latent space, but ones that slightly vary too, as
the decoder is exposed to a range of variations of the encoding of
the same input during training.
6.4. Variational Autoencoders…

Figure A VAE maps an image to two vectors, mean and variance, which define a
probability distribution over the latent space, used to sample a latent point to decode
• In VAE, we encode the input as a distribution over the latent
space, instead of considering it as a single point.
• This encoded distribution is chosen to be normal so that the
encoder can be trained to return the mean and the covariance matrix.
• Instead of compressing its input image into a fixed code in the
latent space, VAE turns the image into the parameters of a
statistical distribution: a mean and a variance.
• Essentially, this means we are assuming the input image has been
generated by a statistical process, and that the randomness of this
process should be taken into account during encoding and
decoding.
• The VAE then uses the mean and variance parameters to
randomly sample one element of the distribution, and decodes that
element back to the original input.
• The stochasticity of this process improves robustness and forces
the latent space to encode meaningful representations everywhere:
every point sampled in the latent space is decoded to a valid
output.
6.4. Variational Autoencoders…

Figure representing each latent attribute of an input as a probability distribution



6.4. Variational Autoencoders…
• Model training by gradient descent requires that our model be
differentiable with respect to its learned parameters.

Figure variational autoencoder learnable parameters


6.4. Variational Autoencoders…
• This presupposes that the model is deterministic, i.e. a given input
always returns the same output for a fixed set of parameters, so
the only source of stochasticity is the inputs.
• Incorporating a probabilistic sampling node would make the
model itself stochastic (random).
• This can be overcome by using the reparameterization trick.
• The idea of the reparameterization trick is to take the random
sampling node out of the backpropagation loop.
• The trick is to sample the hidden state z via z = μ + σ⊙ε, where
ε is drawn from a standard normal distribution with mean 0 and
standard deviation 1, i.e. ε ~ N(0, 1).
6.4. Variational Autoencoders…
• Now the random node no longer blocks the backpropagation path
for μ and σ (see the sketch after the figure below).

Figure Reparameterization enables backpropagation
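• A minimal sketch of the reparameterization trick, assuming the encoder predicts a log-variance vector as in the earlier encoder sketch:

```python
import torch

def reparameterize(mu, logvar):
    sigma = torch.exp(0.5 * logvar)     # standard deviation recovered from log-variance
    eps = torch.randn_like(sigma)       # eps ~ N(0, I); the randomness lives here
    return mu + sigma * eps             # z = mu + sigma * eps

mu = torch.zeros(8, 20, requires_grad=True)
logvar = torch.zeros(8, 20, requires_grad=True)
z = reparameterize(mu, logvar)          # gradients can flow back to mu and logvar
```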
6.4. Variational Autoencoders…
• After reparametrization, the produced latent vector z will be the
same as before.
• But making the change allows the gradients to flow back through
to the encoder part of the VAE.
• Now, we need a method to compute the difference between two
probability distributions.
• For this, we use KL divergence (relative entropy).
• The KL divergence produces a number indicating how close two
distributions are to each other.
• The closer the two distributions get to each other, the lower the KL
divergence, and hence the loss, becomes.
• In the following graph, the blue distribution is trying to model the
green distribution.
• As the blue distribution comes closer and closer to the green one,
the KL divergence loss will get closer to zero.

Figure KL divergence between two probability distributions



6.4. Variational Autoencoders…
• The KL divergence is a statistical measure from
information theory that is commonly used to quantify the
difference between one probability distribution and a reference
probability distribution.
• It can be thought of as measuring the distance between two data
distributions, showing how different the two distributions are from
each other.
• Example: consider two discrete distributions p(x) and q(x) defined over x = 1, 2, 3.
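• As a worked sketch of the formula D_KL(p‖q) = Σ_x p(x) log(p(x)/q(x)), the snippet below computes the KL divergence for two made-up discrete distributions over x = 1, 2, 3 (the numbers are illustrative, not the values from the original table):

```python
import math

p = {1: 0.36, 2: 0.48, 3: 0.16}          # made-up distribution p(x)
q = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}       # made-up distribution q(x)

kl_pq = sum(p[x] * math.log(p[x] / q[x]) for x in p)
print(f"D_KL(p || q) = {kl_pq:.4f} nats")  # zero only when p and q are identical
```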
• Now we can proceed to the formulation of loss function.
• The loss function of the VAE is the negative log-likelihood with a
regularizer.
• The loss function of VAE is a combination of two terms:
– Reconstruction loss: This term measures how well the VAE can
reconstruct the input data from the latent representation.
– KL divergence loss: This term measures how close the latent
representation is to a standard normal distribution. A
commonly used loss is the Kullback–Leibler divergence
between the latent representation and a standard normal
distribution.
• For the regularization, we use an expression meant to nudge the
distribution of the encoder output towards a standard normal
distribution centered at 0.
• This provides the encoder with a sensible assumption about the
structure of the latent space it is modeling.
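• Putting the two terms together, a minimal sketch of the VAE loss follows; the closed-form Gaussian KL term and the binary cross-entropy reconstruction term (which assumes inputs scaled to [0, 1]) are standard choices, and the tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term: how well the decoder output matches the input.
    reconstruction = F.binary_cross_entropy(recon, x, reduction="sum")
    # Closed-form KL divergence between N(mu, sigma^2) and the prior N(0, 1):
    #   D_KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return reconstruction + kl

# Dummy tensors with illustrative shapes: 8 flattened images, 20 latent dimensions.
x = torch.rand(8, 784)
recon = torch.rand(8, 784).clamp(1e-6, 1 - 1e-6)   # stand-in decoder output in (0, 1)
mu, logvar = torch.zeros(8, 20), torch.zeros(8, 20)
loss = vae_loss(recon, x, mu, logvar)
```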

6.4. Variational Autoencoders…
• The second term is a regularizer that we throw in.
• This is the KL divergence between the encoder’s distribution
qɸ(z∣x) and pθ(z), where pθ(z) is a standard normal distribution
(μ = 0, σ² = 1).
• This divergence measures how much information is lost when
using q to represent p.
• It is a measure of how close q is to p.
• In a VAE, p is specified as a standard normal distribution with
mean 0 and variance 1, i.e. pθ(z) = N(0, 1).
• If the encoder outputs representations z that are different from a
standard normal distribution, it will receive a penalty in the loss.
• This regularizer term means ‘keep the representations z of each
digit sufficiently diverse’.
• If we didn’t include the regularizer, the encoder could learn to
cheat and give each datapoint a representation in a different region
of Euclidean space.
• This is bad, because then two images of the same number, say a
digit 2 written by two different people, 2_bob and 2_alice, could end up
with very different representations z_bob and z_alice.
• We want the representation space of z to be meaningful, so we
penalize this behavior.
• This has the effect of keeping similar numbers’ representations
close together (so the representations z_alice, z_bob, z_ali of the digit 2
remain sufficiently close).

Fig Different samples of handwritten digits in MNIST


6.4. Variational Autoencoders…
• After training is complete, by sampling from the latent space, we
can use the decoder network to form a generative model.
• This generative model is capable of creating new data similar to
what was observed during training.
• Specifically, we will sample from the prior distribution p(z) which
follows a unit Gaussian distribution.
• The figure below visualizes the data generated by the decoder
network of a VAE trained on the MNIST handwritten digits
dataset.
• Here, we have sampled a grid of values from a two-dimensional
Gaussian and displayed the output of the decoder network.
• As you can see, the distinct digits each exist in different regions of
the latent space and smoothly transform from one digit to another.
• VAE makes sure that the latent space has a Gaussian distribution,
so that by gradually moving from one point of latent space to its
neighbor, we get a meaningful gradually changing output:

Figure Sampling from nearby points of VAE latent space produces similar output images
• There are plenty of further improvements that can be made over
the variational autoencoder.
• We could replace the standard fully-connected dense
encoder-decoder with a convolutional-deconvolutional
encoder-decoder pair to produce great synthetic human face
photos.

Figure Generating celebrity-lookalike photos


• The VAE does not simply try to embed the data in the latent
space, but instead to characterize the latent space as a feature
landscape, a process which conditions the latent space to be
sufficiently well-behaved for data generation.
• Not only can we use this landscape to generate new data, but we
can even modify the salient features of input data.
• We can control, for example, not only whether a face in an image
is smiling, but also the type and intensity of the smile.

Figure control the amount of smile


6.5. Applications of Autoencoders
• Autoencoders have various applications like:
1. Anomaly Detection:
• Autoencoders can be used for anomaly detection.
• For example, consider an autoencoder that has been trained on a
specific dataset P.
• For any image sampled from the training dataset P, the autoencoder
will give a low reconstruction loss and is supposed to reconstruct the
image as is.
• For any image which is not present in the training dataset, however,
the autoencoder cannot perform the reconstruction, as the latent
attributes are not adapted for the specific image that has never been
seen by the network.
• As a result, the outlier image gives off a very high reconstruction loss
and can easily be identified as an anomaly with the help of a proper
threshold.
• Autoencoders are commonly used in systems in which we know
what the normal data will look like, yet it’s difficult to describe
what is anomalous.
• Autoencoders can identify data anomalies by flagging inputs whose
reconstruction loss is unusually high.
• It can be helpful for anomaly detection in financial markets,
where you can use it to identify unusual activity and predict
market trends.
• Some examples of how VAEs are used for anomaly detection (a sketch of reconstruction-error thresholding follows this list):
– Fraud detection: VAEs can be used to detect fraudulent transactions
in financial data.
– Medical diagnosis: VAEs can be used to detect anomalies in medical
images, such as X-rays and MRI scans.
– Network intrusion detection: VAEs can be used to detect network
intrusions in network traffic data.
– Quality control: VAEs can be used to detect defects in products and
manufacturing processes.
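• A minimal sketch of reconstruction-error anomaly scoring follows; the untrained stand-in model, the per-sample mean squared error, and the percentile-based threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(),
                      nn.Linear(64, 784), nn.Sigmoid())   # stand-in for a trained autoencoder

def anomaly_score(x):
    with torch.no_grad():
        recon = model(x)
    return ((recon - x) ** 2).mean(dim=1)   # per-sample reconstruction error

normal_data = torch.rand(100, 784)          # data resembling the training set
threshold = torch.quantile(anomaly_score(normal_data), 0.99)   # e.g. 99th percentile

new_data = torch.rand(10, 784)
is_anomaly = anomaly_score(new_data) > threshold               # boolean flag per sample
```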
2. Denoising image and audio
• Autoencoders can help clean up noisy pictures or audio files.
• Autoencoders like the denoising autoencoder can be used for
performing efficient and highly accurate image or audio
denoising.
• Unlike traditional denoising methods, autoencoders do not
search for the noise itself; they extract the underlying image or audio
from the noisy data fed to them by learning a representation of it.
• The representation is then decompressed to form a noise-free
image or audio.
• Denoising autoencoders thus can denoise complex images or
audio that cannot be denoised via traditional methods.
3. Image inpainting
• Autoencoders have been used to fill in gaps in images by learning
how to reconstruct missing pixels based on surrounding pixels.
• For example, if you are trying to restore an old photograph that is
missing part of its right side, the autoencoder could learn how to
fill in the missing details based on what it knows about the rest of
the photo.
4. Dimensionality reduction:
• Undercomplete autoencoders are those that are used for
dimensionality reduction.
• These can be used as a pre-processing step for dimensionality
reduction as they can perform fast and accurate dimensionality
reductions without losing much information.
• It can be used for dimensionality reduction by learning a
lower-dimensional representation of the input data in the latent
space.
6.5. Applications of Autoencoders…
• Furthermore, while dimensionality reduction procedures like PCA
can only perform linear dimensionality reductions, undercomplete
autoencoders can perform large-scale non-linear dimensionality
reductions.
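• A minimal sketch of using an encoder with a 2-dimensional code for visualization follows; the untrained encoder and the random stand-in data are placeholders for a trained model and a real dataset.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(),
                        nn.Linear(128, 2))   # 2-dimensional code for plotting

x = torch.rand(500, 784)                     # stand-in dataset
with torch.no_grad():
    codes = encoder(x)                       # shape (500, 2): one 2-D point per sample

# The 2-D codes can then be scattered with any plotting library, e.g.:
# import matplotlib.pyplot as plt
# plt.scatter(codes[:, 0], codes[:, 1]); plt.show()
```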
5. Generation of image and time series data
• Variational autoencoders can be used to generate both image and
time series data.
• The parameterized distribution at the bottleneck of the
autoencoder can be randomly sampled to generate discrete values
for latent attributes, which can then be forwarded to the decoder,
leading to generation of image data.
• VAEs can also be used to model time series data like music.
6.5. Applications of Autoencoders…
6. Information retrieval:
• Autoencoders can be used as content-based image retrieval systems
that allow users to search for images based on their content.
• Variational autoencoders can be used for image retrieval by
learning a latent code representation of the images that captures
their visual features.
• Once the VAE is trained, it can be used to retrieve images that are
similar to a query image by finding the images with the most
similar latent code representations.
• To retrieve images that are similar to a query image, the query
image is first encoded by the VAE to produce a latent code
representation.
• The latent code representation of the query image is then compared
to the latent code representations of the images in the search
database.
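• A minimal sketch of this latent-code retrieval loop follows; the untrained encoder, the Euclidean distance, and the stand-in image database are illustrative assumptions.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU(),
                        nn.Linear(64, 16))   # stand-in for a trained (V)AE encoder

database = torch.rand(1000, 784)             # stand-in image database
query = torch.rand(1, 784)                   # stand-in query image

with torch.no_grad():
    db_codes = encoder(database)             # latent codes for all database images
    q_code = encoder(query)                  # latent code for the query

distances = torch.cdist(q_code, db_codes).squeeze(0)   # Euclidean distances, shape (1000,)
top5 = distances.topk(5, largest=False).indices        # indices of the 5 most similar images
```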
• Here are some examples of how VAEs have been used for image
retrieval in the real world:
– Google Photos uses VAEs to retrieve images that are similar to a
query image.
– Pinterest uses VAEs to retrieve images that are similar to a user's
interests.
– Shutterstock uses VAEs to retrieve images that are similar to a user's
search terms.
7. Image Colorization
• One of the applications of autoencoders is to convert a black and
white picture into a colored image.
• We would like to replicate the human ability to identify that
the sea and sky are blue, the grass field and trees are green,
clouds are white, and so on.
• As shown in the figure below, if we are given a grayscale photo of
a rice field on the foreground, a volcano in the background and
sky on top, we are able to add the appropriate colors.
