Unit 4 Notes

The document discusses various types of autoencoders, including their relationship to PCA, regularization techniques, and specific types like denoising, sparse, and contractive autoencoders. It also covers Recurrent Neural Networks (RNNs), addressing issues like vanishing and exploding gradients, and introduces advanced architectures such as Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU). The text emphasizes the importance of these models in processing sequential data and their mechanisms for maintaining long-term dependencies.

Uploaded by

nainalashalini

UNIT-IV

Autoencoders: relation to PCA, Regularization in autoencoders, Denoising autoencoders, Sparse autoencoders, Contractive autoencoders.

Recurrent Neural Networks: Vanishing and Exploding Gradients, GRU, LSTMs. Encoder-Decoder Models, Attention Mechanism.
Introduction to Autoencoders
Link between PCA and Autoencoders
1. Linear Encoder and Linear Decoder

The encoder’s job is to compress the data into a simpler form (latent
space), and the decoder tries to reconstruct the original data from that
compressed version.
• When both the encoder and decoder are linear, the
transformation is simple, like rotating and scaling the data, without
adding any curves or complex mappings.
• PCA also works through linear transformations, finding the best
"directions" (principal components) to project the data onto.
• This linear setup ensures that the autoencoder behaves similarly
to PCA because both are trying to find the best linear way to
represent the data.
2. Squared Error Loss Function

The squared error measures how far the reconstructed data is from the
original data.
• The autoencoder aims to minimize this error, making sure the
reconstructed version is as close as possible to the original.
• In PCA, we also want to project the data in such a way that the
variance is maximized, which indirectly minimizes reconstruction
error when projecting back.
• So, using squared error aligns the autoencoder’s goal with PCA’s
goal of preserving data structure.
3. Normalizing Inputs

Data often comes with different scales: some features might range
from 0 to 1, while others might range from 0 to 1000.
• Normalization (centering the data and, when scales differ, rescaling
each feature) ensures that no single feature dominates the variance
simply because of its scale.
• PCA works best when the data is centered because it focuses on
how the data varies rather than its absolute values.
• Without normalization, both PCA and the autoencoder could
produce misleading results, as the variance could be biased by the
scale of the features.
In PCA, the principal components are the directions in which the data has the
most variance.
• When you train the autoencoder under the conditions above, the encoder
learns to find these same directions because it’s trying to compress the
data while keeping as much important information as possible.
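• This is why, after training, the encoder's weights span the same subspace as
the principal eigenvectors from PCA (they match the eigenvectors themselves
only up to an invertible linear transformation of the latent space).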

PCA and Autoencoders are Trying to Solve the Same Problem:

Both methods aim to reduce the data’s dimensionality without losing important
information.
•PCA does this by projecting data onto new axes (principal components) that
capture the most variance.
•Autoencoders do this by learning how to compress the data and still be able to
reconstruct it accurately.
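The link above can be checked numerically. The following is a minimal sketch, assuming NumPy and a small synthetic dataset (the data, the sizes, and the function names are illustrative, not from the notes). It builds a linear encoder and decoder directly from the principal directions and compares its squared reconstruction error with that of another linear code of the same size.

```python
import numpy as np

def pca_linear_autoencoder(X, k):
    """Linear encoder/decoder built from the top-k principal directions."""
    mean = X.mean(axis=0)
    Xc = X - mean                           # condition 3: centered inputs
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                            # (d, k) encoder weights; decoder is W.T
    codes = Xc @ W                          # condition 1: linear encoder
    X_hat = codes @ W.T + mean              # condition 1: linear decoder
    return X_hat, W, s

def squared_error(X, X_hat):
    """Condition 2: squared error reconstruction loss."""
    return float(np.sum((X - X_hat) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # toy correlated data
k = 2

X_hat, W, s = pca_linear_autoencoder(X, k)
pca_err = squared_error(X, X_hat)

# Compare with a different (random) orthonormal k-dimensional linear code.
Q, _ = np.linalg.qr(rng.normal(size=(5, k)))
mean = X.mean(axis=0)
rand_err = squared_error(X, (X - mean) @ Q @ Q.T + mean)

print(pca_err, rand_err)
```

The PCA-based error equals the sum of the squared discarded singular values, and no other k-dimensional linear code can do better, which is why a linear autoencoder trained with squared error ends up in the same subspace.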
Regularization in autoencoders

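• Regularization improves generalization: it discourages the model from fitting the training data too closely.
• Overfitting leads to poor generalization.
• Overfitting is more likely when the model has a large number of parameters relative to the amount of training data.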
Denoising Autoencoders

We will now see a practical application in which AEs are used and then
compare Denoising Autoencoders with regular autoencoders
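As a rough illustration of the difference, a denoising autoencoder is trained to reconstruct the clean input from a corrupted copy, whereas a regular autoencoder sees the clean input directly. The sketch below assumes NumPy and masking noise; the `reconstruct` argument is a hypothetical stand-in for any encoder-decoder model, not something defined in these notes.

```python
import numpy as np

def corrupt(x, drop_prob, rng):
    """Masking noise: zero each input entry with probability drop_prob."""
    mask = rng.random(x.shape) >= drop_prob
    return x * mask

def denoising_loss(x_clean, reconstruct, drop_prob, rng):
    """Feed the corrupted input, but measure error against the clean input."""
    x_noisy = corrupt(x_clean, drop_prob, rng)
    x_hat = reconstruct(x_noisy)
    return float(np.mean((x_clean - x_hat) ** 2))

rng = np.random.default_rng(0)
x = np.ones((4, 6))

# An identity "model" cannot undo the corruption, so it is penalized,
# unlike in a regular autoencoder where copying the input gives zero loss.
identity = lambda v: v
loss_no_noise = denoising_loss(x, identity, drop_prob=0.0, rng=rng)
loss_all_noise = denoising_loss(x, identity, drop_prob=1.0, rng=rng)
print(loss_no_noise, loss_all_noise)
```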
Sparse Autoencoders
A Sparse Autoencoder is a type of autoencoder where the key idea is to
force most of the neurons in the hidden layer to be inactive (or
“sparse”).
This helps the model learn more meaningful features from the data,
especially when dealing with high-dimensional data like images, text, or
audio.
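One common way to encourage inactive hidden units is a KL-divergence penalty between a small target activation and the average activation of each hidden unit. The sketch below assumes NumPy and sigmoid hidden activations in (0, 1); the target value `rho` and the function name are illustrative choices, and the notes do not say which sparsity penalty is intended.

```python
import numpy as np

def kl_sparsity_penalty(hidden, rho=0.05, eps=1e-8):
    """KL(rho || rho_hat) summed over hidden units.

    hidden: (batch, units) array of sigmoid activations in (0, 1).
    rho:    target average activation (assumed hyperparameter).
    """
    rho_hat = np.clip(hidden.mean(axis=0), eps, 1 - eps)  # mean activation per unit
    kl = (rho * np.log(rho / rho_hat)
          + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return float(kl.sum())

sparse_hidden = np.full((10, 3), 0.05)   # units mostly inactive
dense_hidden = np.full((10, 3), 0.5)     # units active half the time

penalty_sparse = kl_sparsity_penalty(sparse_hidden)
penalty_dense = kl_sparsity_penalty(dense_hidden)
print(penalty_sparse, penalty_dense)
```

The penalty is added to the reconstruction loss with a weighting factor, so hidden units that are active too often raise the total loss.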
Contractive Autoencoders (CAE):
A Contractive Autoencoder (CAE) is a variation of the traditional autoencoder,
designed to learn robust, invariant features by adding a contractive penalty to
the loss function.
This penalty encourages the model to be less sensitive to small changes in the
input, making it excellent for tasks where data can be noisy or slightly distorted.

The model learns features that are stable even if the input data has slight
variations (like noise, distortions, or small transformations).

Unlike regular autoencoders that might overfit to noise, CAEs focus on the
underlying structure of the data.

This idea mimics how the brain works—neurons respond to features consistently,
even if the environment changes slightly.
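The contractive penalty is usually taken to be the squared Frobenius norm of the Jacobian of the hidden representation with respect to the input. The sketch below assumes NumPy and a single sigmoid encoder layer h = sigmoid(Wx + b), which is an assumption made for illustration; it also checks the closed form against a numerical Jacobian.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W, b):
    return sigmoid(W @ x + b)

def contractive_penalty(x, W, b):
    """Squared Frobenius norm of dh/dx for h = sigmoid(Wx + b)."""
    h = encode(x, W, b)
    # J[j, i] = h_j * (1 - h_j) * W[j, i], so the norm factorizes per hidden unit.
    return float(np.sum((h * (1 - h)) ** 2 * np.sum(W ** 2, axis=1)))

def numerical_jacobian(x, W, b, eps=1e-6):
    """Central-difference Jacobian, used only to check the closed form."""
    J = np.zeros((W.shape[0], x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        J[:, i] = (encode(x + step, W, b) - encode(x - step, W, b)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
b = rng.normal(size=4)
x = rng.normal(size=3)

penalty = contractive_penalty(x, W, b)
penalty_check = float(np.sum(numerical_jacobian(x, W, b) ** 2))
print(penalty, penalty_check)
```

A small penalty means the hidden features change little when the input is perturbed slightly, which is the insensitivity described above.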
Recurrent Neural Networks: Vanishing and Exploding
Gradients, GRU, LSTMs. Encoder Decoder Models,
Attention Mechanism.
Sequence Learning Problems
Recurrent Neural Networks
How do we model such tasks involving sequences ?
RNN Structure
After a model is prepared, we need to train and test it.

Backpropagation through time

The problem of Exploding and Vanishing Gradients
• When training RNNs using Backpropagation Through Time (BPTT), we often run into two
major issues:
•Vanishing gradients
•Exploding gradients
•Vanishing gradients: during backpropagation, the gradients (used to update the weights) become
smaller and smaller as they move backward through time. Eventually, they shrink so much that the
earliest time steps stop contributing to learning.
•The model forgets long-term dependencies.
•It becomes very hard for the RNN to learn relationships between distant inputs and outputs in
a sequence.
•Solution:
•Use ReLU activation (instead of sigmoid/tanh)
•Use Long Short-Term Memory (LSTM) or Gated Recurrent Units (GRU), which are designed to
preserve long-term dependencies.
•Layer normalization.
• Exploding gradients: sometimes, gradients become extremely large during backpropagation.
• This happens when weights are large or the network is deep in time (many time steps).
• Causes numerical instability.
• Weight updates become too big, leading the model to diverge instead of converge.
• Loss becomes NaN or the model crashes.
• Solution:
• Gradient clipping: Limits the size of gradients during training.
• Use smaller learning rates.
• Proper weight initialization.
• Summary:
• RNNs (Recurrent Neural Networks) are a type of neural network
designed to process sequential data.
• They can analyze data with a temporal dimension, such as time series,
speech, and text.
• RNNs can do this by using a hidden state passed from one timestep to
the next.
• They remember the previous information and use it for processing the current input.
• The hidden state is updated at each timestep based on the input and
the previous hidden state.
• RNNs are able to capture short-term dependencies in sequential data,
but they struggle with capturing long-term dependencies.
• This is due to the vanishing gradient problem, where gradients diminish as they
propagate through many time steps.
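The hidden-state update summarized above can be sketched in a few lines. This assumes NumPy, a tanh activation, and row-vector inputs; the weight names `W_xh`, `W_hh` and `b_h` are illustrative and do not come from the notes.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a simple RNN over a sequence and return the hidden state at every step."""
    h = np.zeros(W_hh.shape[0])                    # initial hidden state
    states = []
    for x_t in xs:
        # New state depends on the current input and the previous hidden state.
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_size, hidden_size, steps = 3, 4, 3
W_xh = rng.normal(scale=0.5, size=(input_size, hidden_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

xs = rng.normal(size=(steps, input_size))
states = rnn_forward(xs, W_xh, W_hh, b_h)

# Changing only the first input changes the last hidden state:
# earlier information is carried forward through the hidden state.
xs_changed = xs.copy()
xs_changed[0] += 1.0
states_changed = rnn_forward(xs_changed, W_xh, W_hh, b_h)
print(states[-1], states_changed[-1])
```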
Selective Read, Selective Write, Selective Forget - The
Whiteboard Analogy
An RNN also has a finite state size, so we need to figure out a way to
allow it to selectively read, write and forget.
• Long Short-Term Memory (LSTM) is a type of recurrent neural
network (RNN) architecture designed to address the vanishing
gradient problem, enabling it to effectively learn and retain
information over long sequences, making it suitable for tasks
like natural language processing and time series analysis.

• The structure of an LSTM network consists of a series of LSTM cells,
each of which has a set of gates (input, output, and forget gates)
that control the flow of information into and out of the cell.
• [LSTM= Memory + Gate Mechanisms]
• The gates are used to selectively forget or retain information from
the previous time steps, allowing the LSTM to maintain long-term
dependencies in the input data.
• The first part chooses whether the information coming from the previous
timestamp is to be remembered or is irrelevant and can be forgotten.
• In the second part, the cell tries to learn new information from the input to
this cell.
• At last, in the third part, the cell passes the updated information from the
current timestamp to the next timestamp.
• Just like a simple RNN, an LSTM also has a hidden state, where
H(t-1) represents the hidden state of the previous timestamp and H(t)
is the hidden state of the current timestamp.
• In addition to that, LSTM also has a cell state represented by C(t-1)
and C(t) for the previous and current timestamps, respectively.
• Here the hidden state is known as short-term memory, and the cell
state is known as long-term memory. It is interesting to note that the cell state carries
information across all the timestamps.
• When we move from the first sentence to the second sentence, our
network should realize that we are no longer talking about
Bob. Now our subject is Dan. Here, the forget gate of the
network allows it to forget about Bob.
Forget Gate
In a cell of the LSTM neural network, the first step is to decide
whether we should keep the information from the previous time step
or forget it.
Here is the equation for the forget gate:
$$f_t = \sigma(X_t U_f + H_{t-1} W_f)$$
•Xt: input to the current timestamp
•Uf: weight matrix associated with the input
•Ht-1: the hidden state of the previous timestamp
•Wf: the weight matrix associated with the hidden state
The sigmoid function makes ft a number between 0 and 1.
This ft is later multiplied with the cell state of the previous
timestamp: when ft is 0 the previous cell state is forgotten completely,
and when ft is 1 it is kept unchanged.
• Input Gate
• Let's take another example.
• "Bob knows swimming. He told me over the phone that he had served the
navy for four long years."
• Now just think about it: based on the context given in the first sentence,
which information in the second sentence is critical?
• The fact that he was in the navy is important information, and this is something we want
our model to remember for future computation.
• The input gate is used to quantify the importance of the new information carried by the
input. Here is the equation of the input gate:
$$i_t = \sigma(X_t U_i + H_{t-1} W_i)$$
•Xt: input at the current timestamp t
•Ui: weight matrix of the input
•Ht-1: the hidden state at the previous timestamp
•Wi: weight matrix associated with the hidden state

Again the sigmoid function is applied, so the value of it at timestamp t will be
between 0 and 1.
New Information

• The new information that needs to be passed to the cell state is a
function of the hidden state at the previous timestamp t-1 and the input x
at timestamp t.
• The activation function here is tanh.
• Due to the tanh function, the value of the new information Nt will be
between -1 and 1.
• If the value of Nt is negative, the information is subtracted from the
cell state, and if the value is positive, the information is added to the
cell state at the current timestamp.
However, Nt is not added directly to the cell state. It is first scaled by the
input gate, which gives the updated equation:
$$C_t = f_t \odot C_{t-1} + i_t \odot N_t$$
Here, Ct-1 is the cell state at the previous timestamp, Ct is the cell state at the
current timestamp, and the others are the values we have calculated previously.
• Output Gate
• "Bob single-handedly fought the enemy and died for his country.
For his contributions, brave ______."
• Based on the context so far, we have to supply a relevant word
to fill in the blank. That word is our output, and producing it is the function
of the output gate.
• The output gate ot is computed like the other gates, so its value also lies
between 0 and 1 because of the sigmoid function.
• To calculate the current hidden state, we use ot and the tanh
of the updated cell state:
$$H_t = o_t \odot \tanh(C_t)$$
• The hidden state is therefore a function of the long-term memory Ct and the
current output gate.
• If you need the output at the current timestamp, apply the softmax
activation to the hidden state Ht. The token with the maximum score in the
output is the prediction.
Each LSTM cell contains a memory cell that stores information from previous time steps and uses it to
influence the output of the cell at the current time step.
The output of each LSTM cell is passed to the next cell in the network, allowing the
LSTM to process and analyze sequential data over multiple time steps.
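The gate computations described above can be put together in one LSTM step. The sketch below assumes NumPy, row-vector inputs, and no bias terms. The names `U_f`, `W_f`, `U_i`, `W_i` follow the notes; `U_c`, `W_c` (new information) and `U_o`, `W_o` (output gate) are names assumed here by analogy.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step: returns the new hidden state and cell state."""
    f_t = sigmoid(x_t @ p["U_f"] + h_prev @ p["W_f"])   # forget gate, in (0, 1)
    i_t = sigmoid(x_t @ p["U_i"] + h_prev @ p["W_i"])   # input gate, in (0, 1)
    n_t = np.tanh(x_t @ p["U_c"] + h_prev @ p["W_c"])   # new information, in (-1, 1)
    c_t = f_t * c_prev + i_t * n_t                      # updated long-term memory
    o_t = sigmoid(x_t @ p["U_o"] + h_prev @ p["W_o"])   # output gate, in (0, 1)
    h_t = o_t * np.tanh(c_t)                            # updated short-term memory
    return h_t, c_t

def init_params(input_size, hidden_size, rng):
    """Small random weights for the four blocks (f, i, c, o); illustrative only."""
    p = {}
    for g in "fico":
        p[f"U_{g}"] = rng.normal(scale=0.1, size=(input_size, hidden_size))
        p[f"W_{g}"] = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
    return p

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
params = init_params(input_size, hidden_size, rng)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):            # a sequence of 5 inputs
    h, c = lstm_step(x_t, h, c, params)
print(h, c)
```

Note that the cell state is updated only by element-wise scaling and addition, which is what lets information persist across many time steps.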
How LSTMs avoid the problem of vanishing gradients
Limitations of Standard RNN

•Vanishing Gradient problem: occurs when processing long sequences.
•The gradients used to update the network weights become very small (vanish).
•This makes it difficult for the network to learn long-term dependencies in the
data.
•Exploding Gradients: occur when the gradients become very large during
backpropagation.
•This can lead to unstable training and prevent the network from converging to
an optimal solution.
•Limited Memory: Standard RNNs rely solely on the hidden state to capture
information from previous time steps.
•This hidden state has a limited capacity, making it difficult for the network to
remember information over long sequences.
•Difficulty in Training: Due to vanishing/exploding gradients and limited
memory, standard RNNs can be challenging to train, especially for complex
tasks involving long sequences.
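The first two limitations can be illustrated numerically. During backpropagation through time, the gradient is multiplied by the recurrent weight matrix once per time step. The sketch below assumes NumPy and a simplified linear RNN (the activation derivative is ignored, which for tanh would only shrink the gradient further); the scaling factors 0.5 and 1.5 are arbitrary illustrative choices.

```python
import numpy as np

def backprop_norms(W_hh, steps):
    """Norm of the accumulated gradient factor after 1, 2, ..., steps time steps."""
    grad = np.eye(W_hh.shape[0])
    norms = []
    for _ in range(steps):
        grad = grad @ W_hh.T            # one step further back in time
        norms.append(float(np.linalg.norm(grad)))
    return norms

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # orthogonal matrix: all singular values are 1

vanishing = backprop_norms(0.5 * Q, steps=30)  # weights scaled below 1
exploding = backprop_norms(1.5 * Q, steps=30)  # weights scaled above 1

print(vanishing[-1], exploding[-1])
```

After 30 steps the first factor is practically zero and the second is very large, which is why distant time steps either stop influencing the update or destabilize it.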
GRU, or Gated Recurrent Unit, is an advancement of the standard
RNN.
• GRUs are very similar to Long Short-Term Memory (LSTM).
• Like LSTM, GRU uses gates to control the flow of information.
• GRUs are relatively new compared to LSTM.
• They offer some improvement over LSTM and have a simpler
architecture.
• A GRU does not have a separate cell state (Ct); it only has a hidden state (Ht).
Because of this simpler structure, GRUs are faster to train.
The Architecture of Gated Recurrent Unit
A GRU cell is more or less similar to an LSTM cell or RNN cell.

• At each timestamp t, it takes an input Xt and the hidden state Ht-1 from the
previous timestamp t-1.
• It then outputs a new hidden state Ht, which is again passed to the next
timestamp.
• There are two gates in a GRU, as opposed to three gates in an LSTM cell:
• i) Reset Gate (short-term memory)
• ii) Update Gate (long-term memory)
• The Reset gate can discard irrelevant past information, and the Update gate controls the balance
between keeping past information and incorporating new information.
• i) Reset Gate: responsible for the short-term memory of the network, i.e. the
hidden state (Ht).
$$r_t = \sigma(X_t U_r + H_{t-1} W_r)$$
The value of rt will range from 0 to 1 because of the sigmoid function.
Here Ur and Wr are weight matrices for the reset gate.
• ii) Update Gate: responsible for the long-term memory. Its value ut is computed in
the same way as rt, with its own weight matrices.

How a GRU Works

•The GRU takes two inputs as vectors: the current input X_t and the
previous hidden state h_(t-1).
• For each gate, the current input and the previous hidden state are each
multiplied by that gate's own weight matrix, and the two results are added.
• This is done separately for each gate, creating
"parameterized" versions of the inputs specific to each
gate.
• We then apply an activation function (a function that transforms the
values) element-wise to each element in these parameterized
vectors.
• For the gates this activation is the sigmoid, which outputs values between 0 and 1.

To find the hidden state Ht, a GRU follows a two-step process.

• Step 1: It takes the current input and the hidden state from the
previous timestamp t-1, which is multiplied by the reset gate output
rt.
• This combined information is then passed through the tanh function; the
resulting value is the candidate hidden state.
• The reset gate controls how much influence
the previous hidden state can have on the candidate state (selective read).
• If rt is equal to 1, the entire information from the previous hidden state Ht-1 is
considered; if rt is 0, the previous hidden state is ignored.
• Step 2: Once we have the candidate state, it is used to generate the current
hidden state Ht using the update gate:
$$H_t = u_t \odot H_{t-1} + (1 - u_t) \odot \hat{H}_t$$
where \hat{H}_t is the candidate state.
• Instead of using separate gates as in the LSTM, the GRU uses a
single update gate to control both the historical information, which is Ht-1,
and the new information, which comes from the candidate state
(selective forget and selective write).
• If the value of ut is around 0, the first term in the equation
vanishes, which means the new hidden state will not have much information from
the previous hidden state.
• At the same time, the weight of the second term becomes almost 1, which means
the hidden state at the current timestamp will consist of the information from
the candidate state only.
• If the value of ut is 1, the second term becomes entirely 0 and
the current hidden state depends entirely on the first term,
i.e. the information from the hidden state at the previous timestamp
t-1.
• The value of ut is therefore critical in this equation; it can range
from 0 to 1.
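The two-step GRU update can be sketched as a single function. This assumes NumPy, row-vector inputs, and no bias terms. `U_r` and `W_r` follow the notes; `U_u`, `W_u` (update gate) and `U_g`, `W_g` (candidate state) are names assumed here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    """One GRU time step: returns the new hidden state and the intermediate values."""
    r_t = sigmoid(x_t @ p["U_r"] + h_prev @ p["W_r"])   # reset gate, in (0, 1)
    u_t = sigmoid(x_t @ p["U_u"] + h_prev @ p["W_u"])   # update gate, in (0, 1)
    # Step 1: candidate state, reading only the part of h_prev let through by r_t.
    h_cand = np.tanh(x_t @ p["U_g"] + (r_t * h_prev) @ p["W_g"])
    # Step 2: blend old state and candidate with a single update gate.
    h_t = u_t * h_prev + (1.0 - u_t) * h_cand
    return h_t, (r_t, u_t, h_cand)

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
params = {}
for g in "rug":
    params[f"U_{g}"] = rng.normal(scale=0.5, size=(input_size, hidden_size))
    params[f"W_{g}"] = rng.normal(scale=0.5, size=(hidden_size, hidden_size))

h_prev = rng.uniform(-1.0, 1.0, size=hidden_size)
x_t = rng.normal(size=input_size)
h_t, (r_t, u_t, h_cand) = gru_step(x_t, h_prev, params)
print(h_t)
```

Because the update gate weights sum to 1 for every element, each entry of the new hidden state lies between the old value and the candidate value.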
How GRUs Solve the Limitations of Standard
RNNs
• GRUs use special gates (Update gate and Reset gate) to control the flow of information within
the network.
• These gates act as filters, deciding what information from the past to keep, forget, or update.
• By selectively allowing relevant information through the gates, GRUs prevent gradients from vanishing entirely.
• This allows the network to learn long-term dependencies even in long sequences.
Encoder-Decoder Model
• It is a deep learning model composed of two neural
networks.
• These two neural networks usually have the same structure.
• The first one is used normally, but the second one works in
reverse.
• The first neural network takes a sentence as input and outputs a
sequence of numbers.
• The second network takes this sequence of numbers as input and
outputs a sentence.
• In fact, these two networks do the same thing,
but one is used in the normal direction and the other in the
opposite direction.
• If we translate a sentence directly with the classical
approach, the network translates word by word
without caring about the global meaning of the
sentence.
• Example: if we have the sentence “prendre une expression au pied de la
lettre” and we translate it word by word, we obtain: “take an
expression at the foot of the letter”.
• The structure of the encoder makes it possible to extract the meaning
of a sentence.
• It stores the extracted information in a vector (the result of the
encoder).
• Then, the decoder analyzes the vector to produce its own version of
the sentence.
Different Forms of Encoder-Decoder Models
What is the difference between encoder, decoder and
encoder-decoder in a neural network?

• You would use an encoder when you want to compress the input
data, a decoder when you need to produce some output based
on the input,
• and an encoder-decoder when you need to change the format
of a sequence of data.
Applications of encoder and decoder architecture in a neural network

•Encoder
•Image classification
•Speech recognition

•Decoder
•Text generation
•Image creation

•Encoder-decoder
•Text translation from one language to another
•Summarisation
Encoder-decoder models, particularly in the context of recurrent neural networks (RNNs), are a
framework commonly used for sequence-to-sequence tasks, such as machine translation, text
summarization, and speech recognition. Here's a breakdown of how they work:
Structure
1.Encoder:
- The encoder processes the input sequence (e.g., a sentence in a source language) and converts it
into a fixed-size context vector (also called the state vector).
- It typically consists of one or more RNN layers (e.g., LSTM or GRU) that read the input sequence
step-by-step. Each input element is passed through the RNN, updating its hidden state.
- The final hidden state of the encoder represents the entire input sequence.
2.Decoder:
- The decoder is another RNN that generates the output sequence (e.g., a translated sentence) from
the context vector produced by the encoder.
- It starts with the context vector and generates one token at a time. Each output token is fed back into
the decoder along with the previous hidden state to produce the next token.
- The decoder can also use techniques like attention mechanisms to focus on different parts of the input
sequence during generation, improving performance.
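The encoder and decoder described above can be sketched with simple RNN cells. This is a minimal, untrained NumPy illustration: the vocabulary size, the token ids `SOS` and `EOS`, the weight names, and the greedy decoding rule are all assumptions made for the example, and the random weights mean the generated tokens are meaningless.

```python
import numpy as np

def rnn_cell(x, h, W_x, W_h):
    return np.tanh(x @ W_x + h @ W_h)

def encode(tokens, emb, enc):
    """Read the input step by step; the final hidden state is the context vector."""
    h = np.zeros(enc["W_h"].shape[0])
    for tok in tokens:
        h = rnn_cell(emb[tok], h, enc["W_x"], enc["W_h"])
    return h

def decode(context, emb, dec, sos, eos, max_len):
    """Start from the context vector and feed each generated token back in."""
    h, tok, out = context, sos, []
    for _ in range(max_len):
        h = rnn_cell(emb[tok], h, dec["W_x"], dec["W_h"])
        tok = int(np.argmax(h @ dec["W_out"]))   # greedy choice of the next token
        if tok == eos:
            break
        out.append(tok)
    return out

rng = np.random.default_rng(0)
vocab, emb_dim, hidden = 6, 4, 5
SOS, EOS = 0, 1                                  # hypothetical special token ids
emb = rng.normal(size=(vocab, emb_dim))
enc = {"W_x": rng.normal(size=(emb_dim, hidden)),
       "W_h": rng.normal(size=(hidden, hidden))}
dec = {"W_x": rng.normal(size=(emb_dim, hidden)),
       "W_h": rng.normal(size=(hidden, hidden)),
       "W_out": rng.normal(size=(hidden, vocab))}

context = encode([2, 3, 4, 5], emb, enc)         # fixed-size, whatever the input length
short_context = encode([2, 3], emb, enc)
output = decode(context, emb, dec, SOS, EOS, max_len=8)
print(context.shape, output)
```

The context vector has the same size for a 2-token and a 4-token input, which is exactly the fixed-size bottleneck that attention mechanisms were later introduced to relieve.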
Training
•The model is typically trained using pairs of input and output sequences. During training, the decoder
often uses the true output sequence (teacher forcing) to predict the next token, which helps in learning
the sequence generation better.
Applications
•Machine Translation: Translating sentences from one language to another.
•Text Summarization: Generating concise summaries of longer texts.
•Speech Recognition: Converting audio sequences into text sequences.
Advantages
•Handling Variable-Length Sequences: Encoder-decoder models can process input and output
sequences of varying lengths, making them suitable for many NLP tasks.
•Context Awareness: The context vector captures information from the entire input sequence, allowing
the decoder to generate coherent outputs.
Limitations
•Fixed-Length Context Vector: Traditional implementations use a fixed-size context vector, which can
struggle with longer sequences. This limitation has led to the development of attention mechanisms and
transformer models, which can better manage longer dependencies.
Drawbacks of the Encoder-Decoder Approach (RNN/LSTM):
1.Dependence on Encoder Summary:
1. If the encoder produces a poor summary, the translation will also be poor.
2. This issue worsens with longer sentences.
2.Long-Range Dependency Problem:
1. RNNs/LSTMs struggle to understand and remember long sentences.
2. This happens due to the vanishing/exploding gradient problem, making it hard to retain
distant information.
3. They perform better on recent inputs but forget earlier parts.
3.Performance Degradation with Longer Inputs:
1. Even the original creators (Cho et al., 2014) showed that translation quality drops as
sentence length increases.
4.LSTM Limitations:
1. While LSTMs handle long-range dependencies better than RNNs, they still fail in certain
cases.
2. They can become "forgetful" over long sequences.
5.Lack of Selective Attention:
1. The model cannot focus more on important words in the input while translating.
2. All words are treated equally, even if some are more critical for translation.
Attention Mechanism.
what is the color of the soccer
ball? Also, which Georgetown
player, the guys in white, is
wearing the captaincy band?
• Attention mechanisms enhance deep
learning models by selectively focusing on
important input elements, improving
prediction accuracy and computational
efficiency.
• They prioritize and emphasize relevant
information, acting as a spotlight to enhance
overall model performance.
• In psychology, attention is the cognitive
process of selectively concentrating on one
or a few things while ignoring others.
How Attention Mechanism Works?

1.Breaking Down the Input: Let's say you have a bunch of words (or any kind of
data) that you want the computer to understand. First, it breaks down this input
into smaller pieces, like individual words.
2.Picking Out Important Bits: Then, it looks at these pieces and decides which
ones are the most important. It does this by comparing each piece to a question
or 'query' it has in mind.
3.Assigning Importance: Each piece gets a score based on how well it matches
the question. The higher the score, the more important that piece is.
4.Focusing Attention: After scoring each piece, it figures out how much attention
to give to each one. Pieces with higher scores get more attention, while less
important ones get less attention.
5.Putting It All Together: Finally, it adds up all the pieces, but gives more weight
to the important ones. This way, the computer gets a clearer picture of what's
most important in the input.
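The five steps above correspond closely to dot-product attention. The sketch below assumes NumPy, one query vector, and one key and one value vector per input piece; the toy numbers are chosen only so that the second piece matches the query best.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attend(query, keys, values):
    """Score each piece against the query, normalize, and take a weighted sum."""
    scores = keys @ query           # steps 2-3: one score per input piece
    weights = softmax(scores)       # step 4: attention weights that sum to 1
    context = weights @ values      # step 5: weighted sum of the pieces
    return context, weights

# Step 1: the input broken into three pieces, each with a key and a value.
keys = np.eye(3)
values = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
query = np.array([0.0, 5.0, 0.0])   # matches the second key most strongly

context, weights = attend(query, keys, values)
print(weights, context)
```

Most of the weight goes to the second piece, so the context vector is dominated by that piece's value.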
Advantages of Attention Mechanism in Deep Learning Models
The attention mechanism in deep learning models has multiple benefits, including enhanced
performance and versatility across a variety of tasks. The following are some of the primary benefits
of attention mechanisms:
1.Selective Information Processing: The attention mechanism enables the model to concentrate on
select parts of the input sequence, emphasizing critical information while potentially ignoring less
significant bits.
2.Improved Model Interpretability: Through attention weights, the Attention Mechanism reveals
which elements of the input data are considered relevant for a given prediction, improving model
interpretability and helping practitioners and stakeholders understand and trust the model's
decisions.
3.Capturing Long-Range Dependencies: It tackles the challenge of capturing long-term
dependencies in sequential data by allowing the model to connect distant pieces, boosting the
model’s ability to recognize context and relationships between elements separated by substantial
distances.
4.Transfer Learning Capabilities: It aids in knowledge transfer by allowing the model to focus on
relevant aspects when adapting information from one task to another. This improves the model’s
adaptability and generalizability across domains.
5.Efficient Information Processing: It enables the model to process relevant information
selectively, decreasing computational waste and enabling more scalable and efficient learning,
improving the model’s performance on large datasets and computationally expensive tasks.
Drawbacks :
1.Computational Complexity: Attention processes can greatly increase a model’s
computational complexity, particularly when dealing with long input sequences.
Because of the increasing complexity, training and inference periods may be longer,
making attention-based models more demanding of resources.
2.Dependency on Model Architecture: The overall model design and the task at hand
can influence the effectiveness of attention mechanisms. Attention mechanisms do not
benefit all models equally, and their influence varies among architectures.
3.Overfitting Risks: Overfitting can also affect attention mechanisms, especially when
the number of attention heads is significant. When there are too many attention heads
in the model, it may begin to memorize the training data rather than generalize to new
data. As a result, performance on unseen data may suffer.
4.Attention to Noise: Attention mechanisms may pay attention to noisy or irrelevant
sections of the input, particularly when the data contains distracting information. This
can result in inferior performance and necessitates careful model adjustment.
THE END

What Is Quality Planning - Quality Control Plans - ASQ
No ratings yet
What Is Quality Planning - Quality Control Plans - ASQ
6 pages
Lesson 1-3 Adding Subtracting Integers Worksheets
No ratings yet
Lesson 1-3 Adding Subtracting Integers Worksheets
9 pages
Information Technology Cover Letter Example
No ratings yet
Information Technology Cover Letter Example
2 pages
Abhijeet K Sahu Product Design BIW
100% (1)
Abhijeet K Sahu Product Design BIW
2 pages
Nursery Management System
No ratings yet
Nursery Management System
8 pages
CRM Course Home Assignment 1
No ratings yet
CRM Course Home Assignment 1
9 pages
Database Triggers
100% (4)
Database Triggers
11 pages
Department of Biotechnology: Biotechnology Eligibility Test (BET) 2020
No ratings yet
Department of Biotechnology: Biotechnology Eligibility Test (BET) 2020
2 pages
Guidance For CSDT
No ratings yet
Guidance For CSDT
15 pages
Assignment 3 - State Feedback 1 Two Liquid Tanks in Series: I II I II I I
No ratings yet
Assignment 3 - State Feedback 1 Two Liquid Tanks in Series: I II I II I I
2 pages
Relay Output Module SM 322 DO 16 X Rel. AC 120/230 V (6ES7322-1HH01-0AA0)
No ratings yet
Relay Output Module SM 322 DO 16 X Rel. AC 120/230 V (6ES7322-1HH01-0AA0)
5 pages




