4 LSTM and GRU
LSTM and GRU : Introduction
The whiteboard analogy:
• Consider a scenario where we have to evaluate an expression on a whiteboard:
  Evaluate “ac(bd + a) + ad” given that a = 1, b = 3, c = 5, d = 11
• Normally, the evaluation on the whiteboard would look like:
  ac = 5
  bd = 33
  bd + a = 34
  ac(bd + a) = 170
  ad = 11
  ac(bd + a) + ad = 181
• Now, if the whiteboard has space to accommodate only 3 steps, the above evaluation cannot fit in the required space and would lead to loss of information.
Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras
• With space for only 3 steps, the whiteboard has to be managed selectively: at each step we decide what to write, what to keep (e.g., carrying forward bd + a = 34), and what to erase.
• An RNN's finite-sized state behaves like this whiteboard, which motivates selectively writing to the state and selectively forgetting parts of it.
Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras
Selectively write:
Selectively forget:
• How do we combine st-1 and the candidate state s̃t to get the new state st?
• We may not want to use the whole of st-1, but rather forget some parts of it.
• To do this, a forget gate is introduced (see the sketch below).
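The gating computation behind these two steps is shown only as a figure in the original slides; the following is a sketch of the standard formulation in the CS7015-style notation used above (the parameter names W, U, b are assumptions, not copied from the slide):

i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i)            % input gate: which parts of the candidate state to write
f_t = \sigma(W_f h_{t-1} + U_f x_t + b_f)            % forget gate: which parts of s_{t-1} to keep
\tilde{s}_t = \sigma(W h_{t-1} + U x_t + b)          % candidate state computed from the current input
s_t = f_t \odot s_{t-1} + i_t \odot \tilde{s}_t      % forget part of the old state, write part of the new one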
LSTM (Long Short-Term Memory)
• The LSTM cell maintains two separate memories: a long-term memory (the cell state) and a short-term memory (the hidden state h(t)).
• LSTM has many variants, which differ in the number of gates and in how the gates are arranged.
LSTM Cell
Source: Aurélien Géron. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. " O'Reilly Media, Inc.", 2019.
LSTM computations
• i(t) = σ(Wxiᵀ x(t) + Whiᵀ h(t–1) + bi)
• f(t) = σ(Wxfᵀ x(t) + Whfᵀ h(t–1) + bf)
• o(t) = σ(Wxoᵀ x(t) + Whoᵀ h(t–1) + bo)
• g(t) = tanh(Wxgᵀ x(t) + Whgᵀ h(t–1) + bg)
• c(t) = f(t) ⊗ c(t–1) + i(t) ⊗ g(t)
• y(t) = h(t) = o(t) ⊗ tanh(c(t))
Here i(t), f(t), and o(t) are the input, forget, and output gate controllers, and g(t) is the main (candidate) layer, where:
• Wxi, Wxf, Wxo, Wxg are the weight matrices of each of the four layers for their connection to the input vector x(t).
• Whi, Whf, Who, and Whg are the weight matrices of each of the four layers for their connection to the previous short-term state h(t–1).
• bi, bf, bo, and bg are the bias terms for each of the four layers.
Source: Aurélien Géron. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. " O'Reilly Media, Inc.", 2019.
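As a complement, here is a minimal NumPy sketch of a single LSTM cell step implementing the computation above (an illustrative reimplementation with assumed shapes and names, not the Keras internals):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    # W_x: (input_dim, 4*units), W_h: (units, 4*units), b: (4*units,)
    # The four blocks correspond to the i, f, o and g layers.
    z = x_t @ W_x + h_prev @ W_h + b
    zi, zf, zo, zg = np.split(z, 4)
    i, f, o = sigmoid(zi), sigmoid(zf), sigmoid(zo)   # input, forget, output gates
    g = np.tanh(zg)                                   # candidate update of the long-term state
    c_t = f * c_prev + i * g                          # new long-term state c(t)
    h_t = o * np.tanh(c_t)                            # new short-term state h(t), also the output y(t)
    return h_t, c_t

# Toy usage with assumed sizes: 7 time steps of 3-dimensional inputs, 5 units.
rng = np.random.default_rng(0)
input_dim, units = 3, 5
W_x = rng.normal(size=(input_dim, 4 * units))
W_h = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)
h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(7, input_dim)):
    h, c = lstm_step(x_t, h, c, W_x, W_h, b)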
Gated Recurrent Unit (GRU)
Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras
• Gates and states: see the sketch below.
• No explicit forget gate (the forget gate and the input gate are tied).
• The gates depend directly on st-1, and not on the intermediate ht-1 as in the case of LSTMs.
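The gate and state equations appear only in the slide figure; a sketch following the CS7015 GRU formulation referenced above (the parameter names are assumptions):

o_t = \sigma(W_o s_{t-1} + U_o x_t + b_o)                  % gate over the previous state
i_t = \sigma(W_i s_{t-1} + U_i x_t + b_i)                  % input (update) gate
\tilde{s}_t = \sigma(W (o_t \odot s_{t-1}) + U x_t + b)    % candidate state
s_t = (1 - i_t) \odot s_{t-1} + i_t \odot \tilde{s}_t      % (1 - i_t) plays the forget role

Both observations above follow directly from this sketch: the gates are computed from s_{t-1} rather than an intermediate h_{t-1}, and the forget behaviour is tied to the input gate through (1 - i_t), so no separate forget gate is needed.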
Gated Recurrent Unit (GRU) Cell (Kyunghyun Cho et al., 2014)
The main simplifications relative to the LSTM cell are:
• Both state vectors (short-term and long-term) are merged into a single vector h(t) (see the sketch below).
Source: Aurélien Géron. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. " O'Reilly Media, Inc.", 2019.
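A matching NumPy sketch of one GRU step, following the formulation in Géron's book that this slide cites (illustrative parameter names and shapes, not the Keras internals):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wxz, Whz, bz, Wxr, Whr, br, Wxg, Whg, bg):
    z = sigmoid(x_t @ Wxz + h_prev @ Whz + bz)        # update gate: plays both the forget and input roles
    r = sigmoid(x_t @ Wxr + h_prev @ Whr + br)        # reset gate: which parts of h_prev are exposed to the main layer
    g = np.tanh(x_t @ Wxg + (r * h_prev) @ Whg + bg)  # candidate state
    return z * h_prev + (1.0 - z) * g                 # single merged state vector h(t)

Note that h(t) is a convex combination of the previous state and the candidate state, which is concretely what merging the two state vectors and tying the forget/input gates means.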
LSTM vs GRU computation
• The gates prevent irrelevant information from being written to the state.
• It is easy to see that during the backward pass the gradients get multiplied by the gate values.
• If the state at time t-1 did not contribute much to the state at time t, then during backpropagation the gradients flowing into st-1 will vanish.
• The key difference from vanilla RNNs is that the flow of information and gradients is controlled by the gates, which ensure that the gradients vanish only when they should.
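To make the gradient argument concrete, consider the gated state update s_t = f_t \odot s_{t-1} + i_t \odot \tilde{s}_t introduced earlier (a sketch that ignores the indirect dependence of the gates themselves on s_{t-1}):

\frac{\partial s_t}{\partial s_{t-1}} = \mathrm{diag}(f_t)
\quad\Rightarrow\quad
\nabla_{s_{t-1}} L = f_t \odot \nabla_{s_t} L

If f_t \approx 1, the gradient flows back essentially unchanged; if f_t \approx 0, the network chose to forget s_{t-1}, and the gradient flowing into it vanishes, which is exactly when it should.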
Different RNNs
• Vanilla RNNs
Deep RNNs
Source: Deep Recurrent Neural Networks — Dive into Deep Learning 1.0.0-alpha1.post0 documentation (d2l.ai)
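A minimal tf.keras sketch of a deep (stacked) RNN with assumed layer sizes and a univariate sequence input (an illustration, not code from the slides):

from tensorflow import keras

# Every recurrent layer except the last returns full sequences, so that the
# next recurrent layer receives one input per time step.
model = keras.Sequential([
    keras.layers.GRU(32, return_sequences=True, input_shape=[None, 1]),
    keras.layers.GRU(32, return_sequences=True),
    keras.layers.GRU(32),                 # last layer returns only its final output
    keras.layers.Dense(1)
])
model.compile(loss="mse", optimizer="adam")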
Bi-Directional RNNs: Intuition
• The output at the third time step (where the input is the word “apple”) depends only on the previous two inputs.
Bi-Directional RNNs
• Adding an additional backward layer with connections as shown above makes the output at a time step depend on both previous and future inputs (see the sketch below).
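In tf.keras this backward layer is added with the Bidirectional wrapper; a minimal sketch with assumed sizes:

from tensorflow import keras

# The wrapper runs one copy of the layer left-to-right and another right-to-left,
# then concatenates their outputs at each time step.
model = keras.Sequential([
    keras.layers.Bidirectional(keras.layers.LSTM(16, return_sequences=True),
                               input_shape=[None, 8]),
    keras.layers.Dense(1)
])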
Bi-directional RNNs
▪ Example: filling the blank in “I am ___ hungry.” benefits from context on both the left and the right of the blank.
Source: Bidirectional Recurrent Neural Networks — Dive into Deep Learning 1.0.0-alpha1.post0 documentation (d2l.ai)
Bi-directional RNN computation
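The equations that the symbol definitions below refer to are shown only as a figure in the slides; a reconstruction following the d2l.ai notation (the forward/backward superscripts f and b are added here for clarity and are an assumption about the slide's exact form):

\overrightarrow{H}_t = A\left(X_t W_{xh}^{(f)} + \overrightarrow{H}_{t-1} W_{hh}^{(f)} + b_h^{(f)}\right)
\overleftarrow{H}_t = A\left(X_t W_{xh}^{(b)} + \overleftarrow{H}_{t+1} W_{hh}^{(b)} + b_h^{(b)}\right)
O_t = [\overrightarrow{H}_t, \overleftarrow{H}_t]\, W_{hq} + b_q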
where A = activation function, W = weight matrix, and b = bias.
Generating Shakespearean Text using a Character RNN
▪ Stateless RNNs
  ▪ At each training iteration the model starts with a hidden state full of zeros
  ▪ It updates this state at each time step
  ▪ It discards the final state when moving on to the next training batch
▪ Stateful RNNs
  ▪ Use sequential, non-overlapping input sequences
  ▪ Preserve the final state after processing one training batch and use it as the initial state for the next training batch
  ▪ The model can learn long-term patterns despite only backpropagating through short sequences (see the Keras sketch below)
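A minimal tf.keras sketch of the stateful setup described above, in the spirit of Géron's character-RNN example (vocabulary size, layer sizes, and batch size are assumed values, and the tf.keras 2.x API used in the cited book is assumed):

from tensorflow import keras

vocab_size = 39      # assumed number of distinct characters
batch_size = 32      # a stateful RNN needs a fixed batch size

model = keras.Sequential([
    keras.layers.Embedding(vocab_size, 16, batch_input_shape=[batch_size, None]),
    # stateful=True: the final state after one batch becomes the initial state
    # for the next batch instead of being reset to zeros.
    keras.layers.GRU(128, return_sequences=True, stateful=True),
    keras.layers.Dense(vocab_size, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

# The batches must be sequential and non-overlapping, and the states should be
# reset at the start of each epoch, e.g. with a small callback:
class ResetStatesCallback(keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        self.model.reset_states()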