
LSTM & GRU

DSE 3151 DEEP LEARNING

Dr. Rohini Rao & Dr. Abhilash K Pai


Dept. of Data Science and Computer Applications
MIT Manipal
LSTM and GRU : Introduction

• The state (s_t) of an RNN records information from all previous time steps.

• At each new time step, the old information gets morphed by the current input.

• After t steps, the information stored at time step t-k (for some k < t) gets completely morphed.

• It would be impossible to extract the original information stored at time step t-k.

• Also, there is the vanishing gradients problem!

Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras
LSTM and GRU : Introduction

The whiteboard analogy:

• Consider a scenario where we have to evaluate an expression on a whiteboard:

  Evaluate "ac(bd + a) + ad"
  given that a = 1, b = 3, c = 5, d = 11

• Normally, the evaluation on the whiteboard would look like:

  ac = 5
  bd = 33
  bd + a = 34
  ac(bd + a) = 170
  ad = 11
  ac(bd + a) + ad = 181

• Now, if the whiteboard has space to accommodate only 3 steps, the above evaluation cannot fit in the required space and would lead to loss of information.

Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras
LSTM and GRU : Introduction

• A solution is to be selective about what gets written to, read from, and erased from the whiteboard while evaluating "ac(bd + a) + ad", given that a = 1, b = 3, c = 5, d = 11:

• Selectively write: record only the intermediate results that will be needed later:

  ac = 5
  bd = 33

• Selectively read: read ac = 5 and bd = 33 to compute and write the next result:

  bd + a = 34

  Now the board is full (it holds ac = 5, bd = 33 and bd + a = 34).

• So, selectively forget: before writing each new result, erase whatever is no longer needed:

  forget bd = 33, write ac(bd + a) = 170      → board: ac = 5, bd + a = 34, ac(bd + a) = 170
  forget bd + a = 34, write ad = 11           → board: ac = 5, ac(bd + a) = 170, ad = 11
  forget ac = 5, write ac(bd + a) + ad = 181  → board: ac(bd + a) + ad = 181, ac(bd + a) = 170, ad = 11

Since the RNN also has a finite state size, we need to figure out a way to allow it to selectively read, write and forget.

Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras
LSTM and GRU : Introduction

Example: predicting the sentiment of a review

• The RNN reads the document from left to right and updates the state after every word.

• By the time we reach the end of the document, the information obtained from the first few words is completely lost.

• In our improvised network, ideally, we would like to:

  • Forget the information added by stop words (a, the, etc.)

  • Selectively read the information added by previous sentiment-bearing words (awesome, amazing, etc.)

  • Selectively write new information from the current word to the state.

Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras
LSTM and GRU : Introduction

Selectively write:

• In an RNN, the state s_t is computed as:

  s_t = σ(W s_t-1 + U x_t + b)

• Instead of passing s_t-1 as it is, we want to pass (write) only some portions of it to the next time step.

• To do this, we introduce a vector o_t-1 which decides what fraction of each element of s_t-1 should be passed to the next state. Each element of o_t-1 (restricted to be between 0 and 1) gets multiplied with s_t-1:

  h_t-1 = o_t-1 ⊙ σ(s_t-1)

• How does the RNN know what fraction of the state to pass on? It has to learn o_t-1 along with the other parameters (W, U, V):

  o_t-1 = σ(W_o h_t-2 + U_o x_t-1 + b_o)

  The new parameters to be learned are W_o, U_o and b_o.

• o_t is called the output gate, as it decides how much to pass (write) to the next time step.

Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras
LSTM and GRU : Introduction

Selectively read:

• We now use h_t-1 and x_t to compute a candidate new state at time step t:

  s̃_t = σ(W h_t-1 + U x_t + b)

• Again, to pass only useful information from s̃_t to s_t, we selectively read from it before constructing the new cell state.

• To do this, we introduce another gate, called the input gate:

  i_t = σ(W_i h_t-1 + U_i x_t + b_i)

• And use i_t ⊙ s̃_t to selectively read the information.

Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras
LSTM and GRU : Introduction

Selectively forget:

• How do we combine s_t-1 and i_t ⊙ s̃_t to get the new state?

• We may not want to use the whole of s_t-1, but forget some parts of it. To do this, a forget gate is introduced:

  f_t = σ(W_f h_t-1 + U_f x_t + b_f)

• The new state is then:

  s_t = f_t ⊙ s_t-1 + i_t ⊙ s̃_t
  h_t = o_t ⊙ σ(s_t)

Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras
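To make the three selective operations concrete, here is a minimal numeric sketch (not from the original slides). The gate values are hand-picked fractions between 0 and 1 purely for illustration; in the actual cell they are produced by the sigmoid expressions above, and tanh is used here as the squashing nonlinearity.

    import numpy as np

    s_prev  = np.array([0.9, -0.4, 0.7])   # s_t-1 : previous cell state (size 3 for illustration)

    # selectively write: the output gate decides what fraction of each element leaves the cell
    o_prev  = np.array([1.0, 0.2, 0.0])    # pass 100%, 20% and 0% of the corresponding elements
    h_prev  = o_prev * np.tanh(s_prev)     # h_t-1 = o_t-1 * tanh(s_t-1)

    # selectively read: the input gate decides how much of the candidate state to take in
    s_tilde = np.array([0.5, 0.8, -0.3])   # candidate state (in the cell, computed from h_t-1 and x_t)
    i_t     = np.array([0.9, 0.0, 1.0])
    read    = i_t * s_tilde

    # selectively forget: the forget gate decides how much of the old state to keep
    f_t     = np.array([0.0, 1.0, 0.5])
    s_t     = f_t * s_prev + read          # s_t = f_t * s_t-1 + i_t * s~_t
    print(s_t)                             # [ 0.45 -0.4   0.05]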
LSTM (Long Short-Term Memory)

[Figure: the complete LSTM cell, showing the long-term memory (cell state) path and the short-term memory (hidden state) path through the gates.]

Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras
LSTM (Long Short-Term Memory)

• The LSTM has many variants, which differ in the number of gates and in how the gates are arranged.

• A popular variant of the LSTM is the Gated Recurrent Unit (GRU).

Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras
LSTM Cell

▪ The neuron is called a cell.

▪ FC denotes fully connected layers.

▪ The long-term state c(t-1) traverses the cell through a forget gate, dropping some memories, and then some new memories are added, giving c(t).

▪ The long-term state c(t) is also passed through tanh and filtered by an output gate, which produces the short-term state h(t).

▪ The main layer g(t) takes the current input x(t) and the previous short-term state h(t-1).

▪ The important parts of the output of g(t) go into the long-term state.

▪ Note: c_t here is the same as s_t in the earlier notation.

Source: Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, O'Reilly Media, Inc., 2019.
LSTM Cell

▪ Gating mechanism: regulates the information that the network stores.

▪ The other 3 layers are gate controllers:

  ▪ The forget gate f(t) controls which parts of the long-term state should be erased.

  ▪ The input gate i(t) controls which parts of g(t) should be added to the long-term state.

  ▪ The output gate o(t) controls which parts of the long-term state should be read and output at this time step, both to h(t) and to y(t).

Source: Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, O'Reilly Media, Inc., 2019.
LSTM computations

An LSTM cell can learn to recognize an important input (role of the input gate), store it in the long-term state, preserve it for as long as it is needed (role of the forget gate), and extract it whenever it is needed.

  i(t) = σ(Wxi x(t) + Whi h(t-1) + bi)
  f(t) = σ(Wxf x(t) + Whf h(t-1) + bf)
  o(t) = σ(Wxo x(t) + Who h(t-1) + bo)
  g(t) = tanh(Wxg x(t) + Whg h(t-1) + bg)
  c(t) = f(t) ⊙ c(t-1) + i(t) ⊙ g(t)
  y(t) = h(t) = o(t) ⊙ tanh(c(t))

where:

• Wxi, Wxf, Wxo, Wxg are the weight matrices of each of the four layers for their connection to the input vector x(t).

• Whi, Whf, Who, and Whg are the weight matrices of each of the four layers for their connection to the previous short-term state h(t-1).

• bi, bf, bo, and bg are the bias terms for each of the four layers.

Source: Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, O'Reilly Media, Inc., 2019.
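As an illustration (not part of the original slides), the six equations above can be transcribed directly into NumPy. The toy sizes and random parameter values are assumptions made purely so the sketch runs; a real layer would learn these parameters by backpropagation.

    import numpy as np

    rng = np.random.default_rng(0)
    n_inputs, n_units = 3, 5                           # toy sizes

    def dense_params():
        """Input weights, recurrent weights and bias for one FC layer of the cell."""
        return (rng.normal(scale=0.1, size=(n_inputs, n_units)),
                rng.normal(scale=0.1, size=(n_units, n_units)),
                np.zeros(n_units))

    (Wxi, Whi, bi), (Wxf, Whf, bf), (Wxo, Who, bo), (Wxg, Whg, bg) = (dense_params() for _ in range(4))

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev):
        i_t = sigmoid(x_t @ Wxi + h_prev @ Whi + bi)   # input gate
        f_t = sigmoid(x_t @ Wxf + h_prev @ Whf + bf)   # forget gate
        o_t = sigmoid(x_t @ Wxo + h_prev @ Who + bo)   # output gate
        g_t = np.tanh(x_t @ Wxg + h_prev @ Whg + bg)   # main (candidate) layer
        c_t = f_t * c_prev + i_t * g_t                 # new long-term state
        h_t = o_t * np.tanh(c_t)                       # new short-term state, also the output y(t)
        return h_t, c_t

    h, c = np.zeros(n_units), np.zeros(n_units)
    for x_t in rng.normal(size=(4, n_inputs)):         # run over a toy sequence of 4 time steps
        h, c = lstm_step(x_t, h, c)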
Gated Recurrent Unit (GRU)

Gates:

  o_t = σ(W_o s_t-1 + U_o x_t + b_o)
  i_t = σ(W_i s_t-1 + U_i x_t + b_i)

States:

  s̃_t = σ(W (o_t ⊙ s_t-1) + U x_t + b)
  s_t = (1 - i_t) ⊙ s_t-1 + i_t ⊙ s̃_t

• There is no explicit forget gate (the forget gate and the input gate are tied: keeping the fraction (1 - i_t) of the old state plays the role of forgetting).

• The gates depend directly on s_t-1, and not on the intermediate h_t-1 as in the case of LSTMs.

Source: CS7015 Deep Learning, Dept. of CSE, IIT Madras
Gated Recurrent Unit Cell (Kyunghyun Cho et al., 2014)

The main simplifications compared to the LSTM cell are:

• Both state vectors (short-term and long-term) are merged into a single vector h(t).

• A single gate controller z(t) controls both the forget gate and the input gate:
  • If the gate controller outputs 1, the forget gate is open and the input gate is closed.
  • If it outputs 0, the opposite happens.
  • In other words, whenever a memory must be written, the location where it will be stored is erased first.

• There is no output gate; the full state vector is output at every time step.

• A reset gate controller r(t) controls which part of the previous state is shown to the main layer g(t).

Source: Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, O'Reilly Media, Inc., 2019.
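A minimal NumPy sketch of this cell (an illustration, not code from the book): it follows the convention described above, in which z(t) = 1 keeps the old state and z(t) = 0 replaces it with the output of the main layer. The sizes and random parameters are assumptions so the snippet runs.

    import numpy as np

    rng = np.random.default_rng(1)
    n_inputs, n_units = 3, 5                                  # toy sizes

    def layer_params():
        """Input weights, recurrent weights and bias for one FC layer of the cell."""
        return (rng.normal(scale=0.1, size=(n_inputs, n_units)),
                rng.normal(scale=0.1, size=(n_units, n_units)),
                np.zeros(n_units))

    (Wxz, Whz, bz), (Wxr, Whr, br), (Wxg, Whg, bg) = (layer_params() for _ in range(3))

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def gru_step(x_t, h_prev):
        z_t = sigmoid(x_t @ Wxz + h_prev @ Whz + bz)          # controls both "forget" and "input"
        r_t = sigmoid(x_t @ Wxr + h_prev @ Whr + br)          # reset gate
        g_t = np.tanh(x_t @ Wxg + (r_t * h_prev) @ Whg + bg)  # main layer sees only part of h_t-1
        h_t = z_t * h_prev + (1.0 - z_t) * g_t                # z=1 keeps the old state, z=0 writes g_t
        return h_t                                            # no output gate: h_t is also the output

    h = np.zeros(n_units)
    for x_t in rng.normal(size=(4, n_inputs)):                # a toy sequence of 4 time steps
        h = gru_step(x_t, h)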
LSTM vs GRU computation

• GRU performance is good, but there may be a slight dip in accuracy compared to the LSTM.

• However, the GRU has fewer trainable parameters, which makes it advantageous to use.
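A quick way to see the difference in parameter count is to build one layer of each kind in Keras and count their weights. This is an illustrative check, not part of the slides; the layer sizes are arbitrary.

    import numpy as np
    import tensorflow as tf

    n_inputs, n_units = 8, 32
    dummy = np.zeros((1, 10, n_inputs), dtype="float32")   # one sequence of 10 time steps

    lstm = tf.keras.layers.LSTM(n_units)
    gru = tf.keras.layers.GRU(n_units)
    lstm(dummy)                                            # calling the layers builds their weights
    gru(dummy)

    print("LSTM parameters:", lstm.count_params())         # 4 * (8*32 + 32*32 + 32) = 5248
    print("GRU parameters: ", gru.count_params())          # roughly 3/4 of the LSTM: 3 layers instead of 4
                                                           # (exact value depends on the GRU variant,
                                                           #  e.g. Keras' reset_after option)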
Avoiding vanishing gradients with LSTMs: Intuition

• During forward propagation, the gates control the flow of information. They prevent any irrelevant information from being written to the state.

• Similarly, during backward propagation, the gates control the flow of gradients: during the backward pass the gradients get multiplied by the gate values.

• If the state at time step t-1 did not contribute much to the state at time step t (i.e., the forget gate was close to 0), then during backpropagation the gradients flowing into s_t-1 will vanish, which is exactly the desired behaviour.

• The key difference from vanilla RNNs is that the flow of information and gradients is controlled by the gates, which ensure that the gradients vanish only when they should.
Different RNNs

• Vanilla RNNs.

• Image captioning: image -> sequence of words.

• Sentiment classification: sequence of words -> sentiment.

• Machine translation: sequence of words -> sequence of words.

• Video classification at the frame level.
Deep RNNs

• RNNs that are deep not only in the time direction but also in the input-to-output direction: multiple recurrent layers are stacked, and each layer passes its sequence of outputs to the layer above.

Source: Deep Recurrent Neural Networks - Dive into Deep Learning documentation (d2l.ai)
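A minimal Keras sketch of a stacked (deep) RNN, added here as an illustration; the layer sizes and the final Dense head are arbitrary choices. Every recurrent layer except the last must return its full output sequence so that the layer above receives one vector per time step.

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.GRU(32, return_sequences=True),   # passes the whole sequence upwards
        tf.keras.layers.GRU(32, return_sequences=True),
        tf.keras.layers.GRU(32),                          # last recurrent layer: only the final state
        tf.keras.layers.Dense(1),                         # e.g. a single regression target per sequence
    ])
    model.compile(loss="mse", optimizer="adam")
    # model.fit(X_train, y_train, epochs=5)   # X_train shape: (batch, time_steps, n_features)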
Bi-Directional RNNs: Intuition

• In a plain (unidirectional) RNN, the output at the third time step (where the input is the word "apple") depends only on the previous two inputs.

• Adding an additional backward layer, with connections as shown in the figure, makes the output at a time step depend on both previous and future inputs.

Source: codebasics - YouTube
Bi-directional RNNs

▪ Example (speech recognition / filling in a missing word): the right word may depend on what comes after it:

  ▪ I am ___.
  ▪ I am ___ hungry.
  ▪ I am ___ hungry, and I can eat half a cake.

▪ Regular RNNs are causal: they look only at past and present inputs to generate an output.

▪ A bi-directional RNN uses 2 recurrent layers on the same inputs:

  ▪ One reading the words from left to right.
  ▪ Another reading the words from right to left.
  ▪ Their outputs are combined at each time step.

Source: Bidirectional Recurrent Neural Networks - Dive into Deep Learning documentation (d2l.ai)
Bi-directional RNN computation

  H_t(forward)  = A(X_t * W_XH(forward)  + H_t-1(forward)  * W_HH(forward)  + b_H(forward))

  H_t(backward) = A(X_t * W_XH(backward) + H_t+1(backward) * W_HH(backward) + b_H(backward))

where A is the activation function, W denotes a weight matrix and b a bias.

The output at any given time step is:

  Y_t = H_t * W_AY + b_y

where H_t is the concatenation of H_t(forward) and H_t(backward).

Source: Bidirectional Recurrent Neural Network - GeeksforGeeks
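In practice this forward/backward pair does not have to be written by hand; for instance, Keras provides a Bidirectional wrapper. The sketch below is an illustration with arbitrary layer sizes, set up for a sequence-level prediction such as sentiment.

    import tensorflow as tf

    model = tf.keras.Sequential([
        # each Bidirectional layer runs one copy of the wrapped RNN left-to-right and another
        # right-to-left, then concatenates their outputs, so its output size is 2 * units
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
        tf.keras.layers.Dense(1, activation="sigmoid"),    # e.g. sentiment of the whole sequence
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam")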
Generating Shakespearean Text using a Character RNN

▪ "The Unreasonable Effectiveness of Recurrent Neural Networks" - Andrej Karpathy (2015).

▪ Chop the sequential dataset into multiple windows.

▪ A 3-layer RNN with 512 hidden nodes on each layer.

▪ The Char-RNN was trained on Shakespeare's work and used to generate novel text, one character at a time:

  PANDARUS:
  Alas, I think he shall be come approached and the day
  When little srain would be attain'd into being never fed,
  And who is but a chain and subjects of his death,
  I should not sleep.
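A sketch of the "chop into windows" step using the tf.data API (an illustration, not the exact course code). It assumes `encoded` is a 1-D array of integer character IDs produced by some tokenizer; each window of 101 characters yields 100 input characters and the 100 corresponding next-character targets.

    import tensorflow as tf

    n_steps = 100
    window_length = n_steps + 1                       # inputs plus one shifted target character

    dataset = tf.data.Dataset.from_tensor_slices(encoded)                  # `encoded`: integer char IDs
    dataset = dataset.window(window_length, shift=1, drop_remainder=True)  # overlapping windows
    dataset = dataset.flat_map(lambda w: w.batch(window_length))           # nested datasets -> tensors
    dataset = dataset.shuffle(10_000).batch(32)
    dataset = dataset.map(lambda window: (window[:, :-1], window[:, 1:]))  # X = chars, y = next chars
    dataset = dataset.prefetch(tf.data.AUTOTUNE)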
Stateful RNN

▪ Stateless RNNs:
  ▪ At each training iteration the model starts with a hidden state full of 0s.
  ▪ It updates this state at each time step.
  ▪ It discards the final state when moving on to the next training batch.

▪ Stateful RNNs:
  ▪ Use sequential, non-overlapping input sequences.
  ▪ Preserve the final state after processing one training batch and use it as the initial state for the next training batch.
  ▪ This way the model can learn long-term patterns despite only backpropagating through short sequences.
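A minimal sketch of a stateful character model, in the tf.keras 2 style used in Géron's book; the layer sizes, batch size and vocabulary size are placeholder values. The batch size must be fixed via batch_input_shape, because sequence i of one batch must continue sequence i of the previous batch, and the preserved states should be reset whenever the text starts over (e.g. at the start of each epoch).

    import tensorflow as tf

    batch_size, n_tokens = 32, 39          # placeholder batch size and vocabulary size

    model = tf.keras.Sequential([
        tf.keras.layers.GRU(128, return_sequences=True, stateful=True,
                            batch_input_shape=[batch_size, None, n_tokens]),
        tf.keras.layers.GRU(128, return_sequences=True, stateful=True),
        tf.keras.layers.Dense(n_tokens, activation="softmax"),   # next-character probabilities
    ])

    class ResetStatesCallback(tf.keras.callbacks.Callback):
        """Reset the preserved states at the beginning of every epoch."""
        def on_epoch_begin(self, epoch, logs=None):
            self.model.reset_states()

    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    # model.fit(stateful_dataset, epochs=10, callbacks=[ResetStatesCallback()])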
