GROUP19_EEE_PAPER
SUBMITTED BY
BATCH B GROUP 19
B.Nikitha (CB.SC.U4AIE23119)
K.Bhanu Prakash(CB.SC.U4AIE23134)
M.Sravanthi Suma (CB.SC.U4AIE23148)
M.Kavya Srihitha (CB.SC.U4AIE23167)
CONTENTS
1. ACKNOWLEDGEMENT
2. ABSTRACT
3. INTRODUCTION
4. BACKGROUND THEORY
5. METHODOLOGY
6. RESULTS AND DISCUSSION
7. REFERENCES
8. CONCLUSION
ACKNOWLEDGEMENT
PROJECT TEAM:
1.B.NIKHITA – CB.SC.U4AIE23119
2.K.BHANU PRAKASH – CB.SC.U4AIE23134
3.M.SRAVANTHI SUMA – CB.SC.U4AIE23148
4.M.KAVYA SRIHITHA – CB.SC.U4AIE23167
ABSTRACT
Neural Machine Translation (NMT) has seen a tremendous spurt of growth in less
than ten years and has already entered a mature phase. While it is the most widely
used solution for machine translation, its performance on low-resource language
pairs remains sub-optimal compared to high-resource pairs, owing to the
unavailability of large parallel corpora. The implementation of NMT techniques for
low-resource language pairs has therefore been receiving the spotlight in recent NMT
research, leading to a substantial amount of work reported on this topic. This paper
presents a detailed survey of research advancements in low-resource language NMT
(LRL-NMT), along with a quantitative analysis aimed at identifying the most popular
solutions. Based on our findings from reviewing previous work, this survey provides
a set of guidelines for selecting a suitable NMT technique.
The neural translation itself has been implemented using various neural network
architectures. We used these four modules: a GRU model, an LSTM with an embedding
layer, a bidirectional LSTM, and an encoder-decoder (sequence-to-sequence) model.
INTRODUCTION
BACKGROUND THEORY
The differences between an RNN and a feedforward neural network are as follows:
Feedforward neural networks are artificial neural networks that contain no looping
(recurrent) connections. This kind of network is also called a multi-layer neural
network because information can only be passed forward. Data flows in a feedforward
ANN from the input layer to the output layer, passing through hidden layers if they
exist. These networks are suitable for problems in which each input is independent
of the others, for instance image classification. However, because they cannot
memorize previous inputs, they are less effective for sequential data analysis. A
recurrent neural network, by contrast, feeds its hidden state back into itself at
every time step, so earlier inputs can influence later outputs; this is what makes
RNNs suitable for sequence tasks such as translation.
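As a minimal illustration of this structural difference, here is a Keras sketch with made-up layer sizes; it is not the project's actual network, only a contrast of the two architectures.

# Contrasting a feedforward network with a recurrent one (TensorFlow/Keras
# assumed; the layer sizes are made up for illustration only).
from tensorflow import keras
from tensorflow.keras import layers

# Feedforward network: one fixed-size input vector, no memory between inputs.
feedforward = keras.Sequential([
    keras.Input(shape=(100,)),            # a single flat feature vector
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Recurrent network: a sequence of vectors; the hidden state is carried
# from one time step to the next, so earlier inputs affect later outputs.
recurrent = keras.Sequential([
    keras.Input(shape=(None, 100)),       # (time steps, features per step)
    layers.SimpleRNN(64),                 # final hidden state summarizes the sequence
    layers.Dense(10, activation="softmax"),
])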
METHODOLOGY
The main aim of the project is Tamil to English translation using neural networks.
Models:
1. Gated Recurrent Unit (GRU)
For detailed analysis, the fundamental concept of the GRU is the use of gating
mechanisms to maintain and control the hidden state of the network at each time
step. The gates decide how much information is kept, discarded, or shared within
the network. To limit how much memory is carried over from step to step and to keep
the gradients well behaved, the GRU has two gating mechanisms, the reset gate and
the update gate, together with a current memory (candidate) computation.
The reset gate defines how much of the previous hidden state to forget, whereas the
update gate dictates how much of the new candidate information, as opposed to the
previous state, goes into the present hidden state. The output of the GRU at each
step is then computed from the updated hidden state h_t.
The equations used to calculate the reset gate, update gate, and hidden state of a
GRU are as follows:
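The original figures showing these equations are not reproduced here; a standard formulation, consistent with the gate descriptions in this section (notation may differ slightly from the source figures), is:

r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)
z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)
\bar{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)
h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \bar{h}_t

where x_t is the current input, h_{t-1} the previous hidden state, \sigma the sigmoid function, and \odot the element-wise (Hadamard) product.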
Prerequisites: A common type of recurrent neural network is the Long Short Term
Memory (LSTM) network.
Because a basic recurrent neural network often suffers from the vanishing/exploding
gradients problem during training, improved variants were introduced. One of the
most popular is the Long Short Term Memory network (LSTM). Another variant, not
quite as well known but nearly as effective, is the Gated Recurrent Unit (GRU)
network. The GRU cell has only the gates described below and, unlike the LSTM, it
does not possess a separate internal cell state.
The information that the LSTM recurrent unit stores in its internal cell state is
instead folded into the hidden state of the Gated Recurrent Unit, and this combined
information is passed on to the next Gated Recurrent Unit in the sequence. The
different gates of a GRU are described below:
Update Gate (z): It defines how much of the prior knowledge has to be carried
forward into the future. It plays a role similar to that of the Output Gate in an
LSTM recurrent unit.
Reset Gate (r): It determines how much of the past knowledge to forget, broadly
analogous to the combined action of the Input and Forget Gates in an LSTM unit.
Current Memory Gate (h̄): It is used while computing the candidate hidden state;
through the reset gate it decides how much of the previous hidden state is mixed
with the parameterized current input.
Working of a Gated Recurrent Unit:
The current input vector and the previous hidden state are taken as inputs, and the
values of the update gate, the reset gate, and the current memory (candidate) state
are calculated by the following steps:
1. For each gate, form a parameterized version of the current input vector and of
the previous hidden state vector by multiplying each of them with the respective
weights for that gate.
2. Apply the respective activation function element-wise to the parameterized
vectors. The activation function used for each gate is:
Update gate: sigmoid
Reset gate: sigmoid
Current memory (candidate) state: tanh
Update Gate: the sigmoid function, \sigma(x) = 1 / (1 + e^{-x}), squashes any real
input into the range (0, 1), so the gate value can be read as the fraction of past
information to carry forward; its derivative approaches 0 at both ends of the
sigmoid curve, which is why the gate saturates for large positive or negative
inputs.
Reset Gate: the sigmoid function is used here as well, so the reset value also lies
between 0 and 1 and controls how strongly the previous hidden state is suppressed
when forming the current memory (candidate) state.
The calculation of the current memory (candidate) state is slightly different from
the procedure described above. First, the Hadamard product of the reset gate and
the previous hidden state vector is calculated. This vector is then parameterized
(multiplied by its weight matrix) and added to the parameterized current input
vector, after which the tanh activation is applied:
\bar{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)
To calculate the current hidden state, a vector of all ones with the same
dimensions as the hidden state is first defined; this vector is referred to as ones
and written mathematically as 1. First, take the element-wise (Hadamard) product of
the update gate and the previous hidden state vector. Then form a new vector by
subtracting the update gate from ones, giving (1 - z_t), and take the Hadamard
product of this new vector with the current memory state. Finally, sum the two
vectors to obtain the current hidden state:
h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \bar{h}_t
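As a small numeric illustration with made-up values: if z_t = (0.9, 0.2), h_{t-1} = (1.0, -0.5) and \bar{h}_t = (0.3, 0.7), then h_t = (0.9·1.0 + 0.1·0.3, 0.2·(-0.5) + 0.8·0.7) = (0.93, 0.46); the first component keeps mostly the old state, while the second is dominated by the new candidate.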
In the usual GRU diagram (not reproduced here), circles denote element-wise
multiplication, a positive sign inside a circle denotes vector addition, and a
negative sign denotes vector subtraction (addition of the negated vector). The
weight matrices W contain separate weights for the current input vector and for the
previous hidden state for each of the gates.
As with a plain recurrent neural network, a GRU network produces an output at each
time step, and this output is used to train the network with gradient descent
(backpropagation through time).
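To make the recurrence concrete, here is a small NumPy sketch of a single GRU step written directly from the equations above; the weight shapes and random values are purely illustrative and are not tied to the project's actual implementation.

# One GRU time step implemented from the equations above (NumPy sketch;
# weight shapes are illustrative, parameter values are random).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wr, Ur, br, Wz, Uz, bz, Wh, Uh, bh):
    r = sigmoid(Wr @ x_t + Ur @ h_prev + br)             # reset gate
    z = sigmoid(Wz @ x_t + Uz @ h_prev + bz)             # update gate
    h_bar = np.tanh(Wh @ x_t + Uh @ (r * h_prev) + bh)   # current memory (candidate) state
    return z * h_prev + (1.0 - z) * h_bar                # new hidden state

# Tiny example: input size 4, hidden size 3, random parameters.
rng = np.random.default_rng(0)
x_t, h_prev = rng.normal(size=4), np.zeros(3)
params = [rng.normal(size=s) for s in [(3, 4), (3, 3), 3, (3, 4), (3, 3), 3, (3, 4), (3, 3), 3]]
h_t = gru_step(x_t, h_prev, *params)
print(h_t.shape)   # (3,)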
2. LSTM with embedding layer
Embedding LSTM layers
The term "embedding LSTM layers" refers to the integration of two important
building blocks of deep learning: embedding layers and LSTM layers. The embedding
layer gives each word a dense vector representation, and the LSTM layers keep track
of past information while deciding which word to generate next.
This combination is used in deep neural networks for sequential data such as the
word sequences in this translation task. Let's break down these two concepts:
Embedding Layers:
• An embedding layer converts discrete values, such as word indices, object labels,
or other categorical tokens, into continuous real-valued vectors. For instance,
each word of a Tamil or English sentence is represented as a learned real-valued
vector rather than a sparse one-hot index.
LSTM Layers:
• An LSTM (Long Short-Term Memory) layer processes the embedded sequence one time
step at a time. Its input, forget, and output gates control what is written to,
kept in, and read from its internal cell state, which lets the network remember
context from much earlier in the sentence while predicting the next word.
Combining Embedding and LSTM Layers:
In this model the two are simply stacked: the embedding layer turns the word
indices of a sentence into dense vectors, and the LSTM reads those vectors in order
while carrying its hidden state forward, as sketched below.
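A minimal sketch of this combination, assuming a TensorFlow/Keras setup in the spirit of the referenced example code; the vocabulary sizes and embedding dimension are hypothetical placeholders, not the project's actual values.

# LSTM with an embedding layer (TensorFlow/Keras assumed; sizes are hypothetical).
from tensorflow import keras
from tensorflow.keras import layers

SRC_VOCAB = 20000   # hypothetical Tamil vocabulary size
TGT_VOCAB = 15000   # hypothetical English vocabulary size

inputs = keras.Input(shape=(None,))                              # padded sequences of word indices
x = layers.Embedding(SRC_VOCAB, 256, mask_zero=True)(inputs)     # word index -> dense 256-d vector
x = layers.LSTM(256, return_sequences=True)(x)                   # hidden/cell state carried across steps
outputs = layers.TimeDistributed(
    layers.Dense(TGT_VOCAB, activation="softmax"))(x)            # word distribution at each position
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")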
3. Bidirectional LSTM
In this section we first describe bidirectional LSTMs and how the architecture of
such networks operates, and then look at how a bidirectional LSTM uses context from
both directions of a sentence.
Consider two sentences that both contain the word "server", for example "The server
brought our food" and "The server crashed last night" (an illustrative pair). Read
strictly left to right, a regular RNN or LSTM treats the word "server" as the same
object in both sentences, because at that point it has only seen the words before
it.
That is not the case with a bidirectional LSTM: it also propagates information in
the backward direction, so the words that follow are taken into account as well. It
therefore captures the relationships within the sentence much better and gives
better results.
Architecture
A bidirectional LSTM consists of two unidirectional LSTMs through which the input
sequence is processed both forwards and in reverse. The architecture can be
interpreted simply as two LSTMs, the first receiving the sequence of tokens in
order and the second receiving it reversed, with their results then combined. Each
of the two LSTM networks provides an output (for example a probability vector) at
every time step, and the final output is a combination of the two, such as a
weighted sum or a concatenation. It can be represented as sketched below.
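A minimal Keras sketch of such a bidirectional model; the vocabulary sizes are hypothetical, and here the two directions are concatenated at each step (Keras's Bidirectional wrapper also supports summing them via merge_mode="sum"), with the softmax applied after the combination rather than separately per direction as in the description above.

# Bidirectional LSTM translation model (TensorFlow/Keras assumed; sizes hypothetical).
from tensorflow import keras
from tensorflow.keras import layers

SRC_VOCAB, TGT_VOCAB = 20000, 15000   # hypothetical vocabulary sizes

inputs = keras.Input(shape=(None,))
x = layers.Embedding(SRC_VOCAB, 256, mask_zero=True)(inputs)
x = layers.Bidirectional(
    layers.LSTM(256, return_sequences=True),
    merge_mode="concat")(x)            # forward and backward outputs joined at every time step
outputs = layers.TimeDistributed(
    layers.Dense(TGT_VOCAB, activation="softmax"))(x)
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")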
4. Encoders-Decoders Model
The main architecture used in machine translation models is the encoder-decoder, or
sequence-to-sequence, model.
What is an Encoder-Decoder?
1. Encoder: The encoder reads the input sequence one element at a time, typically
with an RNN such as an LSTM or GRU, and compresses everything it has seen into a
fixed-size context vector, which is its final hidden state.
2. Decoder: The decoder then uses the context vector produced by the encoder to
generate an output sequence. This can be a sequence of a different length, for
example the translated sentence or the reply of a chatbot. Just like the encoder,
the decoder can also be an RNN. Its first hidden state is the context vector, and
after that each element of the output sequence is produced through the usual
recurrent computation of the RNN. The decoding process is therefore conditioned on
the context vector and on the previously generated outputs.
Let us discuss how it works internally.
1. Encoder: Let the English input text be: "I love playing cricket".
The encoder takes the input sentence word by word and encodes it into a form the
decoder can work with. Each word is represented as a vector, e.g. using word
embeddings such as GloVe. The encoder, typically LSTM- or GRU-based, receives these
word vectors one after another and updates its hidden state at each time step,
finally producing a fixed-size vector.
That final hidden state is the context vector of the encoder, and it encapsulates
the entire input sentence.
2. Decoder: The decoder then uses the context vector created by the encoder to
generate the translated output sequence word by word. In this example the target is
the corresponding German translation (for instance "Ich liebe es, Cricket zu
spielen.").
Before decoding starts, the context vector is passed to the decoder as the initial
hidden state of the decoding process. The decoder then produces the output sequence
one word at a time, with each word predicted from the context vector and the
previously produced words.
For instance, at the first decoding step the decoder uses the context vector and
produces the word "Ich". The generated word "Ich" is fed back in at the next
decoding step to come up with the word "liebe". These steps are repeated until the
entire output sequence has been produced.
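The word-by-word generation just described can be written as a simple greedy decoding loop. This is a hedged pseudocode sketch: encoder_fn and decoder_step_fn are hypothetical callables wrapping the trained encoder and a single decoder step, and start_id / end_id stand for hypothetical start- and end-of-sentence token ids.

# Greedy decoding loop for inference (sketch with hypothetical callables).
import numpy as np

def greedy_decode(encoder_fn, decoder_step_fn, src_ids, start_id, end_id, max_len=25):
    state = encoder_fn(src_ids)                       # context vector = encoder's final state
    token, output = start_id, []
    for _ in range(max_len):
        probs, state = decoder_step_fn(token, state)  # distribution over target words
        token = int(np.argmax(probs))                 # greedily pick the most likely word
        if token == end_id:                           # stop once the sentence is finished
            break
        output.append(token)
    return output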
Training uses a procedure known as teacher forcing: the decoder is fed the correct
previous words during training, irrespective of its own earlier predictions. This
makes it possible to train the decoder to produce the right output sequence,
matching the intended translation.
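A minimal Keras sketch of such an encoder-decoder trained with teacher forcing; the vocabulary sizes and dimensions are hypothetical placeholders, and feeding the decoder the target sequence shifted right by one position is one common way to realize teacher forcing.

# Encoder-decoder with teacher forcing (TensorFlow/Keras assumed; sizes hypothetical).
from tensorflow import keras
from tensorflow.keras import layers

SRC_VOCAB, TGT_VOCAB, UNITS = 20000, 15000, 256   # hypothetical sizes

# Encoder: reads the source sentence and keeps only its final LSTM states,
# which together act as the context vector.
enc_inputs = keras.Input(shape=(None,))
enc_emb = layers.Embedding(SRC_VOCAB, UNITS, mask_zero=True)(enc_inputs)
_, state_h, state_c = layers.LSTM(UNITS, return_state=True)(enc_emb)
context = [state_h, state_c]

# Decoder: starts from the context and, during training, receives the correct
# previous target word at every step (teacher forcing).
dec_inputs = keras.Input(shape=(None,))
dec_emb = layers.Embedding(TGT_VOCAB, UNITS, mask_zero=True)(dec_inputs)
dec_seq = layers.LSTM(UNITS, return_sequences=True)(dec_emb, initial_state=context)
dec_preds = layers.Dense(TGT_VOCAB, activation="softmax")(dec_seq)

model = keras.Model([enc_inputs, dec_inputs], dec_preds)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# Training (shapes only): model.fit([src_ids, tgt_ids[:, :-1]], tgt_ids[:, 1:], ...)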
The key idea is that the encoder captures the essential information from the input
sentence and represents it in the context vector. The decoder then uses this context
vector to generate the corresponding output sequence, one word at a time. By
jointly training the encoder and decoder, the model learns to effectively encode and
decode the input-output relationship, enabling translation or other sequence
generation tasks.
• Encoder RNN: The encoder processes input that is sequential in nature, and its
hidden state is updated at each time step. It thereby captures the dependencies and
context of the whole input sequence.
• Context Vector: The context vector, which is also the final output of the encoder
RNN, is a summarized representation of the input sequence. The input embedding
layer, by contrast, only transforms the input sequence into vector representations
of the embedding size. The context vector is passed to the decoder component, which
uses it throughout the decoding procedure.
• Output Embedding Layer: Just like the input embedding layer, the output words are
expressed as dense vectors using embeddings; each word in the output sequence is
mapped to the vector associated with it.
RESULTS AND DISCUSSION
GRU
LSTM
BI-LSTM
REFERENCES
FOR MODELS:
https://www.geeksforgeeks.org/gated-recurrent-unit-networks/
https://www.natalieparde.com/teaching/cs_521_spring2020/LSTMs,%20GRUs,%20Encoder-Decoder%20Models,%20and%20Attention.pdf
DATA SET:
https://huggingface.co/datasets/Hemanth-thunder/en_ta
CODE:
https://github.com/Barqawiz/aind2-nlp-translation/tree/master
CONCLUSION