
UNIT 4

RECURRENT NEURAL NETWORK, LONG SHORT-TERM MEMORY, GATED RECURRENT UNIT, TRANSLATION, BEAM SEARCH AND WIDTH, BLEU SCORE, ATTENTION MODEL

Q.1. Explain in detail about recurrent neural network.

Ans. A recurrent neural network (RNN) is an extension of a conventional feedforward neural network which is able to handle a variable-length sequence input. The RNN handles the variable-length sequence by having a recurrent hidden state whose activation at each time is dependent on that of the previous time.

More formally, given a sequence x = (x_1, x_2, ..., x_T), the RNN updates its recurrent hidden state h_t by

    h_t = 0,                      if t = 0
    h_t = \phi(h_{t-1}, x_t),     otherwise        ...(i)

where \phi is a non-linear function such as the composition of a logistic sigmoid with an affine transformation. Optionally, the RNN may have an output y = (y_1, y_2, ..., y_T), which may again be of variable length.

Traditionally, the update of the recurrent hidden state in equation (i) is implemented as

    h_t = g(W x_t + U h_{t-1})        ...(ii)

where g is a smooth, bounded function such as a logistic sigmoid function or a hyperbolic tangent function.

A generative RNN outputs a probability distribution over the next element of the sequence, given its current state h_t, and this generative model can capture a distribution over sequences of variable length by using a special output symbol to represent the end of the sequence. The sequence probability can then be decomposed into

    p(x_1, ..., x_T) = p(x_1) p(x_2 | x_1) p(x_3 | x_1, x_2) ... p(x_T | x_1, ..., x_{T-1})        ...(iii)

where the last element is a special end-of-sequence value. We model each conditional probability distribution with

    p(x_t | x_1, ..., x_{t-1}) = g(h_t)

where h_t is from equation (i).

Q.2. Briefly explain the term long short-term memory unit.

Ans. The long short-term memory (LSTM) unit was initially proposed by Hochreiter and Schmidhuber. Since then, a number of minor modifications to the original LSTM unit have been made.

Unlike the recurrent unit, which simply computes a weighted sum of the input signal and applies a non-linear function, each j-th LSTM unit maintains a memory c_t^j at time t. The output h_t^j, or the activation, of the LSTM unit is

    h_t^j = o_t^j \tanh(c_t^j)

where o_t^j is an output gate that modulates the amount of memory content exposure. The output gate is computed by

    o_t^j = \sigma(W_o x_t + U_o h_{t-1} + V_o c_t)^j

where \sigma is a logistic sigmoid function and V_o is a diagonal matrix.

The memory cell c_t^j is updated at each time-step by partially forgetting the existing memory and adding a new memory content \tilde{c}_t^j,

    c_t^j = f_t^j c_{t-1}^j + i_t^j \tilde{c}_t^j

where the new memory content is

    \tilde{c}_t^j = \tanh(W_c x_t + U_c h_{t-1})^j

The extent to which the existing memory is forgotten is modulated by a forget gate f_t^j, and the degree to which the new memory content is added to the memory cell is modulated by an input gate i_t^j. The gates are computed by

    f_t^j = \sigma(W_f x_t + U_f h_{t-1} + V_f c_{t-1})^j
    i_t^j = \sigma(W_i x_t + U_i h_{t-1} + V_i c_{t-1})^j

where V_f and V_i are diagonal matrices.

Unlike the traditional recurrent unit, which overwrites its content at each time-step as in equation (ii) of Q.1, an LSTM unit is able to decide whether to keep the existing memory via the introduced gates. Intuitively, if the LSTM unit detects an important feature from an input sequence at an early stage, it easily carries this information (the existence of the feature) over a long distance, hence capturing potential long-distance dependencies.

The LSTM unit is shown in fig. 4.1. Here, i, f and o are the input, forget and output gates, respectively, and c and \tilde{c} denote the memory cell and the new memory cell content.

Fig. 4.1 Long Short-term Memory

Q.3. What do you mean by gated recurrent unit ? Explain.

Ans. A gated recurrent unit (GRU) was proposed by Cho et al. to make each recurrent unit adaptively capture dependencies of different time scales. Similarly to the LSTM unit, the GRU has gating units that modulate the flow of information inside the unit, however, without having a separate memory cell.

The activation h_t^j of the GRU at time t is a linear interpolation between the previous activation h_{t-1}^j and the candidate activation \tilde{h}_t^j,

    h_t^j = (1 - z_t^j) h_{t-1}^j + z_t^j \tilde{h}_t^j        ...(i)

where an update gate z_t^j decides how much the unit updates its activation, or content. The update gate is computed by

    z_t^j = \sigma(W_z x_t + U_z h_{t-1})^j

This procedure of taking a linear sum between the existing state and the newly computed state is similar to the LSTM unit. The GRU, however, does not have any mechanism to control the degree to which its state is exposed, but exposes the whole state each time.

The candidate activation \tilde{h}_t^j is computed similarly to that of the traditional recurrent unit,

    \tilde{h}_t^j = \tanh(W x_t + U(r_t \odot h_{t-1}))^j

where r_t is a set of reset gates and \odot denotes element-wise multiplication. When the reset gate r_t^j is close to 0, it makes the unit act as if it is reading the first symbol of an input sequence, allowing it to forget the previously computed state.

The reset gate r_t^j is computed similarly to the update gate,

    r_t^j = \sigma(W_r x_t + U_r h_{t-1})^j

Here, r and z are the reset and update gates, and h and \tilde{h} are the activation and the candidate activation. The gated recurrent unit is shown in fig. 4.2.

Fig. 4.2 Gated Recurrent Unit
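To make the updates in Q.1–Q.3 concrete, the following is a minimal NumPy sketch (not part of the original text) of one time-step of the plain recurrent unit from equation (ii) of Q.1, the LSTM unit of Q.2 and the GRU of Q.3. The parameter names (W, U and the gate-specific W_*, U_*, V_*) mirror the equations above; the function names and the parameter dictionary p are illustrative assumptions, not a standard API.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_step(x_t, h_prev, W, U):
    # Equation (ii): h_t = g(W x_t + U h_{t-1}) with g = tanh
    return np.tanh(W @ x_t + U @ h_prev)

def lstm_step(x_t, h_prev, c_prev, p):
    # p holds the parameter matrices; the V_* matrices are diagonal (peephole connections).
    f = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["Vf"] @ c_prev)   # forget gate
    i = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["Vi"] @ c_prev)   # input gate
    c_tilde = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev)                # new memory content
    c = f * c_prev + i * c_tilde                                       # memory cell update
    o = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["Vo"] @ c)        # output gate
    h = o * np.tanh(c)                                                 # unit activation
    return h, c

def gru_step(x_t, h_prev, p):
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev)                      # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev)                      # reset gate
    h_tilde = np.tanh(p["W"] @ x_t + p["U"] @ (r * h_prev))            # candidate activation
    return (1.0 - z) * h_prev + z * h_tilde                            # linear interpolation
```

Processing a full sequence simply applies the chosen step function from t = 1 to T, carrying h (and, for the LSTM, c) forward; in practice the parameter matrices are learned by backpropagation through time.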
Q.4. How can we use SMT to find synonyms ?

Ans. The word "ship" in a particular context can be translated in the same way as another word; a statistical machine translation (SMT) system may learn that, in this context, "ship" is synonymous with "transport". So, a query such as "how to ship a box" might have the same translation as "how to transport a box".

The search might then be expanded to include both queries – "how to ship a box" as well as "how to transport a box".

A machine translation system may also collect information about words in the same language, to learn how those words might be related.

Q.5. Write short note on beam search.

Ans. Neural sequence models are widely used to model time-series data. Equally ubiquitous is the usage of beam search (BS) as an approximate inference algorithm to decode output sequences from these models. BS explores the search space in a greedy left-to-right fashion, retaining only the top-B candidates, resulting in sequences that differ only slightly from each other.

The most prevalent method for approximate decoding is BS, which stores the top-B highest-scoring candidates at each time step, where B is known as the beam width. Let us denote the set of B solutions held by BS at the start of time step t. At each time step, BS considers all possible single-token extensions of these beams and selects the B most likely extensions; more formally, this is an argmax over the scores of all extensions, which can be computed trivially by sorting them. (A code sketch of this procedure is given after Q.6 below.)

Q.6. What are the disadvantages of beam search ?

Ans. The disadvantages of beam search are as follows –

(i) Nearly identical beams make BS a computationally wasteful algorithm, with essentially the same computation being repeated for no significant gain in performance.

(ii) There is a mismatch, i.e., improvements in posterior probabilities do not necessarily correspond to improvements in task-specific metrics. It is common practice to deliberately throttle BS to become a poorer optimization algorithm by using reduced beam widths. This treatment of an optimization algorithm as a hyper-parameter is not only intellectually unsatisfying but also has a significant practical side-effect – it leads to the generation of largely bland, generic, and "safe" outputs, e.g. always saying "I don't know" in conversation models.

(iii) Most importantly, lack of diversity in the decoded solutions is fundamentally crippling in AI problems with significant ambiguity – e.g. there are multiple ways of describing an image or responding in a conversation that are correct, and it is important to capture this ambiguity by finding several diverse solutions.
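The following is a small, illustrative Python sketch of the beam search procedure described in Q.5 (and whose shortcomings are listed in Q.6). It is not taken from the text: step_log_probs is a hypothetical stand-in for a trained sequence model that returns log-probabilities over the vocabulary for the next token, and EOS is an assumed end-of-sequence token id.

```python
EOS = 0  # assumed end-of-sequence token id

def beam_search(step_log_probs, beam_width, max_len, start_token=1):
    """Greedy left-to-right beam search keeping the top-B partial sequences."""
    # Each beam is a (sequence, cumulative log-probability) pair.
    beams = [([start_token], 0.0)]
    completed = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            log_probs = step_log_probs(seq)          # distribution over the next token
            # Consider every single-token extension of this beam.
            for token, lp in enumerate(log_probs):
                candidates.append((seq + [token], score + lp))
        # Keep only the B highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == EOS:
                completed.append((seq, score))       # finished hypothesis
            else:
                beams.append((seq, score))
        if not beams:
            break
    return completed or beams
```

With beam_width = 1 this reduces to greedy decoding; larger widths keep more hypotheses but, as noted in Q.6, the retained beams often differ only slightly from one another.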
Q.7. Explain the term BLEU score.

Ans. BLEU (Bilingual Evaluation Understudy) is an algorithm that was proposed to evaluate how accurate a machine-translated text is. The same approach can be used to evaluate the quality of the text responses generated by a model.

BLEU scores are computed from n-gram precisions of different orders. In general, the score is higher for lower n; it can be zero for 4-grams when no sequence of four words from the candidate appears in the reference sentences. This is the general methodology.

As mentioned earlier, the BLEU score helps us to determine the next step for our model. As depicted in fig. 4.3, the methodology behind using the BLEU score is to improve our model – a low score indicates that the performance may not be as good as expected, and so we need to improve our model.
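As a rough illustration of the n-gram scoring described in Q.7, here is a minimal Python sketch of clipped n-gram precision and an unsmoothed BLEU score for a single candidate/reference pair. It is a simplified sketch, not the official BLEU implementation – it ignores smoothing and multiple references, and the example sentences are made up.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, reference, n):
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    if not cand_counts:
        return 0.0
    # Count candidate n-grams, clipped by how often they appear in the reference.
    overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    return overlap / sum(cand_counts.values())

def bleu(candidate, reference, max_n=4):
    precisions = [clipped_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:        # any zero n-gram precision makes unsmoothed BLEU zero
        return 0.0, precisions
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))   # brevity penalty
    return bp * geo_mean, precisions

cand = "the cat sat on mat".split()
ref = "the cat is sitting on the mat".split()
score, precisions = bleu(cand, ref)
print(precisions)   # precision is highest for 1-grams and drops to zero for higher orders
print(score)
```

For this made-up pair the 1-gram precision is highest, the higher-order precisions drop, and the 3- and 4-gram precisions are zero, so the unsmoothed score collapses to 0 – matching the behaviour described in the answer; libraries such as NLTK provide smoothed variants for practical use.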
