
Lecture Attention Neural Networks

The document outlines Lecture 7 of the CS5242 course on Neural Networks and Deep Learning, focusing on Attention Neural Networks. It covers topics such as language models, memory networks, transformers, and transfer learning. The lecture is presented by Xavier Bresson from the National University of Singapore.


CS5242 : Neural Networks and Deep Learning

Lecture 7 : Attention Neural Networks


Semester 2 2024/25

Xavier Bresson
https://twitter.com/xbresson

Department of Computer Science


National University of Singapore (NUS)


Outline

Language Models

Memory Networks

Transformers

Language Model Transformers

Sequence-To-Sequence Transformers

Transfer Learning

Conclusion


Language models

A language model predicts the next word given a context window.


Most fundamental problem in NLP.

Input : sequence of words, e.g. “Yesterday I went to the beach and I saw a …”

Neural network

Output : probability distribution over the dictionary/vocabulary
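As a minimal sketch of the output side of a language model, the snippet below turns raw scores into a probability distribution over a vocabulary with a softmax. The four-word vocabulary and the scores are hypothetical values chosen for illustration; they are not produced by any trained network.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution (positive, sums to 1)."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy vocabulary and hypothetical network scores for the context
# "Yesterday I went to the beach and I saw a ..."
vocab = ["dolphin", "car", "the", "went"]
scores = [2.0, 0.5, -1.0, -2.0]

probs = softmax(scores)                      # one probability per vocabulary word
next_word = vocab[probs.index(max(probs))]   # most probable next word
print(dict(zip(vocab, (round(p, 3) for p in probs))), next_word)
```

The predicted word is simply the entry with the largest probability; sampling from `probs` would be an equally valid choice.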

Recurrent neural networks

Data structure
Input is an ordered sequence.
Input length and output length can be variable.
RNNs are designed for sequences.
They learn a representation of a sequence independently of its length.
The recurrence formula summarizes the sequence with a vector $h$ :

$h \leftarrow f_W(h, x)$

Weight sharing across time (translation invariance)


They learn to keep or ignore information in the sequence for the downstream task.
Gating mechanism to forget/remember the past or the new input.
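The recurrence on this slide can be sketched in a few lines. This is a minimal illustration that assumes a plain tanh cell, $f_W(h, x) = \tanh(W_h h + W_x x)$; the weight values, the function names and the toy inputs are hypothetical, and the gating mechanism is not shown.

```python
import math

def rnn_step(h, x, W_h, W_x):
    """One recurrence step h <- f_W(h, x), assuming f_W = tanh(W_h h + W_x x)."""
    new_h = []
    for i in range(len(h)):
        s = sum(W_h[i][j] * h[j] for j in range(len(h)))
        s += sum(W_x[i][j] * x[j] for j in range(len(x)))
        new_h.append(math.tanh(s))
    return new_h

def rnn_summarize(xs, W_h, W_x, d):
    """Summarize a sequence of any length into a single vector h of size d."""
    h = [0.0] * d
    for x in xs:                 # the same weights are shared across all time steps
        h = rnn_step(h, x, W_h, W_x)
    return h

# Hypothetical toy weights: hidden size 2, input size 3.
W_h = [[0.5, -0.1], [0.2, 0.3]]
W_x = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.5]]

short_seq = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
long_seq = short_seq + [[0.0, 0.0, 1.0]] * 5

h_short = rnn_summarize(short_seq, W_h, W_x, d=2)
h_long = rnn_summarize(long_seq, W_h, W_x, d=2)
print(h_short, h_long)
```

Both sequences are summarized by a vector of the same size, whatever their length, because the loop reuses one set of weights at every step.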


Recurrent neural networks

Performance
Significant progress in NLP but not a breakthrough.
Dominant in NLP for Machine Translation (MT), Q&A, summarization up to 2018.
Limitation
RNNs struggle to learn long-term dependencies (in practice, no more than about 50 steps).
Hard to train because they are non-linear dynamical systems :
any small perturbation can be amplified or vanish.
Slow to train because of their sequential nature (due to recurrence mechanism).
Important limitation when training on large-scale datasets.


Memory networks

How do these models work?


Let us consider a simple example from the bAbI dataset (Meta 2015).
This dataset is used to evaluate the simple reasoning abilities of models.
Example :
Joe went to the kitchen
Joe picked up milk
Joe went to the bathroom
Joe put down the milk
Joe went to the bedroom
Question : Where is the milk?
Answer : ---


Memory networks
Memory networks (Weston-Chopra-Bordes, Meta 2015) are designed to respond to questions with multi-step word matching :
Joe went to the kitchen
Joe picked up milk
Joe went to the bathroom
Joe put down the milk
Joe went to the bedroom
Question : Where is the milk?
Answer : bathroom
[Figure: the question is matched against the story sentences in four successive steps, labeled (1) to (4); a "Time" axis runs along the story.]
This matching process is also called multi-hop attention.
Multi-hop attention is a mechanism that performs multi-step reasoning.


Multi-hop attention steps


Unrolling memory layers :

Joe went to the kitchen. Joe picked up milk. Joe went to the bathroom. Joe put down the milk. Joe went to the bedroom. Query: Where is the milk?

[Figure: the same story and query are shown at every layer, from Layer 0 (bottom) to Layer 4 (top). The attention steps (1) to (4) link consecutive layers, and the answer is produced after Layer 4.]


Implementation
[Figure: one memory layer of the attention mechanism, repeated as a loop process (multi-hop attention layers). Its components, from input to output :]
Input sequence of words : $\{w_0, ..., w_{n-1}\}$
Word embeddings : $\{x_0, ..., x_{n-1}\}$, $x_i \in \mathbb{R}^{1 \times d}$
Key vectors (learnable parameters) : $\{f_K(x_0), ..., f_K(x_{n-1})\}$, $f_K(x_i) \in \mathbb{R}^{1 \times d}$
Hidden state controlling/learning the attention mechanism : $q \in \mathbb{R}^{1 \times d}$ ($q$ changes at each memory write)
Read memory (memory/query dot product) : $\{q f_K(x_0)^T, ..., q f_K(x_{n-1})^T\}$, $q f_K(x_i)^T \in \mathbb{R}$
Attention weights w/ Softmax : $\{a_0, ..., a_{n-1}\}$, $a_i \in \mathbb{R}$
Write memory (weighted sum of value vectors, learnable parameters) : $q = \sum_i a_i \, f_V(x_i)$
After multiple layers/hops : MLP, then task response

Formalization

Input sequence of words : $\{w_0, ..., w_{n-1}\}$, $w_i \in \{0, ..., V-1\}$

Continuous representation/word embedding : $\{x_0, ..., x_{n-1}\}$, $x_i = f_E(w_i) \in \mathbb{R}^d$

Hidden features :
Key : $K = \begin{bmatrix} f_K(x_0) \\ \vdots \\ f_K(x_{n-1}) \end{bmatrix} \in \mathbb{R}^{n \times d}$
Query : $q \in \mathbb{R}^{1 \times d}$
Value : $V = \begin{bmatrix} f_V(x_0) \\ \vdots \\ f_V(x_{n-1}) \end{bmatrix} \in \mathbb{R}^{n \times d}$

Repeat K times :
$a \leftarrow \mathrm{Softmax}(qK^T) \in \mathbb{R}^{1 \times n}$
$q \leftarrow aV = \mathrm{Softmax}(qK^T)V \in \mathbb{R}^{1 \times d}$

Output : $s = \mathrm{MLP}(q) \in \mathbb{R}^V$
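The read/write loop of this slide (attention weights from Softmax(qK^T), then the query replaced by the weighted sum aV) can be sketched as follows. This is a hedged illustration: the key and value matrices are hypothetical toy numbers standing in for $f_K(x_i)$ and $f_V(x_i)$, the function names are made up for this example, and the final MLP is omitted.

```python
import math

def softmax(scores):
    """Normalize scores into attention weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def memory_hop(q, K, V):
    """One hop: read with a <- Softmax(q K^T), then write q <- a V."""
    a = softmax([dot(q, k) for k in K])      # one weight per memory slot
    d = len(V[0])
    new_q = [sum(a[i] * V[i][j] for i in range(len(V))) for j in range(d)]
    return new_q, a

def multi_hop(q, K, V, hops):
    """Repeat the read/write step; q changes at each memory write."""
    a = None
    for _ in range(hops):
        q, a = memory_hop(q, K, V)
    return q, a

# Hypothetical toy memory with n = 3 slots and d = 2 features.
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
q_final, a_final = multi_hop([1.0, 0.0], K, V, hops=3)
print(q_final, a_final)
```

Because the weights are non-negative and sum to one, each updated query is a convex combination of the value vectors.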


Properties

This model can be seen as a differentiable memory computer, i.e. the memory operations read and write
are differentiable and can thus be trained with backpropagation.
This network can update its memory by stacking multi-hop attention layers to perform
multi-step reasoning.
This model is based on the principle that intelligence requires an adaptive long-term memory,
unlike RNNs, which are limited to short-term memory.
This technique is a precursor of Transformers.


Limitations of memory networks

Memory networks were promising, but not ground-breaking.


Transformers are the first efficient version of attention networks !
Transformer improvements over memory networks :
Multiple queries (one per word)
Multi-head attention mechanism (more learning capacity)
Residual blocks (better backpropagation)


Self-attention mechanism

Input set (continuous representation) : $\{x_0, ..., x_{n-1}\}$, $X = \begin{bmatrix} x_0 \\ \vdots \\ x_{n-1} \end{bmatrix} \in \mathbb{R}^{n \times d}$

Initialize hidden state : $H = X \in \mathbb{R}^{n \times d}$

Repeat K layers (self-attention layer) : $H \leftarrow \mathrm{Softmax}(QK^T)V \in \mathbb{R}^{n \times d}$

with learnable parameters $W^Q$, $W^K$, $W^V$ :
$Q = HW^Q \in \mathbb{R}^{n \times d}$, $W^Q \in \mathbb{R}^{d \times d}$
$K = HW^K \in \mathbb{R}^{n \times d}$, $W^K \in \mathbb{R}^{d \times d}$
$V = HW^V \in \mathbb{R}^{n \times d}$, $W^V \in \mathbb{R}^{d \times d}$

Differentiable dictionary :
Dict is a standard structure in CS, Dict=(Key,Value).
Query the dictionary for a key and its value.
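A self-attention layer as written on this slide can be sketched in plain Python. The sketch follows the slide's formula literally, so it leaves out the 1/sqrt(d) scaling that is commonly applied to the scores in practice. The identity weight matrices, the reuse of the same weights in both layers and the helper names are assumptions made only for illustration.

```python
import math

def matmul(A, B):
    """Multiply an (n x m) matrix by an (m x p) matrix, both as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def softmax_rows(S):
    """Apply the softmax to each row, so every query gets weights summing to 1."""
    out = []
    for row in S:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def self_attention_layer(H, W_Q, W_K, W_V):
    """H <- Softmax(Q K^T) V with Q = H W_Q, K = H W_K, V = H W_V."""
    Q, K, V = matmul(H, W_Q), matmul(H, W_K), matmul(H, W_V)
    A = softmax_rows(matmul(Q, transpose(K)))   # n x n pairwise matching scores
    return matmul(A, V), A

# Hypothetical toy input: n = 3 words, d = 2 features, identity weights.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]

H = X                                           # initialize the hidden state
for _ in range(2):                              # repeat for 2 layers
    H, A = self_attention_layer(H, I2, I2, I2)
print(H)
```

Each row of `A` holds the attention weights of one word over all words, so every word issues its own query.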

Context-to-word representation
From fixed word representation to context-to-word representation :

$H \leftarrow \mathrm{Softmax}(QK^T)V \in \mathbb{R}^{n \times d}$

The new data representation is a sum of all input data weighted by the
pairwise matching (or attention) scores.
The subset of data with non-zero attention scores forms the context.

The attention mechanism allows the word representation to change dynamically
according to its context.

Context-to-word is a powerful idea in NLP because a word may have
different meanings that can only be clarified in a particular context :
The vase broke. The news broke. Sandy broke the world record. Sandy
broke the law. We broke even. The burglar broke into the house. Etc.

[Figure: an attention/Transformer layer takes H as input, forms Q, K, V, and outputs the updated H = Softmax(QK^T)V.]


Computational cost

[Figure: the sentence "he drives the car" processed by three types of cells.
RNN cell (RNNs) : a memory vector is carried along the sequence of length L.
ConvNet cell (CNNs) : convolution/sliding pattern with kernel size k; the pattern centered at word “drives” is the same pattern centered at word “the”.
Self-attention cell (ANNs) : sum of inputs weighted by pairwise matching scores.]


Computational cost

RNN layer : $O(L \cdot d^2)$

ConvNet layer : $O(L \cdot d^2 \cdot k)$

Transformer layer : $O(L^2 \cdot d)$ (seems bad !)

with L : sequence length, d : hidden feature size, k : kernel size

Attention networks actually need no more operations per layer than RNNs as long as L ≤ d !

Example 1 : L = 100, d = 1000, k = 3

RNN: $O(10^8)$, ConvNet: $O(3 \cdot 10^8)$, Transformer: $O(10^7)$

Example 2 : L = 1000, d = 1000, k = 3

RNN: $O(10^9)$, ConvNet: $O(3 \cdot 10^9)$, Transformer: $O(10^9)$
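The two examples can be checked with a few lines of arithmetic. The sketch below only evaluates the leading terms of the three cost formulas, ignoring constant factors; the function name `layer_costs` is hypothetical.

```python
def layer_costs(L, d, k):
    """Leading-order cost of one layer, ignoring constant factors."""
    return {
        "rnn": L * d**2,             # O(L.d^2)
        "convnet": L * d**2 * k,     # O(L.d^2.k)
        "transformer": L**2 * d,     # O(L^2.d)
    }

ex1 = layer_costs(L=100, d=1000, k=3)
ex2 = layer_costs(L=1000, d=1000, k=3)

for name, costs in (("Example 1", ex1), ("Example 2", ex2)):
    print(name, {layer: f"{cost:.0e}" for layer, cost in costs.items()})
```

The Transformer term equals the RNN term exactly when L = d, which is the situation of Example 2.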


Language model transformers


Given a sequence of words, predict the next word.

To be successful, this task requires a word representation that changes with the context.
Transformers offer expressive word representations through their word-in-context property.
An LM transformer is composed of three layers :
Word embedding layer
Attention layer
Classification layer

[Figure: the sequence of context words “Yesterday I went to the beach and I saw” and the current word “a” (which queries the next word) each receive a positional encoding; the network outputs the probability for the next word.]

Word Embedding Layer

Categorical variables (dictionary of 10,000 words) are
represented by one-hot vectors and then
embedded into a linear space.
This is the same input embedding as in RNNs :
PyTorch nn.Embedding()
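To illustrate why an embedding layer behaves like a lookup table, the sketch below compares two routes: multiplying a one-hot vector by an embedding matrix, and reading the corresponding row directly. It uses plain Python rather than PyTorch so that it runs without dependencies; the four-word dictionary, the matrix values and the function names are hypothetical.

```python
def one_hot(index, size):
    """One-hot vector: 1.0 at the word index, 0.0 everywhere else."""
    v = [0.0] * size
    v[index] = 1.0
    return v

def embed_via_one_hot(index, E):
    """Embed a word by multiplying its one-hot vector with the matrix E."""
    v = one_hot(index, len(E))
    return [sum(v[i] * E[i][j] for i in range(len(E))) for j in range(len(E[0]))]

def embed_via_lookup(index, E):
    """Embed a word by reading row `index` of E."""
    return list(E[index])

# Hypothetical embedding matrix: dictionary of 4 words, embedding size 2.
E = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

via_matmul = embed_via_one_hot(2, E)
via_lookup = embed_via_lookup(2, E)
print(via_matmul, via_lookup)
```

Both routes return the same vector, which is why an embedding layer can store the matrix and index into it instead of building one-hot vectors of dictionary size.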

[Figure: inputs $g_1$, $g_2$, $g_3$, $g_4$, ..., $g_9$, $g_{10}$, each passed through $U$, followed by positional encoding.]
<latexit

U
<latexit sha1_base64="n7ZTHUiUop3CT0ZMCIHMhParPlg=">AAAH1nicjVXPj9Q2FA6UMnRoy9IeewnMIlFpWU0WifaChOAAl6IFsbsjrUcrx3kzidaxI9vZnalJbxVX7lzh3L+H/4bnJPPLGUQtrfb5+773nv383iQueKbNcPj5ytXvrn1/vXfjh/7NH3/6+dbO7V+OtSwVgyMmuVSjmGrgmYAjkxkOo0IBzWMOJ/H5M8efXIDSmRRvzLyAcU6nIptkjBqEznZu7e72ScxLsEdVH+2zncFwf1ivsGtErTEI2nV4dvv6fySRrMxBGMap1qfRsDBjS5XJGIeqT0oNBWXndAqnaAqagx7b+uRVeA+RJJxIhX/ChDW67mFprvU8j1GZU5Nqn3PgNu60NJM/xzYTRWlAsCbRpOShkaErQ5hkCpjhczQoUxmeNWQpVZQZLFbfT2PSfK/NtteeaK+5wYbS8UZKrve0QQjEFF+lwmgCLpnMcyoSS2aVtSSehLOq2iTSlkh9QreE9olpS0wrPwm2AK8scQeKY/vadyyq02hsCYeJuT+IiMqmqfnd16g2gMrrJLj3FDTWqzhvF3HeeiohVb6SkaWO+EJnYBPJ0tRyvwg5b6JofK7C6OxvsIOoU6qlbCKlEdLAV4SFkrGrHqM8DA99Flt0Sf7lk5dpZqBO4s6MTWAbJOxmcYO1oayBbUIlL8
<latexit sha1_base64="n7ZTHUiUop3CT0ZMCIHMhParPlg=">AAAH1nicjVXPj9Q2FA6UMnRoy9IeewnMIlFpWU0WifaChOAAl6IFsbsjrUcrx3kzidaxI9vZnalJbxVX7lzh3L+H/4bnJPPLGUQtrfb5+773nv383iQueKbNcPj5ytXvrn1/vXfjh/7NH3/6+dbO7V+OtSwVgyMmuVSjmGrgmYAjkxkOo0IBzWMOJ/H5M8efXIDSmRRvzLyAcU6nIptkjBqEznZu7e72ScxLsEdVH+2zncFwf1ivsGtErTEI2nV4dvv6fySRrMxBGMap1qfRsDBjS5XJGIeqT0oNBWXndAqnaAqagx7b+uRVeA+RJJxIhX/ChDW67mFprvU8j1GZU5Nqn3PgNu60NJM/xzYTRWlAsCbRpOShkaErQ5hkCpjhczQoUxmeNWQpVZQZLFbfT2PSfK/NtteeaK+5wYbS8UZKrve0QQjEFF+lwmgCLpnMcyoSS2aVtSSehLOq2iTSlkh9QreE9olpS0wrPwm2AK8scQeKY/vadyyq02hsCYeJuT+IiMqmqfnd16g2gMrrJLj3FDTWqzhvF3HeeiohVb6SkaWO+EJnYBPJ0tRyvwg5b6JofK7C6OxvsIOoU6qlbCKlEdLAV4SFkrGrHqM8DA99Flt0Sf7lk5dpZqBO4s6MTWAbJOxmcYO1oayBbUIlL8Wmska2SKcKYFPaIFukCpINodtvkSXSYDkaaYireShOxZQDyhFoHkw1CHonMHHFOWkaBC17UrXwKBNmELVEKjMGfUtGDiVJpgtO59rMORADM1Nb9ePcW4qW+OKht2jWmIX9Le3/8CB3CDosrtHcY3AweIjl02BiORs+Jin+s7t4P0fgfDiHXVeRC4atC8ouFEgjXpFzUOLB/iNymQwXFUoSqlN0tHWtHi/QdfCBPxo4E8thfIYbj7+QfMUf46YzWiJZC+B2fgPStQhPaScCNshsJah3vqKoVu3QmSixcQX3e/KSde9RdFSHW1TP1xI970znoo27wxmGlWtv/MRF/getaxwf7EfD/ejVweDJ0/ZjdyP4Lbgb3A+i4I/gSfAiOAyOAhaUwYfgY/CpN+r90/u3966RXr3S+vwabKze+y/fmdhs</latexit>
U U
Yesterday I went to … saw a Sequence of Query the
context words next word

Sequence of context words Query


World Embedding Layer
Xavier Bresson 22
23

Multi-head attention

Update equation for one MHA layer (PyTorch : nn.MultiheadAttention()) :

\[
\bar{h} = \mathrm{MHA}(q, K, V) \in \mathbb{R}^{d}, \quad q \in \mathbb{R}^{d}, \; K \in \mathbb{R}^{L \times d}, \; V \in \mathbb{R}^{L \times d}
\]
\[
\bar{h} = \Big( \big\Vert_{h=1}^{H} \mathrm{HA}_h(q, K, V) \Big) W^O, \quad W^O \in \mathbb{R}^{d \times d} \quad (\Vert : \text{concatenation operation})
\]
\[
\text{with } q = g_t \in \mathbb{R}^{d}, \quad K = V = \{ g_t, g_{t-1}, \dots, g_{t-(L-1)} \} \in \mathbb{R}^{L \times d}
\]
\[
\mathrm{HA}_h(q, K, V) = \mathrm{Softmax}\Big( \frac{q_h K_h^T}{\sqrt{d/H}} \Big) V_h \in \mathbb{R}^{d/H}
\]
\[
\text{with } q_h = q W_h^q \in \mathbb{R}^{d/H}, \quad W_h^q \in \mathbb{R}^{d \times d/H}
\]
\[
K_h = K W_h^K \in \mathbb{R}^{L \times d/H}, \quad W_h^K \in \mathbb{R}^{d \times d/H}
\]
\[
V_h = V W_h^V \in \mathbb{R}^{L \times d/H}, \quad W_h^V \in \mathbb{R}^{d \times d/H}
\]

[Figure: architecture diagram with labels: Sequence of context words; Query (the next word); Positional Encoding; Query, Key, Value; Multi-Head Attention (MHA); Output probability.]
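The multi-head update can be sketched in NumPy as follows. This is a minimal illustration, not the PyTorch implementation: the weights are random, and the names `multi_head_attention`, `Wq`, `Wk`, `Wv`, `Wo` and the sizes L = 5, d = 8, H = 2 are assumptions chosen for the example.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def multi_head_attention(q, K, V, Wq, Wk, Wv, Wo):
    """q: (d,), K and V: (L, d), Wq/Wk/Wv: (H, d, d/H), Wo: (d, d)."""
    H, d, dh = Wq.shape
    heads = []
    for h in range(H):
        q_h = q @ Wq[h]                              # (d/H,)
        K_h = K @ Wk[h]                              # (L, d/H)
        V_h = V @ Wv[h]                              # (L, d/H)
        weights = softmax(K_h @ q_h / np.sqrt(dh))   # (L,) attention weights
        heads.append(weights @ V_h)                  # (d/H,) output of head h
    return np.concatenate(heads) @ Wo                # concatenation, then W^O

rng = np.random.default_rng(0)
L, d, H = 5, 8, 2
G = rng.normal(size=(L, d))          # rows stand for g_t, g_{t-1}, ..., g_{t-(L-1)}
q = G[0]                             # query q = g_t
Wq, Wk, Wv = (rng.normal(size=(H, d, d // H)) for _ in range(3))
Wo = rng.normal(size=(d, d))
h_bar = multi_head_attention(q, G, G, Wq, Wk, Wv, Wo)
print(h_bar.shape)                   # (8,)
```

Each head works in a subspace of size d/H, so the concatenation of the H heads has size d again before the output projection.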

Head attention layer

Equation of a single head attention layer :

\[
\mathrm{HA}(q, K, V) = \mathrm{Softmax}\Big( \frac{q K^T}{\sqrt{d}} \Big) V \in \mathbb{R}^{1 \times d}, \quad q \in \mathbb{R}^{1 \times d}, \; K \in \mathbb{R}^{L \times d}, \; V \in \mathbb{R}^{L \times d}
\]

[Figure: attention to the context words and prediction of the next word. Words in document: "Yesterday I … saw a dolphin". Vectors: g_{t-(L-1)}, g_{t-(L-2)}, …, g_{t-1}, q = g_t. Sequence length L.]
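A single attention head, HA(q, K, V) = Softmax(q K^T / sqrt(d)) V, can be sketched in NumPy as below. The function name `head_attention` and the toy sizes are assumptions made for this example.

```python
import numpy as np

def head_attention(q, K, V):
    """q: (1, d), K and V: (L, d). Returns the output (1, d) and the weights (1, L)."""
    d = q.shape[-1]
    scores = q @ K.T / np.sqrt(d)                          # (1, L) scaled dot products
    scores = scores - scores.max(axis=-1, keepdims=True)   # for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the L words
    return weights @ V, weights                            # weighted combination of values

rng = np.random.default_rng(0)
L, d = 4, 3
q = rng.normal(size=(1, d))
K = rng.normal(size=(L, d))
V = rng.normal(size=(L, d))
out, weights = head_attention(q, K, V)
print(out.shape, weights.shape)    # (1, 3) (1, 4)
```

The weights are non-negative and sum to 1, so the output is a convex combination of the rows of V.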

Attention block

Attention block layer (2017) :

\[
\bar{h} = \mathrm{LN}\big( q + \mathrm{MHA}(q, K, V) \big) \in \mathbb{R}^{d}
\]
\[
h = \mathrm{LN}\big( \bar{h} + \mathrm{MLP}(\bar{h}) \big) \in \mathbb{R}^{d}
\]

Layer normalization (LN) : z-scoring with learnable parameters, nn.LayerNorm().
Residual connection : the input of MHA and of MLP is added to its output.

In 2019, LayerNorm (LN) was applied before non-linear operations :

\[
\bar{h} = q + \mathrm{MHA}\big( \mathrm{LN}(q), \mathrm{LN}(K), \mathrm{LN}(V) \big) \in \mathbb{R}^{d}
\]
\[
h = \bar{h} + \mathrm{MLP}\big( \mathrm{LN}(\bar{h}) \big) \in \mathbb{R}^{d}
\]

[Figure: architecture diagram with labels: Sequence of context words; Query (the next word); Positional Encoding; Attention Block; Output probability.]
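The two orderings of LayerNorm (after the residual sum in 2017, before the non-linear operations in 2019) can be compared with the sketch below. It makes several simplifying assumptions: a single attention head stands in for MHA, the MLP is a two-layer ReLU network with random weights, and LN uses a = 1, b = 0. All function and variable names are hypothetical.

```python
import numpy as np

def ln(x, eps=1e-5):
    # Layer normalization over the last axis, with a = 1 and b = 0 for brevity.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def attention(q, K, V):
    # Single-head stand-in for MHA(q, K, V); q: (d,), K and V: (L, d).
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ V

def mlp(x, W1, W2):
    # Two-layer MLP with a ReLU non-linearity.
    return np.maximum(x @ W1, 0.0) @ W2

def post_ln_block(q, K, V, W1, W2):
    # 2017 ordering: LN is applied after each residual sum.
    h_bar = ln(q + attention(q, K, V))
    return ln(h_bar + mlp(h_bar, W1, W2))

def pre_ln_block(q, K, V, W1, W2):
    # 2019 ordering: LN is applied to the inputs of MHA and MLP.
    h_bar = q + attention(ln(q), ln(K), ln(V))
    return h_bar + mlp(ln(h_bar), W1, W2)

rng = np.random.default_rng(0)
L, d = 5, 8
G = rng.normal(size=(L, d))
q = G[0]
W1 = rng.normal(size=(d, 4 * d))
W2 = rng.normal(size=(4 * d, d))
h_post = post_ln_block(q, G, G, W1, W2)
h_pre = pre_ln_block(q, G, G, W1, W2)
print(h_post.shape, h_pre.shape)   # (8,) (8,)
```

In the first ordering the block output is itself normalized; in the second, the residual path from q to h is never normalized.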

LayerNorm and residual connection

Layer normalization : z-scoring

\[
\mathrm{LN}(q) = a \odot \Big( \frac{q - \mu}{\sigma} \Big) + b \in \mathbb{R}^{d}
\]
\[
\text{with } \mu = \mathrm{Mean}(q) \in \mathbb{R}, \quad \sigma = \mathrm{Std}(q) \in \mathbb{R}, \quad a, b \in \mathbb{R}^{d}
\]
Residual/skip connection :

\[
y = x + f_W(x)
\]

Forward pass :

\[
x \;\longmapsto\; y = x + f_W(x)
\]

Backward pass :

\[
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial x} = \frac{\partial L}{\partial y} \big( \mathrm{Id} + \nabla_x f_W \big) = \frac{\partial L}{\partial y} + \frac{\partial L}{\partial y} \nabla_x f_W
\]

No vanishing gradient for the residual connection : the identity term passes \partial L / \partial y through unchanged.

[Figure: architecture diagram with labels: Sequence of context words; Query (the next word); Positional Encoding; Output probability.]
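The backward rule of a residual connection, dL/dx = dL/dy (Id + grad_x f_W), can be checked numerically. The sketch below assumes a toy function f_W(x) = tanh(W x) and a linear loss L = c . y; both choices, and all names, are made up for this example.

```python
import numpy as np

def f(x, W):
    # Toy residual branch f_W(x) = tanh(W x).
    return np.tanh(W @ x)

def jacobian_f(x, W):
    # Jacobian of f_W at x: diag(1 - tanh^2(W x)) W, entry [k, j] = d f_k / d x_j.
    return (1.0 - np.tanh(W @ x) ** 2)[:, None] * W

def residual_backward(grad_y, x, W):
    # dL/dx = dL/dy + dL/dy grad_x f_W, with gradients treated as row vectors.
    return grad_y + grad_y @ jacobian_f(x, W)

rng = np.random.default_rng(0)
d = 4
W = rng.normal(size=(d, d))
x = rng.normal(size=d)
c = rng.normal(size=d)           # dL/dy for the linear loss L = c . y

def loss(x):
    return c @ (x + f(x, W))     # y = x + f_W(x)

eps = 1e-6
numeric = np.array([(loss(x + eps * e) - loss(x - eps * e)) / (2 * eps)
                    for e in np.eye(d)])
analytic = residual_backward(c, x, W)
print(np.allclose(numeric, analytic, atol=1e-6))
```

Even when the branch contributes nothing (for instance W = 0), the gradient reaching x is still dL/dy, which is the reason the skip path does not vanish.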

Positional encoding

Transformers are designed to process sets of vectors, but the items in a set are not ordered.
This is an issue for NLP tasks, where word order matters.
An additional ordering feature is required to inject causal ordering into the attention mechanism.
Two classes of Positional Encoding (PE) : learnable vs non-learnable PE.
Learnable PE :
Embedding of the discrete ordering index 0,1,2,3,…,L-1, where L is the sequence length.
Two issues :
Requires knowing the maximum value of L among all training sequences.
Some test sequences may have lengths not present in the train set.

[Figure: architecture diagram with labels: Sequence of context words; Query (the next word); Positional Encoding; Output probability.]
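A learnable PE can be pictured as a table with one trainable row per position index. The sketch below is only an illustration: the table is randomly initialized instead of trained, the PE is added to the word vectors (an assumption, since the combination rule is not stated here), and `L_max`, `pe_table` and `add_learnable_pe` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
L_max, d = 6, 4                            # assumed maximum training length and feature size
pe_table = rng.normal(size=(L_max, d))     # one learnable vector per index 0, ..., L_max - 1

def add_learnable_pe(X, pe_table):
    """X: (L, d) word vectors. Adds the PE of index t to the t-th word."""
    L = X.shape[0]
    if L > pe_table.shape[0]:
        # A longer sequence has no table row: this is the first issue listed above.
        raise ValueError("sequence longer than the maximum length of the PE table")
    return X + pe_table[:L]

X = rng.normal(size=(5, d))
X_pe = add_learnable_pe(X, pe_table)
print(X_pe.shape)    # (5, 4)
```

A row of the table that never appears in a training sequence is never updated, which illustrates the second issue: lengths unseen at training time have untrained encodings.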

Positional encoding
Non-learnable PE :
Continuous ordering with sin and cos functions.
Advantages :
No training necessary
No need to know the maximum length in the train set.
Test sequences may have lengths not present in the train set.

\(\mathrm{PE}_t \in \mathbb{R}^{d}\) is defined as

\[
\mathrm{PE}_{t,i} =
\begin{cases}
\sin(2\pi f_i t) & \text{if } i \text{ is even}, \\
\cos(2\pi f_i t) & \text{if } i \text{ is odd},
\end{cases}
\quad \text{with } f_i = \frac{10{,}000^{-2 \lfloor i/2 \rfloor / d}}{2\pi}.
\]
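A non-learnable sinusoidal PE can be generated for any length without training. The sketch below assumes the standard sinusoidal form, with angle t * 10000^(-2*floor(i/2)/d), sin on even feature indices and cos on odd ones; the function name `sinusoidal_pe` and the `base` parameter are hypothetical.

```python
import numpy as np

def sinusoidal_pe(L, d, base=10_000.0):
    """Returns an (L, d) array whose row t is the encoding PE_t of position t."""
    t = np.arange(L)[:, None]                    # positions, shape (L, 1)
    i = np.arange(d)[None, :]                    # feature indices, shape (1, d)
    two_pi_f = base ** (-2.0 * (i // 2) / d)     # 2*pi*f_i, shared by each (even, odd) pair
    angles = t * two_pi_f                        # (L, d)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

pe = sinusoidal_pe(L=50, d=8)
print(pe.shape)      # (50, 8)
print(pe[0])         # position 0: sin(0) = 0 on even indices, cos(0) = 1 on odd indices
```

Because the encoding is a closed-form function of t, the same call works for a test sequence longer than any training sequence.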

Classification layer

The last layer is a standard linear layer to compute the scores of the next word in the sequence, followed by a Softmax function to produce the probability vector.

\[
s = \mathrm{MLP}(h) \in \mathbb{R}^{V}
\]
\[
p = \mathrm{Softmax}(s) \in \mathbb{R}^{V}
\]

[Figure: architecture diagram with labels: Sequence of context words; Query (the next word); Positional Encoding; Classification Layer (h, s, p); Output probability for next word.]
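The classification layer can be sketched as a linear map followed by a softmax. In this example V is assumed to be the vocabulary size, the weights are random rather than trained, and `classification_layer`, `W` and `b` are hypothetical names.

```python
import numpy as np

def classification_layer(h, W, b):
    """h: (d,), W: (d, V), b: (V,). Returns the scores s and probabilities p, both (V,)."""
    s = h @ W + b                    # one score per word of the vocabulary
    e = np.exp(s - s.max())          # shifted for numerical stability
    p = e / e.sum()                  # softmax: probability of each candidate next word
    return s, p

rng = np.random.default_rng(0)
d, V = 8, 10
h = rng.normal(size=d)
W = rng.normal(size=(d, V))
b = np.zeros(V)
s, p = classification_layer(h, W, b)
print(p.shape, int(p.argmax()) == int(s.argmax()))   # (10,) True
```

The softmax preserves the ranking of the scores, so the most probable next word is the one with the highest score.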

Efficient training

The predictions of all the next words in a sequence can be trained in parallel.
For this, we need to hide the future words with a masked attention matrix.
Here is the process :
Step 1 : Compute the attention matrix.
Step 2 : Mask the next word to predict.
Step 3 : Softmax calculation.
Step 4 : Calculate the weighted linear combination.

\[
\text{Mask-HA}(Q, K, V) = \mathrm{Softmax}\Big( \frac{Q K^T}{\sqrt{d}} + \mathrm{Mask} \Big) V,
\]
\[
\text{with } Q, K, V \in \mathbb{R}^{L \times d}, \quad \mathrm{Mask} \in \mathbb{R}^{L \times L},
\]
\[
\text{and } \mathrm{Mask}_{ij} =
\begin{cases}
0 & \text{if attention between } i \text{ and } j, \\
-\infty & \text{if no attention.}
\end{cases}
\]
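The four steps can be sketched in NumPy as follows. The sketch assumes an additive mask (0 where attention is allowed, minus infinity where it is hidden) and a causal pattern in which word i attends to words j <= i; the function name `masked_head_attention` and the toy sizes are hypothetical.

```python
import numpy as np

def masked_head_attention(Q, K, V):
    """Q, K, V: (L, d). Returns the outputs (L, d) and attention weights (L, L)."""
    L, d = Q.shape
    A = Q @ K.T / np.sqrt(d)                                  # Step 1: attention matrix
    mask = np.where(np.tril(np.ones((L, L))) == 1, 0.0, -np.inf)
    A = A + mask                                              # Step 2: hide the future words
    A = A - A.max(axis=-1, keepdims=True)                     # Step 3: row-wise softmax
    P = np.exp(A)
    P = P / P.sum(axis=-1, keepdims=True)
    return P @ V, P                                           # Step 4: weighted combination

rng = np.random.default_rng(0)
L, d = 5, 4
X = rng.normal(size=(L, d))
out, P = masked_head_attention(X, X, X)
print(np.allclose(np.triu(P, k=1), 0.0))    # True: no weight on future words
```

All L predictions are computed in a single matrix product, yet row i of the output depends only on words 0, ..., i, which is what makes parallel training possible.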

Attention matrix

Step 1 : Compute the attention matrix A, i.e. A_ij is the dot product between word vector q_i and word vector k_j.
Two similar vectors receive a high value and, conversely, two dissimilar vectors receive a low value.
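As a tiny worked example of Step 1, take three hand-picked 2-dimensional word vectors and compute all pairwise dot products at once. The vectors and the choice K = Q are assumptions made purely for illustration.

```python
import numpy as np

# Three toy word vectors (rows); the values are chosen by hand for readability.
Q = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
K = Q.copy()

A = Q @ K.T          # A[i, j] is the dot product between q_i and k_j
print(A)
# [[1. 0. 1.]
#  [0. 1. 1.]
#  [1. 1. 2.]]
```

The orthogonal vectors q_0 and k_1 give A[0, 1] = 0, while aligned vectors give larger values, matching the similarity reading of the attention matrix.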

[Figure: L × L attention matrix whose rows and columns are the words "Yesterday I … saw a" (column index j); each entry is a dot product q^T k.]

\[
A = Q K^T \in \mathbb{R}^{L \times L}, \qquad A_{ij} = q_i^T k_j \in \mathbb{R}
\]

i
<latexit sha1_base64="/567GNfqs6hI1+r9uu/P1P0myJ4=">AAAB6HicbVBNS8NAEJ34WetX1aOXxSJ4KkmR6rHgxWML9gPaUDbbSbt2swm7G6GE/gIvHhTx6k/y5r9x2+agrQ8GHu/NMDMvSATXxnW/nY3Nre2d3cJecf/g8Oi4dHLa1nGqGLZYLGLVDahGwSW2DDcCu4lCGgUCO8Hkbu53nlBpHssHM03Qj+hI8pAzaqzU5INS2a24C5B14uWkDDkag9JXfxizNEJpmKBa9zw3MX5GleFM4KzYTzUmlE3oCHuWShqh9rPFoTNyaZUhCWNlSxqyUH9PZDTSehoFtjOiZqxXvbn4n9dLTXjrZ1wmqUHJlovCVBATk/nXZMgVMiOmllCmuL2VsDFVlBmbTdGG4K2+vE7a1YpXq9Sa1+V6NY+jAOdwAVfgwQ3U4R4a0AIGCM/wCm/Oo/PivDsfy9YNJ585gz9wPn8Azj+M6A==</latexit>


Masked attention

Step 2 : Mask the next words to predict

Words that come after the current position are hidden during next-word prediction.

[Figure: the L x L score matrix with entries $q_i^T k_j$ (rows i and columns j both indexed by the words Yesterday, I, …, saw, a) is combined by pointwise multiplication with a mask matrix of the same size. The mask has 1 on and below the diagonal and $-\infty$ above the diagonal, so the last row (a) is all 1 and the first row (Yesterday) keeps only its first entry.]
A = QK T 2 RL⇥L 1 if attention between i and j


Maskij =
1 if no attention

Masked attention

Step 3 : Softmax calculation


Masked scores (row = query word, column = key word):

             Yesterday   I        …        saw      a
Yesterday    qk/√d       −∞       −∞       −∞       −∞
I            qk/√d       qk/√d    −∞       −∞       −∞
…            qk/√d       qk/√d    qk/√d    −∞       −∞
saw          qk/√d       qk/√d    qk/√d    qk/√d    −∞
a            qk/√d       qk/√d    qk/√d    qk/√d    qk/√d

After the row-wise Softmax:

             Yesterday   I        …        saw      a
Yesterday    1           0        0        0        0
I            0.3         0.7      0        0        0
…            0.4         0.2      0.4      0        0
saw          0.2         0.1      0.1      0.5      0
a            0.3         0.1      0.1      0.2      0.3

$\mathrm{Softmax}_{\mathrm{row}}\Big(\frac{QK^T}{\sqrt{d}} \odot \mathrm{Mask}\Big) \in \mathbb{R}^{L \times L}$

Each row holds attention probabilities, e.g. the row of “saw” holds the attention probabilities between “saw” and “Yesterday”, “I”, …, “saw”.
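The masked row-wise softmax can be sketched in plain Python. This is only an illustration: it assumes the mask is applied by setting the hidden scores (column j > row i) to −∞ before the softmax, which is one common way to implement it, and the score values are made up.

```python
import math

def causal_masked_softmax(scores):
    """Row-wise softmax where word i may only attend to words j <= i."""
    probs = []
    for i, row in enumerate(scores):
        # Keep the scores for j <= i; future positions are set to -inf.
        masked = [s if j <= i else -math.inf for j, s in enumerate(row)]
        m = max(masked)  # always finite, since position i itself is never masked
        exps = [math.exp(s - m) for s in masked]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs

# Hypothetical 3 x 3 matrix of scaled scores qk/sqrt(d).
scores = [[0.5, 1.0, -0.3],
          [0.2, 0.1, 0.9],
          [1.5, -0.4, 0.0]]
attention = causal_masked_softmax(scores)
```

The first row of `attention` puts all its probability on the first word, and every entry above the diagonal is zero, matching the pattern of the matrix on the slide.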


Updated word vectors


Step 4 : Calculate weighted linear combination

$\mathrm{Softmax}_{\mathrm{row}}\Big(\frac{QK^T}{\sqrt{d}} \odot \mathrm{Mask}\Big) V \in \mathbb{R}^{L \times d}$

[Figure: matrix of updated word vectors, one row per word (Yesterday, I, …, saw, a), columns d1, d2, …, dF-1, dF.]

Word         Attention weights                  Updated word vector
             (Yesterday, I, …, saw, a)
Yesterday    1     0     0     0     0          Yesterday
I            0.3   0.7   0     0     0          0.3*Yesterday + 0.7*I
…            0.4   0.2   0.4   0     0          0.4*Yesterday + 0.2*I + 0.2*went
saw          0.2   0.1   0.1   0.5   0          0.2*Yesterday + 0.1*I + 0.1*went + …
a            0.3   0.1   0.1   0.2   0.3        0.3*Yesterday + 0.1*I + 0.1*went + … + 0.3*a

The new vector representation of “saw” is a weighted linear combination of the vector representations of “Yesterday”, “I”, …, “saw”.
The weights are the attention probabilities between each pair of words.
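A small sketch of Step 4 in plain Python, assuming the attention weights are already computed. The weights are the first two rows of the slide's matrix restricted to two words; the 2-feature value vectors are hypothetical.

```python
def weighted_combination(weights, vectors):
    """Return weights @ vectors: one updated vector per row of weights."""
    d = len(vectors[0])
    return [[sum(w * v[k] for w, v in zip(row, vectors)) for k in range(d)]
            for row in weights]

# Attention weights for "Yesterday" and "I" (first two rows on the slide).
weights = [[1.0, 0.0],
           [0.3, 0.7]]
# Hypothetical value vectors with 2 features for "Yesterday" and "I".
values = [[1.0, 0.0],
          [0.0, 2.0]]
updated = weighted_combination(weights, values)
```

Here the updated vector of “Yesterday” is its own value vector, and the updated vector of “I” is 0.3*Yesterday + 0.7*I.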


Attention blocks
For $\ell = 0, 1, 2, ..., L_{\mathrm{Att}} - 1$ :

$\bar{H}^{\ell+1} = H^{\ell} + \text{Mask-MHA}(\mathrm{LN}(H^{\ell}), \mathrm{LN}(H^{\ell}), \mathrm{LN}(H^{\ell})) \in \mathbb{R}^{L \times d}$
$H^{\ell+1} = \bar{H}^{\ell+1} + \mathrm{MLP}(\mathrm{LN}(\bar{H}^{\ell+1})) \in \mathbb{R}^{L \times d}$

with $H^{\ell=0} = \mathrm{LL}(\{g_1, ..., g_L\} + \mathrm{PE}) \in \mathbb{R}^{L \times d}$

[Figure: the sequence of context words $\{g_1, ..., g_{L-1}\}$ and the query for the next word $g_L$ each receive a positional encoding, pass through the attention blocks (x $L_{\mathrm{Att}}$, with states $\bar{H}^{\ell+1}$ and $H^{\ell+1}$), and produce the output probability.]
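The residual structure of the block equations can be sketched as follows. This is a minimal illustration in plain Python, not the lab code: `layer_norm` has no learned scale or shift, and `masked_mha` and `mlp` are hypothetical callables standing in for the real sub-layers.

```python
import math

def layer_norm(row, eps=1e-5):
    """Normalize one word vector to zero mean and unit variance."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    return [(x - mean) / math.sqrt(var + eps) for x in row]

def attention_block(H, masked_mha, mlp):
    """H_bar = H + MaskMHA(LN(H), LN(H), LN(H)); H_next = H_bar + MLP(LN(H_bar))."""
    normed = [layer_norm(row) for row in H]
    attn_out = masked_mha(normed, normed, normed)  # query, key, value
    H_bar = [[h + a for h, a in zip(hr, ar)] for hr, ar in zip(H, attn_out)]
    mlp_out = [mlp(layer_norm(row)) for row in H_bar]
    return [[h + m for h, m in zip(hr, mr)] for hr, mr in zip(H_bar, mlp_out)]

def stack(H, blocks):
    """Apply the blocks one after the other (the loop over the layer index)."""
    for masked_mha, mlp in blocks:
        H = attention_block(H, masked_mha, mlp)
    return H

# Placeholder sub-layers that output zeros, so only the residual path is active.
def zero_mha(q, k, v):
    return [[0.0] * len(row) for row in q]

def zero_mlp(row):
    return [0.0] * len(row)

H0 = [[1.0, 2.0], [3.0, 4.0]]
H_out = stack(H0, [(zero_mha, zero_mlp)] * 3)
```

With zero-output sub-layers the stack returns its input unchanged, which shows that each block only adds a correction on top of the residual path.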


Training with mini-batch


Training is done with a batch size of B sequences and batch length of L.

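[Figure: a document is cut into sequences of L words. At each position, the context window up to the current word is used to predict the next word, and the cross-entropy criterion compares the L predictions with the labels (174, 564, 13, …, 876). Figure labels: Document, Sequence of L words, Context window, Current word, Next word, Cross Entropy Criterion, L number of words to predict, L = 0.01, B.]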
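How the labels and the loss for one sequence can be formed, as a hedged sketch in plain Python. The word indices, the vocabulary of 4 words and the uniform "model" are hypothetical; a real implementation would average over the B sequences of the batch as well.

```python
import math

def next_word_pairs(token_ids):
    """Inputs are all words but the last; labels are the same sequence shifted by one."""
    return token_ids[:-1], token_ids[1:]

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the correct next words."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

tokens = [3, 1, 2, 0]                      # hypothetical word indices
inputs, labels = next_word_pairs(tokens)   # inputs [3, 1, 2], labels [1, 2, 0]
# A stand-in model that always predicts the uniform distribution over 4 words.
uniform = [[0.25] * 4 for _ in labels]
loss = cross_entropy(uniform, labels)
```

For the uniform prediction the loss equals log(4), the value expected from a model that has learned nothing about a vocabulary of 4 words.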

Generation
The transformer network is trained in parallel (using the mask to hide the words to be predicted).
After training, the mask is no longer required (there are no future words to hide).
The sequence is then generated auto-regressively, i.e. one word at a time.

Generate next word $w_{t+1}$.

Output of Transformer model :

for $\ell = 0, ..., L_{\mathrm{Att}} - 1$ :
$\bar{q}^{\ell+1} = q^{\ell} + \mathrm{MHA}(\mathrm{LN}(q^{\ell}), \mathrm{LN}(K^{\ell}), \mathrm{LN}(V^{\ell})) \in \mathbb{R}^{d}$
$q^{\ell+1} = \bar{q}^{\ell+1} + \mathrm{MLP}(\mathrm{LN}(\bar{q}^{\ell+1})) \in \mathbb{R}^{d}$

with $q^{\ell=0} = \mathrm{LL}(g_t + \mathrm{PE}_t) \in \mathbb{R}^{d}$
$K^{\ell=0} = V^{\ell=0} = \mathrm{LL}(\{g_1, ..., g_t\} + \mathrm{PE}) \in \mathbb{R}^{t \times d}$

Output probability $p_t = \mathrm{Softmax}(\mathrm{MLP}(q^{\ell=L_{\mathrm{Att}}})) \in \mathbb{R}^{V}$
Sample next word $w_{t+1} \sim p_t$.

[Figure: the sequence of context words $\{g_1, ..., g_{t-1}\}$ and the query for the next word $g_t$ each receive a positional encoding; the final state $q^{\ell=L_{\mathrm{Att}}}$ gives the output probability for the next word, $w_{t+1} \sim p_t \in \mathbb{R}^{V}$.]
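The auto-regressive loop can be sketched in plain Python. `toy_model` is a hypothetical stand-in for the trained transformer (it deterministically predicts the next index), so only the generation loop itself reflects the slide.

```python
import random

def generate(next_word_probs, context, num_words, rng):
    """Append num_words sampled words to the context, one word at a time."""
    words = list(context)
    for _ in range(num_words):
        p_t = next_word_probs(words)  # distribution over the vocabulary
        # Sample w_{t+1} ~ p_t.
        w_next = rng.choices(range(len(p_t)), weights=p_t, k=1)[0]
        words.append(w_next)  # the sample becomes part of the next context
    return words

def toy_model(words, vocab_size=4):
    """Hypothetical model: puts all probability on (last word + 1) mod vocab_size."""
    p = [0.0] * vocab_size
    p[(words[-1] + 1) % vocab_size] = 1.0
    return p

generated = generate(toy_model, context=[0], num_words=5, rng=random.Random(0))
```

Each iteration calls the model once on the words generated so far, which is why generation is sequential whereas training is parallel.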


Lab 01
PyTorch implementation of Language Model Transformers

[Figure: language model transformer. The sequence of context words and the query for the next word each receive a positional encoding, and the network outputs the probability of the next word.]


Lab 01
Numerical results on PTB :

Model                                                  exp(train_loss)   exp(test_loss)   Parameters
Vanilla RNN                                            111               155              3 million
LSTM                                                   59                106              7 million
LSTM state-of-the-art                                  -                 50               50 million
Vanilla LM Transformer                                 54                174              3 million
LM Transformer state-of-the-art (with pre-training)    -                 20.5             175 billion
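The quantity exp(loss) reported above is commonly called perplexity. A minimal sketch, assuming the loss is the mean cross-entropy per word computed with the natural logarithm; the vocabulary size below is a hypothetical example.

```python
import math

def perplexity(mean_loss):
    """exp of the mean cross-entropy per word (natural logarithm)."""
    return math.exp(mean_loss)

# A model that is uniform over V words has loss log(V), hence perplexity V.
vocab_size = 10000  # hypothetical vocabulary size
uniform_ppl = perplexity(math.log(vocab_size))

# Converting the reported exp(test_loss) = 155 back to a loss value.
rnn_test_loss = math.log(155)
```

Read this way, exp(test_loss) = 155 means the model is on average as uncertain as a uniform choice among 155 words.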


Attention mechanism
It is a breakthrough idea in NLP !
It is as revolutionary as CNNs in Computer Vision.

[Figure (Bahdanau-Cho-Bengio 2014): curves labelled “Transformers” and “RNNs”; the RNN curve degrades quickly after 30 words.]


Why are attention nets better?


Why is casting the LM task as a soft alignment/attention problem a great idea?
With RNNs, a very long sequence must be memorized and represented by a single vector.
RNN architectures with a single memory vector simply cannot deal with long sequences (a limit of non-linear dynamical systems).
Memorizing everything with one vector is asking too much!
With attention, we distribute the memorization load over each pair of words.
Each word in the target sequence only needs to find its match with the word (or a few words) in the source sequence.
It solves the limitation of long-term dependencies in RNNs (a word in the target sequence communicates with all words in the source sequence).
The matching is made easy by transforming the words into hidden representations.
Attention is a key mathematical structure for NLP and several other domains.
SOTA for all NLP tasks since 2019.


Attention interpretation

The attention score matrix provides the matching between words in a sequence :

$\mathrm{Softmax}_{\mathrm{row}}\Big(\frac{QK^T}{\sqrt{d}}\Big) \in \mathbb{R}^{L \times L}$



Seq2Seq Transformers

Given a sequence of words, convert it into a different sequence.

Basic tasks in NLP :
Translation
Question & Answer
Summarization

A Seq2Seq Transformer is composed of :
Word embedding layer
Self-attention encoding layer
Self-attention decoding layer
Cross-attention layer
Classification layer

[Figure: Seq2Seq Transformer Architecture (Vaswani-et-al Google Brain 2017), with $L_{\mathrm{Enc}}$ encoder layers and $L_{\mathrm{Dec}}$ decoder layers. Input (EN), length $L_{\mathrm{in}}$ : He drives the car. Output (FR), length $L_{\mathrm{out}}$ : Il conduit la voiture.]


Self-attention encoder

Encode the complete input sequence with a self-attention layer :

$$\bar{H} = \mathrm{MHA}(H) \in \mathbb{R}^{L_{in} \times d}, \quad H \in \mathbb{R}^{L_{in} \times d}$$
$$\bar{H} = \Big( \big\Vert_{h=1}^{H} \mathrm{HA}_h(H) \Big) W^O, \quad W^O \in \mathbb{R}^{d \times d}$$

with $H = \{ g^{in}_1, ..., g^{in}_{L_{in}} \} \in \mathbb{R}^{L_{in} \times d}$

$$\mathrm{HA}_h(H) = \mathrm{Softmax}\Big( \frac{Q_h K_h^T}{\sqrt{d/H}} \Big) V_h \in \mathbb{R}^{L_{in} \times d/H}$$

with
$Q_h = H W_h^Q \in \mathbb{R}^{L_{in} \times d/H}, \quad W_h^Q \in \mathbb{R}^{d \times d/H}$
$K_h = H W_h^K \in \mathbb{R}^{L_{in} \times d/H}, \quad W_h^K \in \mathbb{R}^{d \times d/H}$
$V_h = H W_h^V \in \mathbb{R}^{L_{in} \times d/H}, \quad W_h^V \in \mathbb{R}^{d \times d/H}$

Produce a representation of words that depends on the context of surrounding words.

Figure: word embedding of "he drives the car" followed by the self-attention encoding layer.
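The multi-head self-attention equations of this slide can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the lecture's implementation: the weights are random stand-ins for learned parameters, `n_heads` plays the role of the head count written H in the formulas, and the helper names (`matmul`, `head_attention`, `multi_head_attention`) are chosen here for illustration only.

```python
import math
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    """Numerically stable softmax of one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or word features."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def head_attention(X, Wq, Wk, Wv):
    """HA_h(H) = Softmax(Q_h K_h^T / sqrt(d/H)) V_h for a single head."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_head = len(Wq[0])                      # d / H in the slide's notation
    Kt = [list(col) for col in zip(*K)]      # K_h^T
    scores = matmul(Q, Kt)                   # shape L_in x L_in
    weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
    return matmul(weights, V)                # shape L_in x d/H

def multi_head_attention(X, heads, Wo):
    """Concatenate the head outputs along the feature axis, then apply W^O."""
    outputs = [head_attention(X, *h) for h in heads]
    concat = [sum((out[i] for out in outputs), []) for i in range(len(X))]
    return matmul(concat, Wo)                # shape L_in x d

rng = random.Random(0)
L_in, d, n_heads = 4, 8, 2                   # "he drives the car" has L_in = 4 words
X = random_matrix(L_in, d, rng)              # word features H
heads = [tuple(random_matrix(d, d // n_heads, rng) for _ in range(3))
         for _ in range(n_heads)]            # (W_h^Q, W_h^K, W_h^V) per head
Wo = random_matrix(d, d, rng)
H_bar = multi_head_attention(X, heads, Wo)
print(len(H_bar), len(H_bar[0]))             # number of words, feature size
```

Each head works in a subspace of size d/H, so concatenating the H heads restores the feature size d before the final projection.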

Encoder layer

The encoder is composed of multiple attention blocks.


Each attention block has MHA, MLP, Residual Connection, Layer Normalization (LN), PE.

for $\ell = 0, ..., L^{Enc} - 1$ :

$$\bar{H}^{\ell+1} = H^{\ell} + \mathrm{MHA}(\mathrm{LN}(H^{\ell}), \mathrm{LN}(H^{\ell}), \mathrm{LN}(H^{\ell})) \in \mathbb{R}^{L_{in} \times d}$$
$$H^{\ell+1} = \bar{H}^{\ell+1} + \mathrm{MLP}(\mathrm{LN}(\bar{H}^{\ell+1})) \in \mathbb{R}^{L_{in} \times d}$$

with initialization $H^{\ell=0} = \mathrm{LL}(\{ g^{in}_1, ..., g^{in}_{L_{in}} \} + \mathrm{PE}) \in \mathbb{R}^{L_{in} \times d}$, where LL is a linear layer (linear embedding) and PE is the positional encoding,

and encoder output $H^{Enc} = H^{\ell=L^{Enc}} \in \mathbb{R}^{L_{in} \times d}$
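The layer recursion of the encoder can be sketched as follows. This is a simplified illustration, assuming a single attention head without learned projections, a layer normalization without learned scale and shift, and a two-layer ReLU MLP with random weights; the input matrix is a random stand-in for the initialization LL(... + PE), and all function names are hypothetical.

```python
import math
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(row, eps=1e-5):
    """Normalise one feature vector to zero mean and unit variance."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    return [(x - mean) / math.sqrt(var + eps) for x in row]

def attention(Q, K, V):
    """Single-head Softmax(Q K^T / sqrt(d)) V, without learned projections."""
    d = len(K[0])
    Kt = [list(col) for col in zip(*K)]
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in matmul(Q, Kt)]
    return matmul(weights, V)

def mlp(X, W1, W2):
    """Two-layer perceptron with ReLU, applied to every word independently."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(X, W1)]
    return matmul(hidden, W2)

def add(A, B):
    """Residual connection: element-wise sum of two matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def encoder_layer(H, W1, W2):
    N = [layer_norm(row) for row in H]
    H_bar = add(H, attention(N, N, N))           # H_bar = H + MHA(LN(H), LN(H), LN(H))
    N_bar = [layer_norm(row) for row in H_bar]
    return add(H_bar, mlp(N_bar, W1, W2))        # H_next = H_bar + MLP(LN(H_bar))

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
L_in, d, L_enc = 4, 8, 3
H = random_matrix(L_in, d, rng)                  # stands in for H at layer 0
layers = [(random_matrix(d, 2 * d, rng), random_matrix(2 * d, d, rng))
          for _ in range(L_enc)]
for W1, W2 in layers:                            # for l = 0, ..., L_enc - 1
    H = encoder_layer(H, W1, W2)
H_enc = H                                        # encoder output, shape L_in x d
print(len(H_enc), len(H_enc[0]))
```

Normalizing before each sub-layer and adding the result back to the unnormalized input keeps the shape L_in x d unchanged from layer to layer, which is what allows the blocks to be stacked.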


Masked self-attention decoder

Compute the query for the next word to predict with a masked self-attention layer to speed up training.
The mask hides future words in the output sequence, i.e. the words we want to predict.

Update equation for the masked self-attention decoder layer :

$$\bar{H} = \text{Mask-MHA}(H, H, H) \in \mathbb{R}^{L_{out} \times d}$$

with $H = \mathrm{LL}(\{ g^{out}_1, ..., g^{out}_{L_{out}} \} + \mathrm{PE}) \in \mathbb{R}^{L_{out} \times d}$

The self-attention layer produces a representation of the sequence of output words that depends on the context of their surrounding words.

Figure: masked self-attention decoding layer over "il conduit la voiture". The query word has index t; the future words are hidden at time t with a mask.
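The effect of the mask can be illustrated with a small sketch. It assumes one common way of implementing the mask, namely setting the scores of hidden positions to minus infinity before the softmax; the slide does not specify the mechanism. Uniform toy scores replace the real query-key products so that the resulting weights are easy to read.

```python
import math

def causal_mask(length):
    """mask[t][s] is True when position s is visible from position t, i.e. s <= t."""
    return [[s <= t for s in range(length)] for t in range(length)]

def masked_softmax(scores, mask):
    """Row-wise softmax where hidden positions receive a score of -infinity."""
    weights = []
    for row, visible in zip(scores, mask):
        kept = [x if v else -math.inf for x, v in zip(row, visible)]
        m = max(kept)                          # finite: the diagonal is always visible
        exps = [math.exp(x - m) for x in kept]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

words = ["il", "conduit", "la", "voiture"]
scores = [[0.0] * len(words) for _ in words]   # toy scores instead of Q K^T / sqrt(d)
weights = masked_softmax(scores, causal_mask(len(words)))
for word, row in zip(words, weights):
    print(word, [round(w, 2) for w in row])
```

With uniform scores, the word at index t spreads its attention evenly over positions 0 to t and gives zero weight to every later position, so all positions can be trained in one pass without any of them seeing the words it has to predict.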

Cross-attention layer

Compute attention between pairs of words coming from the encoded input sequence and the query from the output sequence.

Update equation for a single query :

$$\bar{h} = \mathrm{MHA}(q, K, V) \in \mathbb{R}^{d}, \quad q \in \mathbb{R}^{d}, \quad K, V \in \mathbb{R}^{L_{in} \times d}$$
$$\bar{h} = \Big( \big\Vert_{h=1}^{H} \mathrm{HA}_h(q, K, V) \Big) W^O, \quad W^O \in \mathbb{R}^{d \times d}$$

with $q = h^{Dec}_t \in \mathbb{R}^{d}$, $K = V = H^{Enc} \in \mathbb{R}^{L_{in} \times d}$

Figure: cross-attention layer. The input sequence "he drives the car" is encoded by self-attention into $H^{Enc}$. The output word at index t of "il conduit la voiture" is encoded by self-attention into the query $q = h^{Dec}_t$. Future words are hidden at time t.
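The single-query update can be sketched as below. The sketch assumes one head without learned projections and uses random vectors as stand-ins for the encoder output and the decoder queries; `cross_attention` is a hypothetical helper name. It also checks that stacking several queries as rows of one matrix gives the same result, row by row, as processing each query on its own.

```python
import math
import random

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(Q, K, V):
    """Each query row attends over all rows of K and V (one head, no projections)."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)              # one weight per input word
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

rng = random.Random(0)
L_in, L_out, d = 4, 3, 8
H_enc = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(L_in)]   # K = V = H_enc
H_dec = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(L_out)]  # decoder queries

t = 1
h_bar = cross_attention([H_dec[t]], H_enc, H_enc)[0]   # single query q = h_t
H_bar = cross_attention(H_dec, H_enc, H_enc)           # all queries at once
print(len(h_bar), len(H_bar), len(H_bar[0]))
```

The output for a query is a weighted average of the encoded input words, so its size is d whatever the input length L_in.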

Cross-attention layer

To speed up training, we also use the mask matrix to predict multiple next words at the same time without looking at the future.

Update equation for all queries with the mask matrix :

$$\bar{H} = \text{Mask-MHA}(Q, K, V) \in \mathbb{R}^{L_{out} \times d}, \quad Q \in \mathbb{R}^{L_{out} \times d}, \quad K, V \in \mathbb{R}^{L_{in} \times d}$$
$$\bar{H} = \Big( \big\Vert_{h=1}^{H} \mathrm{HA}_h(Q, K, V) \Big) W^O, \quad W^O \in \mathbb{R}^{d \times d}$$

with $Q = H^{Dec} \in \mathbb{R}^{L_{out} \times d}$, $K = V = H^{Enc} \in \mathbb{R}^{L_{in} \times d}$

Figure: cross-attention decoding layer, with queries $H^{Dec}$ and keys and values $H^{Enc}$.


Decoder layer

Output of the decoder :

for $\ell = 0, ..., L^{Dec} - 1$ :

Self-attention :
$$\bar{H}^{\ell+1} = H^{\ell} + \text{Mask-MHA}(\mathrm{LN}(H^{\ell}), \mathrm{LN}(H^{\ell}), \mathrm{LN}(H^{\ell})) \in \mathbb{R}^{L_{out} \times d}$$
Cross-attention :
$$\hat{H}^{\ell+1} = \bar{H}^{\ell+1} + \text{Mask-MHA}(\mathrm{LN}(\bar{H}^{\ell+1}), \mathrm{LN}(H^{Enc}), \mathrm{LN}(H^{Enc})) \in \mathbb{R}^{L_{out} \times d}$$
$$H^{\ell+1} = \hat{H}^{\ell+1} + \mathrm{MLP}(\mathrm{LN}(\hat{H}^{\ell+1})) \in \mathbb{R}^{L_{out} \times d}$$

with $H^{\ell=0} = \mathrm{LL}(\{ g^{out}_1, ..., g^{out}_{L_{out}} \} + \mathrm{PE}) \in \mathbb{R}^{L_{out} \times d}$

Decoder output $H^{\ell=L^{Dec}} \in \mathbb{R}^{L_{out} \times d}$

Probability output $P = \mathrm{Softmax}(\mathrm{MLP}(H^{L^{Dec}})) \in \mathbb{R}^{L_{out} \times V}$

Figure: decoder layer mapping $H^{\ell}$ to $\bar{H}^{\ell+1}$ (self-attention), $\hat{H}^{\ell+1}$ (cross-attention with $H^{Enc}$) and $H^{\ell+1}$, from $H^{\ell=0}$ up to $H^{\ell=L^{Dec}}$ and the output $P$.
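One decoder layer followed by the probability output can be sketched as below. Several simplifications are assumed: a single head without learned projections, a layer normalization without learned parameters, one decoder layer, and a single linear map in place of the final MLP. The causal mask is applied only in the self-attention step; this is an assumption of the sketch, made because the mask is defined over the L_out output positions while the cross-attention scores range over the L_in input positions. All names are hypothetical.

```python
import math
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(row, eps=1e-5):
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    return [(x - mean) / math.sqrt(var + eps) for x in row]

def attention(Q, K, V, causal=False):
    """Single-head Softmax(Q K^T / sqrt(d)) V; if causal, query t only sees keys s <= t."""
    d = len(K[0])
    out = []
    for t, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        if causal:
            scores = [s if i <= t else -math.inf for i, s in enumerate(scores)]
        out.append(matmul([softmax(scores)], V)[0])
    return out

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def norm(H):
    return [layer_norm(row) for row in H]

def decoder_layer(H, H_enc, W1, W2):
    N = norm(H)
    H_bar = add(H, attention(N, N, N, causal=True))        # masked self-attention
    E = norm(H_enc)
    H_hat = add(H_bar, attention(norm(H_bar), E, E))        # cross-attention with H_enc
    hidden = [[max(0.0, x) for x in row] for row in matmul(norm(H_hat), W1)]
    return add(H_hat, matmul(hidden, W2))                   # MLP with ReLU

def rand(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
L_in, L_out, d, V_size = 4, 4, 8, 10
H_enc, H0 = rand(L_in, d, rng), rand(L_out, d, rng)
W1, W2, W_cls = rand(d, 2 * d, rng), rand(2 * d, d, rng), rand(d, V_size, rng)

H_dec = decoder_layer(H0, H_enc, W1, W2)                    # one layer only
P = [softmax(row) for row in matmul(H_dec, W_cls)]          # shape L_out x V_size
print(len(P), len(P[0]))
```

Each row of P is a probability distribution over the vocabulary for one output position, and changing a later output word leaves the rows of earlier positions unchanged.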

Xavier Bresson 50
51

Generation
At inference, the input sequence is first encoded in parallel and provides $H^{Enc}$.
Then, the output sequence is generated auto-regressively, i.e. one word at a time.

Output of Transformer model :

for $\ell = 0, ..., L^{Dec} - 1$ :

$$\bar{q}^{\ell+1} = q^{\ell} + \mathrm{MHA}(\mathrm{LN}(q^{\ell}), \mathrm{LN}(K^{\ell}), \mathrm{LN}(V^{\ell})) \in \mathbb{R}^{d}$$
$$\hat{q}^{\ell+1} = \bar{q}^{\ell+1} + \mathrm{MHA}(\mathrm{LN}(\bar{q}^{\ell+1}), \mathrm{LN}(H^{Enc}), \mathrm{LN}(H^{Enc})) \in \mathbb{R}^{d}$$
$$q^{\ell+1} = \hat{q}^{\ell+1} + \mathrm{MLP}(\mathrm{LN}(\hat{q}^{\ell+1})) \in \mathbb{R}^{d}$$

with $q^{\ell=0} = \mathrm{LL}(g^{out}_t + \mathrm{PE}_t) \in \mathbb{R}^{d}$
$K^{\ell=0} = V^{\ell=0} = \mathrm{LL}(\{ g^{out}_1, ..., g^{out}_t \} + \mathrm{PE}) \in \mathbb{R}^{t \times d}$

Output probability $p_t = \mathrm{Softmax}(\mathrm{MLP}(q^{\ell=L^{Dec}})) \in \mathbb{R}^{V}$

Sample next word $w_{t+1} \sim p_t$.

Figure: the words $\{ g^{out}_1, ..., g^{out}_t \}$ generated so far pass through the $L^{Dec}$ decoder layers together with $H^{Enc}$ to generate the next word $w_{t+1} \sim p_t \in \mathbb{R}^{V}$.
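The auto-regressive loop itself can be sketched independently of the network. In the sketch below, `toy_next_word_probs` is a hypothetical stand-in for the decoder output p_t, and the start and stop markers `[BOS]` and `[EOS]`, the tiny vocabulary, and the maximum length are assumptions of the sketch, not part of the slide.

```python
import random

VOCAB = ["[BOS]", "il", "conduit", "la", "voiture", "[EOS]"]

def toy_next_word_probs(encoded_input, generated):
    """Hypothetical stand-in for p_t = Softmax(MLP(q)).

    A real model would run the decoder on `generated` and on the encoder output;
    this toy puts most of the mass on the word that follows the last one in VOCAB.
    """
    last = VOCAB.index(generated[-1])
    target = min(last + 1, len(VOCAB) - 1)
    rest = 0.1 / (len(VOCAB) - 1)
    return [0.9 if i == target else rest for i in range(len(VOCAB))]

def generate(encoded_input, max_len=10, seed=0):
    """Generate one word at a time, feeding each sampled word back as context."""
    rng = random.Random(seed)
    generated = ["[BOS]"]
    while len(generated) < max_len and generated[-1] != "[EOS]":
        p_t = toy_next_word_probs(encoded_input, generated)
        next_word = rng.choices(VOCAB, weights=p_t, k=1)[0]   # w_{t+1} ~ p_t
        generated.append(next_word)
    return generated

encoded_input = ["he", "drives", "the", "car"]   # placeholder for the encoder output
print(generate(encoded_input))
```

The input is encoded once before the loop, while the decoder is called again at every step because its context grows by one word each time.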

Understanding attention layers

There are two types of attention layers.


Self-attention layer
Used to represent a word in its (learned) context.
Cross-attention layer
Used to query with a (learned) matching mechanism.


Understanding self-attention

Illustration of self-attention :

Figure: two panels, "Language model during training" and "Language model at inference", with the labels Word, Self-attention layer, Classification layer, and French. These two representations of the word “broke” are different.
sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit> <latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit> <latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit> <latexit 
sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit> <latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit> <latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit> <latexit 
sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit> <latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>

<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit> <latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit> <latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit> <latexit 
sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>

embedding layer

Training example: "I live in Paris. My citizenship is MASK. I eat cheese."
During training, the network learns to attend to the words (the context) that help predict the masked word.

Inference examples: "Sandy broke the world record." / "Sandy broke the law."
At inference, the network computes each word's representation depending on its context.
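The "Sandy broke ..." example can be sketched in a few lines of plain Python. This is only an illustration under strong assumptions: the 2-D embeddings in `EMB` are hand-picked (not learned), and the query, key and value projections are taken to be the identity. It shows that the same input vector for "broke" yields a different output once the context changes.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Single-head self-attention with identity projections (Q = K = V = input)."""
    d = len(vectors[0])
    outputs = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)])
    return outputs

# Hypothetical embeddings, chosen by hand for illustration only.
# Axis 0 loosely stands for "achievement", axis 1 for "legal".
EMB = {
    "Sandy":  [1.0, 1.0],
    "broke":  [1.0, 1.0],
    "the":    [0.0, 0.0],
    "world":  [2.0, 0.0],
    "record": [3.0, 0.0],
    "law":    [0.0, 3.0],
}

def contextual(sentence, word):
    """Return the self-attention output for `word` inside `sentence`."""
    tokens = sentence.split()
    out = self_attention([EMB[t] for t in tokens])
    return out[tokens.index(word)]

broke_record = contextual("Sandy broke the world record", "broke")
broke_law = contextual("Sandy broke the law", "broke")
print("broke | record context:", [round(x, 3) for x in broke_record])
print("broke | law context:   ", [round(x, 3) for x in broke_law])
```

In the first sentence the output for "broke" leans towards axis 0, in the second towards axis 1, although the input embedding is identical. In a trained network the projections and embeddings are learned rather than fixed by hand.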


Understanding cross-attention

Illustration of cross-attention: machine translation
Matching the same word in two languages.

Input (EN): He drives the car
Output (FR): Il conduit la voiture
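A minimal sketch of cross-attention for this translation pair, in plain Python. The assumptions are loud: the encoder and decoder vectors are hypothetical scaled one-hot vectors constructed so that each French word points in the same direction as its English counterpart, and no learned projections are used. In a trained Seq2Seq Transformer this alignment is learned, not built in. The point is only the data flow: queries come from the target side, keys and values from the source side.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Each query (target side) attends over all keys/values (source side)."""
    d = len(keys[0])
    weights, outputs = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[i] for wj, v in zip(w, values))
                        for i in range(len(values[0]))])
    return weights, outputs

source = ["He", "drives", "the", "car"]          # encoder side (EN)
target = ["Il", "conduit", "la", "voiture"]      # decoder side (FR)

def scaled_one_hot(i, n=4, scale=4.0):
    """Hypothetical embedding: direction i, scaled so the softmax is peaked."""
    return [scale if j == i else 0.0 for j in range(n)]

enc = [scaled_one_hot(i) for i in range(len(source))]  # keys and values
dec = [scaled_one_hot(i) for i in range(len(target))]  # queries

weights, _ = cross_attention(dec, enc, enc)
alignment = [source[max(range(len(source)), key=lambda j: w[j])] for w in weights]
for t, s, w in zip(target, alignment, weights):
    print(f"{t:8s} -> {s:7s} weights={[round(x, 3) for x in w]}")
```

Each row of `weights` is a probability distribution over the English words, and its largest entry marks the source word that the French word attends to most.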


Receptive field

Multiple layers increase the receptive field.

This word depends on a context of 3 words in Layer 1 and a context of 8 words in Layer 0.


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>

Layer 2


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>

Layer 1


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>


<latexit sha1_base64="wnZRiOy9uRjGNvcrK/B4kdblvVc=">AAACIHicbVBNS8NAEN34bfyqevQSLYKnkohYj4IXjxVsKyShbDaTdnGzCbsToYT+FC/+FS8eFNGb/hq3bQ7aOssOj/dmZndelAuu0XW/rIXFpeWV1bV1e2Nza3untrvX0VmhGLRZJjJ1F1ENgktoI0cBd7kCmkYCutH91VjvPoDSPJO3OMwhTGlf8oQziobq1ZqBgAR9O4igz2VJlaLDUSnEyA4OzQnG2Q5AxpVkB4r3Bxj2anW34U7CmQdeBeqkilav9hnEGStSkMgE1dr33BxDMxU5E2DmFhpyyu5pH3wDJU1Bh+VkwZFzbJjYSTJlrkRnwv7uKGmq9TCNTGVKcaBntTH5n+YXmFyEJZd5gSDZ9KGkEA5mztgtJ+YKGIqhAZQpbv7qsAFVlKHx1DYmeLMrz4POacM7b3g3Z/XLk8qONXJAjsgJ8UiTXJJr0iJtwsgjeSav5M16sl6sd+tjWrpgVT375E9Y3z/p3aK8</latexit>



The size of the receptive field/context of attention, measured on the original input (Layer 0), increases at each layer.
For each word, the context size in the previous layer is variable (but small due to sparse softmax).


Hierarchical representation
Multiple layers capture a hierarchical representation.
A simple illustration:
Given the distribution of data (Layer 0).
Suppose that the attention context is defined by the closest data points.
At each layer, the self-attention mechanism smooths out the distribution with the context.
Successive layers provide a multi-scale/hierarchical representation of the data.

[Figure: data distribution at Layer 0, Layer 1 and Layer 2]
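
The smoothing effect described above can be sketched on 1D data. This is a toy illustration, not the lecture's code: the function names, the choice of k = 3 closest points and the distance-based scores are all assumptions made for the example.

```python
import math

def smooth_layer(points, k=3):
    """One attention-like step: each point becomes a softmax-weighted
    average of its k closest points (itself included)."""
    out = []
    for x in points:
        # Attention context = the k closest data points (assumption of this sketch).
        nearest = sorted(points, key=lambda y: abs(y - x))[:k]
        # Softmax over negative distances: closer points get larger weights.
        weights = [math.exp(-abs(y - x)) for y in nearest]
        total = sum(weights)
        out.append(sum(w * y for w, y in zip(weights, nearest)) / total)
    return out

def spread(points):
    """Range of the data, used here as a crude measure of smoothness."""
    return max(points) - min(points)

layer0 = [0.0, 0.1, 0.2, 1.0, 1.1, 5.0]
layer1 = smooth_layer(layer0)
layer2 = smooth_layer(layer1)
print(spread(layer0), spread(layer1), spread(layer2))
```

Each output is a convex combination of inputs, so the spread cannot grow from one layer to the next, and information from farther points reaches a given point only through successive layers.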


Lab 02
PyTorch implementation of Seq2Seq Transformers

[Figure labels: L^Dec, L^Enc]


Transformers in 2017

Machine translation, WMT-2014 dataset, BLEU score:

Model         Architecture       EN-DE   EN-FR
GNMT          LSTM + Attention   24.6    39.9
ConvSeq2Seq   CNN                25.2    40.5
Transformer   -                  28.4    41.8

GNMT: Google Brain’s neural machine translation system in 2016.
ConvSeq2Seq: Facebook Research’s convolutional sequence-to-sequence learning.
Transformer is 3x faster to train than LSTM and CNN.
Transformer has 24 layers vs LSTM w/ 3 layers and CNN w/ 40 layers.



Transfer learning with language models

A major theme in NLP since 2019 is sequential transfer learning.
Language models are pre-trained on a large-scale corpus (to capture a language prior), then post-trained on new tasks s.a. document classification, Q&A, named-entity recognition, etc., by fine-tuning some layers on top of the pre-trained network.
This gives the best performance on NLP tasks.

[Figure: sequential transfer learning. Corpus (large dataset) -> pre-training -> language models (Word2Vec, GloVe, ELMo, BERT, GPT) -> post-training (adaptation/transfer) -> tasks s.a. QA and sentence labeling (small dataset)]


Language modeling

Language modeling is a self-supervised task.


It is unsupervised in the sense that there is no need for human labeled data.
Additionally, large-scale unlabeled datasets are available for training.
Train with billions or trillions of words from public datasets s.a. Wikipedia, Reddit, etc.

English Wikipedia alone has 3.9 billion words (2021).
Words posted to Reddit: 72 billion (2015).
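
Why no human labels are needed can be seen in a small sketch: the training targets are simply the next words of the raw text. The function name and the context size of 4 are illustrative assumptions.

```python
def next_word_examples(text, context_size=4):
    """Turn raw text into (context, next word) training pairs.
    The labels come from the text itself, so no annotation is required."""
    words = text.split()
    return [
        (words[i - context_size:i], words[i])
        for i in range(context_size, len(words))
    ]

for context, target in next_word_examples("Yesterday I went to the beach"):
    print(context, "->", target)
```

Every position in a corpus yields one training pair, which is why billions of words translate directly into billions of supervised-looking examples.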


BERT
Bidirectional Encoder Representations from Transformers (Devlin-et-al Google AI Language 2019)
Use positional encoding, class index and sentence index.
Trained with two levels of hierarchical context :
Local dependencies with word prediction.
Global dependencies with sequence prediction.

[Figure: BERT encoder. Each input token (CLS, the, cat, sleeps) has a discrete word/class index, a positional feature (0, 1, 2, 3) and a sentence index (SEN0). The embeddings w_i, p_i and s_0 are summed into x_i, which is passed through L self-attention transformer layers.]

Training with hidden words


Train by predicting words
Randomly replace x% of words by token MASK.
Randomly replace y% of words by random words, s.a. “car”.
Randomly replace z% of words by the same words (i.e. leave them unchanged), here “cat”.
Learn context-to-word representation for transfer learning.

[Figure: the input "CLS the MASK sleeps" is embedded (word + positional + sentence embeddings, summed) and passed through L self-attention transformer layers; the output h2 at the masked position goes through a linear layer to predict "cat".]
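
The corruption step above can be sketched as a small function. The slides leave x, y and z unspecified; the defaults below (15% of words selected, of which 80% become MASK, 10% a random word, 10% unchanged) are the values reported in the BERT paper and are an assumption relative to the slides. The function and variable names are hypothetical.

```python
import random

MASK = "[MASK]"

def corrupt_tokens(tokens, vocab, select_prob=0.15,
                   mask_frac=0.8, random_frac=0.1, rng=None):
    """Return (corrupted tokens, targets). A target is the original word at
    selected positions and None elsewhere (no prediction there)."""
    rng = rng or random.Random(0)
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() >= select_prob:
            corrupted.append(tok)      # position not selected: left as is
            targets.append(None)
            continue
        targets.append(tok)            # the model must recover this word
        r = rng.random()
        if r < mask_frac:
            corrupted.append(MASK)                 # replace by MASK
        elif r < mask_frac + random_frac:
            corrupted.append(rng.choice(vocab))    # replace by a random word
        else:
            corrupted.append(tok)                  # keep the same word
    return corrupted, targets

tokens = ["the", "cat", "sleeps"]
print(corrupt_tokens(tokens, vocab=["car", "dog", "runs"], select_prob=0.5))
```

Special tokens such as CLS are assumed to be excluded from `tokens` before calling the function.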


Training with sentence prediction


Train by predicting the next sentence
Use CLS token : CLS = IsNext / NotNext
Positive example/consecutive pair of sentences :
[CLS] the cat sleeps [SEP] it wakes up
Negative example/random pair of sentences :
[CLS] the cat sleeps [SEP] John drives fast

[Figure: the input is embedded and passed through L self-attention transformer layers; the output h0 at the CLS position gives the class label.]
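
Positive and negative pairs of this kind can be generated from any list of consecutive sentences. The sketch below is illustrative: the function name is hypothetical, and the 50/50 split between IsNext and NotNext is the ratio reported in the BERT paper, not something stated in the slides.

```python
import random

def make_nsp_examples(sentences, rng=None):
    """Build (text, label) pairs for next-sentence prediction.
    Requires at least 3 sentences so a random negative always exists."""
    rng = rng or random.Random(0)
    examples = []
    for i in range(len(sentences) - 1):
        first = sentences[i]
        if rng.random() < 0.5:
            # Positive example: the true next sentence.
            second, label = sentences[i + 1], "IsNext"
        else:
            # Negative example: any sentence other than the current or next one.
            candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
            second, label = rng.choice(candidates), "NotNext"
        examples.append((f"[CLS] {first} [SEP] {second}", label))
    return examples

corpus = ["the cat sleeps", "it wakes up", "John drives fast", "the road is empty"]
for text, label in make_nsp_examples(corpus):
    print(label, "|", text)
```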


Training

BERT base
12 Transformer layers
768 hidden features
12 Attention heads
110M parameters
BERT large
340M parameters
Special tokenization of words with only 30K tokens.
Dataset of 3B words
Training took 256 TPU days (Oct 2018)
Fine-tune on sentence classification, named-entity recognition (word classification), Q&A, etc.
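
The parameter counts above can be checked with a rough estimate. This is a back-of-the-envelope sketch, not an exact accounting: the feed-forward width of 4x the hidden size, the 512 positions, the 2 sentence indices, and the BERT large shape (24 layers, 1024 hidden features) are taken from the BERT paper rather than from the slides, and task-specific output heads are ignored.

```python
def bert_encoder_params(layers=12, hidden=768, vocab=30_000,
                        max_positions=512, sentences=2, ffn_mult=4):
    """Approximate parameter count of a BERT-style encoder."""
    layer_norm = 2 * hidden                        # scale and shift
    # Word, positional and sentence embedding tables, plus one layer norm.
    embeddings = (vocab + max_positions + sentences) * hidden + layer_norm
    # Query, key, value and output projections (weights + biases).
    attention = 4 * (hidden * hidden + hidden)
    # Two-layer feed-forward network (weights + biases).
    ffn_hidden = ffn_mult * hidden
    ffn = 2 * hidden * ffn_hidden + ffn_hidden + hidden
    per_layer = attention + ffn + 2 * layer_norm
    return embeddings + layers * per_layer

base = bert_encoder_params()
large = bert_encoder_params(layers=24, hidden=1024)
print(f"BERT base  ~ {base / 1e6:.1f}M parameters")
print(f"BERT large ~ {large / 1e6:.1f}M parameters")
```

Both estimates land within a few percent of the 110M and 340M figures quoted above; most of the parameters sit in the per-layer attention and feed-forward blocks.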


GPT-2

Language Models are Unsupervised Multitask Learners (GPT-2) (Radford-et-al OpenAI 2019), following Improving Language Understanding by Generative Pre-Training (GPT) (Radford-et-al OpenAI 2018)
Pre-trained on 8M webpages, WebText 40GB.
SOTA on 7 NLP tasks without fine-tuning, simply by zero-shot learning!
1.5B parameters
2048 GPUs


GPT-3

Introduced in Brown-et-al OpenAI 2020


Pre-trained on multiple datasets.
175B parameters
“The supercomputer developed for OpenAI is a single
system with more than 285,000 CPU cores, 10,000 GPUs
and 400 gigabits per second of network connectivity for
each GPU server”
Estimated US$12 million to train


GPT-4

Introduced in Achiam-et-al OpenAI 2023


Pre-trained on dataset ?
? parameters
Training time ?



Conclusion

The human attention mechanism focuses biological resources on a small set of important things (visual, sound, cognitive signals) to make decisions.
Attention neural networks (ANNs) are a generic/universal architecture to process any unstructured datasets, a.k.a. sets.
Attention is “eating” deep learning.
Transformers for Computer Vision with Vision Transformers (Dosovitskiy-et-al Google Brain 2021).
Transformers for Graphs with Graph Transformers.
Issue with long sequences because complexity is O(L^2 d).


Reducing complexity

Long sequence issue with O(L^2 d), L being sequence length and d hidden dimension.
Sparse transformers s.a. BigBird (Zaheer-et-al Google Brain 2021).

[Figure: original Transformers vs. structured Transformers]
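
The gap between full and structured attention can be made concrete by counting query-key pairs. The sketch below uses a plain sliding window as one simple example of a structured pattern; it is an illustrative assumption and not the exact BigBird pattern. Function names and the window size are hypothetical.

```python
def full_attention_pairs(L):
    """Every position attends to every position: L^2 pairs."""
    return L * L

def windowed_attention_pairs(L, w):
    """Each position attends only to positions within distance w."""
    return sum(min(L - 1, i + w) - max(0, i - w) + 1 for i in range(L))

L, w, d = 4096, 64, 768
full = full_attention_pairs(L)
sparse = windowed_attention_pairs(L, w)
# Each pair costs about d multiply-adds for the dot product.
print(f"full:     {full * d:.3e} multiply-adds")
print(f"windowed: {sparse * d:.3e} multiply-adds")
print(f"ratio:    {full / sparse:.1f}x")
```

With a fixed window the number of pairs grows roughly as L(2w + 1), i.e. linearly in the sequence length instead of quadratically.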


Interpretability

What Does BERT Look At? An Analysis of BERT’s Attention (Clark-et-al 2019).

\mathrm{Softmax}_{\mathrm{row}}\left( \frac{QK^T}{\sqrt{d}} \right) \in \mathbb{R}^{L \times L}
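
The L x L attention matrix that such analyses inspect is the row-wise softmax of QK^T divided by sqrt(d). A minimal pure-Python sketch follows; the function names and the toy Q and K values are illustrative assumptions.

```python
import math

def softmax(row):
    """Numerically stable softmax of one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_matrix(Q, K):
    """Row-wise softmax of Q K^T / sqrt(d); Q and K are L x d lists of lists."""
    d = len(Q[0])
    scores = [
        [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        for q in Q
    ]
    return [softmax(row) for row in scores]

Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
for row in attention_matrix(Q, K):
    print([round(a, 3) for a in row])
```

Row i shows how strongly word i attends to every other word; each row sums to 1, so it can be read as a distribution over the sequence.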


Open-source

Training large-scale models is costly in terms of time, money and CO2 emissions.

Available pre-trained/post-trained models :


PyTorch (Llama), TensorFlow (Gemma)
Hugging Face, https://huggingface.co


Questions?
