D 02 Large Language Models
Architecture
Large Language Models
Quick Recap
A language model resembles a complex function designed to predict the probability of word sequences within a
specific language corpus.
𝑃(This is a new technology) = 𝑃(This) 𝑃(is|This) 𝑃(a|This is) 𝑃(new|This is a) 𝑃(technology|This is a new)
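The product of conditional probabilities can be computed directly once each factor is known. The sketch below uses made-up probability values purely to illustrate the chain rule; a real language model would estimate each factor from a corpus or a trained neural network.

# A minimal sketch of the chain rule above, with made-up conditional
# probabilities used only for illustration.
conditional_probs = {
    "P(This)": 0.20,
    "P(is | This)": 0.50,
    "P(a | This is)": 0.30,
    "P(new | This is a)": 0.10,
    "P(technology | This is a new)": 0.05,
}

sentence_prob = 1.0
for term, p in conditional_probs.items():
    sentence_prob *= p
    print(f"{term} = {p}")

print(f"P(This is a new technology) = {sentence_prob:.6f}")  # 0.000150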
Language Models: Calculation
Solution: Sentence 1 receives a high probability because its words appear in common contexts, while the rare and
challenging words in sentence 2 result in a lower probability.
Power of Language Models
Chatbots
Text classification
Demo: Text Generation
Duration: 20 minutes
Imagine you are on a quest to understand the intricate art of text generation, where a computer
learns the patterns of a given writing style and crafts its sentences.
Today’s session will explore a Python script designed for educational purposes. This script employs
the Natural Language Toolkit (NLTK) and the Brown corpus to demonstrate text generation through a
Markov chain model using trigrams.
Note
Please download the solution document from the Reference Material Section and follow
the Jupyter Notebook for step-by-step execution.
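The notebook in the Reference Material Section is the authoritative version of the demo; the sketch below only outlines the general idea of a trigram Markov chain over the Brown corpus. The function name generate_text, the choice of the news category, and the two-word seed are illustrative assumptions, not the notebook's actual code.

# A rough sketch of the demo's idea: trigram-based text generation
# with NLTK and the Brown corpus.
import random
from collections import defaultdict, Counter

import nltk
from nltk.corpus import brown

nltk.download("brown", quiet=True)

# Count which word tends to follow each pair of words (a trigram model).
transitions = defaultdict(Counter)
for w1, w2, w3 in nltk.trigrams(brown.words(categories="news")):
    transitions[(w1.lower(), w2.lower())][w3.lower()] += 1

def generate_text(seed=("the", "first"), length=20):
    """Start from a two-word seed and repeatedly sample the next word."""
    w1, w2 = seed
    output = [w1, w2]
    for _ in range(length):
        candidates = transitions.get((w1, w2))
        if not candidates:
            break  # dead end: no trigram starts with this word pair
        words, counts = zip(*candidates.items())
        next_word = random.choices(words, weights=counts, k=1)[0]
        output.append(next_word)
        w1, w2 = w2, next_word
    return " ".join(output)

print(generate_text())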
Quick Check
A. Text generation
B. Machine translation
C. Speech recognition
D. Image processing
Large Language Models
Large Language Models (LLMs) are state-of-the-art AI models designed to comprehend and generate
human language.
• Tokenization: This process involves breaking down text into smaller units called tokens, which can be words,
phrases, or even individual characters.
• Input embeddings: This component maps tokens to a high-dimensional vector space, representing each token
with a unique vector (see the toy sketch after this list).
• Attention mechanism: This lets the model concentrate on specific parts of the input text when generating
output.
• Pretraining: This involves pretraining LLMs on extensive text data so they learn the underlying patterns and
structures of human language.
• Fine-tuning: This component allows the model to adapt to new tasks by fine-tuning the pre-trained model on a
smaller dataset.
• Transformer architecture: This employs the Transformer framework, comprising two main parts: an encoder
and a decoder.
• Scaling: This necessitates significant computational resources for training and upkeep, making scaling a
challenging but essential part of the architecture.
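As a rough illustration of the first two components, the toy example below tokenizes a sentence with a hand-built vocabulary and looks up random embedding vectors. Real LLMs use learned subword tokenizers (such as BPE) and embedding tables with tens of thousands of entries; the numbers here only show the shape of the idea.

# Toy tokenization and input embeddings with a hand-built vocabulary.
import numpy as np

vocab = {"this": 0, "is": 1, "a": 2, "new": 3, "technology": 4, "<unk>": 5}
embedding_dim = 8
rng = np.random.default_rng(0)

# One row per token id: the "embedding table" mapping tokens to vectors.
embedding_table = rng.normal(size=(len(vocab), embedding_dim))

def tokenize(text):
    """Split on whitespace and map each word to its vocabulary id."""
    return [vocab.get(word.lower(), vocab["<unk>"]) for word in text.split()]

token_ids = tokenize("This is a new technology")
token_vectors = embedding_table[token_ids]

print(token_ids)            # [0, 1, 2, 3, 4]
print(token_vectors.shape)  # (5, 8)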
• Input embeddings
• Positional encoding
• Encoder
o Attention mechanism
o Feed-forward neural network
• Decoder
• Multi-headed attention
• Layer normalization
• Output
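One way to see how the components listed above fit together is a teaching-scale model that wires them up with standard PyTorch building blocks. The dimensions, layer counts, and sinusoidal positional encoding below are illustrative choices, not the architecture of any particular production LLM.

# A compact, teaching-scale stand-in for the listed components.
import math
import torch
import torch.nn as nn

class TinyTransformerLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, nhead=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)                        # input embeddings
        self.encoder = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.norm = nn.LayerNorm(d_model)                                     # layer normalization
        self.output = nn.Linear(d_model, vocab_size)                          # output projection

    def positional_encoding(self, seq_len, d_model):
        # Classic sinusoidal positional encoding: injects word order.
        pos = torch.arange(seq_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(seq_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        return pe

    def forward(self, src_ids, tgt_ids):
        d = self.embed.embedding_dim
        src = self.embed(src_ids) + self.positional_encoding(src_ids.size(1), d)
        tgt = self.embed(tgt_ids) + self.positional_encoding(tgt_ids.size(1), d)
        memory = self.encoder(src)               # encoder builds contextual representations
        decoded = self.decoder(tgt, memory)      # decoder attends to them (multi-headed attention inside)
        return self.output(self.norm(decoded))   # logits over the vocabulary

model = TinyTransformerLM()
logits = model(torch.randint(0, 1000, (1, 6)), torch.randint(0, 1000, (1, 6)))
print(logits.shape)  # torch.Size([1, 6, 1000])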
LLM Operations
• Input embeddings: The machine first turns each token into a vector, the numerical form it can actually work with.
• Positional encoding: The machine wants to understand not just what words are there but also their order in the
sentence. So, it adds some extra information to each embedding to show where each word is in the sentence.
• Encoder: Now, the machine gets to work on analyzing the sentence. It creates a set of memories to remember
what it has read.
o Attention mechanism: The machine pays more attention to some words depending on their importance in
the sentence.
o Feed-forward neural network: After paying attention to the words, the machine thinks hard about each word
on its own.
• Decoder: The machine not only understands but also generates new sentences. For this, it has a special part
called the decoder, which helps it predict what word comes next based on what it has understood so far.
• Multi-headed attention: The machine looks at the words in different ways simultaneously. This helps it grasp
different aspects of the sentence all at once.
• Layer normalization: This layer is in place to keep everything in check and make sure the machine learns well.
The machine normalizes its understanding at each step.
• Output: Finally, the machine turns what it has learned into a prediction of the most likely next word.
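To make the walkthrough concrete, the NumPy sketch below implements the two operations it leans on most, scaled dot-product attention and layer normalization, on random vectors. The values are arbitrary; only the arithmetic matters.

# From-scratch attention and layer normalization on random token vectors.
import numpy as np

rng = np.random.default_rng(1)
seq_len, d_model = 5, 8                       # 5 tokens, 8-dimensional vectors
x = rng.normal(size=(seq_len, d_model))       # token vectors after embedding + position

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, w_q, w_k, w_v):
    """Scaled dot-product attention: each token weighs every other token."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # how strongly each token attends to the others
    weights = softmax(scores)                 # each row sums to 1
    return weights @ v                        # weighted mix of value vectors

def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
attended = attention(x, w_q, w_k, w_v)
normalized = layer_norm(x + attended)          # residual connection + layer norm
print(normalized.shape)                        # (5, 8)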
LLM Training Steps
A. Tokenization
B. Embedding
C. Neural network training
D. Fine-tuning
Types of Large Language Models (LLMs)
Types of LLMs
• GPT 3.5 and GPT 4
• PaLM
• Claude
This model is a sophisticated addition to OpenAI's GPT series, pushing the boundaries of language
processing.
This is a large language model created by OpenAI. It builds on GPT-3's strengths, reaching new levels of scale
and performance.
Cons: It is not as commonly used as other models and may lack extensive support.
Types of LLMs: Claude V1
Pros: It creates clear and interesting answers, and you can fine-tune it for specific topics.
Pros: It manages various tasks and can be fine-tuned for specific areas.
Types of LLMs: Falcon
This is a foundational large language model from the Technology Innovation Institute (TII) in the United
Arab Emirates.
Pros: It is known for its quick processing speed, which makes it perfect for real-time applications.
Types of LLMs: BLOOM
It is an autoregressive Large Language Model trained on extensive text data using industrial-scale
computational resources.
BLOOM’s Architecture
• It is trained on a massive 1.6TB of text data.
• It boasts a staggering 176 billion parameters.
A. Cross-Modal Learning
B. Few-Shot Learning
C. Chain-of-Thought Prompting
D. Self-Supervised Learning
LLM Considerations and Future Implications
LLM Considerations
Deployment cost
Testing and evaluation considerations
Technical Considerations