
Advanced Generative AI: Models and Architecture

Large Language Models
Quick Recap

• How do Generative AI features contribute to different domains like healthcare, finance, and others?
• What emerging trends in Generative AI do you foresee shaping the future?
Engage and Think

What if Large Language Models (LLMs) could generate completely original and human-like text in any language or programming language?

How would this revolutionize the way humans communicate and interact with technology?
Learning Objectives

By the end of this lesson, you will be able to:

• Describe the core components and architecture of Large Language Models (LLMs)
• Analyze an LLM in action and its training process, encompassing tokenization, embedding, neural network training, and fine-tuning
• Explain how LLMs function, focusing on how they generate human-like text and respond to prompts
• Compare and contrast various LLMs
Language Models
Language Models

A language model is a probabilistic machine learning model.

It resembles a complex function designed to predict the probability of word sequences within a specific language corpus.

It is represented as: 𝑃(Any sentence here)


Language Models: Equation

Language models operate by assigning probabilities to sequences of words.

Mathematically, it looks like this:

𝑃(𝜔₁, 𝜔₂, …, 𝜔ₙ) = 𝑃(𝜔₁) · 𝑃(𝜔₂ | 𝜔₁) · 𝑃(𝜔₃ | 𝜔₁, 𝜔₂) · … · 𝑃(𝜔ₙ | 𝜔₁, 𝜔₂, …, 𝜔ₙ₋₁)


Language Models: Example

Consider the sentence: This is a new technology.

The language model calculates the probability of the sentence as:

𝑃(This is a new technology)

𝑃(This is a new technology) = 𝑃(This) 𝑃(is|This) 𝑃(a|This is) 𝑃(new|This is a) 𝑃(technology|This is a new)
Language Models: Calculation

To illustrate, let's calculate the probability of two different sentences:

1. 𝑃(This is a fluffy dog.)


2. 𝑃(This are a purple flying deer.)

Solution: Sentence 1 receives a high probability because it follows common, frequently seen word patterns; sentence 2 receives a much lower probability because of its ungrammatical opening ("This are") and its rare, implausible word combination.
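To make the chain rule concrete, here is a minimal sketch (not part of the original lesson) that estimates word probabilities from a tiny toy corpus using bigram counts and multiplies them together. It approximates the full chain rule with a bigram (Markov) model; the corpus, the add-one smoothing, and all names are illustrative assumptions.

```python
from collections import defaultdict

# Tiny illustrative corpus (an assumption for the example, not real training data).
corpus = [
    "this is a new technology",
    "this is a fluffy dog",
    "this is a new idea",
]

unigram_counts = defaultdict(int)
bigram_counts = defaultdict(int)
for sentence in corpus:
    words = ["<s>"] + sentence.split()
    for w in words:
        unigram_counts[w] += 1
    for prev, curr in zip(words, words[1:]):
        bigram_counts[(prev, curr)] += 1

vocab_size = len(unigram_counts)

def bigram_prob(prev, curr):
    # P(curr | prev) with add-one (Laplace) smoothing so unseen pairs are not zero.
    return (bigram_counts[(prev, curr)] + 1) / (unigram_counts[prev] + vocab_size)

def sentence_prob(sentence):
    # Chain rule with a bigram approximation:
    # P(w1 .. wn) ≈ P(w1 | <s>) * P(w2 | w1) * ... * P(wn | wn-1)
    words = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, curr in zip(words, words[1:]):
        prob *= bigram_prob(prev, curr)
    return prob

print(sentence_prob("this is a fluffy dog"))           # relatively high
print(sentence_prob("this are a purple flying deer"))  # much lower
```

Running this shows the behavior the slide describes: the familiar sentence gets a noticeably larger probability than the rare, ungrammatical one.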
Power of Language Models

The power of language models extends beyond sentence prediction.

They are incredibly versatile; for example, they can answer questions.


Applications of Language Models

• Chatbots
• Text generation
• Code completion
• Text summarization
• Sentiment analysis
• Text correction
• Machine translation
• Text classification
Demo: Text Generation

Duration: 20 minutes

Imagine you are on a quest to understand the intricate art of text generation, where a computer learns the patterns of a given writing style and crafts its own sentences.

Today’s session will explore a Python script designed for educational purposes. This script employs
the Natural Language Toolkit (NLTK) and the Brown corpus to demonstrate text generation through a
Markov chain model using trigrams.

Note
Please download the solution document from the Reference Material Section and follow
the Jupyter Notebook for step-by-step execution.
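The official solution is in the Reference Material Section; the following is only a minimal sketch of the approach the demo describes (NLTK, the Brown corpus, and a trigram Markov chain). It assumes NLTK is installed and can download the Brown corpus.

```python
import random
from collections import defaultdict

import nltk
from nltk import trigrams
from nltk.corpus import brown

nltk.download("brown")  # one-time download of the corpus

# Count how often each word follows each pair of words (a trigram model).
model = defaultdict(lambda: defaultdict(int))
for sentence in brown.sents():
    for w1, w2, w3 in trigrams(sentence, pad_left=True, pad_right=True):
        model[(w1, w2)][w3] += 1

def generate(max_words=25):
    # Start from the padded beginning-of-sentence state and sample one word at a time.
    text = [None, None]
    while len(text) < max_words + 2:
        candidates = model[tuple(text[-2:])]
        if not candidates:
            break
        words = list(candidates.keys())
        weights = list(candidates.values())
        next_word = random.choices(words, weights=weights)[0]
        if next_word is None:  # padded end-of-sentence marker
            break
        text.append(next_word)
    return " ".join(w for w in text if w is not None)

print(generate())
```

Each run produces a different sentence, because the next word is sampled in proportion to how often it followed the previous two words in the corpus.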
Quick Check

Which of the following is not an application of language models?

A. Text generation
B. Machine translation
C. Speech recognition
D. Image processing
Large Language Models
Large Language Models

Large Language Models (LLMs) are state-of-the-art AI models designed to comprehend and generate
human language.

• Large: Refers to the significant size and complexity of these models, which contain hundreds of millions or even billions of parameters.
• Language: Denotes their primary function, which is to understand and generate human language.
• Model: Describes them as mathematical representations that capture the patterns and structure of language data.
Components of LLMs: Tokenization

Tokenization breaks text down into smaller units called tokens, which can be words, phrases, or even individual characters.
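As a hedged illustration (the deck does not prescribe a specific tokenizer), here is a sketch using the Hugging Face transformers library and the publicly released GPT-2 tokenizer; any subword tokenizer would demonstrate the same idea.

```python
# Assumes: pip install transformers (not specified in the original lesson).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "This is a new technology."
tokens = tokenizer.tokenize(text)   # subword pieces, e.g. ['This', 'Ġis', 'Ġa', ...]
token_ids = tokenizer.encode(text)  # integer IDs the model actually consumes

print(tokens)
print(token_ids)
```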
Components of LLMs: Embedding

The embedding component maps tokens to a high-dimensional vector space, representing each token with a unique vector.
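A toy sketch (illustrative only, with made-up sizes and random numbers) of what the embedding step does: each token ID indexes a row of an embedding matrix, giving a dense vector per token.

```python
import numpy as np

vocab_size, embedding_dim = 10, 4            # toy sizes, chosen only for illustration
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

token_ids = [3, 7, 1]                         # output of the tokenization step
token_vectors = embedding_matrix[token_ids]   # one 4-dimensional vector per token

print(token_vectors.shape)  # (3, 4)
```

In a real LLM the embedding matrix is learned during training, so tokens with similar meanings end up with similar vectors.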
Components of LLMs: Attention

The attention mechanism lets the model concentrate on specific parts of the input text when generating output.
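A minimal NumPy sketch of scaled dot-product attention, the form of attention used in Transformer-based LLMs; the tiny matrices here are random and purely illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scores say how strongly each query token should attend to each key token.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights          # weighted mix of the value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                  # toy dimensions
Q = rng.normal(size=(seq_len, d_model))
K = rng.normal(size=(seq_len, d_model))
V = rng.normal(size=(seq_len, d_model))

output, weights = attention(Q, K, V)
print(output.shape, weights.shape)       # (5, 8) (5, 5)
```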
Components of LLMs: Pretraining

Pretraining exposes the LLM to extensive text data so that it learns the underlying patterns and structures of human language.
Components of LLMs: Transfer Learning

Transfer learning allows the model to adapt to new tasks by fine-tuning the pretrained model on a smaller, task-specific dataset.
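A hedged PyTorch sketch of the fine-tuning idea: the "pretrained" parts here are just randomly initialized stand-ins, but the pattern being illustrated is the point: freeze the pretrained layers and train a small task head on a smaller dataset.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained language model backbone (illustrative only).
backbone = nn.Sequential(
    nn.Embedding(1000, 64),    # token embeddings
    nn.Flatten(),
    nn.Linear(64 * 10, 128),   # pretend "encoder" over 10-token inputs
    nn.ReLU(),
)
head = nn.Linear(128, 2)       # new task-specific head, e.g. 2-class sentiment

# Freeze the pretrained backbone; only the head's parameters will be updated.
for p in backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Tiny fake fine-tuning dataset: sequences of 10 token IDs with binary labels.
x = torch.randint(0, 1000, (32, 10))
y = torch.randint(0, 2, (32,))

for step in range(5):
    logits = head(backbone(x))
    loss = loss_fn(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(step, loss.item())
```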
Components of LLMs: Encoder and Decoder

This component employs the Transformer framework, which comprises two main parts: an encoder and a decoder.
Components of LLMs: Scaling

Training and maintaining LLMs requires significant computational resources, making scaling a challenging but essential part of the architecture.
LLM Architecture

Components of LLM architecture

• Input embeddings
• Positional encoding
• Encoder
o Attention mechanism
o Feed-forward neural network
• Decoder
• Multi-headed attention
• Layer normalization
• Output
LLM Operations

These represent the functions of components within an architecture.

LLM Operations: Input Embeddings

• The machine takes in a sentence and breaks it down into smaller pieces.
• Each of these pieces is turned into a special kind of code that the machine can understand.
• This code holds the meaning of the words.
LLM Operations: Positional Encoding

• The machine wants to understand not just which words are there but also their order in the sentence.
• So, it adds some extra information to the code to show where each word is in the sentence, as sketched below.
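One common way to add that positional information is the sinusoidal encoding from the original Transformer paper; the sketch below is only an illustration, and other schemes (learned or relative positions) are equally valid.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # Each position gets a unique pattern of sines and cosines at different frequencies.
    positions = np.arange(seq_len)[:, None]   # (seq_len, 1)
    dims = np.arange(d_model)[None, :]        # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])
    encoding[:, 1::2] = np.cos(angles[:, 1::2])
    return encoding

pe = sinusoidal_positional_encoding(seq_len=6, d_model=8)
# In a Transformer these values are simply added to the token embeddings.
print(pe.shape)  # (6, 8)
```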
LLM Operations: Encoder

• Encoder: Now, the machine gets to work on analyzing the sentence. It creates a bunch of memories to remember what it has read.
• Attention mechanism: The machine pays more attention to some words depending on their importance in the sentence.
• Feed forward: After paying attention to the words, the machine thinks hard about each word on its own.
LLM Operations: Decoder

• The machine not only understands but also generates new sentences.
• For this, it has a special part called the decoder.
• The decoder helps the machine predict what word comes next based on what it has understood so far.
LLM Operations: Multi-Headed Attention

• The machine looks at the words in different ways simultaneously.
• This helps the machine grasp different aspects of the sentence all at once, as illustrated in the sketch that follows.
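A compact, self-contained NumPy sketch of the "different ways simultaneously" idea: the representation is split into several heads, attention runs independently in each, and the results are concatenated. Real implementations apply learned per-head projection matrices; here plain slices stand in for them, and all dimensions are toy values.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, num_heads):
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    outputs = []
    for h in range(num_heads):
        # Each head works on its own slice of the representation
        # (stand-in for learned Q/K/V projections).
        Q = K = V = X[:, h * d_head:(h + 1) * d_head]
        weights = softmax(Q @ K.T / np.sqrt(d_head))
        outputs.append(weights @ V)
    return np.concatenate(outputs, axis=-1)  # heads are concatenated back together

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                        # 5 tokens, model dimension 8
print(multi_head_attention(X, num_heads=2).shape)  # (5, 8)
```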
LLM Operations: Layer Normalization

• This layer keeps everything in check and makes sure the machine learns well.
• The machine normalizes its understanding at each step (see the small example below).
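A small NumPy sketch of what layer normalization does at each step: every token's vector is rescaled to zero mean and unit variance, then shifted and scaled by learned parameters (fixed to 1 and 0 here for illustration).

```python
import numpy as np

def layer_norm(x, eps=1e-5, gamma=1.0, beta=0.0):
    # Normalize each token's feature vector independently.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
hidden_states = rng.normal(loc=3.0, scale=5.0, size=(4, 8))  # 4 tokens, 8 features
normed = layer_norm(hidden_states)
print(normed.mean(axis=-1))  # approximately 0 for every token
print(normed.std(axis=-1))   # approximately 1 for every token
```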
LLM Operations: Output

• Finally, the machine produces its own understanding or generates new sentences.
• The output depends on what the machine is designed to do.
• For example, if it is predicting the next word in a sentence, it gives a probability for each word, as the toy example below shows.
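A toy sketch of that final step: the model's raw scores (logits) over the vocabulary are turned into a probability for each candidate next word via softmax. The vocabulary and numbers are made up for illustration.

```python
import numpy as np

vocab = ["technology", "dog", "idea", "banana"]   # toy vocabulary
logits = np.array([3.1, 0.4, 2.2, -1.0])          # raw scores from the model

probs = np.exp(logits - logits.max())
probs = probs / probs.sum()

for word, p in sorted(zip(vocab, probs), key=lambda t: -t[1]):
    print(f"{word}: {p:.3f}")
# The next token can be chosen greedily (argmax) or sampled from this distribution.
```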
LLM Training Steps

The steps in the training process of a language model, tied together in the toy sketch below, are:

• Corpus preparation
• Tokenization
• Embedding generation
• Neural network training
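To connect these steps, here is a hedged PyTorch sketch of the whole pipeline on a toy scale: prepare a tiny corpus, tokenize it, learn embeddings, and train a small neural network to predict the next word. Everything here (corpus, sizes, hyperparameters) is an illustrative assumption.

```python
import torch
import torch.nn as nn

# 1. Corpus preparation: a tiny toy corpus.
corpus = "this is a new technology this is a fluffy dog this is a new idea".split()

# 2. Tokenization: map each word to an integer ID.
vocab = sorted(set(corpus))
stoi = {w: i for i, w in enumerate(vocab)}
ids = torch.tensor([stoi[w] for w in corpus])

# Build (current word -> next word) training pairs.
x, y = ids[:-1], ids[1:]

# 3. Embedding generation + 4. Neural network training: a minimal next-word model.
class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        return self.out(self.embed(tokens))

model = TinyLM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    logits = model(x)
    loss = loss_fn(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The trained embeddings and weights now encode the corpus statistics.
next_logits = model(torch.tensor([stoi["new"]]))
print(vocab[next_logits.argmax().item()])  # likely "technology" or "idea"
```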


Quick Check

When considering the architecture of Large Language Models (LLMs), which of the following components is responsible for generating human-like text and responding to prompts?

A. Tokenization
B. Embedding
C. Neural network training
D. Fine-tuning
Types of Large Language Models (LLMs)
Types of LLMs

Below are the various pretrained LLMs available in the market:

• GPT 3.5 and GPT 4
• PaLM
• Claude
• Cohere
• Falcon
• LLaMA


Types of LLMs: GPT 3.5

This model is a sophisticated addition to OpenAI's GPT series, pushing the boundaries of language processing.

• Performance: It delivers outstanding performance across a variety of natural language processing tasks.
• Pros: The model excels at executing a broad spectrum of natural language processing tasks.
• Cons: Compared to GPT-4, this model may generate more restricted content and is considered less advanced.
Types of LLMs: GPT 4

This is a large language model created by OpenAI. It builds on GPT-3's strengths, reaching new levels of scale and performance.

• Performance: It performs at a human level on professional exams, scoring in the top 10% on a simulated bar exam.
• Pros: It is a top-tier language model that handles tough problems more accurately and is multimodal.
• Cons: It is likely to be more expensive than other language models.


Types of LLMs: PaLM 2

This is a Google AI-developed next-gen LLM.

• Performance: It excels in reasoning tests, outdoing its predecessor on various NLP benchmarks.
• Pros: It can process both text and image inputs.
• Cons: It is not as commonly used as other models and may lack extensive support.
Types of LLMs: Claude V1

This is a large language model crafted by Anthropic, an AI research company.

• Performance: It performs well, provides longer answers, and can be used through an API and the public beta site, claude.ai.
• Pros: It creates clear and engaging answers, and it can be fine-tuned for specific topics.
• Cons: It requires a large amount of training data to achieve optimal performance.
Types of LLMs: Cohere

This is a large language model made by Cohere Technologies.

• Performance: It excels in various natural language processing tasks, showing remarkable performance.
• Pros: It manages various tasks and can be fine-tuned for specific areas.
• Cons: Training it demands a lot of computational resources.


Types of LLMs: Falcon

This is a foundational large language model from the Technology Innovation Institute (TII) in the United Arab Emirates.

• Performance: Its performance stands out, boasting high accuracy, robustness, and efficiency.
• Pros: It is known for its quick processing speed, which makes it well suited for real-time applications.
• Cons: It might not be suitable for tasks requiring advanced natural language processing capabilities.
Types of LLMs: LLaMA

This is a family of LLMs launched by Meta AI in February 2023.

• Performance: It shows outstanding performance in various natural language processing tasks.
• Pros: It understands context-rich information, enhancing its effectiveness in complex tasks.
• Cons: It might unintentionally produce biased or inaccurate content.


Bloom
Bloom Overview

It is an autoregressive Large Language Model trained on extensive text data using industrial-scale
computational resources.
Bloom’s Architecture

BLOOM adopts a conventional decoder-only transformer architecture.


Bloom’s Architecture

It features several notable modifications, including:

• ALiBi: This component enhances the model's capacity to generalize to context lengths longer than those it encounters during training.
• Embedding layer norm: An additional layer of normalization is introduced after the model's embedding layer, contributing to enhanced training stability.
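A hedged NumPy sketch of the ALiBi idea (Attention with Linear Biases): instead of positional embeddings, a penalty proportional to the query-key distance is added to the attention scores, with a different slope per head. The geometric slope scheme below follows the ALiBi paper's common choice and is shown only to illustrate the mechanism, not BLOOM's exact implementation.

```python
import numpy as np

def alibi_bias(seq_len, num_heads):
    # Head-specific slopes: a geometric sequence, as in the ALiBi paper.
    slopes = np.array([2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)])
    # Distance between each query position i and key position j; only j <= i matters
    # in a causal, decoder-only model such as BLOOM.
    positions = np.arange(seq_len)
    distance = positions[:, None] - positions[None, :]   # (seq_len, seq_len)
    distance = np.maximum(distance, 0)
    # Bias shape: (num_heads, seq_len, seq_len); it is added to the attention scores.
    return -slopes[:, None, None] * distance[None, :, :]

bias = alibi_bias(seq_len=6, num_heads=4)
print(bias.shape)  # (4, 6, 6)
print(bias[0])     # nearby tokens get smaller penalties, so longer contexts extrapolate
```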
Unpacking Bloom

• It is trained on a massive 1.6 TB of text data.
• It boasts a staggering 176 billion parameters.
• It excels in text generation across 46 natural languages and 13 programming languages.
• Its architecture is rooted in an autoregressive model.
LLM Reasoning

• Diverse reasoning: The LLM explores varied kinds of reasoning, including common sense and math, adapting to diverse contexts.
• Eliciting reasoning: Methods like chain-of-thought prompting guide LLMs toward explicit, step-by-step reasoning (see the example prompt below).
• Reasoning contribution enigma: The challenge lies in understanding reasoning's role and impact, and in differentiating it from recalled factual information.
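As a simple illustration of chain-of-thought prompting (the exact wording is an assumption, not from the deck), the prompt below shows an example of worked-out reasoning before asking a new question, which tends to elicit better step-by-step answers than asking for the result directly.

```python
direct_prompt = "Q: A cyclist covers 45 km in 3 hours. What is their average speed?\nA:"

chain_of_thought_prompt = (
    "Q: A train travels 60 km in 1.5 hours. What is its average speed?\n"
    "A: Let's think step by step. "
    "Average speed is distance divided by time. "
    "60 km / 1.5 hours = 40 km/h. "
    "So the answer is 40 km/h.\n\n"
    "Q: A cyclist covers 45 km in 3 hours. What is their average speed?\n"
    "A: Let's think step by step."
)
# Sending chain_of_thought_prompt to an LLM encourages it to reason through the new
# question step by step before stating the final answer (15 km/h), rather than
# guessing directly as it might with direct_prompt.
```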
Quick Check

Which method can be utilized to unleash the reasoning capabilities of LLMs?

A. Cross-Modal Learning
B. Few-Shot Learning
C. Chain-of-Thought Prompting
D. Self-Supervised Learning
LLM Considerations and Future Implications
LLM Considerations

There are two types of considerations for choosing an LLM:

• Critical considerations: Evaluate non-technical aspects, like ethics and biases.
• Technical considerations: Assess performance, architecture, and computational requirements.
Critical Considerations

The critical considerations for choosing an LLM are:

• Licensing and commercial use
• Practical factors for inference speed and precision
• The impact of context length and model size
• Task-specific vs. general-purpose
• Deployment cost
• Testing and evaluation considerations
Technical Considerations

The technical considerations for choosing an LLM are:

• Data security and privacy
• Model inference monitoring
• Scalability and performance
• Version control and updating
• APIs and integration security


Future Implications of LLMs

LLMs have far-reaching implications, which include:

• Job market disruption


• Enhancing productivity and creativity
• Societal impact
• Responsible use
• Evolving opportunities
Quick Check

What is not a potential future implication of using LLMs in real-world applications?

A. Increased job opportunities and economic growth
B. Automation of tasks leading to job market disruption
C. Enhanced productivity and creativity for individuals and businesses
D. Ethical and societal considerations surrounding the use of LLMs
Guided Practice

Overview

Duration: 25 minutes

This activity focuses on testing understanding of diverse language models and their applications. It presents scenarios that require applying learned concepts to solve problems or accomplish tasks.
Key Takeaways

• A language model is a probabilistic machine learning model.
• Large Language Models are trained on large datasets and can generate human-like text, images, and more.
• Pretrained LLMs available in the market can be utilized to build powerful generative AI solutions.
• Bloom is an autoregressive LLM capable of generating text in 46 natural languages and 13 programming languages.
Q&A
