Demystifying LLMs
Releases
• Pretraining
• Instruction-Tuning
• Evaluation of LLMs
Pretraining
We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same
architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e.
experts). For every token, at each layer, a router network selects two experts to process the current
state and combine their outputs. Even though each token only sees two experts, the selected experts
can be different at each timestep. As a result, each token has access to 47B parameters, but only uses
13B active parameters during inference. Mixtral was trained with a context size of 32k tokens and it
outperforms or matches Llama 2 70B and GPT-3.5 across all evaluated benchmarks. In particular,
Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks.
We also provide a model fine-tuned to follow instructions, Mixtral 8x7B-Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and Llama 2 70B chat model on human benchmarks.
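The routing described above can be sketched as a top-2 mixture-of-experts feed-forward layer. This is a minimal illustration assuming PyTorch; the dimensions, the SiLU expert blocks, and the class name are simplifications for the sketch, not Mixtral's actual implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    # A router scores 8 expert feed-forward blocks per token, keeps the top 2,
    # and combines their outputs with softmax weights over the chosen pair.
    def __init__(self, d_model: int = 512, d_ff: int = 2048, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        logits = self.router(x)                                # (tokens, n_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)  # top-2 experts per token
        weights = F.softmax(weights, dim=-1)                   # renormalize over the chosen 2
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                          # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 512)           # 4 token states
print(Top2MoELayer()(tokens).shape)    # torch.Size([4, 512])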
Pretraining
• Why is it hard?
• Time: Datasets are huge, O(1T) tokens
• Preprocessing, Cleaning, Deduplication (dedup sketched below)
• More data might not lead to a better model
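One way to make deduplication tractable at this scale is to hash normalized documents and keep only the first occurrence. A minimal sketch; the normalization rule and function names are illustrative, and production pipelines also use near-duplicate methods such as MinHash:

import hashlib

def normalize(doc: str) -> str:
    # Illustrative normalization: lowercase and collapse whitespace.
    return " ".join(doc.lower().split())

def deduplicate(docs):
    # Keep only the first occurrence of each exact (normalized) document.
    seen, kept = set(), []
    for doc in docs:
        h = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if h not in seen:
            seen.add(h)
            kept.append(doc)
    return kept

corpus = ["Hello   world", "hello world", "A different document"]
print(deduplicate(corpus))  # the second entry is dropped as an exact duplicate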
Can we use the Pretrained model?
Prompt:
Example:
Input: 17
Output: True
Input: 15
Output: False
Approach:
Response:
def is_prime(x):
    # Trial division up to sqrt(x): any composite number has a factor in this range.
    if x <= 1:
        return False
    for i in range(2, int(x ** 0.5) + 1):
        if x % i == 0:
            return False
    return True
The model knows the answer, but it is not aligned with human preferences.
Stages of LLM Training
1. Pretraining
2. Instruction-Tuning
• Dataset: Instruction-tuning prompt/response pairs (example record sketched below)
• Time: a few hours/days
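An instruction-tuning dataset is essentially a collection of prompt/response pairs. A minimal sketch of one record; the field names and the JSONL layout are illustrative, not any specific dataset's schema:

import json

# One illustrative instruction-tuning record: a prompt and the desired response.
record = {
    "prompt": "Write a python function to find whether the input number is prime.",
    "response": (
        "def is_prime(x):\n"
        "    if x <= 1:\n"
        "        return False\n"
        "    for i in range(2, int(x ** 0.5) + 1):\n"
        "        if x % i == 0:\n"
        "            return False\n"
        "    return True"
    ),
}

# Such records are commonly stored one JSON object per line (JSONL).
print(json.dumps(record))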
Prompt: [INST] Write a python function to find whether the input number is prime. [/INST]
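The [INST] ... [/INST] markers above are the Mistral/Mixtral instruct format. A minimal sketch of building such a prompt by hand; the helper name is illustrative, and in practice a tokenizer chat template (e.g. Hugging Face's apply_chat_template) performs this formatting:

def build_inst_prompt(instruction: str) -> str:
    # Mistral/Mixtral instruct format: the user turn is wrapped in [INST] ... [/INST]
    # and the model generates its answer after the closing tag.
    return f"<s>[INST] {instruction} [/INST]"

print(build_inst_prompt(
    "Write a python function to find whether the input number is prime."
))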
3-shot:
## How old is Barack Obama in 2014?
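A k-shot prompt like the 3-shot example above prepends k solved question/answer pairs before the real question so the pretrained model continues the pattern. A minimal sketch; the example questions and answers below are illustrative, not the slide's actual shots:

# Minimal sketch of a 3-shot prompt in the "## <question>" style shown above.
shots = [
    ("How old is Barack Obama in 2014?",
     "Barack Obama was born in August 1961, so he turned 53 in 2014."),
    ("What is the capital of France?",
     "The capital of France is Paris."),
    ("How many legs does a spider have?",
     "A spider has 8 legs."),
]

def build_3shot_prompt(question: str) -> str:
    parts = [f"## {q}\n{a}\n" for q, a in shots]
    parts.append(f"## {question}\n")  # the model completes the final answer
    return "\n".join(parts)

print(build_3shot_prompt("How tall is the Eiffel Tower?"))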
Evaluation of Instruction-tuned models
https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
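The Chatbot Arena leaderboard linked above ranks instruction-tuned models from pairwise human votes using Elo-style ratings. A minimal sketch of one rating update; the K-factor, initial ratings, and model names are illustrative, and the actual leaderboard uses its own fitting procedure:

def expected_score(r_a: float, r_b: float) -> float:
    # Probability that model A beats model B under the Elo model.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    # score_a is 1.0 if model A wins the pairwise vote, 0.0 if it loses, 0.5 for a tie.
    e_a = expected_score(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1.0 - score_a) - (1.0 - e_a))

ratings = {"model_a": 1000.0, "model_b": 1000.0}
# One human vote where model_a's answer is preferred:
ratings["model_a"], ratings["model_b"] = elo_update(
    ratings["model_a"], ratings["model_b"], score_a=1.0
)
print(ratings)  # model_a gains rating, model_b loses the same amount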