LLMs and Retrieval-Augmented Generation (RAG)
How do normal parametric LLMs work?
Parametric LLMs encapsulate everything in their parameters by pre-training on large-scale text corpora.
P(x_n | x_1, x_2, …, x_{n-1})
[Figure: an LLM predicts the next token given the context "Pittsburgh is located in" (x_1 x_2 x_3 x_4)]
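The next-token distribution above can be sketched numerically. This toy example uses a made-up vocabulary and logits (not from a real model) to show how an LM turns logits into P(x_n | x_1, …, x_{n-1}) via softmax:

```python
# Toy illustration of next-token prediction: a parametric LM maps a context
# to a probability distribution over its vocabulary via softmax over logits.
# The vocabulary and logit values below are invented for illustration.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # subtract max for numerical stability
    z = sum(exps)
    return [e / z for e in exps]

vocab = ["Pennsylvania", "Ohio", "France", "pizza"]
# Pretend these logits came from a trained LM given "Pittsburgh is located in"
logits = [5.1, 2.3, 0.4, -1.0]
p_next = dict(zip(vocab, softmax(logits)))

best = max(p_next, key=p_next.get)  # "Pennsylvania" gets the highest probability
```

Everything the model "knows" here lives in the parameters that produced the logits, which is exactly why stale or missing knowledge leads to the limitations discussed next.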
Limitations of parametric LLMs #1: Hallucinations
LLMs cannot memorize (encapsulate) everything in their parameters yet, resulting in factual inaccuracies.
Catastrophic incidents due to LLM hallucinations
Such LLM hallucinations have been causing many critical incidents in the real world
Retrieval-augmented LMs: Definitions & Notations
A new type of LM that can use large-scale text data (a datastore) at inference time.
[Diagram: input x becomes a query q; the Retriever scores documents d in the datastore by sim(q, d) and returns documents Z; the LLM (which also retains its pre-training data in its parameters) takes x together with Z and produces output y]
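The retrieval step sim(q, d) can be sketched as follows. This is a minimal illustration with toy bag-of-words embeddings and a small invented datastore; real systems use trained dense or sparse retrievers:

```python
# Minimal sketch of the retrieval step: embed the query q and each document d,
# score sim(q, d) with cosine similarity, and return the top-k documents Z.
# Embeddings here are toy bag-of-words counts, not trained encoder outputs.
import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, datastore, k=2):
    q = embed(query)
    scored = sorted(datastore, key=lambda d: cosine(q, embed(d)), reverse=True)
    return scored[:k]  # the retrieved documents Z

datastore = [
    "Pittsburgh is a city in Pennsylvania.",
    "The Eiffel Tower is in Paris.",
    "Pennsylvania is a state in the United States.",
]
Z = retrieve("Where is Pittsburgh located?", datastore)
```

The documents Z are then passed to the LLM together with the input x, as in the diagram above.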
Benefit of retrieval-augmented LMs #1: reduce hallucinations
Retrieval-augmented LMs can reduce hallucinations, especially in long-tail knowledge
“When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories”. Mallen*, Asai* et al. ACL 2023
Quiz: What are the other benefits of using retrieval-augmented LMs?
Benefit of retrieval-augmented LMs #2: Adaptations w/o training
Parametric LMs’ knowledge quickly becomes obsolete & requires continuous re-training.
[Figure: an LLM computing P(x_n | x_1, x_2, …, x_{n-1}) from 2023 pre-training data answers "Rishi" or "Boris" for the current UK prime minister, while 2024 data gives the correct answer: "The incumbent prime minister is Keir Starmer, who assumed the office on 5 July 2024."]
“RealTime QA: What's the Answer Right Now?” Kasai et al. NeurIPS (Benchmark). 2023
Benefit of retrieval-augmented LMs #2: Adaptations w/o training
We can easily swap datastores of retrieval-augmented LMs to cover new data distributions.
[Figure: replacing the 2023 datastore with 2024 data, without re-training the LM]
“RealTime QA: What's the Answer Right Now?” Kasai et al. NeurIPS (Benchmark). 2023
Benefit of retrieval-augmented LMs #3: Providing attributions
Retrieval-augmented LMs can provide a small number of documents as attributions
“Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models”. Bohnet et al. ArXiv 2022.
Benefit of retrieval-augmented LMs #4: Flexible data opt-in / out
We can incorporate or remove high-risk data dynamically at inference time rather than training time
“SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore”. Min et al. In ICLR 2024
Benefit of retrieval-augmented LMs #5: parameter efficiency
Retrieval-augmented LMs can be much more parameter efficient and compute-optimal
“Scaling Retrieval-Based Language Models with a Trillion-Token Datastore.” Shao, He, Asai et al., ArXiv 2024.
Retrieval-augmented LMs have been widely used!
Retrieval-augmented LMs have been widely used both in academia and industry
Brief history of retrieval-augmented LMs development
2017: DrQA — a trained QA model over BM25 retrieval
2019: ORQA
2020: RALM, RAG
“Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” Lewis et al., NeurIPS 2020.
Brief history of retrieval-augmented LMs development
RAG was initially studied extensively for specific NLP tasks, most notably question answering.
2017: DrQA
2019: ORQA
2020: RALM, RAG
2020: kNN LM
“Generalization through Memorization: Nearest Neighbor Language Models.” Khandelwal et al., ICLR 2020.
Brief history of retrieval-augmented LMs development
RAG was initially studied extensively for specific NLP tasks, most notably question answering.
2017: DrQA
2019: ORQA
2020: RALM, RAG
2020: kNN LM
2021: RETRO — new architectures for retrieval-augmented LMs
“Improving language models by retrieving from trillions of tokens.” Borgeaud et al., arXiv 2021.
Brief history of retrieval-augmented LMs development
Versatile and powerful LLMs demonstrate effectiveness even without fine-tuning
2017: DrQA
2019: ORQA
2020: RALM, RAG
2020: kNN LM
2020: GPT3
2021: RETRO
2022: ChatGPT
https://paperswithcode.com/sota/question-answering-on-triviaqa
Brief history of retrieval-augmented LMs development
Success of in-context retrieval-augmented LMs (commonly referred to as “RAG” today): use off-the-shelf LLMs & retrieval systems.
Past (2017: DrQA; 2019: ORQA; 2020: RALM, RAG; 2020: kNN LM; 2021: RETRO): developments in architecture and training for specific tasks
2023: Retrieval-augmented LLMs
Diverse architectures of retrieval-augmented LMs
Classifying retrieval-augmented LMs based on “where” we incorporate retrieved context
● Input augmentation
○ Augment the input of LMs with retrieved context
○ E.g., RAG, REALM, DrQA, In-context RALM
[Diagram: the Retriever passes retrieved documents, together with input x, to the LLM, which outputs y]
REALM: Augmenting input space of LMs
REALM is a retrieval-augmented masked LM that predicts masked tokens / spans in context.
“REALM: Retrieval-Augmented Language Model Pre-Training.” Guu et al., ICML 2020.
RAG & REALM: Results
RAG and REALM show their effectiveness on open-domain QA and other tasks
[Bar charts: open-domain QA exact match (higher is better) on NQ and WQ for T5, REALM, DPR, and RAG; generation results comparing BART, RAG-Token, and a Gold reference]
● Combining retrieval and off-the-shelf LMs (e.g., GPT-4) at inference time without training
● Often referred to as “RAG” nowadays
● We’ll cover this in depth in the next section!
Pros and cons of input augmentation
Input augmentation is powerful but has several limitations.
● Pros
○ Easy to switch to new, more powerful LMs without fine-tuning / training
○ LLMs can effectively leverage input context
● Cons
○ Expensive to scale up to hundreds or thousands of documents
■ LLMs also often do not fully leverage long context
○ No strict attribution to specific evidence
“Lost in the Middle: How Language Models Use Long Contexts.” Liu et al., TACL 2023.
Diverse architectures of retrieval-augmented LMs
Classifying retrieval-augmented LMs based on “where” we incorporate retrieved context
● Input augmentation
○ Augment the input of LMs with retrieved context
○ E.g., RAG, REALM, DrQA, In-context RALM
● Intermediate incorporation
○ Incorporate retrieved context in intermediate spaces of transformers
○ E.g., RETRO, Instruct RETRO
RETRO: Incorporating context in intermediate layers
RETRO enables more efficient incorporation of many documents.
“Improving language models by retrieving from trillions of tokens.” Borgeaud et al., arXiv 2021.
RETRO: Incorporating context in intermediate layers
RETRO uses a frozen BERT as its retriever and retrieves nearest neighbors from a 1.7T-token datastore.
“Improving language models by retrieving from trillions of tokens.” Borgeaud et al., arXiv 2021.
RETRO: Incorporating context in intermediate layers
Given the input sequence, RETRO first retrieves a set of relevant documents (as embeddings of text chunks).
“Improving language models by retrieving from trillions of tokens.” Borgeaud et al., arXiv 2021.
RETRO: Incorporating context in intermediate layers
Use cross-attention to generate retrieved context-aware representations.
“Improving language models by retrieving from trillions of tokens.” Borgeaud et al., arXiv 2021.
RETRO: Incorporating context in intermediate layers
Concatenate all of the CA outputs (the sizes of the input H and the output CCA(H, E) remain the same).
“Improving language models by retrieving from trillions of tokens.” Borgeaud et al., arXiv 2021.
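The cross-attention step can be approximated with a plain cross-attention sketch. RETRO additionally chunks the input, which is omitted here for brevity; shapes and values below are toy, not RETRO's actual implementation:

```python
# Sketch of the cross-attention (CA) used to fuse retrieved context: the hidden
# states H of the input attend to encoded retrieval embeddings E, producing an
# output with the same shape as H, so it can slot into a transformer block.
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 4, 6, 8             # input length, retrieved length, hidden size
H = rng.normal(size=(n, d))   # hidden states of the input tokens
E = rng.normal(size=(m, d))   # encoded retrieved neighbors

def cross_attention(H, E):
    scores = H @ E.T / np.sqrt(E.shape[1])          # (n, m) attention scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over retrieved tokens
    return weights @ E                              # (n, d): same shape as H

out = cross_attention(H, E)
```

Because the output keeps the input's shape, the CA(H, E) block can be inserted into intermediate transformer layers without changing anything downstream.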
RETRO: Results
RETRO shows impressive performance improvements on upstream (language modeling) tasks.
[Charts: perplexity (lower is better) across model and datastore scales]
“Improving language models by retrieving from trillions of tokens.” Borgeaud et al., arXiv 2021.
Recent follow-up: Instruct RETRO
Develops RETRO blocks on top of autoregressive LMs (e.g., Llama), with retrieval-augmented pre-training & multi-task instruction tuning.
“InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining.” Wang et al., ICML 2024.
Pros and cons of intermediate incorporation
Alternative way to incorporate retrieved context in a more scalable way, but requires training
● Pros
○ More efficiently incorporates many passages than input augmentation
○ Possibly more effective than input augmentation (cf. Instruct RETRO results)
● Cons
○ Requires modification of the underlying LM
○ Expensive pre-training is necessary
○ Doesn’t provide strict attribution
Diverse architectures of retrieval-augmented LMs
Classifying retrieval-augmented LMs based on “where” we incorporate retrieved context
● Input augmentation
○ Augment the input of LMs with retrieved context
○ E.g., RAG, REALM, DrQA, In-context RALM
● Intermediate incorporation
○ Incorporate retrieved context in intermediate spaces of transformers
○ E.g., RETRO, Instruct RETRO
● Output interpolation
○ Interpolate output token probabilities with retrieved non-parametric distributions
○ E.g., kNN LM
kNN LM: directly interpolate output token distributions
kNN LM directly interpolates the output token distribution of the LM (the parametric distribution) with a nonparametric distribution derived from nearest-neighbor tokens in the datastore.
“Generalization through Memorization: Nearest Neighbor Language Models.” Khandelwal et al., ICLR 2020.
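The interpolation can be sketched as follows, with made-up probabilities and neighbor distances standing in for real model outputs:

```python
# Sketch of kNN LM interpolation: the final distribution is
#   p(y) = lam * p_kNN(y) + (1 - lam) * p_LM(y),
# where p_kNN is built from distances to retrieved nearest-neighbor tokens.
import math

def knn_distribution(neighbors, temperature=1.0):
    # neighbors: list of (next_token, distance) pairs from the datastore;
    # closer neighbors get exponentially larger weight.
    weights = {}
    for token, dist in neighbors:
        weights[token] = weights.get(token, 0.0) + math.exp(-dist / temperature)
    z = sum(weights.values())
    return {t: w / z for t, w in weights.items()}

def interpolate(p_lm, p_knn, lam=0.25):
    tokens = set(p_lm) | set(p_knn)
    return {t: lam * p_knn.get(t, 0.0) + (1 - lam) * p_lm.get(t, 0.0)
            for t in tokens}

p_lm = {"Hawaii": 0.2, "Illinois": 0.7, "Texas": 0.1}   # parametric distribution
p_knn = knn_distribution([("Hawaii", 0.5), ("Hawaii", 1.0), ("Illinois", 4.0)])
p = interpolate(p_lm, p_knn)
```

The mixing weight lam gives the explicit control between parametric and non-parametric memories discussed below.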
kNN LM: Results
kNN LM outperforms much larger parametric LMs by a large margin.
● kNN LM consistently outperforms a parametric 100M LM, and with a larger datastore even a 30x larger 3B LM
● kNN LM also enables efficient & controlled domain adaptation
“Generalization through Memorization: Nearest Neighbor Language Models.” Khandelwal et al., ICLR 2020.
Recent follow-up: TRIME
TRIME trains kNN LM-style models to better learn the interpolation.
[Bar chart: dev / test perplexity (lower is better) for Transformer, kNN LM, and TRIME]
“Training Language Models with Memory Augmentation.” Zhong et al., EMNLP 2022.
Pros and cons of output interpolation
kNN LM & variants have unique advantages but face several empirical challenges.
● Pros
○ Provides token-level attributions
○ Enables explicit control between parametric and non-parametric memories
● Cons
○ Difficult to scale to large retrieval corpora (the number of embeddings equals the number of tokens)
○ Empirically shows limited effectiveness outside of upstream language modeling tasks
Summary
Diverse types of retrieval-augmented LMs have been studied, each with its own pros & cons.
● Input augmentation: widely used and effective but faces challenges when incorporating more
passages
● Intermediate incorporation: can efficiently handle more passages but requires pre-training and
fine-tuning
● Output interpolation: provides direct control over LM output, but has limited success in
downstream tasks and faces challenges of scaling the datastore
Approach                    | Representative methods    | Retrieval unit | Retrieval frequency
Input augmentation          | DrQA, RAG, REALM, ICRALM  | Passage        | Once at the beginning
Intermediate incorporation  | RETRO, InstructRETRO      | Passage        | Every k tokens
Output interpolation        | kNN LM, TRIME             | Token          | Every token
Present: Retrieval-augmented Generation with LLMs
In-context retrieval-augmented LMs
Simply augmenting the input of LMs gives significant gains across different tasks.
[Diagram: a Retriever prepends retrieved documents to the input of an off-the-shelf LLM]
“When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories”. Mallen*, Asai* et al. ACL 2023
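A minimal sketch of the input augmentation itself: retrieved documents are simply prepended to the prompt of an unmodified LLM. The prompt template and the `call_llm` placeholder are illustrative, not a specific system's API:

```python
# Sketch of in-context augmentation: retrieved documents are prepended to the
# prompt of an off-the-shelf LLM; no model weights change.
def build_rag_prompt(question, retrieved_docs):
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs))
    return (
        "Answer the question based on the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

docs = ["Keir Starmer became UK prime minister on 5 July 2024."]
prompt = build_rag_prompt("Who is the UK prime minister?", docs)
# `prompt` would then be sent to any off-the-shelf LLM, e.g. call_llm(prompt)
```

Because only the prompt changes, swapping in a newer or stronger LLM requires no re-training, which is exactly the appeal of this approach.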
In-context Retrieval-augmented LMs: Results
Simply augmenting the input space of LMs gives significant gains across different tasks.
● In upstream language modeling, simply adding retrieved context gives large gains, especially for smaller models
● Similar significant gains in downstream tasks such as Question Answering
“When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories”. Mallen*, Asai* et al. ACL 2023
Limitations of such naïve “RAG”
Is combining off-the-shelf models sufficient?
“Evaluating Verifiability in Generative Search Engines”. Liu et al. Findings of EMNLP 2023.
Limitations of such naïve “RAG”
Is combining off-the-shelf models sufficient?
Example (from MATH): “The equation x² + 2x = i has two complex solutions. Determine the product of their real parts.” Questions with similar solutions may have limited semantic similarity in embedding space.
Designing and training more reliable LLM RAG
Approaches to optimize (1) LMs, (2) retrievers, or (3) prompts for LLM RAG
1. Optimizing LLMs for RAG: training / controlling LLMs with retrieved context
SAIL: Training LMs with retrieval-augmented data
SAIL augments existing instruction-tuning data to teach the LM how to use retrieved context.
Self-RAG: Learning to retrieve, generate, and critique
● Train an arbitrary LM (e.g., Llama 3) to generate special tokens for (1) triggering retrieval only when necessary and (2) evaluating the relevance of retrieved context and its own generations.
“Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection”. Asai et al. ICLR 2024.
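The special-token control flow can be caricatured as follows. The heuristics below are stand-ins for what a trained model actually predicts with its reflection tokens, so this is only a sketch of the control flow, not Self-RAG itself:

```python
# Toy sketch of adaptive retrieval with special tokens, in the spirit of
# Self-RAG: the LM first decides [Retrieve] vs [No Retrieve], and retrieved
# passages are critiqued for relevance before being used.
def needs_retrieval(question):
    # A trained LM emits a special token; here we fake it with a heuristic.
    factual_cues = ("who", "when", "where", "which")
    return question.lower().startswith(factual_cues)

def critique(question, passage):
    # Stand-in for the LM's relevance token ([Relevant] / [Irrelevant]).
    return any(w in passage.lower() for w in question.lower().split())

def answer(question, retriever, llm):
    if needs_retrieval(question):
        passages = [p for p in retriever(question) if critique(question, p)]
        prompt = "\n".join(passages) + "\n" + question
    else:
        prompt = question  # skip retrieval entirely for non-factual requests
    return llm(prompt)
```

Skipping retrieval when it is unnecessary is what yields the efficiency gains reported in the results below.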
Advanced RAG inference algorithm
Advanced RAG inference algorithm to better incorporate retrieved context
“Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection”. Asai et al. ICLR 2024.
Optimizing LLMs for RAG: Results
New training and advanced inference algorithms for RAG significantly boost performance.
● Training with 8B and 13B models significantly boosts performance compared to off-the-shelf RAG pipelines
● Adaptive use of retrieval also improves the efficiency of RAG systems
[Bar chart: accuracy / ROUGE on PopQA, FActScore, ASQA, and PubHealth for Llama 2 13B, Self-RAG 13B, ChatGPT, and SAIL]
“Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection”. Asai et al. ICLR 2024.
Designing and training more reliable LLM RAG
Approaches to optimize (1) LMs, (2) retrievers, or (3) prompts for LLM RAG
2. Optimizing retrievers for RAG: training retrieval modules using LM feedback
Optimizing retrievers for RAG
Training retrieval modules using LM feedback
● For RAG pipelines using black-box LLMs (e.g., OpenAI o1), we cannot directly train the LLMs for RAG
● Can we train retrievers instead?
● REPLUG trains retrievers for black-box LLMs by minimizing the KL divergence between the retriever’s document distribution and the LM’s likelihood distribution over documents
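One common form of this objective (following REPLUG LSR, with illustrative numbers standing in for real model scores) can be sketched as:

```python
# Sketch of REPLUG-style retriever training: minimize KL(P_R || Q_LM), where
# P_R(d|q) is the retriever's distribution over documents (softmax of retrieval
# scores) and Q_LM(d|q) is the LM feedback distribution (softmax of how well
# each document helps the LM score the gold answer). Numbers are illustrative.
import math

def softmax(xs, temp=1.0):
    m = max(xs)
    exps = [math.exp((x - m) / temp) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(p, q):
    # KL(P || Q); P is the retriever distribution, Q the LM feedback (target).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

retrieval_scores = [2.0, 1.0, 0.5]       # sim(q, d) from the retriever
lm_log_likelihoods = [-0.2, -2.5, -3.0]  # log P_LM(gold answer | q, d)

P_R = softmax(retrieval_scores)
Q_LM = softmax(lm_log_likelihoods)
loss = kl_divergence(P_R, Q_LM)  # gradients flow into the retriever only
```

The black-box LM only needs to score candidate outputs; all gradient updates go to the retriever's parameters, which is why this works even when the LM is behind an API.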
● RA-DIT observes performance gains over combinations of off-the-shelf components (REPLUG w/o LSR)
● Both LM and retriever training contribute to the performance gain
[Bar chart: accuracy / EM on MMLU, TQA, and their average for REPLUG, RA-DIT (LM only), RA-DIT (R only), and full RA-DIT]
● From the initially retrieved docs Z, select more relevant context before feeding it to LMs
● Examples include: cross-encoder reranking, context compression (Xu et al., 2024)
“RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation”. Xu et al. ICLR 2024.
Designing and training more reliable LLM RAG
Approaches to optimize (1) LMs, (2) retrievers, or (3) prompts for LLM RAG
3. Optimizing prompts for RAG: optimizing the prompts of RAG pipelines
DSPy: Optimizing prompts for LLM RAG
Optimizing prompts for RAG applications
“DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines”. Khattab et al. ICLR 2024.
DSPy: Optimizing prompts for LLM RAG
Optimizing prompts for RAG applications
● DSPy optimizes instructions and few-shot demonstrations to achieve the best performance
● On a multi-hop QA task with GPT-3.5, the pipeline scores 33% before and 55% after prompt optimization
“DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines”. Khattab et al. ICLR 2024.
Future: Limitations & future directions
Roadmap for more efficient & reliable retrieval-augmented LMs
Challenges of scaling up datastores & increased inference-time costs (spanning both algorithms and infrastructure)
“Scaling Retrieval-Based Language Models with a Trillion-Token Datastore.” Shao, He, Asai et al., ArXiv 2024.
Roadmap for more efficient & reliable retrieval-augmented LMs
New algorithms & architectures to enable more efficient and effective RAG (spanning both algorithms and infrastructure)
“PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design.” Jiang et al., ArXiv 2024.
Roadmap for more efficient & reliable retrieval-augmented LMs
Careful analyses on their effectiveness and limitations
Acknowledgements: Some slides are adapted from our ACL 2023 tutorial https://acl2023-retrieval-lm.github.io/, co-taught by Akari Asai, Sewon Min, Zexuan Zhong, and Danqi Chen. We thank Omar Khattab for sharing the DSPy slides.