Large Language Models
CSC413 Tutorial 9
Yongchao Zhou
Overview
● What are LLMs?
● Why LLMs?
● Emergent Capabilities
○ Few-shot In-context Learning
○ Advanced Prompt Techniques
● LLM Training
○ Architectures
○ Objectives
● LLM Finetuning
○ Instruction finetuning
○ RLHF
○ Bootstrapping
● LLM Risks
What are Language Models?
● Narrow Sense
○ A probabilistic model that assigns a probability to every finite sequence (grammatical or not)
● Broad Sense
○ Decoder-only models (GPT-X, OPT, LLaMA, PaLM)
○ Encoder-only models (BERT, RoBERTa, ELECTRA)
○ Encoder-decoder models (T5, BART)
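In the narrow sense above, a decoder-only LM factorizes the probability of a sequence autoregressively, p(x_1, ..., x_T) = Π_t p(x_t | x_<t). The sketch below scores a sentence this way with GPT-2 through Hugging Face transformers; the specific model and library are illustrative choices, not something the slides prescribe.

```python
# Minimal sketch: score a sentence under a decoder-only LM (GPT-2 via Hugging Face).
# The model/tokenizer choice is illustrative; the slides do not prescribe a library.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

text = "Language models assign probabilities to sequences."
ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    # labels=ids makes the model return the mean next-token cross-entropy.
    loss = model(ids, labels=ids).loss

# Mean negative log-likelihood per predicted token -> sequence log-probability.
log_prob = -loss.item() * (ids.shape[1] - 1)
print(f"log p(text) ≈ {log_prob:.2f}")
```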
Large Language Models - Billions of Parameters
https://huggingface.co/blog/large-language-models
Large Language Models - Hundreds of Billions of Tokens
https://babylm.github.io/
Large Language Models - YottaFLOPs of Compute
https://web.stanford.edu/class/cs224n/slides/cs224n-2023-lecture11-prompting-rlhf.pdf
Why LLMs?
● Scaling Law for Neural Language Models
○ Performance depends strongly on scale: loss keeps improving as we scale up the model, data, and compute (power-law fits sketched below)
https://arxiv.org/pdf/2001.08361.pdf
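The linked Kaplan et al. paper fits the test loss with power laws in parameter count N, dataset size D, and compute C; a sketch of the functional form (the constants and exponents are empirically fitted values reported in the paper):

```latex
% Power-law scaling of test loss (Kaplan et al., 2020); N = parameters, D = tokens, C = compute.
% N_c, D_c, C_c and \alpha_N, \alpha_D, \alpha_C are empirically fitted constants.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```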
Why LLMs?
● Generalization
○ We can now use one single model to solve many NLP tasks
https://arxiv.org/pdf/1910.10683.pdf
Why LLMs?
● Emergent Abilities
○ An ability is emergent if it is not present in smaller models but is present in larger models
https://docs.google.com/presentation/d/1yzbmYB5E7G8lY2-KzhmArmPYwwl7o7CUST1xRZDUu1Y/edit?resourcekey=0-6_TnUMoKWCk_FN2BiPxmbw#slide=id.g1fc34b3ac18_0_27
Emergent Capability - In-Context Learning
https://arxiv.org/pdf/2005.14165.pdf
Emergent Capability - In-Context Learning
https://www.cs.princeton.edu/courses/archive/fall22/cos597G/lectures/lec04.pdf
Emergent Capability - In-Context Learning
Pretraining + Fine-tuning Paradigm
Pretraining + Prompting Paradigm
● Fine-tuning (FT)
○ + Strongest performance
○ - Needs a curated and labeled dataset for each new task (typically 1k-100k ex.)
○ - Poor generalization, spurious feature exploitation
● Few-shot (FS)
○ + Much less task-specific data needed
○ + No spurious feature exploitation
○ - Challenging
● One-shot (1S)
○ + "Most natural," e.g. giving humans instructions
○ - Challenging
● Zero-shot (0S)
○ + Most convenient
○ - Challenging, can be ambiguous
(Figure axis: stronger task-specific performance at the top, more convenient / more general / less data at the bottom; example prompt formats are sketched below.)
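To make the FS/1S/0S comparison concrete, here is a minimal sketch of how the zero-, one-, and few-shot prompts are assembled; the translation task and the "=>" separator follow the GPT-3 paper's illustration, and no gradient updates are involved.

```python
# Minimal sketch of the zero-/one-/few-shot prompt formats compared on this slide.
# The model only conditions on the prompt at inference time; nothing is finetuned.
demonstrations = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
]
query = "cheese"

def build_prompt(n_shots: int) -> str:
    """n_shots = 0 (zero-shot), 1 (one-shot), or more (few-shot)."""
    lines = ["Translate English to French:"]
    for en, fr in demonstrations[:n_shots]:
        lines.append(f"{en} => {fr}")
    lines.append(f"{query} =>")  # leave the answer for the model to complete
    return "\n".join(lines)

print(build_prompt(0))  # zero-shot: instruction + query only
print(build_prompt(1))  # one-shot: one in-context demonstration
print(build_prompt(2))  # few-shot: several demonstrations
```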
Emergent Capability - Chain of Thoughts Prompting
https://arxiv.org/pdf/2201.11903.pdf
Emergent Capability - Chain of Thoughts Prompting
https://arxiv.org/pdf/2201.11903.pdf
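Chain-of-thought prompting adds worked reasoning to the few-shot demonstrations so the model produces intermediate steps before its answer. A minimal sketch of the prompt structure, using the tennis-ball/cafeteria example from the linked paper:

```python
# Minimal sketch of a few-shot chain-of-thought prompt: the demonstration contains
# the intermediate reasoning, not just the final answer.
cot_exemplar = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
    "5 + 6 = 11. The answer is 11.\n"
)
new_question = (
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, "
    "how many apples do they have?\n"
    "A:"
)
prompt = cot_exemplar + "\n" + new_question
print(prompt)  # the model is expected to continue with reasoning, then "The answer is 9."
```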
Emergent Capability - Zero Shot CoT Prompting
https://arxiv.org/pdf/2205.11916.pdf
Emergent Capability - Zero Shot CoT Prompting
https://arxiv.org/pdf/2205.11916.pdf
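Zero-shot CoT (Kojima et al., linked above) skips hand-written demonstrations: appending a trigger phrase such as "Let's think step by step." elicits the reasoning, and a second prompt extracts the answer. A minimal sketch, where `generate` is a stub standing in for any completion API:

```python
# Minimal sketch of two-stage zero-shot chain-of-thought prompting.
# `generate` is a placeholder so the script runs end to end.

def generate(prompt: str) -> str:
    """Placeholder for an LLM completion call (e.g. an API or a local model)."""
    return "<model completion would appear here>"

question = (
    "A juggler can juggle 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there?"
)

# Stage 1: append the trigger phrase to elicit intermediate reasoning.
reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
reasoning = generate(reasoning_prompt)

# Stage 2: ask for the final answer conditioned on the generated reasoning.
answer_prompt = f"{reasoning_prompt} {reasoning}\nTherefore, the answer (arabic numerals) is"
answer = generate(answer_prompt)
print(answer)
```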
Emergent Capability - Self-Consistency Prompting
https://arxiv.org/pdf/2203.11171.pdf
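Self-consistency replaces greedy decoding with sampling: draw several chain-of-thought completions at nonzero temperature and majority-vote over the final answers. A minimal sketch with placeholder `sample_cot` / `extract_answer` helpers:

```python
# Minimal sketch of self-consistency: sample multiple reasoning paths, then take the
# most common final answer. Both helpers are illustrative placeholders.
from collections import Counter

def sample_cot(prompt: str, temperature: float = 0.7) -> str:
    """Placeholder for one sampled chain-of-thought completion."""
    return "... so the answer is 42"

def extract_answer(completion: str) -> str:
    """Pull the final answer out of a completion (here: the last whitespace token)."""
    return completion.strip().split()[-1]

def self_consistency(prompt: str, n_samples: int = 10) -> str:
    answers = [extract_answer(sample_cot(prompt)) for _ in range(n_samples)]
    # Marginalize out the reasoning paths by majority vote over final answers.
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("Q: ... A: Let's think step by step."))
```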
Emergent Capability - Least-to-Most Prompting
https://arxiv.org/pdf/2205.10625.pdf
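Least-to-most prompting proceeds in two stages: first ask the model to decompose the problem into subquestions, then solve them in order, feeding each sub-answer back into the context. A minimal sketch; the prompt wording and the `generate` stub are illustrative:

```python
# Minimal sketch of least-to-most prompting: decompose, then solve sequentially,
# reusing earlier answers as context. `generate` is a placeholder completion call.

def generate(prompt: str) -> str:
    """Placeholder for an LLM completion call."""
    return "<model completion>"

problem = ("Elsa has 5 apples. Anna has 2 more apples than Elsa. "
           "How many apples do they have together?")

# Stage 1: decomposition into simpler subquestions.
subquestions = generate(
    f"Decompose this problem into simpler subquestions, one per line:\n{problem}"
).splitlines()

# Stage 2: sequential solving, appending each sub-answer to the context.
context = problem
for sub in subquestions:
    answer = generate(f"{context}\nQ: {sub}\nA:")
    context += f"\nQ: {sub}\nA: {answer}"

print(context)
```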
Emergent Capability - Augmented Prompting Abilities
http://jalammar.github.io/illustrated-transformer/
Training Objectives - UL2
https://arxiv.org/pdf/2205.05131.pdf
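UL2 trains on a mixture of denoisers: regular span corruption (R-denoiser), a sequential/prefix-LM objective (S-denoiser), and extreme corruption with long spans or high corruption rates (X-denoiser). A minimal sketch of how span-corruption inputs and targets are built; the sentinel tokens and rates here are illustrative, not the exact UL2 settings:

```python
# Minimal sketch of span-corruption denoising in the spirit of UL2's mixture.
# R- and X-denoisers differ mainly in span length and corruption rate; the
# S-denoiser (prefix LM) is omitted here.
import random

def span_corrupt(tokens, corruption_rate=0.15, mean_span_len=3):
    """Replace random spans with sentinels; the target reconstructs the spans."""
    inputs, targets, i, sentinel = [], [], 0, 0
    while i < len(tokens):
        if random.random() < corruption_rate / mean_span_len:
            span_len = max(1, round(random.gauss(mean_span_len, 1)))
            inputs.append(f"<extra_id_{sentinel}>")
            targets.append(f"<extra_id_{sentinel}>")
            targets.extend(tokens[i:i + span_len])
            sentinel += 1
            i += span_len
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets

toks = "large language models are trained on web scale text corpora".split()
print(span_corrupt(toks))                                         # R-denoiser-like
print(span_corrupt(toks, corruption_rate=0.5, mean_span_len=8))   # X-denoiser-like
```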
What kinds of things does pretraining learn?
https://web.stanford.edu/class/cs224n/slides/cs224n-2023-lecture11-prompting-rlhf.pdf
Finetuning - Instruction Finetuning
https://arxiv.org/pdf/2210.11416.pdf
Finetuning - Instruction Finetuning
https://arxiv.org/pdf/2210.11416.pdf
Finetuning - Instruction Finetuning
https://arxiv.org/pdf/2210.11416.pdf
Finetuning - Instruction Finetuning
https://arxiv.org/pdf/2210.11416.pdf
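Instruction finetuning (FLAN-style, linked above) rewrites many supervised tasks as instruction/response pairs and finetunes the model with the usual next-token cross-entropy on the response. A minimal sketch of the data formatting; the templates and examples are illustrative, not the actual FLAN templates:

```python
# Minimal sketch of instruction-finetuning data formatting: tasks become
# (instruction prompt, response) pairs; the loss is applied to response tokens only.

raw_examples = [
    {"task": "sentiment", "input": "The movie was a delight.", "output": "positive"},
    {"task": "translation", "input": "How are you?", "output": "Comment ça va ?"},
]

TEMPLATES = {
    "sentiment": "Classify the sentiment of this review as positive or negative:\n{input}\nAnswer:",
    "translation": "Translate the following sentence to French:\n{input}\nTranslation:",
}

def to_instruction_pair(ex):
    prompt = TEMPLATES[ex["task"]].format(input=ex["input"])
    return {"prompt": prompt, "response": " " + ex["output"]}

finetuning_data = [to_instruction_pair(ex) for ex in raw_examples]
for pair in finetuning_data:
    print(pair["prompt"] + pair["response"], end="\n\n")
# During finetuning, prompt tokens are masked out of the cross-entropy loss.
```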
Finetuning - RLHF
https://arxiv.org/pdf/2203.14465.pdf
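A common way to write the RLHF policy objective (in the style of the InstructGPT line of work; reproduced here as a sketch, not taken from the linked slide): maximize the learned reward while a KL penalty keeps the tuned policy close to the supervised reference model.

```latex
% RLHF policy objective (sketch): r_\phi is the learned reward model,
% \pi_\theta the policy being tuned, \pi_{\mathrm{ref}} the supervised reference model,
% and \beta the strength of the KL penalty.
\max_{\theta}\;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}
\Big[\, r_\phi(x, y) \;-\; \beta \,\log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)} \,\Big]
```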
Application - ChatGPT
Application - ChatGPT
https://yaofu.notion.site/How-does-GPT-Obtain-its-Ability-Tracing-Emergent-Abilities-of-Language-Models-to-their-Sources-b9a57ac0fcf74f30a1ab9e3e36fa1dc1
Finetuning - Bootstrapping
https://arxiv.org/pdf/2203.14465.pdf
Finetuning - Bootstrapping
https://arxiv.org/pdf/2210.11610.pdf
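The bootstrapping loop in the spirit of STaR (linked above): sample rationales, keep only those whose final answer matches the label, finetune on the kept examples, and repeat. A minimal sketch with placeholder helpers:

```python
# Minimal sketch of a STaR-style bootstrapping loop. All helpers are placeholders;
# STaR additionally "rationalizes" failed examples by hinting the answer (omitted here).

def generate_rationale(model, question):
    """Placeholder: few-shot prompt the model for (rationale, final answer)."""
    return "reasoning ...", "answer"

def finetune(model, examples):
    """Placeholder: standard supervised finetuning on the filtered examples."""
    return model

def bootstrap(model, dataset, n_iterations=3):
    for _ in range(n_iterations):
        kept = []
        for question, gold_answer in dataset:
            rationale, answer = generate_rationale(model, question)
            if answer == gold_answer:           # filter: keep only correct chains
                kept.append((question, rationale, gold_answer))
        model = finetune(model, kept)           # train on self-generated rationales
    return model
```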
Large Language Models - Risks
● LLMs make mistakes (falsehoods, hallucinations)
● LLMs can be misused (misinformation, spam)
https://arxiv.org/pdf/2005.14165.pdf
Emergent Capability - Decomposed Prompting
https://arxiv.org/pdf/2210.02406.pdf
Training Objectives - UL2
https://arxiv.org/pdf/2205.05131.pdf
Training Techniques - Parallelism
https://openai.com/research/techniques-for-training-large-neural-networks
Training Techniques - Parallelism
https://www.microsoft.com/en-us/research/blog/deepspeed-extreme-scale-model-training-for-everyone/
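Of the parallelism strategies surveyed at the links above (data, tensor, pipeline, and ZeRO-style sharded training), plain data parallelism is the simplest to sketch; below is a minimal PyTorch DistributedDataParallel example (the launch command and tensor sizes are illustrative):

```python
# Minimal sketch of data parallelism with PyTorch DDP: each rank holds a full model
# replica, processes its own data shard, and gradients are all-reduced across ranks.
# Launch with e.g. `torchrun --nproc_per_node=4 this_script.py` (illustrative).
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")                 # one process per GPU
    rank = dist.get_rank()
    model = torch.nn.Linear(1024, 1024).cuda(rank)
    model = DDP(model, device_ids=[rank])           # wraps the replica for gradient sync
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(32, 1024, device=rank)          # each rank sees a different data shard
    loss = model(x).pow(2).mean()
    loss.backward()                                 # DDP overlaps all-reduce with backward
    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```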