
SCDS1001 - Artificial Intelligence Literacy I

L-4: Understanding Large Language Models (LLMs)

Overview
This lecture delves into the intricacies of Large Language Models (LLMs), which are sophisticated
artificial intelligence systems that excel in processing and generating human language. LLMs are
transforming how we interact with technology, enabling more natural conversations and more effective
communication with machines.

Human Language
Human language is fundamentally a tool that facilitates communication and is a cornerstone of human
progress and innovation. It is not only a means of expressing ideas but also a meta-tool that enables the
creation, sharing, and refinement of various other tools. The complexity of human language arises from
its numerous irregularities, exceptions, idioms, and evolving meanings, which can pose challenges for
both learners and AI systems. Additionally, language is inherently ambiguous, as its meaning can shift
based on context, tone, and cultural nuances. This richness allows for creativity in expression, leading
to the emergence of new words and slang as society evolves. Furthermore, language has layers that
encapsulate emotional, cultural, and social norms, making it a dynamic and multifaceted system. As AI
models attempt to mimic human language, they must navigate these complexities to generate relevant
and coherent text.

Large Language Models (LLMs)


Large Language Models are a subset of deep neural networks specifically engineered to process and
generate human language. They operate by predicting the next word in a sequence, utilizing vast
amounts of data to learn the patterns and structures inherent in language. LLMs are distinguished by
their architecture, which includes multiple layers—input, hidden, and output layers—that work together
to analyze and produce text. The learning process for LLMs involves training on extensive datasets
derived from internet text, allowing them to acquire a broad understanding of language use across
different contexts.
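
As a minimal sketch of next-word prediction, the following example feeds a prompt through the small open GPT-2 model and prints the most probable next tokens. The choice of GPT-2 and the Hugging Face transformers library is an assumption for illustration; the lecture does not prescribe a specific toolkit.

    # Minimal sketch of next-word prediction, assuming the Hugging Face
    # "transformers" and "torch" packages are installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "To be or not to"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits   # shape: (1, sequence_length, vocab_size)

    # Turn the scores for the final position into a probability distribution
    # over the vocabulary, then show the five most likely continuations.
    probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(probs, k=5)
    for p, token_id in zip(top.values, top.indices):
        print(f"{tokenizer.decode(int(token_id))!r}: {p.item():.3f}")
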

Types of LLMs
Several prominent LLMs have emerged in recent years, each with unique features and capabilities.
Notable examples include DeepSeek-R1 by DeepSeek, GPT-4o by OpenAI, Qwen2.5 by Alibaba, and
Llama 3.3 by Meta. Each of these models employs advanced techniques and architectures to enhance
performance and adaptability.
LLMs can be grouped by how they are accessed: downloadable models (DeepSeek-R1, Qwen2.5,
Llama 3.3) and API-based models (GPT-4o). Downloadable models can be installed and run on your
own computer, giving you more control and customization options, but they require a powerful machine
to operate. API-based models, on the other hand, are hosted on the provider's servers, meaning you
use them over the internet through an application programming interface (API). This approach is
easier to integrate into apps and does not require you to manage any hardware, but it usually comes with
usage fees and can be slower because it relies on an internet connection.
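
To make the access difference concrete, here is a minimal sketch of calling an API-based model (GPT-4o) with the official openai Python client. It assumes the openai package is installed and an API key is available in the OPENAI_API_KEY environment variable.

    # Minimal sketch of using an API-based model: the model runs on the
    # provider's servers and is reached over the internet. Assumes
    # `pip install openai` and an API key in the OPENAI_API_KEY variable.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Explain what an LLM is in one sentence."}],
    )
    print(response.choices[0].message.content)
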

Prediction Mechanisms
There are two primary mechanisms for predicting the next word in a text sequence: deterministic (not
commonly used) and stochastic (commonly used) prediction. Deterministic next word prediction means
that given the same input, the model will always predict the same next word, which can be useful in
controlled environments. In contrast, stochastic next word prediction introduces variability by utilizing
probabilities to determine the likelihood of various possible continuations. For instance, when presented
with the phrase "To be or not to ___," the model might predict "be" with a 75% probability, while other
options like "do" or "say" have lower probabilities. This stochastic nature allows LLMs to generate
diverse and contextually rich text, making their outputs more dynamic and engaging. Adjusting how
these probabilities are sampled controls the randomness of the predictions, which is particularly useful
in applications where creativity is desired.
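
The difference between the two mechanisms can be shown in plain Python, reusing the illustrative probabilities from the "To be or not to ___" example (the exact numbers are only for demonstration):

    import random

    # Illustrative next-word probabilities for the prompt "To be or not to ___".
    next_word_probs = {"be": 0.75, "do": 0.10, "say": 0.05, "go": 0.10}

    # Deterministic prediction: always pick the single most probable word,
    # so the same input always produces the same output.
    print("deterministic:", max(next_word_probs, key=next_word_probs.get))  # always "be"

    # Stochastic prediction: sample according to the probabilities, so "be"
    # appears about 75% of the time and the other words occasionally.
    words, weights = zip(*next_word_probs.items())
    for _ in range(5):
        print("stochastic:  ", random.choices(words, weights=weights)[0])
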


Stages of LLM Development


Training an LLM involves two key stages: pre-training on extensive text data to build a foundational
understanding of language, followed by fine-tuning on more specific tasks or datasets to improve
accuracy and relevance.
• Pre-training stage: Developers gather extensive text data from the internet, often amounting to
terabytes of information, and use powerful computational resources, such as clusters of GPUs,
to process this data. The model learns to recognize patterns and structures within the language,
forming a foundational understanding of communication.
• Fine-tuning stage: This involves crafting specific labelling instructions and collecting high-
quality, human-labelled responses to train the model for particular tasks. The fine-tuning
process allows developers to refine the model’s capabilities, ensuring that it can provide
accurate and contextually relevant responses. After deploying the model, continuous
monitoring and evaluation are essential to identify and correct any misbehaviours, thereby
enhancing the model's reliability over time.
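
As a hypothetical illustration of the fine-tuning stage, a human-labelled training record might pair an instruction with a carefully written response. The field names below are assumptions for illustration, not a prescribed format.

    import json

    # Hypothetical example of a human-labelled fine-tuning record: an instruction
    # paired with a high-quality response written or vetted by a human labeller.
    # The field names ("instruction", "response") are illustrative only.
    record = {
        "instruction": "Summarize what tokenization means in one sentence.",
        "response": "Tokenization splits text into small units (words or subwords) "
                    "that a language model can process.",
    }

    # Fine-tuning datasets are commonly stored as one JSON object per line (JSONL).
    print(json.dumps(record))
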

Training Data
The quality and diversity of training data are critical to the success of LLMs. High-quality training data
must be large in volume and diverse in content, encompassing a wide range of topics and language uses
to ensure the model learns effectively. Most LLM providers do not disclose the exact datasets used, but
they typically consist of vast amounts of text harvested from the internet, including books, articles, and
websites. This data must be carefully curated to avoid biases and maintain relevance. For example,
Hugging Face’s FineWeb dataset comprises 3 billion web pages from 39 million domains,
demonstrating the scale required for effective training. The removal of personally identifiable
information (PII) and irrelevant content is also essential to ensure ethical use and compliance with data
privacy standards.
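
As a small illustration of one curation step, the sketch below removes email addresses (a common kind of PII) from raw web text with a regular expression. Real curation pipelines are far more extensive, and the pattern here is a simplified assumption.

    import re

    # Simplified illustration of PII scrubbing: removing email addresses from
    # raw web text. Real pipelines also handle names, phone numbers,
    # deduplication, language filtering, and quality scoring.
    EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

    raw_page = "Contact the webmaster at jane.doe@example.com for corrections."
    cleaned_page = EMAIL_PATTERN.sub("[EMAIL REMOVED]", raw_page)

    print(cleaned_page)  # Contact the webmaster at [EMAIL REMOVED] for corrections.
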

Important Terms
• Tokens are the fundamental units of text that the model processes, typically representing words
or subwords. Tokenization is the process of breaking down text into smaller units, or tokens,
which can be words or subwords. This enables the model to analyse and generate text more
efficiently. Effective tokenization helps the model manage vocabulary size and handle rare or
complex words by breaking them into manageable pieces, allowing for better performance
across diverse language inputs (see the tokenization sketch after this list).
• The transformer model is a crucial architecture that powers many LLMs, utilizing attention
mechanisms to determine which parts of the input data are most relevant during processing.
This attention-based approach allows LLMs to consider the relationships between words in a
sentence, improving their ability to generate coherent and contextually appropriate text.
Attention works by computing a score for each word in relation to others, determining which
words should be emphasized when making predictions. This is particularly beneficial for
capturing long-range dependencies in language, where the meaning of a word can be influenced
by others that are far apart in the text. For example, in the sentence “The cat that chased the
mouse was very quick,” the attention mechanism helps the model understand that “cat” and
“quick” are related despite being separated by several words.
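
The sketch below (referred to in the first bullet above) shows tokenization in practice, assuming the Hugging Face transformers package and the GPT-2 tokenizer, both chosen only for illustration.

    # Minimal sketch of tokenization with the GPT-2 tokenizer (an assumption;
    # each model family ships its own tokenizer).
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    # Common words are usually kept whole, while rare or complex words are
    # broken into smaller subword pieces; exact splits depend on the tokenizer.
    print(tokenizer.tokenize("cat"))
    print(tokenizer.tokenize("tokenization"))
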


Customizations of LLMs
Customizing LLMs can significantly enhance their performance and adapt them to specific tasks. Two
common customization options include adjusting the temperature and the maximum length of generated
text. The temperature controls the randomness of predictions; a lower temperature (e.g., 0.2) results in
more deterministic and focused outputs, while a higher temperature (e.g., 1.0) yields more diverse and
creative responses. This allows users to strike a balance between coherence and creativity based on their
needs.
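
The effect of temperature can be sketched directly: dividing the model's raw scores (logits) by the temperature before applying softmax makes low temperatures sharpen the distribution and high temperatures flatten it. The scores below are made-up illustrative values.

    import math

    def softmax_with_temperature(logits, temperature):
        """Convert raw scores into probabilities, scaled by the temperature."""
        scaled = [x / temperature for x in logits]
        exp = [math.exp(x) for x in scaled]
        total = sum(exp)
        return [x / total for x in exp]

    # Made-up raw scores for three candidate next words.
    logits = [2.0, 1.0, 0.5]

    print(softmax_with_temperature(logits, 0.2))  # sharply peaked: nearly deterministic
    print(softmax_with_temperature(logits, 1.0))  # flatter: more diverse, creative sampling
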

The maximum length parameter dictates how many tokens the model will generate in response to a
prompt. Setting an appropriate maximum length is crucial, as it affects the completeness and relevance
of the output. Too short a length may truncate valuable information, while too long a length can lead to
irrelevant or overly verbose responses. By fine-tuning these parameters, users can tailor LLM outputs
to better suit specific applications, whether for casual conversation, technical writing, or creative
storytelling.
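
Putting both parameters together, the sketch below generates text from GPT-2 with a chosen temperature and a maximum number of new tokens, using the transformers library. This is an assumption for illustration; parameter names differ slightly across providers and APIs.

    # Minimal sketch of generation with both customizations set, assuming the
    # Hugging Face "transformers" and "torch" packages and the small GPT-2 model.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("Once upon a time", return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        do_sample=True,        # stochastic prediction rather than always the top word
        temperature=0.8,       # lower = more focused, higher = more creative
        max_new_tokens=40,     # maximum length of the generated continuation
        pad_token_id=tokenizer.eos_token_id,
    )
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
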

