ML Algorithms

What Is a Transformer Model?

The Transformer is a deep learning model used mainly for Natural Language Processing
(NLP) applications. It was first presented in the 2017 paper
"Attention Is All You Need" by Vaswani et al.

The Transformer's most important component is its attention mechanism, which helps the
model focus on different parts of the input data, whether it is generating responses,
interpreting context, or making predictions.

Put simply, imagine it as a model that reads and understands text by concentrating on
the most important words or phrases, regardless of where they appear in a sentence.
This is particularly useful for understanding long sentences or paragraphs where word
relationships matter.

Why Is the Transformer Model Used?


Handles Long Texts Well: Transformers read the full sentence or paragraph at once, in
contrast to traditional models such as Recurrent Neural Networks (RNNs) or Long
Short-Term Memory networks (LSTMs), which process text sequentially (word by word).
This means they capture the context better, even in lengthy texts.

Faster and More Efficient: Transformers can be trained more quickly because they process
every word at once. They also enable parallelization, which makes training on massive
datasets practical.

Better Context Understanding: Transformers employ self-attention, meaning the model
considers each word in a sentence and decides which other words are relevant to it.
This improves its grasp of word relationships, which in turn strengthens its ability to
translate languages, answer questions, and summarize content.
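
To make the idea concrete, here is a minimal sketch of scaled dot-product self-attention in Python (PyTorch). The sequence length, embedding size, and random weight matrices are illustrative placeholders rather than values from any real Transformer:

    import torch
    import torch.nn.functional as F

    def self_attention(X, Wq, Wk, Wv):
        # Project each token vector into query, key, and value spaces.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        # Score how strongly each token should attend to every other token,
        # scaled by the square root of the key dimension for numerical stability.
        scores = Q @ K.T / (K.shape[-1] ** 0.5)
        weights = F.softmax(scores, dim=-1)   # attention weights sum to 1 across the sequence
        return weights @ V                    # each output mixes information from all tokens

    # Toy example: a "sentence" of 4 tokens with embedding size 8.
    torch.manual_seed(0)
    X = torch.randn(4, 8)
    Wq, Wk, Wv = (torch.randn(8, 8) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)   # torch.Size([4, 8]): one context-aware vector per token

Stacking many such attention layers (with multiple heads and feed-forward layers) is what lets the Transformer relate words regardless of how far apart they are in the sentence.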

Where Is the Transformer Model Used?

Virtual Assistants and Chatbots: Transformer models serve as the foundation for systems
such as ChatGPT, Siri, and Google Assistant, enabling them to hold conversations and
produce human-like answers to questions.

Language Translation: By understanding a sentence's context, Transformer-based
translation systems produce accurate translations when converting text between
languages. Google Translate is an example of such a service.

Content Summarization: News organizations and websites use Transformers to
automatically condense lengthy articles into short summaries while preserving their
meaning.

Text Generation: AI writing and content tools (such as Grammarly and Jasper AI) use
Transformers to produce or improve written text.

Real-Life Example

Large Language Models (LLMs) - ChatGPT
ChatGPT belongs to the Large Language Model (LLM) class of machine learning models for
natural language processing. It is based on GPT (Generative Pre-trained Transformer).
LLMs process large amounts of text data and use that information to infer relationships
between words. As computing power has increased over the past few years, these models
have grown, and they become more capable as their parameter space and input datasets
expand.

The simplest way to train a language model is to have it predict a word in a sequence of
words. This is most commonly framed as either masked language modeling or next-token prediction.
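
As a rough illustration, the Python (PyTorch) snippet below sketches the next-token-prediction objective on a toy vocabulary; the token ids and random logits are placeholders standing in for a real model's output:

    import torch
    import torch.nn.functional as F

    # Toy vocabulary and one tokenized "sentence"; a real LLM has tens of thousands of tokens.
    vocab = ["<pad>", "the", "cat", "sat", "on", "mat"]
    tokens = torch.tensor([1, 2, 3, 4, 1, 5])          # "the cat sat on the mat"

    # Hypothetical model scores (logits) for the token that should follow each position.
    logits = torch.randn(len(tokens) - 1, len(vocab), requires_grad=True)

    # Next-token prediction: the training target at each position is simply the next token.
    targets = tokens[1:]
    loss = F.cross_entropy(logits, targets)            # low probability on the true next token -> high loss
    loss.backward()                                     # gradients would be used to update the model
    print(loss.item())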

To create a powerful model like ChatGPT, the following steps are used:

Step 1: Supervised Fine-Tuning

The model learns from labeled datasets, meaning it is given input-output pairs
(examples of questions and their correct responses). This process is guided by human
annotators who provide correct answers.
Technical Explanation:

● The model is first pre-trained on massive amounts of general data, often
unsupervised (without labels), learning patterns in language (how words fit
together, common sentence structures, etc.).
● During fine-tuning, the model is trained on labeled data. This data consists of
conversational examples where the input (a question or statement) has a known
target output (a correct or preferred response).
● Loss functions like cross-entropy are used to measure the difference between
the predicted output of the model and the true label (the correct response). The
model is then adjusted to minimize this error through backpropagation and
gradient descent.

Technical Example:

● Input: “What is the capital of Japan?”
● Labeled Output: “The capital of Japan is Tokyo.”

The model compares its generated response with the labeled output, calculates the
loss, and then adjusts its parameters to improve.
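
A minimal sketch of one such fine-tuning step is shown below in Python (PyTorch). The tiny embedding-plus-linear "model" and the token ids are hypothetical stand-ins for a real pretrained language model and its tokenizer; the point is only to show the cross-entropy loss, backpropagation, and gradient-descent update described above:

    import torch
    import torch.nn as nn

    # Hypothetical stand-in for a pretrained language model: token ids -> next-token logits.
    vocab_size, emb_dim = 100, 32
    model = nn.Sequential(nn.Embedding(vocab_size, emb_dim), nn.Linear(emb_dim, vocab_size))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    # One labeled example, with made-up token ids:
    prompt   = torch.tensor([[5, 17, 42, 8]])    # e.g. "What is the capital of Japan?"
    response = torch.tensor([[9, 23, 42, 61]])   # e.g. "The capital of Japan is Tokyo."

    # Supervised fine-tuning step: predict the response tokens, compare with the label,
    # and minimize the cross-entropy via backpropagation and a gradient-descent update.
    logits = model(prompt)                        # shape (1, sequence length, vocab_size)
    loss = loss_fn(logits.view(-1, vocab_size), response.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In a real system the model conditions on the full prompt-plus-response sequence and the loss is computed only over the response tokens; this sketch compresses that into a single step.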

Step 2: Reward Model

The next step involves creating a Reward Model, where the model’s outputs are
evaluated based on how good or bad they are. This involves collecting human feedback
to train a separate model to predict a reward score for each output.

● Technical Explanation:
○ After fine-tuning, the model’s responses are evaluated by human
annotators, who rank responses from best to worst based on criteria like
relevance, helpfulness, and clarity.
○ A reward model is then trained to predict the quality of responses. The
human evaluations are used as labels, and the reward model learns to
assign a reward score (like a numerical value) to each response the base
model generates.
○ Mean Squared Error (MSE) or a ranking loss can be used to train
the reward model, minimizing the difference between predicted rewards
and the actual rewards (human evaluations).
● Technical Example:
○ The AI generates three different responses to the input “Explain the theory
of relativity.”
○ Humans rank the responses based on how well they explain the concept:
■ Response A: “Relativity is a theory about space and time.” (Rank: 3)
■ Response B: “Einstein’s theory of relativity describes how objects move through space and time and how gravity affects that motion.” (Rank: 1)
■ Response C: “Relativity talks about the speed of light and black holes.” (Rank: 2)
● The reward model learns from these rankings and assigns higher reward scores
to responses like B that are more accurate and clear.
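
A rough Python (PyTorch) sketch of this step is given below, using a pairwise ranking loss over a human-preferred and a less-preferred response (training with MSE against numeric human scores would look similar). The linear "reward model" and the random response embeddings are illustrative placeholders:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Hypothetical reward model: maps a response embedding to a single scalar reward score.
    emb_dim = 16
    reward_model = nn.Linear(emb_dim, 1)
    optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

    # Placeholder embeddings for two responses to the same prompt,
    # e.g. Response B (ranked best) and Response A (ranked worst) above.
    better = torch.randn(1, emb_dim)
    worse = torch.randn(1, emb_dim)

    # Pairwise ranking loss: push the preferred response's score above the other one's.
    r_better = reward_model(better)
    r_worse = reward_model(worse)
    loss = -F.logsigmoid(r_better - r_worse).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()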

Step 3: Reinforcement Learning from Human Feedback (RLHF)

In this step, the model is fine-tuned again using Reinforcement Learning (RL), with the
reward model providing feedback on the generated responses. The goal is for the model
to maximize the rewards over time.

● Technical Explanation:
○ The model is now trained using Proximal Policy Optimization (PPO), a
popular reinforcement learning algorithm. The model generates responses
(called actions), and the reward model gives feedback (reward scores)
based on how good or bad the response is.
○ The model adjusts its responses (or policy) to maximize the cumulative
reward by trying different strategies and learning which ones work best.
○ This is where exploration vs. exploitation comes in. The model explores
different ways to answer questions, then exploits the patterns that receive
the highest rewards.
○ The policy is updated to generate responses that are not only accurate but
also helpful, polite, and informative, guided by the feedback from the
reward model.
● Technical Example:
○ Input: “What’s the difference between machine learning and deep
learning?”
○ Initial Response: “Machine learning is about algorithms. Deep learning
uses neural networks.” (Reward Score: 0.3)
○ The model is penalized (low reward) because this explanation is too vague.
○ After many rounds of RL, the model learns to provide a more
comprehensive answer:
■ Improved Response: “Machine learning is a broad field where
computers learn from data. Deep learning is a subset of machine
learning that uses neural networks to mimic how the human brain
processes data.” (Reward Score: 0.9)
● The model now generates better responses that earn higher reward scores.
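
The Python (PyTorch) sketch below illustrates the PPO-style clipped update on a deliberately simplified problem: a "policy" choosing among three candidate responses whose reward scores are fixed made-up numbers (like the 0.3 and 0.9 above). A real RLHF loop runs PPO over token-level actions with the full language model and reward model, so treat this only as an illustration of the clipped objective:

    import torch
    import torch.nn.functional as F

    # Policy parameters: a preference score for each of three candidate responses.
    logits = torch.zeros(3, requires_grad=True)
    old_logits = logits.detach().clone()              # snapshot of the policy before updating
    optimizer = torch.optim.Adam([logits], lr=0.1)

    rewards = torch.tensor([0.3, 0.9, 0.5])           # reward-model scores for each candidate
    clip_eps = 0.2                                    # PPO clipping range

    for _ in range(50):
        probs = F.softmax(logits, dim=0)
        old_probs = F.softmax(old_logits, dim=0)
        ratio = probs / old_probs                     # how far the policy has moved from the snapshot
        advantage = rewards - rewards.mean()          # which responses beat the average reward
        # Clipped objective: increase expected reward, but keep the policy near the old one.
        objective = torch.min(ratio * advantage,
                              torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage)
        loss = -objective.sum()                       # minimize the negative of the objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(F.softmax(logits, dim=0))                   # probability mass shifts toward the best-rewarded response

The clipping keeps each update close to the previous policy, so the model improves its reward gradually instead of jumping to degenerate answers.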
