[Turing] Guidelines for RLHF Assessment_2025_DS.pptx

Uploaded by kpinkyfam274

Reinforcement Learning from Human Feedback (RLHF) Assessment


READ THE INSTRUCTIONS IN THESE SLIDES CAREFULLY. TAKE YOUR TIME.
YOU WILL ONLY GET 1 ATTEMPT ONCE YOU BEGIN THE ASSESSMENT
Overview

We are seeking talented individuals capable of training Large Language Models (LLMs) to solve
real-world problems for the world’s biggest companies.

This RLHF assessment evaluates your skills in:

- Analyzing LLM outputs critically and constructively

- Assessing responses for relevance, accuracy, and safety

- Refining AI systems through detailed, context-aware feedback

- English writing proficiency


Guidelines
Test Integrity Guidelines

To ensure a fair evaluation for all candidates, please adhere to the following:

● No additional monitors allowed

● Use only the provided test environment and IDE

● Close unrelated browser tabs

● No online resources or tools (e.g., LeetCode, ChatGPT, search engines) are permitted

● Consult official language/API documentation only

● Complete the assessment independently, without external assistance

● Respect the integrity of the auto-proctoring system

🚩 Failure to comply with these guidelines WILL result in immediate disqualification. We appreciate your
honesty and commitment to a fair process.
RLHF Assessment Objective

- Evaluate English writing proficiency:


- Produce clear, effective, and grammatically correct texts
- Craft high-quality rationales

- Analyze model responses comprehensively:


- Articulate advantages and disadvantages
- Identify objective differences
- Maintain appropriate technical terminology

- Demonstrate attention to detail:


- Compare model responses accurately
- Notice and rank subtle differences
RLHF Test Instructions (30 minutes)

- You will be provided a short conversation based on interactions between a user and an AI model
assistant.

- At the end of the conversation, there will be two potential responses given by the AI model to
answer the user’s last message/request.

- Based on the conversation, you are required to:


- Read the full interaction between the user and the AI model assistant to understand the context.
- Evaluate and compare the two potential responses by the AI model assistant, and write
analyses considering:
- Response evaluation: select the dropdown options and submit your evaluation
for both model responses.
- Preference explanation: provide a brief explanation of your preference,
highlighting why you chose one model response over the other.
- Ideal response: generate your own ideal response using the 'Code Execution'
cell, as demonstrated in the upcoming slides.
Test Case Example: Here is the user prompt

Use Case: A team is experiencing intermittent connectivity issues between their web applications and the backend API.

The AI model outputs shown in the slide are based on this user prompt. Please note that the actual fields we collect during the test may differ from those in the example.

Use the 'Code Execution' cell to generate an ideal response, which can then be included in the 'Ideal Response' section, as demonstrated in the previous slide.
Good Example (1/2)

User: Give me a fast method in Python to check if a list contains duplicated items.

Model response A: (code screenshot in the original slide)

Model response B: (code screenshot in the original slide)

Solution:

Correctness: Consider the factuality, logical flow, contextual relevance, and consistency.

Response A's approach provides code that has no major issues in satisfying the user's ask for Python code to check if a list contains duplicated items. It is highly efficient because it leverages Python's set, which only holds unique elements. This allows for constant-time (O(1)) membership checks, ensuring quick operations for most dataset sizes.

Response B's approach provides code that has minor issues in satisfying the user's ask for Python code to check if a list contains duplicated items. It is less efficient for larger lists, as it compares each element to every other, yielding a time complexity of O(n^2). Additionally, there is a minor error: the code uses "for i in range(n+1)" instead of the correct "for i in range(n)".
Informativeness: Is the information provided relevant to the user’s request? Is the
information complete?

Response A's approach is less informative: the lack of comments and of a practical usage example might limit its immediate clarity for some users.

Response B's approach is very informative: its extensive commenting enhances readability, aiding beginner understanding.

Which one would you choose as the best possible model response? Explain why.

Response A is the better response, as it has no correctness issues and is more efficient. Response A is the preferred way to check for duplicates in most cases. Response B, while simpler and easier to understand, can be impractically slow for lists with many elements. The user is looking for a fast solution, so model response A is the best of the two.
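The two response styles described above can be sketched as follows. This is a hedged reconstruction, not the actual code from the slide screenshots: the function names are illustrative, and Response B's off-by-one bug ("range(n+1)") is shown corrected to "range(n)".

```python
def has_duplicates_set(items):
    # Response A's style: a set holds only unique elements, so a length
    # mismatch reveals duplicates; building the set is O(n) on average,
    # with O(1) average-case membership handling internally.
    return len(set(items)) != len(items)


def has_duplicates_loops(items):
    # Response B's style: compare each element to every later element,
    # which is O(n^2). The correct outer bound is range(n), not the
    # slide's buggy range(n + 1).
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


print(has_duplicates_set([1, 2, 3, 2]))    # True
print(has_duplicates_loops([1, 2, 3]))     # False
```

Both return the same answers; the set-based version is the faster choice the good example argues for.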
Good Example (2/2)

In this solution:

- The answer is comprehensive, demonstrating that aspects of correctness and informativeness of both responses were addressed in the respective input boxes.
- The depth of analysis is commendable, as the two approaches were compared in terms of their time complexity.
- The most important differences between the two model responses were identified: comments, test case, the incorrect range in the outer FOR loop, and the difference between the two algorithms (set() vs FOR loops).
- The text is free of spelling/grammar mistakes, well-structured, coherent, and seamlessly articulates the most important findings of the comparison.
Bad Example

Solution:

Correctness: Consider the factuality, logical flow, contextual relevance, and
consistency.

Response A is good because set a fast function. Only use here 1 for and 1 if. Code is short.

Response B is also good because is return corrected result. FOR loops is correct.

Informativeness: Is the information provided relevant to the user’s request? Is the
information complete?

Response B. It has more comments.

Which one would you choose as the best possible model response? Explain why.

Response A is better than B because using set is the best solution.

In this solution:

- The answer is too short, lacking a comprehensive comparison of the two model responses.
- In "Informativeness", response A is not even discussed.
- The depth of analysis is too superficial.
- Some important differences are missing: comments, test case, the incorrect range in the outer FOR loop.
- Spelling/grammar mistakes are glaring.

This is not a good solution, even though model response A is indeed better than B, and set() is a better approach.
You’re good to start!

Go back to the assessment tab to begin.
