Bee SFT Coding Single Turn | Attempter Instructions
CONFIDENTIAL INFORMATION

🐝 Bee SFT Single Turn - Attempter Instructions 🐝


This document contains confidential and proprietary information that is intended solely for the use of the individual or entity to whom
it is disclosed.

Updated Sep 9, 2024

🚨IMPORTANT🚨
You must complete the task IN YOUR LANGUAGE (e.g., Spanish, German, French). Both the prompt and the response should be in the same language, i.e., your language. Always check the task instructions; this example, for instance, is in Spanish:

Table of Contents
🔭 Goal
🧑‍💻Structured Data Generation
📐Data Format
✅ Good Example
🛑 Bad Example
🔑 Rubric
❓FAQ
🔭 Goal
Your mission: beat the state-of-the-art model (SOTA) by providing your own prompt(s) and response(s) that are:
- Following the user's instruction
- Concise
- Truthful
- Harmless
- Satisfying
These are all explained in the Rubric section below.
Single Turn: In this project, you'll be asked to write both the initial prompt and the response to that prompt. For each task, this is the workflow:
- Step 1: Write prompt.
- Step 2: Write response.
- Step 3: Click submit.

🧑‍💻Structured Data Generation


What is Structured Data Generation?
Converting different forms of structured data from one form to another. Structured data involves structures such as:
- Lists
- Tables
- CSV
- JSON

🚨IMPORTANT🚨
DO NOT WRITE CODE BLOCKS! You do not have to write code to convert from one form to the other; you just convert the data.

|  | Good | Bad |
| :-- | :-- | :-- |
| Prompt | Make this CSV a JSON. [CSV here] | Make a python script that converts this CSV to a JSON [CSV here] |
| Response | [JSON here] | [Python script here] |
It is important to match the tone of the prompt. Essentially, there are 2 ways to write tasks:
- Descriptions: The user provides one or more data examples, and the assistant should stick with the format and any additional requirements.
  - Example Prompt: "I am a teacher and I need a CSV that…"
  - Example Response: "Here is the CSV!"
  - DO NOT just give the data → EXPLAIN!
- Direct Examples: The user requirements are very specific.
  - Example Prompt: "Generate the JSON for the following data ..., only show me the JSON object."
  - Example Response: Only output the JSON object.
  - DO NOT SAY: "Sure! Here's the JSON: ..." → BE DIRECT

📐Data Format
- Code: Include inline code and code blocks in standard Markdown format.
  - Inline code should be enclosed by ` (a single backtick), such as `code`.
  - Code blocks should be enclosed by ``` (triple backticks) with the language name whenever it is available.

- DO NOT USE any of the following libraries, as they are incompatible across different systems: SwiftUI, UIKit, AppKit, XCTest, Combine, ARKit, CoreML, SceneKit, Metal, CoreData

Helpful Tools
- JSON Tool → https://jsonformatter.org/markdown-formatter
- CSV Tool → https://www.convertcsv.com/csv-to-markdown.htm
- Table Tool → https://www.tablesgenerator.com/markdown_tables

- Lists: Displayed in either a numbered list OR a bulleted list.
- Tables: Should be in markdown format.
- Headers: Should be used to organize content or establish a hierarchy of information.
- Highlights: Only bold text should be used for highlighting, no italics or underlining.
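Putting these conventions together, a response fragment might look like the following (the function call and table contents are illustrative, not from a real task):

````markdown
You can sort the list with Python's built-in `sorted()` function:

```python
numbers = sorted([3, 1, 2])
```

| **Name** | **Score** |
| :------- | :-------- |
| Ana      | **97**    |
| Luis     | 88        |
````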
✅ Good Example
⚠️ Examples should not be copied for a task. ⚠️
In general, a good prompt should have:
- Real Life User Request
- Clear Asks
  - Note: Do not be vague here and say things like "analyze"! Ask for specifics: what values you want (average, max values, list all values), what columns you need, what the format should be.
- Examples of Data
  - Note: You do not have to use code. You can use lists, tables, or CSV. If you do use code, please make sure you're confident in it and it compiles.
And a good response should have:
- LLM-like Structure
- Consideration for the User
- Organization and Readability
- Completeness
Below are 2 examples of a prompt and response.


Example 1: Description Example
Prompt: (Translated from Spanish for readability, all of the prompt should be in target language)
Original: Necesito crear un formato CSV para llevar un registro de los ingresos de mi esposo y
los míos durante el año. Imprima la fila del encabezado y las 5 filas de datos de muestra para
que pueda ver cómo usarlo. Mi nombre es Sue y mi esposo es Jeremy. Incluya una columna en
algún lugar para anotar la retención de impuestos federales y estatales para que pueda calcular
mis pagos estimados cada trimestre.
Translated for readability: I need to come up with a CSV format to keep track of earnings
throughout the year for my husband and myself. Please print the header row and 5 rows of
sample data so that I can see how to use it. My name is Sue, and my husband is Jeremy.
Please include a column somewhere to note federal and state tax withholding so I can calculate
my estimated payments each quarter.
What makes this a good prompt?
- Real Life User Request
  - You know the user and their situation.
  - It is like a real-life problem.
- Clear Asks
  - There is a direct ask and the format you need (CSV).
  - There is a detailed description of how the solution should look.
- Examples
  - There is a clear description of the data the user has.
  - There is a clear output desired (CSV).
Response: (Translated from Spanish for readability; all of the response should be in the target language)

Original: Aquí se sugiere un formato CSV para realizar un seguimiento de las ganancias de usted y su esposo durante el año:

```csv
Fecha,Sue - Trabajo de tiempo completo,Sue - Trabajo secundario 1,Sue - Trabajo secundario 2,Jeremy - Pequeña empresa,Jeremy - Trabajo secundario,Retención de impuestos federales - Sue,Retención de impuestos estatales - Sue,Retención de impuestos federales - Jeremy,Retención de impuestos estatales - Jeremy,Ganancias totales - Sue,Ganancias totales - Jeremy
2023-01-02,$1000,$200,$300,$800,$150,$100,$50,$80,$50,$1500,$1000
2023-01-09,$1200,$180,$250,$700,$120,$120,$60,$70,$40,$1630,$860
2023-01-16,$950,$160,$280,$650,$180,$95,$45,$85,$50,$1390,$880
2023-01-23,$1100,$220,$350,$900,$100,$130,$50,$100,$50,$1770,$1050
2023-01-30,$800,$140,$200,$750,$130,$80,$40,$90,$40,$1140,$920
```

Este archivo CSV rastrea las ganancias de `Sue` y `Jeremy` en varias fuentes de ingresos, incluidas `Sue - Trabajo de tiempo completo` y dos trabajos secundarios, así como `Jeremy - Pequeña empresa` y un trabajo secundario. También incluye columnas separadas para `Retención de impuestos federales` y `Retención de impuestos estatales` tanto para `Sue` como para `Jeremy`. Las columnas `Ganancias totales` resumen sus ingresos brutos antes de impuestos para cada fecha respectiva, lo que permite una visión clara de sus ganancias totales por período.

Translated for readability: Here's a suggested CSV format to track earnings for you and your husband throughout the year:

```csv
Date,Sue - Full-time Job,Sue - Side Gig 1,Sue - Side Gig 2,Jeremy - Small Business,Jeremy - Side Gig,Federal Tax Withholding - Sue,State Tax Withholding - Sue,Federal Tax Withholding - Jeremy,State Tax Withholding - Jeremy,Total Earnings - Sue,Total Earnings - Jeremy
2023-01-02,$1000,$200,$300,$800,$150,$100,$50,$80,$50,$1500,$1000
2023-01-09,$1200,$180,$250,$700,$120,$120,$60,$70,$40,$1630,$860
2023-01-16,$950,$160,$280,$650,$180,$95,$45,$85,$50,$1390,$880
2023-01-23,$1100,$220,$350,$900,$100,$130,$50,$100,$50,$1770,$1050
2023-01-30,$800,$140,$200,$750,$130,$80,$40,$90,$40,$1140,$920
```

This CSV file tracks earnings for `Sue` and `Jeremy` across various income sources, including `Sue - Full-time Job` and two side gigs, as well as `Jeremy - Small Business` and a side gig. It also includes separate columns for `Federal Tax Withholding` and `State Tax Withholding` for both `Sue` and `Jeremy`. The `Total Earnings` columns summarize their gross income before taxes for each respective date, allowing for a clear view of their total earnings per period.
What makes this a good response?
- Become the LLM
  - Similar to a well-articulated response that a high-quality LLM could output.
- Consideration for the User
  - The text is directed to the user and mentions key parts of the prompt.
- Organization and Readability
  - The data is organized, with the code section first and then an explanation of the key components.
  - There are no grammar or syntax errors and the response is correctly formatted. This is likely spell/grammar checked.
- Completeness
  - All parts of the question are directly answered, specifically the formatting of the output file just as the user asks.
  - There is the data component and also an explanation.
Example 2: Direct Example
Prompt: I want ideas for spring and summer painting motifs for my large scale canvas
paintings. I want the ideas in three tables, each table with the following themes: Botanical
Delights, Tranquil Scenes, Abstract Expressions.
What makes this a good prompt?
- Real Life User Request
  - You know the user and their situation.
  - It is like a real-life problem.
- Clear Asks
  - There is a direct ask and the format you need (table).
- Examples
  - There is a clear description of the ideas the user has.
  - There is a clear output desired (table).
Response:

**Table 1: Botanical Delights**

| **Theme** | **Spring Motifs** | **Summer Motifs** |
| :-------- | :---------------- | :---------------- |
| **Flowers** | Tulips | Sunflowers |
| | Peonies | Zinnias |
| | Irises | Marigolds |
| **Leaves** | Maple Leaves | Olive Leaves |
| | Fresh Greenery | Agave Leaves |
| **Fruits** | Strawberries | Peaches |
| | Raspberries | Watermelon |

**Table 2: Tranquil Scenes**

| **Theme** | **Spring Motifs** | **Summer Motifs** |
| :-------- | :---------------- | :---------------- |
| **Water** | Calm River Reflection | Seaside Sunset |
| | Pond with Frogs | Swimming Pool |
| **Sunset** | Soft Pink and Orange Hues | Golden Hour at the Beach |
| **Nature's Dance** | Birds in Flight | Fireflies at Dusk |
| **Rustic Charm** | Spring Cottage | Summer Farmhouse |
| **Whimsy** | Daisies in a Field | Cloud-Studded Sky |

**Table 3: Abstract Expressions**

| **Theme** | **Spring Motifs** | **Summer Motifs** |
| :-------- | :---------------- | :---------------- |
| **Warm Colors** | Splashes of Yellow and Orange | Tangerine and Peach Hues |
| **Abstract Flowers** | Bold Petal Patterns | Petal Whorls |
| **Cool Colors** | Pastel Blue and Green | Ice Cream Cone Swirls |
| **Geometric Shapes** | Spring Vines and Leaves | Summertime Sunbeams |
| **Nature's Elements** | Watercolor Raindrops | Sandcastle Waves |
What makes this a good response?
- Become the LLM
  - Similar to a well-articulated response that a high-quality LLM could output.
- Consideration for the User
  - The text is directed to the user and doesn't explain more than necessary, because all the user wants is the table.
- Organization and Readability
  - The table is organized in the order the user requested.
  - There are no grammar or syntax errors and the response is correctly formatted. This is likely spell/grammar checked.
- Completeness
  - All parts of the question are directly answered, specifically the formatting of the output just as the user asks.
More examples here.
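One habit worth building from the examples above: if your prompt or response includes sample data, machine-check it before submitting. A quick sketch using only Python's standard library, run here against an abbreviated copy of the Example 1 CSV (2 of the 5 data rows):

```python
import csv
import io

# Abbreviated copy of the sample CSV from Example 1.
sample = """\
Date,Sue - Full-time Job,Sue - Side Gig 1,Sue - Side Gig 2,Jeremy - Small Business,Jeremy - Side Gig,Federal Tax Withholding - Sue,State Tax Withholding - Sue,Federal Tax Withholding - Jeremy,State Tax Withholding - Jeremy,Total Earnings - Sue,Total Earnings - Jeremy
2023-01-02,$1000,$200,$300,$800,$150,$100,$50,$80,$50,$1500,$1000
2023-01-09,$1200,$180,$250,$700,$120,$120,$60,$70,$40,$1630,$860
"""

rows = list(csv.reader(io.StringIO(sample)))
header, data = rows[0], rows[1:]

# Every data row should have exactly as many fields as the header.
assert all(len(row) == len(header) for row in data)
print(len(header))  # 12
```

If the assertion fails, a row is missing a value or has an extra comma, which is exactly the kind of defect that lowers a task's score.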

🛑 Bad Example
Common mistakes to avoid:
- Lack of Context in Prompts. Ensure that each prompt includes a relevant context or real-life situation.
- Lack of clear ask. Please ensure your prompt has a clear ask. After outlining the problem and context, ask a direct, clear question.
- Broken code. IF YOU USE CODE, please make sure the code runs successfully without any errors. Please use your own IDE or online compilers to test the code.
- Incomplete Responses. Your response must address every aspect of the prompt. Missing any part of the request will cause lower scores.
- Formatting and Conciseness. Note the format of your response. Ensure it is clear, concise, and follows any specified formatting guidelines.
- Alignment with SxS Scores. Make sure your evaluations align with the side-by-side (SxS) scores. Consistency is key.
- Objective Scoring. Be objective when assigning scores. Avoid giving higher scores than what the response realistically deserves.
- Avoid Over-Scoring. A score of 7 should be extremely rare. Reserve this for cases where the State-of-the-Art (SOTA) response fails in every aspect, which almost never happens.
- Invalid prompt for Structured Data Generation. You are not asking a question that involves managing and generating structured data. Instead you use only one data type (i.e., use this JSON to make another JSON).
- Missing markdown format for the programming language. Please ensure you are using the correct markdown for the programming language used in the prompt (if code is present in the prompt) and in the response.
- Using prohibited libraries. Please do not use SwiftUI, UIKit, AppKit, XCTest, Combine, ARKit, CoreML, SceneKit, Metal, CoreData.
- Not mentioning the programming language in your prompts. Not including the programming language in your prompt may lead to SOTA responses generated in another language, which makes it difficult to evaluate the better response when comparing your response with the SOTA response. Such tasks will be rejected.
- Using complete system design problems in the prompts. Such prompts are not helpful in training the model. We are looking for more logical real-life problems which can be reviewed in under 30 mins.

Below is a bad example of a prompt and response.


Prompt: Write a Python function format to help a car lot keep inventory.
What makes this a bad prompt?
- NO Clear Asks
  - No information on what they need the function to do or what it will be used for.
- NO Examples
  - No information on the format, data type, or structure of the input.


Response: Purchase price, delivery cost, and time of purchase.

```json
{
  "cars": [
    {
      "stockNumber": "string",
      "make": "string",
      "model": "string",
      "trim": "string",
      "year": "number",
      "mileage": "number",
      "exteriorColor": "string",
      "interiorColor": "string",
      "transmission": "string",
      "engine": "string",
      "drivetrain": "string",
      "fuelType": "string",
      "VIN": "string",
      "purchasePrice": "number",
      "deliveryCost": "number",
      "purchaseDateTime": "string",
      "images": ["string"],
      "features": ["string"],
      "description": "string",
      "status": "string"
    }
  ]
}
```

Here is a suggested JSON structure to help a car lot keep inventory of its cars.

What makes this a bad response?
- NO Consideration for the User
  - The user asks for a Python script but this is a JSON.
- NO Organization and Readability
  - Because there is no context/personal perspective, the explanation is also uncreative.
  - There is no example of an application in the user's life/scenario.

🔑 Rubric
In addition to a prompt having:
​ Real Life User Request
​ Clear Asks
​ Examples of Data
And a response having:
​ LLM-like Structure
​ Consideration for the User
​ Organization and Readability
​ Completeness

Please focus on the below categories in order to beat the state-of-the-art model (SOTA):
- Instruction Following
- Conciseness
- Truthfulness
- Harmlessness
- Satisfaction

Each of these is explained below.


​ INSTRUCTION FOLLOWING
Definition
All responses must follow the prompt
precisely and answer all
questions/requests in the prompt.

Requirements

Follow all of the requirements in the prompt (e.g., language, word count, specific formatting, specific wording, tone, style).
- Use consistent formatting and proper markdown
- Meet paragraph or sentence length specifications
- Meet word or character count specifications
  - Case A: if they are easy-to-meet numbers (e.g., 10 words)
  - Case B: if the request is for over/under a specific number (e.g., under 140 characters)
  - Case C: within a +/- 10% buffer if it is a request with larger numbers (e.g., 400 words)
  - If the prompt requests an extremely long response (e.g., 10,000 words), provide either (1) an outline of the proposed response or (2) a long response (~500 words) with an option for the user to continue
Scoring

1-2 (Terrible)
- Language: The prompt or response is not in the indicated language, dialect, or spelling convention, or is only partially in the indicated language with major errors that make it hard to understand.
- Length: The response significantly deviates from the length instructions ('500 words' or '2 sentences').
- Role/Context: The prompt is unclear about the role or context expected from the response, or the response does not follow role or context instructions.
- Tone: The response contradicts the tone requested in the prompt or takes an inappropriate tone of voice for the context.

3 (Not Passing)
- Language: The prompt or response is in the indicated language, dialect, or spelling convention, but has some spelling, grammar, or phrasing errors.
- Length: The response partially follows the length instructions.
- Role/Context: The response is mostly clear on the role or context, and mostly follows contextual instructions.
- Tone: The response generally follows the tone requested in the prompt, with only minor errors.

4-5 (Excellent)
- Language: The prompt or response is in the indicated language, dialect, or spelling convention with no errors or only minor errors.
- Length: The response exactly or nearly follows the length requirements.
- Role/Context: The response perfectly adheres to the prompt's context or role.
- Tone: The response perfectly adheres to the requested tone, with virtually no errors or breaks.
​ CONCISENESS (Not too many words)
Definition

Prompts/responses must be
written with essential and relevant
details, removing unnecessary
details, fluff, or pleasantries.

Requirements
Prompts/responses SHOULD:
- Be to the point, concise, and answer the request in an easily digestible manner
- Be conversational and natural in tone
- Contain a limited level of detail and nice-to-have explanations
- Be free of redundant, irrelevant information
Prompts/responses SHOULD NOT:
- Be verbose, provide extraneous information, or over-explain concepts when the prompt does not request it
- Include "fluff" or pleasantries (e.g., "Here's your request," "Sure, I can help with that," "Below is a blog with 100 words")

Scoring

1-2 (Terrible)
- Length: The response exceeds the word limits directly requested by the prompt.
- Verbosity: Multiple sentences explain concepts that are possible within a sentence or two; the response communicates the same ideas in slightly different ways several times.
- Focus: There is no central theme or message to the response; there is significant irrelevant or distracting information in the response.
- Tone: Inappropriate, abrupt, or otherwise unpleasant tone.

3 (Not Passing)
- Length: The response is close to the length specified explicitly by the prompt.
- Verbosity: The response may still have some overly verbose sentences or claims; it communicates the same ideas in slightly different ways once or twice.
- Focus: There is an overarching theme, but it is not followed in the response; a few more nice-to-haves than needed.
- Tone: Conversational and engaging tone.

4-5 (Excellent)
- Length: The response fits the length specified explicitly by the prompt.
- Verbosity: The response efficiently and consistently communicates its concepts and points; it does not unnecessarily repeat information, implicitly or explicitly.
- Focus: There is a clear theme that unites the bullets or details of the response; details are relevant or provide satisfying illustrations of the core theme.
- Tone: Conversational and engaging tone.
​ TRUTHFULNESS
Definition

A truthful response contains


accurate information, without any
false facts or misleading
statements.

Requirements
- All of the facts inside the response (such as definitions, numbers, dates, etc.) are completely accurate (please conduct online research to make sure).
- In cases where the user asks the Chat Assistant to summarize or rewrite a text segment, the response does NOT make up or mention details that were not part of the original text.
- If the user asks a question that assumes that a particular false claim is true, your response should assert that the false claim is NOT true, rather than leaving room for ambiguity.
  - For example, if the prompt is "What year did Albert Einstein invent the Internet?":
    - A truthful response would say "Albert Einstein did not invent the Internet."
    - An untruthful response to this prompt could be something like "It's not clear exactly when Albert Einstein invented the internet."
- One important exception: if the prompt explicitly asks for fictional writing, such as writing a story for children, then your response doesn't need to be truthful.

Scoring

1-2 (Terrible)
- Factual Claims: Few facts in the response are accurate; most claims in the response are not accurate.
- Non-verifiable Information: Any subjective points do not reflect domain expertise or adhere to non-expert opinions; the response may also claim factuality where it's subjective.
- Refuting False Claims: The response does not at all correct false claims.

3 (Not Passing)
- Factual Claims: Most facts in the response are completely accurate, confirmed by online research where necessary.
- Non-verifiable Information: Subjective points mostly reflect majority opinion or domain experts' opinions; where this doesn't exist, the response indicates there is no answer.
- Refuting False Claims: The response only partly corrects false claims in the prompt.

4-5 (Excellent)
- Factual Claims: All facts in the response are completely accurate, confirmed by online research where necessary.
- Non-verifiable Information: Subjective points reflect the majority opinion or domain experts' opinions; where this doesn't exist, the response indicates there is no answer.
- Refuting False Claims: The response refutes any false claim in the prompt itself.
​ HARMLESSNESS
Definition

A harmless response does


NOT damage individuals or
groups of people, their privacy
or reputation, their home or
belongings, intellectual or
physical property, animals, the
environment, institutions, or
resources valuable to humanity
or to the environment. This
holds even if the prompt is
inappropriate, and even if the
prompt is directly asking the
Chat Assistant to be harmful in
any way.
Requirements

Prompts/responses SHOULD:
- Be respectful, considerate, and kind
Prompts/responses SHOULD NOT:
- Promote hatred, violence, marginalization, stereotypes, slurs, or slandering of any individuals or groups of people
  - This includes groups of people based on race, color, national origin, religion, sex, gender identity, sexual orientation, age, and disability.
- Promote substance abuse or illegal activity.
- Contain violent or sexual content.
- Contain moral judgment or opinionated content.
- Contain PII (personally identifiable information)
If the prompt is inappropriate or is asking the Chat Assistant to be harmful in any way, your response should politely turn down the user's request and explain that a Chat Assistant is not allowed to provide any inappropriate information.

Scoring

1-2 (Terrible): ANY harmful or damaging content:
- Hate speech, slurs
- Violent or sexual content
- Personally identifying information
- Moral judgment or opinionated content
- Promotion of substance abuse or illegal activity

3 (Not Passing): None given. No middle ground.

4-5 (Excellent): Is kind and considerate and contains NO harmful or damaging content :)
​ SATISFACTION
Definition

A satisfying prompt or
response fits like a glove. It’s
engaging, it’s human, it’s
correct, and it’s just right. To
assess this, look for prompts
and responses that fit all or
most of the rubric dimensions,
and completely deliver the
intended format and effect of
the prompt.

Requirements
- Well-written in the correct language
- Free from spelling or grammar mistakes
- Creative and a little different (not mandatory to be a 5 per se, but important to not be bland)
- Delivers everything the user asks for
Scoring

1-2 (Terrible): The response fails the majority of the quality rubric dimensions and needs to be rewritten.
- Incorrect language: in a foreign language, or written so poorly the meaning can't be interpreted
- Spelling and grammar: significant and/or distracting mistakes
- Doesn't 'fit': doesn't fit the intent of the prompt

3 (Not Passing): The response fails some aspects of the rubric but could be fixed in less than 30 minutes.
- Correct language: but may be a little awkward or unclear
- Spelling and grammar: minor mistakes
- Reasonable 'fit': may not fully fit the intent of the prompt
ear
4-5 (Excellent): Meets every aspect of the quality dimensions. Perfect, or could be fixed in less than 2 minutes.
- Correct language: with few or no mistakes
- Spelling and grammar: one or two minor blemishes OK
- Good 'fit': fits the prompt's tone and intention
- Creative: doesn't read or feel like a basic LLM response

❓FAQ
Q. Do I have to include code?

A: NO! You can use any data form (list, table). If you do write code, please make sure it is written correctly and compiles without errors.
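For JSON specifically, a clean parse with the standard library is a quick way to confirm a snippet is well-formed before pasting it into a task (the snippet below is hypothetical):

```python
import json

snippet = '{"cars": [{"stockNumber": "A123", "year": 2021}]}'

# json.loads raises an error on malformed input,
# so a clean parse doubles as a structural sanity check.
data = json.loads(snippet)
print(data["cars"][0]["year"])  # 2021
```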

Q. How deep should the code be in problem reflection (pseudocode vs. very specific
implementation)?
A: The depth should be sufficient to demonstrate a clear understanding and solid solution, but it
can vary depending on the time available. Ensure it is well-done and effectively addresses the
prompt.

Q: Is it an issue if a dataset script is generated with the help of an LLM?

A: DO NOT COMPARE YOUR RESPONSE WITH OTHER LLMs OR CHATBOTS. It's strictly prohibited. To get a sense of the quality of your tasks, use your own criteria and personal experience. Your task should be excellent on its own, without any comparison.

Q. Is there a limit on skips?

A: No for now. (Last Updated: Jul 3, 2024)

Q. Is there a time limit?

A: Yes, task attempts have a limited time of 45 min. L0 Reviewers have a limit of 30 min. L4
Reviewers have a limit of 20 min.

Q. What level of detail should code comments have?

A: Comments should be very concise and aligned with the instructions, avoiding verbosity and
irrelevant information.

Q. What is the difference between code generation and problem reflection?

A: Problem Reflection: is more oriented towards seeking help to solve a problem by providing
the necessary context. It is required to provide at least two approaches to the problem, each of
which should be thoroughly explained.
Code Generation: focuses more on providing the solution code addressing the requirements.

Q. Where should I ask questions?

A: For general questions that can help other contributors, post your questions in the dedicated
Slack channel. If it's a specific question, please contact your QM. This way, we can optimize
communication to specific issues.

Q. What does Structured Data Generation refer to?


A: The assistant is expected to generate different forms of structured data such as JSON, short
code pieces, lists, and tables. Instructions can be either natural language descriptions of the
expected output or examples demonstrating it.
​ For example:
Detailed requests: Specify very detailed requirements, including generating partial data
or outputting data only without extra text. For example, if a prompt asks for a JSON
object, the response should contain the JSON object.

Q. In the prompt, do we describe the data? Provide 4-5 lines of example? Should it be
code formatted?

A: It is up to the tasker. It is recommended to generate it, either in the prompt or in the response
for better understanding.

Q. What’s the complexity threshold?

A: You can exploit the weak points of LLMs (code structure, adding or removing comments,
latest library releases, 3D animation or graphics that other LLMs cannot produce, tricky words).
