Carneros Transcription Guidelines - Updated 20210727
Project goal: The goal of this project is to transcribe audio files that will ultimately
help our client build state-of-the-art automatic speech recognition models.
The aim of this project is to accurately transcribe (i.e. type out or represent with pre-filled tags)
the speech presented to you in audio files. You will be using our online transcription platform
called "Ampersand". A separate guide is provided for using Ampersand.
Please read these guidelines in full and keep them handy when you start transcription. There
are a lot of things to remember, but you will find it gets easier once you have done a few
transcriptions. If anything is unclear, please contact your project supervisor. Good luck!
NOTE: All information provided in this document is confidential. Any publication,
provision, or dissemination of this content is strictly prohibited. Do not share or
post the contents on the internet.
General information
The purpose of this project is to transcribe all valid speech as
well as the non-speech sounds which occur at the same time as
speech.
Speech is anything which contains human language. In this project,
we transcribe speech even if it is not grammatically correct —
including:
Example
Foreground speech/noise
Your volume settings should be set so that the loudest speaker in the
utterance is at a comfortable volume. Foreground speech is any
speech which can be clearly understood at that volume, without
straining or repeated listening.
Speech and noises which are clearly quieter than this volume should
not be transcribed or tagged, even if they are audible and intelligible.
Utterance
An utterance is a single unit of transcription. Each utterance has its
own text input box and needs to be saved before a user can move on
to the next utterance. The breaks between utterances can generally be
ignored: they are only intended to break up the audio into easily
transcribable sections.
Batch
A batch of transcription work is a single, continuous audio file which is
further divided into pages and utterances.
Transcribing speech
Use standard US English spelling.
Example:
Correct Incorrect
traveled travelled
canceled cancelled
neighbor neighbour
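The spelling rule above can be sketched as a simple word lookup. This is only an illustrative sketch: the `UK_TO_US` table and the `suggest_us_spelling` function are hypothetical names, the table contains just the three pairs from the example, and words with punctuation attached are not handled.

```python
# Hypothetical lookup table: only the three pairs from the example above.
UK_TO_US = {
    "travelled": "traveled",
    "cancelled": "canceled",
    "neighbour": "neighbor",
}


def suggest_us_spelling(text: str) -> str:
    """Replace known British spellings with their US equivalents, word by word."""
    # Words that are not in the table are passed through unchanged.
    return " ".join(UK_TO_US.get(word, word) for word in text.split())


print(suggest_us_spelling("my neighbour cancelled the trip"))
```

A real check would need a much larger word list; the dictionary listed under Resources remains the reference for US spelling.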
Example
Speaker says '24' – use a hyphen
● 24 ==> TRANSCRIPTION: twenty-four
Speaker says '20' followed by '4' – do NOT use a hyphen
Example
One sound different
unintelligible
Also use this tag for word fragments and stutters.
Example
A speaker says a word you don't understand
We’d like to identify the points at which the speaker changes at the batch
level, using timestamps. This means you will mark the following points in
the audio:
- [Speaker]_start: Used when a new speaker begins in the audio, or when
the speaker changes.
- [Speaker]_end: Used when the speaker finishes speaking, either when the
batch is complete or before another person starts speaking.
If the speaker changes during the audio, the transcription looks like
this:
speech.
speech
Use when two or more foreground speakers talk at the same time at
more or less the same volume. Do NOT transcribe overlapping speech.
Instead, insert this tag in place of the overlapping words.
what happened?
speech
1. Use the event tag for each sung word that you
cannot understand (e.g. unintelligible singing, mumbling...) or for each
sung word in a foreign language (even if you can understand it). Use
Example:
A speaker starts a sentence in English and then says a word in
German but in a sing-song manner “kaaaartoffeeeeellll!”.
TRANSCRIPTION:
If there is more than one word sung in a sequence, please use one
singing tag for each word. Use your best judgement to determine
the number of sung words.
Example:
Someone starts rapping but you cannot understand the
words. You believe you can hear at least 5 words.
TRANSCRIPTION:
Example:
Someone is singing two words you don’t understand and a
few seconds later someone starts rapping at the same time.
TRANSCRIPTION:
Example:
The interviewer recites a poem in French and then the main
subject follows.
TRANSCRIPTION:
2. If you can understand the sung words (and they are in English), you
should write them down and highlight them with the span tag
Example:
Someone is reciting a poem.
TRANSCRIPTION: speech
speech
I love it!
/!\ Tips:
● Use the event tag for sung words that you cannot
understand.
● Use the multiple speaker event tags for singing just as you do for
spoken speech, including when the singer changes.
● Use punctuation in places where it falls naturally in songs,
singsong words, poems or sermons.
● If multiple people sing the same words at the same time, please
transcribe it as one speech.
● If multiple people sing different words at the same time (i.e.
different songs, out of sync, in a round), use the
tag.
● If singing and spoken speech occur at the same time and at a
similar volume, use the tag.
● If singing and spoken speech occur at the same time but one of
the two is in the background, only transcribe what is in the
foreground.
Foreign Speech
Example:
A speaker says a foreign word after “does” and you cannot
identify the foreign word
If there is more than one foreign word in sequence, use one foreign
tag for each word. Use your best judgement to determine the
number of foreign words.
Example:
Example:
A speaker says “denken Sie an die Kinder“ in the middle of a
sentence and you understand the words
/!\ Tips:
Numbers should be spelled out as full words in the way they were
said.
/!\ - For the number 0 (zero), if the speaker says it as the letter ‘O’, it
should be typed as ‘oh’. For example:
101 ⇒ TRANSCRIPTION: one oh one
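As a rough illustration of the "oh" rule, the sketch below spells a number digit by digit. The function name `spell_digits` and the `zero_as_oh` flag are hypothetical; the flag stands in for your own judgement of whether the speaker said "zero" or the letter "O". The sketch only covers digit-by-digit readings, not forms such as "one hundred and one".

```python
# Word for each digit when it is read out individually.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spell_digits(digits: str, zero_as_oh: bool = False) -> str:
    """Spell a digit string one digit at a time, as the speaker said it."""
    words = []
    for ch in digits:
        if ch not in DIGIT_WORDS:
            raise ValueError(f"not a digit: {ch!r}")
        # Type 'oh' only when the speaker pronounced zero as the letter O.
        if ch == "0" and zero_as_oh:
            words.append("oh")
        else:
            words.append(DIGIT_WORDS[ch])
    return " ".join(words)


print(spell_digits("101", zero_as_oh=True))  # one oh one
```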
Example
The number '2012' may be said in many different ways
Digits (e.g. 1 2 3 4 5 ...) can be used ONLY when they are joined to a
letter as part of a name without a space.
Example
● H2O ==> TRANSCRIPTION: H2O
● iPhone 6S ==> TRANSCRIPTION: iPhone 6S
● PS4 ==> TRANSCRIPTION: PS4
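One way to spot digits that are not joined to a letter is sketched below. The helper name `standalone_digit_tokens` is hypothetical, and the sketch assumes the transcription is plain text with words separated by spaces; it is a rough check, not a replacement for reading the transcription.

```python
import re


def standalone_digit_tokens(text: str) -> list:
    """Return tokens made only of digits, which should be spelled out instead."""
    flagged = []
    for token in text.split():
        # Ignore sentence punctuation attached to the token.
        core = token.strip(".,?!")
        # Tokens such as H2O, 6S or PS4 contain a letter and are not flagged.
        if re.fullmatch(r"[0-9]+", core):
            flagged.append(core)
    return flagged


print(standalone_digit_tokens("my iPhone 6S has 24 apps."))  # ['24']
```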
However
Example
Acronyms & Initialisms
● N.A.S.A or N A S A ==> NASA
● U.S.A. or U S A ==> USA
● A.M / P.M. ==> AM / PM
● FIFA
● UNESCO
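The acronym examples above can be approximated with a regular expression. This is a minimal sketch under stated assumptions: the name `join_acronyms` is hypothetical, only capital letters are handled, a period directly after the last letter is treated as part of the acronym (so a sentence-final period would need to be re-added by hand), and two unrelated single capital letters next to each other would be joined by mistake.

```python
import re

# Capital letters separated by periods ("N.A.S.A", "U.S.A.")
# or by single spaces ("N A S A").
SPELLED_OUT = re.compile(r"\b[A-Z](?:\.[A-Z])+\.?|\b[A-Z](?: [A-Z])+\b")


def join_acronyms(text: str) -> str:
    """Write spelled-out acronyms as one word, without periods or spaces."""
    return SPELLED_OUT.sub(lambda m: re.sub(r"[. ]", "", m.group(0)), text)


print(join_acronyms("she works at N.A.S.A now"))  # she works at NASA now
```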
Example
Hesitations and interjections
List of Hesitations/Interjections
Meaning               Acceptable Spelling
Agreement             hm, mm
Disagreement          huh, ah, oh, uh
Surprise              wow, oh, ah
Seeking Confirmation  eh, mhm, ehm
Disgust               bah, bleah
Example
Example
colloquial
Speaker's Pronunciation   Transcription   Full Form
rez                                       reservation
Example
b best guess
If you hear a word in the audio but you are not entirely sure how to
spell it or you are not entirely confident you are hearing the word
correctly, highlight the word using the best guess tag.
This tag might be needed if the speaker uses a proper name you are
unfamiliar with.
Please do not use the best guess tag if the speech is unintelligible
because the audio quality is poor, the speaker mumbles, etc. for these
Example:
● You hear “he told me to go to Wolengi” but you are not sure
what Wolengi is or how to spell it; you spell it as your best guess and
use the tag: Wolengi
Do NOT use this tag for words you can easily spell correctly by doing a
quick online search.
Examples:
/!\ Remember:
If you hear something in your language but cannot make out at all
Example
You hear some speech punctuated by a cough, followed by a
half second pause, and then a loud noise:
Even if there is more than half a second of no speech at one point, you
only need one no speech tag to represent that event.
no speech c
Example
There are 5 seconds of no speech. Then the speaker starts
talking. Then there is 1 second of no speech at the end.
Example
TRANSCRIPTION:
You must ignore all sounds if there is no speech in the entire
utterance.
s spk
Use for all sounds made by a foreground human which are not speech
(e.g. any sounds from the mouth or nose: breath, cough, lipsmack, and
laughing).
Example
Someone laughs in the middle of their sentence
some socks.
m music
Use for music (without lyrics) that does not overlap with foreground
speech. Singing from the foreground speaker should be tagged as
singing, not as music.
Only use this tag if:
- the volume is at or near the volume of the surrounding
foreground speech.
Example
A news broadcaster announces a news headline before a
music jingle that starts less than half a second after the last
word is pronounced:
You hear some music, then a pause without any sound for
more than half a second and then some speech. Music is
ignored as it’s too far from speech (more than half a second
away).
Use for any non-speaker noise that occurs at the same volume as
foreground speech.
Do not tag background noise that is at a lower volume than speech.
t truncation
Use when a word gets cut off at the end of an utterance because the
computer has not cut up the audio correctly. This is different from a
fragment (where the person stops talking part way through a word). In
a truncation, the recording has cut someone off while they were saying
When you hear a truncation at the end of an utterance and you can
transcribe the word with certainty, write out the truncated word in full
Example
The word 'probably' is split with "prob-" at the end of the first
utterance and "-ably" at the beginning of the second
utterance.
If you are unable to tell what the truncated word is at the end of an
Example
An unintelligible word is truncated.
UTTERANCE 1: we bought a
Punctuation
A sentence is a grammatically complete unit. A sentence will usually, but not
always, contain a subject (e.g. "the cat") and a verb (e.g. "sat"). Examples of
grammatically complete sentences which do not have a subject and verb
include answers to questions (e.g. "yes." and "no.") and exclamations ("what!"
and "really?").
Example
At the end of each sentence, use either a period (.) for statements, a question
mark (?) for questions, or an exclamation mark (!) for exclamations. Do not
use punctuation combinations ("?!", "!!!", "..."). Do not use hyphens or
quotation marks to indicate quoted or mentioned speech. No other punctuation
(such as : ;) should be used.
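A rough self-check for these punctuation rules might look like the sketch below. The function name `punctuation_problems` and the wording of the messages are hypothetical, and the sketch assumes the text is plain typed words with any tags already removed. It does not check hyphens, since hyphens are still valid in numbers such as "twenty-four".

```python
import re
from typing import List


def punctuation_problems(text: str) -> List[str]:
    """List the punctuation rules above that a transcription line appears to break."""
    problems = []
    # Two or more sentence-final marks in a row: "?!", "!!!", "...".
    if re.search(r"[.?!]{2,}", text):
        problems.append("punctuation combination")
    # Colons and semicolons are not allowed.
    if re.search(r"[:;]", text):
        problems.append("colon or semicolon")
    # Straight or curly double quotation marks around quoted speech.
    if re.search(r'["“”]', text):
        problems.append("quotation mark")
    return problems


print(punctuation_problems("what?!"))  # ['punctuation combination']
print(punctuation_problems("yes."))    # []
```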
Only place punctuation at the end of an utterance if the end of the utterance is
also the end of a sentence. If the speaker continues the same sentence into the
next utterance, put the punctuation wherever it naturally falls in the speech.
See the description of an utterance.
Examples:
● TRANSCRIPTION:
UTT1: win this year! what do you think
UTT2: about the Knicks? they seem to have finally
See the "incomplete" tag section below for instructions about sentence
fragments which are not grammatically complete.
incomplete
Insert the incomplete tag when a foreground speaker begins a sentence and is
either (a) interrupted by a new speaker, or (b) begins a new sentence before
the first grammatically complete sentence is finished.
The tag should not be used to indicate that a sentence is continuing into a
second utterance.
Examples
You do not need to use the incomplete tag when the speaker restarts or
repeats a single word.
Commas
Use commas (,) in two situations only:
● For lists of items ("I ate two apples, three oranges, and a banana.") and
sequences of adjectives ("he was a big, red haired, evil man.")
● For introductory phrases ("so I was thinking, how do you do it?", "at the
end of the day, what matters is your health.").
When unsure whether to use a comma, err on the side of not using one.
Resources
● English Punctuation Rules
● Capitalization in English
● Merriam Webster Dictionary