Carneros Transcription Guidelines - Updated 20210727


NOTE: All information provided in this document is confidential. Any publication,
provision, or dissemination of this content is strictly prohibited. Do not share or
post the contents on the internet.

Carneros Transcription Guidelines


Introduction

Project goal: The goal of this project is to transcribe audio files that will ultimately
help our client build state-of-the-art automatic speech recognition models.

The aim of this project is to accurately transcribe (i.e. type out or represent with pre-filled tags)
the speech presented to you in audio files. You will be using our online transcription platform
called "Ampersand". A separate guide is provided for using Ampersand.

Please read these guidelines in full and keep them handy when you start transcription. There
are a lot of things to remember, but you will find it gets easier once you have done a few
transcriptions. If anything is unclear, please contact your project supervisor. Good luck!


General information
Speech, non-speech noise, and no-speech
The purpose of this project is to transcribe all valid speech as
well as the non-speech sounds which occur at the same time as
speech.
Speech is anything which contains human language. In this project,
we transcribe speech even if it is not grammatically correct,
including:

● hesitations ("um", "er"),
● colloquial words ("gonna", "wassup"), and
● repeated words ("they they was gonna be there.").

Example

● TRANSCRIPTION: warm colors like uh red, orange and uh yellow. I seen
● TRANSCRIPTION: in my opinion, the Cavs are the best team. they're gonna

Most speech is represented by words and characters. Some speech,
however, is unintelligible or overlaps with other speech from a
different speaker. This speech should be represented with pre-filled
tags.
Non-speech sounds which occur during speech also need to be
tagged. If non-speech sounds such as music, laughter, coughing,
clicks, and bangs occur within half a second of speech, these sounds
should be tagged.
If an entire utterance doesn't contain any speech (words), then the
sounds that occur in this utterance should not be tagged. Instead, use
the no speech tag and move on.

If the utterance contains speech, then insert the no speech tag
wherever a pause longer than half a second occurs.
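
The pause rule above can be sketched in code. This is only an illustration under stated assumptions, not something Ampersand provides: the `Word` record, its millisecond timing fields, and the `[no speech]` text form of the tag are hypothetical. The sketch follows the wording "longer than half a second"; change the comparison if your supervisor wants a pause of exactly half a second tagged as well.

```python
from dataclasses import dataclass

PAUSE_THRESHOLD_MS = 500  # half a second, as stated in the guideline


@dataclass
class Word:
    text: str
    start_ms: int  # hypothetical timing fields, not an Ampersand export format
    end_ms: int


def insert_no_speech(words, utterance_start_ms, utterance_end_ms):
    """Return transcript text with [no speech] wherever a pause exceeds the threshold."""
    if not words:
        # An utterance without any speech gets the single tag and nothing else.
        return "[no speech]"
    parts = []
    cursor = utterance_start_ms
    for word in words:
        if word.start_ms - cursor > PAUSE_THRESHOLD_MS:
            parts.append("[no speech]")
        parts.append(word.text)
        cursor = word.end_ms
    if utterance_end_ms - cursor > PAUSE_THRESHOLD_MS:
        parts.append("[no speech]")
    return " ".join(parts)


example = [
    Word("I", 5000, 5100),
    Word("never", 5150, 5400),
    Word("heard", 5450, 5700),
    Word("of", 5720, 5800),
    Word("him.", 5820, 6100),
]
print(insert_no_speech(example, 0, 7100))
```

One tag is emitted per pause regardless of how long the pause lasts, which matches the rule that a single no speech tag represents one silent stretch.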


Foreground speech/noise
Your volume settings should be set so that the loudest speaker in the
utterance is at a comfortable volume. Foreground speech is any
speech which can be clearly understood at that volume, without
straining or repeated listening.
Speech and noises which are clearly quieter than this volume should
not be transcribed or tagged, even if they are audible and intelligible.

Utterance
An utterance is a single unit of transcription. Each utterance has its
own text input box and needs to be saved before a user can move on
to the next utterance. The breaks between utterances can generally be
ignored: they are only intended to break up the audio into easily
transcribable sections.

Batch
A batch of transcription work is a single, continuous audio file which is
further divided into pages and utterances.


Transcribing speech
Spelling
Use standard US English spelling.
Example:
Correct      Incorrect
traveled     travelled
canceled     cancelled
neighbor     neighbour

Use standard contractions ("I'm", "could've", "let's" but not "tryna" or
"'em") if this is how a word is pronounced in the audio. Also use
possessive apostrophes where necessary, e.g. "Mike's job", "both kids'
toys".

Hyphens should be used in compound words or expressions, especially
if the hyphen will change the meaning. In numbers, hyphens are
sometimes required to distinguish large numbers from sequences of
smaller numbers.

Example
Speaker says '24' – use a hyphen
● 24 ==> TRANSCRIPTION: twenty-four
Speaker says '20' followed by '4' – do NOT use a hyphen
● 20 4 ==> TRANSCRIPTION: twenty four

So-called expressions should use hyphens.
● TRANSCRIPTION: her mother-in-law and her so-called genius son
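
The hyphen rule for numbers can be illustrated with a short sketch. The helper names `spell_under_100` and `spell_sequence` are made up for this example, and it only covers 0 to 99. It spells what was said as separate numbers, so a spoken "oh" for zero would still need to be typed by hand.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}


def spell_under_100(n):
    """Spell a single number from 0 to 99, hyphenating compounds like twenty-four."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0 to 99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def spell_sequence(numbers):
    """Spell numbers that were said one after another, separated by spaces only."""
    return " ".join(spell_under_100(n) for n in numbers)


print(spell_sequence([24]))     # the speaker said one number
print(spell_sequence([20, 4]))  # the speaker said two numbers in a row
```

The hyphen only ever appears inside a single spoken number, so the transcript shows whether the speaker said one number or two.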

Acceptable non-standard spellings
If a pronunciation is only one sound different from its conventional
spelling, please use the conventional spelling. If the spoken form
differs by more than one sound, and there is a commonly-used English
non-standard spelling, please use that spelling.

Example
One sound different

● bruh ==> TRANSCRIPTION: bro
● K ==> TRANSCRIPTION: okay
● walkin', talkin', seein' ==> TRANSCRIPTION: walking, talking, seeing

More than one sound different

● wanna, gonna ==> TRANSCRIPTION: wanna, gonna
● c'mon, cuz, dunno, gimme ==> TRANSCRIPTION: c'mon, cuz, dunno, gimme

Capital letters
Use English capitalization rules with one exception: do not use a
capital letter if the only reason to do so is that the word is at
the start of a sentence.
Most person names ("Barack Obama"), location names ("Golden Gate
Bridge", "Russia"), products, and brand names ("Five Guys",
"YouTube") should be capitalized.

/!\ We do use a capital letter if the word would be capitalized regardless
of its place within the sentence, e.g. "I", names of days, names of
months, and proper nouns (brand names, people's names, etc.).

unintelligible (Shortcut l)
Use the unintelligible tag as a placeholder for a word, or several words,
that cannot be understood because there is interference, an audio problem, or
because the person is not talking clearly.
Enter this tag in place of the speech which cannot be understood after
three attempts at listening.
If there is more than one unintelligible word in sequence, use a single
tag. If the entire sentence or utterance cannot be understood, use a
single unintelligible tag.

Also use this tag for word fragments and stutters.

If you cannot understand a word because it is in a foreign language,
use the foreign tag.


Example
A speaker says a word you don't understand

TRANSCRIPTION: I want to go to [unintelligible] tomorrow

Speaker says 'go t- tomorrow'

TRANSCRIPTION: go [unintelligible] tomorrow
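
The "single tag for a run of unintelligible words" rule can be checked with a small helper. This is a sketch only: the `[unintelligible]` text form of the tag and the function name `collapse_unintelligible` are assumptions for illustration, and the platform itself inserts tags through its own controls.

```python
TAG = "[unintelligible]"  # assumed text rendering of the tag


def collapse_unintelligible(transcription):
    """Keep one unintelligible tag where several appear in a row."""
    kept = []
    for token in transcription.split():
        if token == TAG and kept and kept[-1] == TAG:
            continue  # the previous token already marks this unintelligible stretch
        kept.append(token)
    return " ".join(kept)


print(collapse_unintelligible(
    "I want to go to [unintelligible] [unintelligible] [unintelligible] tomorrow"
))
```

Tags separated by intelligible words are left alone, since they mark different stretches of speech.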

Multiple Speakers
Each audio contains 2 main speakers, a main subject (who is
answering questions) and an interviewer (who is asking the questions), and
possibly some background speakers (like a crew member).

We’d like to identify at what point the speakers change at a batch level
using timestamps. This means you will identify the following points in
an audio:
- [Speaker]_start: This is used when there is a new speaker in the
audio, or a changed speaker.
- [Speaker]_end: This is used when the speaker finishes speaking,
either when the batch is complete or before another person
starts speaking.

To be as precise as possible, please place the timestamps within
0.5 seconds of the event happening. Do not put the timestamp in the
middle of a word (or you will cut the word).

Every batch with foreground speech should contain at least two
timestamps, to show when the speaker started and stopped speaking.

The following are the identified speakers and timestamps to be used in
the task.

(Shortcut H): This is used when the main subject
(person being interviewed) starts speaking.

(Shortcut Z): This is used when the main subject
(person being interviewed) stops speaking.


(Shortcut A): This is used when the asker
(interviewer) starts speaking in the audio.

(Shortcut B): This is used when the asker
(interviewer) stops speaking in the audio.

: This is used when another speaker in the audio
starts speaking. Use this only if there are at least 2 other people in the
audio.

: This is used when another speaker in the audio
stops speaking. Use this only if there are at least 2 other people in the
audio.

: This is used when another speaker in the audio
starts speaking. Use this only if there are at least 3 other people in the
audio.

: This is used when another speaker in the audio
stops speaking. Use this only if there are at least 3 other people in the
audio.

: This is used when another speaker in the audio
starts speaking. Use this only if there are at least 4 other people in the
audio.

: This is used when another speaker in the audio
stops speaking. Use this only if there are at least 4 other people in the
audio.

Timestamps should be used at a batch level. That is, you should
consider the context of the batch from one utterance to the next.
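
The timestamp rules for a batch can be summarised as a consistency check. The sketch below is illustrative only: the `(speaker, kind)` event tuples, the speaker labels, and the function name `check_timestamps` are hypothetical and do not reflect the real tag names or any Ampersand feature. It checks that a batch with speech has at least two timestamps and that every start has a matching end for the same speaker.

```python
def check_timestamps(events):
    """Return a list of problems for a batch's timestamps, in time order.

    events: list of (speaker, kind) tuples, where kind is "start" or "end".
    """
    problems = []
    if len(events) < 2:
        problems.append("a batch with foreground speech needs at least two timestamps")
    open_speakers = set()
    for index, (speaker, kind) in enumerate(events):
        if kind == "start":
            if speaker in open_speakers:
                problems.append(f"event {index}: {speaker} starts again without an end")
            open_speakers.add(speaker)
        elif kind == "end":
            if speaker not in open_speakers:
                problems.append(f"event {index}: {speaker} ends without a start")
            open_speakers.discard(speaker)
        else:
            problems.append(f"event {index}: unknown kind {kind!r}")
    for speaker in sorted(open_speakers):
        problems.append(f"{speaker} has a start but no end")
    return problems


good_batch = [
    ("main subject", "start"), ("main subject", "end"),
    ("interviewer", "start"), ("interviewer", "end"),
]
print(check_timestamps(good_batch))
print(check_timestamps([("interviewer", "start")]))
```

Because the check runs over the whole batch rather than one utterance, a start in utterance 1 can be closed by an end in the last utterance.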


If the speaker does not change throughout the entire audio, a
speaker_start timestamp is needed when they first speak, and a
speaker_end timestamp is used for the last speech in the file.

Example: There is a batch with 13 utterances. In the first
utterance, the interviewer starts talking straight away. All utterances
consist of this pattern. In the last utterance, they say ‘that’s a wrap’
and there is silence for 0.5 seconds.

TRANSCRIPTION Utterance 1: [interviewer]_start speech
TRANSCRIPTION Utterance 2-12: speech
TRANSCRIPTION Utterance 13: that’s a wrap. [interviewer]_end [no speech]

If the speaker changes during the audio, the transcriptions look like
this.

Example 1: There is a batch with 6 utterances. The first utterance
starts with 3 seconds of silence then the main subject starts talking.
They talk until the middle of utterance 5. After, the interviewer asks
a question (for utterances 5 and 6). The audio ends immediately
after the interviewer’s question.

TRANSCRIPTION Utterance 1: [no speech] [main subject]_start speech
TRANSCRIPTION Utterance 2-4: speech
TRANSCRIPTION Utterance 5: speech. [main subject]_end [interviewer]_start speech.
TRANSCRIPTION Utterance 6: speech. [interviewer]_end


Example 2: There is a batch with 7 utterances. The first 2
utterances contain main subject speech. The 3rd utterance starts
with the interviewer’s speech. The interviewer talks until the middle
of utterance 7. The main subject talks again and the audio finishes.

TRANSCRIPTION Utterance 1: [main subject]_start speech
TRANSCRIPTION Utterance 2: speech [main subject]_end
TRANSCRIPTION Utterance 3: [interviewer]_start speech
TRANSCRIPTION Utterance 4-6: speech
TRANSCRIPTION Utterance 7: speech [interviewer]_end [main subject]_start speech [main subject]_end

overlap (Shortcut o)
Use when two or more foreground speakers talk at the same time at
more or less the same volume. Do NOT transcribe overlapping speech;
insert this tag in place of overlapping words.

When there is overlapping speech but you can distinguish and
understand a single speaker, transcribe that speaker.

Overlapping speech can occur at any point, i.e. in the middle of a
speaker turn, at the beginning and/or end of a speaker turn. In ALL
cases, overlapping speech should NOT be transcribed, and the tag
should be inserted in place of the overlapping speech.

Example: Assume there is a batch with only one utterance. The
main subject says “I like books more than movies.” Then, there is
overlapping speech. The main subject then continues and says
“Pollyanna is my favorite book.” The speaker then finishes speaking.

TRANSCRIPTION: [main subject]_start I like books more than movies. [overlap] Pollyanna is my favorite book. [main subject]_end

If there is a speaker change after the overlapping speech, i.e. if it is
the end of the speaker's turn, the overlap tag is placed between the
speaker1_end and speaker2_start timestamps.

Example: Assume there is a batch with only one utterance. The
main subject says “I’m really tired.” Then the interviewer asks “what
happened?” The interviewer asks a further question but the main
subject has already started answering. The interviewer stops and the
main speaker continues talking. The speaker then stops.

TRANSCRIPTION: [main subject]_start I’m really tired. [main subject]_end
[interviewer]_start what happened? [interviewer]_end [overlap]
[main subject]_start speech [main subject]_end

Singing
Singing that occurs in the foreground should be transcribed.
We consider the following as singing: rapping, chanting mantras,
recital of poetry, words spoken in a sing-song manner, or ritualistic
holy sermons.


There are two ways to treat singing in this project: a singing event
tag to replace each sung word that you do not know or cannot
understand, and a singing span tag to highlight sung words
that you can write down.
Singing is considered speech, so the same rule applies: when the
singer changes, add the speaker tag to show that there is a
different speaker (e.g. main subject, or interviewer).

1. Use the singing event tag for each sung word that you
cannot understand (e.g. unintelligible singing, mumbling...) or for each
sung word in a foreign language (even if you can understand it). Also
use the singing event tag for scatting/nonsense singing.

Example:
A speaker starts a sentence in English and then says a word in
German but in a sing-song manner “kaaaartoffeeeeellll!”.

TRANSCRIPTION: and then he told me [singing]!

Someone scats/sings nonsense
“dee-doo-dee-daba-deedoo-boobee-bah”: Use a single singing
tag.

TRANSCRIPTION: [singing]

If there is more than one word sung in a sequence, please use one
singing tag for each word. Use your best judgement to determine
the number of sung words.


Example:
Someone starts rapping but you cannot understand the
words. You believe you can hear at least 5 words.

TRANSCRIPTION: [singing] [singing] [singing] [singing] [singing]

Example:
Someone is singing two words you don’t understand and a
few seconds later someone starts rapping at the same time.

TRANSCRIPTION:

Example:
The interviewer recites a poem in French and then the main
subject follows.

TRANSCRIPTION:

2. If you can understand the sung words (and they are in English), you
should write them down and highlight them with the singing span tag.

Example:
Someone is reciting a poem.

TRANSCRIPTION: and then I saw a fairy, come flying right by me


Multiple people sing the Happy Birthday song together at the
same time

TRANSCRIPTION: happy birthday dear Rajesh, happy birthday to you!

Multiple people sing the song Frère Jacques in a round
(starting a few seconds apart from each other). There was
speech by the main speaker just before the singing started,
and speech by the interviewer just after the singing stopped.

TRANSCRIPTION: speech

speech

Multiple people take turns singing a song (e.g. the interviewer
sings, and then the subject follows).

TRANSCRIPTION: I believe I can fly,

I believe I can touch the sky

The interviewer speaks and then the main subject starts
talking in a sing-song manner.

TRANSCRIPTION: what do you say?

I love it!

/!\ Tips:

● Use the singing event tag for sung words that you cannot
understand.
● Use the singing event tag for each foreign sung word,
even if you can understand it.
● Ignore singing in the background.
● Ignore music that accompanies singing. Brief periods of music
alone without singing within an utterance should also be ignored.


● Use the applicable speaker timestamps for singing as you do for
spoken speech, including when the singer changes.
● Use punctuation in places where it falls naturally in songs,
singsong words, poems or sermons.
● If multiple people sing the same words at the same time, please
transcribe the words once.
● If multiple people sing different words at the same time (i.e.
different songs, out of sync, in a round), use the overlap
tag.
● If singing and spoken speech occur at the same time and at a
similar volume, use the overlap tag.
● If singing and spoken speech occur at the same time but one of
the two is in the background, only transcribe what is in the
foreground.

Foreign Speech
There are two ways to transcribe foreign speech: a foreign event tag
to replace a word you do not know and a foreign span tag to highlight
foreign words you can write down.

Use the foreign tag for speech in a language other than English
which would not be understood by US English speakers.
Loan words such as “sombrero” and “sayonara” are acceptable and
should be transcribed.

Example:
A speaker says a foreign word after “does” and you cannot
identify the foreign word

TRANSCRIPTION: what does [foreign] mean in Russian?

If there is more than one foreign word in sequence, use one foreign
tag for each word. Use your best judgement to determine the
number of foreign words.

Example:
A speaker says “denken Sie an die Kinder” in the middle of a
sentence but you do not understand

TRANSCRIPTION: I thought she said [foreign] [foreign] [foreign] [foreign] [foreign] and then

If you can understand the foreign language, please write the
words down and highlight them with the foreign span tag.

Example:
A speaker says “denken Sie an die Kinder” in the middle of a
sentence and you understand the words

TRANSCRIPTION: I thought she said denken Sie an die Kinder and then

/!\ Tips:

● Remember that loanwords are words borrowed from other
languages that are widely known and understood by English
speakers. They are not considered foreign words for the
purposes of this project and should not receive a foreign tag.
● Foreign names (people’s names, places, etc.) are not considered
foreign words and should be transcribed.
● If you cannot understand a word due to interference, audio
problems, or because the person is not talking clearly but it is in
your language, use the unintelligible tag.
● If you cannot understand a word because it is in a foreign
language, use the foreign tag.
● If you are unsure of the spelling but you understand the word
and it is used in your language as a loanword, do an internet
search to find the most common spelling.


● If you can understand and transcribe what is said but it is not in
English and not a loanword, please highlight the words with the
foreign span tag.
● Singing in a foreign language should be tagged with the singing
event tag (one tag for each sung word you can identify);
do not write down the words even if you can understand them.

Numbers
Numbers should be spelled out as full words in the way they were
said.

/!\ For the number 0 (zero), if the speaker says it as the letter ‘O’, it
should be typed as ‘oh’. For example:
101 ==> TRANSCRIPTION: one oh one

Example
The number '2012' may be said in many different ways

● 2012 ==> TRANSCRIPTION: twenty twelve
● 2012 ==> TRANSCRIPTION: two thousand and twelve

Speaker states a lottery number (4 8 6 2)

● 4 8 6 2 ==> TRANSCRIPTION: four eight six two

Speaker reads the time

● now it is 5:30pm. ==> TRANSCRIPTION: now it is five thirty PM.

Speaker reads a math equation

● 1 + 1 = 2. ==> TRANSCRIPTION: one plus one equals two.

Speaker uses a currency

● this item costs $12.99. ==> TRANSCRIPTION: this item costs twelve dollars ninety-nine.


Digits (e.g. 1 2 3 4 5 ...) can be used ONLY when they are joined to a
letter as part of a name without a space.

Example

● H2O ==> TRANSCRIPTION: H2O
● iPhone 6S ==> TRANSCRIPTION: iPhone 6S
● PS4 ==> TRANSCRIPTION: PS4

However

● Xbox 360 ==> TRANSCRIPTION: Xbox three sixty
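
The digit rule lends itself to a quick self-check before saving an utterance. The sketch below is an assumption-laden illustration, not a platform feature: the function name `find_disallowed_digits` is made up, and it treats any whitespace-separated token that contains digits but no letters as a number that should have been spelled out.

```python
def find_disallowed_digits(transcription):
    """Return tokens that contain digits without being joined to a letter."""
    flagged = []
    for token in transcription.split():
        core = token.strip(".,?!")  # ignore sentence punctuation around the token
        has_digit = any(ch.isdigit() for ch in core)
        has_letter = any(ch.isalpha() for ch in core)
        if has_digit and not has_letter:
            flagged.append(core)
    return flagged


print(find_disallowed_digits("I bought an iPhone 6S and a PS4."))
print(find_disallowed_digits("he still plays on his Xbox 360."))
```

A flagged token such as "360" should be retyped the way it was said, for example "three sixty".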

Acronyms & Initialisms
Acronyms and initialisms are words made up of the first letters of
words. They may be pronounced as a word, or each letter may be
pronounced separately. Acronyms and initialisms are spelled using
uppercase letters with no space or period in between.

Example
● N.A.S.A or N A S A ==> NASA
● U.S.A. or U S A ==> USA
● A.M / P.M. ==> AM / PM
● FIFA
● UNESCO

Spelled out words
When a speaker spells a word out, letter by letter, please transcribe
uppercase letters with a space in between.

Example

● TRANSCRIPTION: spelling sequences are transcribed as
isolated uppercase letters. if I spell my name to you, I would
say J O H N.
● TRANSCRIPTION: M A N H A T T A N. M A N H A double T A N.

Emails / websites
If you need to transcribe an email address or website address,
separate the elements as spoken.

Example

● www.facebook.com ==> TRANSCRIPTION: WWW dot Facebook dot com.
● johndoe@gmail.tv ==> TRANSCRIPTION: John Doe at Gmail dot TV.
● maeve17@hotmail.ie ==> TRANSCRIPTION: Maeve seventeen at Hotmail dot IE.

Inappropriate language
All inappropriate language should be transcribed. If you feel
uncomfortable typing a particular word, use the unintelligible tag (see
unintelligible tag) in its place.

Hesitations and interjections
Transcribe hesitations and other disfluencies like uh-huh and hm, using
the table below.

List of Hesitations/Interjections
Meaning                 Acceptable Spelling
Agreement               hm, mm
Disagreement            huh, ah, oh, uh
Surprise                wow, oh, ah
Seeking Confirmation    eh, mhm, ehm
Disgust                 bah, bleah
Delight                 eh, wow, ah
Calling Someone         ehi, eh, oh
Emphasizing             eh, wa, oh, ah, uh

Example

● TRANSCRIPTION: hm, what did I say?
● TRANSCRIPTION: oh, I totally forgot that.
● TRANSCRIPTION: I'd like to watch this movie uh with uh
it's some kinda love story.


Span Tags (highlighting)


There are two types of tags: span tags (colored) and event tags (gray). Look for these in the
screenshot below.
Event tags are inserted between words, while span tags are used to highlight words.
To undo highlighting tags, select the highlighted word and then click on untag. You will not notice
any change until you move on, then the highlighting color will revert to white.

Span Tag Shortcut How to use it

colloquial
For non-standard words and spellings that often appear in spoken
language, transcribe what is heard and highlight the word using the
colloquial span tag.

In general, if a word would not appear in a dictionary or formal written
context (e.g. a newspaper), then the word is likely to be colloquial.
When in doubt, use the colloquial tag rather than leaving a word
untagged.

Example

Speaker's Pronunciation / Transcription    Full Form
aight                                      all right
aboutcha                                   about you
rez                                        reservation

mispronunciation
Use this to highlight any words that were accidentally mispronounced.
Spell the word in the normal (correct) way, then highlight it. There is
no need to use this if someone has an accent; it should only be used
when the person accidentally said something the wrong way. When in
doubt ask yourself "would this person pronounce the word differently if
I asked them to repeat themselves?" If they would, it can be classified
as a mispronunciation.

Example

You hear “what time are you leabing?”

TRANSCRIPTION: what time are you leaving?

best guess (Shortcut b)
If you hear a word in the audio but you are not entirely sure how to
spell it or you are not entirely confident you are hearing the word
correctly, highlight the word using the best guess tag.

This tag might be needed if the speaker uses a proper name you are
unfamiliar with.

Please do not use the best guess tag if the speech is unintelligible
because the audio quality is poor, the speaker mumbles, etc. For these
cases please use the unintelligible tag.

Example:

● You hear "he told me to go to Wolengi" but you are not sure
what Wolengi is or how to spell it; spell it as best you can and
highlight it with the best guess tag: Wolengi

Do NOT use this tag for words you can easily spell correctly by doing a
quick online search.

Examples:

● You are unsure of the name of an artist, "Emir Kusturica": you
should look it up online with an approximate spelling + keywords
(e.g. you heard "movie" in the batch) to find the correct spelling.
● You are unsure of the spelling of "necessary": you should look it up
online or in a dictionary and use the correct spelling.


/!\ Remember:
If you hear something in your language but cannot make out the
word at all = use the unintelligible tag

If you hear something in a foreign language that you cannot
understand = use the foreign tag


Tagging non-speech noises and events


These are listed in order of how often they are likely to be used. The more common tags are listed
at the top of the table.

Event Tag Shortcut How to use it


no speech (Shortcut c)
Any pause of at least half a second without speech should be tagged
with the no speech tag.
Non-speech noises which are not within half a second of speech do not
need to be tagged.

Example
You hear some speech punctuated by a cough, followed by a
half second pause, and then a loud noise:

TRANSCRIPTION: I ever heard of him.

Even if there is more than half a second of no speech at one point, you
only need one no speech tag to represent that event.

Example
There are 5 seconds of no speech. Then the speaker starts
talking. Then there is 1 second of no speech at the end.

TRANSCRIPTION: [no speech] I never heard of him. [no speech]

/!\ If an entire utterance does not contain any speech, it should be
transcribed with one tag ONLY: the no speech tag. Even if it contains
other sounds, you must ignore them if there is no speech at all.


Example

The whole utterance contains someone crying, loud noises or
instrumental music:

TRANSCRIPTION: [no speech]
You must ignore all sounds if there is no speech in the entire
utterance.

spk (Shortcut s)
Use for all sounds made by a foreground human which are not speech
(e.g. any sounds from the mouth or nose: breath, cough, lipsmack, and
laughing).

Only use this tag if:
- the volume is at or near the volume of the surrounding
foreground speech.
- AND the sound occurs within half a second of speech.

Example
Someone laughs in the middle of their sentence

TRANSCRIPTION: seriously that’s ridiculous!

Someone is speaking and then someone else coughs loudly.

TRANSCRIPTION: and after that I went to Forever twenty-one to buy

some socks.

music (Shortcut m)
Use for music (without lyrics) that does not overlap with foreground
speech. Singing from the foreground speaker should be tagged as
singing, not as music.

Only use this tag if:
- the volume is at or near the volume of the surrounding
foreground speech.
- AND the sound occurs within half a second of speech.

Example
A news broadcaster announces a news headline before a
music jingle that starts less than half a second after the last
word is pronounced:

TRANSCRIPTION: more on that tonight on BBC. [music]

You hear some music, then a pause without any sound for
more than half a second and then some speech. Music is
ignored as it’s too far from speech (more than half a second
away).

TRANSCRIPTION: I told you so!

noise (Shortcut n)
Use for any non-speaker noise that occurs at the same volume as
foreground speech.
Do not tag background noise that is at a lower volume than speech.

Only use this tag if:
- the volume is at or near the volume of the surrounding
foreground speech.
- AND the sound occurs within half a second of speech.

Example
Someone knocks loudly at the door (within half a second of
speech), and then someone speaks.

TRANSCRIPTION: [noise] who is it?
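
The two conditions shared by the spk, music and noise tags (foreground volume AND within half a second of speech) can be written as one decision function. This is a sketch under stated assumptions: the `Sound` record, the `is_foreground` flag (which stands for your own listening judgement) and the `should_tag` name are hypothetical, and times are in milliseconds.

```python
from dataclasses import dataclass


@dataclass
class Sound:
    start_ms: int
    end_ms: int
    is_foreground: bool  # your judgement: at or near the volume of foreground speech


def should_tag(sound, speech_segments, window_ms=500):
    """Return True if the sound meets both tagging conditions.

    speech_segments: list of (start_ms, end_ms) pairs for foreground speech.
    """
    if not sound.is_foreground:
        return False
    for speech_start, speech_end in speech_segments:
        # The gap is 0 when the sound touches or overlaps the speech segment.
        gap = max(speech_start - sound.end_ms, sound.start_ms - speech_end, 0)
        if gap <= window_ms:
            return True
    return False  # also covers utterances with no speech at all


knock = Sound(0, 300, is_foreground=True)
print(should_tag(knock, [(600, 1500)]))   # 300 ms before speech
music = Sound(0, 1000, is_foreground=True)
print(should_tag(music, [(1800, 2500)]))  # 800 ms before speech
```

An empty list of speech segments always gives False, which mirrors the rule that sounds are ignored when the utterance has no speech.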

truncation (Shortcut t)
Use when a word gets cut off at the end of an utterance because the
computer has not cut up the audio correctly. This is different from a
fragment (where the person stops talking part way through a word). In
a truncation, the recording has cut someone off while they were saying
a word. Therefore, truncations only occur at the start or end of an
utterance.

When you hear a truncation at the end of an utterance and you can
transcribe the word with certainty, write out the truncated word in full
followed by the truncation tag. When you hear a truncation at the
start of an utterance, insert the truncation tag only.

Example
The word 'probably' is split with "prob-" at the end of the first
utterance and "-ably" at the beginning of the second
utterance.

UTTERANCE 1: in that case we should probably [truncation]

UTTERANCE 2: [truncation] consider other options

If you are unable to tell what the truncated word is at the end of an
utterance, simply insert the unintelligible tag in place of the word
followed by the truncation tag.

Example
An unintelligible word is truncated.

UTTERANCE 1: we bought a [unintelligible] [truncation]

UTTERANCE 2: [truncation] from the market yesterday and


Punctuation
A sentence is a grammatically complete unit. A sentence will usually, but not
always, contain a subject (e.g. "the cat") and a verb (e.g. "sat"). Examples of
grammatically complete sentences which do not have a subject and verb
include answers to questions (e.g. "yes." and "no.") and exclamations ("what!"
and "really?").

Example

● TRANSCRIPTION: running smoothly now. could I do more? yes, maybe.

At the end of each sentence, use either a period (.) for statements, a question
mark (?) for questions, or an exclamation mark (!) for exclamations. Do not
use punctuation combinations ("?!", "!!!", "..."). Do not use hyphens or
quotation marks to indicate quoted or mentioned speech. No other punctuation
(such as : ;) should be used.
Only place punctuation at the end of an utterance if the end of the utterance is
also the end of a sentence. If the speaker continues the same sentence into the
next utterance, put the punctuation wherever it naturally falls in the speech.
See the description of an utterance.

Examples:

● TRANSCRIPTION:
UTT1: win this year! what do you think
UTT2: about the Knicks? they seem to have finally

See the "incomplete" tag section below for instructions about sentence
fragments which are not grammatically complete.
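
The punctuation restrictions above can be turned into a simple self-check. The sketch is illustrative only: the function name `punctuation_problems` is made up, and it only catches what a pattern can see (colons, semicolons, double quotation marks, and repeated or combined end marks). It cannot tell whether a hyphen was used to mark quoted speech, so that still needs a manual look.

```python
import re

# Colons, semicolons, straight or curly double quotes, or two or more end marks in a row.
FORBIDDEN = re.compile(r'[:;"“”]|[.?!]{2,}')


def punctuation_problems(transcription):
    """Return each forbidden punctuation sequence found, in order of appearance."""
    return [match.group(0) for match in FORBIDDEN.finditer(transcription)]


print(punctuation_problems("running smoothly now. could I do more? yes, maybe."))
print(punctuation_problems("what?! he said: no..."))
```

Apostrophes are deliberately not flagged, because contractions and possessives are allowed.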

incomplete
Insert the incomplete tag when a foreground speaker begins a sentence and is
either (a) interrupted by a new speaker, or (b) begins a new sentence before
the first grammatically complete sentence is finished.

The tag should not be used to indicate that a sentence is continuing into a
second utterance.

Examples

● TRANSCRIPTION: I'm going to the [incomplete] the weather is lovely today.
● TRANSCRIPTION: I don't know if [incomplete] they're very conservative about it.

You do not need to use the incomplete tag when the speaker restarts or
repeats a single word.

Commas
Use commas (,) in two situations only:

● For lists of items ("I ate two apples, three oranges, and a banana.") and
sequences of adjectives ("he was a big, red-haired, evil man.")
● For introductory phrases ("so I was thinking, how do you do it?", "at the
end of the day, what matters is your health.").

When unsure whether to use a comma, err on the side of not using one.


Resources
● English Punctuation Rules
● Capitalization in English
● Merriam Webster Dictionary
