Week 5: - Speech To Radio
• Speech to Radio
Speech Coder
Channel Coder
Interleaving
Burst Assembly
Ciphering
Modulation
From source information to radio waves
Kevin McDermott 2
[Figure: From Speech Source to Radio Waves. Transmit side: Speech Coding, Channel Coding, Burst Assembly, Modulation, Transmission. Receive side: Demodulation, Burst Disassembly, Channel Decoding, Speech Decoding.]
Speech coding
• Voice is generally assumed not to contain any useful
information above frequencies of 4 kHz.
• Hence, a sampling rate of 8 kHz is typically sufficient for an
acceptable voice quality.
• There is an internationally agreed-upon standard for voice
coding using 8-kHz sampling, known as pulse code modulation
(PCM).
• This utilises eight-bit sampling at 8 kHz, resulting in a bit rate
of 64 kbit/s.
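The arithmetic behind these figures can be checked with a short sketch. The function and constant names below are illustrative only and do not come from any standard.

```python
# Sketch: sampling rate and PCM bit rate from the figures on this slide.
MAX_VOICE_FREQ_HZ = 4_000   # highest voice frequency assumed to be useful
BITS_PER_SAMPLE = 8         # eight-bit PCM samples


def nyquist_rate_hz(max_freq_hz):
    """Minimum sampling rate needed to capture content up to max_freq_hz."""
    return 2 * max_freq_hz


def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate in bit/s of an uncompressed PCM stream."""
    return sample_rate_hz * bits_per_sample


sample_rate = nyquist_rate_hz(MAX_VOICE_FREQ_HZ)
print(sample_rate, "Hz sampling rate")
print(pcm_bit_rate(sample_rate, BITS_PER_SAMPLE) // 1000, "kbit/s PCM")
```

Sampling at twice the highest frequency of interest gives 8 kHz, and eight bits per sample then gives 64 kbit/s (bits, not bytes).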
Speech coding
• More efficient voice coders model the vocal tract of the
speaker based on their first few syllables and then send
information on how this vocal tract is generating sound.
• GSM utilises a coder type known as regular pulse excited–long-term
prediction (RPE-LTP).
• The LTP part sends some parameters showing what the vocal
tract is doing and the RPE shows how it is generating sound
(“being excited”).
• The speech signal is divided into blocks of 20 ms.
• These blocks are then passed to the speech codec, which has a
rate of 13 kbps, in order to obtain blocks of 260 bits.
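The 260-bit block size follows from the codec rate and the block length. A minimal sketch, with illustrative names, that also compares the result with 64 kbit/s PCM:

```python
# Sketch: bits and samples per 20 ms speech block.
CODEC_RATE_BPS = 13_000   # speech codec output rate
PCM_RATE_BPS = 64_000     # 8-bit PCM at 8 kHz, for comparison
SAMPLE_RATE_HZ = 8_000
BLOCK_MS = 20


def per_block(rate_per_second, block_ms):
    """Number of units (bits or samples) produced in one block."""
    return rate_per_second * block_ms // 1000


print(per_block(SAMPLE_RATE_HZ, BLOCK_MS), "samples per block")
print(per_block(CODEC_RATE_BPS, BLOCK_MS), "coded bits per block")
print(round(PCM_RATE_BPS / CODEC_RATE_BPS, 2), "x reduction relative to PCM")
```

Each 20 ms block holds 160 samples and leaves the codec as 260 bits, roughly a fivefold reduction compared with PCM.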
Physical Model
When you speak:
• Air is pushed from your lungs through your vocal tract, and
speech comes out of your mouth.
• For voiced sounds, your vocal cords vibrate (open and
close). The rate at which the vocal cords vibrate determines
the pitch of your voice. Women and young children tend to
have a high pitch (fast vibration), while adult males tend to have
a low pitch (slow vibration).
• For certain fricative and plosive (unvoiced) sounds, your
vocal cords do not vibrate but remain open.
• The shape of your vocal tract determines the sound that you
make.
When you speak:
• As you speak, your vocal tract changes its shape,
producing different sounds.
• The shape of the vocal tract changes relatively slowly
(on the scale of 10 ms to 100 ms).
• The amount of air coming from your lungs determines
the loudness of your voice.
Formant Frequencies
• Voiced sounds are produced by air flowing from the lungs over the
vocal cords, causing them to vibrate in a periodic
pattern and generate a series of air pulses called ‘glottal
pulses’.
• The rate of vibration of the vocal cords determines the
‘pitch’ of the sound produced.
• As these air pulses pass along the vocal tract, some of
the frequencies resonate.
• These frequencies are called the ‘formant frequencies’ of
the voice being produced.
• Unvoiced sounds are those which do not
cause vibration of the vocal cords.
• The vocal tract is modelled as a time varying
filter.
• It amplifies certain sound frequencies and
attenuates other frequencies.
• The sound is produced when a sound source
excites the vocal tract filter.
Mathematical Model
LPC Model
• The above model is often called the LPC
Model.
• The model says that the digital speech signal is
the output of a digital filter (called the LPC
filter) whose input is either a train of impulses
or a white noise sequence.
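The LPC model can be sketched as an all-pole digital filter driven by one of the two excitation types. This is a minimal illustration only: the filter coefficients, pitch period and helper names are hypothetical values chosen for the example, not parameters of LPC10 or of the GSM codec.

```python
import random


def lpc_synthesize(excitation, coeffs, gain=1.0):
    """All-pole filter: s[n] = gain * e[n] + sum_k coeffs[k-1] * s[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        s = gain * e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                s += a * out[n - k]
        out.append(s)
    return out


def impulse_train(length, pitch_period):
    """Voiced excitation: one unit pulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]


def white_noise(length, seed=0):
    """Unvoiced excitation: white Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


# Hypothetical, stable second-order vocal tract filter (assumption).
COEFFS = [1.3, -0.8]

voiced = lpc_synthesize(impulse_train(160, pitch_period=40), COEFFS)
unvoiced = lpc_synthesize(white_noise(160), COEFFS, gain=0.1)
print(len(voiced), len(unvoiced))
```

The same filter produces a periodic, pitched output from the impulse train and a noise-like output from the white noise input, which is the voiced/unvoiced distinction the model relies on.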
Where is LPC10?
• Taxonomy of Speech Coders
[Figure: taxonomy of speech coders.]
[Figure: LPC decoder. Block labels: Pitch Period, Pulse Train, Random Noise, V/U (voiced/unvoiced) switch, Gain G (Signal Power), Vocal Tract Model, Synthesized Speech.]
Voicing Classification(1)
Voiced Source
– Generated by vocal cords’ vibrations
– Periodic, spacing is the pitch, F0
Unvoiced Source
– Generated without vibrations
– Excitation is modeled by a White Gaussian Noise source
– No pitch
Voice Coding
Each 20 ms speech sample is 260 bits, split into three classes:
Class   Bits   Importance
IA      50     Most critical
IB      132    Very important
II      78     Icing
Channel Coding - Blocks
• The 260-bit (20 ms) sample is divided into classes IA, IB and II,
based on how important the bits are in determining the
sound quality.
• IA uses a 3-bit CRC. If the CRC fails, the whole sample is
thrown out.
• IA and IB together have a 4-bit trailer. The result is then put
into a rate-1/2 convolutional coder of length 4 (four delay elements,
which the 4-bit trailer flushes) that doubles the number of bits.
• II bits are appended unencoded, giving an overall sample
of 456 bits.
• Channel encoder output: one sample is 20 ms of speech --> 456 bits
--> 8 blocks. One block is 2.5 ms of speech --> 57 bits.
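The 456-bit total can be reproduced from the class sizes, and the doubling step can be illustrated with a small rate-1/2 convolutional encoder. This is a sketch under stated assumptions: the slides do not give the generator polynomials, so the taps below (1 + D^3 + D^4 and 1 + D + D^3 + D^4, commonly quoted for GSM full-rate speech) are an assumption, and the function names are illustrative.

```python
CLASS_IA, CLASS_IB, CLASS_II = 50, 132, 78
CRC_BITS, TAIL_BITS = 3, 4


def coded_sample_bits():
    """Bits per 20 ms sample after channel coding."""
    protected = CLASS_IA + CRC_BITS + CLASS_IB + TAIL_BITS  # 189 bits
    return 2 * protected + CLASS_II                         # 378 + 78


def conv_encode_half_rate(bits, g0=(1, 0, 0, 1, 1), g1=(1, 1, 0, 1, 1)):
    """Rate-1/2 convolutional encoder with four delay elements.

    g0 and g1 list the taps for delays 0..4 (assumed polynomials).
    Two output bits are produced for every input bit.
    """
    memory = [0] * (len(g0) - 1)
    out = []
    for b in bits:
        window = [b] + memory
        out.append(sum(t * w for t, w in zip(g0, window)) % 2)
        out.append(sum(t * w for t, w in zip(g1, window)) % 2)
        memory = window[:-1]   # shift the register by one position
    return out


print(coded_sample_bits())
print(len(conv_encode_half_rate([0] * 189)))
```

The 189 protected bits become 378 coded bits, and adding the 78 unprotected class II bits gives 456.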
[Figure: 456 bits per sample. The first and second 20 ms samples are each coded into 456 bits (coded IA and IB bits plus the 78 class II bits), and each is split into eight blocks of 57 bits, giving sixteen blocks for the two samples.]
[Figure: one burst, 156.25 bits: T | Block - 57 | Training - 26 | Block - 57 | T | G, where T marks the tail bits and G the guard period.]
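The 156.25-bit burst length can be checked by adding up its fields. The slide only names the two 57-bit blocks and the 26-bit training sequence; the tail, stealing flag and guard sizes used below are the values usually quoted for a GSM normal burst and are an assumption here, as are the field names.

```python
from fractions import Fraction

# (field name, length in bit periods); sizes other than 57 and 26 are assumed.
NORMAL_BURST = [
    ("tail", 3),
    ("data", 57),
    ("stealing flag", 1),
    ("training", 26),
    ("stealing flag", 1),
    ("data", 57),
    ("tail", 3),
    ("guard", Fraction(33, 4)),   # 8.25 bit periods
]


def burst_length(fields):
    """Total burst length in bit periods."""
    return sum(length for _, length in fields)


def payload_bits(fields):
    """Coded data bits carried by one burst."""
    return sum(length for name, length in fields if name == "data")


print(float(burst_length(NORMAL_BURST)))
print(payload_bits(NORMAL_BURST))
```

Under these assumptions the fields sum to 156.25 bit periods, of which 114 carry coded data.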
Sharing the channel – TDMA Frames
[Figure: one multiframe drawn as a row of 26 frames (F).]
• One multiframe is 26 frames and 32500 bits, carrying 24 x 8 x 2 blocks of 2.5 ms of speech.
• One TDMA multiframe takes 120 ms.
• Each of the eight sources can transmit two 60 ms samples of speech
every 120 ms.
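The multiframe figures can be cross-checked with exact arithmetic. A minimal sketch using the values from the slides (26 frames, 8 slots, 156.25 bits per burst, 120 ms); the variable names are illustrative.

```python
from fractions import Fraction

MULTIFRAME_MS = 120
FRAMES_PER_MULTIFRAME = 26
SPEECH_FRAMES = 24                 # frames that carry speech
SLOTS_PER_FRAME = 8                # one slot per source
BITS_PER_BURST = Fraction(625, 4)  # 156.25
BLOCKS_PER_BURST = 2
BLOCK_MS = Fraction(5, 2)          # one 57-bit block is 2.5 ms of speech

frame_ms = Fraction(MULTIFRAME_MS, FRAMES_PER_MULTIFRAME)
slot_ms = frame_ms / SLOTS_PER_FRAME
bits_per_multiframe = FRAMES_PER_MULTIFRAME * SLOTS_PER_FRAME * BITS_PER_BURST
bit_rate_bps = bits_per_multiframe * 1000 / MULTIFRAME_MS
speech_ms_per_source = SPEECH_FRAMES * BLOCKS_PER_BURST * BLOCK_MS

print(round(float(frame_ms), 3), "ms per frame")
print(round(float(slot_ms), 3), "ms per slot")
print(int(bits_per_multiframe), "bits per multiframe")
print(round(float(bit_rate_bps)), "bit/s on the carrier")
print(float(speech_ms_per_source), "ms of speech per source per multiframe")
```

Each source sends 120 ms of speech in every 120 ms multiframe, so the channel keeps up with real-time speech.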
Interleaving
• In a radio environment, the signal strength can drop rapidly
for short periods of time due to fading (Rayleigh) and
shadowing.
• This introduces high error rates in short bursts.
• For error correction codes to work effectively, the
errors should be evenly distributed in time.
• Interleaving greatly reduces the risk of losing consecutive data bits.
Interleaving
• A normal burst in GSM transmits two blocks of 57 data bits
• Therefore the 456 bits corresponding to the output of the
channel coder fit into four bursts (4*114 = 456).
• The 456 bits are divided into eight blocks of 57 bits.
• The first block of 57 bits contains the bit numbers (0, 8,
16, .....448), the second one the bit numbers (1, 9,
17, .....449), etc.
• The last block of 57 bits will then contain the bit numbers (7,
15, .....455).
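The partition described above, where block k takes bit numbers k, k + 8, k + 16 and so on, can be sketched with list slicing. The function names are illustrative.

```python
def split_into_blocks(bits, n_blocks=8):
    """Block k receives bits k, k + n_blocks, k + 2 * n_blocks, ..."""
    return [bits[k::n_blocks] for k in range(n_blocks)]


def merge_blocks(blocks):
    """Inverse of split_into_blocks: restore the original bit order."""
    n_blocks = len(blocks)
    merged = [None] * sum(len(block) for block in blocks)
    for k, block in enumerate(blocks):
        merged[k::n_blocks] = block
    return merged


bit_numbers = list(range(456))
blocks = split_into_blocks(bit_numbers)
print(len(blocks), "blocks of", len(blocks[0]), "bits")
print(blocks[0][:3], "...", blocks[0][-1])
print(blocks[7][:2], "...", blocks[7][-1])
```

Neighbouring coded bits always land in different blocks, so a burst that is lost costs every eighth bit rather than a run of consecutive bits.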
The 456 bits are written row by row into a matrix with 8 columns:
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
. . . . . . . .
440 441 442 443 444 445 446 447
448 449 450 451 452 453 454 455
Each column is read out as one block of 57 bits:
0 8 ..... 440 448
1 9 ..... 441 449
2 10 ..... 442 450
3 11 ..... 443 451
4 12 ..... 444 452
Interleaving
• The outputs of the interleaver are then grouped into bursts
that are modulated and transmitted.
• Each sub block is carried by a different burst and in a different
TDMA frame as shown below.
• The interleaving pattern will vary depending on whether we
are talking about a control channel, speech channel or data
channel.
• With interleaving, bursty noise is effectively spread out,
which allows the convolutional code to recover the
corrupted bits. A sudden, short deterioration of the S/N
ratio is therefore much less of a problem.
[Figure: Interleaving. Block n-1 (456 bits) and block n (456 bits), each split into sub-blocks 0 to 7.]
[Figure: Interleaving. Transmitter: read in by rows, read out by columns. Receiver: read in by columns, read out by rows.]
1 2 3 4 5 6
7 8 9 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
25 26 27 28 29 30
31 32 33 34 35 36
Transmitted order: 1 7 13 19 25 31 2 8 etc.
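The row and column scheme in the figure can be sketched as a simple block interleaver, using the same 6 x 6 example. The function names are illustrative, and this generic interleaver is not the exact GSM pattern.

```python
def block_interleave(symbols, rows, cols):
    """Write row by row into a rows x cols matrix, read column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("symbol count must equal rows * cols")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def block_deinterleave(symbols, rows, cols):
    """Write column by column, read row by row (inverse of the above)."""
    if len(symbols) != rows * cols:
        raise ValueError("symbol count must equal rows * cols")
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]


original = list(range(1, 37))
sent = block_interleave(original, rows=6, cols=6)
print(sent[:8])

# Corrupt a burst of six consecutive transmitted symbols.
received = ["X"] * 6 + sent[6:]
restored = block_deinterleave(received, rows=6, cols=6)
print([i for i, s in enumerate(restored) if s == "X"])
```

After de-interleaving, the six corrupted symbols sit six positions apart instead of side by side, which is the evenly spread error pattern that a convolutional code handles well.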
Interleaving in GSM
[Figure: Interleaving in GSM. Two consecutive 20 ms speech segments (160 samples, 2048 bits each) pass through the RPE-LTP speech encoder and then channel encoding. Each coded output is split into blocks 1 to 8, which are interleaved into the stream of bursts.]
GSM: Modulation
• GSM uses Gaussian-filtered Minimum Shift Keying (GMSK).
– MSK is a minimum-shift form of FSK
– Gaussian pre-filter reduces bandwidth
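The "minimum-shift" idea can be illustrated by tracking the carrier phase: in MSK each bit advances or retards the phase by a quarter turn, so the phase never jumps. The sketch below covers plain MSK only and does not implement the Gaussian pre-filter, which would smooth the corners of this phase path. The mapping of bit 1 to a positive shift and the function name are assumptions.

```python
import math


def msk_phase_path(bits):
    """Carrier phase (radians) at each bit boundary for plain MSK.

    Each bit changes the phase by +pi/2 (bit 1) or -pi/2 (bit 0),
    which corresponds to FSK with a modulation index of 0.5.
    """
    phase = 0.0
    path = [phase]
    for b in bits:
        phase += (math.pi / 2) * (1 if b else -1)
        path.append(phase)
    return path


path = msk_phase_path([1, 1, 0, 0, 0])
print([round(p / math.pi, 2) for p in path])   # phase in units of pi
```

Because the phase is continuous, the spectrum is already more compact than that of FSK with abrupt phase jumps; the Gaussian filter narrows it further.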