Chapter 6 - Basics of Digital Audio
Fundamentals of Multimedia, Chapter 6
6.1 Digitization of Sound
6.2 MIDI: Musical Instrument Digital Interface
6.3 Quantization and Transmission of Audio

(a) For example, a speaker in an audio system vibrates back and forth and produces a longitudinal pressure wave that we perceive as sound.

(b) Since sound is a pressure wave, it takes on continuously varying values, as opposed to digitized ones.
2 Multimedia Systems (eadeli@iust.ac.ir)
(c) Even though such pressure waves are longitudinal, they still have ordinary wave properties and behaviors, such as reflection (bouncing), refraction (change of angle when entering a medium with a different density) and diffraction (bending around an obstacle).

Digitization

• Digitization means conversion to a stream of numbers, and preferably these numbers should be integers for efficiency.
(a) Sampling means measuring the quantity we are interested in, usually at
evenly-spaced intervals.
(b) The first kind of sampling, using measurements only at evenly spaced time intervals, is simply called sampling. The rate at which it is performed is called the sampling frequency (see Fig. 6.2(a)).
(c) For audio, typical sampling rates are from 8 kHz (8,000 samples per second) to 48 kHz. This range is determined by the Nyquist theorem, discussed below.

(d) Sampling in the amplitude or voltage dimension is called quantization. Fig. 6.2(b) shows this kind of sampling.

Fig. 6.1: An analog signal: continuous measurement of pressure wave.
To decide how to digitize audio data, we need to answer:

1. What is the sampling rate?
2. How finely is the data to be quantized, and is quantization uniform?
3. How is audio data formatted? (file format)

Fig. 6.2: Sampling and Quantization. (a): Sampling the analog signal in the time dimension. (b): Quantization is sampling the analog signal in the amplitude dimension.
Nyquist Theorem

Fig. 6.3: Building up a complex signal by superposing sinusoids
• The Nyquist theorem states how frequently we must sample in time to be able to recover the original sound.

(a) Fig. 6.4(a) shows a single sinusoid: it is a single, pure frequency (only electronic instruments can create such sounds).

(b) If the sampling rate just equals the actual frequency, Fig. 6.4(b) shows that a false signal is detected: it is simply a constant, with zero frequency.

(c) Now if we sample at 1.5 times the actual frequency, Fig. 6.4(c) shows that we obtain an incorrect (alias) frequency that is lower than the correct one: it is half the correct one (the wavelength, from peak to peak, is double that of the actual signal).

(d) Thus for correct sampling we must use a sampling rate equal to at least twice the maximum frequency content in the signal. This rate is called the Nyquist rate.

Fig. 6.4: Aliasing. (a): A single frequency. (b): Sampling at exactly the frequency produces a constant. (c): Sampling at 1.5 times per cycle produces an alias perceived frequency.
• Since it would be impossible to recover frequencies higher than the Nyquist frequency anyway, most systems have an antialiasing filter that restricts the frequency content of the input to the sampler to a range at or below half the sampling frequency.

• The relationship among the Sampling Frequency, True Frequency, and the Alias Frequency is as follows:

f_alias = f_sampling − f_true,  for f_true < f_sampling < 2 × f_true   (6.1)
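Equation (6.1) can be checked numerically: sampled at 8 kHz, a 6 kHz tone produces exactly the same samples as the 2 kHz alias the formula predicts. A small sketch (cosines are used so the phases line up):

```python
import math

fs = 8000          # sampling frequency (Hz)
f_true = 6000      # true tone frequency; fs < 2 * f_true, so aliasing occurs
f_alias = fs - f_true   # Eq. (6.1) predicts a 2000 Hz alias

# Sample both tones at the same instants t_n = n / fs.
samples_true = [math.cos(2 * math.pi * f_true * n / fs) for n in range(100)]
samples_alias = [math.cos(2 * math.pi * f_alias * n / fs) for n in range(100)]

# The two sample sequences are indistinguishable.
max_diff = max(abs(a - b) for a, b in zip(samples_true, samples_alias))
print(f_alias, max_diff < 1e-9)   # 2000 True
```

Since the sampler cannot tell the two tones apart, only an antialiasing filter applied before sampling can prevent the false detection.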
[Figure: Sampling Without Aliasing (left) vs. Sampling With Aliasing (right).]
• The ratio of the power of the correct signal and the noise is called the signal to noise ratio (SNR), a measure of the quality of the signal.

• The SNR is usually measured in decibels (dB), where 1 dB is a tenth of a bel. The SNR value, in units of dB, is defined in terms of base-10 logarithms of squared voltages, as follows:

SNR = 10 log10 (Vsignal² / Vnoise²) = 20 log10 (Vsignal / Vnoise)   (6.2)

a) For example, if the signal voltage Vsignal is 10 times the noise, then the SNR is 20 log10(10) = 20 dB.

b) In terms of power, if the power from ten violins is ten times that from one violin playing, then the ratio of power is 10 dB, or 1 B.

• The usual levels of sound we hear around us are described in terms of decibels, as a ratio to the quietest sound we are capable of hearing. Table 6.1 shows approximate levels for these sounds.

Table 6.1: Magnitude levels of common sounds, in decibels

Threshold of hearing        0
Rustle of leaves           10
Very quiet room            20
Average room               40
Conversation               60
Busy street                70
Loud radio                 80
Train through station      90
Riveter                   100
Threshold of discomfort   120
Threshold of pain         140
Damage to ear drum        160

Signal to Quantization Noise Ratio (SQNR)

• Aside from any noise that may have been present in the original analog signal, there is also an additional error that results from quantization.

(a) If voltages are actually in the range 0 to 1 but we have only 8 bits in which to store values, then effectively we force all continuous values of voltage into only 256 different values.

(b) This introduces a roundoff error. It is not really "noise". Nevertheless it is called quantization noise (or quantization error).
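Eq. (6.2) and the two decibel examples can be checked in a few lines (the helper name is ours, not from the text):

```python
import math

def snr_db(v_signal: float, v_noise: float) -> float:
    # Eq. (6.2): 10 log10 of squared voltages == 20 log10 of the voltage ratio.
    return 10 * math.log10(v_signal ** 2 / v_noise ** 2)

print(snr_db(10.0, 1.0))        # 20.0 dB: signal voltage 10x the noise voltage
print(10 * math.log10(10.0))    # 10.0 dB = 1 B: ten times the power (ten violins)
```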
(a) Quantization noise: the difference between the actual value of the analog signal, for the particular sampling time, and the nearest quantization interval value.

(b) At most, this error can be as much as half of the interval.

(c) For a quantization accuracy of N bits per sample, the SQNR can be simply expressed:

SQNR = 20 log10 (Vsignal / Vquan_noise) = 20 log10 (2^(N−1) / (1/2))
     = 20 × N × log10 2 = 6.02 N (dB)   (6.3)

• Notes:

(a) We map the maximum signal to 2^(N−1) − 1 (≈ 2^(N−1)) and the most negative signal to −2^(N−1).

(b) Eq. (6.3) is the Peak signal-to-noise ratio, PSQNR: peak signal and peak noise.

(c) The dynamic range is the ratio of maximum to minimum absolute values of the signal: Vmax/Vmin. The max abs. value Vmax gets mapped to 2^(N−1) − 1; the min abs. value Vmin gets mapped to 1. Vmin is the smallest positive voltage that is not masked by noise. The most negative signal, −Vmax, is mapped to −2^(N−1).

(d) The quantization interval is ΔV = (2 Vmax)/2^N, since there are 2^N intervals. The whole range Vmax down to (Vmax − ΔV/2) is mapped to 2^(N−1) − 1.

(e) The maximum noise, in terms of actual voltages, is half the quantization interval: ΔV/2 = Vmax/2^N.

Linear and Non-linear Quantization

• Linear format: samples are typically stored as uniformly quantized values.

• Non-uniform quantization: set up more finely-spaced levels where humans hear with the most acuity.

– Weber's Law, stated formally, says that equally perceived differences have values proportional to absolute levels:

ΔResponse ∝ ΔStimulus/Stimulus

– Inserting a constant of proportionality k, we have a differential equation that states:

dr = k (1/s) ds   (6.6)

with response r and stimulus s.
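Eq. (6.3) can be evaluated directly; the helper below is a sketch of the peak-SQNR formula only, not a measurement on real audio:

```python
import math

def peak_sqnr_db(n_bits: int) -> float:
    # Eq. (6.3): peak signal 2**(N-1), peak quantization noise 1/2.
    return 20 * math.log10(2 ** (n_bits - 1) / 0.5)

# Each extra bit of quantization accuracy buys about 6.02 dB.
for n in (8, 16):
    print(n, round(peak_sqnr_db(n), 2))   # 8 -> 48.16, 16 -> 96.33
```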
– Integrating, we arrive at a solution

r = k ln s + C   (6.7)

– With the constraint that r = 0 when s = s0 (s0 = the lowest level of stimulus that causes a response), we get

r = k ln(s/s0)   (6.8)

• Nonlinear quantization works by first transforming an analog signal from the raw s space into the theoretical r space, and then uniformly quantizing the resulting values.

• Such a law for audio is called μ-law encoding (or u-law). A very similar rule, called A-law, is used in telephony in Europe.

• The equations for these very similar encodings are as follows:

μ-law:

r = [sgn(s) / ln(1 + μ)] · ln(1 + μ |s/s_p|),   |s/s_p| ≤ 1   (6.9)

A-law:

r = [A / (1 + ln A)] · (s/s_p),                    for |s/s_p| ≤ 1/A
r = [sgn(s) / (1 + ln A)] · [1 + ln(A |s/s_p|)],   for 1/A ≤ |s/s_p| ≤ 1   (6.10)

where sgn(s) = 1 if s > 0, −1 otherwise.

• Fig. 6.6 shows these curves. The parameter μ is set to μ = 100 or μ = 255; the parameter A for the A-law encoder is usually set to A = 87.6.
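A minimal sketch of μ-law (Eq. 6.9) and its inverse, assuming s_p = 1; it illustrates that uniform steps in r translate into finer resolution in s at the quiet end (function names are ours):

```python
import math

def mu_law(s: float, mu: float = 255.0) -> float:
    # Eq. (6.9) with s_p = 1: r = sgn(s)/ln(1+mu) * ln(1 + mu*|s|).
    sgn = 1.0 if s > 0 else -1.0
    return sgn * math.log(1 + mu * abs(s)) / math.log(1 + mu)

def mu_law_inverse(r: float, mu: float = 255.0) -> float:
    # Invert Eq. (6.9) to recover s from r.
    sgn = 1.0 if r > 0 else -1.0
    return sgn * ((1 + mu) ** abs(r) - 1) / mu

# Map uniform steps in r back into s: steps near zero are much finer
# than steps near full scale.
quiet = mu_law_inverse(2 / 16) - mu_law_inverse(1 / 16)
loud = mu_law_inverse(16 / 16) - mu_law_inverse(15 / 16)
print(quiet < loud)                             # True
print(round(mu_law(mu_law_inverse(0.5)), 6))    # 0.5 (round trip)
```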
• The μ-law in audio is used to develop a nonuniform quantization rule for sound: uniform quantization of r gives finer resolution in s at the quiet end.

Audio Filtering

• Prior to sampling and AD conversion, the audio signal is also usually filtered to remove unwanted frequencies. The frequencies kept depend on the application:

(a) For speech, typically from 50 Hz to 10 kHz is retained, and other frequencies are blocked by the use of a band-pass filter that screens out lower and higher frequencies.

(b) An audio music signal will typically contain from about 20 Hz up to 20 kHz.

(c) Because of sampling and then quantization, the smooth input signal is replaced by a series of step functions containing all possible frequencies.

(d) So at the decoder side, a lowpass filter is used after the DA circuit.
• The uncompressed data rate increases as more bits are used for quantization. Stereo: double the bandwidth to transmit a digital audio signal.

Synthetic Sounds

1. FM (Frequency Modulation): one approach to generating synthetic sound.

2. Wave Table: in this technique, the actual digital samples of sounds from real instruments are stored. Since wave tables are stored in memory on the sound card, they can be manipulated by software so that sounds can be combined, edited, and enhanced.

Fig. 6.7: Frequency Modulation. (a): A single frequency. (b): Twice the frequency. (c): Usually, FM is carried out using a sinusoid argument to a sinusoid. (d): A more complex form arises from a carrier frequency, 2πt, and a modulating frequency, 4πt, cosine inside the sinusoid.
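The construction in Fig. 6.7(d), a modulating cosine placed inside the carrier sinusoid, can be sketched numerically. The specific arguments below are illustrative assumptions matching the caption, not the exact formula used to draw the figure:

```python
import math

def fm_sample(t: float) -> float:
    # A sinusoid argument to a sinusoid: carrier 2*pi*t with a
    # modulating cosine at 4*pi*t inside it, as in Fig. 6.7(d).
    return math.cos(2 * math.pi * t + math.cos(4 * math.pi * t))

# One second of waveform at an illustrative 1 kHz sampling rate.
wave = [fm_sample(n / 1000.0) for n in range(1000)]
print(len(wave), all(-1.0 <= v <= 1.0 for v in wave))   # 1000 True
```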
• MIDI Overview

(a) MIDI is a scripting language: it codes "events" that stand for the production of sounds.

(b) A MIDI recording made on one device can be played and manipulated on another; even a simple synthesizer can sound reasonably close.

MIDI Concepts

• MIDI channels are used to separate messages.

(a) There are 16 channels, numbered from 0 to 15. The channel forms the last 4 bits (the least significant bits) of the message.

(b) Usually a channel is associated with a particular instrument: e.g., channel 1 is the piano, channel 10 is the drums, etc.

(c) Nevertheless, one can switch instruments midstream, if desired, and associate another instrument with any channel.

• System messages

(a) Several other types of messages, e.g. a general message for all instruments indicating a change in tuning or timing.

(b) If the first 4 bits are all 1s, then the message is interpreted as a system common message.

• The way a synthetic musical instrument responds to a MIDI message is usually by simply ignoring any "play sound" message that is not for its channel.

– If several messages are for its channel, then the instrument responds, provided it is multi-voice, i.e., can play more than a single note at once.
• It is easy to confuse the term voice with the term timbre: the latter is MIDI terminology for just what instrument is trying to be emulated, e.g. a piano as opposed to a violin; it is the quality of the sound.

(a) An instrument (or sound card) that is multi-timbral is one that is capable of playing many different sounds at the same time, e.g., piano, brass, drums, etc.

(b) On the other hand, the term voice, while sometimes used by musicians to mean the same thing as timbre, is used in MIDI to mean every different timbre and pitch that the tone module can produce at the same time.

• Different timbres are produced digitally by using a patch: the set of control settings that define a particular timbre. Patches are often organized into databases, called banks.

• General MIDI: A standard mapping specifying what instruments (what patches) will be associated with what channels.

(a) In General MIDI, channel 10 is reserved for percussion instruments, and there are 128 patches associated with standard instruments.

(b) For most instruments, a typical message might be a Note On message (meaning, e.g., a keypress and release), consisting of what channel, what pitch, and what "velocity" (i.e., volume).

(c) A Note On message consists of a "status" byte, specifying the channel and the message type, followed by two data bytes (pitch and velocity). It is followed by a Note Off message, which also has a pitch (which note to turn off) and a velocity (often set to zero).
• The data in a MIDI status byte is between 128 and 255; each of the data bytes is between 0 and 127. Actual MIDI bytes are 10-bit, including a 0 start and 0 stop bit.

Fig. 6.8: Stream of 10-bit bytes; for typical MIDI messages, these consist of {Status byte, Data Byte, Data Byte} = {Note On, Note Number, Note Velocity}

• A MIDI device is often capable of programmability, and also can change the envelope describing how the amplitude of a sound changes over time.

Fig. 6.9: Stages of amplitude versus time for a music note
Hardware Aspects of MIDI

• The MIDI hardware setup consists of a 31.25 kbps serial connection. Usually, MIDI-capable units are either Input devices or Output devices, not both.

• The physical MIDI ports consist of 5-pin connectors for IN and OUT, as well as a third connector called THRU.

(a) MIDI communication is half-duplex.

(b) MIDI IN is the connector via which the device receives all MIDI data.

• A traditional synthesizer is shown in Fig. 6.10.

• A typical MIDI sequencer setup is shown in Fig. 6.11.

Fig. 6.11: A typical MIDI setup

Structure of MIDI Messages

• MIDI messages can be classified into two types, channel messages and system messages, as in Fig. 6.12.

Fig. 6.12: MIDI message taxonomy
A. Channel messages: a channel message can have up to 3 bytes.

a) The first byte is the status byte (the opcode, as it were); it has its most significant bit set to 1.

b) The 4 low-order bits identify which channel this message belongs to (for 16 possible channels).

c) The 3 remaining bits hold the message. For a data byte, the most significant bit is set to 0.

A.1. Voice messages:

a) This type of channel message controls a voice, i.e., sends information specifying which note to play or to turn off, and encodes key pressure.

b) Voice messages are also used to specify controller effects such as sustain, vibrato, tremolo, and the pitch wheel.

c) Table 6.3 lists these operations.

Table 6.3: MIDI voice messages

Voice Message       Status Byte  Data Byte1       Data Byte2
Note Off            &H8n         Key number       Note Off velocity
Note On             &H9n         Key number       Note On velocity
Poly. Key Pressure  &HAn         Key number       Amount
Control Change      &HBn         Controller num.  Controller value
Program Change      &HCn         Program number   None
Channel Pressure    &HDn         Pressure value   None
Pitch Bend          &HEn         MSB              LSB
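The byte layout described in a)–c) and Table 6.3 can be illustrated with a small sketch (the function names are ours, not part of MIDI):

```python
def note_on(channel: int, key: int, velocity: int) -> bytes:
    # Build a Note On voice message per Table 6.3: status byte &H9n
    # (high bit set, opcode in the next 3 bits, channel in the low 4 bits),
    # followed by two data bytes with their high bits clear.
    assert 0 <= channel <= 15 and 0 <= key <= 127 and 0 <= velocity <= 127
    return bytes([0x90 | channel, key, velocity])

def parse_status(status: int):
    # Split a status byte into (opcode bits, channel).
    assert status & 0x80, "status bytes have the most significant bit set"
    return (status >> 4) & 0x7, status & 0x0F

msg = note_on(channel=0, key=60, velocity=100)   # middle C on channel 0
print(msg.hex())             # 903c64
print(parse_status(msg[0]))  # (1, 0): opcode bits 001 == Note On (&H9n)
```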
A.2. Channel mode messages:

a) Channel mode messages are a special case of the Control Change message → opcode B (the message is &HBn, or 1011nnnn).

b) However, a Channel Mode message has its first data byte in 121 through 127 (&H79–7F).

c) Channel mode messages determine how an instrument processes MIDI voice messages: respond to all messages, respond just to the correct channel, don't respond at all, or go over to local control of the instrument.

d) The data bytes have meanings as shown in Table 6.4.

Table 6.4: MIDI mode messages

1st Data Byte  Description                   Meaning of 2nd Data Byte
&H79           Reset all controllers         None; set to 0
&H7A           Local control                 0 = off; 127 = on
&H7B           All notes off                 None; set to 0
&H7C           Omni mode off                 None; set to 0
&H7D           Omni mode on                  None; set to 0
&H7E           Mono mode on (Poly mode off)  Controller number
&H7F           Poly mode on (Mono mode off)  None; set to 0
B. System Messages:

a) System messages have no channel number; they are commands that are not channel specific, such as timing signals for synchronization, positioning information in pre-recorded MIDI sequences, and detailed setup information for the destination device.

b) Opcodes for all system messages start with &HF.

B.1. System common messages: relate to timing or positioning.

Table 6.5: MIDI System Common messages

System Common Message  Status Byte  Number of Data Bytes
MIDI Timing Code       &HF1         1
Song Position Pointer  &HF2         2
Song Select            &HF3         1
Tune Request           &HF6         None
EOX (terminator)       &HF7         None
B.2. System real-time messages: related to synchronization.

Table 6.6: MIDI System Real-Time messages

System Real-Time Message  Status Byte
Timing Clock              &HF8
Start Sequence            &HFA
Continue Sequence         &HFB
Stop Sequence             &HFC
Active Sensing            &HFE
System Reset              &HFF

B.3. System exclusive messages: included so that the MIDI standard can be extended by manufacturers.

a) After the initial code, a stream of any specific messages can be inserted that apply to their own product.

b) A System Exclusive message is supposed to be terminated by a terminator byte &HF7, as specified in Table 6.5.

c) The terminator is optional, and the data stream may simply be ended by sending the status byte of the next message.
General MIDI

a) General MIDI is a scheme for standardizing the assignment of instruments to patch numbers.

b) Where a "note" appears on a musical score determines what percussion instrument is being struck.

c) Other requirements for General MIDI compatibility: a MIDI device must support all 16 channels.

• General MIDI Level 2: An extended General MIDI has recently been defined, with a standard .smf "Standard MIDI File" format defined, including extra character information, such as karaoke lyrics.

MIDI to WAV Conversion

a) Some programs cannot use MIDI files directly; they insist on .wav format files.

b) These programs essentially consist of large lookup files that try to substitute pre-defined or shifted WAV output for MIDI messages, with inconsistent success.
6.3 Quantization and Transmission of Audio

• Coding of Audio: Quantization and transformation of data are collectively known as coding of the data.

a) Differences in signal values between the present and a past time can concentrate the histogram of pixel values (differences, now) into a much smaller range.

b) The result of reducing the variance of values is that lossless compression methods produce a bitstream with shorter bit lengths for more likely values (→ expanded discussion in Chap. 7).

• In general, producing quantized sampled output for audio is called PCM (Pulse Code Modulation). The differences version is called DPCM (a crude but efficient variant is called DM). The adaptive version is called ADPCM.
• Quantization consists of selecting breakpoints in magnitude, and then re-mapping any value within an interval to one of the representative output levels. (→ Repeat of Fig. 6.2: Sampling and Quantization, panels (a) and (b).)

a) The set of interval boundaries are called decision boundaries, and the representative values are called reconstruction levels.

b) The boundaries for quantizer input intervals that will all be mapped into the same output level form a coder mapping.

c) The representative values that are the output values from a quantizer are a decoder mapping.

d) Finally, we may wish to compress the data, by assigning a bit stream that uses fewer bits for the most prevalent signal values (Chap. 7).

• Every compression scheme has three stages:

A. The input data is transformed to a new representation that is easier or more efficient to compress.

B. We may introduce loss of information. Quantization is the main lossy step: we use a limited number of reconstruction levels, fewer than in the original signal.

C. Coding. Assign a codeword (thus forming a binary bitstream) to each output level or symbol. This could be a fixed-length code, or a variable length code such as Huffman coding (Chap. 7).
(b) With companding, we can reduce the sample size down to about 8 bits with the same perceived level of quality, and thus reduce the bit-rate to 160 kbps.

(c) However, the standard approach to telephony in fact assumes that the highest-frequency audio signal we want to reproduce is only about 4 kHz. Therefore the sampling rate is only 8 kHz, and the companded bit-rate thus reduces to 64 kbps.

• However, there are two small wrinkles we must also address:

1. Since only sounds up to 4 kHz are to be considered, all other frequency content must be noise, and should be removed from the analog input signal by a band-limiting filter before sampling.
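The bit-rate arithmetic in (b) and (c) is just sampling rate times sample size; a quick check (the 20 kHz rate behind the 160 kbps figure is the Nyquist rate for 10 kHz speech, per the filtering discussion above):

```python
bits_per_sample = 8                       # companded sample size

# (c) Standard telephony: 4 kHz highest frequency -> 8 kHz sampling rate.
telephony_rate = 8_000 * bits_per_sample
print(telephony_rate)                     # 64000 bps = 64 kbps

# (b) A 20 kHz sampling rate at 8 bits gives the 160 kbps figure.
print(20_000 * bits_per_sample)           # 160000 bps = 160 kbps
```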
2. A discontinuous signal contains not just frequency components due to the original signal, but also a theoretically infinite set of higher-frequency components.

(a) This result is from the theory of Fourier analysis, in signal processing.

(b) As a result of the low-pass filtering, the output becomes smoothed; Fig. 6.13(c) showed this effect.

• The complete scheme for encoding and decoding telephony signals is shown as a schematic in Fig. 6.14.
Differential Coding of Audio

• Audio is often stored not in simple PCM but instead in a form that exploits differences, which are generally smaller numbers and so offer the possibility of using fewer bits to store.

• The idea of forming differences is to make the histogram of sample values more peaked.

(a) For example, as an extreme case the histogram for a linear ramp signal that has constant slope is flat, whereas the histogram for the derivative of the signal (i.e., the differences, from sampling point to sampling point) consists of a spike at the slope value.

(b) If we then assign bit-string codewords to the values, we can use short codewords for prevalent values and long codewords for rarely occurring ones.

Predictive Coding

(a) Predictive coding consists of finding differences, and transmitting these using a PCM system. The simplest predictor takes the previous sample as being equal to the current sample; we send not the sample itself but the difference.

(b) Note that differences of integers will be integers. Denote the integer input signal as the set of values f_n. Then we predict values f̂_n as simply the previous value, and define the error e_n as the difference between the actual and the predicted signal:

f̂_n = f_{n−1}
e_n = f_n − f̂_n   (6.12)

(c) But it is often the case that some function of a few of the previous values provides a better prediction. Typically, a linear predictor function is used:

f̂_n = Σ_{k=1..(2 to 4)} a_{n−k} f_{n−k}   (6.13)
• One problem: suppose our integer sample values are in the range 0..255. Then differences could be as much as −255..255: we have increased our dynamic range (ratio of maximum to minimum) by a factor of two → we need more bits to transmit some differences.

(a) A solution: define two special codes, SU and SD, standing for Shift-Up and Shift-Down, and reserve some code values for them.

(b) Then we can use a limited code for only a limited range of differences, say only the range −15..16. Differences which lie in the limited range can be coded as is, but with the extra two values for SU, SD, a value outside the range −15..16 can be transmitted as a series of shifts, followed by a value that is indeed inside the range −15..16.

(c) For example, 100 is transmitted as: SU, SU, SU, 4, where (the codes for) SU and for 4 are what are transmitted (or stored).

• Lossless predictive coding: the decoder produces the same signals as the original. As a simple example, suppose we devise a predictor

f̂_n = ⌊(f_{n−1} + f_{n−2})/2⌋
e_n = f_n − f̂_n   (6.14)

• Let's consider an explicit example. Suppose we wish to code the sequence f1, f2, f3, f4, f5 = 21, 22, 27, 25, 22. For the purposes of the predictor, we'll invent an extra signal value f0, equal to f1 = 21, and first transmit this initial value, uncoded:

f̂2 = 21,  e2 = 22 − 21 = 1;
f̂3 = ⌊(f2 + f1)/2⌋ = ⌊(22 + 21)/2⌋ = 21,  e3 = 27 − 21 = 6;
f̂4 = ⌊(f3 + f2)/2⌋ = ⌊(27 + 22)/2⌋ = 24,  e4 = 25 − 24 = 1;
f̂5 = ⌊(f4 + f3)/2⌋ = ⌊(25 + 27)/2⌋ = 26,  e5 = 22 − 26 = −4   (6.15)

• The error does center around zero, we see, and coding (assigning bit-string codewords) will be efficient. Fig. 6.16 shows a typical schematic diagram used to encapsulate this type of system.
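The worked example of Eq. (6.15) can be reproduced directly, including the lossless decode:

```python
fs = [21, 22, 27, 25, 22]        # f1..f5 from the text
fs = [fs[0]] + fs                # prepend f0 = f1 = 21

# Predictor of Eq. (6.14): floor of the mean of the two previous values.
errors = [fs[n] - (fs[n - 1] + fs[n - 2]) // 2 for n in range(2, len(fs))]
print(errors)                    # e2..e5 = [1, 6, 1, -4], matching Eq. (6.15)

# Decoder: rebuild the signal from the transmitted f1 and the error stream.
rec = [fs[1], fs[1]]             # f0 and the uncoded first value f1
for e in errors:
    rec.append((rec[-1] + rec[-2]) // 2 + e)
print(rec[1:] == fs[1:])         # True: the reconstruction is lossless
```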
DPCM

• Differential PCM is exactly the same as Predictive Coding, except that it incorporates a quantizer step.

• DPCM: form the prediction; form an error e_n by subtracting the prediction from the actual signal; then quantize the error to a quantized version, ẽ_n. The set of equations that describe DPCM is as follows:

f̂_n = function_of(f̃_{n−1}, f̃_{n−2}, f̃_{n−3}, ...),
e_n = f_n − f̂_n,
ẽ_n = Q[e_n],
transmit codeword(ẽ_n),
reconstruct: f̃_n = f̂_n + ẽ_n   (6.16)

• Codewords for quantized error values ẽ_n are then produced using entropy coding, e.g. Huffman coding (Chapter 7).

• The main effect of the coder-decoder process is to produce reconstructed, quantized signal values f̃_n = f̂_n + ẽ_n.

• The distortion is the average squared error (1/N) Σ_{n=1..N} (f̃_n − f_n)²; one often plots distortion versus the number of bit levels used. A Lloyd-Max quantizer will do better (have less distortion) than a uniform quantizer.
• Determining the best set of quantizer steps is a least-squares problem: choose steps to minimize

Σ_{n=i..i+N−1} (f_n − Q[f_n])²   (6.17)

• This can be solved iteratively; the result is the Lloyd-Max quantizer.

• So typically one assigns quantization steps for a quantizer with nonuniform steps by assuming signal differences d_n are drawn from such a distribution, and then choosing steps to minimize

Σ_{n=i..i+N−1} (d_n − Q[d_n])² ℓ(d_n)   (6.19)

where ℓ(d_n) weights each difference by how likely it is.

• Notice that the quantization noise, f_n − f̃_n, is equal to the quantization effect on the error term, e_n − ẽ_n.

• As a concrete scheme, take the predictor

f̂_n = trunc[(f̃_{n−1} + f̃_{n−2})/2]

so that e_n = f_n − f̂_n is an integer, together with the quantizer and reconstruction

ẽ_n = Q[e_n] = 16 × trunc[(255 + e_n)/16] − 256 + 8
f̃_n = f̂_n + ẽ_n   (6.20)

Fig. 6.17: Schematic diagram for DPCM encoder and decoder
• Table 6.7 gives output values for any of the input codes: codes are mapped to 32 reconstruction levels for the error term. The quantizer simply divides the error range into 32 patches of about 16 levels each. It also makes the representative reconstructed value for each patch equal to the midway point for each group of 16 levels.

Table 6.7: DPCM quantizer reconstruction levels

e_n in range     Quantized to value
−255 .. −240     −248
−239 .. −224     −232
   .                .
   .                .
 −31 .. −16       −24
 −15 .. 0          −8
   1 .. 16          8
  17 .. 32         24
   .                .
   .                .
 225 .. 240       232
 241 .. 255       248
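Table 6.7 is just a tabulation of the quantizer formula; a direct transcription:

```python
def quantize_error(e: int) -> int:
    # Quantizer of Eq. (6.20): 32 patches of 16 levels over -255..255,
    # each reconstructed at the midway point of its patch.
    assert -255 <= e <= 255
    return 16 * ((255 + e) // 16) - 256 + 8

print(quantize_error(-255))   # -248  (patch -255..-240)
print(quantize_error(0))      # -8    (patch -15..0)
print(quantize_error(1))      # 8     (patch 1..16)
print(quantize_error(255))    # 248   (patch 241..255)
```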
• As an example stream of signal values, consider the set of values:

f1   f2   f3   f4   f5
130  150  140  200  230

• Prepend an extra value f = 130 to replicate the first value, f1. Initialize with quantized error ẽ1 ≡ 0, so that the first reconstructed value is exact: f̃1 = 130. Then the rest of the values calculated are as follows (the first column corresponds to the exactly transmitted f1):

f̂ = 130, 130, 142, 144, 167
e  =   0,  20,  −2,  56,  63
ẽ  =   0,  24,  −8,  56,  56
f̃  = 130, 154, 134, 200, 223

• On the decoder side, the decoder produces the same values, using the same prediction rule.

DM

• DM (Delta Modulation): a simplified version of DPCM. Often used as a quick AD converter.

1. Uniform-Delta DM: use only a single quantized error value, either positive or negative.

(a) ⇒ a 1-bit coder. Produces coded output that follows the original signal in a staircase fashion. The set of equations is:

f̂_n = f̃_{n−1},
e_n = f_n − f̂_n = f_n − f̃_{n−1},
ẽ_n = +k if e_n > 0, −k otherwise (where k is a constant),
f̃_n = f̂_n + ẽ_n.

Note that the prediction simply involves a delay.
(b) Consider actual numbers: suppose the signal values are

f1 = 10, f2 = 11, f3 = 13, f4 = 15

As well, define an exact reconstructed value f̃1 = f1 = 10, and use step size k = 4:

e2 = 11 − 10 = 1,   ẽ2 = 4,   f̃2 = 14;
e3 = 13 − 14 = −1,  ẽ3 = −4,  f̃3 = 10;
e4 = 15 − 10 = 5,   ẽ4 = 4,   f̃4 = 14

(c) The reconstructed set of values 10, 14, 10, 14 is close to the correct set 10, 11, 13, 15.

(d) However, DM copes less well with rapidly changing signals. One approach to mitigating this problem is to simply increase the sampling, perhaps to many times the Nyquist rate.

2. Adaptive DM: If the slope of the actual signal curve is high, the staircase approximation cannot keep up; the remedy is to change the step size k adaptively.

– One scheme for analytically determining the best set of quantizer steps, for a non-uniform quantizer, is Lloyd-Max.
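The Uniform-Delta DM equations with step k = 4 reproduce the numbers in this example; a minimal sketch (the function name is ours):

```python
def delta_modulate(signal, k=4):
    # 1-bit DM coder: prediction is just the previous reconstructed value,
    # and the quantized error is always +k or -k.
    recon = [signal[0]]                 # assume the first value is sent exactly
    bits = []
    for f in signal[1:]:
        e = f - recon[-1]
        step = k if e > 0 else -k
        bits.append(1 if step > 0 else 0)
        recon.append(recon[-1] + step)
    return bits, recon

signal = [10, 11, 13, 15]
bits, recon = delta_modulate(signal)
print(bits)    # [1, 0, 1]
print(recon)   # [10, 14, 10, 14], the staircase approximation from the text
```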
ADPCM

• ADPCM (Adaptive DPCM) takes the idea of adapting the coder to suit the input much farther. The two pieces that make up a DPCM coder are the quantizer and the predictor.

1. In Adaptive DM, we adapt the quantizer step size to suit the input. In DPCM, we can change the step size as well as the decision boundaries, using a non-uniform quantizer. We can carry this out in two ways:

(a) Forward adaptive quantization: use the properties of the input signal.

(b) Backward adaptive quantization: use the properties of the quantized output; if quantized errors become too large, change the non-uniform quantizer.

2. We can also adapt the predictor, again using forward or backward adaptation. Making the predictor coefficients adaptive is called Adaptive Predictive Coding (APC):

(a) Recall that the predictor is usually taken to be a linear function of previous reconstructed quantized values, f̃_n.

(b) The number of previous values used is called the "order" of the predictor. For example, if we use M previous values, we need M coefficients a_i, i = 1..M, in a predictor

f̂_n = Σ_{i=1..M} a_i f̃_{n−i}   (6.22)
• However, we can get into a difficult situation if we try to change the prediction coefficients that multiply previous quantized values, because that makes a complicated set of equations to solve for these coefficients:

(a) Suppose we decide to use a least-squares approach to solving a minimization, trying to find the best values of the a_i.

(b) Instead, one usually resorts to solving the simpler problem that results from using not f̃_n in the prediction, but simply the signal f_n itself.

• Fig. 6.18 shows a schematic diagram for the ADPCM coder and decoder:

Fig. 6.18: Schematic diagram for ADPCM encoder and decoder