Chapter 6 - Basics of Digital Audio

The document discusses the digitization of sound. It explains that sound is a wave phenomenon involving compressed and expanded air molecules. To digitize sound, it must be converted to a stream of numbers, preferably integers. This involves sampling the sound signal at regular time intervals to measure the amplitude, called time sampling. Typical audio sampling rates range from 8 kHz to 48 kHz, as determined by the Nyquist theorem. Digitizing sound in both time and amplitude allows it to be represented digitally.


www.jntuworld.com
Fundamentals of Multimedia, Chapter 6

Chapter 6: Basics of Digital Audio
Iran University of Science and Technology, Computer Engineering Department, Fall 2008 (1387)

6.1 Digitization of Sound
6.2 MIDI: Musical Instrument Digital Interface
6.3 Quantization and Transmission of Audio

6.1 Digitization of Sound

What is Sound?

• Sound is a wave phenomenon like light, but is macroscopic and involves molecules of air being compressed and expanded under the action of some physical device.

(a) For example, a speaker in an audio system vibrates back and forth and produces a longitudinal pressure wave that we perceive as sound.

(b) Since sound is a pressure wave, it takes on continuous values, as opposed to digitized ones.

2 Multimedia Systems (eadeli@iust.ac.ir)

(c) Even though such pressure waves are longitudinal, they still have ordinary wave properties and behaviors, such as reflection (bouncing), refraction (change of angle when entering a medium with a different density), and diffraction (bending around an obstacle).

(d) If we wish to use a digital version of sound waves, we must form digitized representations of audio information.

Digitization

• Digitization means conversion to a stream of numbers, and preferably these numbers should be integers for efficiency.

• Fig. 6.1 shows the 1-dimensional nature of sound: amplitude values depend on a 1D variable, time. (And note that images depend instead on a 2D set of variables, x and y.)


• The graph in Fig. 6.1 has to be made digital in both time and amplitude. To digitize, the signal must be sampled in each dimension: in time, and in amplitude.

(a) Sampling means measuring the quantity we are interested in, usually at evenly-spaced intervals.

(b) The first kind of sampling (using measurements only at evenly spaced time intervals) is simply called sampling. The rate at which it is performed is called the sampling frequency (see Fig. 6.2(a)).

(c) For audio, typical sampling rates are from 8 kHz (8,000 samples per second) to 48 kHz. This range is determined by the Nyquist theorem, discussed later.

(d) Sampling in the amplitude or voltage dimension is called quantization. Fig. 6.2(b) shows this kind of sampling.

Fig. 6.1: An analog signal: continuous measurement of pressure wave.

• Thus to decide how to digitize audio data we need to answer the following questions:

1. What is the sampling rate?

2. How finely is the data to be quantized, and is quantization uniform?

3. How is audio data formatted? (file format)

Fig. 6.2: Sampling and Quantization. (a): Sampling the analog signal in the time dimension. (b): Quantization is sampling the analog signal in the amplitude dimension.
Nyquist Theorem

• Signals can be decomposed into a sum of sinusoids. Fig. 6.3 shows how weighted sinusoids can build up quite a complex signal.

Fig. 6.3: Building up a complex signal by superposing sinusoids
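The superposition idea behind Fig. 6.3 can be sketched in a few lines of Python; the component amplitudes and frequencies below are illustrative choices, not values taken from the figure:

```python
import math

def superpose(components, t):
    """Sum weighted sinusoids at time t (seconds):
    each (A, f) pair contributes A * sin(2*pi*f*t)."""
    return sum(A * math.sin(2 * math.pi * f * t) for A, f in components)

# A "complex" signal built from three weighted sinusoids,
# sampled at 8 kHz for one second.
components = [(1.0, 440.0), (0.5, 880.0), (0.25, 1320.0)]
fs = 8000
signal = [superpose(components, n / fs) for n in range(fs)]
```

Uniform quantization of such a sample list is exactly the second sampling step (in amplitude) described above.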

• The Nyquist theorem states how frequently we must sample in time to be able to recover the original sound.

(a) Fig. 6.4(a) shows a single sinusoid: it is a single, pure, frequency (only electronic instruments can create such sounds).

(b) If the sampling rate just equals the actual frequency, Fig. 6.4(b) shows that a false signal is detected: it is simply a constant, with zero frequency.

(c) Now if we sample at 1.5 times the actual frequency, Fig. 6.4(c) shows that we obtain an incorrect (alias) frequency that is lower than the correct one: it is half the correct one (the wavelength, from peak to peak, is double that of the actual signal).

(d) Thus for correct sampling we must use a sampling rate equal to at least twice the maximum frequency content in the signal. This rate is called the Nyquist rate.

Fig. 6.4: Aliasing. (a): A single frequency. (b): Sampling at exactly the frequency produces a constant. (c): Sampling at 1.5 times per cycle produces an alias perceived frequency.


• Nyquist Theorem: If a signal is band-limited, i.e., there is a lower limit f1 and an upper limit f2 of frequency components in the signal, then the sampling rate should be at least 2(f2 − f1).

• Nyquist frequency: half of the Nyquist rate.

– Since it would be impossible to recover frequencies higher than the Nyquist frequency in any event, most systems have an antialiasing filter that restricts the frequency content in the input to the sampler to a range at or below the Nyquist frequency.

• The relationship among the Sampling Frequency, True Frequency, and the Alias Frequency is as follows:

f_alias = f_sampling − f_true,  for f_true < f_sampling < 2 × f_true   (6.1)

Aliasing in an Image Signal. (Left): Sampling without aliasing. (Right): Sampling with aliasing.
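Eq. (6.1) is easy to check numerically; a small sketch (the function name is ours):

```python
def alias_frequency(f_sampling, f_true):
    """Perceived (alias) frequency from Eq. (6.1),
    valid when f_true < f_sampling < 2 * f_true."""
    assert f_true < f_sampling < 2 * f_true
    return f_sampling - f_true

# Sampling a 1 kHz tone at 1.5 kHz (1.5x the true frequency)
# yields an alias at half the true frequency, as in Fig. 6.4(c).
print(alias_frequency(1500.0, 1000.0))  # 500.0
```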

Signal to Noise Ratio (SNR)

• The ratio of the power of the correct signal and the noise is called the signal to noise ratio (SNR): a measure of the quality of the signal.

• The SNR is usually measured in decibels (dB), where 1 dB is a tenth of a bel. The SNR value, in units of dB, is defined in terms of base-10 logarithms of squared voltages, as follows:

SNR = 10 log10 (V_signal^2 / V_noise^2) = 20 log10 (V_signal / V_noise)   (6.2)

(a) The power in a signal is proportional to the square of the voltage. For example, if the signal voltage V_signal is 10 times the noise, then the SNR is 20 log10(10) = 20 dB.

(b) In terms of power, if the power from ten violins is ten times that from one violin playing, then the ratio of power is 10 dB, or 1 B.
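Eq. (6.2) transcribes to one line; the example reproduces the 20 dB case from note (a):

```python
import math

def snr_db(v_signal, v_noise):
    """SNR in decibels from Eq. (6.2): 20 * log10(Vsignal / Vnoise)."""
    return 20 * math.log10(v_signal / v_noise)

# Signal voltage 10x the noise voltage -> 20 dB.
print(snr_db(10.0, 1.0))  # 20.0
```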


• The usual levels of sound we hear around us are described in terms of decibels, as a ratio to the quietest sound we are capable of hearing. Table 6.1 shows approximate levels for these sounds.

Table 6.1: Magnitude levels of common sounds, in decibels

Threshold of hearing      0
Rustle of leaves         10
Very quiet room          20
Average room             40
Conversation             60
Busy street              70
Loud radio               80
Train through station    90
Riveter                 100
Threshold of discomfort 120
Threshold of pain       140
Damage to ear drum      160

Signal to Quantization Noise Ratio (SQNR)

• Aside from any noise that may have been present in the original analog signal, there is also an additional error that results from quantization.

(a) If voltages are actually in 0 to 1 but we have only 8 bits in which to store values, then effectively we force all continuous values of voltage into only 256 different values.

(b) This introduces a roundoff error. It is not really "noise". Nevertheless it is called quantization noise (or quantization error).

• The quality of the quantization is characterized by the Signal to Quantization Noise Ratio (SQNR).

(a) Quantization noise: the difference between the actual value of the analog signal, for the particular sampling time, and the nearest quantization interval value.

(b) At most, this error can be as much as half of the interval.

(c) For a quantization accuracy of N bits per sample, the SQNR can be simply expressed:

SQNR = 20 log10 (V_signal / V_quan_noise) = 20 log10 (2^(N−1) / (1/2))
     = 20 N log10 2 = 6.02 N (dB)   (6.3)

• Notes:

(a) We map the maximum signal to 2^(N−1) − 1 (≈ 2^(N−1)) and the most negative signal to −2^(N−1).

(b) Eq. (6.3) is the Peak signal-to-noise ratio, PSQNR: peak signal and peak noise.
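The 6.02N rule of Eq. (6.3) can be computed directly; a sketch:

```python
import math

def sqnr_db(n_bits):
    """Peak SQNR from Eq. (6.3): 20 * log10(2**(N-1) / 0.5),
    which equals 20 * N * log10(2), about 6.02 dB per bit."""
    return 20 * math.log10(2 ** (n_bits - 1) / 0.5)

print(round(sqnr_db(8), 2))   # 48.16
print(round(sqnr_db(16), 2))  # 96.33
```

So each extra bit of quantization accuracy buys roughly 6 dB of peak SQNR.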


(c) The dynamic range is the ratio of maximum to minimum absolute values of the signal: Vmax/Vmin. The max abs. value Vmax gets mapped to 2^(N−1) − 1; the min abs. value Vmin gets mapped to 1. Vmin is the smallest positive voltage that is not masked by noise. The most negative signal, −Vmax, is mapped to −2^(N−1).

(d) The quantization interval is ΔV = (2 Vmax)/2^N, since there are 2^N intervals. The whole range Vmax down to (Vmax − ΔV/2) is mapped to 2^(N−1) − 1.

(e) The maximum noise, in terms of actual voltages, is half the quantization interval: ΔV/2 = Vmax/2^N.

Linear and Non-linear Quantization

• Linear format: samples are typically stored as uniformly quantized values.

• Non-uniform quantization: set up more finely-spaced levels where humans hear with the most acuity.

– Weber's Law, stated formally, says that equally perceived differences have values proportional to absolute levels:

ΔResponse ∝ ΔStimulus/Stimulus   (6.5)

– Inserting a constant of proportionality k, we have a differential equation that states:

dr = k (1/s) ds   (6.6)

with response r and stimulus s.

– Integrating, we arrive at a solution

r = k ln s + C   (6.7)

with constant of integration C. Stated differently, the solution is

r = k ln(s/s0)   (6.8)

where s0 is the lowest level of stimulus that causes a response (r = 0 when s = s0).

• Nonlinear quantization works by first transforming an analog signal from the raw s space into the theoretical r space, and then uniformly quantizing the resulting values.

• Such a law for audio is called μ-law encoding (or u-law). A very similar rule, called A-law, is used in telephony in Europe. The equations for these very similar encodings are as follows:

μ-law:

r = [sgn(s) / ln(1 + μ)] ln(1 + μ |s/s_p|),  for |s/s_p| ≤ 1   (6.9)

A-law:

r = [A / (1 + ln A)] (s/s_p),                  for |s/s_p| ≤ 1/A
r = [sgn(s) / (1 + ln A)] [1 + ln(A |s/s_p|)], for 1/A ≤ |s/s_p| ≤ 1   (6.10)

where sgn(s) = 1 if s > 0, and −1 otherwise.

• Fig. 6.6 shows these curves. The parameter μ is set to μ = 100 or μ = 255; the parameter A for the A-law encoder is usually set to A = 87.6.
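Eq. (6.9) and its inverse transcribe directly; a minimal sketch (the default parameters μ = 255 and s_p = 1 are illustrative choices, and the function names are ours):

```python
import math

def mu_law(s, sp=1.0, mu=255.0):
    """mu-law compression of Eq. (6.9): sgn(s) * ln(1 + mu*|s/sp|) / ln(1 + mu)."""
    x = s / sp
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_inverse(r, sp=1.0, mu=255.0):
    """Invert Eq. (6.9): sgn(r) * sp * ((1 + mu)**|r| - 1) / mu."""
    return math.copysign(sp * math.expm1(abs(r) * math.log1p(mu)) / mu, r)

# Quiet samples get expanded resolution: a sample at 1% of full
# scale is mapped to about 23% of r's range.
print(round(mu_law(0.01), 3))  # 0.228
```

Uniformly quantizing r then gives finer effective resolution in s at the quiet end, which is the whole point of companding.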


Fig. 6.6: Nonlinear transform for audio signals

• The μ-law in audio is used to develop a nonuniform quantization rule for sound: uniform quantization of r gives finer resolution in s at the quiet end.

Audio Filtering

• Prior to sampling and AD conversion, the audio signal is also usually filtered to remove unwanted frequencies. The frequencies kept depend on the application:

(a) For speech, typically from 50 Hz to 10 kHz is retained, and other frequencies are blocked by the use of a band-pass filter that screens out lower and higher frequencies.

(b) An audio music signal will typically contain from about 20 Hz up to 20 kHz.

(c) At the DA converter end, high frequencies may reappear in the output, because of sampling and then quantization: the smooth input signal is replaced by a series of step functions containing all possible frequencies.

(d) So at the decoder side, a lowpass filter is used after the DA circuit.

Audio Quality vs. Data Rate

• The uncompressed data rate increases as more bits are used for quantization. Stereo: double the bandwidth, to transmit a digital audio signal.

Table 6.2: Data rate and bandwidth in sample audio applications

Quality    | Sample Rate (kHz) | Bits per Sample | Mono / Stereo | Data Rate (uncompressed, kB/sec) | Frequency Band (kHz)
Telephone  | 8         | 8        | Mono       | 8           | 0.200–3.4
AM Radio   | 11.025    | 8        | Mono       | 11.0        | 0.1–5.5
FM Radio   | 22.05     | 16       | Stereo     | 88.2        | 0.02–11
CD         | 44.1      | 16       | Stereo     | 176.4       | 0.005–20
DAT        | 48        | 16       | Stereo     | 192.0       | 0.005–20
DVD Audio  | 192 (max) | 24 (max) | 6 channels | 1,200 (max) | 0–96 (max)

Synthetic Sounds

1. FM (Frequency Modulation): one approach to generating synthetic sound:

x(t) = A(t) cos[ω_c π t + I(t) cos(ω_m π t + φ_m) + φ_c]   (6.11)
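The data-rate column of Table 6.2 is just samples/sec × bytes per sample × channels; a sketch reproducing a few rows (the helper name is ours):

```python
def data_rate_kBps(sample_rate_khz, bits_per_sample, channels):
    """Uncompressed data rate in kB/sec:
    (samples/sec) * (bytes/sample) * channels, divided by 1000."""
    return sample_rate_khz * 1000 * (bits_per_sample / 8) * channels / 1000

print(data_rate_kBps(8, 8, 1))                # 8.0 (telephone, mono)
print(round(data_rate_kBps(44.1, 16, 2), 1))  # 176.4 (CD, stereo)
print(data_rate_kBps(48, 16, 2))              # 192.0 (DAT, stereo)
```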


Fig. 6.7: Frequency Modulation. (a): A single frequency. (b): Twice the frequency. (c): Usually, FM is carried out using a sinusoid argument to a sinusoid. (d): A more complex form arises from a carrier frequency, 2πt, and a modulating frequency, 4πt, cosine inside the sinusoid.

2. Wave Table synthesis: A more accurate way of generating sounds from digital signals. Also known, simply, as sampling.

In this technique, the actual digital samples of sounds from real instruments are stored. Since wave tables are stored in memory on the sound card, they can be manipulated by software so that sounds can be combined, edited, and enhanced.
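Eq. (6.11) can be transcribed directly; this sketch holds A(t) and I(t) constant for simplicity, and uses the carrier/modulator pair from Fig. 6.7(d) (the modulation index I = 2 is an illustrative value):

```python
import math

def fm_sample(t, A=1.0, I=2.0, wc=2.0, wm=4.0, phi_m=0.0, phi_c=0.0):
    """One sample of the FM signal of Eq. (6.11):
    x(t) = A(t) cos[wc*pi*t + I(t)*cos(wm*pi*t + phi_m) + phi_c],
    with A(t) and I(t) held constant here."""
    return A * math.cos(wc * math.pi * t
                        + I * math.cos(wm * math.pi * t + phi_m)
                        + phi_c)

# 1000 samples over one second: carrier 2*pi*t, modulator 4*pi*t.
signal = [fm_sample(n / 1000) for n in range(1000)]
```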

6.2 MIDI: Musical Instrument Digital Interface

• Use the sound card's defaults for sounds: ⇒ use a simple scripting language and hardware setup called MIDI.

• MIDI Overview

(a) MIDI is a scripting language: it codes "events" that stand for the production of sounds. E.g., a MIDI event might include values for the pitch of a single note, its duration, and its volume.

(b) MIDI is a standard adopted by the electronic music industry for controlling devices, such as synthesizers and sound cards, that produce music.

(c) The MIDI standard is supported by most synthesizers, so sounds created on one synthesizer can be played and manipulated on another synthesizer and sound reasonably close.

(d) Computers must have a special MIDI interface, but this is incorporated into most sound cards. The sound card must also have both D/A and A/D converters.


MIDI Concepts

• MIDI channels are used to separate messages.

(a) There are 16 channels, numbered from 0 to 15. The channel forms the last 4 bits (the least significant bits) of the message.

(b) Usually a channel is associated with a particular instrument: e.g., channel 1 is the piano, channel 10 is the drums, etc.

(c) Nevertheless, one can switch instruments midstream, if desired, and associate another instrument with any channel.

• System messages

(a) Several other types of messages exist, e.g. a general message for all instruments indicating a change in tuning or timing.

(b) If the first 4 bits are all 1s, then the message is interpreted as a system common message.

• The way a synthetic musical instrument responds to a MIDI message is usually by simply ignoring any play sound message that is not for its channel.

– If several messages are for its channel, then the instrument responds, provided it is multi-voice, i.e., can play more than a single note at once.

• It is easy to confuse the term voice with the term timbre: the latter is MIDI terminology for just what instrument is being emulated, e.g. a piano as opposed to a violin; it is the quality of the sound.

(a) An instrument (or sound card) that is multi-timbral is one that is capable of playing many different sounds at the same time, e.g., piano, brass, drums, etc.

(b) On the other hand, the term voice, while sometimes used by musicians to mean the same thing as timbre, is used in MIDI to mean every different timbre and pitch that the tone module can produce at the same time.

• Different timbres are produced digitally by using a patch: the set of control settings that define a particular timbre. Patches are often organized into databases, called banks.

• General MIDI: A standard mapping specifying what instruments (what patches) will be associated with what channels.

(a) In General MIDI, channel 10 is reserved for percussion instruments, and there are 128 patches associated with standard instruments.

(b) For most instruments, a typical message might be a Note On message (meaning, e.g., a keypress and release), consisting of what channel, what pitch, and what "velocity" (i.e., volume).

(c) A Note On message consists of a "status" byte (which channel, what pitch) followed by two data bytes. It is followed by a Note Off message, which also has a pitch (which note to turn off) and a velocity (often set to zero).


• The data in a MIDI status byte is between 128 and 255; each of the data bytes is between 0 and 127. Actual MIDI bytes are 10-bit, including a 0 start and 0 stop bit.

Fig. 6.8: Stream of 10-bit bytes; for typical MIDI messages, these consist of {Status byte, Data Byte, Data Byte} = {Note On, Note Number, Note Velocity}

• A MIDI device is often capable of programmability, and also can change the envelope describing how the amplitude of a sound changes over time.

• Fig. 6.9 shows a model of the response of a digital instrument to a Note On message:

Fig. 6.9: Stages of amplitude versus time for a music note
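The three-byte {Status, Note Number, Velocity} layout of Fig. 6.8 can be sketched directly (the &H9n Note On opcode is from Table 6.3; the helper name is ours):

```python
def note_on(channel, key, velocity):
    """Build a 3-byte MIDI Note On message: status byte &H9n
    (high nibble 9, low nibble = channel), then two data bytes."""
    assert 0 <= channel <= 15 and 0 <= key <= 127 and 0 <= velocity <= 127
    return bytes([0x90 | channel, key, velocity])

# Note On for middle C (key 60) at velocity 100 on channel 0.
msg = note_on(0, 60, 100)
print(msg.hex())  # 903c64
```

The start and stop bits mentioned above are added by the serial hardware, not by software, so they do not appear in these three bytes.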
Hardware Aspects of MIDI

• The MIDI hardware setup consists of a 31.25 kbps serial connection. Usually, MIDI-capable units are either Input devices or Output devices, not both.

• A traditional synthesizer is shown in Fig. 6.10:

Fig. 6.10: A MIDI synthesizer

• The physical MIDI ports consist of 5-pin connectors for IN and OUT, as well as a third connector called THRU.

(a) MIDI communication is half-duplex.

(b) MIDI IN is the connector via which the device receives all MIDI data.

(c) MIDI OUT is the connector through which the device transmits all the MIDI data it generates itself.

(d) MIDI THRU is the connector by which the device echoes the data it receives from MIDI IN. Note that it is only the MIDI IN data that is echoed by MIDI THRU; all the data generated by the device itself is sent via MIDI OUT.


• A typical MIDI sequencer setup is shown in Fig. 6.11:

Fig. 6.11: A typical MIDI setup

Structure of MIDI Messages

• MIDI messages can be classified into two types, channel messages and system messages, as in Fig. 6.12:

Fig. 6.12: MIDI message taxonomy
• A. Channel messages: can have up to 3 bytes:

a) The first byte is the status byte (the opcode, as it were); it has its most significant bit set to 1.

b) The 4 low-order bits identify which channel this message belongs to (for 16 possible channels).

c) The 3 remaining bits hold the message. For a data byte, the most significant bit is set to 0.

• A.1. Voice messages:

a) This type of channel message controls a voice, i.e., sends information specifying which note to play or to turn off, and encodes key pressure.

b) Voice messages are also used to specify controller effects such as sustain, vibrato, tremolo, and the pitch wheel.

c) Table 6.3 lists these operations.

Table 6.3: MIDI voice messages

Voice Message      | Status Byte | Data Byte1      | Data Byte2
Note Off           | &H8n        | Key number      | Note Off velocity
Note On            | &H9n        | Key number      | Note On velocity
Poly. Key Pressure | &HAn        | Key number      | Amount
Control Change     | &HBn        | Controller num. | Controller value
Program Change     | &HCn        | Program number  | None
Channel Pressure   | &HDn        | Pressure value  | None
Pitch Bend         | &HEn        | MSB             | LSB

(** &H indicates hexadecimal, and 'n' in the status byte hex value stands for a channel number. All values are in 0..127 except Controller number, which is in 0..120)
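Decoding a status byte follows directly from the layout above (opcode in the high nibble, channel in the low nibble); a small sketch, with the opcode names taken from Table 6.3:

```python
VOICE_OPCODES = {
    0x8: "Note Off", 0x9: "Note On", 0xA: "Poly. Key Pressure",
    0xB: "Control Change", 0xC: "Program Change",
    0xD: "Channel Pressure", 0xE: "Pitch Bend",
}

def parse_status(status):
    """Split a channel-message status byte into (message name, channel)."""
    assert status & 0x80, "status bytes have the most significant bit set"
    return VOICE_OPCODES[status >> 4], status & 0x0F

print(parse_status(0x93))  # ('Note On', 3)
```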


• A.2. Channel mode messages:

a) Channel mode messages are a special case of the Control Change message → opcode B (the message is &HBn, or 1011nnnn).

b) However, a Channel Mode message has its first data byte in 121 through 127 (&H79–7F).

c) Channel mode messages determine how an instrument processes MIDI voice messages: respond to all messages, respond just to the correct channel, don't respond at all, or go over to local control of the instrument.

d) The data bytes have meanings as shown in Table 6.4.

Table 6.4: MIDI mode messages

1st Data Byte | Description                  | Meaning of 2nd Data Byte
&H79          | Reset all controllers        | None; set to 0
&H7A          | Local control                | 0 = off; 127 = on
&H7B          | All notes off                | None; set to 0
&H7C          | Omni mode off                | None; set to 0
&H7D          | Omni mode on                 | None; set to 0
&H7E          | Mono mode on (Poly mode off) | Controller number
&H7F          | Poly mode on (Mono mode off) | None; set to 0

• B. System Messages:

a) System messages have no channel number; they are commands that are not channel specific, such as timing signals for synchronization, positioning information in pre-recorded MIDI sequences, and detailed setup information for the destination device.

b) Opcodes for all system messages start with &HF.

c) System messages are divided into three classifications, according to their use:

B.1. System common messages: relate to timing or positioning.

Table 6.5: MIDI System Common messages

System Common Message | Status Byte | Number of Data Bytes
MIDI Timing Code      | &HF1        | 1
Song Position Pointer | &HF2        | 2
Song Select           | &HF3        | 1
Tune Request          | &HF6        | None
EOX (terminator)      | &HF7        | None


B.2. System real-time messages: related to synchronization.

Table 6.6: MIDI System Real-Time messages

System Real-Time Message | Status Byte
Timing Clock             | &HF8
Start Sequence           | &HFA
Continue Sequence        | &HFB
Stop Sequence            | &HFC
Active Sensing           | &HFE
System Reset             | &HFF

B.3. System exclusive message: included so that the MIDI standard can be extended by manufacturers.

a) After the initial code, a stream of any specific messages can be inserted that apply to their own product.

b) A System Exclusive message is supposed to be terminated by a terminator byte &HF7, as specified in Table 6.5.

c) The terminator is optional and the data stream may simply be ended by sending the status byte of the next message.

General MIDI

• General MIDI is a scheme for standardizing the assignment of instruments to patch numbers.

a) A standard percussion map specifies 47 percussion sounds.

b) Where a "note" appears on a musical score determines what percussion instrument is being struck: a bongo drum, a cymbal.

c) Other requirements for General MIDI compatibility: the MIDI device must support all 16 channels; a device must be multitimbral (i.e., each channel can play a different instrument/program); a device must be polyphonic (i.e., each channel is able to play many voices); and there must be a minimum of 24 dynamically allocated voices.

• General MIDI Level 2: An extended General MIDI has recently been defined, with a standard .smf "Standard MIDI File" format defined, including extra character information, such as karaoke lyrics.

MIDI to WAV Conversion

• Some programs, such as early versions of Premiere, cannot include .mid files; instead, they insist on .wav format files.

a) Various shareware programs exist for approximating a reasonable conversion between MIDI and WAV formats.

b) These programs essentially consist of large lookup files that try to substitute pre-defined or shifted WAV output for MIDI messages, with inconsistent success.


6.3 Quantization and Transmission of Audio

• Coding of Audio: Quantization and transformation of data are collectively known as coding of the data.

a) For audio, the μ-law technique for companding audio signals is usually combined with an algorithm that exploits the temporal redundancy present in audio signals.

b) Differences in signals between the present and a past time can reduce the size of signal values and also concentrate the histogram of pixel values (differences, now) into a much smaller range.

c) The result of reducing the variance of values is that lossless compression methods produce a bitstream with shorter bit lengths for more likely values (→ expanded discussion in Chap. 7).

• In general, producing quantized sampled output for audio is called PCM (Pulse Code Modulation). The differences version is called DPCM (and a crude but efficient variant is called DM). The adaptive version is called ADPCM.
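The variance-reduction point in (b) and (c) is easy to see numerically; a sketch with made-up sample values:

```python
samples = [100, 110, 119, 127, 133, 138, 141, 142]

# Difference signal: first sample kept, then sample-to-sample differences.
diffs = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

print(max(samples) - min(samples))      # 42 (range of raw values)
print(max(diffs[1:]) - min(diffs[1:]))  # 9  (range of differences)

# The original is recovered losslessly by summing the differences back up.
recovered, acc = [], 0
for d in diffs:
    acc += d
    recovered.append(acc)
assert recovered == samples
```

The smaller, more peaked histogram of the differences is what lets an entropy coder (Chap. 7) assign shorter codewords to the common values.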

Pulse Code Modulation

• The basic techniques for creating digital signals from analog signals are sampling and quantization.

• Quantization consists of selecting breakpoints in magnitude, and then re-mapping any value within an interval to one of the representative output levels. (Repeat of Fig. 6.2.)

Fig. 6.2: Sampling and Quantization. (a): Sampling the analog signal in the time dimension. (b): Quantization is sampling the analog signal in the amplitude dimension.
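A minimal uniform quantizer, sketching the breakpoint/re-mapping idea; the 3-bit size and the [-1, 1] range are illustrative choices:

```python
def quantize(v, n_bits=3, v_max=1.0):
    """Map v in [-v_max, v_max] to the midpoint of its interval,
    using 2**n_bits uniform intervals of width 2*v_max / 2**n_bits."""
    levels = 2 ** n_bits
    step = 2 * v_max / levels
    idx = min(int((v + v_max) / step), levels - 1)  # clamp the top edge
    return -v_max + (idx + 0.5) * step              # reconstruction level

print(quantize(0.2))   # 0.125
print(quantize(-0.9))  # -0.875
```

Note the maximum error is step/2, matching point (b) above (half of the quantization interval).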


a) The set of interval boundaries are called decision boundaries, and the representative values are called reconstruction levels.

b) The boundaries for quantizer input intervals that will all be mapped into the same output level form a coder mapping.

c) The representative values that are the output values from a quantizer are a decoder mapping.

d) Finally, we may wish to compress the data, by assigning a bit stream that uses fewer bits for the most prevalent signal values (Chap. 7).

• Every compression scheme has three stages:

A. The input data is transformed to a new representation that is easier or more efficient to compress.

B. We may introduce loss of information. Quantization is the main lossy step: we use a limited number of reconstruction levels, fewer than in the original signal.

C. Coding. Assign a codeword (thus forming a binary bitstream) to each output level or symbol. This could be a fixed-length code, or a variable-length code such as Huffman coding (Chap. 7).
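The coder/decoder mappings above can be sketched in a few lines of Python (the function name, the 8-level setup, and the input range are illustrative choices, not from the text): decision boundaries split the range into equal cells, and each cell is re-mapped to its midpoint reconstruction level.

```python
def uniform_quantize(x, lo=-1.0, hi=1.0, levels=8):
    # Decision boundaries partition [lo, hi) into `levels` equal cells (coder
    # mapping); each cell re-maps to its midpoint reconstruction level (decoder
    # mapping). Out-of-range input is clamped to the nearest end cell.
    width = (hi - lo) / levels
    cell = min(max(int((x - lo) // width), 0), levels - 1)
    return lo + (cell + 0.5) * width

print(uniform_quantize(0.1))   # 0.125, the midpoint of the cell [0.0, 0.25)
```

Every input in [0.0, 0.25) maps to the same output level 0.125, which is exactly the many-to-one re-mapping described above.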

• For audio signals, we first consider PCM for digitization. This leads to Lossless Predictive Coding as well as the DPCM scheme; both methods use differential coding. As well, we look at the adaptive version, ADPCM, which can provide better compression.

PCM in Speech Compression

• Assuming a bandwidth for speech from about 50 Hz to about 10 kHz, the Nyquist theorem would dictate a sampling rate of 20 kHz.

(a) Using uniform quantization without companding, the minimum sample size we could get away with would likely be about 12 bits. Hence for mono speech transmission the bit-rate would be 240 kbps.

(b) With companding, we can reduce the sample size down to about 8 bits with the same perceived level of quality, and thus reduce the bit-rate to 160 kbps.

(c) However, the standard approach to telephony in fact assumes that the highest-frequency audio signal we want to reproduce is only about 4 kHz. Therefore the sampling rate is only 8 kHz, and the companded bit-rate thus reduces to 64 kbps.
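The bit-rates above are simply sampling rate times bits per sample; a quick plain-Python check using the slide's numbers:

```python
# Nyquist: sample at twice the highest frequency to be reproduced.
rate_full = 2 * 10_000   # 20 kHz sampling for a 50 Hz .. 10 kHz speech band
rate_tel  = 2 * 4_000    # 8 kHz sampling for 4 kHz telephony bandwidth

bps_uniform   = rate_full * 12   # 12-bit uniform quantization
bps_companded = rate_full * 8    # 8-bit companded samples
bps_telephony = rate_tel  * 8    # standard telephony

print(bps_uniform, bps_companded, bps_telephony)   # 240000 160000 64000
```

These reproduce the 240 kbps, 160 kbps, and 64 kbps figures quoted in (a), (b), and (c).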


• However, there are two small wrinkles we must also address:

1. Since only sounds up to 4 kHz are to be considered, all other frequency content must be noise. Therefore, we should remove this high-frequency content from the analog input signal. This is done using a band-limiting filter that blocks out high, as well as very low, frequencies.

– Also, once we arrive at a pulse signal, such as that in Fig. 6.13(a) below, we must still perform D/A conversion and then construct a final output analog signal. But, effectively, the signal we arrive at is the staircase shown in Fig. 6.13(b).

Fig. 6.13: Pulse Code Modulation (PCM). (a) Original analog signal and its corresponding PCM signals. (b) Decoded staircase signal. (c) Reconstructed signal after low-pass filtering.

2. A discontinuous signal contains not just frequency components due to the original signal, but also a theoretically infinite set of higher-frequency components:

(a) This result is from the theory of Fourier analysis, in signal processing.

(b) These higher frequencies are extraneous.

(c) Therefore the output of the digital-to-analog converter goes to a low-pass filter that allows only frequencies up to the original maximum to be retained.

• The complete scheme for encoding and decoding telephony signals is shown as a schematic in Fig. 6.14. As a result of the low-pass filtering, the output becomes smoothed; Fig. 6.13(c) above showed this effect.

Fig. 6.14: PCM signal encoding and decoding.


Differential Coding of Audio

• Audio is often stored not in simple PCM but instead in a form that exploits differences, which are generally smaller numbers, so offer the possibility of using fewer bits to store.

(a) If a time-dependent signal has some consistency over time ("temporal redundancy"), the difference signal, subtracting the current sample from the previous one, will have a more peaked histogram, with a maximum around zero.

(b) For example, as an extreme case the histogram for a linear ramp signal that has constant slope is flat, whereas the histogram for the derivative of the signal (i.e., the differences, from sampling point to sampling point) consists of a spike at the slope value.

(c) So if we then go on to assign bit-string codewords to differences, we can assign short codes to prevalent values and long codewords to rarely occurring ones.
Lossless Predictive Coding

• Predictive coding: simply means transmitting differences; predict the next sample as being equal to the current sample, and send not the sample itself but the difference between previous and next.

(a) Predictive coding consists of finding differences, and transmitting these using a PCM system.

(b) Note that differences of integers will be integers. Denote the integer input signal as the set of values f_n. Then we predict values f̂_n as simply the previous value, and define the error e_n as the difference between the actual and the predicted signal:

f̂_n = f_{n−1},
e_n = f_n − f̂_n.    (6.12)

(c) But it is often the case that some function of a few of the previous values, f_{n−1}, f_{n−2}, f_{n−3}, etc., provides a better prediction. Typically, a linear predictor function is used:

f̂_n = Σ_{k=1}^{2 to 4} a_{n−k} f_{n−k}    (6.13)


• The idea of forming differences is to make the histogram of sample values more peaked.

(a) For example, Fig. 6.15(a) plots 1 second of sampled speech at 8 kHz, with magnitude resolution of 8 bits per sample.

(b) A histogram of these values is actually centered around zero, as in Fig. 6.15(b).

(c) Fig. 6.15(c) shows the histogram for corresponding speech signal differences: difference values are much more clustered around zero than are sample values themselves.

(d) As a result, a method that assigns short codewords to frequently occurring symbols will assign a short code to zero and do rather well: such a coding scheme will much more efficiently code sample differences than samples themselves.

Fig. 6.15: Differencing concentrates the histogram. (a): Digital speech signal. (b): Histogram of digital speech signal values. (c): Histogram of digital speech signal differences.

• One problem: suppose our integer sample values are in the range 0..255. Then differences could be as much as −255..255; we've increased our dynamic range (ratio of maximum to minimum) by a factor of two → need more bits to transmit some differences.

(a) A clever solution for this: define two new codes, denoted SU and SD, standing for Shift-Up and Shift-Down. Some special code values will be reserved for these.

(b) Then we can use codewords for only a limited set of signal differences, say only the range −15..16. Differences which lie in the limited range can be coded as is, but with the extra two values for SU, SD, a value outside the range −15..16 can be transmitted as a series of shifts, followed by a value that is indeed inside the range −15..16.

(c) For example, 100 is transmitted as: SU, SU, SU, 4, where (the codes for) SU and for 4 are what are transmitted (or stored).

• Lossless predictive coding: the decoder produces the same signals as the original. As a simple example, suppose we devise a predictor for f̂_n as follows:

f̂_n = ⌊(f_{n−1} + f_{n−2})/2⌋,
e_n = f_n − f̂_n.    (6.14)
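The shift-code idea can be sketched as follows (the function name is ours; the window width of 32 follows from the size of the −15..16 range): any difference outside the range is emitted as repeated shifts plus an in-range remainder.

```python
LO, HI = -15, 16
SHIFT = HI - LO + 1   # 32: each SU/SD code moves the value by one full window

def encode_diff(d):
    # Emit SU/SD markers until the remainder falls inside LO..HI.
    codes = []
    while d > HI:
        codes.append('SU'); d -= SHIFT
    while d < LO:
        codes.append('SD'); d += SHIFT
    codes.append(d)
    return codes

print(encode_diff(100))   # ['SU', 'SU', 'SU', 4], matching the slide's example
```

Decoding reverses this: each SU adds 32, each SD subtracts 32, and the final in-range value completes the sum (3 × 32 + 4 = 100).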


• Let's consider an explicit example. Suppose we wish to code the sequence f1, f2, f3, f4, f5 = 21, 22, 27, 25, 22. For the purposes of the predictor, we'll invent an extra signal value f0, equal to f1 = 21, and first transmit this initial value, uncoded:

f̂_2 = 21,  e_2 = 22 − 21 = 1;
f̂_3 = ⌊(f_2 + f_1)/2⌋ = ⌊(22 + 21)/2⌋ = 21,  e_3 = 27 − 21 = 6;
f̂_4 = ⌊(f_3 + f_2)/2⌋ = ⌊(27 + 22)/2⌋ = 24,  e_4 = 25 − 24 = 1;
f̂_5 = ⌊(f_4 + f_3)/2⌋ = ⌊(25 + 27)/2⌋ = 26,  e_5 = 22 − 26 = −4.    (6.15)

• The error does center around zero, we see, and coding (assigning bit-string codewords) will be efficient. Fig. 6.16 shows a typical schematic diagram used to encapsulate this type of system:
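The worked example above can be reproduced with a short script (a sketch; the function names are ours). Since the scheme is lossless, the decoder recovers the samples exactly:

```python
def predict(p1, p2):
    return (p1 + p2) // 2   # Eq. (6.14): floor of the two-previous-sample mean

def encode(samples):
    f = [samples[0]] + list(samples)   # invent f0 = f1; transmit f1 uncoded
    errors = [f[n] - predict(f[n - 1], f[n - 2]) for n in range(2, len(f))]
    return samples[0], errors

def decode(first, errors):
    f = [first, first]                 # same invented f0 on the decoder side
    for e in errors:
        f.append(predict(f[-1], f[-2]) + e)   # prediction + transmitted error
    return f[1:]

first, e = encode([21, 22, 27, 25, 22])
print(e)                   # [1, 6, 1, -4], as in Eq. (6.15)
print(decode(first, e))    # [21, 22, 27, 25, 22]: exact recovery
```

Note that the decoder uses exactly the same prediction rule as the encoder, which is what makes the reconstruction exact.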

Fig. 6.16: Schematic diagram for Predictive Coding encoder and decoder.

DPCM

• Differential PCM is exactly the same as Predictive Coding, except that it incorporates a quantizer step.

(a) One scheme for analytically determining the best set of quantizer steps, for a non-uniform quantizer, is the Lloyd-Max quantizer, which is based on a least-squares minimization of the error term.

(b) Our nomenclature: signal values: f_n, the original signal; f̂_n, the predicted signal; and f̃_n, the quantized, reconstructed signal.
(c) DPCM: form the prediction; form an error e_n by subtracting the prediction from the actual signal; then quantize the error to a quantized version, ẽ_n.

The set of equations that describe DPCM are as follows:

f̂_n = function_of(f̃_{n−1}, f̃_{n−2}, f̃_{n−3}, ...),
e_n = f_n − f̂_n,
ẽ_n = Q[e_n],
transmit codeword(ẽ_n),
reconstruct: f̃_n = f̂_n + ẽ_n.    (6.16)

Then codewords for quantized error values ẽ_n are produced using entropy coding, e.g. Huffman coding (Chapter 7).

(d) The main effect of the coder-decoder process is to produce reconstructed, quantized signal values f̃_n = f̂_n + ẽ_n.

The distortion is the average squared error [Σ_{n=1}^{N} (f̃_n − f_n)²]/N; one often plots distortion versus the number of bit-levels used. A Lloyd-Max quantizer will do better (have less distortion) than a uniform quantizer.

• For speech, we could modify quantization steps adaptively by estimating the mean and variance of a patch of signal values, and shifting quantization steps accordingly, for every block of signal values. That is, starting at time i we could take a block of N values f_n and try to minimize the quantization error:

min Σ_{n=i}^{i+N−1} (f_n − Q[f_n])²    (6.17)

• Since signal differences are very peaked, we could model them using a Laplacian probability distribution function, which is strongly peaked at zero: it looks like

l(x) = (1/√(2σ²)) exp(−√2 |x| / σ)    (6.18)

for variance σ².

• So typically one assigns quantization steps for a quantizer with nonuniform steps by assuming signal differences d_n are drawn from such a distribution, and then choosing steps to minimize

min Σ_{n=i}^{i+N−1} (d_n − Q[d_n])² l(d_n)    (6.19)


• This is a least-squares problem, and can be solved iteratively: the Lloyd-Max quantizer.

• Schematic diagram for DPCM:

Fig. 6.17: Schematic diagram for DPCM encoder and decoder.

• Notice that the quantization noise, f_n − f̃_n, is equal to the quantization effect on the error term, e_n − ẽ_n.

• Let's look at actual numbers: Suppose we adopt the particular predictor below:

f̂_n = trunc((f̃_{n−1} + f̃_{n−2})/2)    (6.19)

so that e_n = f_n − f̂_n is an integer.

• As well, use the quantization scheme:

ẽ_n = Q[e_n] = 16 · trunc((255 + e_n)/16) − 256 + 8,
f̃_n = f̂_n + ẽ_n.    (6.20)

• First, we note that the error is in the range −255..255, i.e., there are 511 possible levels for the error term. The quantizer simply divides the error range into 32 patches of about 16 levels each. It also makes the representative reconstructed value for each patch equal to the midway point for each group of 16 levels.

• Table 6.7 gives output values for any of the input codes: 4-bit codes are mapped to 32 reconstruction levels in a staircase fashion.

Table 6.7: DPCM quantizer reconstruction levels.

  e_n in range      Quantized to value
  −255 .. −240            −248
  −239 .. −224            −232
     ...                   ...
   −31 .. −16              −24
   −15 .. 0                 −8
     1 .. 16                 8
    17 .. 32                24
     ...                   ...
   225 .. 240              232
   241 .. 255              248


• As an example stream of signal values, consider the set of values:

f1   f2   f3   f4   f5
130  150  140  200  300

• Prepend an extra value f = 130 to replicate the first value, f1. Initialize with quantized error ẽ_1 ≡ 0, so that the first reconstructed value is exact: f̃_1 = 130. Then the rest of the values calculated are as follows (with the prepended value listed first in each row):

f̂ = 130, 130, 142, 144, 167
e =   0,  20,  −2,  56,  63
ẽ =   0,  24,  −8,  56,  56
f̃ = 130, 154, 134, 200, 223

• On the decoder side, we again assume an extra value f̃ equal to the correct value f̃_1, so that the first reconstructed value f̃_1 is correct. What is received is ẽ_n, and the reconstructed f̃_n is identical to that on the encoder side, provided we use exactly the same prediction rule.

DM

• DM (Delta Modulation): simplified version of DPCM. Often used as a quick A/D converter.

1. Uniform-Delta DM: use only a single quantized error value, either positive or negative.

(a) ⇒ a 1-bit coder. Produces coded output that follows the original signal in a staircase fashion. The set of equations is:

f̂_n = f̃_{n−1},
e_n = f_n − f̂_n = f_n − f̃_{n−1},
ẽ_n = +k if e_n > 0 (where k is a constant), −k otherwise,
f̃_n = f̂_n + ẽ_n.

Note that the prediction simply involves a delay.
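The quantizer of Eq. (6.20) and the stream example can be checked with this sketch (function names are ours). Note that the slide's listed last error is clipped (63, rather than the raw 300 − 167 = 133), a step Eq. (6.20) alone does not produce, so only the first four reconstructed values are compared here:

```python
def quantize(e):
    # Eq. (6.20): 32 reconstruction levels of width 16, centered in each cell
    return 16 * ((255 + e) // 16) - 256 + 8

def dpcm_encode(samples):
    recon = [samples[0], samples[0]]   # prepend f0 = f1; first value sent exactly
    for f in samples[1:]:
        pred = (recon[-1] + recon[-2]) // 2        # trunc of the two-sample mean
        recon.append(pred + quantize(f - pred))    # reconstruct: f~ = f^ + e~
    return recon[1:]

print(dpcm_encode([130, 150, 140, 200, 300])[:4])   # [130, 154, 134, 200]
print(quantize(20), quantize(-2), quantize(56))     # 24 -8 56, as in the slide
```

The `quantize` function also reproduces the end cells of Table 6.7: errors in −255..−240 map to −248, and errors in 241..255 map to 248.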

(b) Consider actual numbers: Suppose signal values are

f1  f2  f3  f4
10  11  13  15

As well, define an exact reconstructed value f̃_1 = f_1 = 10.

(c) E.g., use step value k = 4:

e_2 = 11 − 10 = 1,
e_3 = 13 − 14 = −1,
e_4 = 15 − 10 = 5.

The reconstructed set of values 10, 14, 10, 14 is close to the correct set 10, 11, 13, 15.

(d) However, DM copes less well with rapidly changing signals. One approach to mitigating this problem is to simply increase the sampling, perhaps to many times the Nyquist rate.

2. Adaptive DM: If the slope of the actual signal curve is high, the staircase approximation cannot keep up. For a steep curve, we should change the step size k adaptively.

– One scheme for analytically determining the best set of quantizer steps, for a non-uniform quantizer, is Lloyd-Max.
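The DM numbers above can be verified directly (a sketch; the encoder emits one bit per sample after the first):

```python
def dm_encode(samples, k=4):
    recon = [samples[0]]   # f~1 = f1, transmitted exactly
    bits = []
    for f in samples[1:]:
        e = f - recon[-1]                # prediction is the previous reconstruction
        bits.append(1 if e > 0 else 0)   # 1-bit coder: only the sign is sent
        recon.append(recon[-1] + (k if e > 0 else -k))
    return bits, recon

bits, recon = dm_encode([10, 11, 13, 15], k=4)
print(recon)   # [10, 14, 10, 14]: close to 10, 11, 13, 15, as the slide notes
```

The staircase overshoot on the small +1 and −1 errors (steps of ±4 regardless of error size) is exactly why adaptive step sizes help.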


ADPCM

• ADPCM (Adaptive DPCM) takes the idea of adapting the coder to suit the input much farther. The two pieces that make up a DPCM coder: the quantizer and the predictor.

1. In Adaptive DM, adapt the quantizer step size to suit the input. In DPCM, we can change the step size as well as the decision boundaries, using a non-uniform quantizer.

We can carry this out in two ways:

(a) Forward adaptive quantization: use the properties of the input signal.

(b) Backward adaptive quantization: use the properties of the quantized output. If quantized errors become too large, we should change the non-uniform quantizer.

2. We can also adapt the predictor, again using forward or backward adaptation. Making the predictor coefficients adaptive is called Adaptive Predictive Coding (APC):

(a) Recall that the predictor is usually taken to be a linear function of previous reconstructed quantized values, f̃_n.

(b) The number of previous values used is called the "order" of the predictor. For example, if we use M previous values, we need M coefficients a_i, i = 1..M, in a predictor

f̂_n = Σ_{i=1}^{M} a_i f̃_{n−i}    (6.22)

• However, we can get into a difficult situation if we try to change the prediction coefficients that multiply previous quantized values, because that makes a complicated set of equations to solve for these coefficients:

(a) Suppose we decide to use a least-squares approach to solving a minimization, trying to find the best values of the a_i:

min Σ_{n=1}^{N} (f_n − f̂_n)²    (6.23)

(b) Here we would sum over a large number of samples f_n, for the current patch of speech, say. But because f̂_n depends on the quantization, we have a difficult problem to solve. As well, we should really be changing the fineness of the quantization at the same time, to suit the signal's changing nature; this makes things problematical.

(c) Instead, one usually resorts to solving the simpler problem that results from using not f̃_n in the prediction, but instead simply the signal f_n itself. Explicitly writing in terms of the coefficients a_i, we wish to solve:

min Σ_{n=1}^{N} (f_n − Σ_{i=1}^{M} a_i f_{n−i})²    (6.24)

Differentiation with respect to each of the a_i, and setting to zero, produces a linear system of M equations that is easy to solve. (The set of equations is called the Wiener-Hopf equations.)
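For an order-2 predictor, the Wiener-Hopf system from Eq. (6.24) is just a 2×2 set of normal equations, which this sketch (function name ours) solves directly with Cramer's rule. On a perfectly linear ramp it recovers the exact predictor f_n = 2 f_{n−1} − f_{n−2}:

```python
def fit_order2(f):
    # Normal equations for min sum_n (f[n] - a1*f[n-1] - a2*f[n-2])^2:
    #   [s11 s12] [a1]   [b1]
    #   [s12 s22] [a2] = [b2]
    rows = [(f[n - 1], f[n - 2], f[n]) for n in range(2, len(f))]
    s11 = sum(x * x for x, _, _ in rows)
    s12 = sum(x * y for x, y, _ in rows)
    s22 = sum(y * y for _, y, _ in rows)
    b1  = sum(x * t for x, _, t in rows)
    b2  = sum(y * t for _, y, t in rows)
    det = s11 * s22 - s12 * s12
    return (s22 * b1 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det

a1, a2 = fit_order2(list(range(10)))   # a linear ramp signal
print(round(a1, 6), round(a2, 6))      # 2.0 -1.0
```

Because the sums here use the unquantized samples f_n rather than f̃_n, this is the simpler problem described in (c); in practice the coefficients would be refit for each patch of speech.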
• Fig. 6.18 shows a schematic diagram for the ADPCM coder and decoder:

Fig. 6.18: Schematic diagram for ADPCM encoder and decoder.
