Lecture 3 - Sound/Audio
Introduction to Multimedia 1
Objectives
To understand how computers process sound.
Contents
The Nature of Sound
Computer Representation of Sound
Computer Music: MIDI
Summary: MIDI versus digital audio
Exercises
The Nature of Sound
Sound is a physical phenomenon produced by the
vibration of matter and transmitted as waves.
However, the perception of sound by human beings is
a very complex process. It involves three systems:
The source which emits sound;
The medium through which the sound propagates;
The detector which receives and interprets the sound.
The sounds we hear every day are very complex.
Every sound is composed of waves of many different
frequencies and shapes. The simplest sound we
can hear is a sine wave.
Sound waves can be characterized by the following
attributes:
Period, Frequency, Amplitude, Bandwidth,
Pitch, Loudness, Dynamic range.
Pitch and Frequency
Period is the time interval after which a periodic signal
repeats.
Frequency is a physical property of a wave. It is the
reciprocal of the period: f = 1/P. The sensation of a frequency is
commonly referred to as the pitch of a sound.
The unit of frequency is the hertz (Hz) or kilohertz (kHz).
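The relation f = 1/P can be tried out with a short sketch. The function names, the 2.5 ms period and the 8000 samples/sec rate below are illustrative assumptions, not values from the lecture.

```python
import math

def frequency_from_period(period_s):
    """Frequency in Hz of a signal whose period is given in seconds (f = 1/P)."""
    return 1.0 / period_s

def sine_wave(freq_hz, amplitude, duration_s, sample_rate):
    """Return samples of a sine wave, the simplest periodic sound."""
    n = round(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

# A period of 2.5 ms corresponds to a frequency of about 400 Hz.
f = frequency_from_period(0.0025)

# 10 ms of that tone at an assumed rate of 8000 samples per second.
wave = sine_wave(f, 1.0, 0.01, 8000)
print(round(f), len(wave))
```

A higher frequency (shorter period) is heard as a higher pitch; the amplitude argument controls how large the sample values are.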
Loudness and Amplitude
The other important perceptual quality is loudness or
volume.
Amplitude is the measure of sound levels.
For a digital sound, amplitude is the sample value.
The reason that sounds have different loudness is that
they carry different amounts of power.
The unit of power is the watt (W). The intensity of a sound is
the amount of power transmitted through an area of 1
m² oriented perpendicular to the propagation direction
of the sound.
If the intensity of a sound is 1 watt/m², we may start
to feel the sound, and the ear may be damaged. This is
known as the threshold of feeling.
If the intensity is 10⁻¹² watt/m², we may just be able to
hear it. This is known as the threshold of hearing.
The relative intensity of two different sounds is
measured using the decibel (dB).
Very often, we will compare a sound with the
threshold of hearing.
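As a worked example, the level of a sound relative to the threshold of hearing can be computed with the standard decibel formula, level = 10 log10(I / I0). The formula is not spelled out on the slide; the function name below is an illustrative assumption.

```python
import math

THRESHOLD_OF_HEARING = 1e-12  # watt/m^2, the reference intensity I0

def intensity_level_db(intensity, reference=THRESHOLD_OF_HEARING):
    """Relative intensity of a sound in decibels: 10 * log10(I / I0)."""
    return 10.0 * math.log10(intensity / reference)

# The threshold of feeling (1 watt/m^2) lies about 120 dB above the
# threshold of hearing.
print(round(intensity_level_db(1.0)))
```

Because the scale is logarithmic, every tenfold increase in intensity adds 10 dB.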
Dynamic and Bandwidth
Dynamic range is the difference between the loudest and
softest sound levels.
For example, a large orchestra can reach 130 dB at its
climax and drop to as low as 30 dB at its softest,
giving a range of 100 dB.
Computer Representation of Sound
Sound waves are continuous while computers are
good at handling discrete numbers.
In order to store a sound wave in a computer,
samples of the wave are taken.
Each sample is represented by a number, the ‘code’.
This process is known as digitization.
This method of digitizing sound is known as pulse
code modulation (PCM).
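The sampling and coding steps above can be sketched as follows. This is a minimal illustration, assuming the input is a function of time with values between -1 and 1; the name `pcm_encode`, the 1 kHz test tone, the 8000 samples/sec rate and the 8-bit resolution are all assumptions chosen for the example.

```python
import math

def pcm_encode(signal, duration_s, sample_rate, bits):
    """Sample a continuous signal and turn each sample into an integer code."""
    max_code = 2 ** (bits - 1) - 1           # largest signed code, e.g. 127 for 8 bits
    n = round(duration_s * sample_rate)      # number of samples to take
    codes = []
    for i in range(n):
        t = i / sample_rate                  # time of the i-th sample
        value = max(-1.0, min(1.0, signal(t)))   # clip to the valid range
        codes.append(round(value * max_code))    # the 'code' for this sample
    return codes

def tone(t):
    """A 1 kHz sine wave with amplitude 1."""
    return math.sin(2 * math.pi * 1000 * t)

# One millisecond of the tone, sampled 8000 times per second with 8-bit codes.
codes = pcm_encode(tone, 0.001, 8000, 8)
print(codes)
```

Each entry of `codes` is one sample; storing more bits per sample or more samples per second gives a closer approximation of the original wave.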
According to the Nyquist sampling theorem, in order to capture all
audible frequency components of a sound, i.e., up to 20 kHz,
we need to set the sampling rate to at least twice this, i.e., 40 kHz.
This is why one of the most popular sampling rates for
high-quality sound is 44,100 samples/sec.
Another aspect we need to consider is the resolution
(quantization bands), i.e., the number of bits used to represent
a sample.
Often, 16 bits are used for each sample in high-quality sound.
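A small sketch makes the two numbers concrete. The helper names are illustrative assumptions, and the dynamic-range estimate of 20 log10(2^b) dB is a common rule of thumb that is not stated in the lecture.

```python
import math

def min_sampling_rate(max_freq_hz):
    """Nyquist: sample at least twice as fast as the highest frequency."""
    return 2 * max_freq_hz

def quantization_levels(bits):
    """Number of distinct sample values available with the given resolution."""
    return 2 ** bits

def approx_dynamic_range_db(bits):
    """Rule-of-thumb dynamic range of linear PCM (an assumption, about 6 dB per bit)."""
    return 20 * math.log10(2 ** bits)

print(min_sampling_rate(20_000))               # minimum rate for 20 kHz content
print(quantization_levels(16))                 # levels available with 16 bits
print(round(approx_dynamic_range_db(16), 1))   # approximate range in dB
```

The 44,100 samples/sec rate sits a little above the 40,000 samples/sec minimum.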
Quality versus File Size
The size of a digital recording depends on the
sampling rate, resolution and number of channels.
S = R × (b/8) × C × D
S   file size (bytes)
R   sampling rate (samples/second)
b   resolution (bits)
C   channels (1 = mono, 2 = stereo)
D   recording duration (seconds)
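The formula can be checked with a few lines of code. The function name and the two example recordings are illustrative assumptions; the result is the size of the uncompressed sample data only.

```python
def file_size_bytes(rate, bits, channels, duration_s):
    """S = R * (b/8) * C * D, the size of uncompressed sample data in bytes."""
    return int(rate * (bits / 8) * channels * duration_s)

# One minute of stereo sound at 44,100 samples/sec and 16 bits per sample.
cd_quality = file_size_bytes(44_100, 16, 2, 60)

# One minute of mono sound at 8,000 samples/sec and 8 bits per sample.
low_quality = file_size_bytes(8_000, 8, 1, 60)

print(cd_quality, low_quality)
```

The first recording needs about 10.6 million bytes per minute, roughly 22 times as much as the second.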
High-quality sound files are very big; however, the file size can
be reduced by compression.
File size for some common
sampling rates and resolutions
Audio File Formats
The most commonly used digital sound format on Windows systems is the .wav file.
Sound is stored in .wav files as digital samples, encoded with Pulse Code Modulation
(PCM).
Each .wav file has a header containing information about the file:
type of format, e.g., PCM or other modulations
size of the data
number of channels
samples per second
bytes per sample
There is usually no compression in .wav files.
Other formats may use different compression techniques to reduce file size:
.vox uses Adaptive Differential Pulse Code Modulation (ADPCM).
.mp3 uses MPEG-1 Audio Layer 3 compression.
RealAudio is a proprietary format.
Some common audio file formats
.WAV file format
Each field is listed as (byte offset, size in bytes).
(0,4) ChunkID: Contains the letters "RIFF" in ASCII
form (0x52494646 in big-endian form).
(4,4) ChunkSize: 36 + SubChunk2Size. This is the
size of the rest of the chunk following this number,
i.e., the size of the entire file in bytes minus 8 bytes
for the two fields not included in this count: ChunkID
and ChunkSize.
(8,4) Format: Contains the letters
"WAVE" (0x57415645 in big-endian form).
The "WAVE" format consists of two subchunks, "fmt " and "data".
The "data" subchunk contains the size of the data and the
actual sound.
"fmt " subchunk
(12,4) Subchunk1ID: Contains the letters "fmt " (0x666d7420
in big-endian form).
(24,4) SampleRate: 8000, 44100, etc.
"data" subchunk
(36,4) Subchunk2ID: Contains the letters "data" (0x64617461
in big-endian form).
(40,4) Subchunk2Size:
= NumSamples * NumChannels * BitsPerSample/8. This is
the number of bytes in the data.
You can also think of this as the size of the rest of the
subchunk following this number.
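The header fields can be inspected with Python's standard `wave` and `struct` modules. This is a sketch that assumes the simple 44-byte PCM layout; real files may carry extra subchunks. The fields not listed on the slides (Subchunk1Size, AudioFormat, ByteRate, BlockAlign) follow the commonly documented canonical WAVE layout, and `make_wav` and `parse_wav_header` are hypothetical helper names.

```python
import io
import struct
import wave

def make_wav(sample_rate=8000, channels=1, sample_width=2, num_samples=100):
    """Build a small silent PCM .wav file in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)          # bytes per sample
        w.setframerate(sample_rate)
        w.writeframes(b"\x00" * (num_samples * channels * sample_width))
    return buf.getvalue()

def parse_wav_header(data):
    """Unpack the first 44 bytes; numbers are little-endian, IDs are ASCII."""
    names = ("ChunkID", "ChunkSize", "Format", "Subchunk1ID", "Subchunk1Size",
             "AudioFormat", "NumChannels", "SampleRate", "ByteRate",
             "BlockAlign", "BitsPerSample", "Subchunk2ID", "Subchunk2Size")
    values = struct.unpack("<4sI4s4sIHHIIHH4sI", data[:44])
    return dict(zip(names, values))

header = parse_wav_header(make_wav())
for name, value in header.items():
    print(name, value)
```

For this file, ChunkSize equals 36 + Subchunk2Size, and Subchunk2Size equals NumSamples * NumChannels * BitsPerSample/8.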
Audio Hardware
Recording and Digitizing sound:
An analog-to-digital converter (ADC) converts the analog
sound signal into digital samples.
A digital signal processor (DSP) processes the samples, e.g.,
filtering, modulation, compression, and so on.
Playing back sound:
A digital signal processor processes the samples, e.g.,
decompression, demodulation, and so on.
A digital-to-analog converter (DAC) converts the digital
samples into an analog sound signal.
All these hardware devices are integrated into a few
chips on a sound card.
Different sound cards have different capabilities for processing
digital sound.
When buying a sound card, you should look at:
maximum sampling rate.
stereo or mono.
duplex or simplex.
Audio Software
Windows device driver — controls the hardware
device.
Mixer — its functions are:
To combine sound from different sources.
To adjust the playback volume of sound sources.
To adjust the recording volume of sound sources.
Recording — Windows has a simple Sound
Recorder.
Editing — The Windows Sound Recorder has a
limited set of editing functions, such as changing volume
and speed, and deleting part of the sound.
There are many freeware and shareware programs for
sound recording, editing and processing.
Computer Music (MIDI)
Sound waves, whether natural or man-made,
are often very complex, i.e., they consist of many
frequencies.
With digital sound it is relatively straightforward to record
complex sound. However, it is quite difficult to
generate (or synthesize) complex sound.
There is a better way to generate high quality music.
This is known as MIDI — Musical Instrument Digital
Interface.
MIDI is a communication standard developed in the early
1980s for electronic instruments and computers.
It specifies the hardware connection between
pieces of equipment as well as the format in which data are
transferred between them.
Common MIDI devices include electronic music
synthesizers and modules.
General MIDI
General MIDI (GM) is a standard specified by the MIDI Manufacturers
Association. To be GM compatible, a sound-generating device
must meet the General MIDI System Level 1 performance
requirements:
Minimum of 24 fully dynamically allocated voices
16 channels
Minimum of 16 simultaneous instruments with different timbres
Minimum of 128 preset instruments (MIDI program numbers)
Support for certain controllers
http://www.midi.org/techspecs/gm.php
http://en.wikipedia.org/wiki/General_MIDI
MIDI Hardware
An electronic musical instrument or a computer
that has a MIDI interface should have one or more
MIDI ports. The MIDI ports on musical instruments
are usually labeled:
IN — for receiving MIDI data;
OUT — for outputting MIDI data that are generated by the
instrument;
THRU — for passing MIDI data coming from IN to the
next instrument.
MIDI devices can be daisy-chained together.
MIDI Daisy-chain Network
[Figure: daisy-chained MIDI devices connected through their IN, OUT and THRU ports, with MIDI and USB links]
Multi-port MIDI Interface (2 in/out pairs)
[Figure: interface with OUT and IN ports A and B, indicator lights, a USB port and a Thru switch. The Thru switch connects In to Out for use without a computer; leave it in the 'out' position.]
Multi-port MIDI Interface (8 in/out pairs)
[Figure: front and back views of the interface, with USB port]
MIDI Data
Unlike digital sound, MIDI data do not encode individual
samples. They encode musical events and commands to
control instruments.
MIDI data are grouped into MIDI messages. Each MIDI
message represents a musical event, e.g., pressing a key,
setting a switch or adjusting foot pedals.
A sequence of MIDI messages is grouped into a track.
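To make the idea of a message concrete, the sketch below builds the two most common channel messages, Note On and Note Off. The byte layout (a status byte of 0x90 or 0x80 plus the channel number, followed by a note number and a velocity, each 0 to 127) comes from the MIDI 1.0 specification rather than from this lecture; the function names are illustrative assumptions.

```python
def note_on(channel, note, velocity):
    """Note On: a key is pressed. channel 0-15, note and velocity 0-127."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("value out of range")
    return bytes([0x90 | channel, note, velocity])

def note_off(channel, note, velocity=0):
    """Note Off: the key is released."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("value out of range")
    return bytes([0x80 | channel, note, velocity])

def describe(message):
    """Decode a 3-byte channel message into (kind, channel, note, velocity)."""
    status, note, velocity = message
    kind = {0x80: "note_off", 0x90: "note_on"}.get(status & 0xF0, "other")
    return kind, status & 0x0F, note, velocity

# Press and release middle C (note number 60) on the first channel.
track = [note_on(0, 60, 100), note_off(0, 60)]
for message in track:
    print(message.hex(), describe(message))
```

Each event takes only three bytes, whatever the sound eventually produced by the instrument.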
MIDI Files
When using computers to play MIDI music, the MIDI
data are often stored in MIDI files. Each MIDI file
contains a number of chunks. There are two types of
chunks:
Header chunk — contains information about the entire
file: the type of MIDI file, number of tracks and the timing.
Track chunk — the actual data of MIDI track.
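The chunk structure can be sketched with `struct`. The layout used here (a 4-character type, "MThd" or "MTrk", then a 4-byte big-endian length, then the chunk body, with the header body holding format, number of tracks and timing division as three 2-byte values) follows the Standard MIDI File specification and is not given on the slide. The helper names and the 480 ticks-per-quarter-note value are assumptions.

```python
import struct

def build_header(fmt, ntracks, division):
    """Header chunk: type of MIDI file, number of tracks, timing."""
    return b"MThd" + struct.pack(">IHHH", 6, fmt, ntracks, division)

def build_track(events):
    """Track chunk: the raw event bytes of one track."""
    return b"MTrk" + struct.pack(">I", len(events)) + events

def read_chunks(data):
    """Split a MIDI file into (type, body) pairs."""
    chunks, pos = [], 0
    while pos + 8 <= len(data):
        chunk_type, length = struct.unpack(">4sI", data[pos:pos + 8])
        chunks.append((chunk_type, data[pos + 8:pos + 8 + length]))
        pos += 8 + length
    return chunks

# A minimal file: one track containing only the end-of-track meta event.
end_of_track = b"\x00\xff\x2f\x00"
midi_file = build_header(0, 1, 480) + build_track(end_of_track)

chunks = read_chunks(midi_file)
fmt, ntracks, division = struct.unpack(">HHH", chunks[0][1])
print([c[0] for c in chunks], fmt, ntracks, division)
```

Unlike the .wav header, the numbers in a MIDI file are stored in big-endian order.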
MIDI music
Musical instruments are tuned to produce a set of fixed
pitches.
Octave (0.125)
Octave
For example, any two sounds whose frequencies make a 2:1
ratio are said to be separated by an octave and result in a
particularly pleasing sound when heard.
Similarly, two sounds with a frequency ratio of 5:4 are said to
be separated by an interval of a third; such sound waves also
sound good when played together.
Interval   Frequency Ratio   Examples
Octave     2:1               512 Hz and 256 Hz
Third      5:4               320 Hz and 256 Hz
Fourth     4:3               about 341 Hz and 256 Hz
Fifth      3:2               384 Hz and 256 Hz
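The example frequencies follow directly from the ratios. The sketch below recomputes them from a 256 Hz base; the dictionary and function names are illustrative assumptions.

```python
from fractions import Fraction

# Frequency ratios of the intervals listed above.
INTERVALS = {
    "octave": Fraction(2, 1),
    "third": Fraction(5, 4),
    "fourth": Fraction(4, 3),
    "fifth": Fraction(3, 2),
}

def upper_frequency(base_hz, interval):
    """Frequency of the higher note that forms the interval with base_hz."""
    return float(base_hz * INTERVALS[interval])

for name in INTERVALS:
    print(name, round(upper_frequency(256, name), 1))
```

Only the fourth does not land on a whole number: 256 × 4/3 is roughly 341.3 Hz.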
MIDI File
Music produced by a musical instrument can be stored as codes,
since the instrument produces a discrete number of sounds (one per key).
For a piano of 12 sounds (12 keys):
- Code 0-11 (position of a key): 1 byte
- Octave 0-7 (principal sounds): 1 byte
- Duration of the key press: 1 byte
Each key press therefore needs 3 bytes.
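The three-byte scheme above can be written out directly. This sketch encodes the simplified scheme from this slide, not the real MIDI file format; the function names are assumptions, and the duration is treated as an opaque value from 0 to 255 because the slide does not give its unit.

```python
import struct

def encode_key_press(key, octave, duration):
    """Pack one key press into 3 bytes: key code, octave, duration."""
    if not 0 <= key <= 11:
        raise ValueError("key code must be 0-11")
    if not 0 <= octave <= 7:
        raise ValueError("octave must be 0-7")
    if not 0 <= duration <= 255:
        raise ValueError("duration must fit in one byte")
    return struct.pack("BBB", key, octave, duration)

def decode_key_press(data):
    """Unpack 3 bytes back into (key, octave, duration)."""
    return struct.unpack("BBB", data)

# A four-note melody takes 4 * 3 = 12 bytes.
notes = [(0, 4, 10), (4, 4, 10), (7, 4, 10), (0, 5, 20)]
melody = b"".join(encode_key_press(*n) for n in notes)
print(len(melody), decode_key_press(melody[:3]))
```

Storing events instead of samples is why such files are tiny compared with digital audio of the same music.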
How MIDI Sounds Are Synthesized
A simplistic view is that:
The MIDI device stores the characteristics of sounds
produced by different sound sources.
The MIDI messages tell the device which kind of
sound to generate, at which pitch, for how long, and
which other attributes the note should have.
How Sounds Are Synthesized
There are two ways of synthesizing sounds:
FM synthesis (frequency modulation): one sine wave is used to
modulate another sine wave, generating a new wave that is
rich in timbre. It contains the two original waves, their sum and
difference, and harmonics. The drawbacks of FM synthesis are that the
generated sound does not sound natural and that there is no exact
formula for generating a particular sound.
Wave-table synthesis: representative digital sound samples are
stored and then manipulated, e.g., by changing the pitch,
to create the complete range of notes.
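Both approaches can be sketched in a few lines. The FM function uses the textbook form sin(2π·fc·t + I·sin(2π·fm·t)), and the wave-table function changes pitch by stepping through a stored table at a different rate with nearest-sample lookup. The function names, the modulation index parameter and the example values are assumptions for illustration; real synthesizers are considerably more elaborate.

```python
import math

def fm_synthesize(carrier_hz, modulator_hz, mod_index, duration_s, sample_rate=8000):
    """FM synthesis: one sine wave modulates the phase of another."""
    n = round(duration_s * sample_rate)
    samples = []
    for i in range(n):
        t = i / sample_rate
        modulation = mod_index * math.sin(2 * math.pi * modulator_hz * t)
        samples.append(math.sin(2 * math.pi * carrier_hz * t + modulation))
    return samples

def wavetable_note(table, pitch_ratio, num_samples):
    """Wave-table synthesis: replay stored samples faster or slower."""
    out, phase = [], 0.0
    for _ in range(num_samples):
        out.append(table[int(phase) % len(table)])  # nearest stored sample
        phase += pitch_ratio                         # 2.0 = one octave up
    return out

# A 440 Hz carrier modulated by a 110 Hz sine wave.
fm_tone = fm_synthesize(440, 110, 2.0, 0.01)

# One stored cycle of a sine wave, replayed one octave higher.
table = [math.sin(2 * math.pi * k / 64) for k in range(64)]
octave_up = wavetable_note(table, 2.0, 64)
print(len(fm_tone), len(octave_up))
```

With a modulation index of 0 the FM function reduces to a plain sine wave; larger values add more frequency components and so a richer timbre.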
FM Synthesis
MIDI versus digital audio
Exercises