Application Note 5: Measurement Techniques for Digital Audio
by Julian Dunn
Copyright © 2001–2004 Audio Precision, Inc.
Copyright © 2001–2003 Julian Dunn
8211.0143 Revision 1
No part of this manual may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, recording, or by any information storage
and retrieval system, without permission in writing from the publisher.
Audio Precision®, System One®, System Two™, System Two Cascade™, System One +
DSP™, System Two + DSP™, Dual Domain®, FASTTEST®, and APWIN™ are
trademarks of Audio Precision, Inc.
Published by: Audio Precision, Inc.
The macros and tests that originally accompanied this publication have been
updated and optimized to run on the latest Audio Precision hardware and
software. Therefore, please note the following changes from what appears in
the text:
Compatible analyzer hardware is the 2700 Series, System Two Cascade Plus,
and System Two Cascade.
Sample rates up to 192 kHz are supported on the 2700 Series, and up to 96
kHz on the Cascade Plus and Cascade.
Substitute 2700 Series, System Two Cascade Plus, and System Two Cascade
where System Two Cascade is mentioned.
Introduction
Jitter Theory
    Introduction
    What Is Jitter?
    Measuring Jitter
    The Unit Interval
    How Can You See Jitter?
    Jitter in Sampling Processes
    Jitter in the Interface: Data Recovery
    Jitter in Clock Recovery for Synchronization
    Digital Interface Jitter
        Intrinsic Jitter
        Cable-Induced Jitter
        Data Jitter
        Preamble Jitter
        Interfering-Noise-Induced Jitter
        Jitter Tolerance
        The Jitter Transfer Function and Jitter Gain
        Non-Linear Jitter Behavior
        Jitter Accumulation
    Sampling Jitter
        Sampling Jitter and the External Clock
        Time-Domain Model
        Frequency-Domain Model
        Influence of ADC/DAC Architecture
Much has been written about digital audio: its defining standards, the ever-changing hardware and software, the various applications in recording, broadcasting and telecommunications, and the audibility of this or that configuration or artifact. In this book the late Julian Dunn focused instead on the measurement of digital audio signals, and examined in great detail techniques to evaluate the performance of the converters and interfaces through which the audio passes. Mr. Dunn passed away early in 2003, cutting much too short a brilliant career as one of the world’s premier designers and consultants in digital audio.
Chapter One, Jitter Theory, studies the causes and effects of the interface
timing variations called jitter with a number of tests designed to characterize
this pervasive malady.
Chapter Two, Analog-to-Digital Converter Measurements, looks at key
ADC parameters and behavior and includes 15 AP Basic macros to run the nec-
essary tests.
Chapter Three, Digital-to-Analog Converter Measurements, does the
same for DACs. A sidebar looks at dither. Twenty-five macros are included.
Chapter Four, The Digital Interface, discusses the AES3/IEC 60958 digital
interface, examining the basic format and the means of characterizing the sig-
nal. Sidebars focus on the international standards and on synchronization
considerations.
The macros and tests used in making the measurements discussed in the two
converter chapters are listed at the end of the chapters. With the tests and mac-
ros on the CD-ROM you’ll find two AP Basic menus (a-d menu.apb and d-a
menu.apb) to make running the macros easy, along with detailed notes in
the file README.DOC. Note that all these files must be copied to a local
folder on your computer to run properly. The tests and macros were written
with the Audio Precision System Two Cascade in mind, and although the con-
cepts and techniques are portable the tests would need to be re-written for
other instruments.
Check the Audio Precision Web site at audioprecision.com for additional re-
lated material, tests, macros and other solutions which may be developed from
time to time. We are interested in your comments and suggestions; contact us
at techsupport@audioprecision.com.
Introduction
Digital audio systems are unlike analog audio systems in two fundamental
respects:
Figure 1. Jittered AES3 waveform.
What Is Jitter?
Jitter is the variation in the time of an event—such as a regular clock sig-
nal—from nominal.
For example, the jitter on a regular clock signal is the difference between
the actual pulse transition times of the real clock and the transition times that
would have occurred had the clock been ideal, that is to say, perfectly regular.
Against this nominal reference, the zero-crossing transitions of many of the
pulses in a jittered data stream are seen to vary in time from the ideal clock tim-
ing. Expressed another way, jitter is phase modulation of the digital interface
signal.
The jitter component can be extracted from the clock or digital interface sig-
nal to be analyzed as a signal in its own right. Among the more useful ways of
characterizing jitter is by examining its frequency spectrum and identifying the
significant frequency components of the jitter itself.
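This kind of analysis can be sketched in a few lines. The following is only an illustration (not any analyzer's actual algorithm): the ideal clock is approximated by a least-squares straight-line fit to the measured edge times, and an FFT of the residuals reveals the jitter spectrum.

```python
import numpy as np

def jitter_from_edges(edge_times):
    """Extract a jitter sequence from measured clock-edge timestamps.
    The ideal (perfectly regular) clock is estimated with a least-squares
    straight-line fit; the residuals are the timing deviations (jitter)."""
    n = np.arange(len(edge_times))
    period, offset = np.polyfit(n, edge_times, 1)  # recovered mean period
    return edge_times - (period * n + offset)

# Example: a 48 kHz clock carrying 5 kHz sinusoidal jitter, 10 ns peak.
fs = 48_000
n = np.arange(4_800)                  # an integer number of jitter cycles
ideal = n / fs
measured = ideal + 10e-9 * np.sin(2 * np.pi * 5_000 * ideal)

j = jitter_from_edges(measured)
spectrum = np.abs(np.fft.rfft(j)) / (len(j) / 2)   # peak amplitude per bin
freqs = np.fft.rfftfreq(len(j), d=1 / fs)
peak_hz = freqs[np.argmax(spectrum)]               # dominant jitter frequency
```

Running this recovers a dominant component at 5 kHz with close to the 10 ns amplitude that was injected.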
Measuring Jitter
When very little jitter is present, the pulse transitions are moved back or
forth by only small measures of time. When the jitter is increased, the transi-
tions move across a larger range of times.
Jitter amplitude, then, is a measure of time displacement and is expressed in
units of time, either as fractions of a second or unit intervals. For those new to
jitter measurement, this can lead to some disconcerting graph labels, with time
on the vertical axis versus time on the horizontal axis, for example.
Jitter frequency is the rate at which this phase-shifting is taking place. Like
other noise or interference signals, the jitter modulation signal can be a pure tone, a complex signal, or noise-like.

Figure 2. Jittered signal viewed on an oscilloscope triggered from the signal itself.

Figure 3. Eye pattern, with the minimum input characteristics specified in AES3.
In practice, there are often no ideal clocks to compare with, and real jitter
measurements must be self-referenced—made relative to the signal itself.
The simplest and most misleading self-referenced technique is “looking at
the waveform on an oscilloscope,” triggering the oscilloscope on the jittered
signal as shown in Figure 2. Unfortunately, you will get deceptive results that
depend on the interval between the oscilloscope trigger and the transition be-
ing examined, and also on the frequency spectrum of the jitter. Rather than jit-
ter, this technique displays interval variations. There is a relationship between
the two, but at some frequencies jitter will not be shown at all, and at others
the jitter amplitude will appear doubled. In particular, this approach is very
insensitive to low-frequency jitter.
Instead, the ideal clock can be simulated by phase-locking a relatively low-
jitter oscillator to the jittered signal or real clock, using a phase-locked loop
(PLL). (A sidebar on phase-locked loop characteristics is on page 9.) This self-
referencing technique will have a high-pass characteristic with a corner fre-
quency that is related to the PLL corner frequency. The PLL provides an ideal
clock signal useful as an oscilloscope external trigger or as a reference signal
in dual-trace oscilloscope viewing, for example.
If an oscilloscope is triggered by the PLL reference clock and the scope
time base is set to the duration of about one UI, a great many sequential pulses
will be shown at once, all stacked on top of one another due to the persistence
of the screen phosphors. This distinctive display is called an eye pattern, a ver-
sion of which is shown in Figure 3. The opening in an eye pattern is narrowed
by the time spread of the pulse transitions. A narrow eye, then, indicates jitter.
Using digital signal processing (DSP) techniques, a DSP analyzer can ap-
proximate the ideal clock reference by calculating the clock timing based on
an averaging of the incoming signal. The DSP analyzer can then capture the
signal (and its jitter) very accurately. From this data the analyzer can display
the variation in timing and amplitude of the pulse stream as an eye pattern as
in Figure 3; show the jitter waveform in the time domain as in Figure 4; or, using FFT spectrum analysis, plot the jitter in the frequency domain, as in Figure 5.

Figure 4. 5 kHz jitter vs. time.

Figure 5. Jitter frequency spectrum (UI vs. Hz).
than the jitter levels that would cause concern in sampling clocks. Interface jitter is discussed in detail below.
Interface jitter occurs as digital signals are passed from one device to an-
other, where jitter can be introduced, amplified, accumulated and attenuated,
depending on the characteristics of the devices in the signal chain. Jitter in
data transmitters and receivers, line losses in cabling, and noise and other spuri-
ous signals can all cause jitter and degrade the interface signal.
The AES3 digital audio interface format [1] now has specifications for jitter. (The consumer version of the interface, described in IEC 60958-3:2000 [2], also has jitter specifications.) This specification was drawn up to resolve problems that occurred when units conforming to the interface specification were interconnected and yet the interface did not work reliably. [3]
Intrinsic Jitter
If a unit is either free-running or synchronized with a relatively jitter-free
signal, then any output jitter measured at the transmitter is due to the device it-
self. This is referred to as intrinsic jitter.
The level of intrinsic jitter is mainly determined by two characteristics: the
phase noise of the oscillator in the clock circuit and, for an externally synchro-
nized device, the characteristics of the clock recovery PLL.
Figure 6. PLL loop gain, VCO phase noise transfer to the PLL output, and input jitter transfer to the PLL output, vs. jitter frequency (Hz).
of this PLL will determine the low-frequency cut-off point of the measure-
ment. AES3 specifies a standard response for this measurement with a 3 dB
corner frequency of 700 Hz.
The intrinsic jitter levels in AES3 are specified as a peak measurement,
rather than rms. This is because the authors were concerned with the maxi-
mum excursion of timing deviations—as it is these that would produce data
errors.
Cable-Induced Jitter
The other source of jitter on the digital interface is the non-ideal nature of the interconnection. Resistance in the cable or inconsistent impedance can cause high-frequency losses which result in a smearing of the pulse transitions, as shown in Figure 7.
This would not be a serious problem if the effect were the same on every
transition. That would just result in a small static delay to the signal which
could be ignored. However, that would only be the case if the pulse stream were perfectly regular—a string of repeated ones or zeros, for example. But real
pulse streams consist of bit patterns which are changing from moment to mo-
ment, and in the presence of cable losses these give rise to intersymbol interfer-
ence. The proximity and width of data pulses effectively shift the baseline for
their neighbors, and with the longer rise and fall times in the cable, the transi-
tions are moved from their ideal zero crossings.
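The mechanism can be illustrated with a small simulation. This is only a sketch: the cable is reduced to a hypothetical single-pole low-pass (the time constant TAU is an assumption, not a measured cable), and the preamble is omitted, so the comparison is made at the first bit boundary rather than at 10 UI.

```python
import numpy as np

UI = 162.8e-9        # unit interval at a 48 kHz frame rate
DT = 1e-9            # simulation time step, 1 ns
TAU = 80e-9          # assumed cable time constant (single-pole low-pass)

def biphase(bits):
    """Bi-phase mark coding: a transition at every bit boundary, plus a
    mid-bit transition when the bit is 1.  Each bit lasts 2 UI."""
    level, out = 1.0, []
    for b in bits:
        level = -level                       # boundary transition
        out += [level] * int(UI / DT)
        if b:
            level = -level                   # mid-bit transition for a 1
        out += [level] * int(UI / DT)
    return np.array(out)

def after_cable(x):
    """Crude cable model: single-pole RC low-pass, stepped sample by sample."""
    a = DT / (TAU + DT)
    y, acc = np.empty_like(x), 0.0
    for i, v in enumerate(x):
        acc += a * (v - acc)
        y[i] = acc
    return y

def crossing_ns(y, t_start):
    """Time (ns) of the first zero crossing at or after t_start."""
    i = int(t_start / DT)
    while y[i] * y[i + 1] > 0:
        i += 1
    frac = y[i] / (y[i] - y[i + 1])          # linear interpolation
    return (i + frac) * DT * 1e9

# The zero crossing after the end of the first bit symbol (2 UI in)
# shifts depending on the value of that first bit:
t_one = crossing_ns(after_cable(biphase([1, 0, 0])), 2 * UI)
t_zero = crossing_ns(after_cable(biphase([0, 0, 0])), 2 * UI)
shift_ns = t_zero - t_one     # data-dependent (intersymbol) timing shift
```

As in the sidebar's Figure 9, the trace whose first bit is a "1" crosses earlier than the one starting with a "0"; the shift is several nanoseconds with this assumed time constant.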
AES3 preamble pulse-width patterns (in unit intervals): Z (B): 3, 1, 1, 3 UI; Y (W): 3, 2, 1, 2 UI; X (M): 3, 3, 1, 1 UI.
Figure 8. AES3 data pattern. Note that the Y preambles are identical in every frame.
As the AES3 interface uses the same signal to carry both clock and data, it
is possible to induce jitter on the clock as a result of the data modulation. This
means that care should be taken about mechanisms for interference between
the data and the timing of the clock. The smearing of the waveform as a result
of cable losses is one such mechanism. See Figure 9 and the Intersymbol Inter-
ference sidebar.
Data Jitter
Data jitter is a term used to describe the jitter of the transitions in the parts
of the AES3 waveform modulated by the data. This form of jitter is often an in-
dicator of intersymbol interference.
Figure 9 in the Intersymbol Interference sidebar illustrates this mechanism
inducing data jitter of about 50 ns peak-to-peak in some of the transitions.
Data jitter can also be produced by circuit asymmetries where a delay may
vary between positive-going and negative-going transitions.
Preamble Jitter
Preamble jitter is a term used to describe the jitter on the transitions in
AES3 preambles. The preambles are a set of static patterns which are used to
identify the start of the digital audio subframes and blocks. (See Figure 8.) The
Y preamble at the start of the second (B) subframe is a completely regular
fixed pattern. This unchanging preamble can be used to make jitter measure-
ments that are not sensitive to intersymbol interference, and are therefore a
better indicator of either jitter at the transmitter device or noise-induced jitter,
rather than jitter due to data modulation.
Intersymbol Interference
Figure 9 shows five AES3 interface signals, each with a different data pattern in the
first three bits. The data is encoded by the bi-phase mark encoding scheme (also called
Manchester code or FM code), which has a transition between every bit symbol and
also a transition in the middle of the symbol if it is “1,” but not if it is “0.” The top sig-
nal represents “1-1-1,” the second is “1-1-0,” the middle “1-0-0,” the next “0-1-0” and
the last is “0-0-0.”
Figure 9. Five AES3 bit patterns (1-1-1, 1-1-0, 1-0-0, 0-1-0 and 0-0-0), shown as transmitted and after cable simulation, plotted against time since the start of subframe B (ns).
At the bottom of the chart, the figure also shows the signals as they may look after
transmission down a long length of cable. These cable-affected signals were generated
using the Audio Precision System Two cable simulation, and the five results have been
overlaid on each other. The losses in a real cable would affect the signals in this manner,
rolling off the high frequencies and reshaping the pulses with slower rise and fall times.
In each case the data shown were immediately preceded by the Y preamble, the pre-
amble which begins the B subframe. (See Figure 8.) This preamble is a fixed pattern
which lasts for 5 bit periods (10 unit intervals, or UI). A consequence of this is that the
traces coming into the left-hand side of the cable simulator plot are at almost exactly the
same voltage, since they have all followed the same path for a while. (The preamble is
nominally 8 UI long, but the last part of the preceding bit and the first part of the follow-
ing bit period are fixed to the pattern, resulting in a fixed pattern that is 10 UI long.)
The 1-1-1, 1-1-0 and 1-0-0 traces have a transition starting at 1465 ns (9 UI) from
the subframe start because they have an initial “1” in their data. The 0-1-0 and 0-0-0
traces start with an initial “0” so they do not yet show a transition. All five traces then
change direction at 1628 ns (10 UI) corresponding with the end of the first bit symbol.
(The frame rate of this signal is 48 kHz, so 1 UI is 162.8 ns.)
The markers “a” and “b” indicate that the times of the zero-crossings from those
transitions are 1705 ns and 1745 ns. The earlier transitions are those which have a “1”
value in the first bit and the later transitions those which have a “0.”
As a result of the high-frequency losses in the cable simulation the transition time is
quite slow, so the zero crossings are about 100 ns after the inflections that indicate the
start of the transitions. This interaction between the value of the first data symbol and
the timing of the start of the second data symbol is called intersymbol interference.
This interference is more complex after the second bit symbol (about 2050 ns from
the start of the subframe, also shown in the magnified view). Here there are four differ-
ent zero-crossing times corresponding to the four possible bit patterns of the first two
bits in the subframe. Most of the timing difference is due to the value of the second bit,
but in addition there is a smaller difference relating to the state of the first bit.
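The timing figures quoted in the sidebar follow directly from the frame rate, as this quick check shows (each AES3 frame carries 64 time slots of 2 UI each, i.e. 128 UI per frame):

```python
frame_rate = 48_000                 # AES3 frame rate (Hz)
ui_ns = 1e9 / (frame_rate * 128)    # 64 time slots x 2 UI = 128 UI per frame
# -> 162.76 ns, quoted as 162.8 ns in the sidebar

# Transitions at 9 UI and 10 UI after the start of subframe B:
t9_ns = 9 * ui_ns                   # ~1465 ns
t10_ns = 10 * ui_ns                 # ~1628 ns
```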
Interfering-Noise-Induced Jitter
If the pulse transitions were not sloped by the cable losses, the rise and fall
times of the pulses would be so short that their zero crossings would be rela-
tively unaffected by any added noise. However, the long transition times in-
duced by cable losses allow noise and other spurious signals to “ride” the
transitions, resulting in a shift of the zero crossing points of the pulses.
For example, noise on the signal can vary the time at which a transition is
detected. The sensitivity to this noise depends on the speed of the transition,
which, in turn, depends on the cable losses. This is illustrated in Figure 10.
The five traces on Figure 10 are all of the same part of the B subframe Y
preamble. (As mentioned before, this static preamble pattern is chosen because
it is not sensitive to data jitter, making the noise-induced jitter mechanism
more obvious.) The two markers, “a” and “b,” show the range of timings for the zero crossing resulting from the third transition. Their separation is 31 ns.

Figure 10. Five overlaid traces of the B subframe Y preamble with interfering noise, vs. time since the start of subframe B (ns).
In this example, the noise producing this variation is a low-frequency sine
wave of about 300 mV. This type of interference might be induced by coupling
from a power line.
The amount of jitter introduced by noise on the cable is directly related to
the slope at the zero crossing, as voltage is related to time by that slope. With
fast transitions any interfering noise will not produce much jitter: the voltage
deviation will cause a smaller time deviation.
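That voltage-to-time relationship can be written as a one-line model. The slopes below are hypothetical round numbers, not measured values; only the 300 mV interference level comes from the example above.

```python
def timing_shift_ns(noise_mv, slope_mv_per_ns):
    """Shift of a zero crossing when an interfering voltage rides on a
    transition of a given slope (small-shift approximation: dt = dV/slope)."""
    return noise_mv / slope_mv_per_ns

# The same 300 mV of interference on a slow, cable-degraded edge
# versus an edge ten times faster:
slow_edge = timing_shift_ns(300, 10)    # hypothetical 10 mV/ns -> 30 ns
fast_edge = timing_shift_ns(300, 100)   # 100 mV/ns -> 3 ns
```

A tenfold increase in edge speed reduces the noise-induced timing shift tenfold, which is the point made in the text.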
Notice that the direction of the time deviation is related to the direction of
the transition. For a transition shifted up by noise the rising transition will be
early and the falling transition will be late; for a transition shifted down the op-
posite is true. Unlike data jitter from intersymbol interference, this form of jit-
ter is more apparent to devices that recover a clock from a particular edge in
the preamble pattern. That edge will only have one polarity and so the timing
deviation of successive edges will sum together.
However, for systems using many of the edges in the subframe, transitions
will be almost evenly matched in both directions and the cancellation will re-
duce the coupling of low frequency noise-induced jitter into the recovered
clock. For noise at high frequencies successive deviations will not correlate
and so cancellation will not occur.
Jitter Tolerance
An AES3 digital audio receiver should be able to decode interface signals
that have jitter that is small compared with the length of the pulses that it has
to decode. As the jitter level is increased the receiver will start to decode the
signal incorrectly and then will fail to decode the signal—occasionally muting
or sometimes losing “lock” altogether. The maximum level of jitter before the
receiver starts to produce errors is called the jitter tolerance of the device.
As the PLL characteristics sidebar showed, a clock-recovery PLL has a low-
pass characteristic analogous to a mechanical flywheel: it responds or “tracks”
to changes slower than the rate of the corner frequency, and it filters out
changes that are faster.
Jitter tolerance, then, is independent of frequency for jitter above the corner
frequency of the receiver, but as the rate of change of the timing (the jitter fre-
quency) is reduced, the receiver is increasingly able to follow the changes.
This means that at lower jitter rates the receiver will be able to track increasing
amounts of jitter, and so jitter tolerance rises.
For jitter frequencies close to the corner frequency it is possible—as a result
of a poorly damped design—that the jitter tolerance is significantly reduced.
This occurs because the resonance in the receiver is causing the match be-
tween the deviation of the incoming data transition timing and the receiver’s es-
timation of the data transition timing to actually be worse than if the receiver
was not tracking the jitter at all.
Figure 11. AES3 jitter tolerance template: peak-to-peak amplitude (UI) vs. jitter frequency (kHz), falling from 10 UI at 200 Hz to 0.25 UI at 8000 Hz and above.
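Read off Figure 11, the template can be expressed as a simple piecewise function. This is a sketch: the breakpoint values are taken from the plot, and the 1/f slope between them is assumed.

```python
def jitter_tolerance_ui(f_hz):
    """Peak-to-peak input jitter (in UI) a receiver should tolerate,
    following the template shape of Figure 11: 10 UI below 200 Hz,
    falling as 1/f, and 0.25 UI above 8 kHz (values read off the figure)."""
    if f_hz <= 200:
        return 10.0
    if f_hz >= 8000:
        return 0.25
    return 10.0 * 200 / f_hz      # 1/f slope between the breakpoints
```

Note that the 1/f segment meets 0.25 UI exactly at 8 kHz (10 × 200/8000 = 0.25), so the template is continuous.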
Figure 12. Jitter transfer function (dB vs. frequency).
Jitter Accumulation
In a short chain of digital audio devices, with each device locked to the pre-
vious one, there are several contributions to the jitter at the end of the chain.
Each device will add its own intrinsic jitter, and each interconnecting cable
will make some contribution with cable-induced jitter. There will also be some
jitter gain or loss at each stage.
This process has been called jitter accumulation. The effect varies with the
individual device jitter characteristics and the data patterns at each stage, but
in some circumstances and with some “pathological” signals the jitter mecha-
nisms could all combine in an unfortunate manner.
In a chain of devices with clock recovery systems having similar characteris-
tics a pathological signal will have the same effect at each stage. As Table 1
shows, this can lead to a very large amount of jitter accumulation after only a
few similar stages.
For the purposes of this calculation we are looking at jitter at frequencies be-
low the jitter transfer function corner frequencies of all the devices, so jitter at-
tenuation does not occur. Assume—for simplicity—that all the devices
contribute the same amount of jitter, J, at each stage (this is lumping cable-in-
duced and intrinsic jitter together). Also assume that each device also ampli-
fies the jitter from the previous stage by the same gain—bearing in mind that
gain is only possible for jitter near the peak in the jitter transfer function.
Table 1 lists the total output jitter produced at the end of three chains of
stages, as a multiple of J:
Jitter Gain    Total Jitter    Total Jitter    Total Jitter
per Device     after 3 Stages  after 4 Stages  after 5 Stages
0 dB (ideal)   3 J             4 J             5 J
1 dB           3.8 J           5.4 J           7.1 J
3 dB           6.2 J           10.2 J          15.8 J
6 dB           13.9 J          29.8 J          61.4 J

Table 1. Jitter accumulation
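The table's values can be reproduced with a short recurrence. The model below assumes that each stage's own contribution J also passes through that stage's jitter gain; that assumption is what matches the tabulated figures.

```python
def accumulated_jitter(gain_db, stages, j=1.0):
    """Peak jitter at the end of a chain, in units of J.  Model: each
    stage contributes J, and the whole of its output (input jitter plus
    its own contribution) sees the stage's jitter gain; peak values add."""
    gain = 10 ** (gain_db / 20)
    total = 0.0
    for _ in range(stages):
        total = gain * (total + j)
    return total

# Reproducing the 3 dB row of Table 1:
row_3db = [round(accumulated_jitter(3, n), 1) for n in (3, 4, 5)]
# -> [6.2, 10.2, 15.8]
```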
This shows that with a gain of 0 dB at each stage the output jitter is simply
a sum of the jitter produced at each stage. (These jitter levels are peak values
so they will add). Remember that this happens at frequencies below the corner
frequency; at higher frequencies the input jitter will be attenuated, so the final
output jitter will grow more slowly.
The gains of greater than 0 dB show the effect of jitter transfer function
peaking. If peaking is present it will only occur near to the PLL corner fre-
quency. Where the jitter is wide-band only a small proportion of it will be am-
plified and the peaking will have little effect. However, there are mechanisms
that can concentrate the jitter in the region of the peak.
First, AES3 data-jitter can have narrow spectral components. With low-
level audio signals, for example, the jitter will become coherent with the polar-
ity of the signal. This occurs because for signals close to zero, the more signifi-
cant bits within the data word change together as an extension of the sign bit.
If the interface audio signal is a low-level tone at one frequency, then the ca-
ble-induced jitter will tend towards a square wave at that frequency. Occasion-
ally, a spectral peak could coincide with the peak in the jitter transfer function.
In a chain of devices using clock recovery systems with similar characteris-
tics, this signal will have the same effect at each stage. The figure of 6 dB in
the table reflects levels of peaking found in equipment that had been designed
before this problem was widely understood. As the table shows, this can lead
to a very large amount of jitter accumulation after only a few similar stages.
The normal symptom of a pathological level of jitter accumulation is for the
equipment towards the end of the chain to very occasionally lose data, or even
lock. Unfortunately, the circumstances are such that the problem is difficult to reproduce when the maintenance engineer is called.
The AES3 specification, since 1997, has two clauses that are intended to ad-
dress potential jitter accumulation problems. The primary statement specifies
that all devices should have a sinusoidal jitter gain of less than 2 dB at any
frequency.
In addition, there is a standard jitter attenuation specification that should be
met by devices claiming to attenuate interface jitter. This requires attenuation
of at least 6 dB above 1 kHz. This frequency is much lower than the jitter toler-
ance template corner frequency, so these devices need a transmit clock which
is separate from the data recovery clock that determines the jitter tolerance.
Sampling Jitter
Sampling jitter is the variation in the timing of an audio signal caused by jitter in an analog-to-digital converter (ADC), digital-to-analog converter (DAC), or asynchronous sample rate converter (ASRC). In the first two cases this can often be associated with an observable sample clock signal, but in an ASRC it may be a totally numerical process, as the samples of a signal are regenerated to correspond with new sampling instants; in that case the sample clock is a virtual sample clock.
There are many circumstances where a sample clock has to be derived from an external source. In the domestic environment this could be a digital audio recorder or a digital surround processor where the DAC sample clock is derived from the digital input data stream. In professional applications there are also devices with DACs, as well as applications where the sample clocks of ADCs need to be derived from an external sync signal, or where a digital stream needs to be resynchronized to a different reference using an ASRC.
Often this external source will have jitter that can be observed, measured
and commented on. However, that is not sampling jitter. The external source
might make a contribution to the sample clock jitter but that contribution de-
pends on the characteristics of the clock recovery circuit (or numerical algo-
rithm) between the external source connection and the actual sample clock.
This will have intrinsic jitter, jitter attenuation, and non-linearities in its
behavior.
Figure 13. Sample instants S1 to S4, showing the amplitude error produced by a sample-timing error J.
Time-Domain Model
First, we will look at sampling jitter in the time domain.
The effect of a sample being converted at the wrong time can be considered
simply in terms of the amplitude error introduced. Any signal that is not DC
will change over time, and a wrong sampling instant will produce a wrong am-
plitude value. As you can see in Figure 13, the amplitude error is proportional
to the rate of change, or slope, of the audio signal, which is greatest for high-
level high-frequency signals.
Figure 14 illustrates the effect of random sampling jitter on a pure tone. The
tone is shown as having an amplitude of 2 V rms and a frequency of 1 kHz.
The error signal is calculated using random Gaussian jitter of amplitude
10 ns rms, and the simulation that produced this graph calculates the error of
each sample at a sampling frequency of 176.4 kHz, which represents a 4X
oversampled DAC in a CD player.
Notice how the error signal and the tone intermodulate. The error is the
product of the slope of the tone and the jitter; as a result there are minima in
the error at the peaks of the tone where the slope is flat.
The root-mean-square (rms) error computed by the simulation is
124 µV rms, or –84 dB relative to the tone. Assuming that this error is spread
fairly evenly throughout the 88.2 kHz bandwidth represented by the sampling
frequency of 176.4 kHz, we can estimate that measured over the nominal au-
dio band to 20 kHz, the noise level would be 60 µV rms. This is 90.5 dB be-
low the level of the tone.
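A simulation along these lines is easy to reproduce. The sketch below assumes Gaussian jitter and one second of samples; the resulting error lands close to the 124 µV and –84 dB figures above.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 176_400                      # 4x-oversampled CD rate
f_tone = 1_000                    # 1 kHz tone
a_peak = 2 * np.sqrt(2)           # 2 V rms expressed as a peak value
j_rms = 10e-9                     # 10 ns rms Gaussian jitter

n = np.arange(fs)                 # one second of samples
t_ideal = n / fs
t_jittered = t_ideal + rng.normal(0, j_rms, n.size)

# Error = tone sampled at the jittered instants minus the ideal samples.
error = a_peak * (np.sin(2 * np.pi * f_tone * t_jittered)
                  - np.sin(2 * np.pi * f_tone * t_ideal))
err_rms = np.sqrt(np.mean(error ** 2))       # ~125 uV
err_db = 20 * np.log10(err_rms / 2)          # ~-84 dB re the 2 V rms tone
```

The analytic check agrees: the rms slope of a 2 V rms, 1 kHz sine is 2π × 1000 × 2 ≈ 12.6 mV/µs, which multiplied by 10 ns of rms jitter gives about 126 µV.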
This method of analyzing the effect of jitter can be used to make an estimate
of the acceptable level of jitter of any given form. It can be simplified to calcu-
late the level of jitter that, if applied to a “worst-case” signal, would produce
an error of amplitude equal to the quantization interval. For example, a worst-
case full-scale 20 kHz sine wave in a 16-bit system would have a maximum
slope of:
2 × π × F × A = 4.1 LSB/ns

where
F = 20 kHz, the tone frequency
A = 2^15 = 32768 LSB, the tone amplitude (peak).
From this one might conclude that the jitter level should be no more than
244 ps peak, but that limit is fairly arbitrary—there is nothing special about an error of 1 LSB amplitude—and has little relation to the audibility of the error, which will be related to the spectral content of the error.

Figure 14. Sampling jitter on a 1 kHz tone.
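The slope arithmetic can be checked directly:

```python
import math

f_tone = 20_000                               # worst-case tone frequency, Hz
a_peak = 2 ** 15                              # 16-bit full scale, LSB (peak)
slope = 2 * math.pi * f_tone * a_peak / 1e9   # maximum slope, LSB per ns
jitter_ps = 1e3 / slope                       # jitter giving a 1 LSB error
# slope ~= 4.1 LSB/ns; 1/4.1 ns ~= 244 ps, the figure quoted in the text
```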
Frequency-Domain Model
Another method of looking at the effect of jitter is to consider it as a modula-
tion process, and analyze it in terms of frequency components. It can be shown
mathematically that a simple relationship exists between a jitter spectral com-
ponent, an audio signal spectral component and the resulting jitter modulation
product.
If a signal is sampled with errors in the sampling instants, the effect is to
modulate the signal in time. This is expressed mathematically in (1). The output signal v′(t) is a time-displaced version of the input signal, v(t), and the variation in the displacement is the jitter:

v′(t) = v(t − Δt)    (1)

For sinusoidal jitter of peak-to-peak amplitude J and frequency ω_j, the displacement is:

Δt = j(t) = (J/2) sin(ω_j t)    (2)
Jitter amplitude (typically less than 10 ns) is generally much smaller than
the signal period (typically greater than 40,000 ns). The product of small jitter
modulation levels is itself very small, and for such cases we can make the fol-
lowing small-angle approximations:
cos((J ω_i / 2) sin(ω_j t)) ≈ 1    (5)

and

sin((J ω_i / 2) sin(ω_j t)) ≈ (J ω_i / 2) sin(ω_j t)    (6)
The output signal contains the input signal plus two other components at frequencies offset from the input signal frequency by the jitter frequency; their amplitude is related to the product of jitter amplitude and signal frequency. This result can be used when estimating the potential audibility of jitter modulation products.

Figure 15. Spectrum (dBV) of a 10 kHz signal sampled with 3 kHz sinusoidal jitter, showing the jitter modulation sidebands.
Figure 15 illustrates this effect on a real signal. The input signal is at
10 kHz and the jitter modulation is at 3 kHz. The two components at 3 kHz off-
set from the input signal are the upper and lower jitter modulation sidebands.
(In this figure there are also “skirts” to the spectrum closer to the 10 kHz com-
ponent. These are due to some low-frequency noise-like jitter in the system).
The ratio of signal to each ‘single’ sideband, in dB, is:
R_ssb = 20·log10(J·ω_i / 4) dB.   (8)
This result is for sinusoidal jitter components. Using Fourier analysis, more
complex waveforms can be broken down into sinusoidal components and the
formula can be applied.
For convenience, the formula can be modified by summing the levels in
both sidebands to give a total error, and using rms jitter levels, J_n, in
nanoseconds and frequency, f_i, in kHz:

R_dsb = 20·log10(J_n·f_i) - 104 dB.   (9)
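As a quick numeric check, equation (9) can be coded directly (a minimal sketch; the function name is mine):

```python
import math

def r_dsb(jitter_ns_rms, signal_khz):
    """Combined level of both jitter modulation sidebands relative to the
    signal, per equation (9): 20*log10(Jn * fi) - 104 dB."""
    return 20 * math.log10(jitter_ns_rms * signal_khz) - 104

# 1 ns rms of jitter on a 1 kHz tone gives a total error 104 dB below the signal:
print(round(r_dsb(1, 1), 1))   # -104.0
```

The same helper reproduces the later worked examples, such as 3.5 ns rms of jitter on a 12 kHz tone giving sidebands roughly 71.5 dB below the tone.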
Oversampling Converters
An oversampling converter is one that is processing samples at much more
than the minimum rate required by the bandwidth of the system. This
oversampling rate can typically be from 2X to 256X. The higher rates also use
noise shaping, an important technique which can help provide low-cost solu-
tions to high-resolution conversion. (Noise shaping can produce a separate
side effect that is discussed later.)
Since the jitter bandwidth in a sample clock can extend to half the sampling
frequency of the converter, the jitter in an oversampled converter will be
spread over a wider spectrum than the jitter in a non-oversampled converter.
The error caused by jitter modulation is related to the jitter spectrum, so the
error signal from an oversampled converter is also spread across a wider
spectrum.

[Figure 16. Jitter error spectrum spread over bandwidths of Fs/2, 2·Fs/2, 3·Fs/2 and 4·Fs/2 for 1X to 4X oversampling.]
To illustrate this: consider a 1 kHz signal being sampled with 1 ns of spec-
trally flat, noise-like jitter. By calculation, this will produce a total error
104 dB below the signal. This total error figure remains the same regardless of
the sample rate of the converter.
As you can see in Figure 16, in a 4X oversampled DAC this error signal
will be spread over four times the frequency range compared with a 1X con-
verter. For audio purposes, of course, we limit our interest to the 20 Hz to
20 kHz bandwidth, and a measurement made over that range contains only
one-quarter of the power of the full spectrum of error noise. One-quarter the
power implies one-half the voltage, resulting in an error 6 dB lower than that
for the non-oversampled converter.
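The power bookkeeping in this example can be sketched in a few lines (the helper name is mine; it assumes spectrally flat jitter):

```python
import math

def inband_noise_change_db(oversampling_ratio):
    # Spectrally flat jitter noise spreads over `ratio` times the bandwidth,
    # so a fixed 20 Hz - 20 kHz measurement captures 1/ratio of its power.
    return 10 * math.log10(1.0 / oversampling_ratio)

print(round(inband_noise_change_db(4), 1))   # -6.0: quarter power, half voltage
```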
Jitter sources, however, are normally not spectrally flat. Jitter is usually dom-
inated by lower-frequency components, due both to the typical phase noise
spectrum of oscillators and to the low-pass jitter filtering common in clock re-
covery circuits. Oversampling will not reduce the impact of this lower-
frequency jitter.
[Figure 17. 1/f jitter modulation spectrum and noise floor relative to the audio passband, extending to 64X Fs/2.]
Switched-Capacitor Filters
Sampling or re-sampling occurs at the interface between the sampled signal
domain and the continuous-time signal domain, which is not always the same
as the interface between the digital and analog domains. A switched-capacitor
filter operates on analog signals in the sampled signal domain.
[Figure 18. Spectrum of amplitude + time modulation products of ultrasonic jitter with ultrasonic noise: the audio passband, the ultrasonic noise spectrum near N·Fs/2, an ultrasonic jitter component, and the noise modulated into the passband by jitter AM / time modulation.]
With this kind of DAC, sampling jitter produces an amplitude modulation
effect with the following output:6

v′(t) = A·cos(ω_i t) - A·(J·ω_i/4)·cos((ω_i - ω_j)t) - A·(J·ω_i/4)·cos((ω_i + ω_j)t).   (10)
This amplitude modulation combines with the pure jitter modulation to
produce the following:

v′(t) = A·cos(ω_i t) + A·(J·(ω_i - ω_j)/4)·cos((ω_i - ω_j)t) - A·(J·(ω_i + ω_j)/4)·cos((ω_i + ω_j)t).   (11)
The sidebands for this combination now scale with the sideband frequencies,
ω_i - ω_j and ω_i + ω_j, rather than the modulated frequency, ω_i. Where ω_i
is ultrasonic (and the sideband offset ω_j is very large) this reduces the impact
of the jitter on any sideband modulated down into the audio band in approximate
proportion to the ratio between the ultrasonic noise frequency component
and the audio band frequency.
Where the signal under consideration is at high level and high frequency (a
component of the ultrasonic noise) and the sideband offset is very large (due to
an ultrasonic jitter signal), sidebands can be modulated down to much lower
frequencies. This technique reduces the impact of the jitter modulating the ul-
trasonic noise down into the audio band in approximate proportion to the
oversampling ratio, e.g. 256:1.
Often the output sample frequency cannot be locked to the input. Addition-
ally, some equipment is designed to retain the flexibility to cope with an arbi-
trary relationship between input and output timing. In these cases the
conversion is more complex and includes an algorithm that tries to track the re-
lation between the input and output samples based on their actual time of ar-
rival. This sort of SRC is called an asynchronous sample rate converter
(ASRC).
[Figures 19 and 20. FFT spectra (dBr A) of the device output with applied sinusoidal jitter and with peak wideband jitter.]
sidebands—is 71.4 dB below the 12 kHz tone. By calculation we can see that
this corresponds with sampling jitter of 3.5 ns—the same level as the jitter ap-
plied to the interface. This indicates that at 5 kHz there is no jitter attenuation
between the applied stimulus jitter on the interface and the sampling clock on
the DAC. (The skirts around the 12 kHz tone are probably low-frequency
noise in the jitter generation mechanism.)
As an example of how the results could vary, the same tests were repeated
with a different device. Figure 21 shows the FFT traces. Notice that the 5 kHz
sidebands are attenuated compared with Figure 19, and the higher-frequency
components resulting from the wide-band jitter are attenuated relative to Fig-
ure 20.
[Figure 22. Interface jitter spectrum, 10 ps to 2 ns, 0 Hz to 16 kHz.]
[Figure 23. FFT spectrum showing jitter modulation products from J-test after a cable simulation. 0 to 24 kHz.]
Figure 23 shows an FFT of the analog output of the first test device with J-
test applied. Notice that the jitter sidebands follow the interface jitter spectrum
reliably, which means that the test device is susceptible to data jitter on the
interface.
The shape of each sideband matches the interface jitter spectrum of the pre-
vious figure, so we can also conclude that it does not have jitter filtering
within the band. The 125 Hz sidebands are each about 70 dB below the stimu-
lus tone (67 dB for both sidebands together). This corresponds with sampling
jitter at that frequency of amplitude

10^((104 - 67)/20) / 12 ≈ 6 ns rms

(from equation (9), with the stimulus tone at 12 kHz).
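Inverting equation (9) gives the jitter amplitude from a measured sideband level; a sketch (function and parameter names are mine):

```python
import math

def jitter_ns_rms(dsb_level_db_below_signal, signal_khz):
    """Invert equation (9): rms jitter in ns from the combined level of both
    sidebands expressed in dB below the signal."""
    return 10 ** ((104 - dsb_level_db_below_signal) / 20) / signal_khz

# Both 125 Hz sidebands together 67 dB below the 12 kHz J-test tone:
print(round(jitter_ns_rms(67, 12), 1))   # 5.9, i.e. about 6 ns rms
```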
Audibility considerations
It is one thing to be able to identify and measure sampling jitter. But how can we
tell if there is too much?
A recent paper by Eric Benjamin and Benjamin Gannon describes practical re-
search that found the lowest jitter level at which the jitter made a noticeable difference
was about 10 ns rms. This was with a high level test sine tone at 17 kHz. With music,
none of the subjects found jitter below 20 ns rms to be audible.7
This author has developed a model for jitter audibility based on worst case audio
single tone signals including the effects of masking.8 This concluded:
“Masking theory suggests that the maximum amount of jitter that will not produce
an audible effect is dependent on the jitter spectrum. At low frequencies this level is
greater than 100 ns, with a sharp cut-off above 100 Hz to a lower limit of approximately
1 ns (peak) at 500 Hz, falling above this frequency at 6 dB per octave to approximately
10 ps (peak) at 24 kHz, for systems where the audio signal is 120 dB above the threshold
of hearing.”
In view of the more recent research, this may be considered to be overcautious.
However, the consideration that sampling jitter below 100 Hz will probably be less audi-
ble by a factor of more than 40 dB when compared with jitter above 500 Hz is useful
when determining the likely relative significance of low- and high-frequency sampling
jitter.
References
Introduction
The worst case level error would occur if the peak were halfway between
sampling points:
20·log(cos(22.5°/2)) = -0.17 dB FS.
This error is shown in Figure 25.
[Figure 25. 3 kHz sine wave, sample values as a fraction of full scale.]
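The worst-case under-read can be checked for any tone and sample rate (a sketch; the function name is mine):

```python
import math

def worst_case_peak_error_db(freq_hz, fs_hz):
    # Successive samples are 360 * f / fs degrees apart on the waveform; in
    # the worst case the true peak falls midway between two samples, so the
    # nearest sample is off the peak by half that phase step.
    half_step_deg = 0.5 * 360.0 * freq_hz / fs_hz
    return 20 * math.log10(math.cos(math.radians(half_step_deg)))

print(round(worst_case_peak_error_db(3000, 48000), 2))   # -0.17
```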
In most cases this behavior is not a problem because during the measure-
ment it is likely that samples would be occurring near the peak in the wave-
form. The following are cases where the under-reading may be significant:
• a sine wave with a period that has an exact integer ratio with the sampling
period. This will cause samples to occur at a very limited number of phases
of the sine wave; for example, 3 kHz at a 48 kHz sample rate has 16 samples
per cycle. The relationships are not always that obvious: a frequency near
9.14 kHz has the same 21 samples repeated every 4 cycles.
• any signal that has frequency components close to having an integer ratio
with the sample frequency, but not actually synchronous. This signal would
slowly drift in timing relative to the samples and would cause the sampled
peak reading to fluctuate.
• any high bandwidth signal with significant high frequency content and only
a short duration. During measurement only a few samples would occur near
the crest or crests.
It is also possible to reduce the peak metering error by interpolating the data
by oversampling. The higher density of data points for which there are sam-
ples reduces the error in the estimate of signal peak amplitude.
The level meter on the APWIN Digital Input/Output (DIO) panel uses a
form of peak sample detection and should not be used for ADC performance
measurements. The Digital Analyzer panel, which uses rms and quasi-peak de-
tectors, should be used instead.
RMS Metering
Since rms measurements are much less sensitive to the relative phase of sam-
pling instants, an rms meter is normally used to measure signal levels in ADC
testing. This also brings mathematical advantages.
For example, if a signal is made up of several components, the total mean
square amplitude of the signal is the sum of the mean square amplitudes of
each component, as shown:
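In code form this mean-square summation is (a sketch; the helper name is mine):

```python
import math

def total_rms(component_rms_values):
    # Mean-square (power) amplitudes of uncorrelated components add, so the
    # total rms is the root of the sum of the squares.
    return math.sqrt(sum(v * v for v in component_rms_values))

# Two equal uncorrelated components combine to +3 dB, not +6 dB:
print(round(20 * math.log10(total_rms([1.0, 1.0])), 2))   # 3.01
```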
Measurement Techniques
Gain
The gain of an ADC is not a unit-free ratio so it cannot be quoted in dB. It
is normally quoted indirectly as the analog level corresponding to a digital out-
put level of 0 dB FS. Many practical devices either cannot quite reach
0 dB FS, or have non-linearities that mean that the gain at that level is not rep-
resentative of the gain over most of the range. For this reason, gain is often
measured at a digital output level below 0 dB FS, for instance –20 dB FS.
As an example, you may find that an analog level of 3.81 dBV on the input
to an ADC generates an output level of –19.87 dB FS. There is not a conven-
tional method of reporting ADC gain, but it may be quoted as the digital out-
put level corresponding to a reference input level, like this:
0 dBV = -19.87 - 3.81 = -23.68 dB FS.
As an alternative this can also be described in terms of the analog level cor-
responding to digital full-scale level:
0 dB FS = 3.81 + 19.87 = 23.68 dBV.
Unless otherwise specified (as in a frequency response measurement), the
gain is normally quoted at a frequency of 997 Hz.
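The worked example above can be reproduced in a few lines (values from the text; variable names are mine):

```python
# Measured: -19.87 dB FS digital output for a 3.81 dBV analog input.
out_dbfs = -19.87
in_dbv = 3.81

# Digital output level corresponding to a 0 dBV reference input:
gain_re_0dbv = out_dbfs - in_dbv
# Equivalently, the analog input level corresponding to digital full scale:
full_scale_in_dbv = in_dbv - out_dbfs

print(round(gain_re_0dbv, 2), round(full_scale_in_dbv, 2))   # -23.68 23.68
```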
Figure 26. Subroutine from within "a-d tech note utilities.apb." Setting stimulus levels in dBFS.
An initial estimate of the gain is made and the input is set to produce an out-
put level of approximately –20 dB FS. The value for gain at that level,
Gain20, is then used in a second iteration to set up the desired output level to
the value of NewOutputLevel.
Gain Stability
The gain of an ADC may drift due to instability in the converter reference
voltage or in the value of other components. This variation can be monitored
over time to determine the gain stability.
AES17 defines an input logarithmic-gain stability test which measures the
range of gain seen in an ADC over an hour’s time. A brief (typically five min-
ute) warm-up period precedes the test. The measurement is of output level of
the ADC, with the input level set to produce a –6 dB FS output initially. The
procedure in Figure 27, “a-d input gain stability.apb,” performs this test.
Gain-Frequency Response
In sampled systems the bandwidth of the input has to be limited to the fold-
ing frequency, or half of the sampling frequency, to avoid aliasing. Modern au-
dio ADCs normally have this anti-alias filter implemented with a combination
of a sharp-cutoff finite impulse response (FIR) digital filter and a simple low-
order analog filter. The digital filter operates on a version of the signal after
conversion at an oversampled rate, and the analog filter is required to attenuate
signals that are close to the oversampling frequency. This analog filter can
have a relaxed response, since the oversampling frequency is often many
octaves above the passband.
The FIR filter characteristics may be specified very tightly and will have a
fine ripple characteristic to the edge of the passband. Above that frequency the
response will tail off quite sharply. The key parameters are
Sub Main
'#Uses "a-d tech note utilities.apb"
'***********************************************************************
'APWin procedure developed to illustrate the article
'"Analog to Digital Converter Measurements" written by
'Julian Dunn. (c) Julian Dunn 2000
'***********************************************************************
'Set level to -6 dB FS and examine gain variation
Print #1,
Print #1, "===================================================="
Print #1, "Input logarithmic-gain stability (AES17-1998 cl 5.6)"
Print #1, "===================================================="
AP.S2CDsp.Program = 1
SetADClevelChA (-6) 'Apply analogue -6dBFS
BargraphNumber = AP.BarGraph.New
AP.BarGraph.AxisLogLin(1) = 0
AP.BarGraph.Id(1) = 6005
AP.BarGraph.AxisLeft(1,"dBFS") = -6.5
AP.BarGraph.AxisRight(1,"dBFS") = -5.5
' Note the range of the readings on the meter after an hour
Close #1
End Sub
There may also be other factors affecting the frequency response, including
components such as transformers, or perhaps a DC blocking filter that is often
implemented in the digital and/or analog domains.
The following graphs illustrate some procedures to test frequency response
applied to a high-quality ADC operating at a 96 kHz sampling rate.
The plot in Figure 28 was generated by “a-d stopband.apb.” This procedure
measures the signal attenuation on the digital output of the ADC at frequencies
from the folding frequency to 200 kHz. The stimulus tone is applied at
–20 dB FS. Any signal in this range would appear aliased below the folding
frequency into the passband, so suppression of these components is important.
[Figure 28. Stopband response, 60 kHz to 200 kHz, dBr.]
This is similar to the alias suppression test described in AES17. The black
line on the graph is the response of the ADC. The noise floor with the ADC in-
put muted is shown in gray.
[Figure 29. Response in the anti-alias filter transition region, 44 kHz to 56 kHz.]
Notice that the alias suppression above 52 kHz is enough to reduce the
–20 dB FS signal to below the noise floor of the measurement. However, there
may be problems for signals in the transition region, below 52 kHz but above
the folding frequency. The following trace is designed to examine this.
The procedure “a-d antialias corner.apb” has produced the plot in Figure 29.
It is similar to the stopband measurement in Figure 28 but focuses on the re-
sponse in the transition region between the anti-alias filter passband and
stopband. This region is of interest as it indicates the potential for this ADC to
suffer from aliasing between 48 kHz and 52 kHz, as well as in the margin be-
tween the top of the passband and the folding frequency.
The minimum alias suppression is near the folding frequency of 48 kHz,
with an attenuation of approximately 10 dB. (The measurement at the folding
frequency itself, 48 kHz, is attenuated by an amount that depends on the rela-
tive phase of the tone and the ADC sampling, so the notch on this graph is
probably not significant.)
The passband frequency response shown in Figure 30 was made using pro-
cedure “a-d passband.apb.” The result is displayed on a more magnified Y-axis
scale to examine the deviation of the passband response from flat. The zero ref-
erence is set at 997 Hz; high- and low-frequency rolloffs are about 0.2 dB at
20 Hz and 40 kHz. Even at this magnification the ripple due to the digital filter
is not visible.
[Figure 30. Frequency response in the ADC passband, 10 Hz to 40 kHz.]
To view the ripple, Figure 31 has been made by concentrating on the middle
of the audio band in Figure 30 and then magnifying the Y-axis even further.
This shows the sinusoidal nature of the filter ripple and the two components of
the response variation. These variations are due to two cascaded stages of
equiripple anti-alias filtering in this design.
[Figure 31. Passband ripple detail, 1 kHz to 10 kHz, ±0.005 dB scale.]
This ripple is so small that it could not be audible as a signal level variation.
However, it is interesting because it indicates the time dispersion of the
passband signal due to the filter. The time-domain equivalent of a sinusoidal
gain variation in the frequency response is a pair of attenuated duplicates of
the signal, with one duplicate occurring before and one after the main signal.
The amplitude and relative timing of these duplicates can be calculated from
the ripple periodicity and amplitude.
In this example, the period of the finer of the ripple components is about
3 kHz. (Observe the crests at about 1.6, 4.7 and 7.7 kHz which are 3 kHz
apart). The reciprocal of this periodicity indicates the timing offset to be:

±1 / 3 kHz = ±333 µs.
This is the advance of the pre-echo and the delay of the post-echo relative to
the main signal.
The amplitude of these echoes is directly related to the amplitude of the rip-
ple, which, as you can see in Figure 31, is ±0.003 dB. If we assign the main
signal a value of 1 and convert the ripple from dB to a linear scale, we can cal-
culate the sum of signal plus ripple, as shown here:
10^(±0.003/20) = 1 ± 0.0003.
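The echo arithmetic can be checked numerically (values read from Figure 31; variable names are mine):

```python
# Ripple parameters read from Figure 31:
ripple_db = 0.003          # peak deviation of the fine ripple component
ripple_period_hz = 3000.0  # its periodicity along the frequency axis

# Pre-/post-echo timing is the reciprocal of the ripple periodicity:
echo_offset_us = 1e6 / ripple_period_hz   # ~333 microseconds

# Linear deviation corresponding to +/-0.003 dB, i.e. the 1 +/- 0.0003 figure:
deviation = 10 ** (ripple_db / 20) - 1

print(round(echo_offset_us), round(deviation, 4))   # 333 0.0003
```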
The rms value of this ripple component, expressed as a ratio to the main
component, is therefore:
=================================================
Input for full-scale amplitude (AES17-1998 cl 5.4)
and maximum input amplitude (AES17-1998 cl 5.5)
=================================================
Input full scale defined by signal just reaching positive full scale
was not measured as positive full scale cannot be reached
Results of test
Maximum input amplitude is: 22.82 dBV
(defined by the level for 1% distortion)
Figure 32. Results of procedure “a-d input for full-scale.apb.” DUT is a high-quality ADC.
This test was run on a high-quality ADC with the results shown in Figure
32, which illustrates the variation produced by the three measurement meth-
ods. In this case it was not possible to reach digital full scale due to a small
DC offset being subtracted (after clipping) by a DC blocking filter in the
digital domain.
The procedure also measured the maximum input amplitude using the two
alternate methods, by measuring both the level at which the signal clipped
enough to have 1% distortion, and the level at which the signal is compressed
by 0.3 dB. The lower of these results is defined as the maximum input
amplitude.
AES17 specifies that where the full-scale output amplitude cannot be
achieved, a level 0.5 dB below the maximum input amplitude is quoted. In this
example the compression result was much higher than the 1% distortion result,
so the distortion result was used.
Another device, a popular portable DAT recorder, was tested as shown in
Figure 33. The test was performed with the recorder’s automatic level control
set to “manual” and the record level set to maximum.
=================================================
Input for full-scale amplitude (AES17-1998 cl 5.4)
and maximum input amplitude (AES17-1998 cl 5.5)
=================================================
Input full scale defined by signal just reaching positive full scale
is –12.40 dBV
(with distortion, 0.0079 %, and output level 0.00 dB FS RMS)
Figure 33. Results of procedure “a-d input for full-scale.apb.” DUT is a portable DAT recorder.
In this case the full-scale level was reached, at just 0.02 dB below the level
predicted from the gain at –20 dB FS.
[Figure 34. Maximum input level for 1% THD+N vs. frequency, 20 Hz to 40 kHz, with dBr and dB FS scales. DUT is a high-quality ADC.]
Figure 35 shows the same plot for the portable DAT recorder. This graph
shows a rise in the maximum input level (lower line) at both low and high fre-
quencies. As before, the high-frequency rise is due to the elimination of the dis-
tortion harmonics from the passband of the converter. The low-frequency rise
is a result of the DC blocking filter being implemented in the analog domain,
attenuating the signal before the converter so that slightly higher levels are tol-
erated before full scale is reached. Neither of these characteristics illustrate
any problem with the ADC being tested.
[Figure 35. Maximum input level for 1% THD+N vs. frequency, 20 Hz to 40 kHz, with dBr and dB FS scales. DUT is a portable DAT recorder.]
Noise
The analog-to-digital conversion process will always produce errors, which
in an ideal system should be inaudible. However, in a practical system the con-
version errors can often be audible or become so as a result of amplification.
These errors are much more acceptable to the listener if they have a random
character and are not manifested as distortion, chirping or modulation effects.
The error should be noise-like, possessing a spectrum that does not have spuri-
ous tonal components and that does not change with the signal.
Errors are more acceptable in the presence of high level signals, which can
mask their audibility. For this reason, the errors (which, after dithering, be-
come the noise) of an ADC are examined in the presence of a low-level signal.
This signal stimulates the lower-amplitude coding levels of the converter,
which would produce the most audible artifacts if any errors were present.
The noise floor of most converters is fairly flat, so these figures indicate the
difference in results that might be quoted. The A-weighting gives the lowest
noise figure and is normally the figure quoted on the front page of a data sheet.
Where the noise is fairly flat you can add 2.3 dB to an A-weighted noise figure
to estimate the unweighted noise over the DC to 20 kHz band.
========================================
Idle channel and signal to noise ratio
========================================
Unweighted measurements
Unweighted signal-to-noise ratio –84.82 dB FS
Unweighted idle channel noise –84.08 dB FS
The idle channel noise measurement is not very useful for testing ADC per-
formance. It is not representative of normal operating conditions and can pro-
duce erratic results.
For a successive-approximation converter, idle channel noise measurement
does not exercise many of the conversion codes of the converter. The codes
that are exercised depend critically on DC offset, and so may offer very incon-
sistent results.
For a delta-sigma converter this technique is also not very useful. Delta-
sigma converters can have idle tones that fall at frequencies determined by the
DC offset into the converter. For an ADC with an analog DC blocking stage it
is difficult to exercise many DC levels.
However, idle channel DC conditions can be used to study idle tones by tak-
ing an FFT spectrum of the ADC output under idle channel conditions.
Figure 38, produced by “a-d idle channel FFT.apb,” shows an FFT of the
output from the DAT recorder in the idle channel state with an idle tone
around 11 kHz at –112 dB FS. This disappears from the FFT when a signal is
applied.
For example, the Crystal Semiconductor CS5396 data sheet quotes the fol-
lowing characteristics (at 48 kHz sample rate in 128X oversampling mode):
Crystal CS5396:
  Dynamic Range (A-weighted, 20 kHz bandwidth): 120 dB
  Dynamic Range (20 kHz bandwidth): 117 dB
  THD+N (997 Hz at -60 dB FS, 20 kHz bandwidth): 57 dB
Note that the THD+N performance at –60 dB is quoted as a ratio, but if you
subtract that ratio of 57 dB from the tone amplitude of –60 dB FS the result of
–117 dB FS matches (apart from the sign) the “dynamic range” quoted over a
20 kHz bandwidth.
“Number of Bits”
In the audio industry the discussion about the performance of a product of-
ten focuses on the “number of bits” that a product “has.” There are multiple
meanings being implied for the “number of bits,” so that in addition to the
word size used for storage or transmission of digital audio data, it is also as-
sumed that it relates to the performance of the equipment. Often the short-form
description of a product mentions the “number of bits” rather than any other
aspect of performance.
An ideal ADC with a flat noise floor will have the same noise as a dithered
quantization at a word-length of that number of bits. (This allows no room for
any internal noise, but this is an ideal ADC.) The noise is spread over the band
from DC to the folding frequency and can be determined using the following
equation:

Ideal Noise (DC to Fs/2) = 3.01 - 6.02·N dB FS.

This formula is based on an N-bit conversion with no errors apart from the
noise of a quantization that uses unshaped TPDF dither of 2 LSBs amplitude
peak-to-peak. Applying this formula to a 16-bit converter will produce a figure
for an unweighted signal-to-noise ratio of 93.32 dB FS, measuring the noise
from DC to half the sampling frequency.
The proportion of this noise that falls within a 20 kHz bandwidth will scale
with the sampling frequency, Fs:

Ideal Noise (DC to 20 kHz) = 10·log[20 kHz / (0.5·Fs)] + 3.01 - 6.02·N dB FS.
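Both forms of the ideal-noise formula can be combined in one helper (a sketch; names are mine):

```python
import math

def ideal_noise_dbfs(n_bits, bandwidth_hz=None, fs_hz=48000):
    """Noise of an ideal N-bit quantization with 2 LSB peak-to-peak TPDF
    dither, relative to a full-scale sine. Returns the full-band figure
    (DC to Fs/2) when bandwidth_hz is None."""
    # Noise rms is q/2; full-scale sine rms is q * 2**(N-1) / sqrt(2).
    full_band = 20 * math.log10(math.sqrt(2) / 2 ** n_bits)  # = 3.01 - 6.02*N
    if bandwidth_hz is None:
        return full_band
    return 10 * math.log10(bandwidth_hz / (0.5 * fs_hz)) + full_band

print(round(ideal_noise_dbfs(16), 2))                      # -93.32
print(round(ideal_noise_dbfs(16, bandwidth_hz=20000), 2))  # -94.11
```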
crease to show that these lower four bits were making a significant
contribution to the performance.
It is possible to assess the value of the least significant bits by taking a mea-
surement of signal-to-noise ratio and examining it for low-level non-
linearities. If the noise rises, or if spurious spectral components appear on the
truncated output in the presence of a low level signal, then the bits are signifi-
cant. See Low-level non-linear behavior, page 66.
Noise Spectrum
Figure 39 shows an FFT of the output of the portable DAT recorder, using
the same test signal as the signal-to-noise ratio measurement. The FFT was
transformed from 16384 points and power-averaged 16 times. The Blackman-
Harris 4-term window was used.
[Figure 39. FFT of signal-to-noise test output, linear axis, 0 to 24 kHz.]
This figure is an APWIN plot that has been made using a linear frequency
scale with the same number of points as FFT bins, which makes it possible to
estimate the mean level of the bins in the noise floor at about –122 dB FS.
(When the number of plotted points does not equal the number of bins, the
APWIN plotting routines plot the highest valued bin for each point where
more than one bin was present, and this would skew this visual estimate of bin
mean level).
The conversion factor to calculate noise density for this FFT using the
Blackman-Harris 4 window is:
Noise density correction
= 10·log[(1 / Window Scaling) × (FFT points / Sampling Frequency)] dB
= 10·log[(1/2.004) × (16384/48000)]
= -7.7 dB.
The noise over most of the graph is about in line with the -122 dB FS on
the Y-axis. Using the conversion factor this corresponds with:

10·log(Noise Density) dB FS = -122 - 7.7 dB FS = -129.7 dB FS.
This noise density, if it were constant over a 20 kHz bandwidth, would cor-
respond with an unweighted noise of:
Noise (20 kHz)
= 10·log(Bandwidth) + 10·log(Noise Density) dB FS
= 43 - 129.7 dB FS
= -86.7 dB FS.
This compares with the –84.82 dB FS measurement reported for the signal-
to-noise ratio. The 2 dB difference corresponds with low-frequency noise. As
confirmation of this, the difference disappears when the 100 Hz high-pass fil-
ter is selected for the signal-to-noise measurement.
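The full chain of this noise-density arithmetic, from bin level to 20 kHz unweighted noise, can be sketched as follows (values from the text; variable names are mine):

```python
import math

fft_points, fs_hz = 16384, 48000
window_scaling = 2.004   # noise scaling for the Blackman-Harris 4-term window
bin_level_dbfs = -122    # mean noise-bin level estimated from Figure 39

# Correction from per-bin level to noise density (dB re 1 Hz):
correction_db = 10 * math.log10((1 / window_scaling) * (fft_points / fs_hz))
noise_density_dbfs = bin_level_dbfs + correction_db
# Unweighted noise if that density were constant over 20 kHz:
noise_20k_dbfs = 10 * math.log10(20000) + noise_density_dbfs

print(round(correction_db, 1))   # -7.7
print(round(noise_20k_dbfs, 1))  # -86.7
```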
The noise floor is basically flat above 200 Hz, but it shows a small increase
in the noise density of about 1 dB from 2 kHz to 22 kHz. This could be an ef-
fect of the noise-shaping curve of the delta-sigma modulator in the ADC, or it
could indicate some shaping of internal dither or quantization noise in the deci-
mation filter after the modulator. The discrete spurious components seen be-
tween –119 and –116 dB FS at 5 kHz, 11 kHz, 13 kHz and other frequencies
may be idle tones. See Low-level non-linear behavior, page 66.
The low-frequency noise contribution is much clearer if graphed on a loga-
rithmic frequency axis. This is shown in Figure 40, which is the same data
from Figure 39 re-plotted on a log scale.
[Figure 40. FFT of signal-to-noise test output, logarithmic axis, 1 Hz to 20 kHz.]
DC Offset
The DC offset that is indicated on this FFT can be accurately measured us-
ing a DC averaging meter, which is available in APWIN by selecting “DC
only” for the coupling on the Digital Analyzer panel. For the portable DAT re-
corder under test in this illustration, the DC level reads –72.4 dB FS.
In APWIN, the generator settings for DC offset and the analyzer measure-
ments for DC level are relative to the full-scale DC value. Full-scale DC has a
level 3 dB higher than a full-scale rms sine wave—which is the defined refer-
ence level for dB FS. Consequently, DC settings and readings appear 3 dB
lower than the equivalent dB FS (RMS) values. However, DC has the same
value as the peak level of a full-scale sine wave, so the APWIN values for DC
offset are correct for dB FS (peak).
This statement may seem mathematically strange, as the numerical rms value and
peak value of a DC level are obviously the same. However, the dB FS (RMS) measure-
ment is defined as the ratio of the rms level of the signal being measured against the rms
level of a full-scale sine wave—which is numerically 1/√2. The rms level of DC at digi-
tal full scale is therefore 3 dB above the rms level of a full-scale sine wave, and reads
+3 dB FS (RMS). The peak level of full-scale DC is the same as for a sine wave, and so
full-scale DC reads 0 dB FS (peak).
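A two-line check of the rms and peak conventions described above (a sketch; variable names are mine):

```python
import math

full_scale_dc = 1.0
sine_rms_ref = 1 / math.sqrt(2)   # rms of a full-scale sine, the dB FS reference

dbfs_rms = 20 * math.log10(full_scale_dc / sine_rms_ref)   # reads +3 dB FS (RMS)
dbfs_peak = 20 * math.log10(full_scale_dc / 1.0)           # reads 0 dB FS (peak)

print(round(dbfs_rms, 2), dbfs_peak)   # 3.01 0.0
```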
====================================================
Total harmonic distortion and noise
====================================================
Measured as an amplitude
[Figure 42. THD+N vs. frequency, measured as an amplitude, 20 Hz to 20 kHz.]
[Figure 43. THD+N by level for a 997 Hz tone, input levels from -80 to 0 dBr, dB FS scale.]
is so low in good quality digital audio systems that the noise level becomes sig-
nificant and often dominant for all signal levels except close to full scale. As
the noise level does not scale with signal level, reporting the THD+N measure-
ments as a ratio to the signal level makes the numerical result vary in inverse
proportion to the input tone level.
The alternative shown here plots the results as a level (in dB FS) and not a
ratio (in dB), and shows more clearly when the result departs from the noise
floor at input levels above –15 dB FS.
This is particularly important for the plot of THD+N versus level in Figure
43, which shows that below –15 dB FS the measurement is fairly constant. We
can conclude that this is the noise floor of the device and that the harmonic dis-
tortion components are not significant in the measurement. This plot also re-
veals that the noise floor rises slightly toward lower input levels. This effect
would not be very clear if the plot had a basic 6 dB per octave downward
slope.
For many systems the odd harmonics are dominant, and in these cases it is
important to measure the third harmonic. For digital audio systems which are
band-limited to 20 kHz, this test is not capable of revealing the third harmonic
distortion products for tones above 6.7 kHz. This can be observed for the
–1 dB FS tone amplitude in Figure 42; and it can be a problem if you wish to
measure non-linearity due to slew-rate limiting, for example. In a 20 kHz
band-limited system the lowest odd and even harmonics are lost for input
tones above 6.7 kHz (for the third harmonic) or 10 kHz (for the second
harmonic).
It is useful to examine the FFT amplitude spectrum for specific input condi-
tions. The trace in Figure 44 corresponds with the THD+N reading of
–105.14 dB FS with a 997.00 Hz stimulus tone at –1.01 dB FS.
Figure 44. FFT of THD+N test output (dB FS vs frequency, 0–45 kHz).
This graph was produced using “a-d THD_FFT.apb.” The equiripple win-
dow was chosen as it has the lowest close side-lobes. The FFT length is 16384
points and the plot is the result of power averaging over eight acquisitions.
The graph reveals that the odd harmonic components at 3, 5 and 7 kHz are at
much higher levels than the even harmonic components at 2, 4 and 6 kHz,
which leads to the conclusion that the high-level non-linearity is a result of
symmetrical mechanisms (which produce odd harmonics). The dominance of
the third harmonic confirms the indication inferred from the dip after 6.7 kHz
in Figure 42, as the third harmonic of input frequencies above 6.7 kHz will fall
outside the 20 kHz measurement band.
This plot can also be used to estimate the uncorrelated noise level as distinct
from the discrete harmonic components. The number of points in the plot
matches the number of points in the FFT output, so every FFT point has been
plotted. This means that an estimate of noise density can be made. The win-
dow scaling factor for the equiripple window is 2.63191, and sample rate is
96 kHz, so the factor for conversion to noise density is:
Noise density correction
= 10 × log((1 / Window Scaling) × (FFT Points / Sampling Frequency)) dB
= 10 × log((1 / 2.63191) × (16384 / 96000))
= −11.9 dB.
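This correction can be checked with a short calculation (a sketch in Python; the function name is ours, and the window scaling factor and FFT parameters are the ones given above):

```python
import math

def noise_density_correction_db(window_scaling, fft_points, sample_rate_hz):
    """Correction (in dB) that converts an FFT bin reading into a 1 Hz
    noise density, for a window with the given power scaling factor."""
    return 10 * math.log10((1.0 / window_scaling) * (fft_points / sample_rate_hz))

# Equiripple window, 16384-point FFT at 96 kHz (values from the text):
print(round(noise_density_correction_db(2.63191, 16384, 96000), 1))  # → -11.9
```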
The in-band products up to the fourth order are at 2 kHz, 4 kHz and 16 kHz.
AES17 specifies that the measurement is of the ratio of the total output level to
the rms sum of the second- and third-order difference frequency components
on the output.
Figure 45. FFT of IMD test output, with 18 kHz and 20 kHz input (dB FS vs frequency, 0–45 kHz).
Overload Response
It is important that when an input signal exceeds the full-scale range the er-
rors that are produced are as benign as possible. This is especially true in some
audio applications, such as broadcast, when there may be no opportunity to re-
try at a lower gain setting.
An example of non-benign behavior under overload conditions is inversion
in the digital output signal. In the early years of digital audio some systems
were liable to do this. The numeric processing would “wrap” from positive full
scale to negative full scale as a result of the most significant bit inverting.
In recent times it is more widely understood that numeric processes should
be designed to prevent overload behavior by limiting the signal to the full-
scale level rather than allowing it to wrap. However, it is still quite possible for
a software coding error to produce this problem.
Delta-sigma converters can also have non-benign overload characteristics as
they can become unstable in overload conditions.
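The difference between wrapping and limiting can be sketched numerically (an illustration only, assuming 16-bit two's-complement samples; real converters operate at their own internal word lengths):

```python
def wrap16(x):
    """Two's-complement wraparound: the non-benign overload behavior."""
    return ((x + 32768) % 65536) - 32768

def saturate16(x):
    """Clamp to full scale: the benign overload behavior."""
    return max(-32768, min(32767, x))

# A sample one LSB above positive full scale:
print(wrap16(32768))      # → -32768  (wraps to negative full scale)
print(saturate16(32768))  # → 32767   (limits at positive full scale)
```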
Quantization Distortion
A sine test signal at a very low frequency can be used to stimulate most of
the levels in an ADC. If the output of the ADC is filtered so that the main tone
and principal harmonics are not present, the remnant can give an indication of
quantization distortion. A “quantization distortion” measurement following
this approach was proposed in the past, using a notch filter to attenuate the
main tone by over 80 dB with an additional high-pass filter to take out the har-
monics. The low frequency tone was to be 41 Hz and the filter corner fre-
quency was set at 400 Hz.
Although this test is no longer recommended by any measurement stan-
dards, it is occasionally referred to and is still honored by the 400 Hz high-
pass filters found in some test equipment.
Truncation Artifacts
The error produced by inadequate dither at a quantization—and this can oc-
cur at any of several points within an ADC—is correlated with the data bits of
lower significance. These bits have a poor correlation with the signal when the
signal level is high, but they have a high correlation when the signal is low.
This high correlation will result in artifacts at discrete frequencies that are
harmonics, and sometimes aliased harmonics, of the stimulus frequency. There
is no standard that specifies the measurement of this effect.
The harmonics resulting from truncation can be observed, for example, in
the spectrum of the output of a device stimulated by a low level tone, such as
the –60 dB FS tone of the signal-to-noise measurement discussed earlier and il-
lustrated in Figure 39. There are some discrete frequency components shown,
but none are harmonically related to the original tone. Sometimes it is neces-
sary to average the FFT spectrum a large number of times to smooth out the
representation of the noise floor so that discrete components will be more
obvious.
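The effect can be simulated in a few lines (a sketch, not a measurement procedure: a low-level tone, bin-centered to avoid leakage, is quantized with and without TPDF dither, and the third-harmonic FFT bin is compared):

```python
import numpy as np

n, k = 4096, 85                 # 85 whole cycles in the block -> bin-centered tone
t = np.arange(n)
x = 1.2 * np.sin(2 * np.pi * k * t / n)   # low-level tone, about 1 LSB in amplitude

rng = np.random.default_rng(0)
tpdf = rng.uniform(-0.5, 0.5, n) + rng.uniform(-0.5, 0.5, n)  # TPDF dither, 2 LSB pp

undithered = np.round(x)         # truncation straight to the quantizer grid
dithered = np.round(x + tpdf)    # the same quantizer, with dither added first

def h3(y):
    """Magnitude of the third-harmonic FFT bin."""
    return np.abs(np.fft.rfft(y))[3 * k]

# A distinct third harmonic appears only in the undithered case:
print(h3(undithered) > 2 * h3(dithered))
```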
Figure 46. FFT of signal-to-noise test output (dB FS vs frequency, 0–24 kHz).
Noise Modulation
It is possible for noise or dither to have decorrelated the truncation error
from the input signal, but not decorrelated the truncation error power from the
input signal. For example, truncation error power might be maximized when
the mean signal level is centered on a quantization step, while it would be mini-
mized if the signal is centered between quantization steps.
This correlation of truncation error power with signal is a form of noise
modulation.
A simple test for this might be to measure the noise or noise spectrum for
various DC levels in the ADC. However, since ADC inputs usually have DC
blocking filters, it is normally not possible to control the DC level in the
converter.
An alternative to trying to manipulate the DC level in the ADC is to stimu-
late the ADC with tone signals of various amplitudes. However, this approach
will not give results as clear as varying the DC level.
The broadband noise variation is of interest, but the variation of the noise
spectrum will often be more revealing. This can be examined using either a
swept bandpass filter or an FFT approach.
A third-octave bandwidth is appropriate for the swept bandpass filter mea-
surement, as it scales in bandwidth with frequency and in this respect it is simi-
lar to the width of the auditory filter that detects noise. The maximum
variation in noise for each third-octave frequency should be noted.
In the FFT approach, if the variation in noise level is small then it may be
swamped by the statistical variation of the FFT noise floor. In this case it is
possible to use FFT power averaging to reduce the statistical variation.
Jitter Modulation
Jitter is the error in the timing of a regular event, such as a clock. The intrin-
sic jitter of a device is the element of jitter that is independent of any external
clock synchronization input, and the jitter transfer function indicates the rela-
tion between an external synchronization input and the jitter of the device.
The jitter of the clock that determines the ADC sampling instant—which is
called sampling jitter—is the only jitter that has any effect on analog-to-digital
conversion performance. Jitter in other clocks may or may not be indicative of
the jitter on the sampling clock.
The direct connection of test probes to a sampling clock inside a particular
ADC might be possible, but measurements using this technique are beyond the
scope of this article. Instead, we will measure the effect of the jitter on the
audio signal.
The theory of sampling jitter in an ADC is discussed in detail in reference 3.
Const N_frequencies = 10 'number of jitter test frequencies
Const StartFreq = 100 'lowest jitter frequency (Hz)
Const EndFreq = 39e3 'highest jitter frequency (Hz)
Figure 47. FFT showing the main component at 20 kHz and the jitter sideband at 9.64287 kHz (dB FS vs Hz).
Figure 47 illustrates one of the FFT traces. The cursors highlight the main
component at 20 kHz and the jitter sideband at 9.64287 kHz. The sideband am-
plitude is first calculated from theory using the amplitude of the applied jitter
and the stimulus tone frequency. The difference between the calculated level
and the actual level is then plotted as the jitter gain.
Note that the “skirts” around the main 20 kHz component are a byproduct
of noise in the jitter generation mechanism and do not represent jitter intrinsic
to the converter under test. These skirts disappear when the jitter generation is
disabled.
Figure 48. Jitter transfer function (dB vs jitter frequency).
Figure 48 shows the total measured jitter transfer function using this proce-
dure. You can see that there is between 1 and 2 dB of jitter peaking at 5 kHz,
and jitter attenuation above 8 kHz. Above 20 kHz the slope is about 6 dB per
octave, which indicates a first-order response. More measurements could be
made near the 5 kHz point to be assured that the jitter peaking is not much
worse than 2 dB, but the main conclusion is clear: this device does not have
significant audio-band jitter attenuation. This compares with other converters
which have as much as 60 dB attenuation at 500 Hz to ensure that modulation
sidebands cannot approach audibility.
For an ADC, the upper jitter frequency limit is set by the maximum side-
band offset that can be achieved within the audio band. In the case of a 20 kHz
bandwidth system, the maximum frequency offset is just under 40 kHz with
the stimulus tone at 20 kHz. The highest frequency plotted by this procedure is
39 kHz. This produces a sideband at
|20 kHz − 39 kHz| = 19 kHz.
The lower jitter frequency limit is set by the frequency resolution of the
FFT. This procedure uses a 32768 point FFT with an equiripple window,
which limits the lower jitter frequency measurement at about 15 Hz. The jitter
frequency range selected for this measurement has a lower limit at 100 Hz, so
in this case the number of FFT points could be reduced to 8192 with the bene-
fit of increased processing speed.
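The sideband positions can be computed by folding the modulation products back into the baseband (a sketch; the helper names `fold` and `jitter_sidebands` are ours, and a 96 kHz sampling rate is assumed as in the surrounding measurements):

```python
def fold(f, fs):
    """Alias a frequency into the baseband 0..fs/2."""
    f = abs(f) % fs
    return fs - f if f > fs / 2 else f

def jitter_sidebands(f_tone, f_jitter, fs):
    """Observed sideband frequencies for sinusoidal sampling jitter:
    components at f_tone ± f_jitter, folded back into the baseband."""
    return sorted({fold(f_tone - f_jitter, fs), fold(f_tone + f_jitter, fs)})

# 20 kHz tone, 39 kHz jitter, 96 kHz sampling (values from the text):
print(jitter_sidebands(20e3, 39e3, 96e3))  # → [19000.0, 37000.0]
```

The lower product, at 19 kHz, is the in-band sideband used by the procedure.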
Notice that Figure 49 has some close-in skirts that appear to start from
–110 dB FS and go down to –122 dB FS at 20 kHz, ±1 kHz. At higher offsets
the slope is more relaxed. This slope could be a shaped noise floor that is not
due to sampling jitter. To eliminate that possibility, the slope should be com-
pared with the shape of the noise floor produced by a lower-amplitude or
lower-frequency stimulus tone.
There is a peak at about 11.6 kHz. This may or may not be due to sampling
jitter, so its effect on the total result should be considered as another “un-
known” in the measurement.
The procedure measures the amplitude of each bin between DC and the stim-
ulus tone, then calculates the frequency and level of the jitter required to pro-
duce this level through modulation of the 20 kHz tone. This is plotted in
Figure 50 as (potential) intrinsic jitter versus jitter frequency.
The lower line on Figure 50 shows jitter density. This line is calibrated in
seconds (RMS) of jitter per root hertz on the left axis. This axis covers the
range from 300 fs (0.3 ps) to 15 ps.
Figure 50. Calculated intrinsic jitter per root hertz, and integrated to 20 kHz (left axis: seconds per root hertz; right axis: seconds RMS; 200 Hz–20 kHz).
The upper line is the integration of this jitter density, representing the total
jitter measured from the frequency on the X-axis to the right-hand limit of the
graph. This shows, for example, that the total jitter above 1 kHz is just over
100 ps, and above 200 Hz it is about 120 ps.
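Because the density is in seconds per root hertz, the total jitter over a band is the square root of the integrated squared density; for a flat density this reduces to multiplying by the square root of the bandwidth. A sketch (the 0.73 ps/√Hz figure is purely illustrative, chosen to reproduce roughly the 100 ps total quoted above):

```python
import math

def total_jitter_rms(density_s_per_rthz, f_lo, f_hi):
    """RSS-integrate a flat jitter density (seconds per root hertz)
    over the band f_lo..f_hi, giving total rms jitter in seconds."""
    return density_s_per_rthz * math.sqrt(f_hi - f_lo)

# Illustrative flat density of 0.73 ps/√Hz from 1 kHz to 20 kHz:
print(round(total_jitter_rms(0.73e-12, 1e3, 20e3) * 1e12, 1))  # → 100.6 (ps)
```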
The amplitude of the discrete component that was noticed earlier—at an off-
set of 8.5 kHz—is not easy to determine from the noise density curve. The
slight step in the integration curve that it produces shows that it is not very sig-
nificant; so, the uncertainty about the cause of this component does not add a
large uncertainty to the total result.
The speculative interpretation of the original FFT into sampling jitter
should be treated carefully, but as an indicator of the maximum possible sam-
pling jitter spectral density it is a very sensitive tool.
Figure 51. 64-point FFT with a 1 ms block length showing a 6 kHz sine with noise.
The magnitude of the transform output is shown in the lower graph. There
is one bin at 0 dB FS, which corresponds to the input sine wave, and the other
bins are less than –100 dB FS, corresponding to the white noise.
Figure 52. 64-point FFT with a 1 ms block length showing the leakage from a 6.3 kHz sine that
does not repeat over 1 ms (no window).
Note that the frequency axis consists of 33 bins spread from DC (0 Hz) to
FS/2 (32 kHz). The DC and FS/2 bins are at the end points of the spectrum and
consequently are half the width of the other bins, which are 32 kHz/32=1 kHz
wide. (The FS/2 bin is not very useful and is often ignored).
This example represents a special case where an integer number of cycles of
the waveform fit exactly into the input data block. In the frequency domain,
this means that the fundamental frequency is exactly centered on the bin corre-
sponding to the number of cycles that the waveform has completed in the input
data block length. Hence, the peak at bin number 6 in the previous figure.
The Fourier transform can correctly represent only a static signal. The 64-
sample data block transforms to a frequency-domain representation of a static
signal made by repeating the data block forever. In the audio measurements
that use FFTs the signals normally used are fairly static: they do not last for-
ever but they are stable for the duration of the measurement. In cases where
the signal exactly repeats over the length of the data block, as in the example
just illustrated, the transform will produce a good representation.
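The special case can be reproduced in a few lines (a sketch using NumPy: 64 samples of a 6 kHz sine at a 64 kHz sampling rate, matching Figure 51):

```python
import numpy as np

n, fs = 64, 64000                   # 64 samples in 1 ms, as in Figure 51
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 6000 * t)    # exactly 6 cycles per block

# Scale so a full-scale sine reads 1.0 in its bin:
spectrum = np.abs(np.fft.rfft(x)) / (n / 2)
print(int(np.argmax(spectrum)))     # → 6    (bin 6 of bins 0..32)
print(round(float(spectrum[6]), 3)) # → 1.0  (all other bins are ~0)
```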
Windowing
Normally, the signal does not exactly repeat over the FFT block, and a dis-
continuity appears in the signal at the point where the data at the end of the
buffer wraps into the data at the start of the buffer. This discontinuity trans-
forms into the frequency domain and is likely to swamp the features of inter-
est, as shown in Figure 52.
In this case there are 6.3 cycles of the sine wave in the 64-sample block. At
the point where the end of the block wraps to the beginning, there is a large dis-
continuity. This discontinuity distorts the power spectrum so that the noise
floor is swamped by wide skirts to the main spectral peak; this mechanism is
called leakage.
Of the two techniques available, windowing is the more commonly used.
Windowing multiplies the input data block by one of several window functions
that taper the signal at both ends of the block and minimize the
discontinuity.
Figure 53 illustrates the use of a Hann window on the same signal as used
in Figure 52. This window function is one cycle of an inverted raised cosine
and, apart from the rectangular window (which is effectively no window), it
is the simplest in use. In this example, the Hann window is scaled so that it
has a mean square value of 1, which approximately preserves the power in
the data block.
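The windowing operation itself is simple. A sketch: a periodic Hann window scaled to unit mean square is applied to the non-synchronous 6.3-cycle tone, and the leakage into a bin far from the tone drops by tens of dB:

```python
import numpy as np

n = 64
t = np.arange(n)
x = np.sin(2 * np.pi * 6.3 * t / n)            # 6.3 cycles: not block-synchronous

hann = 0.5 - 0.5 * np.cos(2 * np.pi * t / n)   # periodic Hann window
hann /= np.sqrt(np.mean(hann ** 2))            # scale to unit mean square

def bin_db(y, k):
    """Level of FFT bin k in dB relative to a full-scale sine."""
    return 20 * np.log10(np.abs(np.fft.rfft(y))[k] / (n / 2) + 1e-30)

# Leakage far from the tone (bin 20), without and with the window:
print(round(float(bin_db(x, 20)), 1), round(float(bin_db(x * hann, 20)), 1))
```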
Figure 53. Non-synchronous FFT of the 6.3 kHz signal. Non-windowed (blue) compared with Hann window (black).
The power spectrum shown in the lower graph of Figure 53 displays the ben-
efit of the Hann window in the much lower skirts. However, when compared
with the synchronous FFT, the effect has been to broaden the spectral peak
and add skirts where the power in the main lobe has “leaked” to nearby bins.
Figure 54. Comparison of FFT windowing functions supplied with System Two Cascade.
Figure 55. Comparison of additional FFT windowing functions supplied with System Two
Cascade.
Several window functions are in common use, each representing a different
compromise between frequency resolution and leakage. Figures 54 and 55
show examples of the windows supplied with the Audio Precision System
Two Cascade.
Figure 56. FFT spectrum with noise floor at –109 dB FS.
The window spreads the energy from the signal component at any discrete
frequency, and the Y-axis calibration takes this windowing into account. For
the Blackman-Harris window used here, the calibration compensates for the
power being spread over a bandwidth 2.004 bins wide.
This can be converted to the power in a 1 Hz bandwidth, or the power den-
sity, by adding a scaling factor in dB that can be calculated as follows:
Conversion factor
= 10 × log(1 / (Window Scaling × Bin Width))
= 10 × log((1 / Window Scaling) × (FFT Points / Sampling Frequency))
= 10 × log((1 / 2.004) × (1024 / 96000))
= −22.7 dB.
This scaling factor is for the FFT used in Figure 56, which uses a
Blackman-Harris window, a 1024-point FFT and a sampling frequency of
96 kHz. Note that the calculation is in power terms so the ratio in dB is 10
times the logarithm of the ratios.
For some of the other windows used in the Audio Precision Systems FFT
analysis the figures are:
To estimate the noise from a device based on an FFT spectrum you can inte-
grate the power density over the frequency range of interest. For an approxi-
mately flat total noise (where the noise power density is roughly constant) it is
possible to estimate the sum of the power in each bin within reasonable accu-
racy, by estimating the average noise power density and multiplying by the
bandwidth.
Figure 56, for example, has a noise floor that is approximately in line with
about −134 dB FS on the Y-axis. The conversion factor for this FFT was previ-
ously calculated as −22.7 dB, so the noise power density is:
−134 dB FS − 22.7 dB = −156.7 dB FS per Hz.
The integration to figure the total noise over a given bandwidth is simple if
the noise is spectrally flat. Multiply the noise power density by the bandwidth,
which in this case is 20 kHz. For dB power (dB=10logX), this is the same as
adding 43 dB, as follows:
Noise = Noise Density (dB) + 10 × log(Bandwidth) dB FS
= −156.7 + 10 × log(20 000)
= −156.7 + 43 dB FS
= −113.7 dB FS.
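The whole estimate, from floor reading to density to total noise, can be rolled into one calculation (a sketch; the function name is ours, and the numbers are those used above):

```python
import math

def noise_from_fft_floor(floor_dbfs, window_scaling, fft_points, fs, bandwidth_hz):
    """Estimate total noise (dB FS) over a bandwidth from an FFT
    noise-floor reading, assuming a spectrally flat noise floor."""
    density = floor_dbfs + 10 * math.log10(
        (1.0 / window_scaling) * (fft_points / fs))   # dB FS per Hz
    return density + 10 * math.log10(bandwidth_hz)

# Blackman-Harris window, 1024-point FFT at 96 kHz,
# -134 dB FS floor, 20 kHz bandwidth (values from the text):
print(round(noise_from_fft_floor(-134, 2.004, 1024, 96000, 20000), 1))  # → -113.7
```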
Power Averaging
Power averaging is normally used to reduce the statistical variation of a
noise floor. This is achieved by acquiring a number of FFT power spectra and
computing the mean result for each bin. The noise in each bin is reduced to a
statistical mean, and any spectrally discrete components (often called spuriae)
will become more obvious.
This also makes it easier to visually estimate the amplitude of the noise
floor using the technique described above.
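A sketch of the statistics involved: averaging 64 power spectra of white noise reduces the bin-to-bin spread (relative to the mean) by about √64, which is what makes low-level discrete components stand out:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_avg = 1024, 64

def power_spectrum(y):
    return np.abs(np.fft.rfft(y)) ** 2

single = power_spectrum(rng.standard_normal(n))
averaged = np.mean(
    [power_spectrum(rng.standard_normal(n)) for _ in range(n_avg)], axis=0)

# Relative spread (std/mean) of the noise bins, before and after averaging:
for name, s in (("single", single), ("averaged", averaged)):
    s = s[1:-1]                      # ignore the DC and fs/2 bins
    print(name, round(float(np.std(s) / np.mean(s)), 2))
```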
Synchronous Averaging
It is possible to average the signal in the time domain before applying the
transform. This synchronous averaging technique requires that successively ac-
quired data blocks have their signals aligned in time before averaging. This
can be done with a trigger, or by adjusting the timing of each acquired data
block to match the previously acquired data. Either way, this technique will re-
duce the noise level below the statistical mean value, while preserving the
level of components that are synchronized with the main (or trigger) signal.
Synchronous averaging is used to find spectral features that are below the
level of the noise. The indicated level of non-synchronous components, such
as noise, is not significant.
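A sketch of the principle, assuming the acquisitions are already time-aligned (a common trigger point): averaging 100 aligned blocks reduces uncorrelated noise by about a factor of 10 (√100) while the synchronous component is preserved:

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_avg = 256, 100
t = np.arange(n)
signal = 0.1 * np.sin(2 * np.pi * 8 * t / n)   # small synchronous component

# Each acquisition shares the same trigger point, so the blocks
# can be averaged sample-by-sample before any transform:
blocks = [signal + rng.standard_normal(n) for _ in range(n_avg)]
avg = np.mean(blocks, axis=0)

def noise_rms(y):
    """RMS of the residual after removing the synchronous component."""
    return float(np.sqrt(np.mean((y - signal) ** 2)))

print(round(noise_rms(blocks[0]), 2))  # noise in one block (about 1.0)
print(round(noise_rms(avg), 2))        # after averaging (about 0.1)
```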
The following APWIN Basic procedures are referred to or used in this chap-
ter:
§ a-d Menu.apb
§ a-d Setup.at2c
These files are provided on the companion CD-ROM. You may also down-
load the files from the Audio Precision Web site at audioprecision.com. These
procedures and tests are designed for use with System Two Cascade, but with
minor changes can be modified to work with System Two as well.
Please check the README.DOC file in the same folder for further informa-
tion.
References
Introduction
converted to the analog domain; the response to this data is also an as-
pect of testing that is new for this article.
Measurement Techniques
'Set analogue reference for full scale level (based on digital output level)
AP.S2CDsp.Analyzer.ChALevelTrig 'Reset ready count for new reading
AP.S2CDsp.Analyzer.ChBLevelTrig 'Reset ready count for new reading
LevelA = AP.S2CDsp.Analyzer.ChALevelRdg("dBV")
LevelB = AP.S2CDsp.Analyzer.ChBLevelRdg("dBV")
AP.Anlr.RefChAdBr("dBV") = LevelA-AP.DGen.ChAAmpl("dBFS")
AP.Anlr.RefChBdBr("dBV") = LevelB-AP.DGen.ChBAmpl("dBFS")
Figure 57. Procedure script to calibrate analog analyzer dBr reference to be equivalent to dB FS at the DAC output
(extracted from “d-a gain.apb”).
Gain
The gain of a DAC is normally quoted as the analog output level resulting
from a digital input level of 0 dB FS. Practical devices may have non-
linearities which mean that the gain at that level is not representative of the
gain over most of the range, so gain is often measured at a digital input level
below 0 dB FS, often at –20 dB FS.
As an example, you may find that a level of –20.00 dB FS on the input to a
DAC generates an output level of 3.68 dBV. The gain of the DAC, then, ex-
pressed as the output level corresponding to an input level of 0 dB FS, is:
0 dB FS: 3.68 + 20.00 = 23.68 dBV.
Unless otherwise specified, the gain is quoted at a frequency of 997 Hz.
The procedure “d-a gain.apb” measures the gain as described above and
also sets the user-defined analog output reference levels (dBr A and dBr B) to
correspond to dB FS based on this gain value.
An output of this procedure can be viewed in the APWIN Log File:
==========================================
D-A converter gain
==========================================
Output level for -20 dB FS input
Channel A: -14.023 dBV
Channel B: -13.937 dBV
Equivalent to gain of:
Channel A: 5.977 dB(V/FS)
Channel B: 6.063 dB(V/FS)
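The gain figures in the log above follow directly from the measured levels (a trivial check; the function name is ours):

```python
def dac_gain_dbv_per_fs(output_dbv, input_dbfs):
    """Gain expressed as the output level for a 0 dB FS input."""
    return output_dbv - input_dbfs

# Values from the log output above (-20 dB FS input):
print(round(dac_gain_dbv_per_fs(-14.023, -20.0), 3))  # → 5.977
print(round(dac_gain_dbv_per_fs(-13.937, -20.0), 3))  # → 6.063
```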
Gain stability
The gain of a DAC may drift due to instability in the converter reference
voltage or the value of other components. This variation can be monitored
over time to determine the gain stability.
The output level stability test defined in AES17 is a measurement of the
variation in the DAC output level with a –6 dB FS input, over a period of at
least an hour. The DAC is first given a brief (typically 5 minute) warm-up.
The APWIN procedure “d-a output gain stability.apb” illustrates the various
settings that are required to perform this test accurately. The key parameters
are set near the top of the procedure:
'Test conditions
TestLevel = -6 ' dBFS
TestFreq = 997 ' Hz
DeviceWarmUpInterval = 0.1 ' (minutes)
StabilityTestDuration = 1 ' minutes (>= 60 for AES17)
==========================================
Output-level stability (AES17-1998 cl 6.5)
==========================================
Level variation over 1.0 minutes
Channel A 0.0012 dB
Channel B 0.0012 dB
Gain-frequency response
Digital audio signals can only represent a selected bandwidth. When con-
structing an analog signal from a digital audio data stream, a direct conversion
of sample data values to analog voltages will produce images of the audio
band spectrum at multiples of the sampling frequency. Normally, these images
are removed by an anti-imaging filter. This filter has a stopband that starts at
half of the sampling frequency—the folding frequency.
Modern audio DACs usually have this anti-imaging filter implemented with
a combination of two filters: a sharp cut-off digital finite impulse response
(FIR) filter, followed by a relatively simple low-order analog filter. The digital
filter is operating on an oversampled version of the input signal, and the ana-
log filter is required to attenuate signals that are close to the oversampling
frequency.
Figure 58. Anti-image filter (dB vs frequency, 0–130 kHz).
This figure shows an anti-image filter frequency response for one DAC op-
erating at a sampling frequency of 48 kHz. The response is “normalized” to
1 kHz (in practice this may be 997 Hz for reasons discussed in the chapter An-
alog-to-Digital Converter Measurements beginning on page 37). This means
that the y-axis is adjusted for the response to read 0 dB at 1 kHz. The passband
shows little variation up to an edge where the gain falls rapidly into the
stopband. The region between the passband and stopband, in this case from
22 kHz to 26 kHz, is the filter transition region.
The key parameters of the transition region and stopband are:
Figure 59. Passband response on a linear frequency scale (dB vs frequency, 1 kHz–22 kHz).
Figure 60. Passband response on a logarithmic frequency scale (dB vs frequency, 10 Hz–20 kHz).
passband deviation. In this case, a figure of 21.5 kHz could be quoted for
a deviation of 0.08 dB.
§ Passband deviation.
This is the maximum deviation of gain over the passband when com-
pared with the gain at 1 kHz (the graph in Figure 59 is normalized to
1 kHz). This deviation normally has more than one component. In the
case of Figure 59 there is a regular sinusoidal ripple (caused by the high-
order digital FIR filter) superimposed on more gradual gain changes
(caused by low-order effects) which slope from the peak at 3.8 kHz.
Over the range of 100 Hz to 22 kHz the maximum deviations from the
1 kHz gain are +0.07 dB at 3.8 kHz and –0.08 dB at 19.8 kHz.
shows a passband ripple that has a periodicity of 3.6 kHz. The passband
ripple periodicity can be related to the time dispersion of signals in the
passband of the filter (see reference 4).
The logarithmic frequency plot, shown in Figure 60, is more useful for rec-
ognizing the low-frequency roll-off due to DC blocking filters. (There could
also be other components, such as transformers, that could be responsible for
this.) Figure 60 shows that the 20 Hz response is about 0.2 dB down (from the
1 kHz reference).
In all four stopband FFT plots the test conditions are similar: The black
trace shows the output spectrum when stimulated by the white pseudo-random
MLS sequence at FS = 48 kHz. The gray trace is a reference trace which is of
the DAC stimulated with a tone that has the same peak amplitude. This refer-
ence trace is plotted in order to show the measurement noise floor so that it
can be distinguished from the images of the MLS stimulus. The black trace is
normalized for 0 dB at 1 kHz. The gray trace is scaled by the same amount.
In Figure 61 for DAC “A,” at some frequencies the black and gray traces
are at the same level. At those frequencies noise dominates so we can only ob-
serve that the attenuation must exceed this level. (The gray trace shows attenu-
ated spectral images of the 2 kHz tone and we shall see later how with a tone
stimulus we can explore the points of the frequency response much more
slowly but with a greater sensitivity.)
This plot indicates that the minimum stopband attenuation over the band to
130 kHz is 49 dB. Defining the stopband lower edge frequency by the lowest
frequency with that attenuation (a definition for convenience) we get a figure
of 26 kHz. The attenuation at the folding frequency (24 kHz) is 6 dB.
The manufacturer of this part quotes a minimum stopband attenuation of
72 dB. This appears to be true for the attenuation of the spectral images either
side of FS (48 ±24 kHz) but this plot shows that this is not true of the images ei-
ther side of 2 · FS (96 ±24 kHz). The response at the 2 · FS images is indicative
of a zero-order hold function operating on 96 kHz data (rather than a more
complex FIR filter that may be expected). Perhaps the manufacturer had forgot-
ten about this characteristic when producing the specification?
The stopband lower edge is at 26.5 kHz and the attenuation at the folding
frequency is 7 dB.
Figure 63. Stopband FFT, DAC “C” (dB vs frequency, 10 kHz–130 kHz).
can be observed using an FFT. The procedure “d-a stopband sweep.apb” does
this for a small range of stimulus frequencies to illustrate the method.
Figure 64 shows the output of that procedure for DAC “D.” There are seven
FFT spectra laid over each other. The FFTs are taken with test frequencies
from 18.2 kHz to 22.5 kHz. This band of test frequencies was chosen for this
example so that the harmonic distortion products do not overlap with the
images, which would make the plot even more confusing.
The spikes in the ranges from 36 kHz to 45 kHz and from 54 kHz to 68 kHz
are second and third harmonics of the test frequencies, and do not tell us any-
thing about the filter (except that the harmonic distortion is probably occurring
after the main filtering action has occurred). The spikes from 25.5 kHz to
30 kHz are the lowest set of spectral images of the stimulus tones, and are at a
frequency corresponding to the difference between the input and sampling fre-
quencies. These images are near to the lower stopband edge frequency that we
observed in Figure 64. The results appear to indicate a minimum attenuation of
around 107 dB. (Measurements at more stimulation frequencies would be re-
quired to confirm that this result is typical of the whole band.)
The next-higher set of spectral images is at the sum of the input and sam-
pling frequencies. These would fall in the range 66.2 kHz to 70.5 kHz. They
appear to be very close to the noise floor at below –113 dB.
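The image locations follow from simple arithmetic around multiples of the sampling frequency (a sketch; the helper name is ours):

```python
def image_frequencies(f_tone, fs, n_multiples=1):
    """Spectral images of f_tone around the first n_multiples of the
    sampling frequency: m*fs - f_tone and m*fs + f_tone."""
    freqs = []
    for m in range(1, n_multiples + 1):
        freqs += [m * fs - f_tone, m * fs + f_tone]
    return freqs

# Band edges used in the text, 18.2 kHz and 22.5 kHz tones at fs = 48 kHz:
print(image_frequencies(18.2e3, 48e3))  # → [29800.0, 66200.0]
print(image_frequencies(22.5e3, 48e3))  # → [25500.0, 70500.0]
```

The difference images span 25.5 kHz to 29.8 kHz and the sum images 66.2 kHz to 70.5 kHz, matching the ranges observed in the plot.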
The difference between the stopband attenuation of DAC “D,” at greater
than 100 dB, and the attenuation of the first two DACs, at less than 50 dB, is
interesting.
such as a sine wave sweep can be used. (More sophisticated methods such as
multi-tone could also be used but will not be described here.)
In the procedure “d-a passband.apb” the Analog Analyzer level meters are
used to measure the level on the output of the DAC being tested. The Digital
Generator is set for an output of –20 dB FS and the frequency swept over the
complete range supported by the generator (10 Hz to 0.47 · FS). During the
sweep the results are averaged to improve the accuracy of the results.
As the passband ripple is an indicator of time dispersion, the test has particu-
lar significance. Since passband ripple levels from modern DACs are small the
test has to be as precise as possible. That is why the APWIN Digital Analyzer
measurement mode (which is capable of faster measurements using the ADC
with DSP measurement techniques) is not used.
Use of the Analog Analyzer avoids confusion between the ripple of the
DAC being measured and the ripple in the test equipment ADC filter. How-
ever, this procedure could be adapted to use the Digital Analyzer as long as a
suitable equalizing correction was incorporated into the test.
Figure 65 is a graph showing the passband frequency response of DAC “B”
with a logarithmic frequency scale. This shows that within the range of the dig-
ital sine generator the frequency response of the DAC deviates by less than
0.8 dB at the low-frequency limit of 10 Hz, and about 1.1 dB at the high-fre-
quency limit of 0.47 · FS. Over the conventional audio frequency range from
20 Hz to 20 kHz the deviation is determined by the low-frequency attenuation
of 0.2 dB.
[Figure 65. Passband frequency response of DAC "B": dB, –0.4 to –1.1, vs. frequency, 10 Hz to 20 kHz, logarithmic scale.]
shows that above that frequency the response falls rapidly beyond the range of
the gain fluctuations lower in the passband.
= –57 dB.
This indicates that the primary passband dispersion of the filter is producing
echoes 56 dB below the main signal and separated by 290 µs before and after.
==================================================
Output amplitude at full scale (AES17-1998 cl 6.3)
and maximum output amplitude (AES17-1998 cl 6.4)
==================================================
For the current settings of the device under test:
The gain (measured at -20dB FS) is
Channel A 7.083 dBV/FS
Channel B 7.040 dBV/FS
Output amplitude at full scale:
Channel A 7.083 dBV, with THD+N of 0.0043% and compression of -0.001 dB
Channel B 7.040 dBV, with THD+N of 0.0042% and compression of -0.001 dB
If the device under test has controls that can alter the output
level then the maximum output amplitude is determined by adjustment
of the controls of the device under test to the maximum level.
§ With the gain control set to maximum and the input signal set to
0 dB FS, the output amplitude is measured. If the THD+N or compres-
sion are below the 1% and 0.3 dB targets of the following two measure-
ments, then this is the maximum output amplitude.
§ If the THD+N is greater than 1%, then the gain setting is adjusted until
THD+N measures 1%. Then the output level is measured.
shown in Figure 69 is the output level sweep at 0 dB. This has signs of high-
frequency roll-off above 15 kHz.
The plot of compression versus frequency in Figure 70—taken as the differ-
ence between the previous plot and the –20 dB FS reference plot—shows that
if there is any frequency dependent compression it is below the normal varia-
tion of the measurement. The high-frequency roll-off is therefore not due to
compression. It is part of the linear frequency response of the DAC.
[Figure 69. Output level sweep at 0 dB FS: dBV, +6.5 to +7.3, vs. frequency, 10 Hz to 20 kHz.]
Figure 68 shows the output graph from the APWIN procedure “d-a full
scale thd v freq.apb.” This procedure makes a sweep of THD+N against fre-
quency for a sine wave at digital full scale.
The measurement is band-limited to 80 kHz, rather than the AES17 stan-
dard for THD+N of 20 kHz or less. This change has been made so that it re-
mains sensitive to harmonics when the sine wave is above 10 kHz. Otherwise,
for example, distortion due to clipping that started at 15 kHz and produced har-
monic distortion products at 30 kHz, 45 kHz, 60 kHz and 75 kHz would not
show up as a significant change to the line.
The resulting plot in Figure 68 has a flat line at –85 dB over most of the
band, and indicates that this device does not suffer from any artifacts that re-
duce the working maximum output level to below the level corresponding to
full scale on the input.
Even at 22.5 kHz, where the measurement rises to about –60 dB (or 0.1%),
the reading is still significantly below the 1% threshold. It should be noted that
this reading is probably not due to a harmonic distortion product but from a
sampling image that has been inadequately attenuated by the reconstruction fil-
ter in the DAC. The first image of a 22.5 kHz tone when sampled at the
48 kHz sampling frequency used here would appear at 48–22.5=25.5 kHz,
which is below the starting frequency of the stopband for this DAC.
Figure 71. D-A converter output clipping. [Plot: output voltage, –5 V to +5 V, vs. time, 1 ms to 4 ms.]
filters wrapped and the overshoot caused the sign of the signal to change—so
a properly limited clip is a fairly good sign.
As can be seen in Figure 71, this particular converter shows some asymmetry
in the clipping between the overshoot that precedes the transition and the
overshoot that trails it. The filtering function of this DAC is designed to be
linear phase between the digital and analog domains, with phase and amplitude
errors due to the analog filter compensated by the digital filter. However, the
clipping here occurs at the output of the digital filter, so the phase compensa-
tion is not applied to the result of the clipping. The overshoots that follow the
transition have a component due to the analog filter and, in the clipped condi-
tion, are not symmetrical with the overshoots preceding the transition, which
have no component due to the analog filter.
Figure 73. DAC "A" output clipping. [Plot: output voltage, +1.5 V to +3.5 V, vs. time, 2.2 ms to 2.6 ms.]
In Figure 73 the same plot for DAC “A” shows a different result. An over-
shoot of 0.7 V is showing no obvious signs of clipping. This indicates a head-
room of at least:
20 × log( 3.4 / 2.75 ) = 1.8 dB.
It is possible to measure headroom beyond that shown by the clipping of a
full scale square wave. Digital filters can be overdriven further by even more
complex signals. The simplest method of generating an arbitrary near-worst
case signal is to use a maximum length sequence (MLS). The MLS generator
in System Two Cascade has an output consisting of samples of a constant am-
plitude and with a sign that varies pseudo-randomly according to the sequence.
In that sequence some of the patterns of sample values will produce exception-
ally high output values, and these high points can be used to probe the clipping
behavior.
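The mechanism can be sketched in a few lines of Python (not part of the original APWIN procedures; the 4× oversampling Hamming-windowed-sinc filter below is a hypothetical stand-in for a real DAC interpolation filter). An MLS of ±1 samples is generated with a linear feedback shift register, zero-stuffed, and lowpass filtered; some patterns of sample signs add constructively in the filter's impulse response, producing inter-sample peaks well above the input amplitude:

```python
import math

def mls():
    # 1023-chip maximum length sequence of +/-1 values from a 10-bit
    # Fibonacci LFSR; x^10 + x^7 + 1 is a primitive polynomial.
    state = [1] * 10
    out = []
    for _ in range(2 ** 10 - 1):
        out.append(1.0 if state[-1] else -1.0)
        feedback = state[9] ^ state[6]        # taps at bits 10 and 7
        state = [feedback] + state[:-1]
    return out

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

# Hypothetical 4x oversampling interpolation filter: Hamming-windowed sinc.
L = 4
taps = []
for n in range(-32, 33):
    w = 0.54 + 0.46 * math.cos(math.pi * n / 32)   # Hamming window
    taps.append(sinc(n / L) * w)

seq = [0.5 * s for s in mls()]         # MLS at 50% of full scale
up = [0.0] * (len(seq) * L)
up[::L] = seq                          # zero-stuff to the higher rate

# Convolve the zero-stuffed sequence with the filter.
out = [0.0] * (len(up) + len(taps) - 1)
for i, x in enumerate(up):
    if x:
        for j, t in enumerate(taps):
            out[i + j] += x * t

print("input peak:", max(up), " output peak:", round(max(out), 2))
```

Because an MLS contains every possible pattern of consecutive sample signs, it exercises something close to the filter's worst case; a sharper (longer) filter rings more and overshoots further.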
Figure 74 illustrates how the MLS can produce output peaks that far exceed
the levels presented at the input. For DAC "A," a digital full-scale sine wave
produces a peak output voltage of about 2.7 V, so an amplitude of 50% would
correspond to 1.35 V. Since the 50% MLS here produces peaks of 3.1 V, that
represents an increase of 130%.
The values highlighted by the cursor can be used to probe the internal head-
room of the converter. This is shown in Figure 75.
[Figure 74. DAC "A" output for the MLS at 50% of full scale input: output voltage vs. time, 331.3 ms to 331.7 ms, with the cursor at –1.981 V.]
This figure has expanded the trace in the region near the positive peak
highlighted in Figure 74 and repeats the measurement for various generator
amplitudes. The trace shows the results taken for amplitudes of 60%, 70% and
80% of full scale (more than these were originally measured in order to find
the amplitude where clipping occurs). The trace at 80% of full scale is showing signs
of overload at the selected peak. The previous measurement showed that this
peak is 130% above (2.3 times) the nominal input signal level. Therefore this
indicates clipping at 2.3 · 0.8 = 1.84 times full scale, or +5.3 dB FS.
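The arithmetic of that estimate can be checked directly (a sketch; the 2.3× peak factor and the 80% clip point are the values read from the figures above):

```python
import math

peak_factor = 2.3     # MLS peak relative to the nominal input level
clip_amplitude = 0.8  # fraction of full scale at which clipping appears

clip_level = peak_factor * clip_amplitude           # in units of full scale
headroom_db = 20 * math.log10(clip_level)

print(round(clip_level, 2), round(headroom_db, 1))  # 1.84 5.3
```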
Another method for examining the overload characteristic of a DAC uses
the Analog Analyzer peak meter to measure the peak level compression of the
MLS at high levels. The procedure “d-a output clipping.apb” illustrates this
technique, again with DAC “A.” The results are tabulated below:
===============================================
D-A filter MLS overshoot compression
===============================================
Channel A
Gain: 8.43 dBV/FS
Peak overshoot: 7.37 dB
I/P I/P+Overshoot O/P-Gain Compression
-7.00 dB FS 0.37 dB FS 0.45 dB FS 0.09 dB
-6.00 dB FS 1.37 dB FS 1.44 dB FS 0.08 dB
-5.00 dB FS 2.37 dB FS 2.10 dB FS -0.24 dB
-4.00 dB FS 3.37 dB FS 2.39 dB FS -0.94 dB
-3.00 dB FS 4.37 dB FS 2.56 dB FS -1.82 dB
-2.00 dB FS 5.37 dB FS 2.69 dB FS -2.67 dB
-1.00 dB FS 6.37 dB FS 2.78 dB FS -3.58 dB
0.00 dB FS 7.37 dB FS 2.82 dB FS -4.53 dB
===============================================
===============================================
D-A filter MLS overshoot compression
===============================================
Channel A
Gain: 9.49 dBV/FS
Peak overshoot: 7.55 dB
I/P I/P+Overshoot O/P-Gain Compression
-9.00 dB FS -1.45 dB FS -1.35 dB FS 0.05 dB
-8.00 dB FS -0.45 dB FS -0.41 dB FS 0.05 dB
-7.00 dB FS 0.55 dB FS 0.33 dB FS -0.21 dB
-6.00 dB FS 1.55 dB FS 0.61 dB FS -0.93 dB
-5.00 dB FS 2.55 dB FS 0.78 dB FS -1.75 dB
-4.00 dB FS 3.55 dB FS 0.97 dB FS -2.56 dB
-3.00 dB FS 4.55 dB FS 1.22 dB FS -3.37 dB
-2.00 dB FS 5.55 dB FS 1.46 dB FS -4.16 dB
-1.00 dB FS 6.55 dB FS 1.58 dB FS -4.95 dB
0.00 dB FS 7.55 dB FS 1.73 dB FS -5.78 dB
===============================================
This shows that the compression reaches 1 dB at an input level of –6 dB FS.
The peak output level is then approximately +0.6 dB FS, or 3.2 V. This small
amount of headroom confirms the result shown with the square wave earlier.
Is this measurement useful?
Most signals driving into a DAC are not likely to cause any overshoot, and
so the amount of headroom beyond 0 dB FS is not relevant to the faithful re-
production of those signals. However, some signals may cause overloads in
DACs. The MLS signal is not meant to be a representative signal. It is being
used as a (nearly) worst-case signal in order to measure other effects. For ex-
ample, if—in another device—the kind of signal inversion that occurs in the
trace of DAC “A” were to occur only just above full scale (rather than at
+5.3 dB FS) it may then produce audible artifacts in the presence of some
high-level material.
Noise
The digital-to-analog conversion process will always have errors, and in an
ideal system these errors should be inaudible. However, if they are audible, or
The noise floor of most converters is fairly flat, so these figures indicate the
difference in results that might be quoted. The A-weighting gives the lowest
noise figure and is normally the figure quoted on the front page of a data sheet.
Where the noise is fairly flat you can add 2.3 dB to an A-weighted noise figure
to estimate the unweighted noise over the DC to 20 kHz band.
Quasi-Peak Measurements of
TPDF Dithered Truncation
Unweighted rms –0.02 dB
Unweighted Q-peak 4.67 dB
CCIR-RMS weighted rms 1.36 dB
CCIR 468-4 weighted rms 6.99 dB (= CCIR-RMS + 5.63 dB)
CCIR 468-4 weighted Q-peak 11.64 dB
==============================================================
Idle channel and signal to noise ratio (AES17-1998 cl 9.1,9.3)
==============================================================
Un-weighted measurements
Un-weighted signal-to-noise ratio
Channel A: -102.89 dB FS
Channel B: -102.99 dB FS
Un-weighted idle channel noise
Channel A: -102.72 dB FS
Channel B: -102.91 dB FS
Figure 77. Results from DAC “A” gathered by “d-a idle channel noise.apb.”
The idle channel noise measurement is not as useful for testing DAC perfor-
mance as the signal-to-noise ratio measurement, discussed on page 108. The
signal-to-noise test measures noise in the presence of signal, while the idle
channel noise test uses the digital zero signal which is not representative of nor-
mal operating conditions and, as a result, can produce misleading results.
For DAC architectures that use a multi-level conversion, the main noise
mechanisms can be a result of level mismatches. For those types of devices a
lack of modulation in the input data (such as for the idle channel test) will pro-
duce a much “better” measurement, one with an unrealistically low noise read-
ing.
Perhaps because of this, the manufacturers of DACs with other architec-
tures have sometimes incorporated circuits that modify the converter’s opera-
tion in order to measure well for this test. These circuits may disable the
conversion function when the number of samples of digital zero received ex-
ceeds a defined number. This is particularly true for delta-sigma converters,
which are not sensitive to internal level mismatches but have other noise
sources that do not vary as significantly with modulation. They can produce a
higher noise reading for the idle channel measurement and could benefit, on pa-
per at least, from such circuits.
==============================================================
Idle channel and signal to noise ratio (AES17-1998 cl 9.1,9.3)
==============================================================
Un-weighted measurements
Un-weighted signal-to-noise ratio
Channel A: -103.00 dB FS
Channel B: -103.01 dB FS
Un-weighted idle channel noise
Channel A: -110.03 dB FS
Channel B: -110.01 dB FS
Figure 78. Results from DAC “A” with “cheat” switch on, gathered by “d-a idle channel noise.apb.”
The two traces show a very similar noise floor, apart from a component near
11 kHz at a level of –132 dB FS in the gray trace. The black trace with the
–60 dB FS signal does not show this component, which suggests that it could
be an idle tone. As idle tones critically depend on the applied signal, a more
thorough investigation would examine the spectra at various levels of DC. The
procedure “d-a idle channel fft v level.apb” illustrates such an investigation,
with the result shown in Figure 80. It is not as easy as simply running this pro-
cedure: the range of DC values for the sweep to produce this graph was se-
lected after investigations with many more traces over a larger span of levels.
The DC values were selected to illustrate how a DC level swept over the
range of 5.6% to 6.6% of full scale causes an idle tone at about –114 dB FS to
sweep from 200 Hz to 15.3 kHz. There are also other components that appear
to be at multiples of these frequencies. These vary in amplitude up to
–102 dB FS at 30.6 kHz.
==============================================================
Signal to noise ratio (AES17-1998 cl 9.3)
==============================================================
AES17 CCIR weighted RMS signal-to-noise ratio
Channel A: -101.44 dB FS CCIR-RMS
Channel B: -101.52 dB FS CCIR-RMS
IEC61606 CCIR-468 (ITU-R BS 468-4) signal-to-noise ratio
Channel A: -91.19 dB FS CCIR Q-Peak
Channel B: -91.33 dB FS CCIR Q-Peak
IEC61606 A-weighted RMS signal-to-noise ratio
Channel A: -105.08 dB FS (A-weight RMS)
Channel B: -105.25 dB FS (A-weight RMS)
Un-weighted RMS signal-to-noise ratio (20kHz band-limited)
Channel A: -102.92 dB FS (Unweighted)
Channel B: -102.95 dB FS (Unweighted)
The values produced by this equation for some common word-lengths and
sampling frequencies are tabulated on the next page:
in the original signal from the noise power measured in the signal-to-noise
measurement.
For example, the DAC in the 16-bit DAT recorder (DAC "C") has a signal-
to-noise ratio of –93.6 dB FS (unweighted) measured in a 20 kHz bandwidth.
This is close to the signal-to-noise ratio of the applied test signal at
–94.10 dB FS. How do we work out how much noise is added by the DAC?
Uncorrelated noise has the property that the mean square level of the total
noise is the sum of the mean square level of the noise components that are con-
tributing to it. Therefore the relation is:
Output_Noise² = Input_Noise² + Added_Noise².
Note: This relation applies if all the noise terms are referred
back to the same point; in other words, the output noise and
the device noise should be scaled to the level that they would
have had at the input to produce the level of noise that is be-
ing measured at the output. In the case of DAC output level
measurements in dB FS, the output levels can be referred to
the corresponding digital input level, as the gain scaling is
implicit to dB FS.
For noise levels in a decibel scale, the sum of squares relation is:
10^( OutputNoiseLevel (dB FS) / 10 ) = 10^( InputNoiseLevel (dB FS) / 10 ) + 10^( AddedNoiseLevel (dB FS) / 10 ).
Given a measured output noise level, and a known input noise level in the
applied test signal, this equation can be used to determine the added noise
level:
AddedNoiseLevel (dB FS)
= 10 × log( 10^( OutputNoiseLevel (dB FS) / 10 ) – 10^( InputNoiseLevel (dB FS) / 10 ) ).
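The power-domain subtraction can be sketched in a few lines of Python (not the actual APWIN procedure; the figures are those quoted for the 16-bit DAT recorder in Figure 82):

```python
import math

def added_noise_db(output_noise_db, input_noise_db):
    # Subtract the test-signal noise from the measured output noise in
    # the power domain; all levels in dB FS, referred to the same point.
    return 10 * math.log10(10 ** (output_noise_db / 10)
                           - 10 ** (input_noise_db / 10))

# Measured values for the 16-bit DAT recorder (Figure 82):
intrinsic = added_noise_db(-93.60, -94.23)
print(round(intrinsic, 1))   # about -102 dB FS
```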
The procedure “d-a subtracting test signal noise.apb” performs this calcula-
tion with the measured noise, and Figure 82 shows the results for the 16-bit
DAC (DAC “C”) in the DAT recorder. You can see that the intrinsic DAC
noise is at about –102 dB FS. This is significantly less than the total output
noise of –93.6 dB FS measured for the unweighted signal-to-noise
measurement.
At the end of the procedure the unweighted idle channel noise (the DAC out-
put noise with a digital zero input) is measured. This is effectively a measure
of the DAC intrinsic noise, since the input is digital zero without any noise
(from dither or truncation). The result can be directly compared with the DAC
intrinsic noise calculated using the signal-to-noise ratio test signal, listed just
above it on Figure 82. The difference of 6.5 dB between the two intrinsic noise
levels is an example of a noise modulation effect that might not be desired.
==============================================================
Intrinsic DAC noise
==============================================================
Noise measurements (Un-weighted signal-to-noise ratio)
Test signal noise on DAC input: -94.23 dB FS
Channel A total DAC output noise: -93.60 dB FS
Channel B total DAC output noise: -93.64 dB FS
Figure 82. Results of procedure "d-a subtracting test signal noise.apb," showing intrinsic DAC noise for a
16-bit DAT recorder (DAC "C").
Noise spectrum
An FFT of the output of the portable DAT recorder (DAC “C”) with the test
signal used for the signal-to-noise ratio measurement is shown in Figure 83.
This FFT was transformed from 16384 points and power averaged 16 times.
The Blackman-Harris 4 term window was used.
The conversion factor to calculate the noise density scale from the discrete
spectral line amplitude scale is, for this FFT:
Noise density correction
= 10 × log( (1 / WindowScaling) × (FFTPoints / SamplingFrequency) ) dB
= 10 × log( (1 / 2.004) × (16384 / 262144) ) dB
= –15.06 dB.
The procedure “d-a noise floor FFT.apb” includes this correction when pre-
senting the FFT of the noise floor in the presence of a signal used in the sig-
nal-to-noise ratio measurement.
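The correction can be evaluated numerically (a sketch using the values quoted above; the window scaling factor 2.004 is the figure given in the text for the Blackman-Harris 4-term window):

```python
import math

window_scaling = 2.004      # window scaling factor (from the text)
fft_points = 16384
sample_rate_hz = 262144

correction_db = 10 * math.log10((1 / window_scaling)
                                * (fft_points / sample_rate_hz))
print(round(correction_db, 2))   # -15.06
```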
The plot in Figure 83 has been made using a linear frequency scale with the
same number of plotted points as FFT bins. This means that every FFT point is
plotted. It is therefore possible to estimate the noise density by taking the mean
level of the noise floor by eye.
[Figure 83. FFT of the DAC "C" output noise floor: noise density, dB FS per hertz, vs. frequency, 0 to 130 kHz.]
The mean level of the bins in the noise floor is at about –136 dB FS per
hertz over the frequency range up to 20 kHz. This is related to a total noise
level. We can calculate this noise over this bandwidth by integrating the noise
density over that frequency range. In this case it is approximately flat over that
range, so it is possible to calculate this by multiplying the mean density by the
square root of the bandwidth. In logarithmic terms using dB FS this is
expressed as:
Unweighted Noise = MeanNoiseDensity + 10 × log( Bandwidth )
= –136 + 10 × log( 20,000 ) dB FS
= –136 + 43 dB FS
= –93 dB FS.
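The same integration takes a couple of lines (a sketch; the –136 dB FS per hertz density is the value estimated by eye from the plot):

```python
import math

mean_density_db = -136.0        # dB FS per hertz, estimated by eye
bandwidth_hz = 20_000

unweighted_db = mean_density_db + 10 * math.log10(bandwidth_hz)
print(round(unweighted_db))     # -93
```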
This result compares to within a decibel of the more accurate direct measurement
of –93.6 dB FS for the unweighted signal-to-noise ratio made earlier.
Note that this calculation is possible only because the FFT has been
scaled by the –15 dB correction calculated earlier in order to represent the signal
amplitude in terms of a spectral density per 1 Hz bandwidth. (More information on
Fourier transform scaling is found beginning on page 72.)6 For an FFT that has
not been so scaled the calculation is:
Unweighted Noise
= 10 × log( (1 / WindowScaling) × (FFTpoints / SamplingFrequency) )
+ 10 × log( Bandwidth ) + Mean amplitude per bin (dB FS).
Audio Precision
“HiBW” and “HiRes” converters
The 16-bit ADC in the analyzer has a wider bandwidth but poorer dynamic range
than the analyzer “HiRes” precision ADC. The lower dynamic range is not a problem
because the low test signal level allows analog gain to be applied in front of the test
equipment ADC, so that it can be driven with a signal that is up to 60 dB closer to full
scale than for the DAC under test. The output of the analyzer ADC is then scaled down
in the digital domain by the same ratio.
DAC “C” has a delta-sigma modulator. The rising noise density from
25 kHz is a characteristic of the noise shaping filter used in the modulator. At
the modulator output this noise floor rises further but there is an analog
lowpass filter after the modulator to reduce the amount of ultrasonic noise at
the DAC output. In this case the noise floor appears to be well controlled so
that the noise density does not rise significantly above the in-band noise level.
There are some spurious components shown on the plot in Figure 83. When
looking at an FFT like this—one that has been scaled to show noise den-
sity—the height of the peaks due to discrete, or single frequency, spectral com-
ponents does not correspond to the amplitude of that component. The
correction used earlier needs to be subtracted. In this case 15 dB needs to be
added to the noise density figure at the peak in order to estimate the amplitude
of that component.
It is interesting that the two channels have different spectral components.
The left channel has a –106 dB FS (–121 dB FS + 15 dB) component at
96 kHz, which is twice the sample rate, along with sidebands at 1 kHz and odd
harmonics of 1 kHz offsets. The right channel has a fairly strong sample rate
component at –93 dB FS (–108 dB FS + 15 dB). These components possibly
indicate crosstalk from the respective clocks.
[Figure 84. Noise density, dB FS, of the left (L CHAN) and right (R CHAN) channels vs. frequency, 10 Hz to 100 kHz, logarithmic scale.]
In Figure 84, the lower frequency limit has been selected to be 10 Hz. The
bin width for a 262.144 kHz, 16384 point FFT is 16 Hz. The amplitude at the
first three points is due to the broadening of the DC bin by the window func-
tion and does not indicate low-frequency noise. (There is a longer discussion
of the DC bin in the Analog-to-Digital Converter Measurements chapter.)6 The
effect of any non-DC components in the low-frequency noise spectrum is not
apparent until about 64 Hz. and above. The measurement was made in Eng-
land where the power line frequency is 50 Hz. The components at 100 Hz,
200 Hz, and 300 Hz are at even multiples of this power line rate and so are
probably related to power supply ripple or some power line interference.
Harmonic distortion
Deviation from linear behavior can be investigated simply by using a pure
tone. Any non-linearity in the transfer function of the DAC will result in
frequency components in addition to the tone. Static non-linearities (those that
depend only on the signal) will result in harmonic products at multiples of the
original tone frequency. The most significant individual harmonics are nor-
mally at low multiples, such as the 2nd and 3rd harmonic at twice and three
times the original (fundamental) frequency.
=====================================================
Total harmonic distortion and noise AES17:1998 cl 8.5
=====================================================
Figure 86. THD+N as a ratio, with total DAC output signal vs. frequency. [Plot: dB, –40 to –100, vs. frequency, 20 Hz to 10 kHz; traces at –20 dB FS and –1 dB FS.]
at a much lower level than the noise. The effect is shown more clearly in a
trace of THD+N against test signal level, shown in Figure 87.
Figure 87. THD+N as a ratio, with total DAC output signal vs. applied signal level. [Plot: dB, –40 to –100, vs. applied level, –80 to 0 dB FS.]
This trace was also generated by the same procedure. The main part of the
trace between the coordinates (–50 dB FS,–53 dB) and (–10 dB FS,–93 dB) is
close to being a straight line. This is the measurement of the constant noise
floor at –103 dB FS as a proportion of the total signal amplitude as the test
signal falls.
Below –50 dB FS the line deviates from this straight relationship. This devi-
ation downward is not because the noise level measured within the THD+N
measurement is falling for signal levels below –50 dB FS but because the con-
tribution to the total signal level from wide-band noise has increased. Without
the signal present the wide-band noise on the DAC output is at –63 dB FS, so
a reading of the level of the DAC output signal with a tone at –60 dB FS will
be the sum of that tone level and the wide-band noise. This can be calculated
as:
10 × log( 10^(–60/10) + 10^(–63/10) ) = –58.2 dB FS.
The increase of 1.8 dB in the total level will cause a reduction by the same
amount in the dB ratio (representing THD+N amplitude divided by total out-
put amplitude) and will account for the deviation from linearity at input level
(–60 dB FS,–45.8 dB). This error bottoms out at –40 dB, which represents the
ratio of noise in the THD+N reading—which is band-limited to 20 kHz—to
the wide-band noise (measured by the level meters prior to the notch filter) in
the test equipment. (The Audio Precision S-AES17 filter option can be used to
limit the level meter bandwidth to 20 kHz. If that filter is used then the ratio
will “bottom out” at a much lower signal level.)
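The sum of tone and wide-band noise is a power addition, which can be sketched directly (using the levels quoted above):

```python
import math

tone_db = -60.0    # dB FS test tone
noise_db = -63.0   # dB FS wide-band noise at the DAC output

total_db = 10 * math.log10(10 ** (tone_db / 10) + 10 ** (noise_db / 10))
increase_db = total_db - tone_db
print(round(total_db, 1), round(increase_db, 1))   # -58.2 1.8
```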
Another method of showing the effect being measured avoids this confu-
sion. It is also more useful in that deviations in the THD+N level are more ob-
vious. This is shown in Figure 88.
Figure 88. THD+N measured as amplitude vs. generator amplitude. [Plot: dB FS, –90 to –110, vs. generator amplitude, –80 to 0 dB FS.]
The measurement technique is the same but the Y-axis is plotted as a level,
rather than a ratio. Because the range of Y-axis values is smaller, this
amplitude scale can be expanded more than the ratio scale.
For most of the graph the THD+N amplitude measurement is flat at
–103.7 dB FS. This indicates that noise components are independent of signal
level over this range and that the harmonic components are insignificant. For
X-axis values (test signal generator amplitudes) above –30 dB FS the THD+N
amplitude starts to increase. This is a consequence of the high-level non-
linearities of the DAC being tested beginning to contribute harmonic distortion
to the total reading.
It is useful to examine the FFT amplitude spectrum for specific input condi-
tions. The traces in Figures 89, 90, and 91 were all made with the procedure
“d-a THDN output fft.apb.” They examine the spectrum of the same DAC out-
put, applying a 997 Hz sine wave at –1 dB FS as a test signal. The THD+N un-
der these conditions is –99.33 dB FS.
Figure 89. FFT of THD+N test output, linear scale. [Plot: dB FS, 0 to –160, vs. frequency, 0 to 32 kHz.]
Figure 89 shows two main distortion components at the 2nd and 3rd har-
monic frequencies at –104 dB FS and –110 dB FS. There are also signs of 5th
and 6th harmonics at –125 dB FS and –128 dB FS. These measurements repre-
sent a very high linearity in the DAC being tested, which is possible in even
fairly inexpensive devices.
This measurement is also testing the linearity of the Audio Precision mea-
surement ADC being used to digitize the DAC output for the FFT. In the case
of System Two Cascade the distortion specification for the highest-perfor-
mance ADC is –105 dB, which is more accurate than—but still comparable
with—the device being tested, so there is some uncertainty about the results.
Another approach is to use the output of the analog notch filter as the input
for the measurement ADC. This notch reduces the peak level of the remaining
signal so significant gain can be applied in front of the ADC without clipping
the signal. Test equipment can auto-range to take advantage of this directly.
The graphs in Figures 90 and 91 were produced by taking the signal on the
output of the notch filter.
In the absence of the main tone component the Cascade’s higher-bandwidth
ADC can be used to give a picture of more of the frequency spectrum. This is
now possible (even though the higher bandwidth measurement ADC has a
poorer linearity than the device under test) because the removal of the high-
level tone by the notch filter allows the residual signal to be presented at a
much higher level to the ADC input. This is done so that errors due to ADC
non-linearity will be at a much lower level with respect to the residual. Also,
in the absence of the original tone, no harmonics will be produced by the
measurement converter.
The higher bandwidth of this measurement reveals a rising noise floor and
images of the test tone at a distance of 1 kHz on either side of 96 kHz. (These
are "images" of the test tone frequency.) The rising noise floor is a characteristic
of the delta-sigma converter architecture and the images are indicative of a
Figure 90. FFT of THD+N test notch filter output, linear scale. [Plot: dB FS, 0 to –160, vs. frequency, 0 to 130 kHz.]
Note: You may notice that the level of the underlying noise of
the higher bandwidth plots is 6 dB higher than from the FFT
taken by the lower bandwidth high-precision converter. This
is not caused by the difference in converter precision but is
due to the four-times-higher sample rate. Each FFT bin rep-
resents four times the bandwidth, and therefore has
10×log (4) = 6 dB more noise.
This plot can be used to estimate the noise level as distinct from the discrete
harmonic components. The number of points in the plot matches the number
of points in the FFT output, so every FFT point has been plotted. This means
that an estimate of noise density can be made. The window scaling factor for
the equiripple window is 2.63191, and sample rate is 262144 Hz, so the factor
for conversion to noise density is:
Noise density correction
= 10 × log( (1 / WindowScaling) × (FFTpoints / SamplingFrequency) ) dB
= 10 × log( (1 / 2.63191) × (16384 / 262144) ) dB
= –16.24 dB.
Intermodulation Distortion
Another conventional method of measuring non-linearity is to use two input
tones and measure the discrete intermodulation products that are produced.
This is a twin-tone intermodulation test.
For a pair of frequencies F1 and F2, the effect of non-linearities is to produce
harmonic and intermodulation products at frequencies of the form
m·F1 ± n·F2, for integer m and n. An advantage of an intermodulation test is
that some of the products can be measured within the passband of the system;
some of the difference frequency products are at lower frequencies than the
stimulating tones.
There are several styles of twin-tone signals. The SMPTE RP120-1983 and
DIN 45403 tests each use one high and one low frequency. The AES17 standard
IMD test signal uses two high frequencies, one at the "upper band edge"
frequency (normally 20 kHz), and another at 2 kHz below that frequency. (For
most systems the upper band edge is defined in AES17 as 20 kHz, but it may
be lower than this for systems with sample frequencies less than 44.1 kHz.)
The level of the twin-tone is specified for the AES17 test to peak at full
scale. This is an rms level of –6.02 dB FS for each tone, with a total rms level
of –3.01 dB FS.
20 kHz and 18 kHz input tones will produce intermodulation difference
frequencies within the audio band. The in-band products up to the 4th order
are at 2 kHz, 4 kHz and 16 kHz.
AES17 specifies that the measurement is of the ratio of the total output level to
the rms sum of the 2nd- and 3rd-order difference frequency components on the
output.
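As a cross-check, the in-band difference products can be enumerated programmatically (a sketch; the order counting, the 18 kHz and 20 kHz tones, and the 20 kHz band edge follow the description above):

```python
F1, F2 = 18_000, 20_000          # AES17 twin-tone frequencies in Hz

def in_band_difference_products(f1, f2, max_order=4, band_edge=20_000):
    # Difference products |m*f2 - n*f1| whose order (m + n) does not
    # exceed max_order and which fall inside the passband.
    products = set()
    for m in range(1, max_order):
        for n in range(1, max_order):
            if m + n > max_order:
                continue
            f = abs(m * f2 - n * f1)
            if 0 < f < band_edge:
                products.add((m + n, f))
    return sorted(products)

print(in_band_difference_products(F1, F2))
# [(2, 2000), (3, 16000), (4, 4000)]
```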
Figure 92. IMD test using FFT. [Plot: dB FS, 0 to –160, vs. frequency, 0 to 32 kHz.]
Figure 92 shows the spectrum of the DAC output when stimulated by the
twin-tone. The Y-axis has been calibrated to the level of a discrete sine wave
in dB FS. The procedure “d-a imd_fft.apb” produces the graph and calculates
the IMD product amplitudes from the FFT by summing the spectral power den-
sity around each spectral component. The levels of each component and the
IMD ratio are tabulated in Figure 93.
====================================================
FFT of intermodulation distortion output
====================================================
-108.24 dB FS 2nd Order Difference product
-98.73 dB FS 3rd Order Difference product
-98.27 dB FS Sum of 2nd and lower 3rd order difference products
-3.07 dB FS total signal level
-95.20 dB IMD ratio
-107.41 dB analog analyzer DFD 2nd order component
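The per-component integration described above (summing the spectral power density around each spectral component) can be sketched as follows; the half-width of 4 bins is an illustrative choice, not the value used by the actual procedure, and it must cover the FFT window's main lobe:

```python
import math

def component_level_db(spectrum_db, center_bin, half_width=4):
    """Sum the power of the FFT bins around one spectral component and
    return the component level in dB. spectrum_db holds per-bin levels
    in dB; half_width is the number of bins taken on each side."""
    lo = max(center_bin - half_width, 0)
    hi = min(center_bin + half_width + 1, len(spectrum_db))
    power = sum(10.0 ** (spectrum_db[i] / 10.0) for i in range(lo, hi))
    return 10.0 * math.log10(power)
```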
lated with the input signal. For delta-sigma converters, the error can have
strong discrete frequency components at a frequency related to the instanta-
neous, or DC, level of the signal. The solution is to add dither.
Ideally, dither randomizes the quantization error so that it has the character of white noise at a constant level. The ideal application of dither at all possible stages, however, is often not practical. Any compromises that must be
made can be evaluated by the measurement of the amplitude of low-level
distortion products.
The procedure “d-a low level distortion fft.apb” illustrates how these prod-
ucts can be investigated. The method is similar to making a measurement of
the noise floor spectrum described earlier (“d-a noise floor FFT.apb”), but as
we are now interested in the amplitude of specific components rather than
noise density, the following changes are made:
[Figure: low-level distortion FFT spectra, dBrA versus frequency, 2 kHz to 20 kHz.]
level of the dither. These components are each at least 15 dB below the total
unweighted noise floor, but, not being masked by the uncorrelated noise or by
the low-level signal, they are audible.
In contrast, the upper trace was measured with the correct amplitude of
dither (16-bit) and shows no harmonics. The amplitude of the noise floor is, of
course, higher.
The sensitivity of this test can be increased by expanding the vertical scale,
and increasing the number of averages.
Noise Modulation
It is possible for noise or dither to have decorrelated the truncation error
from the input signal, but not to have decorrelated the truncation error power
from the input signal. For a simple example, truncation error power might be
minimized if the mean signal level is centered between quantization decision
points, while it would be maximized when the signal is closer to a decision
point. This is illustrated in Figure 118 in the annex on dither. The positive half
of the waveform approaches the decision point between the 0 and 1 level (at
0.5 LSB), and the dither causes the quantizer to switch frequently between
those two levels. This is in contrast with the negative half of the waveform,
which is close to midway between decision points where the dither is much
less likely to cause the output to change.
This correlation of truncation error power with signal is a form of noise
modulation.
A simple test for this would be to measure the noise or noise spectrum for a
low-level tone with various DC levels. The idle channel FFT spectra measure-
ment discussed on page 107 would also reveal broad-band noise fluctuations.
When performing an FFT, if the variation in noise level is small it may be
swamped by the statistical variation of the FFT noise floor. In this case, it is
possible to use FFT power averaging to reduce the statistical variation.
A swept bandpass filter measurement may also be used. AES17 recommends using a 41 Hz stimulus at –40 dB FS, notching the stimulus out of the
results, applying a series of one-third octave bandpass filters and measuring
the noise in each band. The stimulus is then dropped by 10 dB and a new set
of measurements is taken. This process is repeated until a family of measure-
ments is completed.
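The test plan above can be sketched numerically. Five stimulus steps and base-10 preferred one-third-octave centers are assumptions for illustration; the text only says the process repeats until the family of measurements is complete, and the standard's exact band list may differ:

```python
# Stimulus: 41 Hz starting at -40 dB FS, dropped in 10 dB increments.
stimulus_db_fs = [-40 - 10 * k for k in range(5)]

# One-third-octave band centers from 100 Hz to ~16 kHz (base-10
# preferred values: each decade split into ten logarithmic steps).
band_centers_hz = [round(10 ** (n / 10.0), 1) for n in range(20, 43)]
```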
Jitter Modulation
The theory of sampling jitter in a DAC is described in the Jitter Theory
chapter.
Jitter is the error in the timing of a regular event, such as a clock. The jitter
transfer function indicates the relation between the jitter of an external synchro-
nization input and the jitter of a device. The intrinsic jitter of the device is that
element of jitter that is independent of any external clock synchronization
input.
The jitter of the clock that determines the DAC reconstruction timing is
called sampling jitter. It is the only jitter that has any effect on digital-to-ana-
log conversion (DAC) performance. Jitter in other clocks may, or may not, be
indicative of the jitter on the sampling clock.
The direct connection of test probes to a sampling clock inside a DAC may
be possible, but measurements using this technique are beyond the scope of
this article. Measurements of the effect of this jitter on the audio signal are con-
sidered instead.
The jitter will produce modulation sidebands above and below the audio sig-
nal frequency. At each jitter frequency, the amplitude of the lower (frequency)
sideband is measured. (It is important to have good frequency resolution for
this measurement as the sidebands for low jitter frequencies will be close to
the 20 kHz tone.) The measurements are taken from an FFT using a high dy-
namic range window (specifically, Equiripple) and applying integration over
nearby bins.
[Figure 96. FFT of the DAC output with applied sinusoidal jitter, dBrA versus frequency, 0 Hz to 32.5 kHz; cursors at 20 kHz and 9.642 kHz.]
Figure 96 illustrates one of the FFT traces. The cursors are highlighting the
main component at 20 kHz and the lower sideband at 9.642 kHz. The sideband
amplitude is first calculated from theory, for the amplitude of the applied jitter
and the main tone frequency. The difference between this calculated level and the
actual measured level is then plotted as the jitter gain. (The interface jitter
level used for this measurement was reduced to 0.05 UI, as the 0.125 UI set-
ting defined in the test causes the device to temporarily lose lock. This reduced
level of jitter stimulation means that the measurement is 8 dB less sensitive,
since the sidebands’ amplitudes are closer to the noise floor.)
Note that the “skirts” around the main 20 kHz component in Figure 96 are a
result of the jitter generation mechanism and are not jitter intrinsic to the con-
verter under test. These skirts disappear when the jitter generation is disabled.
[Figure 97. DAC “D” jitter transfer function, dB versus jitter frequency, 100 Hz to 30 kHz.]
Figure 97 shows the total measured jitter transfer function using this proce-
dure. This shows that there is approximately 1 dB of jitter peaking at around
700 Hz with jitter attenuation above 800 Hz. The slope above 2 kHz is about
40 dB per decade, indicating a second-order response. This device has signifi-
cant audio-band jitter attenuation, but as this is only true for higher jitter fre-
quencies it remains susceptible to lower-frequency jitter. (This compares with
some converters which have 60 dB attenuation at 500 Hz in order to ensure
that modulation sidebands cannot approach audibility.)
The plot in Figure 98 shows the measurement of another DAC. In this case
there is no significant jitter attenuation in the audio band.
[Figure 98. Jitter transfer function of the second DAC, dB versus jitter frequency, 100 Hz to 30 kHz.]
The upper frequency limit for this measurement is set by the maximum side-
band offset that can be achieved within the measurement band. Any analog
band-limit filter after the converter may affect measurement bandwidth. In the
case of a 20 kHz bandwidth DAC with an analog anti-image filter, the maxi-
mum frequency offset is just under 40 kHz with the stimulus tone at 20 kHz.
The highest jitter frequency plotted by this procedure is 39 kHz. This produces
a sideband at:
|20 kHz − 39 kHz| = 19 kHz.
The lower jitter frequency limit for the measurement is set by the frequency
resolution of the FFT. This procedure can use a 32768 point FFT with an
equiripple window, which sets the lower jitter frequency measurement limit to
about 15 Hz.
The lower jitter frequency limit selected for this particular measurement has
been set at 100 Hz. As this does not require such high frequency resolution,
this allows a reduction in the number of FFT points to 8192 with the benefit of
increased processing speed.
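The frequency resolution trade-off can be checked with simple arithmetic. A 48 kHz sample rate is an assumption here (the text does not state it):

```python
fs = 48000.0  # assumed sample rate, in Hz (not stated in the text)

bin_width_32k = fs / 32768  # ~1.46 Hz per bin
bin_width_8k = fs / 8192    # ~5.86 Hz per bin
# The equiripple window spreads each component across roughly ten bins,
# which is why the usable lower jitter-frequency limit is ~15 Hz for the
# 32768-point FFT rather than a single bin width.
```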
[Figure: FFT used for the intrinsic jitter estimate, dBrA versus frequency, 0 Hz to 30 kHz.]
in the middle of the band (such as 10 kHz) to look for symmetry in the skirts,
which is an indicator that modulation effects are being observed.
The procedure measures the amplitude of each bin between DC and the stim-
ulus tone, and calculates the frequency and level of jitter that would be re-
quired to produce this bin amplitude through jitter modulation of the 20 kHz
tone. This produces a plot of potential intrinsic jitter versus jitter frequency.
This is shown on Figure 100 as a jitter density, plotted as the lower line, in
gray. This line is calibrated in rms seconds of jitter per root hertz on the left
axis.
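The bin-to-jitter conversion the procedure performs can be sketched with the small-modulation approximation: sinusoidal sampling jitter of peak amplitude J seconds produces sidebands at π·f_tone·J relative to the tone. Inverting that gives the jitter level implied by a measured sideband (a sketch, not the procedure's exact arithmetic):

```python
import math

def jitter_from_sideband(sideband_db_rel_tone, tone_hz=20000.0):
    """Small-modulation approximation: convert a sideband-to-tone ratio
    in dB into the rms seconds of sinusoidal jitter that would produce
    it on a tone at tone_hz."""
    ratio = 10 ** (sideband_db_rel_tone / 20.0)
    j_peak = ratio / (math.pi * tone_hz)   # peak jitter, seconds
    return j_peak / math.sqrt(2)           # rms jitter, seconds
```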
[Figure 100. Calculated intrinsic jitter per root Hz, integrated to 20 kHz: jitter density in rms seconds per root hertz (left axis) and integrated jitter in seconds (right axis), versus jitter frequency, 20 Hz to 20 kHz.]
The integration of this jitter density to determine the total jitter is not a sim-
ple task to do from the graph. The integrated jitter curve is shown to simplify
this task. This upper curve in black represents the total jitter measured from
the frequency on the X axis to the right-hand limit of the graph. This shows,
for example, that the total jitter above 1 kHz is just over 350 ps, and above
200 Hz it is about 225 ps.
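The right-to-left integration that produces the black curve can be sketched as a running rms sum of the density over the bins above each frequency:

```python
import math

def integrated_jitter(density, bin_hz):
    """density: jitter density per FFT bin, in rms seconds per root hertz;
    bin_hz: bin width in Hz. Returns, for each bin, the total rms jitter
    from that frequency up to the top of the band (the cumulative sum of
    density^2 * bandwidth, taken from the right, then square-rooted)."""
    total = 0.0
    out = [0.0] * len(density)
    for i in range(len(density) - 1, -1, -1):
        total += density[i] ** 2 * bin_hz
        out[i] = math.sqrt(total)
    return out
```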
The interpretation of the original FFT into sampling jitter should be treated
carefully. However, as an indicator of the upper limit of possible sampling jit-
ter spectral density, it is a very sensitive tool.
In this example an examination of the original FFT is useful in judging the
reliability of the result. Components within 5 kHz of the main 20 kHz tone ap-
pear to be symmetrical and so are likely to be caused by modulation, such as
jitter. On the other hand, the components around 4 kHz and 12 kHz are less
likely to be jitter. These are offset by 8 kHz and 16 kHz from the main tone.
The component 8 kHz above the main tone, at 28 kHz, does not have the same
shape as that at 12 kHz and this lack of symmetry suggests that they are not
due to jitter modulation. Another source for these components should be
investigated.
Most of the jitter is in the region below 1 kHz. We know from an earlier
measurement of this DAC (DAC “D”) that the sampling jitter transfer function
of this part shows little or no attenuation below 1 kHz. It is therefore quite pos-
sible that the jitter being observed is sourced prior to the clock recovery cir-
cuit. It may be from the “data-jitter” on the interface signal or from
interference within the unit. “Data-jitter” from the interface could be investi-
gated using J-test (see the next section).
This analysis of the intrinsic jitter measurement could be compared with
the similar analysis in the Analog-to-Digital Converter Measurements chapter.
[Figure 101. DAC “D” J-test “jitter” measurement: apparent jitter density in rms seconds per root hertz versus jitter frequency, 20 Hz to 10 kHz.]
The integrated jitter result is not useful for this test because the components
appearing at the high-frequency part of the graph would dominate the result.
These are not due to jitter but are a direct measurement of the low-level 250 Hz
square wave.
From this graph we can observe that the low-frequency jitter components
are identical in amplitude and frequency to the previous result in Figure 100,
and there is no sign of components at 250 Hz. Therefore, we can conclude that
in this test situation the unit is not showing jitter components due to data-jitter,
and that the low-frequency jitter components are originating elsewhere.
Jitter Tolerance
Another jitter test to perform on a DAC is to verify jitter tolerance. Once
again, refer to the chapter on Jitter Theory for more details [5].
This can be performed in two ways.
[Figure 102. DAC “A” jitter tolerance verification: THD+N in dB (left axis) and applied jitter amplitude in UI (right axis), versus jitter frequency, 100 Hz to 90 kHz.]
This test and result are in the file “DAC A jitter tolerance.at2c.” This shows
two traces against the sweep of jitter frequency. The DIO jitter amplitude trace
in gray is showing the amplitude of applied jitter which falls from 5 UI (or
10 UI peak to peak) at 200 Hz down to 0.125 UI (0.25 UI pk-pk) at 8 kHz
which is the tolerance template required by the specification in AES3 and in
IEC60958-4. The EQ curve used by the jitter generator to follow this template
is installed with APWIN as “APWIN\EQ\jittol.adq.”
The black THD+N trace, which uses peak detection and a 2 second wait for
settling, reveals if any errors have been generated. Though the THD+N trace
varies (it rises as the jitter-modulated sidebands come out of the THD+N notch
close to the stimulus tone frequency, and then falls as the jitter attenuation
starts to reduce the sidebands above 5 kHz), there are no signs of errors. So the
device—DAC “A”—has passed the test.
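The tolerance template described above (10 UI pk-pk below 200 Hz, falling to 0.25 UI pk-pk at 8 kHz) implies a 1/f slope between the two corner frequencies; a sketch:

```python
def jitter_tolerance_ui_pp(f_hz):
    """AES3 / IEC60958-4 jitter tolerance template, peak-to-peak UI,
    as described in the text: flat at 10 UI below 200 Hz, 1/f between
    200 Hz and 8 kHz, flat at 0.25 UI above 8 kHz."""
    if f_hz <= 200.0:
        return 10.0
    if f_hz >= 8000.0:
        return 0.25
    return 10.0 * 200.0 / f_hz  # 1/f slope joins the two corners exactly
```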
The interface carries the data associated with each audio sample in a 32-bit
subframe. Each subframe begins with a synchronization pattern called a pream-
ble that has a duration of 4 bits. The preambles are followed by the audio data,
[Figure: AES3 data stream structure. A block is 192 frames. The preamble pulse patterns, in UI from the time reference at the start of each subframe, are: Z (B): 3-1-1-3; Y (W): 3-2-1-2; X (M): 3-3-1-1.]
which is in turn followed by four bits of metadata at the end of each subframe.
The first bit of metadata is bit 28, the validity bit; next is the user data bit; then
the channel status bit; finally, the parity bit, bit 31.
A frame consists of two subsequent, associated subframes; the frame rate
normally corresponds to the source sampling frequency. In the most common
implementations, subframe 1 carries the information for audio channel 1, and
subframe 2 carries audio channel 2.
The professional channel status block is organized as follows (bytes 0–5 carry
coded control fields, labeled a–k in the figure, with r marking reserved bits):
§ Bytes 0–5: coded control fields (a–k)
§ Bytes 6–9: alphanumeric channel origin data
§ Bytes 10–13: alphanumeric channel destination data
§ Bytes 14–17: local sample address code (32-bit binary)
§ Bytes 18–21: time-of-day sample address code (32-bit binary)
§ Byte 22: reliability flags
Time slots: 0–3 | 4–27 | 28–31
Preamble | LSB … 24-bit audio sample word … MSB | V U C P
Time slots: 0–3 | 4–7 | 8–27 | 28–31
Preamble | Aux | LSB … 20-bit audio sample word … MSB | V U C P
Figure 106. AES3 24-bit audio word, and 20-bit audio word with 4-bit auxiliary sample word.
within the 20- or 24-bit audio data word. A table showing the status flags is in
Figure 108.
However, as there are very few applications for using the lower 4 bits of
auxiliary audio data for anything else, it is quite common for receiving devices
to roll up those bits into the audio no matter what the indication in the channel
status. If that field is actually to be used for another purpose, it is useful
to be able to verify how the DAC reacts to the status controls.
[Figures 107 and 108. Professional and Consumer Channel Word Length Status Indications: the bit patterns indicating, for example, a 20-bit word length. All other states of the word-length bits are reserved and are not to be used until further defined.]
This can be done by setting the channel status pattern sent to the DAC to
indicate that the field is not part of the main audio word, and then measuring
to see if, despite this, the analog output responds to the data in the auxiliary
audio field.
An audio signal can be set to modulate only the auxiliary audio field by set-
ting a sine wave of amplitude –121 dB FS (±7.5 LSBs) and with a positive off-
set of –120.6 dB FS (+7.8 LSBs) with no dither. This covers the range of
values from 0.3 LSBs to 15.3 LSBs which will keep the most significant 20
bits at zero. The output level at the sine wave frequency can then be measured
and compared with the level with a muted output, and with the output without
the DC offset. These measurements are performed by the procedure “d-a aux
truncation test.apb.”
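The signal construction above can be sketched numerically. The dB FS figures assume a 24-bit word with dB FS referenced to a full-scale sine; the 997 Hz tone and 48 kHz rate are illustrative choices, not values from the text:

```python
import math

full_scale = 2 ** 23            # peak LSB count of a 24-bit word
offset_lsb, amp_lsb = 7.8, 7.5  # +7.8 LSB offset, +/-7.5 LSB sine

amp_dbfs = 20 * math.log10(amp_lsb / full_scale)        # ~ -121.0 dB FS
offset_dbfs = 20 * math.log10(offset_lsb / full_scale)  # ~ -120.6 dB FS

samples = [offset_lsb + amp_lsb * math.sin(2 * math.pi * 997 * n / 48000)
           for n in range(4800)]
# Every sample stays in roughly 0.3 to 15.3 LSB, so the most significant
# 20 bits of the 24-bit word remain zero and only the 4 auxiliary bits
# are modulated.
```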
The results shown in Figure 109 do not indicate truncation. They show that
there is a small change in level with the DC offset from –118.6 dB FS to
–120.6 dB FS. This indicates that there is a small problem with non-linearity
that should be investigated, but the variation is not enough to suggest that the
modulation in the auxiliary data is being ignored. The level measured with a
muted input is significantly below both these readings.
==============================================================
Test for truncation of auxiliary audio
==============================================================
Test signal level: -121.0 dB FS
If the data in the four LSBs were being ignored, the readings would show a
significantly lower level (probably the same level as for the muted input) for
the signal with offset, which is only carried in the aux bits.
if the incoming status pattern is in a “sample rate not indicated” state. This
may be inappropriate in many applications and should be checked.
There may also be situations where the channel status indication of sample
rate is incorrect but correct operation of the converter is desirable. This can be
verified by setting the indication to an incorrect value and observing operation.
Emphasis Flags
Some digital audio recordings and systems use emphasis and deemphasis to-
gether at the encoder and decoder (ADC and DAC).
Emphasis amplifies the higher frequencies in a signal prior to encoding.
When the signal is decoded the inverse response is applied. This attenuates the
higher frequencies in the signal so that the total response is flat.
There are two standard emphasis curves:
§ The Compact Disc format permits the use of emphasis, which has a dif-
ferent frequency characteristic from J-17. There are, however, very few
CDs recorded with emphasis. Recording engineers have problems manag-
ing the frequency-dependent headroom that results from using an empha-
sized recording chain. Also, some CD players and DACs apparently do
not correctly deemphasize the signal on replay, and on those devices an
emphasized recording would replay incorrectly.
The deemphasis characteristic of the DAC can be verified using the tech-
nique described in the section on passband deviation on page 92. This uses the
procedure “d-a passband.apb.” Before running the procedure, select the emphasis
characteristic required. This can be done in APWIN using the Digital I/O
panel “PreEmphasis” selection. The transmitted channel status pattern should
also be configured to indicate the emphasis selected, in order to activate the
deemphasis circuit in the DAC.
[Figure 110. DAC “D” deemphasis response deviation, dB versus frequency, 10 Hz to 20 kHz.]
[Figure 111. Measurement of emphasized signal without deemphasis, dB versus frequency, 10 Hz to 20 kHz.]
This graph has a small deviation which confirms that the characteristic has
been selected, but the deemphasis has a response error of almost 0.2 dB at
7 kHz.
If the DAC was not performing deemphasis, the plot would show the
preemphasis characteristic for CD as shown in Figure 111.
The J-17 emphasis characteristic has more high-frequency boost starting at
a lower frequency, but the same technique can be applied.
DITHER ANNEX
Dither is used to make a digital signal behave more like an analog signal.
Without dither, the quantization error which results from sampling a low-
level signal varies with the signal. This variation is very unnatural, and it
changes the nature of low-level signals in an obvious manner. A good example
of this is a decaying piano note, which sounds very distorted before it disap-
pears—possibly very suddenly. It is preferable for the error to be noise-like,
uncorrelated to the audio signal. Dither is used to achieve this.
Dither is a small noise-like signal added to the input. This presents a ran-
dom component to the quantizer. When the signal is between two quantization
levels, the quantizer then selects either of the two levels in proportion to how
close they are to the (dithered) input level, which results in the average output
level matching the input level. There is still quantization error, but now it is
random. It is decorrelated from the signal, and the distortion and modulation ef-
fects of the correlation can be minimized.
When a signal is quantized with dither, it has the effect of making the resul-
tant signal sound like the original, at the price of added noise.
The mechanism is illustrated in Figure 112. The input signal is a simple
ramp, and the input samples are shown as black dots. The quantization pro-
cess, in this example, takes the nearest integer value of least significant bits
(LSBs). For most of the first millisecond the output of the quantizer (shown in
black) remains constant at 3 LSB, even though the input signal is rising. At
about 0.9 ms the input signal crosses the quantizing decision threshold at
3.5 LSB, and the output jumps to the next level, 4 LSB. (Other quantizers may
simply take the largest integer that is not more positive than the input, but the
difference is only a DC offset of 0.5 LSB.)
Notice the pattern of the short black lines, which represent the output of the
quantizer. The ramp has been converted into a staircase. The quantization er-
ror—the difference between input and output—is displayed in gray. This
shows a regular sawtooth of amplitude 1 LSB. The error is highly correlated
with the signal, with a slope equal and opposite to the slope of the signal. This
correlation is undesirable as it produces strange tonal components to the
background noise floor.
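The staircase and sawtooth behavior described above can be sketched numerically with a round-to-nearest quantizer (values in LSBs):

```python
# A slow ramp from 3 to ~4.5 LSB, quantized to the nearest integer LSB.
ramp = [3.0 + 1.5 * n / 100 for n in range(100)]

quantized = [round(x) for x in ramp]               # nearest-integer quantizer
errors = [q - x for q, x in zip(quantized, ramp)]  # quantization error

# Without dither, the output is a staircase (only the values 3 and 4
# appear) and the error is a deterministic sawtooth bounded by +/-0.5 LSB,
# fully correlated with the input ramp.
```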
[Figures 113 and 114. Probability density histograms of RPDF and TPDF dither, plotted against dither value in LSB.]
The two forms of dither that are commonly used are called Rectangular
Probability Density Function (RPDF) dither and Triangular Probability Den-
sity Function (TPDF) dither. The distributions of their probable values are illus-
trated in Figures 113 and 114.
RPDF Dither
The RPDF dither value has an equal chance of falling anywhere in a range
that is 1 LSB wide. (The histograms shown in Figures 113 and 114 are uneven
because they are derived from 8192 samples of actual dither.)
The effect of the RPDF dither is shown in the following figure.
[Figure: ramp signal quantized with RPDF dither, value in LSB.]
The extra uncertainty added by the dither means that for input values be-
tween two quantization levels the output is a mix of both levels. The dither has
the effect of making the relative number of output values from the quantization
levels above and below the input signal such that their mean value is the same
as that of the input. This means that the mean value of the error is zero (or, for
some quantizers, a DC level).
The average noise penalty for adding RPDF dither (at an otherwise perfect
quantization) is to double the noise “power,” a 3.01 dB increase.
Note that when the input signal is close to a quantization level, RPDF dither
has little or no effect. This means that the rms error is very much lower at
these points. The effect of this is signal-dependent noise modulation—which is
not ideal—so TPDF dither is often preferred.
TPDF Dither
TPDF dither is generated by adding two RPDF values together. This has the
effect of doubling the peak-to-peak variation but changing the shape of the dis-
tribution in such a way that the dither value is more likely to be in the middle
of the range.
[Figure: ramp signal quantized with TPDF dither, value in LSB.]
The TPDF dither has the effect of making the rms error independent of sig-
nal level (as well as maintaining the property of the RPDF to make the mean
error zero). This is the form of dither recommended for most measurement ap-
plications. The average noise penalty for adding TPDF dither (at an otherwise
perfect quantization) is to triple the noise “power,” a 4.77 dB increase. This is
1.76 dB more than RPDF dither.
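These penalty figures follow from simple noise-power bookkeeping: the rounding error has variance LSB²/12, and each independent uniform (±0.5 LSB) dither component adds the same variance again. A sketch:

```python
import math

q = 1 / 12     # variance of the rounding error, in LSB^2
rpdf = 1 / 12  # one uniform +/-0.5 LSB dither component
tpdf = 2 / 12  # two uniform components summed (triangular PDF)

rpdf_penalty_db = 10 * math.log10((q + rpdf) / q)  # power doubles: ~3.01 dB
tpdf_penalty_db = 10 * math.log10((q + tpdf) / q)  # power triples: ~4.77 dB
```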
Shaped dithers
An important variation of TPDF dither is called high-pass dither. High-pass
dither is not spectrally flat but is weighted towards high frequencies. This has
the advantage that the resulting quantization noise has a slightly lower audibil-
ity; also, the generation of high-pass TPDF dither is slightly simpler than flat
TPDF dither. It is very commonly used.
The mathematical properties of this dither do not completely decorrelate the
rms error from the signal level, leaving a very small correlation which may be
important in some applications, such as within a recursive filter. For this rea-
son it is not used in all applications.
Shaped dithers, such as high-pass TPDF dither, are not to be confused with
noise shaping. Noise shaping affects the quantization noise of the data trunca-
tion as well as the dither, while shaped dithers cannot reduce the noise contri-
bution from the quantizer truncation itself. Noise shaping also has a higher
total noise penalty than shaped dither, so that even though the noise density is
lowered at some frequencies the total unweighted noise power in the fre-
quency range from DC to half the sampling frequency is increased.
Without dither there is no modulation on the output. The input signal does
not cross any quantization decision levels (for this quantizer the nearest are at
–0.5 and +0.5 LSB) so it has disappeared altogether. There is no noise.
[Figure: low-level tone quantized with TPDF dither, value in LSB.]
With TPDF dither, the output noise has increased so that now 3 different lev-
els are being used. Though not very obvious from the graph, the rms
quantization noise amplitude is no longer correlated with the signal. The out-
put of the quantizer sounds like a tone in a steady background of white noise.
The following APWIN Basic procedures are referred to or used in this Ap-
plication Note:
§ d-a Menu.apb
§ d-a Setup.at2c
These files are on the companion CD-ROM. Please check the
README.DOC file in the same folder for further information.
You may also download the files from the Audio Precision Web site at
audioprecision.com. Check the What’s New link for updated procedures.
These procedures and tests are designed for use with System Two Cascade,
but with minor changes can be modified to work with System Two as well.
References
Introduction
The AES3 [1] and IEC60958 [2, 3, 4] standards provide a common interface for
digital audio signals. This chapter describes the interface and highlights some
of the aspects that may require measurement to verify conformance.
The interface defined in AES3 and IEC60958-4 is commonly called the
“professional standard” interface; IEC60958-3 defines the “consumer stan-
dard” interface.
There are a number of differences between the professional and consumer
standards which in some cases can render them completely incompatible. For
proper performance, the consumer and professional interfaces should not be
mixed. However, they are similar enough that in many situations, given the
right electrical connections, the embedded audio can be carried from one
standard to the other.
By requiring conformance with these standards, a user of digital audio
equipment rightfully expects compatibility among devices adhering to a
standard. Compatibility allows interconnecting the equipment without suffering
loss of performance or functionality—which is, after all, the aim of interface
standardization.
The digital audio interface carries three types of information:
§ timing information,
§ audio data, and
§ non-audio data.
Some of this information can be degraded by implementations of the inter-
face that conform to the standard but are not ideal. We shall consider aspects
Bi-phase coding
The simplest coding of binary pulse code modulation (PCM) audio data is
to code a “one” as a logic high, and a “zero” as a logic low. This is not an ideal
format electrically. Consider the case where all the bits are set to ones (or ze-
ros) for a period of time. Another signal—a bit clock—would be required to
identify the individual bits.
The coding used in this interface format is more sophisticated. This bi-
phase coding scheme has an embedded “bit clock” which can also be used to
recover the sampling frequency. Bi-phase coded PCM has a mean voltage of
zero, eliminating DC on the interface, with the result that the data can be AC-
coupled through a transformer or series capacitor. The coding works like this:
Each data bit has a time slot that begins with a transition and ends with a sec-
ond transition, which is also the beginning transition for the next time slot. If
the data bit is a “one,” an additional transition is made in the middle of the
time slot; a data “zero” has no additional transition.
[Figure 120. Bi-phase coding example: data bits 0 0 1 0 1 0 in time slots 4 through 9, with the 1 UI grid marked; each time slot is 2 UI wide.]
by these regular transitions, the interface signal is now clearly AC, and the di-
rection of the transitions (or signal polarity) becomes irrelevant.
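A minimal encoder for this scheme (often called biphase-mark coding) can be sketched as follows; the output lists one logic level per UI:

```python
def biphase_mark(bits, level=0):
    """Encode a bit sequence as biphase-mark: the level toggles at every
    time slot boundary, and toggles again mid-slot only for a one.
    Returns one logic level (0 or 1) per UI, two UI per data bit."""
    out = []
    for b in bits:
        level ^= 1       # transition at the start of every time slot
        out.append(level)
        if b:
            level ^= 1   # extra mid-slot transition encodes a one
        out.append(level)
    return out

# A run of zeros still toggles every 2 UI, so the bit clock remains
# recoverable and the waveform carries no DC.
```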
Unit interval
Many of the timing parameters on the interface are defined in terms of the
unit interval, or UI. This is the shortest nominal interval between transitions.
The bi-phase coding introduces a second transition (indicating data “one”) into
the time slot, which means that a time slot is defined as 2 UI wide, as shown in
Figure 120.
Framing
The data carried by the interface is transmitted serially. In order to identify
the assorted bits of information the data stream is divided into frames, each of
which is 64 time slots (or 128 UI) in length. Since the time slots correspond
with the data bits, the frame is often described as being 64 bits in length, but
the preamble sections (see below) break this correspondence.
Each frame consists of two subframes. Figure 121 shows an illustration of a
subframe, which consists of 32 time slots numbered 0 to 31. A subframe is 64
UI in length.
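Since each frame occupies 128 UI and the frame rate normally equals the sampling frequency, the UI duration follows directly; for example, for some common rates:

```python
def unit_interval_ns(sample_rate_hz):
    # 64 time slots of 2 UI each = 128 UI per frame, one frame per sample
    return 1e9 / (128 * sample_rate_hz)

ui_48k = unit_interval_ns(48000)   # ~162.8 ns
ui_44k1 = unit_interval_ns(44100)  # ~177.2 ns
```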
The first four time slots of each subframe carry the preamble information.
The preamble marks the subframe start and identifies the subframe type.
The next 24 time slots carry the audio sample data, which is transmitted in a
24-bit word with the least significant bit (LSB) first.
Time slots: 0–3 | 4–27 | 28–31
Preamble | LSB … 24-bit Audio sample word … MSB | V U C P
Figure 121. The AES3 subframe (24-bit audio data).
After the audio sample word there are four final time slots, which carry the validity (V), user data (U), channel status (C), and parity (P) bits.
Preambles
A preamble is a distinctive data pattern carried in the first 4 time slots of a
subframe to mark subframe and block starts. There are three preambles, all of
which break the bi-phase coding rule by containing one or two pulses which
have a duration of 3 UI. This rule-breaking means that the pattern cannot oc-
cur anywhere else in the pulse stream.
Subframe 2 always begins with a Y preamble. Subframe 1 almost always be-
gins with an X preamble, with this exception: every 192 frames the X pream-
ble in subframe 1 is replaced with a Z preamble, which indicates a block start.
This provides framing for the information carried in the channel status
fields—the channel status block.
The interface signal is insensitive to polarity so the preambles can be found
to start with a falling transition:
[Figure: the preamble waveforms drawn starting with a falling transition; X precedes subframe 1 and Y precedes subframe 2.]
Under the bi-phase coding rules there should be a transition between each
time slot; but the preambles, of course, each contain one or two 3 UI pulses, so for
of these bi-phase coding violations is in the same place for each preamble—af-
ter time slot 0. This identifies that a new subframe has started. The pattern that
follows then identifies the type of subframe.
The time slot numbers in Figures 122 and 123 correspond with the numbers
shown in Figure 121 and are 2 UI wide. The preambles are 8 UI wide and so
take the same amount of time as 4 bits.
Audio data
After the preamble the audio data is transmitted with the LSB first. For au-
dio word lengths less than 24 bits the data is justified to the most significant
bit (MSB) and zero-filled below the LSB, as shown in Figure 124.
Time slots:  0-3       4 ... 11   12 ... 27                        28 29 30 31
             Preamble  Zeroes     LSB  16-bit audio sample word  MSB  V  U  C  P
Figure 124. The AES3 subframe (16-bit audio data).
In some of the audio modes—those that transmit 20 or fewer bits of main au-
dio data—the first four bits after the preamble can be used for another signal
known as auxiliary audio data. This mode has the subframe structure of Fig-
ure 125. If this auxiliary data is used, then the channel status (see page 7)
should indicate that the maximum word length is 20 bits, and the receiver
should mask off the auxiliary audio field so that any values there are not added
to the main audio sample values. Unfortunately, many decoders are not that
sophisticated.
Time slots:  0-3       4-7   8 ... 27                         28 29 30 31
             Preamble  Aux   LSB  20-bit audio sample word  MSB  V  U  C  P
Figure 125. The AES3 subframe (20-bit audio data with auxiliary data).
The use of auxiliary audio is very rare. One application is for voice commu-
nications, and AES3 suggests the auxiliary bits can be used for coordination or
talk-back purposes. One way of doing this is to transmit a 12-bit channel at a
sample rate of one-third of the frame rate. Other applications, such as using
the auxiliary audio field to carry a data-compressed version of the main
audio signal, may also be possible.
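The 12-bit coordination channel idea can be sketched as a packing routine that splits each 12-bit sample across three successive 4-bit auxiliary fields. This is only an illustration: the nibble order chosen here (least significant first, by analogy with the interface's LSB-first convention) is an assumption, not something the text above specifies.

```python
def pack_aux(samples12):
    # Split each 12-bit sample into three 4-bit aux nibbles, least
    # significant nibble first (an assumed ordering for illustration).
    nibbles = []
    for s in samples12:
        for shift in (0, 4, 8):
            nibbles.append((s >> shift) & 0xF)
    return nibbles

def unpack_aux(nibbles):
    # Reassemble 12-bit samples from groups of three received nibbles.
    return [n0 | (n1 << 4) | (n2 << 8)
            for n0, n1, n2 in zip(nibbles[0::3], nibbles[1::3], nibbles[2::3])]

samples = [0x123, 0xABC, 0x00F]
assert unpack_aux(pack_aux(samples)) == samples
print(pack_aux([0x123]))  # -> [3, 2, 1]
```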
Validity bit
The validity bit was originally intended to somehow qualify the transmitted
data. If the bit is set then the data is identified as not suitable for conversion to
analog audio. However, there are some applications that set the validity bit if
an error has been found and concealed. This behavior is quite common for
Compact Disc players, for example.
This confusion as to the function of this bit means that it is not easy to de-
cide how a receiver should behave when a sample is marked as invalid.
When the IEC60958 or AES3 stream is used to transmit data that does not
represent linear PCM audio, then the bit should certainly be set. This has at
least a chance of causing linear PCM replay equipment to mute, which is pref-
erable to an attempt to reproduce the data as an audio signal.
The specifications for carrying data-compressed audio on AES3 or
IEC60958 require this bit to be set, so that linear PCM receiver devices will
recognize the need to mute. This has the potential benefit of stopping the re-
ceiver from producing a burst of high-level noise from the data before the
channel status pattern (which is only updated every 192 frames) can identify
the signal as not being linear PCM audio. This is indicated in channel status by
the non-audio bit.
User bit
The user bit can be used to carry user-specific information. In practice this
means application-specific information for consumer devices such as CD or
DCC.
The consumer specification, IEC60958-3, has defined a packet-based for-
mat for carrying program-related information in the user data stream and de-
fines rules for the preservation of the user data by various classes of
equipment.
In the consumer format the user data streams from subframe 1 and subframe
2 are combined to form one stream at 2 bits per frame. This means that for a
frame rate of 44.1 kHz the user data rate is 88200 bits/sec.
The professional specification, AES3 (and IEC60958-4), has channel status
patterns that allow the indication of various formats of user data.
Parity bit
The parity bit is used to maintain even parity for the data as a means of er-
ror detection. Specifically, even parity in the interface signal means that there
is an even number of mid-cell transitions in the data area, which spans time
slots 4 to 31. Since there is an even number of all other transitions, even parity
means that there is an even number of transitions in every frame.
Even parity has the effect that each subframe always starts with a transition
in the same direction. As a consequence, the transmitter of an AES3 or
IEC60958 stream does not need to calculate parity, and the receiver needs only
to verify (since the parity bit is the last bit of the subframe) that the state of the
second half of the parity bit is always the same as its state in the previous
subframe.
If an error occurs, it is most likely to be a pair of missing transitions, that is,
both edges of an individual pulse that was not detected. If a pair of transitions
are missing, the parity will not change, even though there was an error. In fact,
in many schemes for decoding the bitstream, a genuine parity error is
impossible.
However, a violation of the bi-phase coding could be detected at such a
point, since at least one of the missing transitions would be on the time slot
boundary. For this reason it is much more useful to check bi-phase coding vio-
lations in identifying errors than to use the parity bit.
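The parity mechanism, and its blindness to paired bit errors, can be illustrated with a short sketch (the function names are illustrative; the bits are the decoded values of time slots 4 to 31):

```python
def parity_bit(payload_bits):
    # P (time slot 31) is chosen so that slots 4-31 carry an even
    # number of ones: P is the XOR of the 27 payload bits (slots 4-30).
    p = 0
    for b in payload_bits:
        p ^= b
    return p

def check_even_parity(slots_4_to_31):
    # A received payload, including P, should XOR to zero.
    p = 0
    for b in slots_4_to_31:
        p ^= b
    return p == 0

payload = [1, 0, 1, 1] + [0] * 23         # 27 bits: slots 4-30
frame = payload + [parity_bit(payload)]   # append P as slot 31
assert check_even_parity(frame)

# A lost pulse corrupts two transitions, typically flipping a pair of
# decoded bits -- and a double flip leaves parity unchanged:
frame[2] ^= 1
frame[3] ^= 1
print(check_even_parity(frame))  # -> True : the error goes undetected
```

This is exactly why checking for bi-phase coding violations is the more useful error indicator.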
Electrical properties
There are three basic electrical formats:
§ The balanced format. This is the primary professional format and is de-
fined in AES3.
§ The unbalanced consumer format, defined in IEC60958-3, carried on co-
axial cable at 0.5 V peak-to-peak.
§ The unbalanced coaxial variant of the professional format, defined in
AES-3id and SMPTE 276M, carried at 1 V peak-to-peak.
Balanced format
This uses a shielded twisted pair cable to carry the interface signal differen-
tially and is normally coupled to equipment with a standard XLR connector.
(See IEC60268-12.) It has the advantage that the same cabling can be used as
for analog interfaces. However, this can also result in confusion between the
two types of connection.
Though not required by AES3, many designs use small pulse transformers
at the receiver and transmitter. In the same way as the balanced interfacing ap-
plication for analog audio, the transformers offer advantages for reducing emis-
sions and susceptibility to inductive coupling as a consequence of the
improved current balance in the line.
Transformers are required by the EBU version of the specification, EBU
3250. This is motivated by the need to maintain a high common-mode imped-
ance at the cable terminations so that crosstalk is minimized. Crosstalk is of
particular concern for EBU members because of the large amount of cabling
run in parallel at broadcast installations.
Like the other electrical formats, there is a requirement for the cable imped-
ance and the transmitter and receiver termination impedances to be matched.
In this case the nominal impedance is 110 Ω.
At the transmitter, the amplitude of the signal should be between 2 V and
7 V peak-to-peak with the output terminated. Without termination (assuming
conventional implementation of source impedance) the generator voltage
would be twice that. This can be driven from complementary outputs with
logic operating from 3.3 V or 5 V rails and using a 1:1 transformer. A line
driver circuit of this kind is shown in the accompanying figure.
Though they were popular in the early applications for AES3, equalizers are
not used very often in modern designs. This may be because there is an expec-
tation that cable losses will not be as significant, perhaps because lower-loss
cable is used; and also because in most applications the cable length is quite
short—significantly less than 100 meters. In the early 1980s I found that with
the standard BBC-specified shielded twisted pair cable used for analog audio
signal distribution, it was possible to get reliable operation over 100 meters
without an equalizer, and that this could be extended to 250 meters with an
equalizer. Moreover, with short cable lengths, the equalizer can be a liability,
increasing the sensitivity to errors from cable reflections due to impedance mis-
match.
Unbalanced format
The two unbalanced formats use a 75 Ω impedance-matched coaxial cable
for transmission.
The consumer version has a transmitted level of 0.5 V peak-to-peak and
uses the same kind of coaxial connector (the RCA “phono” connector, defined
in 8.6 of Table IV of IEC 60268-11) that is used for consumer analog
connections.
The same kind of interface signal distortion occurs in the unbalanced ver-
sion as in the balanced version of the interface, so the eye diagram is also
used here to assess and define signal levels and receiver characteristics.
Figure 131. Unbalanced format: transmitter, interconnecting cable, and receiver.
Optical format
In common use for the consumer format is an optical interface called
TOSLink®, after the version sold by Toshiba. This uses plastic multi-mode op-
tical fiber with a red light-emitting diode (LED) transmitter and a photo diode
receiver. The transmission distance is limited to a few meters. IEC60958-3
has a section for defining this format, but it is still “under consideration.” As
a result, there is no benchmark against which to evaluate receiver and
transmitter performance.
There are two connector formats for the optical fiber. The older and more
widely used of the two is the friction-lock connector type F-05 specified in
IEC60874-17, shown in Figure 132.
This connector is too large for portable audio equipment, so a coaxial con-
nector has been developed that appears quite similar to the electrical 3.5 mm
mini-jack plug used for personal stereo headphones. This is shown in Figure
133.
The socket for this connector has the advantage that it can double up as the
analog headphone jack, and hence use no extra space on the equipment
surface.
Synchronization
The embedded clock defined by the interface bit-cell transitions, the sub-
frame and the frame boundaries can be used as a timing reference by equip-
ment to derive timing for converters, processors, and digital outputs. For digi-
tal outputs AES11 defines limits for the timing offset between the frames of
the reference input signal and the frames of the outputs.
In some cases the timing reference is provided by another signal or clock,
and the incoming signal needs to have been already synchronized to that clock.
AES11 defines a specification for this sort of synchronization, and it also cov-
ers synchronization of the digital audio interface with video signals. For fur-
ther details see the Synchronization sidebar, which begins on page 181.
Figure 134. Output waveform measured by INTERVU; black is unterminated, gray is
terminated. (Vertical axis in volts; horizontal axis -1 µs to 2 µs.)
This test result and setup is in the file “output term test back to back.at2c.”
This shows very close to a 2:1 ratio between the two traces, as expected with
correct source termination. The higher amplitude waveform, shown here in
black, is the unterminated one.
I have chosen to look at the part of the waveform near a preamble. The
static pattern around the preamble makes it easy to make direct comparisons.
In APWIN this is determined by selecting a preamble as the trigger.
In comparison, Figure 135 (from “output term test DAT.at2c”) shows the
same measurement on the AES3 output of a DAT machine. The unterminated
and terminated waveforms are different shapes. There is not a consistent 2:1 ra-
tio in amplitude.
Figure 135. The same measurement on the AES3 output of a DAT machine, unterminated
and terminated. (Vertical axis in volts; horizontal axis -1 µs to 2 µs.)
STANDARDS
There are several published standards documents that define essentially the
same digital audio interface as AES3, or the similar consumer-targeted equivalent
defined in IEC60958-3. There are also standards that are used in conjunction with these interfaces.
IEC60958:1989
(previously known as IEC958:1989)
This has been replaced by the multi-part document, IEC60958-n. It defined both the
professional and consumer applications and the two connector types for electrical con-
nection. By accident, it did not require that the professional format use the XLR and the
consumer format use the coaxial connection.
IEC60958-1, -3, and -4
The revision of IEC958:1989 involved splitting the standard into three parts. Part 1
covers the aspects common to both consumer (which is in part 3) and professional (part
4) applications. As each part has a new document reference, each is published as
Edition 1, which may be confusing.
AES3-1992
The primary definition of the professional format is in this document. This under-
goes regular revision by amendment or new edition. It is possible for interested parties
to contribute to this process by joining the working group on digital input/output inter-
facing, SC-02-02. Further information on joining AES standards working groups is
available at http://www.aessc.aes.org.
EBU 3250 (Ed. 2, 1992)
This document has been produced by the European Broadcasting Union (EBU). It
is similar to AES3-1992 (without the amendments) apart from one key difference—the
EBU document specifies that transformers shall be used between the cable connection
and the receiver and transmitter electronics. Transformers are optional for AES3.
ITU-R BS647-2 (1992)
This is very similar to EBU3250. The International Telecommunications Union
(ITU) is an intergovernmental organization that is part of the UN.
IEC60958-4 (Ed. 1, 1999)
This part of IEC60958 defines the professional interface. At the time of writing
(early 2001) it is similar to AES3-1992 with amendments 1 (1997) and 2 (1998) but not
amendments 3 or 4 (both 1999); the key difference is that it does not support sampling
frequencies above 48 kHz. There is an amendment in process within IEC to rectify this.
Technical Report IEC60958-2:1994
(or IEC958-2:1994)
This document is not a standard. The specification describes a method of carrying
software information in the channel status stream of the consumer application of
IEC60958:1989. It uses the setting of the channel status mode field in byte 0 of the chan-
nel status block to distinguish this use of the channel status block. Originally it was pro-
posed as an amendment to the IEC958:1989 standard but was rejected. With the
conversion of that standard to a three-part standard, with parts called IEC60958-1,
IEC60958-3, and IEC60958-4, this document appears—at first sight—to be part 2 of a
4-part standard. That is not the case.
IEC60958-3 (Ed. 1, 1999)
This part of IEC60958 defines the consumer interface in all respects except the opti-
cal interface. At the time of writing (early 2001) it does not support sampling frequen-
cies above 48 kHz. There is an amendment in process within IEC to rectify this.
AES-3id 1995 and SMPTE 276M-1995
These two documents both define a variant of AES3 that is transmitted over 75 Ω
coaxial cable at a level of 1 V (peak to peak). The impedance and level are chosen to be
compatible with broadcast analog video interfacing and allow the use of some of the
same cabling and routing infrastructure.
The two specifications are different. The SMPTE specification has tighter toler-
ances for some parameters and is intended for use with dedicated interfaces on equipment
for high performance. AES-3id has more relaxed specifications that permit use with pas-
sive converters between the 110 Ω balanced and the 75 Ω coaxial formats.
AES11-1991 (Synchronization)
This standard defines rules to be followed to ensure synchronization of digital au-
dio equipment together and with video. A special AES3 signal is used to distribute a tim-
ing reference from the synchronization (clock) master to all the other (slave) devices in
the synchronized system. This timing reference is called a Digital Audio Reference
Signal (DARS).
mode reference point, such as the cable shield or chassis, and dividing by two.
This summing can be done quite easily in most two-channel oscilloscopes.
Where the output is coupled via transformers (as illustrated in Figure 127)
the common-mode impedance will be high, and a measurement of common-
mode voltage at the output terminals will be sensitive to the impedance bal-
ance of the connected measurement device. Any such measurement will have
limited accuracy and a limited relationship to the impedance balance, which is
the relevant factor for crosstalk.
It is recommended that a matching pair of high-impedance oscilloscope
probes be used to make a measurement of the output port balance in the man-
ner defined by AES3. This should minimize the effect of their load on the mea-
surement. Figure 136 shows a measurement of the differential and common-
mode parts of a signal made in this way. The traces show the sum and differ-
ence signals derived from the two input channels. The sum signal, shown in
black, is twice the common-mode voltage. The difference signal, shown in
gray, is the differential-mode voltage.
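The sum-and-difference derivation can be sketched as follows (a minimal illustration with hypothetical sample values, in volts; in practice the two sequences would come from the two probe channels):

```python
def sum_and_difference(leg_a, leg_b):
    # The sum trace is twice the common-mode voltage; the difference
    # trace is the differential-mode (wanted) signal.
    sums = [a + b for a, b in zip(leg_a, leg_b)]
    diffs = [a - b for a, b in zip(leg_a, leg_b)]
    return sums, diffs

# Synthetic example: each leg carries half the differential signal
# plus the same common-mode disturbance (values are hypothetical).
half_diff = [2.25, -2.25, 2.25, -2.25]
common = [0.25, 0.25, -0.25, -0.25]
leg_a = [d + c for d, c in zip(half_diff, common)]
leg_b = [-d + c for d, c in zip(half_diff, common)]
s, d = sum_and_difference(leg_a, leg_b)
print(s)  # -> [0.5, 0.5, -0.5, -0.5] : twice the common-mode voltage
print(d)  # -> [4.5, -4.5, 4.5, -4.5] : the differential-mode signal
```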
By inspection of this measurement we see that transitions in the differential
signal are accompanied by high-frequency disturbances on the trace of the
common-mode signal, shown as the black trace. The main spectral compo-
nents of these disturbances have an amplitude of up to 0.4 V peak-to-peak (rep-
resenting a common-mode amplitude of 0.2 V peak-to-peak) and a period of
less than 20 ns (a frequency of greater than 50 MHz). This is close to 30 dB be-
low the differential signal (at 4.5 V peak-to-peak) but the significant frequency
components fall above the DC-to-128 Fs (6 MHz) range in the AES specifica-
tion, so this output passes the balance specification.
§ The common-mode and differential impedances are the same, so the ratio
of common-mode to differential-mode signals is the same for voltage
and for current.
§ The significance of the result is comparable for output ports with high or
low common-mode source impedances, i.e., ports with or without trans-
formers.
The accuracy of this measurement depends on the matching of the two 55 Ω
resistances to much better than the balance ratio that is being measured. Apart
from that matching, the precision of the resistors is not critical to the measure-
ment accuracy and need be no better than 2%.
Transition times
The speed of the transitions on the interface can be measured using an oscil-
loscope. Digital audio test sets, such as System Two Cascade, can also be
used. System Two Cascade has a sampling frequency of 80 MHz and a band-
width of 30 MHz when using the INTERVU software. This is fast enough to
get a reasonably accurate measurement for typical AES3 waveforms, and the
result is illustrated in Figure 137.
Figure 137. INTERVU view of a transition at the output of a DAT machine. (Vertical
axis -3 V to 3 V; horizontal axis -50 ns to 200 ns.)
The rise and fall transition times are defined as the time between the 10%
and 90% amplitude points. In the case of this figure, with an amplitude of ap-
proximately 5 V, the 10% and 90% points are about 0.5 V away from the
low and high state values. The transition times on this trace appear to be be-
tween 15 ns and 20 ns. This is slightly faster than we can measure reliably
with the Cascade.
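The 10%-90% definition can be applied to sampled waveform data with a simple interpolation, sketched here. This is only an illustration using an idealized linear edge, not a description of how INTERVU or the oscilloscope computes its readings:

```python
def crossing_time(samples, dt, threshold):
    # First time the waveform rises through `threshold`, located by
    # linear interpolation between adjacent samples.
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < threshold <= b:
            return (i - 1 + (threshold - a) / (b - a)) * dt
    return None

def rise_time(samples, dt, low, high):
    # Time between the 10% and 90% amplitude points of a rising edge.
    t10 = crossing_time(samples, dt, low + 0.1 * (high - low))
    t90 = crossing_time(samples, dt, low + 0.9 * (high - low))
    return t90 - t10

# Idealized edge: a linear ramp from -2.5 V to +2.5 V over 20 ns,
# sampled at 1 ns intervals (hypothetical values).
edge = [-2.5 + 0.25 * i for i in range(21)] + [2.5] * 5
print(rise_time(edge, 1e-9, -2.5, 2.5))  # ~16 ns (80% of the 20 ns ramp)
```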
An oscilloscope trace is shown in Figure 138. The oscilloscope used for this
trace uses digital sampling at a rate of 1 GS/s with a signal bandwidth of
60 MHz. The two channels of the oscilloscope are used together, with one
trace displaying the differential signal (with channel two inverted and summed
with channel one).
Figure 138. Oscilloscope view of the interface waveform at the output of a DAT
machine. Channel two has been inverted and summed with channel one.
The cursors have been set manually at the 10% and 90% points of the rising
transition and the time separation of the cursors (delta) is 12 ns.
Figure 139. Oscilloscope view of the interface waveform at the output of a DAT
machine. The two oscilloscope channels are displayed independently.
This oscilloscope also gives direct readings of rise and fall time on the indi-
vidual channels. Those are consistent with this result, but fluctuate with indi-
vidual traces. This is shown in Figure 139.
Intrinsic Jitter
The theory behind jitter is explained in Jitter Theory, beginning page 3.
There it is explained that the jitter on a digital interface output port is specified
through two distinct measurements: the measurement of jitter produced by the
device (the intrinsic jitter) and the conformance of the output signal with the
jitter transfer function (which specifies the amount of jitter being passed
through from an external synchronization source).
The intrinsic jitter of a device may well depend on the synchronization
mode of the device. If selected as a clock master, the device may use an inter-
nal clock with one level of intrinsic jitter. If the device is selected as a clock
slave and is locked to an external source, a different circuit will be used, and
that circuit may have a different intrinsic jitter measurement. In addition, the
Taking into account this residual jitter, we can conclude that the output jitter
of the DUT is 9 ±1.5 ns peak-to-peak. This simple and direct measurement is
a useful indicator of the jitter level, but has disadvantages:
§ The measurement can only be made when the output port timing is
slaved from a known low-jitter reference.
§ The deviation of the transitions from the mean is not clear. If the jitter is
asymmetric, then the peak deviation from the mean will not be simply
half the peak-to-peak deviation. We need to evaluate the maximum
excursion of timing deviation from the mean, as that is what relates to in-
terface error mechanisms.
The AES3 intrinsic jitter specification (also in IEC60958-3 and IEC60958-
4) is written for a different measurement method. This specification uses a jit-
ter meter that compares the timing of the input transitions with a clock derived
from the same signal, but with a defined jitter attenuation characteristic. This
combination produces a measurement with a defined high-pass characteristic,
with a 3 dB corner frequency at 700 Hz.
Figure 140. Jitter plotted against frequency. (Vertical axis 1 ps to 200 ps; horizontal
axis 60 Hz to 300 kHz.)
These jitter meters are becoming available on digital audio test equipment
and are provided in the Audio Precision range of instruments.
The same signal that is shown in Figure 140 produces a jitter measurement
of 3.3 ns peak with the APWIN meter set to a frequency range of 700 Hz to
100 kHz. These two results are consistent, given that a peak-to-peak reading
can be up to twice the peak reading and the lower-frequency limit to the first
result is much lower.
Figure 141 illustrates the spectrum of this jitter gathered using the
INTERVU package of APWIN. This test is stored as “Intrinsic jitter spec-
trum.at2c.”
The graph shows that the jitter spectrum has a significant peak around
1.2 kHz. This may be indicative of the corner frequency of the clock recovery
phase-locked loop (PLL) in the DUT. A high peak such as this can be a conse-
quence of inadequate damping of the PLL. The jitter transfer function plot
may be able to confirm this.
Figure 142. Jitter transfer function of an AES3 transmitter-receiver evaluation board
as measured by System Two Cascade. (Jitter amplitude in UI against jitter frequency,
60 Hz to 100 kHz.)
Figure 142 shows a measurement using System Two Cascade. The DUT is
an evaluation board for an AES3 receiver and transmitter. The test file is “JTF
eval board.at2c.”
The measurement was made with a sinusoidal jitter input of 0.25 UI peak-
to-peak (0.125 UI peak). This amplitude was selected as the highest amplitude
of jitter that the jitter tolerance specification of AES3 requires for receivers to
be able to decode at all frequencies. See Receiver Jitter Tolerance, page 176.
For equipment that does not meet this tolerance level the applied jitter
amplitude may need to be reduced.
The measured level of output jitter is shown in terms of peak jitter, rather
than peak-to-peak, so a reading of 125 mUI corresponds to the same level as
the applied jitter level. The trace shows a slight peak at around 2 kHz, fol-
lowed by attenuation that is at –3 dB at around 10 kHz. This then falls to a
reading of around 13 mUI at 40 kHz.
Above the 40 kHz point, the measurement rises again to a small peak of
17 mUI at around 48 kHz. Above that frequency, the response is a mirror im-
age of the response below 48 kHz. This is indicative of aliasing at the sub-
frame rate of 96 kHz. This occurs if the phase detector in the clock recovery
system uses interface transitions in the preamble, not in the modulated part of
the data stream. That jitter is effectively being sampled at a rate of 96 kHz, so
jitter above half that rate (48 kHz) becomes equivalent to jitter below half that
rate. This hypothesis was confirmed by setting the input jitter frequency to
95.999 kHz, just 1 Hz below the 96 kHz sub-frame rate, and observing on an
oscilloscope that the output jitter varied slowly, at 1 Hz.
When performing this observation on an oscilloscope, it is important to trig-
ger the oscilloscope so that it is not just presenting transitions at the sub-frame
or frame rate. If that were the case, the oscilloscope trigger would be performing
the same aliasing as we are trying to observe, and jitter occurring close to
those rates would appear to be at a low frequency. To avoid this problem, trigger
the oscilloscope using a jitter-free reference clock running at a higher rate. The
MASTER CLOCK OUTPUT on the back of the System Two Cascade pro-
vides a clock that has a period of 0.5 UI. This can be used to observe jitter in
interface transitions. The oscilloscope trigger hold-off can be adjusted until the
transitions are observed to be 1 UI apart; however, this is not essential in mak-
ing the observation if the jitter amplitude is significantly less than the trigger
signal period.
The small peak at 48 kHz in the measurement is another aliasing effect, and
is probably indicative of a non-linearity in the phase detector that is causing a
modulation of the incoming jitter at the frame rate. This was confirmed by set-
ting the jitter frequency to 1 Hz below 48 kHz. A major component of the jitter
was then seen to be at 1 Hz.
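Both of these observations follow from simple folding arithmetic: jitter sampled at some rate appears at its distance from the nearest multiple of that rate. A small helper (not part of any test set's API) makes this concrete:

```python
def aliased_frequency(f_jitter_hz, f_sample_hz):
    # Jitter sampled at f_sample folds about multiples of the sampling
    # rate; what is observed is the distance to the nearest multiple.
    f = f_jitter_hz % f_sample_hz
    return min(f, f_sample_hz - f)

# Preamble-only phase detection samples the jitter at the 96 kHz
# sub-frame rate, so jitter just below 96 kHz appears at 1 Hz:
print(aliased_frequency(95_999.0, 96_000.0))  # -> 1.0
# Frame-rate (48 kHz) modulation aliases 47.999 kHz jitter to 1 Hz:
print(aliased_frequency(47_999.0, 48_000.0))  # -> 1.0
```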
The measured jitter transfer function conforms to the AES3 specification
for jitter gain, as is detailed in the Jitter Theory chapter. That specification re-
quires that for any frequency there should not be more than 2 dB of jitter ampli-
fication, measured from input to output. The key point to look for is the
amount of “jitter peaking.” In the case of the measurement shown in Figure
142, the peak at 2 kHz is at 133 mUI, representing a gain of 0.54 dB over the
input level of 125 mUI (both measurements peak, rather than peak-to-peak).
This is well within the specification.
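The peaking figure quoted above is just the ratio of the two peak readings expressed in decibels:

```python
import math

def jitter_gain_db(output_level, input_level):
    # Jitter transfer gain at one frequency; both levels must use the
    # same convention (here, peak UI).
    return 20.0 * math.log10(output_level / input_level)

# The 2 kHz peak from Figure 142: 133 mUI out for 125 mUI in.
print(round(jitter_gain_db(0.133, 0.125), 2))  # -> 0.54
```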
The overall measurement indicates that this circuit does not provide signifi-
cant levels of jitter attenuation, such as the 6 dB at 1 kHz described in the op-
tional AES3 jitter attenuation specification. This is normal for a device that
uses the same clock to both perform data recovery and provide the output
clock. This is a single-PLL configuration of a receiver/transmitter system. In a
dual-PLL system, further jitter attenuation can be provided in the second PLL,
which is not used to decode the incoming data stream. More information on
this topic can be found in the Jitter Theory chapter.
pedance refers to the differential mode impedance between the two signal
lines.
A reference output port that has a reliable impedance is used to drive the in-
put port under test. An oscilloscope can be used to observe the voltage wave-
form on the cable at the input port, and then this can be compared with the
waveform viewed on the cable when the input port has been removed and re-
placed with a resistor of the correct impedance.
System Two Cascade provided the source for the oscilloscope traces of Fig-
ure 143. The traces are of the difference between the two oscilloscope chan-
nels, and the scales are 1 V/div and 100 ns/div. The reference trace with the
110 Ω 1% resistor is shown in gray. The measurement trace, shown in black, is
a close match to the reference.
There is some overshoot after the transition. The overshoot indicates that
for the highest-frequency signal components the impedance may be slightly
higher than the reference. The small amount of extra droop indicates that the
low-frequency input impedance may be slightly lower than the reference. The
maximum voltage difference caused by the overshoot is around 0.2 V, or 8%
of the signal voltage at that point, and it lasts for about 30 ns. The difference in
droop is less than 0.1 V over the 480 ns (3 UI) of the first pulse of the pream-
ble. The overshoot and droop effects, then, are very small compared with the
20% tolerance of the AES3 input impedance specification. Both of these ef-
fects could be a consequence of the limited bandwidth of the transformer on
the input port being tested.
The trace of Figure 144 shows the same test on the input port of another de-
vice. In this case the stimulus interface waveform carries no embedded infor-
mation—no audio, no user data or channel status bits—so the whole waveform
is stable. The time axis has been extended to the left to include the 3 bits (U, C
and P) preceding the preamble and the two 3 UI pulses in the preamble.
The droop in this case is more significant. It appears to be 0.4 V over the
3 UI preamble pulse, and this may indicate a problem with the impedance
match at low frequencies. Taking into account the droop—which changes the
starting voltage of the transitions—the amplitude step of the transitions is also
reduced by about 0.25 V, or 5%. The shape of the curve after the transition
does not show a significant overshoot, so we are not seeing as much imped-
ance change for the higher frequencies in the signal as we saw in Figure 143.
In conclusion, the impedance is not a good match, but is likely to be within
the 20% tolerance of AES3. If in doubt, then use a dedicated impedance ana-
lyzer to make a more precise measurement.
Figure 145. Eye diagram template used for AES3-1992, IEC60958-3:1999 and
IEC60958-4:1999. (Tmin = 0.5 UI, Vmin = 200 mV, shown over 1.0 UI.)
Even so, the verification that a receiver can decode signals with eye sizes at
least as small as the AES/IEC minimum remains useful.
Figure 146. APWIN view of an eye pattern. (Vertical axis -200 mV to 200 mV;
horizontal axis 0 to 150 ns.)
Figure 147. Oscilloscope view of an eye pattern.
Figure 148. Oscilloscope view of an eye pattern, showing a smaller “opening.”
For this figure the MASTER CLOCK OUTPUT on the System Two Cas-
cade rear panel provided the oscilloscope trigger, with the oscilloscope hold-
off adjusted to align with the data transitions. Many traces are shown on top of
each other through the use of several seconds of display persistence.
The eye diagram can also be directly measured with System Two Cascade,
shown in Figure 146. (See test “DIF eye.at2c” as well.)
is quite small, so the error rate does not indicate the margin by which the re-
ceiver has failed the test.
Common-mode rejection
The common-mode rejection specification for the balanced AES3 format re-
quires that a receiver should remain functional even with a common-mode sig-
nal of up to 7 V peak at frequencies from DC to 20 kHz. For testing against
this specification, this common-mode signal can be imposed using a center-
tapped transformer on the interface signal generator output. Some digital audio
test equipment can generate this common-mode component, including System
Two Cascade.
Testing against this specification is not sensitive to the impedance balance of
the input port. Yet any imbalance can result in the production of common-
mode currents, and introduces a mode conversion mechanism whereby induced
common-mode signals produce differential voltages.
There is not a direct specification for this performance aspect in AES3. The
use of transformers (as mentioned in AES3) should ensure it is not an issue;
without transformers, however, this crosstalk mechanism may be signifi-
cant—particularly in conjunction with long cable runs with many different
signals bundled together.
Figure 149. Jitter tolerance template. (Jitter amplitude in UI, 0.1 to 100; jitter
frequency 10 Hz to 10 MHz.)
The jitter tolerance corner frequency is lowered to around 200 Hz. This al-
lows a single-stage clock recovery system to be used, which can also provide
jitter attenuation above 200 Hz. This would be useful for generating a sample
clock used by a digital-to-analog converter where sidebands due to jitter much
above 200 Hz are increasingly likely to be audible.
In both templates the maximum jitter tolerance is set at 10 UI. This is pri-
marily to simplify the task of generating the test signal. In receivers the toler-
ance will continue to increase as the jitter frequency falls.
The consumer template also has a curious step at 400 kHz. Above this fre-
quency the required tolerance level is reduced slightly, but apart from that the
flat high-frequency part of the template is at the same level as for the profes-
sional format.
Any receiver that meets the professional tolerance specification will meet
the consumer specification.
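The template shape can be expressed numerically. This sketch uses only the figures quoted above (a 10 UI maximum and a corner near 200 Hz) together with an assumed 0.25 UI high-frequency floor; the normative corner frequencies and levels, and the consumer-format step at 400 kHz, are defined in the standards themselves.

```python
def professional_template_ui(f_hz: float) -> float:
    """Minimum jitter a compliant input must tolerate, in UI.
    Illustrative values: 10 UI ceiling, 200 Hz corner, 0.25 UI floor."""
    f_corner = 200.0   # Hz, corner frequency from the text
    floor = 0.25       # UI, assumed high-frequency tolerance level
    ceiling = 10.0     # UI, maximum tolerance required by the template
    # The tolerance falls inversely with frequency above the corner.
    return min(ceiling, max(floor, ceiling * f_corner / f_hz))
```

At 50 Hz this gives the 10 UI ceiling, at 400 Hz it has fallen to 5 UI, and above a few kilohertz it sits at the assumed 0.25 UI floor.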
The audio signal passed through the DUT is monitored in a way that will
reveal when errors start to occur at the input port receiver. Meanwhile,
sinusoidal jitter of variable frequency and level is applied to the input
port.
At each test frequency the level of jitter is increased until errors are de-
tected. The highest level before errors occur at each frequency defines the jit-
ter tolerance at that frequency. This technique provides a measurement of the
actual jitter tolerance of the input port.
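The search described above can be sketched as a control loop. The `apply_jitter` and `measure_thdn_db` callables below are hypothetical placeholders for a test-set control API, and the failure threshold and step ratio are arbitrary illustrative choices.

```python
def find_tolerance_ui(freq_hz, apply_jitter, measure_thdn_db,
                      fail_thdn_db=-40.0, start_ui=0.02, step=1.25):
    """Return the highest applied jitter level (UI) before errors occur."""
    level = start_ui
    last_good = 0.0
    while level <= 20.0:                      # search ceiling, in UI
        apply_jitter(freq_hz, level)          # set sinusoidal jitter on the source
        if measure_thdn_db() > fail_thdn_db:  # a THD+N jump marks reception errors
            break
        last_good = level
        level *= step                         # geometric level steps
    return last_good
```

Repeating this at each test frequency traces out the measured tolerance curve.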
Figure 151. Input jitter tolerance measurement, plotted against jitter
frequency (10 Hz to 100 kHz): DSP Anlr.THD+N Ratio A (left axis, dB);
Dio.Interface Jitter and Dio.Jitter Ampl (right axis, 20 m to 5 UI).
The traces in Figure 151 are taken from the results stored in this test (DIF jit-
ter tolerance.at2c). The failure of the input port is determined by the THD+N
reading on the signal coming back from the DUT, which is plotted against the
left-hand scale in black. The device under test is a professional DAT recorder.
Signal Characterization
Signal Amplitude
The peak amplitude is a characteristic that is easy to measure, but it is not
normally a direct indicator of signal quality. If the peak signal level is very
much lower than the specified level due to cable losses, then it is likely that the
pulse distortion due to the frequency-dependent nature of those losses will
have caused an even more significant reduction in the eye opening.
In any case, an estimate of the size of the eye opening should be made; this
can be compared with the eye diagram associated with the input port minimum
input signal amplitude.
Signal reflections
An oscilloscope can be used to look for signs of signal reflections. These
are produced by impedance discontinuities in the transmission line formed by
the interconnect cable and connectors. Discontinuities can be caused by:
§ The use of a BNC T-fitting to parallel two inputs. (This can be done, but
only if the terminations of the inputs that are not at the end of the cable
are switched off and the T-fitting is mounted directly on the non-terminated
inputs. However, it is not normal for terminations to be switchable at all.)
SYNCHRONIZATION
The embedded clock, defined by the interface bit-cell transitions and the
subframe and frame boundaries, can be used as a timing reference by equipment
to derive timing for converters, processors, and digital outputs. For digital
outputs, AES11 defines limits for the timing offset between the frames of the
reference input signal and the frames of the outputs.
In some cases the timing reference is provided by another signal or clock, and the
incoming signal needs to have been already synchronized to that clock. AES11 defines a
specification for this sort of synchronization. It also covers synchronization of the digital
audio interface with video.
AES11 requires that the timing offset between the frames of the reference
input signal and the frames of an output is less than 5% of a frame period, or
6.4 UI. This is shown in Figure 153, with the circle representing the possible
phases of relative input and output frame timing. In this picture the distance
around the perimeter of the circle corresponds to one frame.
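The tolerance can be checked with a small calculation, bearing in mind that frame timing is circular: an AES3 frame is 128 UI (64 time slots of 2 UI each), so 5% of a frame is the 6.4 UI quoted above. This is a sketch of the arithmetic, not test-set code.

```python
UI_PER_FRAME = 128          # 64 time slots x 2 UI per slot (AES3)

def frame_offset_ui(ref_ui: float, out_ui: float) -> float:
    """Signed circular offset between output and reference frame timing, in UI."""
    d = (out_ui - ref_ui) % UI_PER_FRAME
    return d - UI_PER_FRAME if d > UI_PER_FRAME / 2 else d

def meets_output_tolerance(ref_ui: float, out_ui: float) -> bool:
    # 5% of 128 UI = 6.4 UI, matching the figure in the text.
    return abs(frame_offset_ui(ref_ui, out_ui)) <= 0.05 * UI_PER_FRAME
```

The modulo step matters: an output at 1 UI compared with a reference at 127 UI is only 2 UI away around the circle, not 126 UI.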
DARS
The digital audio reference signal, or DARS, is an AES3 signal that is used for tim-
ing purposes rather than for carrying audio data. This signal can be fed from the clock
master device to other devices—which would be synchronization slave devices—which
need to be synchronized to each other or to the clock master. For example, the clock mas-
ter may be a digital mixer. The slaves may be the various source devices that feed the in-
puts of the mixer, such as tape and hard disk recorders and outboard analog-to-digital
converters.
These slave devices also need to meet the AES11 output tolerance alignment specifi-
cation of ±5% of a frame. As a result, the signals from the slave devices are appropri-
ately aligned to the internal timing of the mixer so that there is no ambiguity about
which frames are associated with the same sample time.
In this situation it is assumed that the input signals are all at the same
sample rate and have been synchronized by a DARS. If any are at slightly
different rates then a re-synchronizing sample rate converter would be
required.
The timing of the arrival of the data frames from each input signal will determine
which frames are aligned together when processed. If the timing is closely matched there
is no ambiguity, but if one of the input signals is slightly misaligned that produces a
problem.
For this example, consider that the data from each input signal is received and de-
coded and briefly held in a buffer store. At a time determined by the mixer’s own clock
(which is derived from the DARS) this buffer store is transferred to another store, or
“read.” This defines the boundary between times when an input data word corresponds
with one sample or the next.
An ambiguity in frame alignment can occur if the input signal arrives just at the
time when the mixer is reading the input data buffer. If the new frame data for input sam-
ple N has been decoded and then loaded into the mixer input data buffer just before
mixer sample M is read, then the input sample number and mixer sample number are the
same. However, if input sample N arrives a few microseconds later, then input sample
N–1 is used for mixer sample M. This produces a time error of one sample for that input.
Even worse is the situation where the input sample arrives so close to the moment
that the input buffer is read, that a small amount of jitter causes a fluctuation of states
between a delay of one sample, and no delay at all. This is shown in Figure 154. This
could result in the missing and repeating of input samples each time the data arrival
phase crossed the buffer “read” phase.
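The slip-and-repeat hazard can be illustrated with a toy simulation. The uniform jitter model and the phase conventions here are purely illustrative; phases are expressed in fractions of a frame relative to the buffer read instant.

```python
import random

random.seed(1)  # reproducible jitter

def buffered_sample(read_index: int, phase: float, jitter: float) -> int:
    """Index of the input sample held in the buffer when read number
    `read_index` occurs. `phase` is how long after the read instant the
    input frame arrives (fractions of a frame); negative means it arrived
    just before. Uniform jitter of +/- `jitter` frames is added."""
    arrival = phase + random.uniform(-jitter, jitter)
    # Arrived after the read: the previous input sample is still in the buffer.
    return read_index - 1 if arrival > 0 else read_index

# Arrival right at the read phase: the pairing flips at random, so
# consecutive reads repeat (step of 0) or skip (step of 2) input samples.
critical = [buffered_sample(m, 0.0, 0.001) for m in range(1000)]
steps = {critical[i + 1] - critical[i] for i in range(999)}

# Half a frame away from the read phase, the pairing is stable (step of 1).
safe = [buffered_sample(m, 0.5, 0.001) for m in range(1000)]
```

The `critical` sequence shows exactly the missing-and-repeating behavior described above, while the well-separated `safe` phase advances by one sample per read.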
The AES11 rules address this problem with the combination of input and output
alignment tolerances. The output tolerance has already been mentioned. The input
tolerance requires that the receiver should correctly process an input that
arrives within 25% of a frame period of the timing of the reference. This range
is shown in Figure 155.
A receiver that needs to support the DARS synchronization mode should be de-
signed with the input buffer read time opposite in phase to the ideal phase alignment de-
termined by the timing of the DARS.
Figure 155. AES11 input alignment tolerance (the compliant input timing range
spans 25% of a frame either side of the reference phase).
Figure 156. Provision of input buffer hysteresis to improve response to
“wander.”
AES11 requires that a receiver should treat synchronized input data as being
sampled at the same instant if the frame start is aligned to the DARS frame
start with an error of less than 25% of a frame period. This timing offset
tolerance allows for a chain of devices that are synchronized using the signal
embedded clock (rather than a DARS), each of which can add up to 5% of a frame
of error, and also for other timing errors.
A good receiver design can go further than this. It could use hysteresis in the region
of non-compliant input timing and take away the risk of any particular timing relation-
ship resulting in the dropping and repeating of samples. The ±25% rule mentioned
above allows for hysteresis in the other 50% of the timing circle. This could be imple-
mented to ensure that if the relative input alignment drifts past the critical phase, a sam-
ple of input data is not lost or repeated until the timing is up to 75% of a frame away
from the nominal ideal, as is illustrated in Figure 156. If that occurs and the alignment
drifts in the other direction, then the correction in the other direction would
not occur until the error had reduced to 25% from the nominally ideal timing.
This then gives a tolerance to timing wander of as much as 50% of a sample
frame, even if the source has a worst-case misalignment of 180 degrees to the
correct (reference) phase.
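The hysteresis scheme can be sketched as a small state machine. The 25%/75% thresholds follow the description above; everything else, including the use of a simple closure, is an illustrative choice.

```python
def make_hysteresis_tracker(engage: float = 0.75, release: float = 0.25):
    """Return a tracker that slips one sample only when the absolute
    input phase error (fractions of a frame from nominal alignment)
    drifts past `engage`, and slips back only below `release`."""
    state = {"slipped": False}

    def offset(phase_error: float) -> int:
        if not state["slipped"] and phase_error > engage:
            state["slipped"] = True       # drifted past the far threshold
        elif state["slipped"] and phase_error < release:
            state["slipped"] = False      # returned inside the compliant range
        return 1 if state["slipped"] else 0

    return offset
```

Between the two thresholds the pairing does not change, so slow wander of up to half a frame cannot cause repeated back-and-forth sample slips.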
The ability of the digital audio interface ports to conform to the relevant
standards does not ensure that the equipment interfaces will behave as
expected.
It is possible, for example, for the interfaces to truncate part of the audio
data word, or to require a particular channel status pattern before they can de-
code the audio data. Characteristics like these may be a consequence of the in-
ternal data word size, for example; or of a method of selecting an operating
frequency by reading the channel status indication of the interface signal
sampling frequency.
In such cases the interface ports can be functioning properly, yet there is a
failure to decode the audio.
Audio data
The measurement of the performance of audio processing is beyond the
scope of this Application Note. However—short of that—there are some im-
portant tests to assess how the digital interfaces are processing the data.
Data transparency
Many devices, such as digital recorders, routing devices and format convert-
ers are totally transparent to the audio data; that is, the audio data is passed
through as a perfect bit-for-bit image. In some modes other equipment, such as
digital mixers or outboard processing boxes, can also be operated in a data-
transparent manner.
For these examples a pseudo-random data test signal can be very useful.
This type of data stream can follow a defined sequence of bits over an ex-
tended period of time. The stream can then be recognized (even if time-shifted
in a recorder) and the transparency of the image evaluated on a bit-for-bit
basis.
The digital generator in APWIN can produce such a stream, which is called
Bittest. Bittest can be selected as a generator special waveform by choosing
Wfm: Special: Bittest Random on the Digital Generator panel.
The Bittest pattern can be recognized by choosing Analyzer: Digital Data
Analyzer (Bittest) and Waveform: Random on the Digital Analyzer panel.
The word-length to be tested should be selected on the DIO panel under In-
put: Resolution.
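The principle of such a pseudo-random transparency test can be sketched with a maximal-length LFSR. The 16-bit register and tap positions here are a common textbook choice, not the actual Bittest definition.

```python
def lfsr_words(seed: int, count: int):
    """Yield `count` 16-bit words from a Fibonacci LFSR
    (taps 16, 14, 13, 11: a maximal-length choice, period 65535)."""
    state = seed & 0xFFFF
    for _ in range(count):
        yield state
        fb = ((state >> 15) ^ (state >> 13) ^ (state >> 12) ^ (state >> 10)) & 1
        state = ((state << 1) | fb) & 0xFFFF

sent = list(lfsr_words(0xACE1, 1000))
received = sent[:]            # a data-transparent device returns them intact
errors = sum(a != b for a, b in zip(sent, received))
```

Because the sequence is deterministic, the analyzer can regenerate it locally (even after a time shift through a recorder) and count any bit differences.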
If any change in the data, including dither, has been applied to the signal,
then the pseudo-random technique will not work. In that case the equipment is
not transparent to the audio data.
Channel Status
The earlier description of the channel status bit function (page 153), and the
annex on channel status that defines all the bit states (page 190), indicate how
channel status can be used. Some of the more basic functions can be tested in a
straightforward manner.
Status transparency
For equipment with any kind of pass-through function for the channel status
information, it is useful to identify which data is actually being passed
through. This can be discovered by sending various channel status patterns to
the equipment and recording the pattern that is returned.
For the AES3 channel status data there is a check code, the Cyclic Redun-
dancy Check Code (CRCC). If AES3 channel status data is modified in any
way, then this code needs to be regenerated. If all the channel status remains
the same, then the code does not need to be changed.
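The CRCC regeneration can be sketched as a CRC-8 calculation. The generator polynomial x^8 + x^4 + x^3 + x^2 + 1 with an all-ones initial value is the commonly cited choice for AES3 channel status, but the bit-ordering details should be checked against the current AES3 text before relying on this sketch.

```python
def crc8_aes3(data: bytes, poly: int = 0x1D, init: int = 0xFF) -> int:
    """Bitwise CRC-8 over `data` (poly 0x1D encodes x^8+x^4+x^3+x^2+1)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

block = bytes(23)            # channel status bytes 0-22 (all zero here)
crcc = crc8_aes3(block)      # value to transmit in byte 23
```

A receiver can check a block by running the same calculation over all 24 bytes: appending the correct CRCC makes the overall remainder zero.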
There is some information in the CRCC code. If an incoming channel status
pattern has a CRCC error, that indicates the channel status is unreliable. The
special case of CRCC=0 may indicate that CRCC is not implemented. If it is
consistently zero, then it may make sense for the equipment to ignore the
CRCC error.
Apart from the zero case, there are two methods of handling CRCC errors.
§ The first is to ignore the new block and repeat the old channel status
block.
§ Deemphasis. For both professional and consumer formats there are em-
phasis flags. These should enable deemphasis filters on any signals, such
as analog, that do not carry through the deemphasis flag information. In
a format converter, the (rarely used) J17 emphasis flag in the AES3
stream cannot be converted to a consumer format equivalent. The DUT’s
reaction to the presence of this flag can be noted.
§ Sample rate selection. Many devices require the sample rate to be indi-
cated in order to function correctly. Some devices do not operate when
the “non-indicated” state is used. This behavior can be noted.
§ Word length manipulation. Where the audio word length is being re-
duced, then dither may be applied before the truncation in order to avoid
signal-correlated errors in the resultant signal. Note whether or not the
dither is disabled if the word-length indication shows that the word
length does not require truncation.
§ Auxiliary audio masking. The bottom 4 bits in the 24-bit audio word
might (rarely) be used to carry other data. If that is the case it is indicated
in the channel status that the maximum audio word length is 20 bits.
Note whether or not the DUT masks off the lower bits in this condition.
(See the chapter Digital-to-Analog Converter Measurements and “d-
a aux truncation test.apb.”)
§ Pre-emphasis selection.
§ Copyright status. This is subject to the serial copy management system
and affects the consumer format bits 2 and 15.
§ Standard. This requires that the first three bytes and the cyclic redun-
dancy check code (CRCC) in byte 23 are correctly implemented.
Since this system was introduced after IEC60958 (then called IEC958) was
first published, there are some apparently complex rules in order to retain com-
patibility with the Compact Disc format, which pre-dates SCMS.
Generally, the L-bit is set to “one” to indicate that the signal is from pre-re-
corded material, and is cleared to “zero” to indicate that a “home-copy” has
been made. The SCMS rules require that a home-copy with copyright asserted
cannot be copied again.
With the category codes for laser-optical products (such as CD) and digital
broadcast receivers, the sense of the L-bit is reversed. In these cases the L-bit
is set to “zero” to indicate that the signal is from pre-recorded material.
There are two category codes for which the devices are deemed to be with-
out knowledge of copyright status. These are the “general” category code,
00000000; and the code for converters for analog signals without copyright in-
formation, 0110000L (the sense of the L-bit is determined by the product cate-
gory). An SCMS-compliant recorder, such as a consumer-mode DAT recorder,
will record a signal with these codes and ignore the copyright flag. On replay
it will indicate that the material is the equivalent of “pre-recorded.” This has
the effect that one further home-copy generation is allowed—giving two
generations of copying in all.
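The copy-permission logic described above can be sketched as follows. The category names are hypothetical labels standing in for the real 8-bit category codes, and the rules are simplified to what the text states; IEC60958-3 holds the normative definitions.

```python
# Categories whose L-bit sense is reversed (text: laser-optical products
# such as CD, and digital broadcast receivers).
L_REVERSED_CATEGORIES = {"laser_optical", "broadcast"}
# Categories deemed to have no knowledge of copyright status.
COPYRIGHT_UNKNOWN = {"general", "adc_no_copyright"}

def is_original(category: str, l_bit: int) -> bool:
    """True if the signal counts as pre-recorded (original) material."""
    if category in COPYRIGHT_UNKNOWN:
        return True   # treated as pre-recorded: one more generation allowed
    pre_recorded = 0 if category in L_REVERSED_CATEGORIES else 1
    return l_bit == pre_recorded

def copy_allowed(copyright_asserted: bool, category: str, l_bit: int) -> bool:
    if not copyright_asserted:
        return True
    # SCMS: a home copy with copyright asserted cannot be copied again.
    return is_original(category, l_bit)
```

For a DAT source with copyright asserted, L=1 (pre-recorded) permits one copy while L=0 (home copy) does not; for a CD source the sense is reversed.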
Validity bit
Since the validity bit was poorly defined, it is not clear how equipment
should behave on receipt of a signal with the bit set to “one”—indicating an in-
valid signal.
Strictly according to the specification, the audio data word associated with
an invalid data status indication should not be converted to analog. This im-
plies that any equipment that is not simply passing the validity flag through
with the audio data should mute or interpolate the associated audio word, so
that any following equipment does not convert it to analog. This behavior can
be verified in APWIN by setting the valid flag manually in the DIO panel and
observing the effect on a DUT.
If the validity bit is being passed through with the audio, then it is important
that they remain exactly aligned. If there were a time offset between the
flag and the audio, the flag could align with a different audio word, with the re-
sult that the originally incorrect word would be wrongly marked as valid, and
a correct word would be wrongly marked as invalid. Correct alignment can be
verified with dedicated test equipment. The author is unaware of any that is
commercially available.
Some equipment uses the validity bit to indicate that error concealment has
taken place. This non-compliant behavior is common to a large number of CD
players. If the response of the DUT is to replace invalid samples with a mute
or concealment that is more noticeable than the original concealment, then that
may be seen as a disadvantage.
Because of this confusion it is common for equipment to be required to ig-
nore the validity flag.
User data
For the professional format, the user data stream can be used for a variety of
applications, which can have different formatting and relations to the other in-
terface data.
If a device under test is aiming to be completely transparent to all defined
and future formats, this can be verified by passing through a known pseudo-
random data stream (such as Bittest) and confirming that it has not been
corrupted.
If the DUT is not transparent, the test can be repeated for the three standard
formats that have already been defined in case the device only supports one of
those subsets. This could be a large exercise without a protocol analyzer. Un-
fortunately, the author is not aware of any commercially available AES3 or
IEC60958 user-data protocol analyzers.
More simply, it may be possible to put the DUT in the path between
equipment that is communicating in the appropriate format and then to con-
firm that the user data messages are still getting through.
For the consumer format the general user data format is the only one speci-
fied. The technique described in the previous paragraph could also be used
(with appropriate consumer equipment) to test for transparency to consumer-
format user data messages.
The annex lists these fields for IEC60958-3:2000 and AES3-1997 but is not
authoritative. A copy of the latest revision of the appropriate standard should
be used if possible.
Table 2. Consumer format channel status fields.
Table 3. Consumer format channel status interpretations.
Note that the bit fields are shown with the earliest, or lowest-numbered,
bit to the left. As the format is LSB-first, this notation is the opposite
of conventional binary notation, which would show the MSB to the left.

Table 4. Professional format channel status fields.
byte 17  Local sample address code (32-bit binary, MSW)      bits 136–143
byte 18  Time of day code (32-bit binary, LSW)               bits 144–151
byte 19  Time of day code (32-bit binary)                    bits 152–159
byte 20  Time of day code (32-bit binary)                    bits 160–167
byte 21  Time of day code (32-bit binary, MSW)               bits 168–175
byte 22  reserved | Reliability flags for bytes 0–5, 6–13,
         14–17, 18–21                                        bits 176–183
byte 23  Cyclic redundancy check character (CRCC)            bits 184–191

Table 5. Professional format channel status interpretations.
bits 16–18, use of aux sample word:
  000: not defined, audio max 20 bits
  001: used for main audio, max 24 bits
  010: used for coordination, audio max 20 bits
  011: user-defined
bits 19–21, source word length (if max = 24 bits / if max = 20 bits):
  000: not indicated / not indicated
  001: 23 bits / 19 bits
  010: 22 bits / 18 bits
  011: 21 bits / 17 bits
  100: 20 bits / 16 bits
  101: 24 bits / 20 bits
Note that the bit fields are shown with the earliest, or lowest-numbered,
bit to the left. As the format is LSB-first, this notation is the opposite
of conventional binary notation, which would show the MSB to the left.
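The source word length field tabulated above can be decoded mechanically: the meaning of the 3-bit code depends on whether the maximum audio word length is flagged as 24 or 20 bits. Codes are written here as they appear in the table (earliest bit to the left; see the note on bit ordering).

```python
WORD_LENGTH = {            # code -> (bits if max = 24, bits if max = 20)
    0b000: (None, None),   # not indicated
    0b001: (23, 19),
    0b010: (22, 18),
    0b011: (21, 17),
    0b100: (20, 16),
    0b101: (24, 20),
}

def source_word_length(code: int, max_is_24: bool):
    """Decode the professional-format source word length (bits 19-21)."""
    full, reduced = WORD_LENGTH[code]
    return full if max_is_24 else reduced
```

For example, code 101 indicates a full-length word: 24 bits when the maximum is 24, or 20 bits when the maximum is 20.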
List of Files
The following APWIN files are referred to or used in this Application Note:
References