Proc. of the 7th Int. Conference on Digital Audio Effects (DAFx'04), Naples, Italy, October 5-8, 2004
DIGITAL AUDIO EFFECTS APPLIED DIRECTLY ON A DSD BITSTREAM
ABSTRACT

Digital audio effects are typically implemented on 16- or 24-bit signals sampled at 44.1 kHz. Yet high quality audio is often encoded in a one-bit, highly oversampled format, such as DSD. Processing a bitstream, and applying audio effects to it, requires special care and modification of existing methods. However, it has strong advantages due to the high quality phase information and the elimination of multiple decimators and interpolators in the recording and playback process. We present several methods by which audio effects can be applied directly on a bitstream. We also discuss the modifications that need to be made to existing methods for them to be properly applied to DSD audio. Methods are presented through the use of block diagrams, and results are reported.

Figure 1: The standard multibit PCM recording and playback chain, (a), requires a decimation filter on the recording side and an oversampling filter on the playback side, whereas Direct Stream Digital, (b), enables sound to be recorded directly in the 1-bit signal format and eliminates the need for these filters.
Keywords: Sigma Delta Modulation, SACD, DSD, Digital Audio Effects, Bitstream Signal Processing

1. INTRODUCTION

One-bit signals are used throughout the audio recording, editing and playback process. Most analog-to-digital and digital-to-analog converters employ a sigma delta modulator that converts a signal to a bitstream. Digital audio is often stored during production in a single bit format. In addition, the high-end audio distribution format, Super Audio CD, employs the single bit recording format known as Direct Stream Digital, or DSD.

The benefits of the DSD format are numerous. Improvements to the traditional pulse code modulation (PCM) format through higher bit rates and higher sampling rates have yielded diminishing returns. This is partly due to the difficulties in implementing accurate high bit quantisers, but primarily due to the losses incurred from filtering. PCM systems require steep filters at the input to block any signal at or above half the sampling frequency. Ideally, a brick wall filter should be used, passing all frequencies below the Nyquist frequency and rejecting all above. Yet an ideal brick wall filter does not exist.

In addition, requantisation noise is added by the multi-stage or cascaded decimation (downsampling) digital filters used in recording and the multi-stage interpolation (oversampling) digital filters used in playback. Increasing the sample rate, as with DVD-Audio, eases the difficulty of the brick wall filter, but does not correct the problems introduced by multi-stage decimation and interpolation.

This was the inspiration for a 1-bit audio format, as first proposed by Angus [1], and independently implemented as Direct Stream Digital (see Figure 1). As in conventional PCM systems, the analog signal is first converted to digital by 64x oversampling sigma delta modulation. The result is a 1-bit digital representation of the audio signal. Where conventional systems immediately decimate the 1-bit signal into a multibit PCM code, Direct Stream Digital records the 1-bit pulses directly.

The resulting pulse train has some remarkable properties. The bandwidth now extends over more than 1.4 MHz. Through the use of high order sigma delta modulators (SDMs), the noise can be shifted up to inaudible frequencies. And the digital-to-analog conversion is now as simple as running the pulse train through an analog low-pass filter.

Ultra-high signal-to-noise ratios as required for DSD in the audio band are achieved through 5th-order noise shaping filters. Thus DSD can represent signals with a frequency response from DC to 100 kHz. The residual noise power is held at -120 dB through the audio band [2].

Although single bit, oversampled formats have been found to be excellent for archiving, A/D and D/A conversion, and recording [3], they suffer from a serious drawback in the editing and mastering phase. Few tools have been developed which allow effective processing of audio bitstreams. To apply audio effects directly on the bitstream, it is vital that requantisation, decimation and interpolation be kept to a minimum.

However, processing and audio effect creation in the 1-bit domain is appealing for many reasons. The oversampled signal has very high quality phase information, making phase vocoder-based effects easier and more accurate. Effects using variable delays, such as chorus and flanging, also benefit from oversampling since interpolation of the delay is far more precise. Furthermore, 1-bit audio effects can be applied on the DSD signal directly before or after
encoding, thus maintaining the simplified production chain as in Figure 1.

The goal of this paper is to describe how to develop standard audio effects on the DSD bitstream while minimising intermediate conversions to multibit format, since such conversions destroy the benefits of DSD. Previous work [4-12] has already established that suitable IIR and FIR filters can be created, as well as some mixing tools. However, common audio effects, such as companders, expanders, reverb, modulation, and so on, have not yet been developed. In the following Sections we will demonstrate how these effects can be applied directly on a bitstream without introducing unwanted artifacts or significant degradation of audio quality.

2. PROPERTIES OF THE DSD BITSTREAM

There are several features of DSD which distinguish it from PCM. At its heart, DSD is specified as being a 1-bit format, with a sampling rate of 64*44.1 kHz, or 2.8224 MHz [13]. Little else is specified regarding the format, although constraints are imposed for the archiving of DSD on Super Audio CDs and the playback of those discs (notably, restrictions on noise levels, frequency response, peak levels and DC offsets). However, the specifications of DSD also note the following properties:

1. The 1-bit format is such that the 1 represents a positive output (+1) and the 0 a negative output (-1).

2. The 0 dB reference level has been set to 50% of the maximum theoretically possible modulation depth. At least 4 out of any 28 consecutive bits must be set to 1 (and similarly for 0). This maximum setting corresponds to a peak level of +3.10 dB relative to the 0 dB reference.

3. Silence patterns are defined as repeating bytes where each byte contains an equal number of 1s and 0s.

Unlike PCM, the DSD signal always has a power of 1 (the bits representing +1 and -1 levels). Thus any instantaneous measurement of signal level is meaningless. Furthermore, whereas PCM has a strict 0 dB maximum, the 0 dB limit for DSD has been imposed as a safety measure. In practice, this means that a DSD signal, when put through a sigma delta modulator, is unlikely to result in instability or severe clipping since its peak levels have already been restricted to within safe margins.

Silence patterns do not make sense in 44.1 kHz PCM since any repeating pattern would have components at or below 22.05 kHz and hence be potentially audible. A constant DC level represents silence in PCM. But for a DSD signal, constant levels (i.e., all zeroes or all ones) are not allowed. A repeating pattern of 8 bits or less, on the other hand, only has frequency components above 176 kHz, i.e., far outside the range of human hearing. Thus whenever inaudible output is required, a silence pattern should be used. This is important in the construction of many audio effects, such as noise-gating.

3.1. Bitstream addition

Perhaps the most fundamental signal processing operation is the addition of two signals. O'Leary and Maloberti [15] demonstrated an elegant bitstream adder (Figure 2). The oversampled nature of the bitstream allows one to use a simple feedback loop whereby two bitstreams are added along with the sum bit from the previous iteration. When the bandwidth of the input signals is far below the sampling frequency, as is the case with DSD, the output carry bits are an excellent representation of the average of the two signals.

This bitstream adder is remarkable because it requires no explicit requantisation, and it has been shown to be highly effective for oversampled signals. The alternative, bitstream addition via the interleaving of bitstreams [16], suffers degradation of audio quality due to downsampling, phase shift and possible introduction of low-frequency noise.

However, although this bitstream adder does not explicitly perform requantisation, it amounts to the same effect. It acts as a first order sigma delta modulator and so introduces some noise and distortion into the audible band. The bitstream adder is therefore suitable either for use over a limited duration or when increased noise is acceptable. An alternative would involve summing the signals and then performing high order noise shaping.

Figure 2: A bitstream adder. [Inputs A and B are added to the sum bit delayed by z^-1; the carry output represents (A+B)/2.]

3.2. Delay based effects

By using the bitstream adder in conjunction with multiple delays, it is possible to create a flanger or chorus effect entirely through simple logic operations on the bitstream. This is indicated in Figure 3, where BSA represents the bitstream adder from Figure 2. This implementation is very elegant and appealing because it requires no filtering, decimation, interpolation or requantisation. It deals solely with bit operations and delays. Furthermore, the delays can be set to any length, and due to the high sampling rate of DSD, there are far more options over the number of voices and their placement. To weight the delayed signals, a given delay time may be repeated in the inputs to the bitstream adders.

[Figure 3, diagram labels only: DSD input; delays z^-n1, z^-n2, z^-n3, ...; bitstream adders (BSA); DSD output.]
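The bitstream adder of Figure 2 and the delay structure of Section 3.2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`bitstream_add`, `delay_bits`, `multi_tap`), the use of fixed rather than time-varying delays, and the alternating 0/1 fill used as a stand-in silence pattern are all assumptions made for illustration. Bits are held as 0/1 integers.

```python
from functools import reduce


def bitstream_add(a, b):
    """Add two 0/1 bitstreams with a full adder whose sum bit is fed back.

    The carry bit is the output; over time it tracks the average (A+B)/2.
    """
    out = []
    sum_bit = 0  # sum bit from the previous iteration (the z^-1 path)
    for bit_a, bit_b in zip(a, b):
        total = bit_a + bit_b + sum_bit  # value in 0..3
        out.append(total >> 1)           # carry bit is the output
        sum_bit = total & 1              # sum bit is kept for the next step
    return out


def delay_bits(bits, n):
    """Delay a bitstream by n samples, keeping its length.

    Assumption: the gap is filled with alternating 0/1 bits, which has equal
    numbers of 1s and 0s, rather than a constant level.
    """
    if not 0 <= n <= len(bits):
        raise ValueError("delay must lie between 0 and len(bits)")
    fill = [i & 1 for i in range(n)]
    return fill + bits[:len(bits) - n]


def multi_tap(bits, delays):
    """Combine the input with delayed copies using a chain of adders.

    Each adder halves both of its inputs, so later taps in the chain carry
    more weight; repeating a delay value is one way to re-weight a tap.
    """
    taps = [delay_bits(bits, n) for n in delays]
    return reduce(bitstream_add, taps, bits)


if __name__ == "__main__":
    stream = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
    print(multi_tap(stream, [2, 5]))
```

Because each adder stage behaves like a first order modulator, a long chain of adders would be expected to accumulate noise, which is consistent with the caution given in Section 3.1.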
Figure 5: A hard noise gate implemented on a DSD bitstream. A DSD silence signal must be used since constant DC levels are not possible.

In [19], Reefman and Nuijten described an approach to synchronisation of bitstreams which allows for seamless switching. This approach involves the use of a sigma delta modulator acting on the mix of the two input bitstreams. However, this SDM must be synchronised such that it produces the bitstream A when acting just on A, and the bitstream B when acting just on B.

In order to produce synchronisation, the integrator states (the initial conditions) of the SDM must match those that reproduce the input bitstream. This synchroniser can be implemented by using a least squares approach to find integrator states which minimise the difference between a DSD input signal and the resulting DSD output signal. Thus editing is done as depicted in Figure 6. When synchronisation is ready, the switch is changed to the central position, and G is set to 1. G is slowly decreased to 0, then the output stream is resynchronised to input stream B, and the switch is set to the downwards position.

Figure 6: Smooth switching between bitstreams using synchronisation.

An alternative switching method is proposed in Figure 7. We note first that both input and output streams are low-pass filtered, and the application of a slowly changing gain and a first order SDM should not significantly change the bandwidth of the signal. Importantly, a first order SDM will have no effect on a DSD bitstream: the difference between a bit and its quantised value is zero, so no error is fed back. Thus, when G is set to 1 in Figure 7, the output bitstream is A. As G is decreased, a cumulative error based on the difference between the two input signals is added to the quantiser input. As G approaches 0, the difference between the output and input bitstream B also approaches 0. Eventually, the feedback term approaches a constant (typically non-zero) and the output ...

Figure 7: Smooth switching between DSD bitstreams using a slowly changing gain and a first order SDM.

The result of this switching scheme on input signals of frequency 1 and 2 kHz is depicted in Figure 8. A switch is desired at 2 milliseconds. The example is particularly pernicious (and somewhat unrealistic) since the waveforms are very different: out-of-phase and with peak amplitudes of 0.2 and 0.9. The gain is changed linearly from 1 to 0 over 1,600 samples, or just over half a millisecond. Depicted are the analog input signals before conversion to bitstreams, and the output signal after decimation to multibit, 44.1 kHz using a sinc² filter. The resulting transition at 2 ms is smooth, without abrupt changes in amplitude or slope. There is a slight and temporary increase in frequency, but this effect can be minimised through the use of a slower gain change or eliminated completely by using a detection scheme to find a more appropriate time to perform the edit.

Figure 8: Smooth switching between DSD bitstreams using the circuit from Figure 7. This is the worst case scenario, where the input bitstreams have differing amplitudes and opposing phases. [Axes: analog input and output amplitude, -1.0 to 1.0, against time, 0 to 4 msec.]

Improvements to this method could also be achieved by using a more effective noise shaper (higher order SDM) instead of the first order SDM in Figure 7. However, with gain equal to 1, the output bitstream would not be identical to the input bitstream. To phase out the effects of requantisation, and resynchronise the output bitstream with the input stream A, we can slowly reduce the feedback coefficients of the modulator. As the feedback coefficients approach zero, the modulator becomes lower order until it approaches a first order SDM and, as before, has no effect on the bitstream.
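The switching scheme of Figure 7 can be sketched in a few lines of Python. This is an illustrative reading of the block diagram, not the authors' code: the error-feedback form of the first order SDM, the function names, and the choice of which test signal takes which amplitude are assumptions. Bits are held as +1/-1 values.

```python
import math


def first_order_sdm(samples):
    """First order SDM in error-feedback form; returns +1/-1 values."""
    out = []
    err = 0.0
    for x in samples:
        u = x + err                # quantiser input
        q = 1 if u >= 0 else -1    # 1-bit quantiser
        err = u - q                # quantisation error, fed back via z^-1
        out.append(q)
    return out


def crossfade(a, b, start, ramp_len):
    """Switch from bitstream a to bitstream b with gain G going from 1 to 0.

    The mix G*a + (1-G)*b is requantised by a first order SDM. While G is 1
    the mix is already +1/-1, so the error stays zero and the output is a.
    """
    out = []
    err = 0.0
    for n, (bit_a, bit_b) in enumerate(zip(a, b)):
        g = min(1.0, max(0.0, 1.0 - (n - start) / ramp_len))
        u = g * bit_a + (1.0 - g) * bit_b + err
        q = 1 if u >= 0 else -1
        err = u - q
        out.append(q)
    return out


if __name__ == "__main__":
    fs = 64 * 44100                    # DSD sampling rate, 2.8224 MHz
    n_samples = 12000                  # a little over 4 ms
    start = int(0.002 * fs)            # switch requested at 2 ms
    ramp_len = 1600                    # gain ramp length used in the text
    # Assumed test signals: 1 kHz at 0.2 and an inverted 2 kHz at 0.9.
    a = first_order_sdm(
        [0.2 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(n_samples)])
    b = first_order_sdm(
        [-0.9 * math.sin(2 * math.pi * 2000 * n / fs) for n in range(n_samples)])
    out = crossfade(a, b, start, ramp_len)
    print(out[:start] == a[:start])    # output follows A before the switch
```

In this sketch the stored error is bounded by 1 in magnitude, so once G reaches 0 the quantiser output follows B while the error term stays constant.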
IIR filtering of a DSD signal, on the other hand, does not change the delays but changes the coefficients. The coefficients of the filter can be calculated in the same way as for PCM, but the oversampling implies that their values will be very different.

As has been mentioned, requantisations should be kept to a minimum. Thus, if the filtering consists of IIR/FIR filters, a noise shaping filter and a low pass filter, then these stages should be combined in such a way that there is only one requantisation in the final stage. Figure 9 depicts an IIR filter which incorporates an SDM-based requantiser. Although such a design is efficient and eliminates the multi-bit stage, it does not differ greatly from a cascade of one bit filters followed by a remodulator.

Minimising decimation, interpolation, and requantisation is not a drawback. These stages add to system complexity and degrade performance. In addition, filtering in the oversampled domain is advantageous because it relaxes specifications on anti-alias and reconstruction filters at the analog interfaces, thus improving phase linearity [12].

7. REFERENCES

[1] J. A. S. Angus, “The One Bit Alternative for Audio Processing and Mastering,” Proceedings of the Audio Engineering Society Conference on Managing the Bit Budget, London, pp. 34–40, 1994.

[2] D. Reefman and E. Janssen, “Signal processing for Direct Stream Digital: A tutorial for digital Sigma Delta modulation and 1-bit digital audio processing,” Philips Research, Eindhoven, White Paper, 18 December 2002.

[3] D. Reefman and P. Nuijten, “Why Direct Stream Digital (DSD) is the best choice as a digital audio format,” Proceedings of the Audio Engineering Society 110th Convention, Amsterdam, Holland, 2001.

[4] J. A. S. Angus, “Direct Digital Processing of 'Super Audio CD' Signals,” Proceedings of the Audio Engineering Society 108th Convention, Paris, France, 2000.