Keiler Realtime Prediction Dafx2000
ABSTRACT
In many audio applications an appropriate spectral estimation from a signal sequence is required. A common approach for this task is linear prediction [1], where the signal spectrum is modelled by an all-pole (purely recursive) IIR (infinite impulse response) filter. Linear prediction is commonly used for coding of audio signals, leading to linear predictive coding (LPC). But some audio effects can also be created using the spectral estimation of LPC.

In this paper we consider the use of LPC in a real-time system. We investigate several methods of calculating the prediction coefficients with an almost fixed workload for each sample. We present modifications of the autocorrelation method and of the Burg algorithm for a sample-based calculation of the filter coefficients as alternatives to the gradient adaptive lattice (GAL) method. We discuss the prediction gain obtained with these methods in relation to the required complexity per sample. The desired constant workload leads to a fast update of the spectral model, which is of great benefit for both coding and audio effects.

1. INTRODUCTION

In LPC the current input sample x(n) is approximated by a linear combination of past samples of the input signal. The prediction of x(n) is computed using an FIR filter by

\hat{x}(n) = \sum_{i=1}^{p} a_i x(n-i),    (1)

where p is the prediction order and a_i are the prediction coefficients. With the z-transform of the prediction filter

P(z) = \sum_{i=1}^{p} a_i z^{-i},    (2)

the difference between the original input signal and its prediction is evaluated in the z-domain by

E(z) = X(z) - \hat{X}(z) = X(z) [1 - P(z)].    (3)

The difference signal e(n) is called residual or prediction error and its calculation is depicted in Figure 1(a). Here the feed forward prediction is considered, where the prediction is calculated in forward direction from the input signal.

Figure 1: LPC structure with feed forward prediction. (a) Analysis, (b) Synthesis.

Using the excitation e(n) as input to the all-pole filter

H(z) = \frac{1}{1 - P(z)}    (4)

produces the output signal

Y(z) = \frac{1}{1 - P(z)} E(z),    (5)

where H(z) can be realized with the FIR filter P(z) in a feedback loop as shown in Figure 1(b). If the residual calculated in the analysis stage is fed directly into the synthesis filter, the input signal will be ideally recovered.

The IIR filter H(z) is termed synthesis filter or LPC filter and it models, except for a gain factor, the input signal x(n). For speech coding this filter models the time-varying vocal tract. The filter 1 - P(z) for calculating the residual from the input signal (see Eq. (3)) is called the inverse filter.

With optimal filter coefficients the residual energy is minimized. This can be exploited for efficient coding of the input signal, where the quantized residual is used as excitation to the LPC filter.

Other applications of linear prediction are audio effects such as the cross-synthesis of two sounds or the pitch shifting of one sound with formant preservation [2, 3, 4, 5].

In this paper, modifications of two commonly used linear prediction methods (autocorrelation method, Burg algorithm) are presented to obtain methods which are suited for real-time computation, i.e. which have a similar workload for each sample. Furthermore, the quality of the spectral model computed by different linear prediction methods is compared with regard to the required computation time. We consider here linear prediction methods for a computation of the residual with zero delay. Thus, the prediction coefficients are computed from past samples of the input signal and the methods are suited for audio coding using the ADPCM structure, where no transmission of the filter coefficients is required. The fast filter update resulting from the similar workload per sample leads to better spectral models than block-based approaches where the coefficients are held constant for the duration of one block. With a fast update of the spectral model no interpolation of the filter coefficients between frames is required, as is usually performed in audio effects based on a frame-by-frame LPC analysis [3, 4].
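As an illustration of the structure in Figure 1 and Eqs. (1)-(5), the following Python/NumPy sketch (our own example, not code from the paper) computes the residual with the inverse filter 1 - P(z) and recovers the input with the corresponding all-pole synthesis filter. The coefficients a_i are arbitrary stable values standing in for a real LP analysis.

import numpy as np
from scipy.signal import lfilter

# Illustrative input signal and arbitrary (stable) predictor coefficients a_i;
# in the paper these come from one of the LP estimation methods.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
a = np.array([0.5, -0.25, 0.1])        # a_1 ... a_p, prediction order p = 3

# Inverse (analysis) filter 1 - P(z): FIR with coefficients [1, -a_1, ..., -a_p]
a_inv = np.concatenate(([1.0], -a))
e = lfilter(a_inv, [1.0], x)           # residual e(n) = x(n) - x_hat(n), Fig. 1(a)

# Synthesis (LPC) filter H(z) = 1 / (1 - P(z)): the same FIR in a feedback loop
y = lfilter([1.0], a_inv, e)           # Fig. 1(b)

# Feeding the residual directly into the synthesis filter recovers the input.
print(np.max(np.abs(y - x)))           # close to machine precision, i.e. perfect reconstruction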
3.2. Burg Algorithm

The Burg algorithm is based on the lattice structure and it minimizes, for a predictor of order p in a block of length N, the sum of the energies of the forward prediction error f_p(n) and of the backward prediction error b_p(n). The initialization of the forward and backward prediction errors of order zero for the considered block is obtained by

f_0(n) = x(n),    (6)
b_0(n) = x(n).    (7)

For m = 1, ..., p the lattice coefficient k_m is calculated by

k_m = \frac{2 \sum_{n=m}^{N-1} f_{m-1}(n) \, b_{m-1}(n-1)}{\sum_{n=m}^{N-1} \left[ f_{m-1}^2(n) + b_{m-1}^2(n-1) \right]},    (8)

and the forward and backward prediction errors of order m follow from the recursions

f_m(n) = f_{m-1}(n) - k_m b_{m-1}(n-1),    (9)
b_m(n) = b_{m-1}(n-1) - k_m f_{m-1}(n).    (10)
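A compact NumPy transcription of this block-based recursion may look as follows (a sketch; the sign and indexing conventions follow a common textbook form of the Burg algorithm, cf. [1, 7], and may differ in detail from Eqs. (8)-(10)).

import numpy as np

def burg_lattice_coeffs(x, p):
    """Estimate p lattice (reflection) coefficients from one block of
    N samples with the Burg method (textbook form, cf. [1, 7])."""
    x = np.asarray(x, dtype=float)
    f = x.copy()                        # f_0(n) = x(n)   (Eq. (6))
    b = x.copy()                        # b_0(n) = x(n)   (Eq. (7))
    k = np.zeros(p)
    for m in range(1, p + 1):
        fm = f[m:]                      # f_{m-1}(n),   n = m .. N-1
        bm = b[m - 1:-1]                # b_{m-1}(n-1), n = m .. N-1
        # k_m minimizes the summed forward and backward error energies
        k[m - 1] = 2.0 * np.dot(fm, bm) / (np.dot(fm, fm) + np.dot(bm, bm))
        f_new = fm - k[m - 1] * bm      # order-update recursions
        b_new = bm - k[m - 1] * fm
        f[m:] = f_new
        b[m:] = b_new
    return k

# Example: for a second-order AR test signal the first two coefficients
# dominate while the higher orders stay close to zero.
rng = np.random.default_rng(1)
w = rng.standard_normal(512)
sig = np.zeros(512)
for i in range(2, 512):
    sig[i] = 1.3 * sig[i - 1] - 0.6 * sig[i - 2] + w[i]
print(burg_lattice_coeffs(sig, 4))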
3.3. Gradient Adaptive Lattice (GAL)

As in the block-based Burg algorithm, in the gradient adaptive lattice method the lattice coefficients are used and the sum of the forward and backward prediction errors is minimized [7, 8]. Using the approximation of the error energy of the p-th order predictor

\hat{E}_p(n) = f_p^2(n) + b_p^2(n)    (17)

yields with the steepest descent approach the coefficient update

k_p(n) = k_p(n-1) - \mu_p \frac{\partial \hat{E}_p(n)}{\partial k_p}    (18)

with the gradient weights \mu_p. Applying the recursions after Equations (9) and (10) for the current time index leads to

\frac{\partial \hat{E}_p(n)}{\partial k_p} = -2 \left[ f_p(n) b_{p-1}(n-1) + b_p(n) f_{p-1}(n) \right].    (19)

Thus, putting (19) into (18) gives a formula for a sample-by-sample update of the lattice coefficients.

The simplest approach is to choose the values \mu_p constant. Simulations have shown that the optimum value of \mu (equal for all orders for simplicity) depends highly on the used signals; the optimum value varies approximately in the range from 1 to 10. Better results are expected for gradient weights which are adaptively dependent on the expectation value of the sum of the forward and backward prediction error energies. An approximation of this expectation value can be recursively calculated by [7]

(20)

where \lambda influences the weight of older samples. The gradient weights are obtained by

(21)

with a constant value which is normally chosen such that one obtains a recursive formulation of the Burg algorithm [7]. In our simulations using different sounds the optimum value was independent of the used sound.

3.4. Modification of Block-based Methods for Real-Time Computation

Both the autocorrelation method and the Burg algorithm first require an initialization process before the prediction coefficients are computed recursively. The real-time computation of a coefficient set of order p is spread over p+1 samples: one sample for the initialization and one sample for each of the p coefficients. Thus, with the counter m = 0, ..., p (changing every sample) we get the following procedure:

- For m = 0 perform the initialization.
- For m = 1, ..., p calculate the coefficient with index m.

3.4.1. Successive Burg Algorithm

In the initialization process (m = 0) the operations of Equations (6) and (7) are performed, setting both the forward and backward prediction error of order zero to the input samples of the considered block. For m >= 1 one coefficient k_m is calculated according to Equation (8), and by applying the recursions of order m after Equations (9) and (10), the forward and backward prediction errors of m-th order are computed, which are required in the following sample for computing k_{m+1}. The new k_m replaces the previously used one. Since one coefficient is changed, a recalculation of the lattice states prior to the filter operation is required.

3.4.2. Successive Autocorrelation Method

In the Levinson-Durbin recursion the autocorrelation sequence up to lag p is required, see Equation (11). In the initialization process (m = 0) the input data block is first windowed, where normally a Hamming window is used; then the autocorrelation value for lag zero is computed. For m >= 1 the autocorrelation value for lag m is computed, followed by the Durbin recursion of order m as given in Equations (13) to (16). Thus, the calculation of the complete set of the direct FIR coefficients of order p requires p+1 samples. If the standard FIR structure is used, the use of the coefficients in the filter operation has to be delayed accordingly.

3.5. DSP Workload of the Sample-Based Methods

The required instructions per sample on the Motorola DSP 56307 (DSP = Digital Signal Processor) for the described sample-based methods, including the filter operations, are summarized in Table 1. For the GAL (with adaptive gradient weights) only an estimation is given since it is not yet implemented on the DSP. Furthermore, the required update of the lattice states is only estimated to be of the same order as in the Burg algorithm. If the states update is omitted, the prediction error during the GAL coefficient update is calculated with the states corresponding to the old coefficient set.

Method        Computation                    DSP instructions
suc. autoc.   FIR                            …
              coeff. calc. (m = 0)           …
              coeff. calc. (m >= 1)          …
suc. Burg     lattice + states update        …
              coeff. calc. (m = 0)           …
              coeff. calc. (m >= 1)          …
GAL           lattice states update          …
              coeff. calc. + std. lattice    …

Table 1: Required DSP instructions per sample for real-time prediction methods and filter operations, dependent on block length N and prediction order p.

Table 2 shows the maximum workloads per sample for calculating the filter coefficients, i.e. the filter operations to calculate the prediction are not considered. In the GAL the prediction order has the greatest influence on the complexity. In this method p divisions are required for each sample, which are very expensive on a DSP. Notice that in the Burg algorithm the maximum workload is only influenced by the block length N, which is also the case in the autocorrelation method for long blocks.

Method                  DSP instructions
suc. autocorrelation    …
suc. Burg               …
GAL                     …

Table 2: Maximum workload per sample for coefficient calculation.
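The scheduling of Sections 3.4 and 3.4.1 can be sketched as follows (our own illustration with simplifying assumptions: the analysis block is frozen when the counter wraps to m = 0, the partly refreshed coefficient set is returned after every sample, and the lattice filtering itself as well as the states update discussed above are omitted; the class name SuccessiveBurg and its interface are ours).

import numpy as np

class SuccessiveBurg:
    """Spread the Burg recursion of order p over p+1 samples:
    m = 0 initializes f_0/b_0 from the latest block, m = 1..p computes
    one lattice coefficient k_m per sample (sketch, not the paper's code)."""

    def __init__(self, p, block_len):
        self.p, self.N = p, block_len
        self.k = np.zeros(p)            # currently used coefficient set
        self.m = 0                      # per-sample counter, 0..p

    def update(self, history):
        """Call once per sample with the most recent input samples."""
        if self.m == 0:
            blk = np.asarray(history[-self.N:], dtype=float)
            self.f = blk.copy()         # f_0(n) = x(n)
            self.b = blk.copy()         # b_0(n) = x(n)
        else:
            m = self.m
            fm, bm = self.f[m:], self.b[m - 1:-1]
            km = 2.0 * np.dot(fm, bm) / (np.dot(fm, fm) + np.dot(bm, bm))
            self.k[m - 1] = km          # the new k_m replaces the old one
            f_new, b_new = fm - km * bm, bm - km * fm
            self.f[m:], self.b[m:] = f_new, b_new
        self.m = (self.m + 1) % (self.p + 1)
        return self.k                   # coefficients available for the next filter step

# Per-sample usage: after p+1 calls one full coefficient set has been refreshed,
# and the workload of every call stays roughly constant.
rng = np.random.default_rng(2)
x = rng.standard_normal(2000)
sb = SuccessiveBurg(p=8, block_len=256)
for n in range(256, 2000):
    k = sb.update(x[:n + 1])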
4. SIMULATION RESULTS

In this section we present some simulation results to compare the performance of the sample-based linear prediction methods regarding the required DSP workload. For the simulations we use two short sequences from a piano and a triangle sound, respectively. The duration is approximately 0.8 seconds each and the sequences are taken from tracks 39 and 32 of the EBU CD [10]. The following parameters are used:

- Successive autocorrelation method: … with …; … with …
- Successive Burg algorithm: … with …; … with …; … with …
- GAL with …

For each parameter set the prediction error is calculated using the described algorithms. Afterwards the segmental prediction gain (segG_p) is calculated, which is the average value of the prediction gains of small blocks of 100 samples duration. The prediction gain of a length-N block is computed by

G_p = 10 \log_{10} \left( \frac{\sum_{n} x^2(n)}{\sum_{n} e^2(n)} \right) \; dB.    (22)

The prediction gain is a measure for the quality of the spectral model. The inverse values of the averaged spectral flatness measures are approx. 53 dB for the piano and 23 dB for the triangle, which are upper bounds for the achievable prediction gains [11]. Figure 3 shows the obtained segG_p values dependent on the maximum DSP workload per sample given in Table 2. For the block-based methods the plots show lines for constant values of the prediction order p. In the plots for the Burg algorithm (on the right) the results of the GAL (dashed) are additionally shown.

[Figure 3 plots: panels "piano, autocorrelation" and "piano, Burg/GAL" plus corresponding panels for the triangle sequence; vertical axes segG_p/dB, horizontal axes DSP instructions per sample; GAL results dashed.]

Figure 3: Segmental prediction gain over maximum DSP instructions per sample for sample-based LP methods; lines with constant pred. order for the block-based methods.

From these results it can be seen that the performance of the methods depends highly on the input signals. It is not possible to say in general which method gives a better prediction gain for a fixed DSP workload. The GAL and Burg methods perform better for the piano sequence, while the Burg algorithm does not perform satisfactorily at very small block lengths for the triangle sequence (segG_p < 0 dB in some cases). The prediction gain decreases for an increasing prediction order in case of using the Burg algorithm with short blocks. It seems that the GAL performs, for a given prediction order, like the Burg algorithm with a long block length. For the triangle sequence none of the methods is able to reach the maximum prediction gain with the used parameters. Notice that in these results the complexity of the filter operations is not included, which is much higher for the lattice-based Burg and GAL methods.

5. MUSICAL APPLICATIONS OF LPC

LPC was very actively used in the early years of computer music, technically as well as musically. Some publications regarding LPC-based audio effects and musical applications are those by Dodge [5], Lansky [4], and Moorer [3]. Impressive pieces of music have been achieved at a time when real-time processing was only a dream.

Nowadays some powerful software implementations for musical processing offer LPC as a way to modify, transform, mutate, and hybridize sounds. Csound is a classical one and it has functions to perform LPC filter operations. SoundSculpt follows its earlier version, named SVP, with a cross-synthesis between two sounds using LPC.

First the general structure used to produce LPC-based audio effects is presented and then some special applications are described.

5.1. General Approach

Producing a sound is mainly based on the LPC analysis/synthesis structure of Figure 1. As mentioned at the beginning, if the residual coming out of the analysis stage is used as excitation to the synthesis filter, the original sound is recovered. Now either the excitation or the synthesis filter or both of them are changed to generate a sound different from the input. Figure 4 shows the LPC synthesis with excitation e_s(n) and synthesis filter H_s(z); the subscript s indicates "synthesis". The excitation may be a processed version of the residual or a sound not related to the signal which is used for computing the spectral model. As explained in the following, a gain factor is required for the amplitude control of the synthesized sound.

5.1.1. Estimation of the Gain Factor

Using as excitation the residual computed by the inverse filter operation

e(n) = x(n) - \sum_{i=1}^{p} a_i x(n-i)    (23)

will result exactly in the input signal x(n). Thus, if the excitation is changed, the gain factor is used to scale the amplitude of the used excitation e_s(n) such that the excitation's energy is similar to the energy of the residual e(n). This can be done recursively to have a permanent update of the gain factor. The energies of the residual and of the excitation can be calculated recursively (see Figure 5) by the first-order IIR filters

(24)
(25)
From these energies the gain factor can be computed. But this expression sometimes gives very rapidly changing values. To smooth the gain factor, it can be calculated recursively as well by

(27)

Experiments have shown that a value of … gives satisfactory results. For greater values of the smoothing parameter more weight is given to the energy of past samples, which results in slower varying estimates. The parameter is comparable to a block length if a block of … samples is used to estimate the signal energy. Since the residual is assumed to be spectrally flat, the gain factor has to be chosen …

[Figure 6 plots: x(n) and the energies in dB over t/s.]

Figure 6: Energies and gain factor for cross-synthesis.

[Figure plots: power spectra in dB of the original signal and of the LPC filter, over f/kHz (two panels).]
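The recursive energy tracking and gain computation of Section 5.1.1 could be realized along the following lines (a sketch under our own assumptions: a first-order recursion with a forgetting factor lam for both energies and the gain taken as the square root of the energy ratio; the function name and parameters are ours, and the paper's exact formulas in Eqs. (24)-(27) are not reproduced here).

import numpy as np

def lpc_resynth_gain(e, e_s, lam=0.999, eps=1e-12):
    """Track the energies of the residual e(n) and of the new excitation
    e_s(n) with first-order IIR recursions and derive a gain factor that
    matches the excitation's level to the residual's level."""
    E_e = E_s = 0.0
    g = np.zeros(len(e))
    for n in range(len(e)):
        E_e = lam * E_e + (1.0 - lam) * e[n] ** 2       # residual energy estimate
        E_s = lam * E_s + (1.0 - lam) * e_s[n] ** 2     # excitation energy estimate
        g[n] = np.sqrt(E_e / (E_s + eps))               # scale e_s(n) to the residual level
    return g

# Example: the excitation is white noise 6 dB below the residual level,
# so the gain settles near a factor of 2.
rng = np.random.default_rng(3)
e = rng.standard_normal(48000)
e_s = 0.5 * rng.standard_normal(48000)
g = lpc_resynth_gain(e, e_s)
print(g[-1])    # approximately 2 after the estimates have settled

Larger values of lam correspond to longer effective averaging, which matches the remark above that the smoothing parameter is comparable to a block length.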
5.2.4. Pitch Shifting

To modify the pitch of the resulting signal, for voiced parts the pitch of the modelled excitation can be changed. This effect can also be combined with the frequency warping to independently change the formants induced by the synthesis filter. It is also possible to apply chorus, flanging, or any other effect to the ideal excitation before it is fed into the synthesis filter.

5.3. Cross-Synthesis between Two Sounds

A very classical algorithm is to use two different sounds and take the residual of one as the excitation to the LPC filter of the other. The main structure is shown in Figure 8. This effect is very reminiscent of the "vocoder effect" that comes from professional vocoders (channel vocoders) or the phase vocoder. The synthesis filter is computed from the second sound by an LPC analysis with a high filter order, so that the harmonic structure of this sound is represented. To compute the excitation, an LPC inverse filtering with a low prediction order is performed to whiten the first sound. As reported by Moorer [3], an order of 4 to 6 works well for this whitening process. The cross-synthesis gives good results if a speech signal is used to compute the synthesis filter, which results for example in the "talking guitar".

Figure 8: Cross-synthesis structure.

For musically satisfactory results the two used sounds have to be synchronized. Thus, for the case of mixing speech and music, the played instrument must fit the rhythm of the syllables of the speech. The performance is improved if either speech or music comes from a prerecorded source and the other sound is produced to match the recording [3].

5.3.1. Using Spectrally Flat Signals as Excitation

If a sound is spectrally almost flat, this sound can be used directly as excitation in Figure 8, i.e. without the whitening stage. For example the sound of the ocean (or another noise-like unpitched signal) can be used as excitation. This effect is very different from the preceding ones because now it is the pitch of the sound represented by the synthesis filter which is given to the resulting sound, provided that the prediction order for the spectral model of this sound is high enough to capture its harmonic structure. This effect can also work for some music signals like the sound of a distorted guitar, but the pitch structure is then a mixture of the two pitches. In this case the resulting sound will have some loss of high frequencies.

6. CONCLUSIONS

We presented algorithms for the sample-based calculation of linear prediction (LP) coefficients. Apart from the well-known gradient adaptive lattice (GAL), we presented modifications of the commonly used block-based approaches, namely the autocorrelation method and the Burg algorithm. The performance of the considered methods depends highly on the input signal. The lattice-based methods GAL and Burg algorithm have the disadvantage that in the lattice filter structure an update of the filter states is required for every coefficient change. An improvement of the lattice states update is under further investigation. For the GAL a modification may be developed which reduces the very high complexity.

Apart from possible coding applications (e.g. ADPCM with low delay), the sample-by-sample update of the spectral model is beneficial in LP-based audio effects. Some audio effects have been described, including a suggestion for an efficient way to calculate the permanently changing gain factor required in the realization of these effects.

7. REFERENCES

[1] John Makhoul, "Linear Prediction: A Tutorial Review," Proceedings of the IEEE, vol. 63, no. 4, pp. 561-580, Apr. 1975.
[2] James A. Moorer, "Audio in the New Millennium," Journal of the AES, vol. 48, no. 5, pp. 490-498, May 2000.
[3] James A. Moorer, "The Use of Linear Prediction of Speech in Computer Music Applications," Journal of the AES, vol. 27, no. 3, pp. 134-140, Mar. 1979.
[4] P. Lansky, "Compositional Applications of Linear Predictive Coding," in Current Directions in Computer Music Research, Max V. Mathews and John Pierce, Eds., pp. 5-8. MIT Press, Cambridge, Mass., 1989.
[5] C. Dodge, "On Speech Songs," in Current Directions in Computer Music Research, Max V. Mathews and John Pierce, Eds., pp. 9-17. MIT Press, Cambridge, Mass., 1989.
[6] John Makhoul, "Stable and Efficient Lattice Methods for Linear Prediction," IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-25, no. 5, pp. 423-428, Oct. 1977.
[7] Sophocles J. Orfanidis, Optimum Signal Processing: An Introduction, McGraw-Hill, Singapore, second edition, 1990.
[8] Peter M. Clarkson, Optimum and Adaptive Signal Processing, CRC Press, Boca Raton, Florida, 1993.
[9] D. O'Shaughnessy, Speech Communication: Human and Machine, Addison-Wesley, New York, second edition, 2000.
[10] European Broadcasting Union (EBU) Technical Centre, Brussels, Sound Quality Assessment Material: Recordings for Subjective Tests. CD and User's Handbook for the EBU-SQAM Compact Disc, Tech. 3253-E, Apr. 1988.
[11] N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice-Hall, Englewood Cliffs, New Jersey, 1984.
[12] P. Lansky and K. Steiglitz, "Synthesis of Timbral Families by Warped Linear Prediction," in Proc. of the IEEE ICASSP, Atlanta, Georgia, 1981.