Efficient Algorithms For MPEG-4 AAC-ELD, AAC-LD and AAC-LC Filterbanks
Ravi K. Chivukula, Yuriy A. Reznik, Venkat Devarajan
The University of Texas at Arlington, Email: {ravikiran.chivukula,venkat}@uta.edu
Stanford University, Email: yreznik@stanford.edu

Abstract

Recently, MPEG has developed a new audio coding standard, MPEG-4 AAC Enhanced Low Delay (ELD), targeting low bit rate, full-duplex communication applications such as audio and video conferencing. AAC-ELD combines a low-delay SBR filterbank with a low-delay core coder filterbank to achieve both high coding efficiency and low algorithmic delay. In this paper, we propose an efficient mapping of the AAC-ELD core coder filterbank to the well-known MDCT. This provides a fast algorithm for the new filterbank. Since the AAC-LD and AAC-LC profiles also use an MDCT filterbank, this mapping enables an efficient joint implementation of the filterbanks for all 3 profiles. We also present a very efficient 15-point DCT-II algorithm that is useful for implementing all 3 profiles with frame lengths of 960 and 480. This algorithm requires just 17 multiplications and 67 additions. The overall design structure and a complexity analysis of the proposed implementation of the filterbanks are also provided.

1. Introduction

Traditionally, speech and audio coding paradigms have been significantly different. Speech coding is primarily based on source modeling [1], and can achieve the low round-trip algorithmic delay needed for full-duplex communication [2]. However, most speech codecs are only efficient in encoding single-speaker material and are unsuitable for generic audio content [6]. On the other hand, audio coding is based on modeling the psychoacoustics of the human auditory system [3]. These codecs are intended for perceptually transparent reproduction of generic music material. Their delay is generally high due to long frame lengths and the use of orthogonal filterbanks such as the Modified Discrete Cosine Transform (MDCT), whose delay depends on the length of the window [5, 8]. Hence they are unsuitable for full-duplex communication. MPEG-4 AAC-LC [9] is an example of this type of codec.

The MPEG-4 AAC Low Delay (LD) codec [9] reduces algorithmic delay by halving the frame length from 1024/960 to 512/480, by removing block switching, and by minimizing the use of the bit reservoir in the encoder. AAC-LD could reduce the delay down to 20 ms, but it still required bit rates close to 64 kbps per channel to deliver satisfactory audio quality [6].

Recently, MPEG standardized the Enhanced Low Delay AAC (AAC-ELD) codec [6, 7, 19]. This codec addresses the drawbacks of AAC-LD by incorporating a low-delay spectral band replication (LD-SBR) tool and a new low-delay core coder filterbank. The LD-SBR tool improves coding efficiency and also has minimal delay [6, 19, 20]. The delay of the new core coder filterbank is independent of the window length [6, 8]; hence, a window with multiple overlap (for good frequency selectivity) can be used. Parts of the window that access future input values are zeroed out, thus reducing the delay further. AAC-ELD achieves an algorithmic delay of only 31 ms with good audio quality at low bit rates of 32 kbps per channel [19].

In this paper, we map the AAC-ELD core coder filterbanks to the well-known MDCT. The mapping involves only permutations, sign changes and additions. As many fast algorithms exist for the MDCT, this mapping essentially provides a fast algorithm to implement the new filterbanks. Since the LC and LD profiles use MDCT filterbanks, the mapping also provides a common framework for the joint implementation of filterbanks in all 3 profiles. We also present a very efficient algorithm for the 15-point DCT-II, useful for frame lengths of 960 and 480. A complexity analysis of the AAC-ELD core coder filterbanks is provided at the end.
978-1-4244-1724-7/08/$25.00 ©2008 IEEE, ICALIP 2008
2. Definitions
The MPEG-4 AAC ELD core coder analysis and synthesis filterbanks are defined as follows [7]:
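$$X(k) = -2\sum_{n=-N}^{N-1} z(n)\cos\left[\frac{2\pi}{N}\,(n+n_0)\left(k+\frac{1}{2}\right)\right], \quad 0 \le k < \frac{N}{2}$$

$$x(n) = -\frac{2}{N}\sum_{k=0}^{N/2-1} X(k)\cos\left[\frac{2\pi}{N}\,(n+n_0)\left(k+\frac{1}{2}\right)\right], \quad 0 \le n < 2N$$

where $n_0 = (-N/4 + 1/2)$, z(n) denotes windowed input data samples, X(k) denotes subband coefficients, and x(n) denotes reconstructed samples (prior to aliasing cancellation). N is 1024 or 960.

The MDCT and IMDCT are defined as [5, 9]:

$$X(k) = 2\sum_{n=0}^{N-1} z(n)\cos\left[\frac{2\pi}{N}\,(n+p_0)\left(k+\frac{1}{2}\right)\right], \quad k = 0,\ldots,N/2-1$$

$$x(n) = \frac{2}{N}\sum_{k=0}^{N/2-1} X(k)\cos\left[\frac{2\pi}{N}\,(n+p_0)\left(k+\frac{1}{2}\right)\right], \quad n = 0,\ldots,N-1$$

where $p_0 = (N/2 + 1)/2$, z(n) denotes windowed input data samples, X(k) denotes MDCT spectrum coefficients, x(n) denotes reconstructed samples (prior to aliasing cancellation) and N is the length of the input sequence. Hereafter, for brevity, we will use the terms DCT and IDCT to refer to the DCT-II and IDCT-II transforms respectively, without the normalization factors [4].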
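3. Mapping the analysis filterbank to MDCT

Splitting the sum of the analysis filterbank into the ranges $-N \le n < 0$ and $0 \le n < N$, and shifting the index of the first range by N, we obtain

$$X(k) = -2\sum_{n=0}^{N-1} z(n)\cos\left[\frac{2\pi}{N}(n+n_0)\left(k+\frac{1}{2}\right)\right] - 2\sum_{n=0}^{N-1} z(n-N)\cos\left[\frac{2\pi}{N}(n-N+n_0)\left(k+\frac{1}{2}\right)\right]$$

$$\phantom{X(k)} = -2\sum_{n=0}^{N-1}\left[z(n)-z(n-N)\right]\cos\left[\frac{2\pi}{N}(n+n_0)\left(k+\frac{1}{2}\right)\right]$$

Since $n_0 = p_0 - N/2$, this becomes

$$X(k) = -2\,(-1)^{k}\sum_{n=0}^{N-1}\left[z(n)-z(n-N)\right]\sin\left[\frac{2\pi}{N}(n+p_0)\left(k+\frac{1}{2}\right)\right]$$

Replacing k by N/2-1-k gives

$$X(N/2-1-k) = (-1)^{N/2-1-k}\; 2\sum_{n=0}^{N-1}(-1)^{n+N/4+1}\left[z(n)-z(n-N)\right]\cos\left[\frac{2\pi}{N}(n+p_0)\left(k+\frac{1}{2}\right)\right]$$

We note that the summation on the RHS is an MDCT. Thus, the algorithm for the analysis filterbank is:
1. Form the sequence {z(n) - z(n-N)} for 0 ≤ n < N.
2. Invert the signs of the even-indexed samples if N/4 is even, or invert the signs of the odd-indexed samples if N/4 is odd.
3. Apply the MDCT.
4. Reverse the order of the output.
5. Invert the signs of the odd-indexed samples if N/2 is even, or invert the signs of the even-indexed samples if N/2 is odd.
The flow graph for the analysis filterbank is shown in Fig. 1, assuming N/4 is even.

4. Mapping the synthesis filterbank to IMDCT

For the synthesis filterbank we first note that

$$x(n+N) = -\frac{2}{N}\sum_{k=0}^{N/2-1} X(k)\cos\left[\frac{2\pi}{N}(n+N+n_0)\left(k+\frac{1}{2}\right)\right] = \frac{2}{N}\sum_{k=0}^{N/2-1} X(k)\cos\left[\frac{2\pi}{N}(n+n_0)\left(k+\frac{1}{2}\right)\right] = -x(n)$$

so only the first N output samples need to be computed. For $0 \le n < N$,

$$x(n) = -\frac{2}{N}\sum_{k=0}^{N/2-1} X(k)\cos\left[\frac{2\pi}{N}(n+n_0)\left(k+\frac{1}{2}\right)\right] = -\frac{2}{N}\sum_{k=0}^{N/2-1} (-1)^{k}X(k)\sin\left[\frac{2\pi}{N}(n+p_0)\left(k+\frac{1}{2}\right)\right]$$

$$\phantom{x(n)} = (-1)^{n+N/4+1}\;\frac{2}{N}\sum_{k=0}^{N/2-1} (-1)^{N/2-1-k}\,X(N/2-1-k)\cos\left[\frac{2\pi}{N}(n+p_0)\left(k+\frac{1}{2}\right)\right]$$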
We note that the summation on the RHS is an IMDCT. Thus, the algorithm for the synthesis filterbank is:
1. Invert the signs of the odd-indexed spectral coefficients, X(k), if N/2 is even, or invert the signs of the even-indexed coefficients if N/2 is odd.
2. Reverse the order of the above sequence.
3. Apply the IMDCT.
4. Invert the signs of the even-indexed output samples if N/4 is even, or invert the signs of the odd-indexed samples if N/4 is odd; these form the first N output points of the filterbank.
5. The remaining N output samples are obtained by inverting the signs of the first N samples.
The flow graph for the synthesis filterbank is shown in Fig. 2, assuming N/4 is even.
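The two five-step procedures can be checked numerically against the filterbank definitions of Section 2. The following is a minimal sketch, assuming NumPy is available and N is a multiple of 4; all function names are illustrative and not taken from the standard, and the MDCT/IMDCT are evaluated directly from their definitions (an O(N^2) stand-in for a fast algorithm).

```python
import numpy as np

def _angles(N, offset, n, k):
    # Matrix of arguments (2*pi/N) * (n + offset) * (k + 1/2), shape (len(n), len(k))
    return 2.0 * np.pi / N * np.outer(n + offset, k + 0.5)

def mdct(y, N):
    # Definition from Section 2 with p0 = (N/2 + 1)/2
    return 2.0 * (y @ np.cos(_angles(N, N / 4 + 0.5, np.arange(N), np.arange(N // 2))))

def imdct(X, N):
    return 2.0 / N * (np.cos(_angles(N, N / 4 + 0.5, np.arange(N), np.arange(N // 2))) @ X)

def eld_analysis_direct(z, N):
    # z[i] holds z(i - N), i.e. the samples z(-N), ..., z(N-1); n0 = -N/4 + 1/2
    return -2.0 * (z @ np.cos(_angles(N, -N / 4 + 0.5, np.arange(-N, N), np.arange(N // 2))))

def eld_synthesis_direct(X, N):
    return -2.0 / N * (np.cos(_angles(N, -N / 4 + 0.5, np.arange(2 * N), np.arange(N // 2))) @ X)

def eld_analysis_via_mdct(z, N):
    n, k = np.arange(N), np.arange(N // 2)
    y = z[N:] - z[:N]                      # step 1: z(n) - z(n - N)
    y = y * (-1.0) ** (n + N // 4 + 1)     # step 2: even-indexed flip if N/4 even, else odd-indexed
    Y = mdct(y, N)[::-1]                   # steps 3 and 4: MDCT, then reverse
    return Y * (-1.0) ** k                 # step 5: flip odd-indexed (N/2 is even when 4 divides N)

def eld_synthesis_via_imdct(X, N):
    n, k = np.arange(N), np.arange(N // 2)
    Xp = (X * (-1.0) ** k)[::-1]           # steps 1 and 2: flip odd-indexed, then reverse
    x = imdct(Xp, N)                       # step 3
    x = x * (-1.0) ** (n + N // 4 + 1)     # step 4
    return np.concatenate([x, -x])         # step 5: remaining N samples are the negated first N

rng = np.random.default_rng(0)
for N in (12, 16, 960):
    z = rng.standard_normal(2 * N)
    X = rng.standard_normal(N // 2)
    ok_a = np.allclose(eld_analysis_direct(z, N), eld_analysis_via_mdct(z, N))
    ok_s = np.allclose(eld_synthesis_direct(X, N), eld_synthesis_via_imdct(X, N))
    print(N, ok_a, ok_s)
```

The test lengths 12 and 16 exercise both parities of N/4; replacing `mdct` and `imdct` with a fast implementation leaves the surrounding permutations and sign changes unchanged.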
5. Implementation of MDCT

A number of MDCT/IMDCT algorithms have been proposed in the literature; see, e.g., [5, 10] and references therein. An efficient algorithm for MDCT/IMDCT of even lengths, suitable for our purposes, has recently been described by Cheng and Hsu [11]. This algorithm maps the MDCT/IMDCT to a DCT-IV, and the DCT-IV in turn can be mapped to a DCT/IDCT with pre/post additions and multiplications [12]. These pre/post multiplications in the DCT-IV can be merged with the windowing stage, thus reducing the number of multiplications and the storage requirement. The remaining even-length DCT can be optimally implemented by the decimation process described in [12]. This overall scheme leads to a very efficient MDCT implementation; see [13] for additional details.
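The first step, reducing an N-point MDCT to an N/2-point DCT-IV, can be illustrated with the standard time-domain folding identity. This is a sketch under stated assumptions: it uses the textbook folding rather than the matrix formulation of [11], the function names are illustrative, N must be a multiple of 4, and the DCT-IV is evaluated directly where a real implementation would use a fast algorithm.

```python
import numpy as np

def mdct_direct(z):
    # MDCT of Section 2: X(k) = 2 * sum_n z(n) cos[(2*pi/N)(n + p0)(k + 1/2)], p0 = N/4 + 1/2
    N = len(z)
    n, k = np.arange(N), np.arange(N // 2)
    return 2.0 * (z @ np.cos(2.0 * np.pi / N * np.outer(n + N / 4 + 0.5, k + 0.5)))

def dct4(u):
    # Unnormalized DCT-IV: U(k) = sum_n u(n) cos[(pi/M)(n + 1/2)(k + 1/2)]
    M = len(u)
    idx = np.arange(M) + 0.5
    return u @ np.cos(np.pi / M * np.outer(idx, idx))

def mdct_via_dct4(z):
    # Split the input into quarters (a, b, c, d) and fold it to half length:
    # MDCT(a, b, c, d) = 2 * DCT-IV(-c_r - d, a - b_r), where _r denotes reversal.
    q = len(z) // 4
    a, b, c, d = z[:q], z[q:2 * q], z[2 * q:3 * q], z[3 * q:]
    u = np.concatenate([-c[::-1] - d, a - b[::-1]])
    return 2.0 * dct4(u)

z = np.random.default_rng(0).standard_normal(960)
print(np.allclose(mdct_direct(z), mdct_via_dct4(z)))
```

The folding costs only N/2 additions, so the cost of the MDCT is dominated by the half-length DCT-IV.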
6. 15-point DCT

The DCT of an N-point sequence x(n) is

$$X(k) = \sum_{n=0}^{N-1} x(n)\cos\frac{(2n+1)k\pi}{2N}; \quad k = 0,\ldots,N-1$$

Heideman [14] showed that if N is odd, the DCT can be mapped to an equal-length real-input DFT with just input and output permutations and sign changes at the output. Thus, the computational complexity of an odd-length DCT is equal to that of an odd-length real DFT. Hence, efficient algorithms for the 15-point real DFT can be used to implement a 15-point DCT.

A 15-point DFT can be efficiently implemented using the Winograd Fourier Transform Algorithm (WFTA) [15, 16]. The WFTA for the 15-point DFT uses Winograd 3-point and 5-point DFT modules in a prime factor mapping. Because of the structure of the 3-point and 5-point modules, it is possible to nest together the multiplications in the individual modules, thus reducing the total number of multiplications. See [15-17] for details. The 15-point real WFTA, and hence the 15-point DCT, can be implemented with 17 multiplications and 67 additions [17, 18].

The flow graph for the 15-point DCT is shown in Fig. 3. The IDCT can be obtained by transposing this flow graph, i.e., the data flows from right to left, summations become tap-off points and tap-off points become summations. The constants used in the figure are defined below:

$$u = \frac{2\pi}{5}; \quad v = \frac{2\pi}{3}$$

$$c_1 = \frac{\cos u + \cos 2u}{2} - 1; \quad c_2 = \frac{\cos u - \cos 2u}{2}$$

$$c_3 = \sin u + \sin 2u; \quad c_4 = \sin 2u; \quad c_5 = \sin u - \sin 2u$$

$$c_6 = \cos v - 1; \quad c_7 = c_1 c_6; \quad c_8 = c_2 c_6; \quad c_9 = c_3 c_6; \quad c_{10} = c_4 c_6; \quad c_{11} = c_5 c_6$$

$$c_{12} = \sin v; \quad c_{13} = c_1 c_{12}; \quad c_{14} = c_2 c_{12}; \quad c_{15} = c_3 c_{12}; \quad c_{16} = c_4 c_{12}; \quad c_{17} = c_5 c_{12}$$
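To make the nesting concrete, the 17 multiplier constants can be tabulated as below. This is only a sketch: the function name is illustrative, and the angles are taken as u = 2π/5 and v = 2π/3 exactly as they appear in the text, so the signs of the sine-based constants are an assumption that should be checked against the flow graph of Fig. 3.

```python
import math

def wfta15_constants(u=2.0 * math.pi / 5.0, v=2.0 * math.pi / 3.0):
    """Return [c1, ..., c17] for the nested 3x5 Winograd structure (angles assumed)."""
    # Five multipliers of the 5-point Winograd module
    five = [
        (math.cos(u) + math.cos(2 * u)) / 2.0 - 1.0,   # c1
        (math.cos(u) - math.cos(2 * u)) / 2.0,         # c2
        math.sin(u) + math.sin(2 * u),                 # c3
        math.sin(2 * u),                               # c4
        math.sin(u) - math.sin(2 * u),                 # c5
    ]
    # Two multipliers of the 3-point Winograd module
    c6 = math.cos(v) - 1.0
    c12 = math.sin(v)
    # Nesting: each 3-point multiplier is also combined with every 5-point multiplier
    return five + [c6] + [c * c6 for c in five] + [c12] + [c * c12 for c in five]

for i, c in enumerate(wfta15_constants(), start=1):
    print(f"c{i} = {c:+.10f}")
```

The list has 5 + 1 + 5 + 1 + 5 = 17 entries, one per multiplication in the count quoted above.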
7. Complexity analysis
In this section, we discuss the computational complexity of the AAC-ELD filterbanks. We assume that the MDCT algorithm discussed in section 5 is used for these filterbanks. Since N is either 1024 or 960, we give the analysis assuming N is of the form $2^m$ or $15 \cdot 2^m$ ($m \ge 3$). Let $RM_A(N)$ and $RA_A(N)$ denote, respectively, the number of real multiplications and additions required for the analysis filterbank and the preceding windowing operation. Let $RM_S(N)$ and $RA_S(N)$ denote the corresponding numbers for the synthesis filterbank and the succeeding windowing and overlap-add operation. N/8 samples of the window are actually zeros; hence, multiplications and additions involving these coefficients need not be counted. Then, for $N = 2^m$,

$$RM_A(N) = RM_S(N) = \frac{mN}{4} + \frac{13N}{8}$$

$$RA_A(N) = RA_S(N) = \frac{3mN}{4} + \frac{5N}{8}$$
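As a quick arithmetic check, the power-of-two expressions can be evaluated for the 1024-point case. The helper below is a minimal sketch; its name and interface are illustrative.

```python
def eld_ops_pow2(N):
    """Return (multiplications, additions) for N = 2**m with m >= 3."""
    m = N.bit_length() - 1
    if N < 8 or N != 1 << m:
        raise ValueError("N must be a power of two with m >= 3")
    mults = m * N // 4 + 13 * N // 8     # mN/4 + 13N/8
    adds = 3 * m * N // 4 + 5 * N // 8   # 3mN/4 + 5N/8
    return mults, adds

print(eld_ops_pow2(1024))
```

For N = 1024 (m = 10) the terms are 2560 + 1664 = 4224 multiplications and 7680 + 640 = 8320 additions.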
For $N = 15 \cdot 2^m$,

$$RM_A(N) = RM_S(N) = 2N + 2^{m-1}\,RM_D(15) + \frac{(2m-3)N}{8}$$

$$RA_A(N) = RA_S(N) = \left(59 + RA_D(15)\right) 2^{m-1} + \frac{(6m-7)N}{8}$$

where $RM_D(15)$ and $RA_D(15)$ are the number of multiplications and additions for the 15-point DCT. From section 6 we have $RM_D(15) = 17$ and $RA_D(15) = 67$. Thus, for N = 1024 we have 4224 multiplications and 8320 additions; for N = 960 we have 3544 multiplications and 7512 additions.

8. Summary

In this paper, we presented a mapping of the MPEG-4 AAC-ELD filterbanks to MDCT/IMDCT. This simple mapping involves just permutations, sign changes and additions, and it gives a fast algorithm for the filterbanks. It also provides a framework for the joint implementation of filterbanks in the AAC LC, LD and ELD profiles. We also presented a very efficient 15-point DCT algorithm that takes 17 multiplications and 67 additions. A complexity analysis of the AAC-ELD filterbanks for the possible block lengths was provided.

9. References

[1] A.M. Kondoz, Digital Speech: Coding for Low Bit Rate Communication Systems, 2nd Ed., Wiley, 2004.
[2] A. Spanias, Speech Coding: A Tutorial Review, Proc. IEEE, vol. 82, pp. 1541-1582, Oct. 1994.
[3] T. Painter and A. Spanias, Perceptual Coding of Digital Audio, Proc. IEEE, vol. 88, pp. 451-515, Apr. 2000.
[4] K.R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications, New York, Academic Press, 1990.
[5] H. Malvar, Signal Processing with Lapped Transforms, Artech House, Boston, 1992.
[6] M. Schnell et al., Enhanced MPEG-4 Low Delay AAC - Low Bitrate High Quality Communication, 122nd Convention of AES, Vienna, Austria, May 2007.
[7] ISO/IEC 14496-3:2005/FPDAM9, Enhanced Low Delay AAC, Apr. 2007.
[8] G.D.T. Schuller and T. Karp, Modulated Filterbanks with Arbitrary System Delay: Efficient Implementations and the Time Varying Case, IEEE Trans. Signal Processing, vol. 48, no. 3, pp. 737-748, March 2000.
[9] ISO/IEC 14496-3: Subpart 4: General Audio Coding (GA) - AAC, TwinVQ, BSAC.
[10] P. Duhamel, Y. Mahieux and J.P. Petit, A Fast Algorithm for the Implementation of Filter Banks Based on 'Time Domain Aliasing Cancellation', in Proc. ICASSP, vol. 3, pp. 2209-2212, 14-17 Apr. 1991.
[11] M.-H. Cheng and Y.-H. Hsu, Fast IMDCT and MDCT Algorithms - A Matrix Approach, IEEE Trans. Signal Processing, vol. 51, no. 1, pp. 221-229, Jan. 2003.
[12] C.W. Kok, Fast Algorithm for Computing Discrete Cosine Transform, IEEE Trans. Signal Processing, vol. 45, no. 3, pp. 757-760, Mar. 1997.
[13] R.K. Chivukula and Y.A. Reznik, Efficient Implementation of a Class of MDCT/IMDCT Filterbanks for Speech and Audio Coding Applications, accepted for ICASSP 2008.
[14] M.T. Heideman, Computation of an Odd-Length DCT from a Real-Valued DFT of the Same Length, IEEE Trans. Signal Processing, vol. 40, no. 1, pp. 54-61, Jan. 1992.
[15] S. Winograd, On Computing the Discrete Fourier Transform, Mathematics of Computation, vol. 32, no. 141, pp. 175-199, Jan. 1978.
[16] H.F. Silverman, An Introduction to Programming the Winograd Fourier Transform Algorithm (WFTA), IEEE Trans. ASSP, vol. 25, no. 2, pp. 152-165, April 1977.
[17] C.S. Burrus and T.W. Parks, DFT/FFT and Convolution Algorithms: Theory and Implementation, Wiley, NY, 1985.
[18] H.V. Sorensen et al., Real-Valued Fast Fourier Transform Algorithms, IEEE Trans. ASSP, vol. 35, no. 6, pp. 849-863, June 1987.
[19] M. Schnell et al., Low Delay Filterbanks for Enhanced Low Delay Audio Coding, IEEE Workshop on Appl. Signal Proc. to Audio and Acoustics, pp. 235-238, Oct. 2007.
[20] M. Dietz et al., Spectral Band Replication, a Novel Approach in Audio Coding, 112th AES Convention, Munich, Germany, Apr. 2002.