
CHiME 2011 Workshop on Machine Listening in Multisource Environments September 1, 2011

Multi-stage Collaborative Microphone Array Beamforming in Presence of Nonstationary Interfering Signals
Danilo Comminiello (1), Michele Scarpiniti (1), Raffaele Parisi (1), Albenzio Cirillo (2), Mauro Falcone (2) and Aurelio Uncini (1)
(1) Department of Information Engineering, Electronics and Telecommunications, “Sapienza” University of Rome, Via Eudossiana 18, Rome, Italy
(2) Fondazione Ugo Bordoni, Viale del Policlinico 147, Rome, Italy
danilo.comminiello@uniroma1.it

Abstract

This paper describes a novel adaptive beamforming technique for speech enhancement applications, designed to be robust to nonstationary interfering sources in noisy and reverberant environments. The proposed beamforming architecture aims at extracting the desired source signal and suppressing interfering signals in a multisource environment with unknown a priori conditions. This is achieved by means of a multi-stage collaborative generalized sidelobe canceller. The hallmark of this architecture is a two-level convex combination of two multiple-input single-output (MISO) adaptive systems, which improves the beamformer's capability to track undesired sources and thus achieves a stronger suppression of interfering signals. The effectiveness of the proposed architecture is demonstrated by enhancing the speech quality of the desired source in a hands-free teleconferencing application.

Index Terms: Nonstationary Adaptive Beamforming, Speech Enhancement, Combination of Adaptive Filters

1. Introduction

Machine listening aims at extracting, from audio signals, useful information for computational or human purposes, such as analysis or synthesis of audio signals. In hands-free speech communications, the audio signals of interest are speech signals, and spatial audio perception is the desired information to retrieve, because it allows a listener to distinguish a certain voice in a noisy environment, simulating binaural human hearing. In multisource environments, the presence of interfering signals and reverberation may cause the loss of spatial information, thus degrading speech intelligibility. To tackle this problem, speech enhancement systems are widely employed in distant-talking applications. Microphone array beamforming represents a class of such speech enhancement techniques. Beamforming systems exploit the properties of microphone interfaces which facilitate binaural hearing. However, in order to achieve a good recovery of binaural perception, beamforming techniques need to control some aspects of the multisource communication: the spatial realism of sound rendering, the high quality of the acquired speech signals, and the nonstationarity of the sources, which can talk without tethered microphones while moving in the environment [1].

Among beamforming techniques, the generalized sidelobe canceller (GSC) [2] is highly effective in acquiring a desired source and adaptively reducing interfering signals. The effectiveness of an adaptive beamformer depends on the choice of the adaptive algorithm. The most popular adaptive algorithms in the time domain are based on the gradient rule, such as the least mean squares (LMS) algorithm. The advantage of this family of algorithms is its low computational cost. However, LMS shows poor convergence performance when the filter length is quite long [3], which is the rule in acoustic applications. A faster convergence rate can be obtained using Hessian-based adaptive filtering; a typical algorithm is the recursive least squares (RLS). However, RLS entails a high computational complexity; therefore, adaptation can become prohibitively expensive, compromising real-time implementations. A good compromise can be obtained by using the family of affine projection algorithms (APA), which is widely used in adaptive beamforming [4], [5], showing better convergence rates and manageable computational complexity. The proposed beamforming technique uses a fast affine projection (FAP) algorithm [6] to adapt the filter coefficients.

Moreover, in order to make the beamformer robust to nonstationary sources, we propose a collaborative adaptive structure in which, for each channel, we perform the convex combination of two adaptive filters of different families in order to obtain an algorithm with superior tracking capability [7]. Furthermore, in order to use the best parameter setting for each filter, we introduce a preliminary convex combination between filters of the same family, thus achieving optimum conditions for each combination [8]. The proposed approach is realized in a multi-stage collaborative architecture, since the filtering is carried out in several steps.

This paper is organized as follows: the microphone array beamforming technique is described in Section 2; the proposed multi-stage collaborative filtering is detailed in Section 3; and in Section 4 the effectiveness of the proposed beamforming system is assessed. Finally, in Section 5 our conclusions are drawn.

2. Microphone Array Beamforming Technique

The beamforming architecture used in this paper, depicted in Fig. 1, is a typical GSC configuration composed of a microphone array interface, a fixed delay-and-sum beamformer (DSB), and an adaptive noise cancelling (ANC) path.

Considering a microphone array interface composed of $M$ sensors, the $m$-th microphone signal, with $m = 0, \ldots, M-1$, is a delayed replica of the target signal $s[n]$ convolved with the acoustic impulse response (AIR) $a_m$ between the $m$-th microphone and the desired source, with the addition of background
noise $v_m[n]$. The DSB spatially aligns the microphone signals with reference to the desired source direction, yielding the speech reference signal $d[n]$:

$$d[n] = \sum_{m=0}^{M-1} \left( \sum_{l=0}^{L-1} a_m[l]\, s[n-l] + v_m[n] \right) \tag{1}$$

where we suppose that each AIR between the desired source and the $m$-th microphone has the same length, denoted by $L$.

In the adaptive path of the beamformer, the blocking matrix (BM) generates the noise references $x_k[n]$, with $k = 0, \ldots, K-1$ and $K = M-1$. The blocking matrix is implemented by pairwise differences between microphone signals [9]. The noise reference signals are then processed by the collaborative ANC, whose structure is described in the next section. The task of the collaborative ANC is to remove the residual noise components in the speech reference signal, minimizing the output power and yielding the beamformer output signal $e[n]$.

Figure 1: Microphone array beamforming architecture. (Block diagram: the microphone signals pass through the steering delays $\tau_1, \ldots, \tau_M$ and a sum to give $d[n]$; the blocking matrix produces $x_k[n]$, which feed the collaborative adaptive noise canceller with output $z[n]$; the output is $e[n] = d[n] - z[n]$.)

3. Collaborative Adaptive Noise Canceller

The hallmark of the proposed beamforming technique is the structure of the collaborative ANC. Generally, an ANC is composed of an adaptive filter bank forming a MISO system. The adopted architecture, depicted in Fig. 2, is a multi-stage convex combination of adaptive filters. In particular, the structure is composed of four different MISO systems, each bringing different filtering capabilities to the whole beamformer. Each MISO system receives the same input signals, which are the noise reference signals coming from the BM. The $j$-th MISO system can represent the input signals in an $L \times P^{(j)}$ reference noise matrix $\mathbf{X}^{(j)}_{n,k}$:

$$\mathbf{X}^{(j)}_{n,k} = \begin{bmatrix} \mathbf{x}_{n,k} & \mathbf{x}_{n-1,k} & \cdots & \mathbf{x}_{n-P^{(j)}+1,k} \end{bmatrix} = \begin{bmatrix} x_k[n] & \cdots & x_k[n-P^{(j)}+1] \\ x_k[n-1] & \cdots & x_k[n-P^{(j)}] \\ \vdots & \ddots & \vdots \\ x_k[n-L+1] & \cdots & x_k[n-P^{(j)}-L+2] \end{bmatrix} \tag{2}$$

where $P^{(j)}$ represents the projection order for all the filters of the $j$-th MISO system. We denote by $\mathbf{w}^{(j)}_{n,k} = \left[ w^{(j)}_k[n], w^{(j)}_k[n-1], \ldots, w^{(j)}_k[n-L+1] \right]^T$ the $L \times 1$ coefficient vector of the $k$-th filter belonging to the $j$-th MISO system, with $j = 1, \ldots, 4$, at the $n$-th time instant. Each filter of the ANC is adapted according to the fast affine projection (FAP) algorithm [6], which is summarized in Table 1, omitting the MISO system index for readability.

FAP Algorithm
1. Initialization: $\mathbf{R}_{0,k} = \delta \mathbf{I}_P$, $\hat{\boldsymbol{\varepsilon}}_{0,k} = \mathbf{0}$, $\hat{\mathbf{E}}_{0,k} = \mathbf{0}$
2. $\mathbf{R}_{n,k} = \mathbf{R}_{n-1,k} + \mathbf{X}^{\lceil P \rceil T}_{n,k} \mathbf{X}^{\lceil P \rceil}_{n,k} - \mathbf{X}^{\lfloor P \rfloor T}_{n-1,k} \mathbf{X}^{\lfloor P \rfloor}_{n-1,k}$
3. $\mathbf{r}_{n,k} = \left[ R_{n,k}[n-1], \ldots, R_{n,k}[n-P+1] \right]$
4. $y_k[n] = \mathbf{w}^T_{n,k} \mathbf{x}_{n,k} + \mathbf{r}_{n,k} \hat{\boldsymbol{\varepsilon}}_{n,k}$
5. $e_k[n] = d[n] - y_k[n]$
6. $\mathbf{E}_{n,k} = \begin{bmatrix} \mu e_k[n] \\ (1-\mu) \hat{\mathbf{E}}_{n-1,k} \end{bmatrix}$
7. $\hat{\mathbf{E}}_{n,k} = \left[ E_k[n], \ldots, E_k[n-P] \right]^T$
8. $\mathbf{g}_{n,k} = \mathbf{R}^{-1}_{n,k} \mathbf{E}_{n,k}$
9. $\boldsymbol{\varepsilon}_{n,k} = \begin{bmatrix} 0 \\ \hat{\boldsymbol{\varepsilon}}_{n-1,k} \end{bmatrix} + \mathbf{g}_{n,k}$
10. $\hat{\boldsymbol{\varepsilon}}_{n,k} = \left[ \hat{\varepsilon}_k[n], \ldots, \hat{\varepsilon}_k[n-P] \right]^T$
11. $\mathbf{w}_{n,k} = \mathbf{w}_{n-1,k} + \varepsilon_k[n-P+1]\, \mathbf{x}_{n-P+1,k}$

Table 1: Summary of FAP algorithm.

It is well known that the combination of filters from different families of algorithms can improve the tracking capabilities of the whole system [7]. In particular, important results can be achieved by combining a family of gradient-based algorithms and a family of Hessian-based algorithms [7]. Taking this into account, a first distinction between the four MISO systems can be made by choosing different values for the projection order. In fact, for $P^{(j)} = 1$ the FAP algorithm turns into the NLMS algorithm, yielding gradient-based properties, while for $P^{(j)} > 1$ the FAP algorithm preserves its Hessian nature. Therefore, we set $P^{(j)} = P_1 = 1$ for $j = 1, 2$, and $P^{(j)} = P_2 > 1$ for $j = 3, 4$. This choice affects the second-stage convex combination, which is a system-by-system combination scheme. On the other hand, the first-stage convex combination involves the MISO systems having the same projection order. In particular, the first stage involves two different convex combinations, one for systems $j = 1, 2$ and another one for systems $j = 3, 4$. In this case we differentiate the systems according to the step size value $\mu^{(j)}$: we choose a small step size $\mu^{(j)} = \mu_1$ for $j = 1, 3$ and a large step size $\mu^{(j)} = \mu_2$ for $j = 2, 4$. In this way we further improve the mean-square performance of the adaptive filtering [8]. The combination scheme performed in the first stage is the filter-by-filter scheme.

Let $i = 1, 2$ be the index of the first-stage convex combinations. As shown in Fig. 2, for the $i$-th combination, the $k$-th filter output of the first MISO system is convexly combined with the corresponding $k$-th filter output of the second MISO system, yielding $K$ outputs, denoted as $z^{(i)}_k[n]$, each related to a noise reference:

$$z^{(i)}_k[n] = \lambda^{(i)}_k[n]\, y^{(j)}_k[n] + \left( 1 - \lambda^{(i)}_k[n] \right) y^{(j+1)}_k[n] \tag{3}$$

where, in this case, the system index is $j = 1$ when $i = 1$,
and $j = 3$ when $i = 2$. In (3), $\lambda^{(i)}_k[n]$ represents the $k$-th mixing parameter of the $i$-th combination of the first stage, and it is updated using a gradient descent rule through the adaptation of an auxiliary parameter, $a^{(i)}_k[n]$, related to $\lambda^{(i)}_k[n]$ by the expression $\lambda^{(i)}_k[n] = \mathrm{sgm}\left( a^{(i)}_k[n] \right)$, according to [8]:

$$a^{(i)}_k[n+1] = a^{(i)}_k[n] + \frac{\mu_a}{q^{(i)}_k[n]}\, e^{(j)}_k[n+1]\, \Delta e^{(i)}_k[n+1]\, \lambda^{(i)}_k[n] \left( 1 - \lambda^{(i)}_k[n] \right) \tag{4}$$

where $\Delta e^{(i)}_k[n+1] = e^{(j+1)}_k[n+1] - e^{(j)}_k[n+1]$; $\mu_a$ is a common step size value for the adaptation of each auxiliary parameter; $q^{(i)}_k[n] = \beta q^{(i)}_k[n-1] + (1-\beta) \left( \Delta e^{(i)}_k[n+1] \right)^2$ is the estimated power of $\Delta e^{(i)}_k[n+1]$; and $\beta$ is a smoothing factor.

In the second stage a system-by-system convex combination is carried out between the two outputs yielded by the first stage. The second-stage output signal, denoted by $z[n]$, represents the overall ANC output:

$$z[n] = \eta[n] \sum_{k=0}^{K-1} z^{(1)}_k[n] + \left( 1 - \eta[n] \right) \sum_{k=0}^{K-1} z^{(2)}_k[n] \tag{5}$$

where $\eta[n]$ is the mixing parameter of the second stage, adapted using an auxiliary parameter, similarly to (4).

Once the second-stage convex combination has been computed, the overall beamformer output signal $e[n]$ follows as:

$$e[n] = d[n] - z[n]. \tag{6}$$

The multi-stage collaborative architecture presented above improves the tracking capabilities of the ANC [7], giving robustness to the overall beamforming system in the presence of nonstationary interfering signals.

Figure 2: Multi-stage collaborative adaptive noise canceller. (Block diagram: the noise references $x_0[n], \ldots, x_{K-1}[n]$ feed two NLMS MISO systems, $j = 1, 2$, and two FAP MISO systems, $j = 3, 4$; the filter outputs $y^{(j)}_k[n]$ are mixed by $\lambda^{(i)}_k[n]$ into $z^{(i)}_k[n]$, which are summed and mixed by $\eta[n]$ into $z[n]$.)

4. Simulation Results

In this section we carry out two different sets of experiments: the first set aims to assess the effectiveness of the multi-stage collaborative filtering adopted in the proposed beamforming architecture; the second set is performed to evaluate the proposed beamforming architecture for speech enhancement in multisource environments. Both sets of experiments take place in a 10 × 6.6 × 3 m room with a reverberation time of $T_{60} = 150$ ms.

4.1. Evaluation of the Multi-stage Collaborative Filtering

In the first set of experiments, in order to prove the effectiveness of the multi-stage collaborative filtering, we analyze a single-channel (i.e. $K = 1$) acoustic echo cancelling application, in which the acoustic environment changes due to a nonstationary source or to an alteration in the environmental conditions. The AIR is simulated by means of Roomsim, which is a Matlab tool [10]; the AIR is sampled at 8 kHz and truncated after $L = 300$ samples. The length of the experiment is $t = 10$ s. Furthermore, independent white Gaussian noise with zero mean and unit variance is added as background noise, in order to provide 20 dB of signal-to-noise ratio (SNR). In order to introduce an abrupt change in the environment, we shift the AIR circularly to the right by 50 samples, 5 s after the start of the adaptive process. We choose the following parameter settings: $\mu_1 = 0.1$, $\mu_2 = 0.9$, $P_1 = 1$, $P_2 = 2$, $\delta = 30 \sigma^2_{x_k}$, where $\sigma^2_{x_k}$ is the power of the input signal. In order to measure the filtering performance we use the normalized misalignment $\mathcal{M}$, expressed in dB, defined as:

$$\mathcal{M} = 20 \log_{10} \left( \frac{\left\| \mathbf{h}_n - \hat{\mathbf{h}}^{(j)}_{n,k} \right\|_2}{\left\| \mathbf{h}_n \right\|_2} \right) \tag{7}$$

where $\mathbf{h}_n$ is the AIR column vector, and $\hat{\mathbf{h}}^{(j)}_{n,k}$ is the estimated filter.

Figure 3 displays the performance results; the multi-stage collaborative filtering exploits the tracking capabilities of all four filters, always following the behaviour of the best performing filter. Furthermore, Fig. 4 shows the behaviour of the three mixing parameters, $\lambda^{(1)}[n]$ and $\lambda^{(2)}[n]$ related to the first-stage combination, and $\eta[n]$ related to the second-stage combination. Fig. 4 makes the collaboration between the four different filters easier to understand.

Figure 3: Normalized misalignment comparison. (Normalized misalignment [dB] versus time [s] for MISO system #1 ($P = 1$, $\mu = 0.1$), MISO system #2 ($P = 1$, $\mu = 0.9$), MISO system #3 ($P = 2$, $\mu = 0.1$), MISO system #4 ($P = 2$, $\mu = 0.9$), and the collaborative filtering.)

Figure 4: Mixing parameter behaviours. (Three panels showing $\lambda^{(1)}[n]$, $\lambda^{(2)}[n]$ and $\eta[n]$, each between 0 and 1, versus time [s].)

4.2. Evaluation of the Collaborative Beamformer

In the second set of experiments we assess the effectiveness of the proposed beamforming architecture in terms of speech enhancement. The scenario is the same as in the previous simulations; in this case the source of interest is a female speaker located 50 cm from the center of the microphone array. Two interfering pink noise sources are located 1.2 m and 2.2 m, respectively, from the center of the microphone interface; the first source is on the right of the speaker and the second is on the left. White Gaussian noise is added to the microphone signals as diffuse background noise. The overall input SNR level, measured for each microphone signal, is about 5 dB. The microphone interface is a common uniform linear array (ULA) composed of 5 omnidirectional sensors equally spaced at a distance of 4 cm. In order to introduce a change in the acoustic environment, 5 s after the start of the experiment we move the two interfering sources 50 cm to the right, keeping their distance from the center of the array unchanged. The enhancement of the speech provided by the beamformer, and the resulting noise reduction, are usually associated with an SNR improvement, defined as [9]:

$$\mathrm{SNR} = 10 \log \left( \frac{E\left\{ u^2_{in}[n] \right\}}{E\left\{ u^2_{in}[n] \right\} - E\left\{ u^2_{out}[n] \right\}} \right) \tag{8}$$

where $u_{in}[n]$ is the generic input clean signal and $u_{out}[n]$ is the processed signal. The operator $E\{\cdot\}$ is the mathematical expectation. We compute the SNR level over the total length of the experiment (0–10 s) and also in 4 different time sub-intervals: the first transient state, from 0 to 2 s; the following steady state, from 2 to 5 s; the new transient state after the path changes, from 5 to 7 s; and the following new steady state, from 7 to 10 s. We compare the proposed multi-stage collaborative (MSC) GSC with four simple GSC beamformers, each using one of the MISO systems of the MSC architecture. The results are collected in Table 2, which shows the behaviour of the different beamformers and their contribution to the noise reduction in terms of SNR improvement. It is evident from Table 2 that the best performing architecture is the proposed MSC GSC.

GSC             0-2 s   2-5 s   5-7 s   7-10 s   0-10 s
NLMS, $\mu_1$   13.2    22.7    12.5    22.5     15.2
NLMS, $\mu_2$   15.1    17.3    15.8    18.2     16.1
FAP, $\mu_1$    15.2    22.6    13.1    23.7     17.9
FAP, $\mu_2$    17.4    18.5    17.1    19.6     17.8
MSC             24.6    31.2    26.7    32.3     28.5

Table 2: SNR comparison in dB.

5. Conclusions

In this paper we have introduced a new beamforming technique whose hallmark is the use of multi-stage collaborative filtering in the ANC block. The multi-stage collaborative structure is composed of four different MISO systems; in the first stage we carry out the convex combinations of MISO systems adapted by the same family of algorithms in order to find the best configuration for each kind of system. Then, in the second stage, the two combination outputs are combined in order to give robustness to the beamformer in nonstationary environments. The proposed architecture is evaluated in terms of convergence performance and SNR improvement in speech enhancement applications, in which the multi-stage collaborative beamformer outperforms standard beamforming techniques.

6. References

[1] Y. Huang, J. Chen, and J. Benesty, “Immersive audio schemes,” IEEE Signal Processing Magazine, vol. 28, no. 1, pp. 20–32, Jan. 2011.
[2] L. Griffiths and C. Jim, “An alternative approach to linearly constrained adaptive beamforming,” IEEE Trans. on Antennas and Propagation, vol. 30, no. 1, pp. 27–34, Jan. 1982.
[3] A. H. Sayed, Fundamentals of adaptive filtering. Hoboken, NJ: John Wiley & Sons, Inc., 2003.
[4] Y. R. Zheng and R. A. Goubran, “Adaptive beamforming using affine projection algorithms,” in WCCC-ICSP 2000, vol. 3, Beijing, China, Aug. 2000, pp. 1929–1932.
[5] D. Comminiello, M. Scarpiniti, R. Parisi, and A. Uncini, “A novel affine projection algorithm for superdirective microphone array beamforming,” in ISCAS 2010, Paris, France, May 2010, pp. 2127–2130.
[6] S. L. Gay and S. Tavathia, “The fast affine projection algorithm,” in ICASSP 1995, vol. 5, Detroit, MI, USA, May 1995, pp. 3023–3026.
[7] M. T. M. Silva and V. H. Nascimento, “Improving the tracking capability of adaptive filters via convex combination,” IEEE Trans. on Signal Processing, vol. 56, no. 7, pp. 3137–3149, July 2008.
[8] J. Arenas-García, A. R. Figueiras-Vidal, and A. H. Sayed, “Mean-square performance of a convex combination of two adaptive filters,” IEEE Trans. on Signal Processing, vol. 54, no. 3, pp. 1078–1090, Mar. 2006.
[9] M. Brandstein and D. Ward, Eds., Microphone arrays: Signal processing techniques and applications. New York: Springer, 2001.
[10] D. R. Campbell, K. J. Palomaki, and G. J. Brown, “Roomsim, a MATLAB simulation of "shoebox" room acoustics for use in teaching and research,” Comput. and Inform. Systems, vol. 9, no. 3, pp. 48–51, 2005.
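
To illustrate the first-stage mixing rule in (3) and (4), the following is a minimal single-channel Python sketch, not the authors' implementation. It combines two NLMS filters (the $P = 1$ case) with step sizes 0.1 and 0.9. Several points are assumptions made only for this illustration: the path `h`, the noise level, the values of `mu_a` and `beta`, the clipping bound `a_max`, the small regularizers, and the use of the combined-output error as the error factor in the update of the auxiliary parameter, which is the usual choice in convex combination schemes such as [8].

```python
import math
import random


def sgm(a):
    """Sigmoid that maps the auxiliary parameter a[n] to lambda[n] in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def nlms_step(w, x, d, mu, eps=1e-6):
    """One NLMS update; returns output, a priori error and the new weights."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    e = d - y
    norm = sum(xi * xi for xi in x) + eps  # eps: assumed regularizer
    return y, e, [wi + mu * e * xi / norm for wi, xi in zip(w, x)]


def run_combination(h, n_samples=4000, mu1=0.1, mu2=0.9,
                    mu_a=1.0, beta=0.9, a_max=4.0, seed=0):
    """Identify the hypothetical path h with two NLMS filters and mix them."""
    rng = random.Random(seed)
    taps = len(h)
    w1, w2 = [0.0] * taps, [0.0] * taps
    x = [0.0] * taps
    a, q = 0.0, 1.0
    lams, sq_err = [], []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0)] + x[:-1]            # white input, newest first
        d = sum(hi * xi for hi, xi in zip(h, x)) + 0.01 * rng.gauss(0.0, 1.0)
        y1, e1, w1 = nlms_step(w1, x, d, mu1)          # small step size
        y2, e2, w2 = nlms_step(w2, x, d, mu2)          # large step size
        lam = sgm(a)
        z = lam * y1 + (1.0 - lam) * y2                # convex combination, cf. (3)
        e = d - z                                      # combined error (assumption)
        delta = e2 - e1                                # error difference, cf. (4)
        q = beta * q + (1.0 - beta) * delta * delta    # smoothed power of delta
        a += (mu_a / (q + 1e-12)) * e * delta * lam * (1.0 - lam)
        a = max(-a_max, min(a_max, a))                 # assumed clipping of a[n]
        lams.append(lam)
        sq_err.append(e * e)
    tail = sq_err[-500:]
    return lams, sum(tail) / len(tail)
```

Calling `run_combination([0.5, -0.3, 0.2, 0.1])` returns the trajectory of the mixing parameter and the mean squared error of the combined output over the last 500 samples. Clipping the auxiliary parameter keeps the mixing parameter strictly inside (0, 1), so the update never stalls at either extreme.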

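
The GSC front end of Section 2 can also be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the configuration used in the experiments: the channels are assumed to be already time-aligned towards the desired source, the fixed path is the plain sum of (1), and the blocking matrix takes differences of adjacent microphones, which is one possible form of the pairwise differences mentioned in the text. The function names are hypothetical.

```python
def gsc_front_end(aligned):
    """Return (d, x): the speech reference and K = M - 1 noise references.

    `aligned` is a list of M equally long channels that are assumed to be
    already time-aligned towards the desired source.
    """
    m_count = len(aligned)
    n_count = len(aligned[0])
    if m_count < 2 or any(len(ch) != n_count for ch in aligned):
        raise ValueError("need at least two channels of equal length")
    # Fixed path: plain sum over the aligned channels, following (1).
    d = [sum(ch[n] for ch in aligned) for n in range(n_count)]
    # Blocking matrix (assumed adjacent-pair form): x_k[n] = mic_k[n] - mic_{k+1}[n].
    x = [[aligned[k][n] - aligned[k + 1][n] for n in range(n_count)]
         for k in range(m_count - 1)]
    return d, x


def beamformer_output(d, z):
    """Beamformer output e[n] = d[n] - z[n], as in (6)."""
    return [dn - zn for dn, zn in zip(d, z)]
```

When every channel carries the same aligned target and no noise, the differences cancel the target exactly, so the noise references contain only what differs between microphones. That is the property the adaptive noise canceller relies on.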