Performance Evaluation of Different Thresholding Methods in Time Adaptive Wavelet Based Speech Enhancement
5,December,2009
ISSN: 1793-8236
multiresolution analysis. The discrete detail coefficients and the discrete approximation coefficients can be obtained by a multi-level wavelet decomposition. Wavelet-based techniques using coefficient thresholding [8] and adaptive thresholding [9] have also been applied to speech enhancement. Donoho introduced wavelet thresholding (shrinking) as a powerful tool for denoising signals degraded by additive white noise, and more recently a number of attempts have been made to use perceptually motivated wavelet decompositions coupled with various thresholding and estimation methods. Although the application of wavelet shrinking to speech enhancement has been reported in the literature [10]-[13], there are many problems yet to be resolved for a successful application of the method to speech signals degraded by real environmental noise types. The best-known thresholding methods in the literature are soft and hard thresholding. Soft thresholding can be expected to introduce more error or bias than hard thresholding does [6]. On the other hand, soft thresholding is more efficient in de-noising. To achieve a compromise between the two methods, the trimmed thresholding method is proposed in this paper and applied to the noisy speech coefficients to reduce the noise. In addition, the trimmed thresholding technique is good at preserving edges while de-noising.

The main objective of the proposed method is to improve on existing single-microphone schemes for an extended range of noise types and noise levels, thereby making this method more suitable for mobile speech communication applications than the existing schemes. This paper introduces a speech enhancement system based on time adaptive discrete wavelet denoising using trimmed thresholding. The performance of the proposed method was evaluated on several speakers and under various noise conditions including white noise, pink noise, F16 cockpit noise, babble noise, high frequency channel noise and car interior noise. Subjective experiments by means of a listening test show that the system based on this method achieves a significant improvement over the wavelet based approach using hard and soft thresholding and over the state-of-the-art speech enhancement system. The results show that the proposed method is well suited for adverse noise conditions and yields better spectral performance, which is a very important characteristic for speech recognition or speaker verification.

This paper is organized as follows: Section I presents a survey of the related work. Section II presents the principle of the wavelet transform and discrete wavelet computation. Section III presents the proposed scheme for speech signal enhancement. In Section IV, de-noising by thresholding in the wavelet domain is introduced, hard and soft thresholding are reviewed, and a new trimmed thresholding method with better performance is proposed for speech enhancement. Section V gives the implementation steps and discusses the experimental results, which validate the proposed thresholding algorithm. Finally, Section VI summarizes the presented research work.

II. BACKGROUND WAVELET ANALYSIS
The wavelet transform has been intensively used in various fields of signal processing. It has the advantage of using variable size time-windows for different frequency bands. This results in a high frequency-resolution (and low time-resolution) in low bands and low frequency-resolution in high bands. Consequently, the wavelet transform is a powerful tool for modeling non-stationary signals like speech that exhibit slow temporal variations in low frequency and abrupt temporal changes in high frequency. Moreover, when one is restricted to using only one (noisy) signal (as in single-microphone speech enhancement), the use of subband processing can generally result in better performance. Therefore, the wavelet transform can provide an appropriate model for speech signal denoising applications. In the present work, the Discrete Wavelet Transform (DWT) is computed; it provides sufficient information for both analysis and synthesis of the original signal, with a significant reduction in computation time. The DWT is considerably easier to implement than the Continuous Wavelet Transform (CWT) because it does not require numerical integration.

A. DWT Computation
The DWT analyzes the signal at different frequency bands with different resolutions by decomposing the signal into a coarse approximation and detail information, as shown in Fig.1. The DWT of a signal x(n) is calculated by passing it through a series of filters. First the samples are passed through a low-pass filter with impulse response g, resulting in a convolution of the two as in (1).

The signal is also decomposed simultaneously using a high-pass filter h. The outputs give the detail coefficients (from the high-pass filter) and the approximation coefficients (from the low-pass filter). It is important that the two filters are related to each other; they are known as a quadrature mirror filter (2). However, since half the frequencies of the signal have now been removed, half the samples can be discarded according to Nyquist's rule. The filter outputs are then subsampled by 2.

(2)

This decomposition has halved the time resolution since only half of each filter output characterizes the signal. However, each output has half the frequency band of the input, so the frequency resolution has been doubled.
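As a hedged illustration of one analysis stage (filtering followed by subsampling by 2), the sketch below uses the two-tap Haar filter pair. The Haar filters, the function names `dwt_level` and `dwt_multilevel`, and the sample values are assumptions made for this example only; they are not the filters used in this work.

```python
import numpy as np

def dwt_level(x, g, h):
    """One analysis stage: convolve with g (low-pass) and h (high-pass),
    then keep every second output sample."""
    x = np.asarray(x, dtype=float)
    low = np.convolve(x, g)    # full convolution of x with g
    high = np.convolve(x, h)   # full convolution of x with h
    # Keep odd-indexed outputs; for 2-tap filters each kept sample
    # then combines one disjoint pair of input samples.
    return low[1::2], high[1::2]

def dwt_multilevel(x, g, h, levels):
    """Repeat the stage on the approximation branch (a filter bank)."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = dwt_level(approx, g, h)
        details.append(detail)
    return approx, details

# Haar pair (assumed here purely for illustration).
g_haar = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass
h_haar = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
approx, detail = dwt_level(x, g_haar, h_haar)
a3, details = dwt_multilevel(x, g_haar, h_haar, levels=3)
```

For an even-length input each branch holds half as many samples as the input, and with this orthonormal pair the total energy of the approximation and detail coefficients equals the energy of the input.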
IACSIT International Journal of Engineering and Technology Vol.1,No.5,December,2009
ISSN: 1793-8236
Fig.2 A three level filter bank

DAUBECHIES WAVELET TRANSFORM
Wavelet decomposition transforms signal from time domain to time-scale domain, and it can describe the local feature well in both time domain and frequency domain. Because the amplitude of the discrete detail coefficients of

$$T(a, \tau + \Delta\tau) = \frac{1}{\left[1 - \dfrac{C_s}{C_s + \left|X_{TADWT}(a,\tau)\right|}\right]\left[1 + \left|\dfrac{\partial}{\partial t} X_{TADWT}(a,\tau)\right|\right]} \qquad (5)$$

$$C_0 = \frac{1+\sqrt{3}}{4},\quad C_1 = \frac{3+\sqrt{3}}{4},\quad C_2 = \frac{3-\sqrt{3}}{4},\quad C_3 = \frac{1-\sqrt{3}}{4} \qquad (7)$$
Then $h(k) = C_k/\sqrt{2}$; the coefficients for db4 wavelets are as follows [16],

$$h(0) = \frac{1+\sqrt{3}}{4\sqrt{2}},\quad h(1) = \frac{3+\sqrt{3}}{4\sqrt{2}},\quad h(2) = \frac{3-\sqrt{3}}{4\sqrt{2}},\quad h(3) = \frac{1-\sqrt{3}}{4\sqrt{2}} \qquad (8)$$

Since it is a discrete wavelet, this computational method requires no integration and is more efficient. Removing noise components by thresholding the wavelet coefficients is based on the observation that in many signals (like speech), energy is mostly concentrated in a small number of wavelet dimensions. The coefficients of these dimensions are relatively large compared to other dimensions or to any other signal (specifically noise) that has its energy spread over a large number of coefficients. Hence, by setting smaller coefficients to zero, one can nearly optimally eliminate noise while preserving the important information of the original signal.

Noisy speech $x(n)$
Discrete Wavelet Transform (Db): $X_{DWT}(a,\tau)$
Time adaptation: $X_{TADWT}(a,\tau) = K(a,\tau)\,X_{DWT}(a,\tau)$
Threshold value determination: $T_h = \sigma\,(2\log N)^{1/2}$
Trimmed thresholding: $Y'_{TADWT}(a,\tau) = \mathrm{Thresh}\{X_{TADWT}(a,\tau)\}$
Revert to DWT: $Y'_{DWT}(a,\tau) = [1/K(a,\tau)]\,Y'_{TADWT}(a,\tau)$
Inverse discrete wavelet transform
Estimated speech $y'(n)$
Fig.4 Proposed Scheme

In the wavelet representation, noise tends to be characterized by smaller coefficients across time and scale, while signal energy is concentrated in larger coefficients. This offers the possibility of using a threshold to separate the signal from the noise. Generally, the selected threshold has to be multiplied by the median value of the

threshold processing, we study how they influence the performance of wavelet de-noising.

Specifically, in each wavelet band, we calculate the variance of the noisy coefficients. Using (9), each variance can then be employed to set the threshold to a value based on the noise energy in the band. In practical situations, one often encounters colored rather than white noise. Assuming zero-mean Gaussian noise, the coefficients will be Gaussian random variables of zero mean and variance $\sigma^2$; the standard deviation is thus estimated by (9),

$$\sigma = \frac{1}{0.6745}\,\mathrm{Median}(|c_i|) \qquad (9)$$

where $c_i$ represents the high frequency wavelet coefficients, which are used to identify the noise components at the first level of decomposition. The set of standard deviation values can now be used as the noise profile for threshold setting [15]. This noise profile estimation enables the algorithm to cope with colored noise. The threshold value [10] can be determined by (10),

$$T_h = \sigma\,\left(2\log(N\log_2 N)\right)^{1/2} \qquad (10)$$

where $T_h$ is the threshold value and N is the length of the noisy signal. In threshold selection, we should not ignore the detail coefficients in every level, which probably influence the robustness of the threshold estimate. So we have to rescale a selected threshold at some levels. In this paper, the threshold is dependent on the detail coefficients at every level.

IV. DENOISING BY WAVELET THRESHOLDING
The wavelet de-noising technique is called thresholding; it is a non-linear algorithm. It can be decomposed into three steps. The first one consists in computing the coefficients of the wavelet transform (WT), which is a linear operation. The second one consists in thresholding these coefficients. The last step is the inversion of the thresholded coefficients by applying the inverse wavelet transform, which leads to the de-noised signal. This technique is simple and efficient. However, it relies heavily on the choice of the threshold, which in turn depends on the noise distribution. In wavelet thresholding de-noising, we should first select a threshold and then process the components of the wavelet transform of the noisy signal in order to improve the signal-to-noise ratio (SNR).

A. Soft and Hard thresholding
In the literature there are two types of thresholding techniques applicable to speech processing: hard thresholding and soft thresholding. Hard thresholding can be described as the usual process of setting to zero the elements whose absolute values are lower than the threshold. Soft thresholding is an extension of hard thresholding: it first sets to zero the elements whose absolute values are lower than the threshold, and then shrinks the nonzero coefficients towards 0. Let $T_h$ denote the given threshold. The hard thresholding is defined by (11),

$$Y_{TADWT} = \begin{cases} X_{TADWT}, & |X_{TADWT}| \ge T_h \\ 0, & |X_{TADWT}| < T_h \end{cases} \qquad (11)$$
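A minimal sketch of the noise estimate in (9), the threshold in (10), and the hard and soft rules described in this section, written with NumPy. It assumes natural logarithms in (10), assumes the threshold is scaled by the estimated standard deviation, and uses made-up coefficient values; the function names are illustrative only and are not part of the method description.

```python
import numpy as np

def estimate_sigma(detail_coeffs):
    """Noise standard deviation from detail coefficients, as in (9)."""
    c = np.asarray(detail_coeffs, dtype=float)
    return np.median(np.abs(c)) / 0.6745

def threshold_value(sigma, n):
    """Threshold as in (10); natural log and sigma scaling are assumptions."""
    return sigma * np.sqrt(2.0 * np.log(n * np.log2(n)))

def hard_threshold(x, th):
    """Zero every coefficient whose magnitude is below th; keep the rest."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) >= th, x, 0.0)

def soft_threshold(x, th):
    """Zero small coefficients and shrink the remaining ones towards 0 by th."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - th, 0.0)

# Made-up coefficients, thresholded with a fixed th = 1.0 for readability.
coeffs = np.array([-3.0, -0.5, 0.2, 1.0, 2.5])
hard = hard_threshold(coeffs, 1.0)
soft = soft_threshold(coeffs, 1.0)

# Data-driven threshold for a signal of (assumed) length N = 1024.
sigma = estimate_sigma(coeffs)
th = threshold_value(sigma, n=1024)
```

The two rules differ only for coefficients that survive: hard thresholding leaves them unchanged, while soft thresholding reduces their magnitude by the threshold, which is the source of the extra bias mentioned earlier.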
The soft thresholding is defined by (12),

$$Y_{TADWT} = \begin{cases} \mathrm{Sign}(X_{TADWT})\,\left(|X_{TADWT}| - T_h\right), & |X_{TADWT}| \ge T_h \\ 0, & |X_{TADWT}| < T_h \end{cases} \qquad (12)$$

where $Y_{TADWT}$ is the thresholded time adaptive wavelet coefficient of the estimated speech signal and $X_{TADWT}$ is the time adaptive wavelet coefficient of the noisy speech signal.

B. Trimmed thresholding
Motivated by finding a more general case that incorporates the soft and hard thresholding, we proposed the following thresholding rule as in (13),

X TADWT - Th

we compared it to the other two thresholding methods on this task. Two sets of additive noise experiments were implemented on this data. In the first, white Gaussian noise was added to the sentences from the TIMIT database at SNR levels of -10, -5, 0, +5 and +10 dB. In the second, specific noise types including pink noise, car noise, babble noise, F-16 cockpit noise and HF channel noise were added at 6 dB SNR to evaluate how well the methods work with non-white and relatively non-stationary noise sources. Signal-to-noise ratio (SNR), Itakura-Saito (IS) distance and MMSE are used as objective measurement criteria for both sets of experiments.

A. Implementation steps:
Step 1: Computation of the discrete wavelet transform for noisy speech.
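The additive-noise setup of the experiments can be sketched as follows: white Gaussian noise is scaled so that the mixture has a chosen SNR. This is only an illustration under assumptions; a synthetic sinusoid sampled at an assumed 16 kHz stands in for a TIMIT sentence, and `add_white_noise` and `snr_db` are hypothetical helper names.

```python
import numpy as np

def snr_db(clean, processed):
    """SNR in dB, treating (processed - clean) as the noise."""
    clean = np.asarray(clean, dtype=float)
    err = np.asarray(processed, dtype=float) - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2))

def add_white_noise(clean, target_snr_db, rng):
    """Add white Gaussian noise scaled to reach the requested SNR."""
    clean = np.asarray(clean, dtype=float)
    noise = rng.standard_normal(clean.shape)
    # Gain chosen so that signal energy / scaled noise energy = 10^(SNR/10).
    gain = np.sqrt(np.sum(clean ** 2)
                   / (np.sum(noise ** 2) * 10.0 ** (target_snr_db / 10.0)))
    return clean + gain * noise

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0            # one second at an assumed 16 kHz
clean = np.sin(2.0 * np.pi * 200.0 * t)   # stand-in for a speech sentence

# Same SNR grid as the white-noise experiment.
noisy = {snr: add_white_noise(clean, snr, rng) for snr in (-10, -5, 0, 5, 10)}
measured = {snr: snr_db(clean, y) for snr, y in noisy.items()}
```

The same `snr_db` helper can be applied to an enhanced signal to obtain the output SNR, so that the improvement is the output SNR minus the input SNR.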
The objective of any speech enhancement system is to minimize this MSE.

Itakura-Saito (IS) distance
IS distance is a meaningful measure of performance when the two waveforms differ in their phase spectra.

$$d(a,b) = \frac{(a-b)^T R\,(a-b)}{a^T R\,a} \qquad (16)$$

where a is the vector of prediction coefficients of the clean speech signal, R is the (Toeplitz) autocorrelation matrix of the clean speech signal and b is the vector of prediction coefficients of the enhanced signal. Many reported experiments confirmed that two spectra would be perceptually nearly identical if the distance is from 1 to 10, with lower values indicating lesser distance and better speech quality.

shown in Fig.7 and Fig.8. Referring to Fig.7 and Fig.8, it is clear that the trimmed thresholding method outperforms the other two methods in speech signal de-noising in all aspects.

Fig.7 IS distance measure comparisons for different noises
Fig.9 Output SNR results for white noise case at -10, -5, 0, +5, +10 dB input SNR
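The distance in (16) can be sketched numerically as below. This is a hedged illustration, not the evaluation code behind the reported figures: it assumes the prediction coefficients are obtained with the autocorrelation method over the whole signal, assumes an LPC order of 10, and uses synthetic filtered noise in place of speech. All function names are hypothetical.

```python
import numpy as np

def autocorrelation(x, order):
    """Autocorrelation values r[0..order] of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])

def toeplitz_from(r):
    """Symmetric Toeplitz matrix whose first row is r."""
    lags = np.abs(np.subtract.outer(np.arange(len(r)), np.arange(len(r))))
    return r[lags]

def lpc_vector(x, order):
    """Prediction-error filter [1, -p1, ..., -p_order] from the normal equations."""
    r = autocorrelation(x, order)
    predictor = np.linalg.solve(toeplitz_from(r[:order]), r[1 : order + 1])
    return np.concatenate(([1.0], -predictor))

def is_distance(clean, enhanced, order=10):
    """Distance of (16): (a-b)^T R (a-b) / (a^T R a), with R from the clean signal."""
    a = lpc_vector(clean, order)
    b = lpc_vector(enhanced, order)
    R = toeplitz_from(autocorrelation(clean, order))
    diff = a - b
    return float(diff @ R @ diff) / float(a @ R @ a)

rng = np.random.default_rng(1)
# Correlated synthetic signal standing in for clean speech (assumption).
clean = np.convolve(rng.standard_normal(4000), [1.0, 0.8, 0.5], mode="same")
degraded = clean + 0.3 * rng.standard_normal(4000)

d_same = is_distance(clean, clean)        # identical signals
d_noisy = is_distance(clean, degraded)    # degraded signal
```

Identical signals give a distance of zero, and any change in the prediction coefficients gives a positive value because R is positive definite for a non-degenerate signal.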
SNR improvements across varying realistic noise conditions at 0 dB SNR are shown in Fig.10. Here the results are given as net improvement, so that relative effectiveness can be seen for all six noise conditions as a function of thresholding method. The proposed method substantially outperforms the other methods in all cases, although its gain is relatively smaller in car noise conditions. Fig.11, 12 and 13 show the time domain representation and spectrogram (which shows the energy in a signal at each frequency and at each time) of the clean, noisy and enhanced speech using the different thresholding methods. From the results it is clear that in the time domain the speech estimated using trimmed thresholding is closer to the clean reference speech than the speech obtained with the other two thresholding methods. In the spectrogram, the speech enhanced using trimmed thresholding is likewise closer to the spectrogram of clean speech than the results obtained with the other two methods. The intensity variations in enhanced speech using the different thresholding algorithms are due to a decrease in signal strength.

Fig.10 SNR comparisons for varying noise conditions at 0dB SNR
Fig.11 Time domain representation and Spectrogram of (a) Clean Speech (b) Noisy speech (white noise 0 dB) (c) Enhanced speech using Hard thresholding (d) Enhanced speech using Soft thresholding
Fig.12 Time domain representation and Spectrogram of (a) Clean Speech (b) Noisy speech (Babble Noise 6 dB) (c) Enhanced speech using Hard thresholding (d) Enhanced speech using Soft thresholding
Fig.13 Time domain representation and Spectrogram of (a) Clean Speech (b) Noisy speech (HF channel Noise 6 dB) (c) Enhanced speech using Hard thresholding (d) Enhanced speech using Soft thresholding
Amman Institute of Technology, Sathyamangalam, India who provided the facilities to do our research.

REFERENCES
[1] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoustics, Speech, Signal Processing, vol. 27, pp. 113-120, April 1979.
[2] V. I. Djigan, P. Sovka and R. Cmejla, "Modified Spectral Subtraction based Speech Enhancement," Proc. of the 1999 IEEE Workshop on Acoustics Echo and Noise Control IWAENC99, Sept. 1999, Pennsylvania, USA, pp. 64-67.
[3] R. Martin, "Spectral subtraction based on minimum statistics," in Proc. EUSIPCO, pp. 1182-1185, September 1994.
[4] D. E. Tsoukalas, J. N. Mourjopoulos and Kokkinakis, "Speech enhancement based on audible noise suppression," IEEE Trans. Speech Audio Proc., vol. 5, pp. 479-514, Nov. 1997.
[5] N. Virag, "Single channel speech enhancement based on masking properties of the human auditory system," IEEE Trans. Speech Audio Processing, vol. 7, pp. 126-137, Mar. 1999.
[6] Y. Ephraim and H. L. Van Trees, "A signal subspace Approach for Speech Enhancement," IEEE Trans. Speech and Audio Processing, vol. 3, no. 4, pp. 251-265, Sep. 1995.
[7] Y. Ephraim and D. Malah, "Speech Enhancement Using a minimum mean-square error log-spectral amplitude estimator," IEEE Trans. Acoust. Speech Signal Processing, ASSP-32(6), pp. 1109-1121, 1984.
[8] S. Mallat and W. L. Hwang, "Singularity detection and processing with wavelets," IEEE Trans. on Information Theory, vol. IT-38, pp. 617-643, 1992.
[9] D. L. Donoho, "De-noising by soft thresholding," IEEE Trans. on Information Theory, vol. 41, no. 3, pp. 613-627, May 1995.
[10] Michael T. Johnson, Xiaolong Yuan and Yao Ren, "Speech signal enhancement through adaptive wavelet thresholding," Speech Communication, 49 (10), pp. 123-133, 2007.
[11] Sumithra M G, Thanuskodi K and Anitha M R, "Modified Time Adaptive Wavelet Based Approach for Enhancing Speech from Adverse Noisy Environment," ICGST International Journal on Digital Signal Processing, vol. 9, issue 1, pp. 33-40, April 2009.
[12] W. Seok and K. S. Bae, "Speech enhancement with reduction of noise components in the wavelet domain," in Proceedings of the ICASSP, pp. 1323-1326, 1997.
[13] Yasser Ghanbari and Mohammad Reza Karami Mollaei, "A new approach for speech enhancement based on the adaptive thresholding of the wavelet packets," Speech Communication 48, pp. 927-940, 2006.
[14] J. Yao and Y. T. Zhang, "The application of bionic wavelet transform to speech signal processing in cochlear implants using neural network simulations," IEEE Trans. Biomed. Eng. 49(11), pp. 1299-1309, 2002.
[15] J. Yao, "An Active model for otoacoustic emissions and its application to time frequency signal processing," Ph.D. thesis, The Chinese University of Hong Kong, Hong Kong, 2001.
[16] K. P. Soman and K. I. Ramachandran, "Insight in to wavelets - From theory to practice," Prentice-Hall of India Private Ltd, 2nd edition, 2006, pp. 81-83.
[17] J. Sgarofolo, "Getting started with the DARPA TIMIT CD-ROM: An acoustic phonetic continuous speech database," NIST speech disc 1-1.1, Oct. 1990.

Mrs. M. G. Sumithra, born in Salem District, Tamil Nadu State, India in 1973, received the B.E. in Electronics and Communication Engineering from Govt. College of Engineering, Salem, India in 1994 and the M.E. in Medical Electronics from College of Engineering, Guindy, Anna University, Chennai, India in 2001. She is currently Professor in the Department of ECE, Bannari Amman Inst. of Technology, Sathyamangalam, Tamil Nadu, India and is pursuing her research in Speech Processing. Her areas of interest are Signal Processing and Biomedical Engineering. She has published 20 technical papers in National and International conferences, one technical paper in a National Journal and two technical papers in International Journals.

Dr. K. Thanushkodi, born in Theni District, Tamil Nadu State, India in 1948, received the BE in Electrical and Electronics Engineering from Madras University, Chennai, the MSc (Engg) from Madras University, Chennai and the PhD in Electrical and Electronics Engineering from Bharathiar University, Coimbatore in 1972, 1976 and 1991 respectively. He is currently Principal of Akshya College of Engineering and Technology, Coimbatore, Tamil Nadu, India. His research interests lie in the areas of Computer Modeling and Simulation, Computer Networking, Signal Processing and Power System. He has published 40 technical papers in National and International Journals.