Performance Evaluation of Different Thresholding Methods in Time Adaptive Wavelet Based Speech Enhancement

IACSIT International Journal of Engineering and Technology, Vol. 1, No. 5, December 2009
ISSN: 1793-8236
Sumithra M G, Member, IACSIT, and Thanushkodi K

Manuscript submitted July 3, 2009 for review. Sumithra M G is with the Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, India (phone: 09865816671). Thanushkodi K is with Akshya College of Engineering and Technology, Coimbatore, Tamil Nadu, India (phone: 09843394451).

Abstract: Speech quality and intelligibility can deteriorate significantly in the presence of background noise, especially when the speech signal is subject to subsequent processing. Speech enhancement algorithms have therefore attracted a great deal of interest in the past two decades. Wavelets provide a powerful tool for non-linear filtering of signals contaminated by noise, and wavelet thresholding de-noising techniques provide a new way to reduce noise in a signal. In this work, speech enhancement is accomplished by applying different thresholding methods to time adaptive discrete Daubechies wavelet transform coefficients. Soft thresholding is best at reducing noise but worst at preserving edges, whereas hard thresholding is best at preserving edges but worst at de-noising. Motivated by the search for a more general rule that incorporates both soft and hard thresholding and achieves a compromise between them, the trimmed thresholding method is proposed in this paper to enhance speech corrupted by background noise. The performance of the different thresholding methods is evaluated by enhancing speech corrupted by various noises. The objective and subjective experimental results show that the proposed scheme with trimmed thresholding is superior in de-noising to the hard and soft thresholding methods. They also indicate that the proposed method gives better mean square error (MSE) performance than the other wavelet thresholding methods.

Index Terms: Speech enhancement, Time adaptive Daubechies wavelet transform, Time adaptation factor, Thresholding.

I. INTRODUCTION
In communication systems, speech signals can be contaminated by environmental noise and, as a result, the communication quality can be affected, making the speech less intelligible. Voice quality and intelligibility are always important for communication systems, whether wired or wireless, and whether in human-to-human or human-to-machine interactions. In order to obtain near-transparent speech communications, for example via mobile phones, speech enhancement techniques have been employed to improve the quality and intelligibility of the noise-corrupted speech and/or the speech recognition performance. The problem of de-noising consists of removing noise from a corrupted signal without altering the signal itself. The corrupting noise sources are usually classified as additive or convolutional. The former very often dominates in real-world applications, and the spectral subtraction (SS) approach has been a very popular solution for it [1],[2],[3]. To subtract the noise components from the input noisy speech, the SS algorithm has to estimate the statistics of the additive noise in the frequency domain. Under low signal-to-noise ratio (SNR) conditions, a spectral flooring process is usually applied to prevent over-subtraction. However, all such processes very often produce some unnatural residual noise in the enhanced speech, the so-called musical noise, due to the inevitable random tone peaks generated in the time-frequency spectrogram. Previous studies have pointed out that this perceivable residual noise can be effectively alleviated by considering the masking effect in the human auditory system [4],[5], i.e., the residual noise will not be perceived if it lies under the masking thresholds of human hearing.
In recent years, several alternative approaches, such as signal subspace methods [6], have been proposed for enhancing degraded speech. In the subspace method, the estimation of the signal subspace dimension is difficult for unvoiced periods and transitional regions. Existing approaches to this task include traditional methods such as spectral subtraction and Ephraim-Malah filtering [7]. A drawback of these techniques is the need to estimate the noise or the signal-to-noise ratio. This can be a strong limitation when recording with non-stationary noise and in situations where the noise cannot be estimated.
The Fourier domain has long been the method of choice for suppressing noise. Recently, however, methods based on the wavelet transform have become increasingly popular. Wavelets provide a powerful tool for non-linear filtering of signals contaminated by noise. Mallat and Hwang [8] have shown that effective noise suppression may be achieved by transforming the noisy signal into the wavelet domain and preserving only the local maxima of the transform. Alternatively, a reconstruction that uses only the large-magnitude coefficients has been shown to approximate the uncorrupted signal well. In other words, noise suppression is achieved by thresholding the wavelet transform of the contaminated signal. The method of wavelet threshold de-noising is based on the principle of the
multiresolution analysis. The discrete detail coefficients and the discrete approximation coefficients can be obtained by a multi-level wavelet decomposition. Wavelet-based techniques using coefficient thresholding [8] and adaptive thresholding [9] have also been applied to speech enhancement. Donoho introduced wavelet thresholding (shrinking) as a powerful tool for de-noising signals degraded by additive white noise, and more recently a number of attempts have been made to use perceptually motivated wavelet decompositions coupled with various thresholding and estimation methods. Although the application of wavelet shrinking to speech enhancement has been reported in the literature [10]-[13], many problems remain to be resolved before the method can be applied successfully to speech signals degraded by real environmental noise types. The best-known thresholding methods in the literature are soft and hard thresholding. Soft thresholding can be expected to introduce more error or bias than hard thresholding does [6]. On the other hand, soft thresholding is more efficient in de-noising. To achieve a compromise between the two methods, the trimmed thresholding method is proposed in this paper and applied to the noisy speech coefficients to reduce the noise. The trimmed thresholding technique is used here because it de-noises well while also preserving the edges.
The main objective of the proposed method is to improve on existing single-microphone schemes for an extended range of noise types and noise levels, thereby making the method more suitable for mobile speech communication applications than the existing ones. The paper introduces a speech enhancement system based on time adaptive discrete wavelet de-noising using trimmed thresholding. The performance of the proposed method was evaluated on several speakers and under various noise conditions, including white noise, pink noise, F16 cockpit noise, babble noise, high frequency channel noise and car interior noise. Subjective experiments by means of a listening test show that the system based on this method gives a significant improvement over the wavelet-based approach using hard and soft thresholding and over the state-of-the-art speech enhancement system. The results show that the proposed method is well suited to adverse noise conditions and yields better spectral performance, which is a very important characteristic for speech recognition or speaker verification.
This paper is organized as follows. Section I surveys the related work. Section II presents the principle of wavelets and the discrete wavelet computation. Section III presents the proposed scheme for speech signal enhancement. In Section IV, de-noising by thresholding in the wavelet domain is introduced, hard and soft thresholding are reviewed, and a new trimmed thresholding method with better performance is proposed for speech enhancement. Section V gives the implementation steps and discusses the experimental results, which validate the proposed thresholding algorithm. Finally, Section VI summarizes the presented research work.

II. BACKGROUND WAVELET ANALYSIS
The wavelet transform has been used intensively in various fields of signal processing. It has the advantage of using variable-size time windows for different frequency bands. This results in a high frequency resolution (and low time resolution) in low bands and a low frequency resolution in high bands. Consequently, the wavelet transform is a powerful tool for modeling non-stationary signals like speech, which exhibit slow temporal variations at low frequencies and abrupt temporal changes at high frequencies. Moreover, when one is restricted to a single (noisy) signal, as in single-microphone speech enhancement, subband processing generally results in better performance. Therefore, the wavelet transform can provide an appropriate model for speech signal de-noising applications. In the present work, the Discrete Wavelet Transform (DWT) is computed; it provides sufficient information for both analysis and synthesis of the original signal, with a significant reduction in computation time. The DWT is considerably easier to implement than the Continuous Wavelet Transform (CWT) because it does not require numerical integration.

A. DWT Computation
The DWT analyzes the signal in different frequency bands with different resolutions by decomposing the signal into a coarse approximation and detail information, as shown in Fig.1. The DWT of a signal x(n) is calculated by passing it through a series of filters. First the samples are passed through a low-pass filter with impulse response g, resulting in a convolution of the two as in (1).

$$y[n] = (x * g)[n] = \sum_{k=-\infty}^{\infty} x[k]\, g[n-k] \tag{1}$$

The signal is also decomposed simultaneously using a high-pass filter h. The outputs give the detail coefficients (from the high-pass filter) and the approximation coefficients (from the low-pass filter). It is important that the two filters are related to each other; they are known as a quadrature mirror filter. Since half the frequencies of the signal have now been removed, half the samples can be discarded according to Nyquist's rule. The filter outputs are therefore subsampled by 2, as in (2).

$$y_{low}[n] = \sum_{k=-\infty}^{\infty} x[k]\, g[2n-k], \qquad y_{high}[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[2n-k] \tag{2}$$

This decomposition has halved the time resolution, since only half of each filter output characterizes the signal. However, each output has half the frequency band of the input, so the frequency resolution has been doubled.
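One level of the analysis step described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the signal is extended periodically at its edges (an assumption, since the paper does not state its boundary handling), the low-pass taps are the four Daubechies coefficients the paper lists later in (8), and the high-pass filter is derived with one common quadrature mirror convention.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT3 = math.sqrt(3.0)

# Four-tap Daubechies low-pass coefficients, as listed in (8) of the paper.
LOWPASS = [
    (1 + SQRT3) / (4 * SQRT2),
    (3 + SQRT3) / (4 * SQRT2),
    (3 - SQRT3) / (4 * SQRT2),
    (1 - SQRT3) / (4 * SQRT2),
]

# Quadrature mirror relation (one common convention, assumed here):
# highpass[k] = (-1)^k * lowpass[L - 1 - k].
HIGHPASS = [(-1) ** k * LOWPASS[len(LOWPASS) - 1 - k] for k in range(len(LOWPASS))]


def filter_and_downsample(x, taps):
    """Return y[i] = sum_k taps[k] * x[2i - k], with x extended periodically."""
    n = len(x)
    return [
        sum(taps[k] * x[(2 * i - k) % n] for k in range(len(taps)))
        for i in range(n // 2)
    ]


def dwt_level(x):
    """Split x into approximation (low-pass) and detail (high-pass) halves."""
    if len(x) % 2 != 0:
        raise ValueError("signal length must be even")
    approx = filter_and_downsample(x, LOWPASS)
    detail = filter_and_downsample(x, HIGHPASS)
    return approx, detail


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
approx, detail = dwt_level(signal)
print(len(approx), len(detail))  # each branch holds half the samples
```

Each branch keeps half of the input samples, which is the halving of time resolution mentioned in the text. Under the periodic-extension assumption the split is orthogonal, so the energy of the two branches together equals the energy of the input.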
Fig.1 Block diagram of filter analysis

This decomposition is repeated to further increase the frequency resolution: the approximation coefficients are decomposed again with high-pass and low-pass filters and then down-sampled. This is represented as a binary tree whose nodes represent sub-spaces with different time-frequency localization. The tree is known as a filter bank.

Fig.2 A three level filter bank

Fig.3 Frequency domain representation of the DWT

At each level in the above diagram the signal is decomposed into low and high frequencies, as shown in Fig.2 and Fig.3. Because of the decomposition process, the input signal length must be a multiple of 2^n, where n is the number of levels. The proposed method of analysis uses a 4th-level decomposition with the Daubechies-4 wavelet. The Daubechies wavelets are a family of orthogonal wavelets defining a discrete wavelet transform and characterized by a maximal number of vanishing moments for a given support. With each wavelet type of this class there is a scaling function which generates an orthogonal multiresolution analysis. They are among the most prominent compactly supported orthonormal wavelets in current research [14].

III. PROPOSED SCHEME USING TIME ADAPTIVE DAUBECHIES WAVELET TRANSFORM
Wavelet decomposition transforms a signal from the time domain to the time-scale domain, and it can describe local features well in both the time domain and the frequency domain. Because the amplitude of the discrete detail coefficients of the noise decreases as the level increases, we can select a threshold and process all of the discrete detail coefficients at all scales by hard, soft or trimmed thresholding so as to remove noise [10]. A noisy speech signal can be modeled as the sum of clean speech and additive background noise. If the signal includes ambient noise, the result is the additive signal model given by

$$x = y + n \tag{3}$$

where x is the noisy signal, y is the clean speech and n is the additive noise component, so that

$$X = Y + N \tag{4}$$

where X = Wx, Y = Wy and N = Wn in the wavelet domain [14]. The matrix notation represents the coefficients across each scale and time. The block diagram of the proposed approach is shown in Fig.4. First the DWT of the noisy speech is taken; then the time adaptive nature is captured by calculating a time-varying linear factor T(a, τ) for each scale (a = 2^m) and time (τ = n 2^m) using (5). This factor affects only the duration of the amplitude envelope of the wavelet, not its frequency.

$$T(a, \tau + \Delta\tau) = \frac{1}{\left(1 - \dfrac{C_s}{C_s + \left|X_{TADWT}(a,\tau)\right|}\right)\left(1 + \left|\dfrac{\partial}{\partial t} X_{TADWT}(a,\tau)\right|\right)} \tag{5}$$

In the implementation based on Yao and Zhang's work [14] for cochlear implant coding, coefficients at 22 scales, m = 7, 8, ..., 28, are calculated using numerical integration of the CWT. These 22 scales correspond to center frequencies logarithmically spaced from 225 Hz to 5300 Hz, and the same range is considered in this method. C_s = 0.8 is a constant representing non-linear saturation effects in the cochlear model [15]. Since the primary adaptation mechanism involves variation of the wavelet time support, the impact of the initial time support was assessed by turning off the adaptation mechanism (T(a, τ) = 1). The resulting time adaptive wavelet transform coefficients X_TADWT(a, τ) are calculated as the product of the DWT coefficients X_DWT(a, τ) and a time constant K(a, τ), and the result is substituted in (5) for the time adaptation mechanism. From the reported analysis [8], [11],

$$X_{TADWT}(a,\tau) = K(a,\tau)\, X_{DWT}(a,\tau), \qquad K(a,\tau) = \frac{1}{C\sqrt{1 + T^{2}(a,\tau)}} \tag{6}$$

where C_0 + C_1 + C_2 + C_3 = C = 2 (normalizing constant). The normalization is obtained from (7).

$$C_0 = \frac{1+\sqrt{3}}{4}, \quad C_1 = \frac{3+\sqrt{3}}{4}, \quad C_2 = \frac{3-\sqrt{3}}{4}, \quad C_3 = \frac{1-\sqrt{3}}{4} \tag{7}$$
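The normalizing constant stated with (7) can be checked with a short calculation. The sketch below only verifies the arithmetic of the four constants, read as (1 + √3)/4, (3 + √3)/4, (3 - √3)/4 and (1 - √3)/4; the variable names are illustrative and not taken from the paper.

```python
import math

SQRT3 = math.sqrt(3.0)

# The four constants of (7).
C = [(1 + SQRT3) / 4, (3 + SQRT3) / 4, (3 - SQRT3) / 4, (1 - SQRT3) / 4]

total = sum(C)                   # C0 + C1 + C2 + C3
energy = sum(c * c for c in C)   # C0^2 + C1^2 + C2^2 + C3^2

print(total, energy)
```

Both quantities come out as 2 up to floating-point rounding. The second one explains why dividing each constant by √2, as the paper does when it forms the filter coefficients h(k), gives a filter with unit energy.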
Then, with h(k) = C_k / √2, the coefficients for the db4 wavelet are as follows [16]:

$$h(0) = \frac{1+\sqrt{3}}{4\sqrt{2}}, \quad h(1) = \frac{3+\sqrt{3}}{4\sqrt{2}}, \quad h(2) = \frac{3-\sqrt{3}}{4\sqrt{2}}, \quad h(3) = \frac{1-\sqrt{3}}{4\sqrt{2}} \tag{8}$$

Since this is a discrete wavelet, the computational method requires no integration and is more efficient. Removing noise components by thresholding the wavelet coefficients is based on the observation that in many signals (like speech), energy is mostly concentrated in a small number of wavelet dimensions. The coefficients of these dimensions are relatively large compared to other dimensions or to any other signal (specifically noise) that has its energy spread over a large number of coefficients. Hence, by setting the smaller coefficients to zero, one can nearly optimally eliminate noise while preserving the important information of the original signal.

Fig.4 Proposed scheme. Processing chain shown in the figure: noisy speech x(n); discrete wavelet transform (Db), giving X_DWT(a, τ); time adaptation, X_TADWT = K(a, τ) X_DWT(a, τ); threshold value determination, T_h = σ (2 log N)^{1/2}; trimmed thresholding, Y'_TADWT(a, τ) = Thresh{X_TADWT(a, τ)}; reversion to DWT, Y'_DWT(a, τ) = [1/K(a, τ)] Y'_TADWT(a, τ); inverse discrete wavelet transform; estimated speech y'(n).

In the wavelet representation, noise tends to be characterized by smaller coefficients across time and scale, while signal energy is concentrated in larger coefficients. This offers the possibility of using a threshold to separate the signal from the noise. Generally, the selected threshold has to be multiplied by the median value of the detail coefficients at some level, which is called threshold processing. To find the comparatively best method of threshold processing, we study how the candidates influence the performance of wavelet de-noising.
Specifically, in each wavelet band, we calculate the variance of the noisy coefficients. Using (9), each variance can then be employed to set the threshold to a value based on the noise energy in the band. In practical situations, one often encounters colored rather than white noise. Assuming zero-mean Gaussian noise, the coefficients will be Gaussian random variables of zero mean and variance σ²; the standard deviation is thus estimated by (9),

$$\sigma = \frac{1}{0.6745}\,\mathrm{Median}\left(|c_i|\right) \tag{9}$$

where c_i represents the high-frequency wavelet coefficients, which are used to identify the noise components at the first decomposition level. The set of standard deviation values can now be used as the noise profile for threshold setting [15].
This noise profile estimation enables the algorithm to cope with colored noise. The threshold value [10] can be determined by (10),

$$T_h = \sigma \left(2 \log\left(N \log_2 N\right)\right)^{1/2} \tag{10}$$

where T_h is the threshold value and N is the length of the noisy signal. In threshold selection, we should not ignore the detail coefficients at every level, which probably influence the robustness of the threshold estimate. So we have to rescale the selected threshold at each level. In this paper, the threshold depends on the detail coefficients at every level.

IV. DENOISING BY WAVELET THRESHOLDING
The wavelet de-noising technique is called thresholding; it is a non-linear algorithm. It can be decomposed into three steps. The first consists of computing the coefficients of the wavelet transform (WT), which is a linear operation. The second consists of thresholding these coefficients. The last step is the inversion of the thresholded coefficients by applying the inverse wavelet transform, which yields the de-noised signal. This technique is simple and efficient. However, it relies heavily on the choice of the threshold, which in turn depends on the noise distribution. In wavelet thresholding de-noising, we should first select a threshold and then process the components of the wavelet transform of the noisy signal in order to improve the signal-to-noise ratio (SNR).

A. Soft and Hard thresholding
In the literature there are two types of thresholding techniques applicable to speech processing: hard thresholding and soft thresholding. Hard thresholding is the usual process of setting to zero the elements whose absolute values are lower than the threshold. Soft thresholding is an extension of hard thresholding: it first sets to zero the elements whose absolute values are lower than the threshold, and then shrinks the nonzero coefficients towards 0. Let T_h denote the given threshold. Hard thresholding is defined by (11),

$$Y_{TADWT} = \begin{cases} X_{TADWT}, & |X_{TADWT}| \ge T_h \\ 0, & |X_{TADWT}| < T_h \end{cases} \tag{11}$$
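The noise estimate (9), the threshold (10) and the hard rule (11) fit together as in the following plain-Python sketch. It is a hedged illustration rather than the authors' code: the function names are hypothetical, the logarithm in (10) is assumed to be the natural logarithm, and the threshold is assumed to be scaled by the estimated standard deviation σ, which is how this sketch reads (10).

```python
import math
import statistics


def estimate_sigma(detail_coeffs):
    """Noise standard deviation from first-level detail coefficients, as in (9)."""
    return statistics.median([abs(c) for c in detail_coeffs]) / 0.6745


def threshold_value(sigma, n):
    """Th = sigma * sqrt(2 * ln(n * log2(n))); the natural log is an assumption."""
    if n < 2:
        raise ValueError("signal length must be at least 2")
    return sigma * math.sqrt(2.0 * math.log(n * math.log2(n)))


def hard_threshold(coeffs, th):
    """Keep coefficients with |X| >= Th and set the rest to zero, as in (11)."""
    return [c if abs(c) >= th else 0.0 for c in coeffs]


# Toy detail coefficients (made-up numbers, for illustration only).
detail = [0.1, -0.2, 0.05, 3.0, -0.15, 0.08, -2.5, 0.12]
sigma = estimate_sigma(detail)
th = threshold_value(sigma, len(detail))
print(hard_threshold(detail, th))
```

With these toy numbers only the two large coefficients survive, which is the behavior the text describes: small, noise-like coefficients are zeroed and large ones pass unchanged.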
The soft thresholding is defined by (12),

$$Y_{TADWT} = \begin{cases} \mathrm{Sign}(X_{TADWT})\left(|X_{TADWT}| - T_h\right), & |X_{TADWT}| \ge T_h \\ 0, & |X_{TADWT}| < T_h \end{cases} \tag{12}$$

where Y_TADWT is the thresholded time adaptive wavelet coefficient of the estimated speech signal and X_TADWT is the time adaptive wavelet coefficient of the noisy speech signal.

B. Trimmed thresholding
Motivated by the search for a more general rule that incorporates both soft and hard thresholding, we propose the thresholding rule in (13),

$$Y_{TADWT} = \begin{cases} X_{TADWT}\,\dfrac{|X_{TADWT}|^{\alpha} - T_h^{\alpha}}{|X_{TADWT}|^{\alpha}}, & |X_{TADWT}| \ge T_h \\ 0, & |X_{TADWT}| < T_h \end{cases} \tag{13}$$

T_h is chosen as an estimate of the noise level. When α = 1, the rule is equivalent to soft thresholding; when α → ∞, it is equivalent to hard thresholding. Fig.5 shows graphically its relation to soft and hard thresholding. It can be clearly seen that trimmed thresholding lies between hard and soft thresholding. With careful tuning of the parameter α for a particular signal, one can achieve the best de-noising effect within the thresholding framework.

Fig.5 Hard, Soft and Trimmed thresholding function

V. IMPLEMENTATION AND EVALUATION
In this section, the implementation steps of the proposed method are discussed. Furthermore, the performance of the best thresholding method is evaluated and compared with the other thresholding methods. Clean speech sentences from the TIMIT database [17], corrupted by different noises at SNRs ranging from -10 dB to +10 dB, are used as the noisy input speech. Objective and subjective tests were conducted to evaluate the quality of the proposed method. Objective quality measures provide a measure based on a mathematical comparison of the original and processed speech signals. The quality of a speech signal is a subjective measure which reflects the way the signal is perceived by listeners. It can be expressed in terms of how pleasant the signal sounds or how much effort is required to understand the message. To evaluate the effectiveness of the proposed method for de-noising speech signals, we compared it to the other two thresholding methods on this task. Two sets of additive noise experiments were run on this data. In the first, white Gaussian noise was added to the sentences from the TIMIT database at SNR levels of -10, -5, 0, +5 and +10 dB. In the second, specific noise types, including pink noise, car noise, babble noise, F-16 cockpit noise and HF channel noise, were added at a 6 dB SNR level to evaluate how well the methods work with non-white and relatively non-stationary noise sources. Signal-to-noise ratio (SNR), Itakura-Saito (IS) distance and MMSE are used as objective measurement criteria for both sets of experiments.

A. Implementation steps:
Step 1: Compute the discrete wavelet transform of the noisy speech.
Step 2: Compute the time adaptation factor and multiply it with the discrete wavelet coefficients using (6).
Step 3: Estimate the noise using (9), determine the threshold value using (10), and then apply the different thresholding techniques to the time adaptive wavelet coefficients using (11), (12) and (13).
Step 4: Take the inverse time adaptive discrete wavelet transform by dividing the coefficients by the adaptation factor, which yields the DWT coefficients.
Step 5: Take the inverse discrete wavelet transform (IDWT). With trimmed thresholding, this directly gives the enhanced speech with reduced noise components. With hard and soft thresholding, post-filtering is applied to achieve comparable results.

B. Objective Measure Evaluation
Objective quality measures provide a measure based on a mathematical comparison of the original and processed speech signals that can be easily implemented and reliably reproduced.

Signal to noise ratio
The global SNR values are determined by the following equation,

$$SNR_{dB} = 10 \log_{10} \frac{\sum_{n} s(n)^{2}}{\sum_{n} \left[s(n) - \hat{s}(n)\right]^{2}} \tag{14}$$

where s(n) is the clean speech and ŝ(n) is the estimated speech. If the summation is performed over the whole signal length, the result is called the global SNR.

Minimum Mean Square Error
Mean Square Error (MSE) is defined as the average power of the difference between the enhanced speech and the clean speech. It can be obtained by

$$r = E\left\{\left[s(n) - \hat{s}(n)\right]^{2}\right\} \tag{15}$$
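The thresholding rules and the two objective measures defined in (12) to (15) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, `alpha` stands for the tuning exponent of the trimmed rule, and the trimmed rule is written in the algebraically equivalent form X (1 - (Th/|X|)^alpha) so that large exponents do not overflow.

```python
import math


def soft_threshold(x, th):
    """Soft rule (12): zero below Th, otherwise shrink the magnitude by Th."""
    return math.copysign(abs(x) - th, x) if abs(x) >= th else 0.0


def trimmed_threshold(x, th, alpha):
    """Trimmed rule (13), written as x * (1 - (th / |x|) ** alpha)."""
    if abs(x) < th or x == 0.0:
        return 0.0
    return x * (1.0 - (th / abs(x)) ** alpha)


def snr_db(clean, estimate):
    """Global SNR (14) in dB between clean and estimated signals."""
    signal_energy = sum(s * s for s in clean)
    error_energy = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    if signal_energy == 0.0:
        raise ValueError("clean signal has zero energy")
    if error_energy == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_energy / error_energy)


def mse(clean, estimate):
    """Mean square error (15), estimated as a sample average."""
    return sum((s - e) ** 2 for s, e in zip(clean, estimate)) / len(clean)


# alpha = 1 reproduces the soft rule; a large alpha approaches the hard rule.
for alpha in (1.0, 2.0, 200.0):
    print(alpha, trimmed_threshold(3.0, 1.0, alpha))
```

For a coefficient of 3 and a threshold of 1, the output moves from the soft-thresholded value at `alpha = 1` towards the unmodified coefficient as `alpha` grows, which is the interpolation between soft and hard thresholding described in the text.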
The objective of any speech enhancement system is to minimize this MSE.

Itakura-Saito (IS) distance
The IS distance is a meaningful measure of performance when the two waveforms differ in their phase spectra.

$$d(a,b) = \frac{(a-b)^{T} R\,(a-b)}{a^{T} R\, a} \tag{16}$$

where a is the vector of prediction coefficients of the clean speech signal, R is the (Toeplitz) autocorrelation matrix of the clean speech signal and b is the vector of prediction coefficients of the enhanced signal. Many reported experiments have confirmed that two spectra are perceptually nearly identical if the distance is between 1 and 10, with lower values indicating a smaller distance and better speech quality.

C. Subjective Measure Evaluation
Subjective evaluation provides a broad measure of performance, since a large difference in quality is necessary for it to be distinguishable to the listener. The mean opinion score (MOS) provides a numerical measure of the quality of human speech. The scheme uses subjective tests (opinion scores) that are mathematically averaged to obtain a quantitative indicator of system performance. To determine the MOS, a number of listeners rate the quality of test sentences in a listening test. Based on the perceived quality of the enhanced speech, each listener gives a rating for each sentence as follows: (1) Bad (2) Poor (3) Fair (4) Good (5) Excellent. The MOS is the arithmetic mean of all the individual scores and can range from 1 (worst) to 5 (best).
The average MOS is computed by having a group of 20 listeners rate the quality of the enhanced speech on a five-point scale and then averaging the results. For all measures, results are averaged, giving a single evaluation metric for each method for input speech corrupted by various noises, as shown in Fig.6. The average MOS obtained with trimmed thresholding is better than that obtained with hard and soft thresholding. For babble and HF channel noise, however, the performance of the proposed thresholding method is similar to that of soft thresholding.

Fig.6 MOS comparisons for different noises

In addition, the Itakura-Saito (IS) distance and MMSE computed for the estimated speech with the three thresholding methods under different noise conditions are shown in Fig.7 and Fig.8. From Fig.7 and Fig.8 it is clear that the trimmed thresholding method outperforms the other two methods in speech signal de-noising in all respects.

Fig.7 IS distance measure comparisons for different noises

Fig.8 MMSE comparisons for different noises

Output SNR results for the white noise condition across a range of SNR values are shown in Fig.9. The thresholding methods compared are hard, soft and the proposed trimmed method. From Fig.9, the proposed thresholding method clearly has the best performance for this white noise condition. Both the proposed method and soft thresholding show a roughly linear SNR improvement compared with hard thresholding.

Fig.9 Output SNR results for white noise case at -10, -5, 0, +5, +10 dB input SNR

The proposed method gave about 15 dB improvement at the lower input SNRs, decreasing to about 9 dB improvement at the higher input SNRs. Similarly, soft thresholding gave about 12 dB improvement at the lower input SNRs, decreasing to about 3 dB, and hard thresholding gave about 7 dB improvement at the lower input SNRs, decreasing to about -3 dB at the higher input SNRs.
SNR improvements across varying realistic noise conditions at 0 dB SNR are shown in Fig.10. Here the results are given as net improvement, so that the relative effectiveness of each thresholding method can be seen for all six noise conditions. The proposed method substantially outperforms the other methods in all cases, although its advantage is smaller in car noise conditions. Fig.11, Fig.12 and Fig.13 show the time domain representation and the spectrogram (the energy in a signal at each frequency and at each time) of the noisy, clean and enhanced speech using the different thresholding methods. From the results it is clear that, in the time domain, the speech estimated using trimmed thresholding is closer to the clean reference speech than that obtained with the other two thresholding methods. Likewise, the spectrogram of the speech enhanced using trimmed thresholding is closer to the spectrogram of clean speech than the results of the other two methods. The intensity variations in the speech enhanced with the different thresholding algorithms are due to a decrease in signal strength.

Fig.10 SNR comparisons for varying noise conditions at 0dB SNR

Fig.11 Time domain representation and Spectrogram of (a) Clean Speech (b) Noisy speech (white noise 0 dB) (c) Enhanced speech using Hard thresholding (d) Enhanced speech using Soft thresholding
Fig.12 Time domain representation and Spectrogram of (a) Clean Speech (b) Noisy speech (Babble Noise 6 dB)

Fig.13 Time domain representation and Spectrogram of (a) Clean Speech (b) Noisy speech (HF channel Noise 6 dB)

VI. CONCLUSION
A time adaptive wavelet with a trimmed thresholding process for de-noising speech under different noise conditions has been presented. The enhancement results demonstrate that the proposed scheme performs better than the hard and soft thresholding methods in de-noising the signal. It has been shown that it can considerably enhance noisy speech corrupted by white and colored noises. For signal de-noising using hard and soft thresholding, a post-processing band-pass filter is used to reduce the noise effectively [11]. With the trimmed thresholding method, however, very good results were obtained without post-filtering. The ability of the proposed system to extract clear and intelligible speech from various adverse noisy environments, in comparison with other well-known thresholding methods, has been demonstrated through both objective and subjective measurements. The quality and intelligibility tests showed that the enhanced speech and the clean speech are more similar in both time and frequency domain analysis. Beyond its strong performance in the additive white noise case, the proposed method also performs better in real noisy environments such as F-16 cockpit, babble and HF channel noise. The proposed method is well suited to enhancing speech even under very strong noise conditions, since it has produced better performance than the existing algorithms. The limitation of the proposed scheme is the need for proper tuning of the parameter α for each noise condition. Future work on this approach will include the adaptation of the parameter α and modified thresholding techniques for other noisy cases such as street, helicopter, train and industrial noise. Further, this algorithm can be implemented on an FPGA for enhancing speech in digital hearing aids.

ACKNOWLEDGMENT
We gratefully acknowledge the cooperation of the people who participated in the subjective test. We would like to express our sincere gratitude to the management of Bannari
Amman Institute of Technology, Sathyamangalam, India, who provided the facilities to do our research.

REFERENCES
[1] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoustics, Speech, Signal Processing, vol. 27, pp. 113-120, April 1979.
[2] V. I. Djigan, P. Sovka and R. Cmejla, "Modified spectral subtraction based speech enhancement," Proc. of the 1999 IEEE Workshop on Acoustics Echo and Noise Control IWAENC99, Sept. 1999, Pennsylvania, USA, pp. 64-67.
[3] R. Martin, "Spectral subtraction based on minimum statistics," in Proc. EUSIPCO, pp. 1182-1185, September 1994.
[4] D. E. Tsoukalas, J. N. Mourjopoulos and Kokkinakis, "Speech enhancement based on audible noise suppression," IEEE Trans. Speech Audio Proc., vol. 5, pp. 479-514, Nov. 1997.
[5] N. Virag, "Single channel speech enhancement based on masking properties of the human auditory system," IEEE Trans. Speech Audio Processing, vol. 7, pp. 126-137, Mar. 1999.
[6] Y. Ephraim and H. L. Van Trees, "A signal subspace approach for speech enhancement," IEEE Trans. Speech and Audio Processing, vol. 3, no. 4, pp. 251-265, Sep. 1995.
[7] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Trans. Acoust. Speech Signal Processing, ASSP-32(6), pp. 1109-1121, 1984.
[8] S. Mallat and W. L. Hwang, "Singularity detection and processing with wavelets," IEEE Trans. on Information Theory, vol. IT-38, pp. 617-643, 1992.
[9] D. L. Donoho, "De-noising by soft thresholding," IEEE Trans. on Information Theory, vol. 41, no. 3, pp. 613-627, May 1995.
[10] Michael T. Johnson, Xiaolong Yuan and Yao Ren, "Speech signal enhancement through adaptive wavelet thresholding," Speech Communication, 49 (10), pp. 123-133, 2007.
[11] Sumithra M G, Thanushkodi K and Anitha M R, "Modified time adaptive wavelet based approach for enhancing speech from adverse noisy environment," ICGST International Journal on Digital Signal Processing, vol. 9, issue 1, pp. 33-40, April 2009.
[12] W. Seok and K. S. Bae, "Speech enhancement with reduction of noise components in the wavelet domain," in Proceedings of the ICASSP, pp. 1323-1326, 1997.
[13] Yasser Ghanbari and Mohammad Reza Karami Mollaei, "A new approach for speech enhancement based on the adaptive thresholding of the wavelet packets," Speech Communication, 48, pp. 927-940, 2006.

Mrs. M. G. Sumithra, born in Salem District, Tamil Nadu State, India in 1973, received the B.E. in Electronics and Communication Engineering from Govt. College of Engineering, Salem, India in 1994 and the M.E. in Medical Electronics from College of Engineering, Guindy, Anna University, Chennai, India in 2001. She is currently Professor in the Department of ECE, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, India, and is pursuing her research in Speech Processing. Her areas of interest are Signal Processing and Biomedical Engineering. She has published 20 technical papers in national and international conferences, one technical paper in a national journal and two technical papers in international journals.

Dr. K. Thanushkodi, born in Theni District, Tamil Nadu State, India in 1948, received the B.E. in Electrical and Electronics Engineering and the M.Sc. (Engg) from Madras University, Chennai, and the Ph.D. in Electrical and Electronics Engineering from Bharathiar University, Coimbatore, in 1972, 1976 and 1991 respectively. He is currently Principal of Akshya College of Engineering and Technology, Coimbatore, Tamil Nadu, India. His research interests lie in the areas of Computer Modeling and Simulation, Computer Networking, Signal Processing and Power Systems. He has published 40 technical papers in national and international journals.
[14] Yao,J.,Zhang,Y.T., The application of bionic wavelet transform to
speech signal processing in cochlear implants using neural network
simulations, IEEE trans.Biomed.Eng.49(11), 1299-1309,2002.
[15] Yao,J., An Active model for otoacoustic emissions and its
application to time frequency signal processing, Ph.D. thesis, The
Chinese University of Hong Kong, Hong Kong, 2001.
[16] K.P. Soman, K.I. Ramachandran, Insight in to wavelets - From
theory to practice, Prentice - Hall of India Private Ltd, 2nd edition,
2006, pp 81-83 .
[17] J.Sgarofolo, Getting started with the DARPA TIMIT CD-ROM: An
acoustic phonetic continuous speech database, NIST speech disc
1-1.1,oct 1990.
