IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-27, NO. 2, APRIL 1979
Suppression of Acoustic Noise in Speech Using Spectral Subtraction
Abstract-A stand-alone noise suppression algorithm is presented for reducing the spectral effects of acoustically added noise in speech. Effective performance of digital speech processors operating in practical environments may require suppression of noise from the digital waveform. Spectral subtraction offers a computationally efficient, processor-independent approach to effective digital speech analysis. The method, requiring about the same computation as high-speed convolution, suppresses stationary noise from speech by subtracting the spectral noise bias calculated during nonspeech activity. Secondary procedures are then applied to attenuate the residual noise left after subtraction. Since the algorithm resynthesizes a speech waveform, it can be used as a preprocessor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.
I. INTRODUCTION
Background noise acoustically added to speech can degrade the performance of digital voice processors used for applications such as speech compression, recognition, and authentication [1], [2]. Digital voice systems will be used in a variety of environments, and their performance must be maintained at a level near that measured using noise-free input speech. To ensure continued reliability, the effects of background noise can be reduced by using noise-cancelling microphones, internal modification of the voice processor algorithms to explicitly compensate for signal contamination, or preprocessor noise reduction.
Noise-cancelling microphones, although essential for extremely high noise environments such as the helicopter cockpit, offer little or no noise reduction above 1 kHz [3] (see Fig. 5). Techniques available for voice processor modification to account for noise contamination are being developed [4], [5]. But due to the time, effort, and money spent on the design and implementation of these voice processors [6]-[8], there is a reluctance to internally modify these systems.
Preprocessor noise reduction [12], [21] offers the advantage that noise stripping is done on the waveform itself with the output being either digital or analog speech. Thus, existing voice processors tuned to clean speech can continue to be used unmodified. Also, since the output is speech, the noise stripping becomes independent of any specific subsequent speech processor implementation (it could be connected to a CCD channel vocoder or a digital LPC vocoder).
The objectives of this effort were to develop a noise suppression technique, implement a computationally efficient algorithm, and test its performance in actual noise environments. The approach used was to estimate the magnitude frequency spectrum of the underlying clean speech by subtracting the noise magnitude spectrum from the noisy speech spectrum. This estimator requires an estimate of the current noise spectrum. Rather than obtain this noise estimate from a second microphone source [9], [10], it is approximated using the average noise magnitude measured during nonspeech activity. Using this approach, the spectral approximation error is then defined, and secondary methods for reducing it are described.
The noise suppressor is implemented using about the same amount of computation as required in a high-speed convolution. It is tested on speech recorded in a helicopter environment. Its performance is measured using the Diagnostic Rhyme Test (DRT) [11] and is demonstrated using isometric plots of short-time spectra.
The paper is divided into sections which develop the spectral estimator, describe the algorithm implementation, and demonstrate the algorithm performance.

Manuscript received June 1, 1978; revised September 12, 1978. This research was supported by the Information Processing Branch of the Defense Advanced Research Projects Agency, monitored by the Naval Research Laboratory under Contract N00173-77-C-0041.
The author is with the Department of Computer Science, University of Utah, Salt Lake City, UT 84112.
0096-3518/79/0400-0113$00.75 © 1979 IEEE

II. SUBTRACTIVE NOISE SUPPRESSION ANALYSIS
A. Introduction
This section describes the noise-suppressed spectral estimator. The estimator is obtained by subtracting an estimate of the noise spectrum from the noisy speech spectrum. Spectral information required to describe the noise spectrum is obtained from the signal measured during nonspeech activity. After developing the spectral estimator, the spectral error is computed and four methods for reducing it are presented.
The following assumptions were used in developing the analysis. The background noise is acoustically or digitally added to the speech. The background noise environment remains locally stationary to the degree that its spectral magnitude expected value just prior to speech activity equals its expected value during speech activity. If the environment changes to a new stationary state, there exists enough time (about 300 ms) to estimate a new background noise spectral magnitude expected value before speech activity commences. For the slowly varying nonstationary noise environment, the algorithm requires a speech activity detector to signal the program that speech has ceased and a new noise bias can be estimated. Finally, it is assumed that significant noise reduction is possible by removing the effect of noise from the magnitude spectrum only.
Speech, suitably low-pass filtered and digitized, is analyzed by windowing data from half-overlapped input data buffers. The magnitude spectra of the windowed data are calculated and the spectral noise bias calculated during nonspeech activity is subtracted off. Resulting negative amplitudes are then zeroed out. Secondary residual noise suppression is then applied. A time waveform is recalculated from the modified magnitude. This waveform is then overlap added to the previous data to generate the output speech.
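The processing chain just outlined (half-overlapped windowing, magnitude subtraction, zeroing of negative amplitudes, resynthesis with overlap-add) can be illustrated with a minimal NumPy sketch. It is not the author's implementation: the function name `spectral_subtract`, the argument `noise_mag` (a precomputed noise magnitude per frequency bin), and the choice of a periodic Hann window are assumptions made for illustration, and the secondary residual noise suppression step is omitted.

```python
import numpy as np

def spectral_subtract(x, noise_mag, n_win=256):
    """Half-overlapped Hann analysis, magnitude subtraction, overlap-add synthesis.

    x         : 1-D real signal
    noise_mag : assumed noise magnitude per rfft bin, shape (n_win // 2 + 1,)
    """
    hop = n_win // 2
    # Periodic Hann window (assumption): copies shifted by n_win / 2 sum to one,
    # so an unmodified spectrum reconstructs the input in the overlapped region.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n_win) / n_win)
    y = np.zeros(len(x))
    for start in range(0, len(x) - n_win + 1, hop):
        frame = window * x[start:start + n_win]
        spectrum = np.fft.rfft(frame)
        # Subtract the noise bias and zero out negative amplitudes.
        mag = np.maximum(np.abs(spectrum) - noise_mag, 0.0)
        # Rebuild the frame from the modified magnitude and the noisy phase.
        modified = mag * np.exp(1j * np.angle(spectrum))
        y[start:start + n_win] += np.fft.irfft(modified, n=n_win)
    return y
```

With `noise_mag` set to zeros the sketch reduces to an identity over the fully overlapped interior of the signal, which is a convenient sanity check before a real noise estimate is plugged in.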
B. Additive Noise Model
Assume that a windowed noise signal $n(k)$ has been added to a windowed speech signal $s(k)$, with their sum denoted by $x(k)$. Then

$$x(k) = s(k) + n(k).$$

Taking the Fourier transform gives

$$X(e^{j\omega}) = S(e^{j\omega}) + N(e^{j\omega})$$

where

$$x(k) \leftrightarrow X(e^{j\omega})$$

$$X(e^{j\omega}) = \sum_{k=0}^{L-1} x(k) e^{-j\omega k}$$

$$x(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega}) e^{j\omega k} \, d\omega.$$

C. Spectral Subtraction Estimator
The spectral subtraction filter $H(e^{j\omega})$ is calculated by replacing the noise spectrum $N(e^{j\omega})$ with spectra which can be readily measured. The magnitude $|N(e^{j\omega})|$ of $N(e^{j\omega})$ is replaced by its average value $\mu(e^{j\omega})$ taken during nonspeech activity, and the phase $\theta_N(e^{j\omega})$ of $N(e^{j\omega})$ is replaced by the phase $\theta_x(e^{j\omega})$ of $X(e^{j\omega})$. These substitutions result in the spectral subtraction estimator $\hat{S}(e^{j\omega})$:

$$\hat{S}(e^{j\omega}) = \left[ |X(e^{j\omega})| - \mu(e^{j\omega}) \right] e^{j\theta_x(e^{j\omega})}$$

or

$$\hat{S}(e^{j\omega}) = H(e^{j\omega}) X(e^{j\omega})$$

with

$$H(e^{j\omega}) = 1 - \frac{\mu(e^{j\omega})}{|X(e^{j\omega})|}, \qquad \mu(e^{j\omega}) = E\{|N(e^{j\omega})|\}.$$

D. Spectral Error
The spectral error $\epsilon(e^{j\omega})$ resulting from this estimator is given by

$$\epsilon(e^{j\omega}) = \hat{S}(e^{j\omega}) - S(e^{j\omega}) = N(e^{j\omega}) - \mu(e^{j\omega}) e^{j\theta_x}.$$

A number of simple modifications are available to reduce the auditory effects of this spectral error. These include: 1) magnitude averaging; 2) half-wave rectification; 3) residual noise reduction; and 4) additional signal attenuation during nonspeech activity.

E. Magnitude Averaging
Since the spectral error equals the difference between the noise spectrum $N$ and its mean $\mu$, local averaging of spectral magnitudes can be used to reduce the error. Replacing $|X(e^{j\omega})|$ with $\overline{|X(e^{j\omega})|}$ where

$$\overline{|X(e^{j\omega})|} = \frac{1}{M} \sum_{i=0}^{M-1} |X_i(e^{j\omega})|$$

$$X_i(e^{j\omega}) = i\text{th time-windowed transform of } x(k)$$

gives

$$S_A(e^{j\omega}) = \left[ \overline{|X(e^{j\omega})|} - \mu(e^{j\omega}) \right] e^{j\theta_x}.$$

The rationale behind averaging is that the spectral error becomes approximately

$$\epsilon(e^{j\omega}) = S_A(e^{j\omega}) - S(e^{j\omega}) \cong \overline{|N|} - \mu$$

where

$$\overline{|N(e^{j\omega})|} = \frac{1}{M} \sum_{i=0}^{M-1} |N_i(e^{j\omega})|.$$

Thus, the sample mean of $|N(e^{j\omega})|$ will converge to $\mu(e^{j\omega})$ as a longer average is taken. The obvious problem with this modification is that the speech is nonstationary, and therefore only limited time averaging is allowed. DRT results show that averaging over more than three half-overlapped windows with a total time duration of 38.4 ms will decrease intelligibility. Spectral examples and DRT scores with and without averaging are given in the "Results" section. Based upon these results, it appears that averaging coupled with half rectification offers some improvement. The major disadvantage of averaging is the risk of some temporal smearing of short transitory sounds.

F. Half-Wave Rectification
For each frequency $\omega$ where the noisy signal spectrum magnitude $|X(e^{j\omega})|$ is less than the average noise spectrum magnitude $\mu(e^{j\omega})$, the output is set to zero. This modification can be simply implemented by half-wave rectifying $H(e^{j\omega})$. The estimator then becomes

$$\hat{S}(e^{j\omega}) = H_R(e^{j\omega}) X(e^{j\omega})$$

where

$$H_R(e^{j\omega}) = \frac{H(e^{j\omega}) + |H(e^{j\omega})|}{2}.$$
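As a small numerical illustration of the gain view of half-wave rectification, the sketch below assumes the rectified gain has the form $H_R = (H + |H|)/2$ with $H = 1 - \mu/|X|$, and checks that applying it to $|X|$ gives the same result as clipping $|X| - \mu$ at zero. The function names, the division-by-zero floor, and the sample values are hypothetical and chosen only for illustration.

```python
import numpy as np

def rectified_gain(x_mag, mu):
    """Half-wave rectified gain H_R = (H + |H|) / 2 with H = 1 - mu / |X|."""
    safe_mag = np.maximum(x_mag, 1e-12)  # assumption: floor avoids division by zero
    h = 1.0 - mu / safe_mag
    return 0.5 * (h + np.abs(h))

def averaged_magnitude(frame_mags):
    """Sample mean of |X_i| over the M frames stacked along axis 0."""
    return np.mean(frame_mags, axis=0)

# Three frequency bins: below, equal to, and above the noise bias.
x_mag = np.array([0.5, 1.0, 4.0])
mu = np.array([1.0, 1.0, 1.0])
s_mag = rectified_gain(x_mag, mu) * x_mag  # same as np.maximum(x_mag - mu, 0)
```

Bins where the noisy magnitude does not exceed the bias come out as zero, and the remaining bin is lowered by exactly the bias. `averaged_magnitude` can be applied to a stack of frame magnitudes first when local averaging over M frames is wanted.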
The input-output relationship between $X(e^{j\omega})$ and $\hat{S}(e^{j\omega})$ at each frequency $\omega$ is shown in Fig. 1.

Fig. 1. Input-output relation between $X(e^{j\omega})$ and $\hat{S}(e^{j\omega})$.

Thus, the effect of half-wave rectification is to bias down the magnitude spectrum at each frequency $\omega$ by the noise bias determined at that frequency. The bias value can, of course, change from frequency to frequency as well as from analysis time window to time window. The advantage of half rectification is that the noise floor is reduced by $\mu(e^{j\omega})$. Also, any low variance coherent noise tones are essentially eliminated. The disadvantage of half rectification can exhibit itself in the situation where the sum of the noise plus speech at a frequency $\omega$ is less than $\mu(e^{j\omega})$. Then the speech information at that frequency is incorrectly removed, implying a possible decrease in intelligibility. As discussed in the section on "Results," for the helicopter speech data base this processing did not reduce intelligibility as measured using the DRT.

G. Residual Noise Reduction
After half-wave rectification, speech plus noise lying above $\mu$ remain. In the absence of speech activity the difference $N_R = N - \mu e^{j\theta_n}$, which shall be called the noise residual, will for uncorrelated noise exhibit itself in the spectrum as randomly spaced narrow bands of magnitude spikes (see Fig. 7). This noise residual will have a magnitude between zero and a maximum value measured during nonspeech activity. Transformed back to the time domain, the noise residual will sound like the sum of tone generators with random fundamental frequencies which are turned on and off at a rate of about 20 ms. During speech activity the noise residual will also be perceived at those frequencies which are not masked by the speech.
The audible effects of the noise residual can be reduced by taking advantage of its frame-to-frame randomness. Specifically, at a given frequency bin, since the noise residual will randomly fluctuate in amplitude at each analysis frame, it can be suppressed by replacing its current value with its minimum value chosen from the adjacent analysis frames. Taking the minimum value is used only when the magnitude of $\hat{S}(e^{j\omega})$ is less than the maximum noise residual calculated during nonspeech activity. The motivation behind this replacement scheme is threefold: first, if the amplitude of $\hat{S}(e^{j\omega})$ lies below the maximum noise residual, and it varies radically from analysis frame to frame, then there is a high probability that the spectrum at that frequency is due to noise; therefore, suppress it by taking the minimum; second, if $\hat{S}(e^{j\omega})$ lies below the maximum but has a nearly constant value, there is a high probability that the spectrum at that frequency is due to low energy speech; therefore, taking the minimum will retain the information; and third, if $\hat{S}(e^{j\omega})$ is greater than the maximum, there is speech present at that frequency; therefore, removing the bias is sufficient. The amount of noise reduction using this replacement scheme was judged equivalent to that obtained by averaging over three frames. However, with this approach high energy frequency bins are not averaged together. The disadvantage to the scheme is that more storage is required to save the maximum noise residuals and the magnitude values for three adjacent frames.
The residual noise reduction scheme is implemented as

$$|\hat{S}_i(e^{j\omega})| = \begin{cases} |\hat{S}_i(e^{j\omega})|, & |\hat{S}_i(e^{j\omega})| \ge \max |N_R(e^{j\omega})| \\ \min \{ |\hat{S}_j(e^{j\omega})| : j = i-1, i, i+1 \}, & |\hat{S}_i(e^{j\omega})| < \max |N_R(e^{j\omega})| \end{cases}$$

where

$$\hat{S}_i(e^{j\omega}) = H_R(e^{j\omega}) X_i(e^{j\omega})$$

and $\max |N_R(e^{j\omega})|$ = maximum value of noise residual measured during nonspeech activity.

H. Additional Signal Attenuation During Nonspeech Activity
The energy content of $\hat{S}(e^{j\omega})$ relative to $\mu(e^{j\omega})$ provides an accurate indicator of the presence of speech activity within a given analysis frame. If speech activity is absent, then $\hat{S}(e^{j\omega})$ will consist of the noise residual which remains after half-wave rectification and minimum value selection. Empirically, it was determined that the average (before versus after) power ratio was down at least 12 dB. This implied a measure for detecting the absence of speech given by

$$T = 20 \log_{10} \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| \frac{\hat{S}(e^{j\omega})}{\mu(e^{j\omega})} \right| d\omega \right].$$

If $T$ was less than -12 dB, the frame was classified as having no speech activity. During the absence of speech activity there are at least three options prior to resynthesis: do nothing, attenuate the output by a fixed factor, or set the output to zero. Having some signal present during nonspeech activity was judged to give the higher quality result. A possible reason for this is that noise present during speech activity is partially masked by the speech. Its perceived magnitude should be balanced by the presence of the same amount of noise during nonspeech activity. Setting the buffer to zero had the effect of amplifying the noise during speech activity. Likewise, doing nothing had the effect of amplifying the noise during nonspeech activity. A reasonable, though by no means optimum amount of attenuation was found to be -30 dB. Thus, the output spectral estimate including output attenuation during nonspeech activity is given by

$$\hat{S}(e^{j\omega}) = \begin{cases} \hat{S}(e^{j\omega}), & T \ge -12 \text{ dB} \\ c X(e^{j\omega}), & T < -12 \text{ dB} \end{cases}$$

where $20 \log_{10} c = -30$ dB.
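The nonspeech decision and attenuation rule can be sketched for a single analysis frame as follows. This is an illustrative sketch under stated assumptions: the detection measure is approximated here by the mean over DFT bins of the ratio of estimated magnitude to noise bias, expressed in dB, and the function name, argument names, and small numerical floors are hypothetical. The -12 dB threshold and -30 dB attenuation are the values quoted in the text.

```python
import numpy as np

def attenuate_if_nonspeech(s_hat_mag, x_mag, mu, threshold_db=-12.0, atten_db=-30.0):
    """Return (output magnitude, speech flag) for one analysis frame."""
    # Assumed discrete detection measure: mean over bins of |S_hat| / mu, in dB.
    ratio = np.mean(s_hat_mag / np.maximum(mu, 1e-12))
    t_db = 20.0 * np.log10(max(ratio, 1e-12))
    if t_db < threshold_db:
        # No speech detected: pass the noisy input attenuated by c,
        # where 20 * log10(c) = atten_db.
        c = 10.0 ** (atten_db / 20.0)
        return c * x_mag, False
    # Speech detected: keep the spectral subtraction estimate unchanged.
    return s_hat_mag, True
```

Returning an attenuated copy of the noisy input, rather than zeros, follows the observation in the text that some signal during nonspeech frames gave the higher quality result.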
III. ALGORITHM IMPLEMENTATION
A. Introduction
Based on the development of the last section, a complete analysis-synthesis algorithm can be constructed. This section presents the specifications required to implement a spectral subtraction noise suppression system.

B. Input-Output Data Buffering and Windowing
Speech from the A-D converter is segmented and windowed such that in the absence of spectral modifications, if the synthesis speech segments are added together, the resulting overall system reduces to an identity. The data are segmented and windowed using the result [12] that if a sequence is separated into half-overlapped data buffers, and each buffer is multiplied by a Hanning window, then the sum of these windowed sequences adds back up to the original sequence. The window length is chosen to be approximately twice as large as the maximum expected pitch period for adequate frequency resolution [13]. For the sampling rate of 8.00 kHz a window length of 256 points shifted in steps of 128 points was used. Fig. 2 shows the data segmentation and advance.

Fig. 2. Data segmentation and advance.

C. Frequency Analysis
The DFT of each data window is taken and the magnitude is computed. Since real data are being transformed, two data windows can be transformed using one FFT [14]. The FFT size is set equal to the window size of 256. Augmentation with zeros was not incorporated. As correctly noted by Allen [15], spectral modification followed by inverse transforming can distort the time waveform due to temporal aliasing caused by circular convolution with the time response of the modification. Augmenting the input time waveform with zeros before spectral modification will minimize this aliasing. Experiments with and without augmentation using the helicopter speech resulted in negligible differences, and therefore augmentation was not incorporated. Finally, since real data are analyzed, transform symmetries were taken advantage of to reduce storage requirements essentially in half [14].

D. Magnitude Averaging
As was described in the previous section, the variance of the noise spectral estimate is reduced by averaging over as many spectral magnitude sets as possible. However, the nonstationarity of the speech limits the total time interval available for local averaging. The number of averages is limited by the number of analysis windows which can be fit into the stationary speech time interval. The choice of window length and averaging interval must compromise between conflicting requirements. For acceptable spectral resolution a window length greater than twice the expected largest pitch period is required with a 256-point window being used. For minimum noise variance a large number of windows are required for averaging. Finally, for acceptable time resolution a narrow analysis interval is required. A reasonable compromise between variance reduction and time resolution appears to be three averages. This results in an effective analysis time interval of 38 ms.

E. Bias Estimation
The spectral subtraction method requires an estimate at each frequency bin of the expected value of noise magnitude spectrum $\mu_N$:

$$\mu_N = E\{|N|\}.$$

This estimate is obtained by averaging the signal magnitude spectrum $|X|$ during nonspeech activity. Estimating $\mu_N$ in this manner places certain constraints when implementing the method. If the noise remains stationary during the subsequent speech activity, then an initial startup or calibration period of noise-only signal is required. During this period (on the order of a third of a second) an estimate of $\mu_N$ can be computed. If the noise environment is nonstationary, then a new estimate of $\mu_N$ must be calculated prior to bias removal each time the noise spectrum changes. Since the estimate is computed using the noise-only signal during nonspeech activity, a voice switch is required. When the voice switch is off, an average noise spectrum can be recomputed. If the noise magnitude spectrum is changing faster than an estimate of it can be computed, then time averaging to estimate $\mu_N$ cannot be used. Likewise, if the expected value of the noise spectrum changes after an estimate of it has been computed, then noise reduction through bias removal will be less effective or even harmful, i.e., removing speech where little noise is present.
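A minimal sketch of the bias estimate, assuming the frame magnitude spectra and the voice-switch decisions are already available as arrays: the noise bias per frequency bin is the mean of $|X|$ over the frames flagged as noise-only. The function name `estimate_noise_bias` and its arguments are hypothetical. With 256-point windows advanced by 128 points at 8 kHz, a third of a second of noise-only signal corresponds to roughly 20 such frames.

```python
import numpy as np

def estimate_noise_bias(frame_mags, voice_switch_on):
    """Average |X| over noise-only frames.

    frame_mags      : array of shape (n_frames, n_bins) holding |X| per frame
    voice_switch_on : per-frame booleans, True where speech was detected
    """
    speech = np.asarray(voice_switch_on, dtype=bool)
    noise_only = frame_mags[~speech]
    if noise_only.shape[0] == 0:
        raise ValueError("no noise-only frames available for bias estimation")
    # Sample mean per frequency bin approximates mu_N = E{|N|}.
    return noise_only.mean(axis=0)
```

Calling the function again on the most recent noise-only frames gives the re-estimation needed when the noise environment changes between utterances.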
F. Bias Removal and Half-Wave Rectification
The spectral subtraction spectral estimate $\hat{S}$ is obtained by subtracting the expected noise magnitude spectrum $\mu$ from the magnitude signal spectrum $|X|$. Thus

$$|\hat{S}(k)| = |X(k)| - \mu(k), \qquad k = 0, 1, \ldots, L-1$$

where $L$ = DFT buffer length. After subtracting, the differenced values having negative magnitudes are set to zero (half-wave rectification). These negative differences represent frequencies where the sum of speech plus local noise is less than the expected noise.
G. Residual Noise Reduction
As discussed in the previous section, the noise that remains after the mean is removed can be suppressed or even removed by selecting the minimum magnitude value from the three adjacent analysis frames in each frequency bin where the current amplitude is less than the maximum noise residual measured during nonspeech activity. This replacement procedure follows bias removal and half-wave rectification. Since the minimum is chosen from values on each side of the current time frame, the modification induces a one frame delay. The improvement in performance was judged superior to three frame averaging in that an equivalent amount of noise suppression resulted without the adverse effect of high-energy spectral smoothing. The following section presents examples of spectra with and without residual noise reduction.
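The minimum-selection rule can be sketched on a block of already rectified magnitude spectra as below. The array layout (frames along the first axis), the function name, and the decision to leave the first and last frames untouched are assumptions for illustration; a streaming version would instead hold one frame back, which is the one frame delay mentioned above.

```python
import numpy as np

def reduce_noise_residual(s_mag, max_residual):
    """Replace small magnitudes by the minimum over three adjacent frames.

    s_mag        : float array (n_frames, n_bins), after bias removal and rectification
    max_residual : array (n_bins,), maximum noise residual seen during nonspeech
    """
    out = s_mag.copy()
    # Minimum over frames i-1, i, i+1 for every interior frame i.
    neighbourhood_min = np.minimum(np.minimum(s_mag[:-2], s_mag[1:-1]), s_mag[2:])
    # Only bins below the maximum noise residual are replaced.
    below = s_mag[1:-1] < max_residual
    out[1:-1] = np.where(below, neighbourhood_min, s_mag[1:-1])
    return out
```

Bins at or above the maximum residual pass through unchanged, so high-energy speech bins are not smoothed across frames.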
H. Additional Noise Suppression During Nonspeech Activity
The final improvement in noise reduction is signal suppression during nonspeech activity. As was discussed, a balance must be maintained between the magnitude and characteristics of the noise that is perceived during speech activity and the noise that is perceived during speech absence. An effective speech activity detector was defined using spectra generated by the spectral subtraction algorithm. This detector required the determination of a threshold signaling absence of speech activity. This threshold ($T = -12$ dB) was empirically determined to ensure that only signals definitely consisting of background noise would be attenuated.

Fig. 3. System block diagram. (Legible block labels: Hanning Window; Half-Wave Rectify; Reduce Noise Residual.)
I. Synthesis
After bias removal, rectification, residual noise removal, and nonspeech signal suppression a time waveform is reconstructed from the modified magnitude corresponding to the center window. Again, since only real data are generated, two time windows are computed simultaneously using one inverse FFT. The data windows are then overlap added to form the output speech sequence. The overall system block diagram is given in Fig. 3.

IV. RESULTS
A. Introduction
Examples of the performance of spectral subtraction will be presented in two forms: isometric plots of time versus frequency magnitude spectra, with and without noise cancellation; and intelligibility and quality measurement obtained from the Diagnostic Rhyme Test (DRT) [11]. The DRT is a well-established method for evaluating speech processing devices. Testing and scoring of the DRT data base was provided by Dynastat Inc. [16]. A limited single speaker DRT test was used. The DRT data base consisted of 192 words using speaker RH recorded in a helicopter environment. A crew of 8 listeners was used.
The results are presented as follows: 1) short-time amplitude spectra of helicopter speech; 2) DRT intelligibility and quality scores on LPC vocoded speech using as input the data given in 2); and 3) short-time spectra showing additional improvements in noise rejection through residual noise suppression and nonspeech signal attenuation.

B. Short-Time Spectra of Helicopter Speech
Isometric plots of time versus frequency magnitude spectra were constructed from the data by computing and displaying magnitude spectra from 64 overlapped Hanning windows. Each line represents a 128-point frequency analysis. Time increases from bottom to top and frequency from left to right.
A 920 ms section of speech recorded with a noise-cancelling microphone in a helicopter environment is presented. The phrase "Save your" was filtered at 3.2 kHz and sampled at 6.67 kHz. Since the noise was acoustically added, no underlying clean speech signal is available. Fig. 4 shows the digitized time signal. Fig. 5 shows the average noise magnitude spectrum computed by averaging over the first 300 ms of nonspeech activity. The short-time spectrum of the noisy signal $x$ is shown in Fig. 6. Note the high amplitude, narrow-band ridges corresponding to the fundamental (1550 Hz) and first harmonic (3100 Hz) of the helicopter engine, as well as the ramped noise floor above 1800 Hz. Fig. 7 shows the result from bias removal and rectification. Figs. 8 and 9 show the noisy spectrum and the spectral subtraction estimate using three frame averaging.
These figures indicate that considerable noise rejection has been achieved, although some noise residual remains. The next step was to quantitatively measure the effect of spectral subtraction on intelligibility and quality. For this task a limited single speaker DRT was invoked to establish an anchor point for credibility.

Fig. 4. Time waveform of helicopter speech. "Save your."
Fig. 5. Average noise magnitude of helicopter noise.
Fig. 6. Short-time spectrum of helicopter speech.
Fig. 7. Short-time spectrum using bias removal and half-wave rectification.
Fig. 8. Short-time spectrum of helicopter speech using three frame averaging.
Fig. 9. Short-time spectrum using bias removal and half-wave rectification after three frame averaging.
C. Intelligibility and Quality Results Using the DRT
The DRT data base consisted of 192 words recorded in a helicopter environment. The data base was filtered at 4 kHz and sampled at 8 kHz. During the pause between each word, the noise bias was updated. Six output speech files were generated: 1) digitized original; 2) speech resulting from bias removal and rectification without averaging; 3) speech resulting from bias removal and rectification using three averages; 4) an LPC vocoded version of original speech; 5) an LPC vocoded version of 2); and 6) an LPC vocoded version of 3). The last three experiments were conducted to measure intelligibility and quality improvements resulting from the use of spectral subtraction as a preprocessor to an LPC analysis-synthesis device. The LPC vocoder used was a nonreal-time floating-point implementation [17]. A ten-pole autocorrelation implementation was used with a SIFT pitch tracker [18]. The channel parameters used for synthesis were not quantized. Thus, any degradation would not be attributed to parameter quantization, but rather to the all-pole approximation to the spectrum and to the buzz-hiss approximation to the error signal. In addition, a frame rate of 40 frames/s was used which is typical of 2400 bit/s implementations. The vocoder on 3.2 kHz filtered clean speech achieved a DRT score of 88.
In addition to intelligibility, a coarse measure of quality [19] was conducted using the same DRT data base. These quality scores are neither quantitatively nor qualitatively equivalent to the more rigorous quality tests such as PARM or DAM [20]. However, they do indicate on a relative scale improvements between data sets. Modern 2.4 kbit/s systems are expected to range from 45 to 50 on composite acceptability; unprocessed speech, 88-92.
The results of the tests are summarized in Tables I-IV. Tables I and II indicate that spectral subtraction alone does not decrease intelligibility, but does increase quality, especially in the areas of increased pleasantness and inconspicuousness of noise background. Tables III and IV clearly indicate that spectral subtraction can be used to improve the intelligibility and quality of speech processed through an LPC bandwidth compression device.

D. Short-Time Spectra Using Residual Noise Reduction and Nonspeech Signal Attenuation
Based on the promising results of these preliminary DRT experiments, the algorithm was modified to incorporate residual noise reduction and nonspeech signal attenuation. Fig. 10 shows the short-time spectra using the helicopter speech data with both modifications added. Note that now noise between words has been reduced below the resolution of the graph, and noise within the words has been significantly attenuated (compare with Fig. 7).

TABLE I
DIAGNOSTIC RHYME TEST SCORES
Columns: Original; $\hat{S}$ (No Average); $\hat{S}$ (Three Averages)
Rows: Voicing; Nasality; Sustention; Sibilation; Graveness; Compactness; Total
Values in extraction order (cell positions not recoverable): 95, 82, 92, 75, 92, 91, 77, 78, 87, 86, 83, 70, 87, 83, 84, 66, 68, 88, 84, 88, 82

TABLE II
QUALITY RATINGS
Columns: Original; $\hat{S}$ (No Average); $\hat{S}$ (Three Averages)
Rows: Naturalness of Signal; Inconspicuousness of Background; Intelligibility; Pleasantness; Overall Acceptability; Composite Acceptability
Values in extraction order (cell positions not recoverable): 60, 38, 32, 31, 33, 32, 63, 36, 61, 42, 30, 20, 27, 33, 25, 29, 29, 26

TABLE III
DIAGNOSTIC RHYME TEST SCORES
Columns: LPC on Original; LPC on $\hat{S}$ without averaging; LPC on $\hat{S}$ with averaging
Rows: Voicing; Nasality; Sustention; Sibilation; Graveness; Compactness; Total
Values in extraction order (cell positions not recoverable): 90, 86, 52, 84, 56, 63, 52, 70, 49, 61, 61, 83, 56, 88, 59, 93, 72, 62, 83, 70, 66

TABLE IV
QUALITY RATINGS
Columns: LPC on Original; LPC on $\hat{S}$ without averaging; LPC on $\hat{S}$ with averaging
Rows: Naturalness of Signal; Inconspicuousness of Background; Intelligibility; Pleasantness; Overall Acceptability; Composite Acceptability
Values in extraction order (cell positions not recoverable): 53, 58, 39, 26, 49, 36, 30, 28, 26, 34, 28, 15, 24, 23, 20, 26, 25, 29
Fig. 10. Short-time spectrum using bias removal, half-wave rectification, residual noise reduction, and nonspeech signal attenuation (helicopter speech).

V. SUMMARY AND CONCLUSIONS
A preprocessing noise suppression algorithm using spectral subtraction has been developed, implemented, and tested. Spectral estimates for the background noise were obtained from the input signal during nonspeech activity. The algorithm can be implemented using a single microphone source and requires about the same computation as a high-speed convolution. Its performance was demonstrated using short-time spectra with and without noise suppression and quantitatively tested for improvements in intelligibility and quality using the Diagnostic Rhyme Test conducted by Dynastat Inc. Results indicate overall significant improvements in quality and intelligibility when used as a preprocessor to an LPC speech analysis-synthesis vocoder.

ACKNOWLEDGMENT
The author would like to thank K. Evans for preparation of the manuscript and M. Milochik for preparation of the photographs.

REFERENCES
[1] B. Gold, "Robust speech processing," M.I.T. Lincoln Lab., Tech. Note 1976-6, DDC AD-A012 P99/0, Jan. 27, 1976.
[2] M. R. Sambur and N. S. Jayant, "LPC analysis/synthesis from speech inputs containing quantizing noise or additive white noise," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 488-494, Dec. 1976.
[3] D. Coulter, private communication.
[4] S. F. Boll, "Improving linear prediction analysis of noisy speech by predictive noise cancellation," in Proc. IEEE Int. Conf. on Acoust., Speech, Signal Processing, Philadelphia, PA, Apr. 12-14, 1976, pp. 10-13.
[5] J. S. Lim and A. V. Oppenheim, "All pole modeling of degraded speech," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-26, pp. 197-210, June 1978.
[6] B. Gold, "Digital speech networks," Proc. IEEE, vol. 65, pp. 1636-1658, Dec. 1977.
[7] B. Beek, E. P. Neuberg, and D. C. Hodge, "An assessment of the technology of automatic speech recognition for military applications," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-25, pp. 310-322, Aug. 1977.
[8] J. D. Markel, "Text independent speaker identification from a large linguistically unconstrained time-spaced data base," in Proc. IEEE Int. Conf. on Acoust., Speech, Signal Processing, Tulsa, OK, Apr. 1978, pp. 287-291.
[9] B. Widrow et al., "Adaptive noise cancelling: Principles and applications," Proc. IEEE, vol. 63, pp. 1692-1716, Dec. 1975.
[10] S. F. Boll and D. Pulsipher, "Noise suppression methods for robust speech processing," Dep. Comput. Sci., Univ. Utah, Salt Lake City, Semi-Annu. Tech. Rep., Utec-CSc-77-202, pp. 50-54, Oct. 1977.
[11] W. D. Voiers, A. D. Sharpley, and C. H. Helmsath, "Research on diagnostic evaluation of speech intelligibility," AFSC, Final Rep., Contract AF19628-70-C-0182, 1973.
[12] M. R. Weiss, E. Aschkenasy, and T. W. Parsons, "Study and development of the INTEL technique for improving speech intelligibility," Nicolet Scientific Corp., Final Rep. NSC-FR/4023, Dec. 1974.
[13] J. Makhoul and J. Wolf, "Linear prediction and the spectral analysis of speech," Bolt, Beranek, and Newman Inc., BBN Rep. 2304, NTIS No. AD-749066, pp. 172-185, 1972.
[14] O. Brigham, The Fast Fourier Transform. Englewood Cliffs, NJ: Prentice-Hall, 1974.
[15] J. Allen, "Short term spectral analysis, synthesis, and modification by discrete Fourier transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-25, pp. 235-239, June 1977.
[16] Dynastat Inc., Austin, TX 78705.
[17] S. F. Boll, "Selected methods for improving synthesis speech quality using linear predictive coding: System description, coefficient smoothing and STREAK," Dep. Comput. Sci., Univ. Utah, Salt Lake City, UTEC-CS-74-151, Nov. 1974.
[18] J. D. Markel and A. H. Gray, Linear Prediction of Speech. New York: Springer-Verlag, 1976, pp. 206-210.
[19] In-house research, Dynastat Inc., Austin, TX.
[20] W. D. Voiers, "Diagnostic acceptability measure for speech communication systems," in Proc. Int. Conf. on Acoust., Speech, Signal Processing, Hartford, CT, May 1977, pp. 204-207.
[21] S. F. Boll, "Suppression of noise in speech using the SABER method," in Proc. IEEE Int. Conf. on Acoust., Speech, Signal Processing, Tulsa, OK, Apr. 1978, pp. 606-609.
