Creating Clarity in Noisy Environments by Using Deep Learning in Hearing Aids

Asger Heidemann Andersen, Ph.D.,1 Lorenz Fiedler, Ph.D.,2 Jesper Jensen, Ph.D.,1 and Thomas Behrens, M.Sc.1
1 Oticon A/S, Smørum, Denmark; 2 Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark.
Address for correspondence: e-mail: aand@demant.com.
Hearing Aid Technology to Improve Speech Intelligibility in Noise; Guest Editor, Joshua M. Alexander, Ph.D.
Semin Hear 2021;42:260–281. © 2021. The Author(s). DOI: https://doi.org/10.1055/s-0041-1735134. ISSN 0734-0451.
This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Hearing aids are often misconceived as being simple amplifiers of sound. While this may have been true in the past, modern hearing aids use a vast array of technologies to help the user perceive their surroundings. One of these technologies, which particularly finds its usefulness in the most challenging and noisy environments, is the noise reduction system.

The primary “medicine” administered by a hearing aid is hearing loss compensation. This applies frequency-dependent gain, derived from the user’s pure-tone thresholds, and dynamic range compression to ensure that soft sounds are amplified enough to be audible without loud sounds being amplified so much as to cause discomfort or pain. However, despite such compensation, many users still report difficulty coping with noisy environments.1,2 This suggests that the effects of hearing loss cannot simply be compensated away through the use of amplification.

While the origins of sensorineural hearing loss are complicated and incompletely understood, psychophysical experiments have revealed a range of deficits in the impaired hearing system that are not related to a simple loss of sensitivity. These include the following3:

- Frequency spread of masking. Noise present in one frequency region may spread over a broader range to disturb sounds in nearby frequency regions. This spread is more extensive for hearing-impaired listeners.
- Temporal spread of masking. Noise bursts may mask following sounds. The duration across which this effect is present tends to be longer for hearing-impaired listeners.
- Reduced ability to use spatial cues. This deficit reduces the ability to localize sound sources and the ability to improve speech understanding in noise via spatially selective attention.

The aforementioned deficits, which cannot be compensated by gain or compression, can make speech intelligibility in noisy environments worse. Therefore, hearing loss is often modeled as the sum of an attenuation component that can be compensated by amplification and additional distortion components that cannot.4,5

To reduce the impact of the deficits mentioned earlier and to make challenging listening environments more accessible to the user, modern hearing aids apply noise reduction algorithms. These tackle the difficulty of noisy environments directly by attempting to reduce distracting background noise without removing target sounds such as speech.

This article provides the reader with an understanding of the techniques used for reducing unwanted environmental noise in hearing aids. The focus will be on building intuition rather than on providing complete mathematical detail. Section 1 describes the typical structure of a noise reduction system as employed in a hearing aid. Such a system primarily comprises an adaptive beamformer, which removes noise by adapting the directional response of the hearing aid, coupled with a postfilter, which removes noise by applying time- and frequency-dependent attenuation to the signal. Section 2 describes how deep learning, a subdiscipline of artificial intelligence, is currently making completely new approaches for noise reduction available. After building basic intuition about the principle of deep learning, it is described how a neural network can be trained to replace the postfilter in a noise reduction system. This is shown to give rise to considerable improvements in noise reduction performance. Section 3 is a brief comment on the importance of using an automatic system to regulate the noise reduction system. Section 4 presents results from a selection of measurements and clinical studies that highlight the importance and continued improvement of noise reduction technology. Section 5 concludes upon the findings.

1: THE PRINCIPLES OF NOISE REDUCTION
This section provides an intuitive description of the core principles used for noise reduction in hearing aids. Fig. 1 shows the main components involved in such a noise reduction system. Two separate—but highly co-dependent—methods are used to reduce noise:

- Beamforming utilizes the fact that modern hearing aids most often have multiple microphones to amplify or suppress sounds depending on the direction from which they originate. This principle may also be referred to as directionality, directional processing, or spatial processing.
- Postfiltering aims at suppressing any leftover noise from the beamforming process. It does so by attenuating time–frequency regions that are dominated by leftover noise. Postfiltering is closely related to single-channel noise reduction.
Figure 1 An overview of the components used in the noise reduction system of a typical modern hearing
aid. The signals from two microphones are converted to a time–frequency representation using separate
analysis filterbanks (AFBs). An adaptive beamformer controls the directional response of the system by
applying variable gains and time delays to one of the two signals before these are summed together. A
postfilter computes a time- and frequency-dependent gain which is applied to the signal before a synthesis
filterbank (SFB) converts the time–frequency representation of the signal back to an audio waveform.
Note that the term noise reduction is used here to refer to the joint use of these two principles, whereas some authors denote only postfiltering as noise reduction or single-channel noise reduction.

Here, the necessary concept of filterbanks is covered briefly (Section 1.1). Then beamforming (Section 1.2) and postfiltering (Section 1.3) are covered separately. Lastly, Section 1.4 comments on the strong integration between beamforming and postfiltering, both in theory and practice.

1.1: Analysis and Synthesis Filterbanks
The human auditory system has an amazing ability to discern different frequencies contained in audio signals.6 Similarly, hearing aids can benefit from separately processing different frequency bands. The frequencies contained in an audio signal are, however, not readily visible from the raw audio waveform. This makes the raw audio waveform difficult to work with in practice. Hearing aids, therefore, employ an analysis filterbank to split the input signal into short overlapping time segments and analyze the frequency content of these. This results in a signal representation that is closely related to a spectrogram. Most processing (e.g., beamforming and postfiltering) is conveniently performed on this signal representation. When the signal has been processed, a synthesis filterbank converts the signal back to an audio waveform by resynthesizing overlapping wave segments and combining them. The principle of analysis, processing, and synthesis is illustrated in Fig. 2.

1.2: Beamforming
Modern hearing aids typically have two microphones mounted with a distance of approximately 6 to 12 mm, depending on the hearing aid style and brand. Depending on the direction of the impinging sound, it may arrive at one microphone slightly before the other. While this time difference is tiny (at most 35 microseconds), it holds valuable information about the direction of the sound. For instance, as Fig. 3 illustrates, if the two microphone outputs are simply summed together, the amplitude of the resulting signal depends greatly on the direction from which the sound arrived. This suggests that by simply summing the microphone signals, one can perform filtering in space: signals from certain directions can be suppressed completely, while signals from other directions can pass through unaltered.
Figure 2 First, an analysis filterbank reveals the frequency structure inherent in an audio waveform of
speech. Processing is performed in this representation, after which a synthesis filterbank is used to transform
the result back to an audio waveform.
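The analysis–processing–synthesis chain of Fig. 2 corresponds closely to a short-time Fourier transform (STFT) and its inverse. As a minimal sketch (not the filterbank of any particular hearing aid; all parameters are illustrative), the following Python snippet splits a waveform into overlapping windowed segments, applies a per-band gain, and resynthesizes the result by overlap-add:

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000                           # sample rate (Hz), illustrative
x = np.random.randn(fs)              # stand-in for one second of recorded audio

# Analysis filterbank: overlapping windowed segments -> time-frequency bins
f, t, X = stft(x, fs=fs, nperseg=128, noverlap=64)

# Processing: a time- and frequency-dependent gain (identity here; a real
# postfilter would compute these gains from the signal itself)
G = np.ones_like(np.abs(X))
Y = G * X

# Synthesis filterbank: overlap-add resynthesis back to an audio waveform
_, y = istft(Y, fs=fs, nperseg=128, noverlap=64)
```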
Figure 3 The physical principle utilized in beamforming. (a) A single-tone signal impinging on a pair of
microphones at an angle of 90 degrees relative to the axis of the microphones. The oscillations are picked up
simultaneously by the microphones, resulting in signals that are in phase. When the two signals are summed,
they add constructively to form a signal with twice the individual amplitude. (b) The signal impinges from a
larger angle. Because of this, the sound arrives slightly earlier at the rear microphone compared with the front
microphone. This causes the two signals to be out of phase. When summed, the signals cancel due to
destructive interference.
Figure 4 Showing how the principle illustrated in Fig. 3 can be controlled. The two microphones pick up
signals that are not in phase and do not have the same amplitude. By applying a time delay and a gain to one
of the signals, these differences are removed. The resulting signals sum constructively to a signal with twice
the amplitude, even though the signals picked up by the microphones would not have.
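To make the principle of Fig. 4 concrete, here is a tiny numerical sketch (assumed, illustrative numbers): one microphone signal is a delayed and attenuated copy of the other, and applying a compensating gain and time shift before summation restores constructive interference:

```python
import numpy as np

fs, f0 = 48000, 1000                   # sample rate and tone frequency (Hz)
n = np.arange(1024)
delay, gain = 12, 0.8                  # mismatch between the two mics

front = np.sin(2 * np.pi * f0 * n / fs)
rear = gain * np.sin(2 * np.pi * f0 * (n - delay) / fs)  # delayed, attenuated copy

aligned = (1 / gain) * np.roll(rear, -delay)  # compensate gain and delay
print(np.max(front + rear), np.max(front + aligned))     # ~1.3 vs ~2.0
```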
A beamformer controls this phenomenon by applying additional gains and time shifts to one or both of the signals before summing them together. These parameters can be determined mathematically to ensure that sounds from specific directions are attenuated while sounds from other directions remain unaltered (see Fig. 4).

Beamforming allows an enormous degree of flexibility for continuously reconfiguring the directional properties of the hearing aid according to the current listening environment or the desired focus of the user. Hearing aids may offer a range of fixed directional patterns as well as adaptive directional patterns that change continuously to suit the environmental characteristics.

1.2.1: FIXED BEAMFORMING
By determining appropriate fixed values for the delay and gain parameters applied in Fig. 4, it is possible to produce a range of static directional patterns, examples of which are shown in Fig. 5. The most straightforward of these is the omnidirectional response, which is produced by a single microphone, that is, by applying a gain of 0 (−∞ dB) to the other microphone. The omnidirectional pattern has the same sensitivity to all impinging sounds. It is typically preferred in environments where background noise is not an issue because it maintains the natural balance of the listening environment. The remaining patterns are left–right symmetric and have at most two spatial nulls, which are directions where sound is completely canceled. The dipole cancels sound from the sides while passing sound from the front and rear. The cardioid cancels sound from behind, making it particularly useful in listening environments where the target is located in the front and noise in the back. The hypercardioid has nulls placed at ±109 degrees and provides the highest possible amount of noise reduction, assuming that the target is located in the front and the noise comes evenly from all directions (i.e., a spherically diffuse noise field). Please refer to Elko7 for a thorough overview of the properties of various directional patterns.

The patterns shown in Fig. 5 assume free field acoustics and thus neglect the acoustic influence of the hearing aid shell and the user’s head and body. The user’s head has a considerable influence on the directional pattern that is actually realized, making it less symmetric by attenuating sounds coming from the opposite side of the head (see Fig. 6i in the article by Derleth et al in this issue for an example of this phenomenon).

1.2.2: ADAPTIVE BEAMFORMING
Fixed beamformers force the user to either listen with the same directional pattern in all listening environments or make a conscious effort to change programs whenever a different directional response is desired. A less manual approach is to automatically adapt the beamformer parameters to minimize background noise across changing listening environments. Modern hearing aids tend to include at least some degree of adaptive beamforming, even in their default configurations.
Figure 5 Examples of directional responses that can be achieved using the described principles of
beamforming. The plots show the attenuation of sounds reaching the hearing aid depending on the angle of
arrival in the horizontal plane.
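Patterns like those in Fig. 5 can be produced by a first-order differential (delay-and-subtract) two-microphone structure. The sketch below is illustrative rather than a hearing aid implementation; it evaluates the free-field response versus the angle of arrival and confirms where each pattern places its null (an internal delay of 0 for the dipole, d/c for the cardioid, and roughly 0.33·d/c for the hypercardioid with nulls near ±109 degrees):

```python
import numpy as np

d, c, f = 0.010, 343.0, 1000.0            # mic spacing (m), speed of sound (m/s), frequency (Hz)
w = 2 * np.pi * f
theta = np.radians(np.arange(0, 181))     # angle of arrival, 0 = front

def pattern(tau_internal):
    # The rear signal is delayed by the acoustic travel time (d/c)*cos(theta);
    # the beamformer delays it further by tau_internal and subtracts it.
    tau = tau_internal + (d / c) * np.cos(theta)
    return np.abs(1 - np.exp(-1j * w * tau))

dipole = pattern(0.0)                     # nulls at +/-90 degrees
cardioid = pattern(d / c)                 # null at 180 degrees
hypercardioid = pattern(0.33 * d / c)     # nulls near +/-109 degrees

for p in (dipole, cardioid, hypercardioid):
    print(np.degrees(theta[np.argmin(p)]))  # angle of the null in the rear half-plane
```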
A common approach for adaptive beamforming is the adaptive minimum variance distortionless response (MVDR) beamformer.8,9 This collects statistics about the listening environment to derive beamformer parameters that (1) attenuate the total received sound as much as possible (i.e., achieve minimum variance), while (2) ensuring that sounds from the target direction are not attenuated or amplified (i.e., achieving a distortionless response toward the target). The target direction must be estimated separately or simply assumed to be directly in front of the user. Fig. 6 shows several examples of directional patterns arising from the use of an MVDR beamformer for different configurations of noise sources. The top left example shows how the MVDR beamformer can completely cancel a single noise source by placing a null in that direction. The bottom left example shows how a group of noise sources can be attenuated by placing a null in the middle of them. The bottom right example shows the pattern that arises when several noise sources are distributed uniformly around the user.

The top right plot in Fig. 6 shows that, while the MVDR beamformer guarantees 0 dB gain in the target direction, it may actually amplify signals from other directions. Note, however, that this has no impact in this particular example since neither target nor noise is located in the directions with positive gain.
Figure 6 Examples of directional responses achieved with an adaptive MVDR beamformer for different
configurations of target and noise. In all four examples, the target is located in front of the user (0˚), while one
or more noise sources are located at directions indicated by the dots near the perimeter of the plots.
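The MVDR recipe can be stated in one line of linear algebra: w = R^-1 d / (d^H R^-1 d), where R is the noise covariance matrix and d the steering vector toward the target. The sketch below (a simulated two-microphone free-field example, not statistics estimated from real hearing aid signals) demonstrates both the distortionless constraint and the null steered toward an interferer:

```python
import numpy as np

d, c, f = 0.010, 343.0, 1000.0                 # mic spacing (m), speed of sound, band frequency
w = 2 * np.pi * f

def steer(theta_deg):
    """Relative phase of a plane wave at the two mics for a given arrival angle."""
    tau = (d / c) * np.cos(np.radians(theta_deg))
    return np.array([1.0, np.exp(-1j * w * tau)])

target = steer(0)                              # look direction: straight ahead
noise = steer(135)                             # a single interferer from 135 degrees

# Noise covariance: the interferer plus a little uncorrelated mic noise for stability
R = np.outer(noise, noise.conj()) + 1e-3 * np.eye(2)

# MVDR weights: minimum output power, distortionless toward the target
Rinv_d = np.linalg.solve(R, target)
wts = Rinv_d / (target.conj() @ Rinv_d)

print(abs(wts.conj() @ target))   # 1.0: the target passes unaltered
print(abs(wts.conj() @ noise))    # ~0.0: a null is steered toward the interferer
```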
Since beamforming is applied to the frequency decomposition given by the analysis filterbank, different directional patterns can be applied for each frequency band. This allows the adaptive beamformer to choose independent directional patterns that suppress the dominating noise sources in each frequency band.

MVDR beamforming is a very powerful technique to reduce background noise. However, for this same reason, it is often perceived as being too aggressive. Removing too much background noise can cause the user to feel detached from their surroundings. Therefore, such techniques require additional controls and limitations to be useful in practice. For instance, one might constrain the beamformer to select only from “softer” patterns that do not have nulls, or avoid strict assumptions on where the target is located.

1.3: Postfiltering
Beamforming is a very powerful tool for removing background noise whenever speech and noise arrive from different directions. It is, however, unable to remove noise from the target direction. This problem can instead be approached using methods from single-microphone noise reduction.
When such processing is applied after beamforming, it is often referred to as postfiltering. Such methods attempt to attenuate time–frequency regions in the signal (as seen in a spectrogram) dominated by noise. They do so by applying a postfilter gain of less than 0 dB to noisy regions. The most well-known of these methods, the Wiener filter,10 uses a time-varying estimate of the signal-to-noise ratio (SNR) in each frequency band to suppress noise at times and frequencies where this can be done with little effect on the target signal. Mathematically, the method aims to make the filtered time-domain signal as similar to the target signal as possible (in a mean squared error sense). Other methods typically operate according to a similar principle, but they aim to solve slightly different mathematical problems or rely on different speech and noise models.11,12

The processing of a postfilter is most easily visualized by considering a spectrogram of noisy speech, such as Fig. 7b. A good postfilter would suppress all noise-dominated time–frequency regions, leaving the speech unharmed. If done well, the result should be similar to the clean speech shown in Fig. 7a.

If the underlying target signal is known (as it is when imagining what a good postfilter should do to Fig. 7b while observing Fig. 7a), such processing can be almost infinitely effective.
Figure 7 (a) A spectrogram of a speech utterance. (b) The same utterance mixed with 24-talker babble at +3 dB SNR. (c) The noisy utterance after postfiltering. (d) Gray scale version of b, colorized according to the gain applied by the postfilter.
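As a sketch of the Wiener gain rule described above (assuming an SNR estimate per time–frequency bin is already available, e.g., from the beamformer-assisted estimators discussed later), the gain G = SNR/(SNR + 1) leaves speech-dominated bins nearly untouched and strongly attenuates noise-dominated ones:

```python
import numpy as np

def wiener_gain(snr_linear):
    """Per-bin Wiener postfilter gain G = SNR / (SNR + 1)."""
    return snr_linear / (snr_linear + 1.0)

# Example: three time-frequency bins at -10, 0 and +10 dB estimated SNR
snr_db = np.array([-10.0, 0.0, 10.0])
G = wiener_gain(10 ** (snr_db / 10))
print(20 * np.log10(G))   # attenuation of about -20.8, -6.0 and -0.8 dB
```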
For instance, Kjems et al13 showed that noisy speech at −60 dB SNR can be rendered completely intelligible by such processing.

In real-world scenarios, as faced by hearing aid users, the target signal is obviously not known (one might even ask, “why attempt to remove the noise if the underlying target signal is already known?”). Postfiltering algorithms must instead rely on their own statistical estimates of the target and noise properties to determine which parts of the signal to attenuate. Fig. 7c shows the result of such processing, as applied by a typical hearing aid. In comparison with Fig. 7b, significant amounts of noise are clearly removed. On the other hand, some noise remains, and spectral and temporal details are smeared when comparing the postfiltered signal to the original target signal (Fig. 7a). Fig. 7d shows a spectrogram of the noisy signal, colorized according to the attenuation applied by the postfilter. This clearly reveals that the postfilter correctly applies attenuation (as shown in purple) in many regions with little or no speech while not attenuating (as shown in cyan) regions with mostly speech.

1.4: Integrated Beamforming and Postfiltering
The previous sections have treated beamforming and postfiltering as two separate techniques, postfiltering being essentially just single-channel noise reduction applied to the beamformer output. There are, however, important links between the two systems. As noted, the Wiener filter attempts to filter a single noisy signal to make it resemble the target signal as closely as possible. The same mathematical problem can be formulated when multiple microphones are available. The solution to this problem is known as a multichannel Wiener filter.14 It can be shown to be mathematically identical to an MVDR beamformer coupled with a single-channel Wiener filter.15 Thus, the combined use of beamformers and postfilters for noise reduction is a theoretically optimal strategy—it arises as a mathematical consequence when solving the noise reduction problem.

A related fact makes the combined use of beamformers and postfilters even more interesting. As stated, the postfilter requires statistical estimates about the target and noise, which are used to decide when and where to attenuate. For a Wiener filter, this involves estimating the short-time SNR in each frequency band. The beamformer is uniquely suited to help with the accurate estimation of SNR.16,17 While a single directional pattern must be chosen for processing the signal to be presented to the user, nothing prevents the hearing aid from simultaneously using multiple other directional patterns for the explicit purpose of accurately estimating SNR16–18 (see the article by Jespersen et al in this issue for a similar approach that uses dual microphones to estimate noise levels). This represents a significant difference between single-channel noise reduction and postfiltering. Researchers have often found that single-channel noise reduction has no impact on, or may even deteriorate, speech intelligibility.19–21 This turns single-channel noise reduction into a tradeoff between speech intelligibility and listening comfort. This result is often mistakenly extended to postfiltering. However, because noise reduction relies on accurate estimates of SNR and because beamformers can help provide these, postfiltering has a significant advantage compared with single-channel noise reduction. In practice, postfiltering can therefore increase speech intelligibility, even in normal-hearing listeners.22

2: NOISE REDUCTION USING MACHINE LEARNING
Throughout the last decade, artificial intelligence has transformed many technologies beyond recognition, including hearing aids (see the articles by Fabry and Bhowmik and by Balling et al in this issue for additional applications of artificial intelligence to hearing aids). These breakthroughs have mostly come from a subfield of machine learning called “deep learning” (see Fig. 8), which covers the training and use of neural networks for solving tasks.23 Neural networks with multiple layers are sometimes referred to as deep neural networks (DNNs). Like many other technologies, deep learning has already had an enormous impact on noise reduction technology.
Figure 8 Deep learning refers to the training and use of neural networks to solve tasks. It is a subfield of machine learning, which itself is a field of artificial intelligence.
The previous section covered noise reduction without reference to techniques that employ machine learning or deep learning. The discussed classical methods are characterized by using statistical models and methods to tell the target signal and background noise apart. However, there is a limit to the accuracy with which such models can reflect the diversity of real-world listening environments. This is because the models need to be fairly simple to allow for carrying out the mathematical derivations that lead to noise reduction algorithms. For instance, it is common to assume that speech is not correlated across frequency, that is, that there is no correspondence between what happens at one frequency and what happens at another frequency at the same moment. However, speech signals contain an intricate phonetic structure that is indeed highly correlated across frequency. By assuming independence of frequency channels, noise reduction algorithms miss the opportunity of benefiting from the structure of speech.

Machine learning (including deep learning) approaches the same problem in an entirely different manner. Instead of directly designing a specific algorithm to carry out a task (e.g., reducing noise), machine learning applies flexible, generic algorithms that can be trained to solve a task by analyzing examples of how the task should be solved. The applied algorithm is completely free to model whatever structures can be found in the examples, and there is no requirement for the solution to be mathematically simple or easy to explain. See Bishop24 for a thorough overview of machine learning.

2.1: Training a Neural Network for Postfiltering
This section explains the basic principles involved in training a neural network to reduce noise. The training is executed on a database of examples of corresponding clean and noisy speech signals, such as the pair that comprise Fig. 7a and Fig. 7b. Pairs like these are referred to as training examples. The aim is to train a neural network to compute postfilter gains that make the noisy signals similar to the clean ones. The architecture used for doing so is shown in Fig. 9.

The neural network itself is composed of layers of neurons. The neurons in a layer are connected to the neurons in the previous layer by connections of varying strength.

An input to the neural network is a sequence of numbers: one number per neuron in the input layer. The input is transmitted and processed through the layers of neurons via the connections that link the layers. Finally, the last layer of the neural network returns an output, given as a sequence of numbers: one for each neuron in the output layer. Therefore, the neural network is simply a machine that takes an input and produces a corresponding output. How the output depends on the input is governed by a large number of parameters, given by the strengths of the connections between the layers. The number of parameters (connections) can range from thousands to billions depending on the design of the neural network (the famous GPT-3 language model trained by researchers at OpenAI has 175 billion parameters25). Training a neural network corresponds to adjusting the parameters in a way that makes the neural network solve a task.
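The description above can be condensed into a few lines of Python. This toy forward pass (random weights and illustrative layer sizes, i.e., an untrained network) shows how each layer multiplies its input by the connection strengths and applies a simple nonlinearity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Connection strengths (parameters), randomly initialized as before training
W1, b1 = rng.standard_normal((64, 32)), np.zeros(64)   # input layer -> hidden layer
W2, b2 = rng.standard_normal((32, 64)), np.zeros(32)   # hidden layer -> output layer

def forward(x):
    h = np.maximum(0.0, W1 @ x + b1)          # hidden neurons with ReLU activation
    return 1 / (1 + np.exp(-(W2 @ h + b2)))   # outputs squashed to (0, 1), usable as gains

x = rng.standard_normal(32)                   # input: one number per input neuron
print(forward(x).shape)                       # output: one number per output neuron, here 32
```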
Figure 9 Showing how a neural network is trained to perform postfiltering. The neural network is used to
compute postfilter gains for examples of noisy audio from the training database. These postfilter gains are
applied to the noisy signals, and the result is compared with the underlying clean target signal using a loss
function. Through the mathematical techniques of backpropagation and gradient descent, the neural network
connections are updated to make the loss progressively smaller so that the postfiltered noisy signal is more
similar to the underlying clean target.
To use a neural network for postfiltering, an input that is somehow derived from the noisy signal is provided. This could correspond to simply the output of the beamformer or something more refined like estimates of SNR. The neural network outputs are the postfilter gains that are applied to the noisy signal (one gain value per frequency band).

Before training, the connections of a neural network are typically initialized to random values. Thus, to begin with, when a noisy signal is presented to the system, the neural network behaves mostly arbitrarily. The resulting, poorly postfiltered signal is compared with the target signal using a numerical rating known as a loss function. A loss function is a numerical metric that quantifies the difference between the two signals. For the untrained neural network, the loss function will likely report that there is a poor similarity between the postfiltered noisy signal and the target signal. The aim is to adjust the neural network connections through training to improve this similarity or, more specifically, decrease the loss.

Using a technique known as backpropagation, one can compute backward from the loss value to determine how a small change in any parameter would affect the loss. Using this knowledge, one can devise a small update to all the neural network parameters, which will tend to slightly decrease the loss. When repeated over and over for different training examples, this process is known as stochastic gradient descent. If done carefully, this gradually causes the neural network to start behaving like a postfilter. Interestingly, this is achieved solely by showing the neural network examples of what a good postfilter should do (i.e., make the noisy signal less noisy), but without ever specifying how to do so.
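The complete training loop of Fig. 9 can be sketched with PyTorch (a deliberately minimal stand-in, not the authors' training code; the random "training examples" here merely take the place of the paired clean/noisy database):

```python
import torch

n_bands = 64
net = torch.nn.Sequential(                      # a tiny gain-predicting network
    torch.nn.Linear(n_bands, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, n_bands), torch.nn.Sigmoid(),  # gains between 0 and 1
)
opt = torch.optim.SGD(net.parameters(), lr=1e-2)

for step in range(1000):
    # One training example: magnitude spectra of corresponding clean and noisy
    # frames. In practice these come from a database of paired recordings.
    clean = torch.rand(1, n_bands)
    noisy = clean + 0.3 * torch.rand(1, n_bands)        # toy "noise"

    gains = net(noisy)                                  # forward pass
    loss = torch.mean((gains * noisy - clean) ** 2)     # distance to the clean target

    opt.zero_grad()
    loss.backward()                                     # backpropagation
    opt.step()                                          # small parameter update (SGD)
```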
Figure 10 Comparison of conventional postfiltering and DNN-based postfiltering. (a) A noisy speech
utterance processed by a conventional postfilter (same as Fig. 7c). (b) The same noisy utterance processed by
a DNN-based postfilter. (c) A gray scale spectrogram of the noisy utterance colorized according to the gain
applied by the conventional postfilter (same as Fig. 7d). (d) Same as c, but for the DNN-based postfilter.
Fig. 10a, b shows the output when the noisy signal from Fig. 7b is processed with a conventional postfilter and a postfilter based on a neural network, respectively. Processing with a neural network (Fig. 10b) results in a notably sharper and more speech-like result. This difference becomes even more apparent when comparing the applied postfilter gains, as shown in Fig. 10c, d. The conventional postfilter largely succeeds in identifying the speech regions, but otherwise appears somewhat uncoordinated. In contrast, the neural network postfilter displays a sharp and coordinated behavior across both time and frequency, correctly identifying most of the underlying speech and letting it through. These differences are not merely visual—the neural network postfilter improves the speech intelligibility index (SII)-weighted SNR by almost 2 dB over the conventional postfilter in the example shown.

While the above serves mainly as an illustration of the advantages associated with the use of neural networks for noise reduction, many academic studies have found comparable benefits on technical measures.26 Behavioral studies have also reported intelligibility improvements in hearing-impaired listeners27–29 and even normal-hearing listeners.30 Similarly, it has been reported that normal-hearing listeners prefer neural network-based noise reduction to conventional noise reduction.31 There are, however, many intricacies involved in the training and evaluation of systems based on machine learning that can make it difficult to assess the real-world implications of such results. After carefully training and testing a state-of-the-art system based on neural networks to ensure that it was not evaluated on data that it had seen during training, Kolbæk et al26 found that it could not reliably improve speech intelligibility for normal-hearing listeners. This result, however, was obtained for a single-channel noise reduction system, which generally does not benefit from the improved SNR estimates that a directional system can produce.
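The SII-weighted SNR quoted above is, in essence, an average of per-band SNRs weighted by each band's importance for intelligibility. A simplified sketch (the band-importance weights below are placeholders, not the standardized SII values):

```python
import numpy as np

def sii_weighted_snr(snr_db_per_band, importance):
    """Weighted average of per-band SNRs, clamped to the SII's useful range."""
    clamped = np.clip(snr_db_per_band, -15.0, 15.0)  # SNRs beyond +/-15 dB add no information
    weights = np.asarray(importance) / np.sum(importance)
    return float(np.sum(weights * clamped))

snr_bands = np.array([-3.0, 2.0, 8.0, -12.0])        # estimated SNR per band (dB)
importance = [0.1, 0.35, 0.4, 0.15]                  # placeholder band-importance weights
print(sii_weighted_snr(snr_bands, importance))       # 1.8 dB for this toy example
```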
2.2: Collection of Environmental Recordings
An essential resource for training neural networks is the database of training examples. Academic studies, which are most often focused on single-channel noise reduction, typically generate examples by mixing recordings from publicly available databases of speech and noise recordings. This allows large training databases to be produced while retaining complete control over factors such as noise type and SNR. However, such artificially produced sound examples are typically neither ecologically plausible nor representative of everyday environments for a hearing aid user. Furthermore, when training noise reduction systems for hearing aids, one relies on input signals as recorded from the hearing aid’s microphones, including the acoustics of the hearing aid shell and the user’s head. When training neural networks for noise reduction at Oticon, the authors have found that a good—albeit time-consuming—solution to the discussed issues is to use a database of ecologically valid spherical microphone array recordings. A substantial collection of such recordings has therefore been made. These consist of real conversations in different noisy listening environments commonly experienced by hearing aid users. The recordings were made at various physical locations, such as restaurants, cafes, offices, cars, and busy streets. The complete workflow from recording to training is illustrated in Fig. 11.

The sound environments were captured with a spherical microphone array containing 32 microphone capsules (Fig. 11a). This recording technique allows the sound environments to be reproduced in a sound studio with many loudspeakers. The sound-rendering procedure is described by Minnaar et al.32 The technique relies on a calibration step where the microphone array is placed at the center of the loudspeaker array so that the transfer functions from all loudspeakers to all microphones on the sphere can be measured (Fig. 11b). Using an inverse filtering method,33 each loudspeaker signal is computed as the sum of the microphone recordings that have been filtered to render the sound at the center of the loudspeaker array as close as possible to the original sound recorded by the microphone array. With more loudspeakers, a better rendering of the original listening environment can be obtained.

With this approach, an acoustic scene of the original listening environment can be accurately reproduced near the center of the loudspeaker array (Fig. 11c). Before the acoustic scenes can be used as training material for neural networks, it is necessary to reproduce them as if they were recorded by a hearing aid mounted on a person’s ear. A simple option could be to record from the microphones of a hearing aid mounted on a person or a manikin at the center of the loudspeaker array. However, to avoid the inconvenience of doing so for a large number of recordings, one can instead measure impulse responses from the studio loudspeakers to the hearing aid microphones. These can then be used to quickly accomplish the same result for any number of recordings, hearing aid styles, or people.

When using acoustic scenes as training material for a neural network, it is necessary to have separate recordings of the target speech signal and the background noise. It is well-known that humans tend to raise their vocal effort when speaking in a noisy background.34 Therefore, an acoustic scene consisting of background noise mixed with a target talker who was recorded in the absence of noise will be perceived as unnatural because the vocal effort does not correspond to the noisy background. To improve the ecological validity of the acoustic scene, the original recording of the listening environment (Fig. 11a) is converted into a binaural audio signal. In the absence of noise, the target signal is recorded while the noise is presented to the talker(s) via open headphones (Fig. 11d). In this way, target speech and noise for a given acoustic scene are captured separately. Finally, the recorded speech and noise signals are mixed to generate an ecologically valid acoustic scene.
Figure 11 The workflow involved in using spherical microphone array recordings for training neural networks. (a) Noisy listening environments are recorded with a
spherical microphone array. (b) The microphone array is placed in the center of a loudspeaker array. The transfer functions from all loudspeakers to all microphones are
measured. (c) Using techniques from Minnaar et al,32 the transfer functions are inverted to reproduce the recorded listening environment at the center of the array. (d)
Target audio is recorded by having one or more participants listen to noise recordings via open headphones while conversing in a quiet environment. (e) The acoustic scene
is obtained by summing the noise and target recordings. Target and noisy sound signals are rendered to hearing aid microphones and used for neural network training.
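The rendering step in Fig. 11 reduces to convolving each loudspeaker signal with its measured impulse response to each hearing aid microphone and summing the contributions. A schematic sketch (random placeholders stand in for the measured responses and rendered scenes; sizes are illustrative):

```python
import numpy as np
from scipy.signal import fftconvolve

n_speakers, ir_len, sig_len = 4, 512, 16000
rng = np.random.default_rng(0)

speaker_signals = rng.standard_normal((n_speakers, sig_len))  # e.g., a rendered noise scene
irs = rng.standard_normal((n_speakers, 2, ir_len))            # impulse responses to 2 HA mics

# Hearing aid microphone signals: sum of all loudspeaker contributions
mics = np.zeros((2, sig_len + ir_len - 1))
for s in range(n_speakers):
    for m in range(2):
        mics[m] += fftconvolve(speaker_signals[s], irs[s, m])

# Rendering the target and noise scenes separately and then mixing them yields
# paired clean/noisy training signals as "seen" by the hearing aid.
```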
3: PERSONALIZATION AND AUTOMATICS
The noise reduction systems described in Sections 1 and 2 are highly effective at removing noise. However, at the same time, they introduce various forms of unwanted distortion. Furthermore, there is generally a large variation among hearing aid users regarding the preferred amount of noise reduction.19 Such factors have led researchers to introduce heuristic limits that control the influence of the noise reduction system.35 This makes it possible to mostly eliminate unwanted distortion and to adjust the amount of noise reduction to meet the user’s preference.

The preferred amount of noise reduction varies across users, but it also varies across time. In a very noisy environment like a busy restaurant, most users may be willing to tolerate some distortion as long as the noise reduction provides the needed relief from the background noise. On the other hand, in a quiet environment, noise reduction might not be necessary or desired. Modern hearing aids have an automatic system that continuously adapts the noise reduction system to suit the listening environment. Automatic adjustment of the hearing aid is based on the results of an environmental classifier and the user’s preferences for noise reduction as selected during the fitting process (see the article by Hayes in this issue for more details on environmental classifiers). The automatics system primarily acts by controlling the amount of directionality and postfiltering applied (as shown in Fig. 1), but it may influence other systems in the hearing aid too.

When surveying the academic literature on noise reduction, it becomes clear that the topic of automatics systems is an underappreciated part of hearing aid design. This is perhaps because it is a relatively softer discipline than the mathematically exact one of designing the underlying noise reduction system. However, the automatics system serves a critical function by ensuring that the individual user is exposed to the correct amount of noise reduction in any given listening environment. For the same reason, the clinician responsible for the fitting must be well-acquainted with the features of the noise reduction and automatics systems in the selected hearing aid.

4: TECHNICAL AND CLINICAL BENEFITS OF NOISE REDUCTION
This section reports the results of technical and clinical investigations of the effects of different noise reduction systems based on the approaches described in the previous sections, using two commercially available premium hearing aids (referred to as HA1 and HA2 in the following). HA1 employs a 16-channel noise reduction system with a fast-acting combination of an MVDR beamformer16 and a single-channel Wiener postfilter.17 HA2 employs a fast-acting 24-channel noise reduction system with a higher-resolution MVDR beamformer combined with the processing of a DNN-based postfilter that was trained to enhance the contrast between speech and noise using across-channel information.36

4.1: Signal-to-Noise Ratio Benefit
To compare the SNR benefits of the two hearing aids, output SNR measurements were performed using the Hagerman and Olofsson phase-inversion technique37 for HA1 and HA2. For each, a pair of hearing aids was fitted to the ears of a head-and-torso simulator (HATS) using closed-ear tips. The HATS was placed in the center of a circular loudspeaker setup containing 12 loudspeakers equally spaced by 30 degrees in the horizontal plane. Continuous speech was presented from the front loudspeaker at 0-degree azimuth, while cafeteria noise with an overall level of 65 dB SPL was presented from all loudspeakers simultaneously, such that noise came from all directions, including that of the speech, a situation that is especially challenging for traditional noise reduction systems. The measurements were obtained for speech levels of 60 dB SPL (corresponding to −5 dB unaided SNR) and 65 dB SPL (corresponding to 0 dB unaided SNR); the results are shown in Table 1.

Table 1 SII-weighted output SNR improvement in dB, relative to the unaided output SNR, for HA1 and HA2 at two different input SNRs when noise reduction is deactivated ("Off"), the postfilter only is activated ("PF only"), and both beamformer and postfilter are activated ("BF + PF")

            −5 dB input SNR      0 dB input SNR
            HA1      HA2         HA1      HA2
  Off       0.75     0.16        1.18     0.39
  PF only   0.11     1.81        0.08     2.16
  BF + PF   4.04     4.54        3.82     4.65
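The phase-inversion technique37 used for these measurements deserves a brief illustration: the device is driven once with speech-plus-noise and once with speech-minus-noise, and summing or subtracting the two recorded outputs isolates the processed speech and noise, from which the output SNR follows. A minimal sketch (a trivial linear function stands in for the hearing aid, which in the real measurement is the physical device):

```python
import numpy as np

rng = np.random.default_rng(0)
speech, noise = rng.standard_normal(16000), rng.standard_normal(16000)

def hearing_aid(x):
    return 2.0 * x                     # stand-in: any (approximately linear) processing

out_a = hearing_aid(speech + noise)    # presentation 1: noise in phase
out_b = hearing_aid(speech - noise)    # presentation 2: noise phase-inverted

speech_out = 0.5 * (out_a + out_b)     # the noise cancels
noise_out = 0.5 * (out_a - out_b)      # the speech cancels

snr_db = 10 * np.log10(np.mean(speech_out**2) / np.mean(noise_out**2))
print(snr_db)                          # output SNR of the processed signals
```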
Figure 12 Mean SRTs for 50% correct speech intelligibility obtained in the Oldenburg sentence test (N = 20). Error bars indicate the standard error of the mean. Note that the y-axis is reversed, such that higher bars indicate higher speech intelligibility. *p < 0.05, **p < 0.01, ***p < 0.001.
Notably, Fig. 12 shows that the DNN-based HA2 in the “PF only” condition produces a statistically significant improvement in intelligibility compared with “Off.” This runs counter to the conventional expectation that only beamforming can improve intelligibility and clearly showcases the differences between postfiltering and single-channel noise reduction.

4.3: Effects on Cortical Representations and Listening Effort
Noise reduction systems in hearing aids have been shown to reduce listening effort during speech recognition tasks in noise (e.g., as shown by Ohlenforst et al40,41) and to enhance the cortical representation of speech in the auditory cortex in noisy multitalker environments.42,43 The protocols from previous electroencephalography (EEG) and pupillometry studies42,44,45 were adapted to compare how the noise reduction systems from HA1 and HA2 affect these two outcomes. Since the same protocols were strictly followed, only an overview and differences in participants and test setups are provided here. The reader is referred to the articles mentioned for further methodological details.

Thirty-one experienced hearing-aid users with mild to moderately severe sensorineural hearing loss who qualified for fitting with RITE hearing aids (mean age: 65.6 years) were enrolled in the study. All provided informed consent, and the experiments were approved by the Science Ethics Committee for the Capital Region of Denmark (journal no. H20028542). As described in the article by Alickovic et al,44 two continuous speech signals from different talkers were presented at 73 dB SPL from two different loudspeakers in front of the participants (±30-degree azimuth). Participants were instructed to attend to one of the foreground talkers (the target talker) and to ignore the other (the masker talker). Meanwhile, babble noise at 70 dB SPL was presented from four loudspeakers in the background (±100- and ±153-degree azimuth), with a mix of 4 talkers in each loudspeaker.
The study was designed to measure the benefit of noise reduction in HA2 and to compare the noise reduction systems of HA1 and HA2, yielding three test conditions: noise reduction deactivated in HA2 (“Off”) and noise reduction activated in HA1 and HA2 (“BF + PF”). For each test condition, the participants listened to 20 trials of 38 seconds each. Both hearing aids were fitted to participants in the same way as described in Section 4.2.

During this task, EEG was recorded, from which a measure was derived that indicates how strongly parts of the acoustic scene or single sound sources are represented in the auditory cortex.42,44,46 This measure is referred to as cortical representation. By analyzing the EEG data in different time windows (see Fig. 3 in Alickovic et al44), these cortical representations at different stages of auditory cortical processing can be estimated. Early EEG responses (<85 milliseconds) are thought to originate from the primary areas of the auditory cortex and are less influenced by selective attention, so that all sounds in the acoustic scene are co-represented. In contrast, late EEG responses (>85 milliseconds) are generated from higher-order, nonprimary cortical areas and show a large effect of selective attention, such that the cortical representation of the target talker is emphasized compared with that of the masker talker and the background.47–49
Figure 13 Strength of cortical representation of the entire acoustic scene (top left) and of the foreground (top right) as estimated from early EEG responses, and of the target talker (bottom left) and of the masker talker (bottom right) as estimated from late EEG responses. Gray dots indicate trial-averaged individual results, whereas black dots and error bars show the group strengths of cortical representation (grand average ± 1 between-subject standard error of the mean). Each horizontal line in gray denotes a single participant.
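The article defers methodological detail to Alickovic et al44; a common estimator in that literature, assumed here purely for illustration, is a regularized linear (temporal response function) model that predicts EEG from a sound source's envelope and scores the cortical representation by the correlation between predicted and recorded EEG:

```python
import numpy as np

rng = np.random.default_rng(0)
n, lags = 5000, 32                      # samples and model lags (illustrative)
envelope = rng.standard_normal(n)       # envelope of one sound source
eeg = np.convolve(envelope, rng.standard_normal(lags), mode="same") \
      + rng.standard_normal(n)          # toy single-channel EEG

# Lagged design matrix: each row holds the envelope's recent history
X = np.stack([np.roll(envelope, k) for k in range(lags)], axis=1)

# Ridge-regularized least squares estimate of the temporal response function
lam = 1.0
trf = np.linalg.solve(X.T @ X + lam * np.eye(lags), X.T @ eeg)

# "Strength of cortical representation": correlation of predicted and actual EEG
pred = X @ trf
print(np.corrcoef(pred, eeg)[0, 1])
```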
Following this premise, the cortical representation of the entire acoustic scene (comprising target talker, masker talker, and background noise) and of the foreground (comprising target and masker talkers) was investigated using early EEG responses, while the cortical representation of the individual foreground talkers (target and masker) was investigated using late EEG responses.

The top panels in Fig. 13 show the strength of the cortical representation of the entire acoustic scene (i.e., the combination of all objects in the environment) and of the foreground (i.e., the combination of the two possible talkers that the user may attend to) based on early EEG responses. A one-way linear mixed model ANOVA revealed a significant main effect of condition (entire acoustic scene: F(2, 1232) = 9.4, p < 0.001; foreground: F(2, 1230) = 11.3, p < 0.001). Post hoc pairwise comparisons (Bonferroni corrected) revealed that the strength of early cortical representations was significantly higher for the “BF + PF” conditions than for the “Off” condition (entire acoustic scene: p < 0.001; foreground: p < 0.001) and significantly higher for HA2 than for HA1 (entire acoustic scene: p = 0.020; foreground: p = 0.029). These results suggest that activating noise reduction contributes to a more accurate representation of the hearing aid user’s whole listening environment in the early stages of cortical processing. The same can be said about foreground sound sources that may become the focus of attention. Finally, the results suggest that the DNN-based noise reduction system of HA2 is more advantageous in these regards.

The bottom panels in Fig. 13 show the strength of the cortical representation of the target and masker talkers based on late EEG responses. A one-way linear mixed model ANOVA revealed a significant main effect of condition (target: F(2, 1225) = 4.1, p = 0.016; masker: F(2, 1226) = 5.6, p = 0.004). Post hoc pairwise comparisons (Bonferroni corrected) showed that the strength of late cortical representations was significantly higher for “BF + PF” conditions than for the “Off” condition (target: p = 0.038; masker: p = 0.003) and significantly higher for HA2 than for HA1 for the target talker (p = 0.040). These results suggest that the tested noise reduction systems help the user selectively attend to a talker of interest in complex listening environments while maintaining access to other secondary talkers, which is important to allow the user to switch attention as the situation calls for it. The DNN-based HA2 seems to provide a greater advantage in this regard.

Finally, the pupil size of 17 of the participants was recorded while they selectively attended to the target talker during the same EEG experiment. Pupil size indicates how much cognitive effort is spent on a listening task.45,50,51 As a general rule, a smaller pupil size indicates reduced listening effort compared with a larger pupil size. The pupillometry results (Fig. 14) showed a significant difference between test conditions (one-way ANOVA, F(2, 937) = 5.3, p = 0.005).

Figure 14 Pupil size depicted as the average change from baseline. Black dots and error bars indicate the average across participants (mean ± 1 between-subject standard error of the mean). Gray dots and lines depict individual means across trials.
Post hoc tests revealed that there was a significantly smaller pupil size for HA2 “BF + PF” compared with “Off” (t(931) = 3.2, p = 0.001), while the other two comparisons did not reach significance (HA1 “BF + PF” vs. “Off”: t(931) = 1.6, p = 0.11; HA2 “BF + PF” vs. HA1 “BF + PF”: t(931) = 1.6, p = 0.11). This indicates that the noise reduction system of HA2 reduces listening effort during a selective-attention task in a complex multitalker noisy environment, in line with the findings of Fiedler et al.45

In summary, the studies discussed here indicate that noise reduction systems in commercial hearing aids which combine an MVDR beamformer with a postfilter can provide clinical benefits to users, with the most significant effects obtained with the DNN-based HA2. Benefits are seen in terms of increased speech intelligibility in noise, stronger cortical representations of multiple sound sources, and reduced listening effort.

5: CONCLUSION
Noise reduction in modern hearing aids typically takes the form of joint beamforming and postfiltering, which work particularly well when the noise is separate from the target speech in either time, frequency, or direction of arrival. Rapid advances in machine learning are increasingly influencing the design approach to such systems. In fact, hearing aids using neural networks for postfiltering are already commercially available.

Experimental results presented in this article indicate that noise reduction algorithms provide a range of benefits. First, they can improve SNR and speech intelligibility in noisy environments. Second, they can decrease listening effort and improve the user’s ability to focus on specific targets. As discussed here, improvements in noise reduction algorithms are highly relevant because they effectively extend the range of listening environments in which hearing aids can benefit the user.

CONFLICT OF INTEREST
None declared.

ACKNOWLEDGMENTS
The authors would like to thank Micha Lundbeck (HörTech gGmbH) and Michael Schulte (Hörzentrum Oldenburg GmbH) for their contributions to data collection and analysis related to output SNR and speech intelligibility measurements, as well as the following colleagues from Oticon and Eriksholm Research Centre for their contributions to the research studies reported in this manuscript: Josefine Juul Jensen, Carina Graversen, Dorothea Wendt, Elaine Hoi Ning Ng, Hamish Innes-Brown, Brian Kai Loong Man, Sara Al-Ward, and Louis Villejouberts. Lastly, the authors would like to thank Joshua M. Alexander, whose inputs greatly improved this article.

REFERENCES
1. Kochkin S. MarkeTrak VIII: consumer satisfaction with hearing aids is slowly increasing. Hear J 2010;63(01):19–32
2. Picou EM. MarkeTrak 10 (MT10) survey results demonstrate high satisfaction with and benefits from hearing aids. Semin Hear 2020;41(01):21–36
3. Moore BCJ. Cochlear Hearing Loss: Physiological, Psychological and Technical Issues. 2nd ed. Wiley; 2007
4. Plomp R. Auditory handicap of hearing impairment and the limited benefit of hearing aids. J Acoust Soc Am 1978;63(02):533–549
5. Lopez RS, Bianchi F, Fereczkowski M, Santurette S, Dau T. Data-driven approach for auditory profiling. In: Proceedings of the International Symposium on Auditory and Audiological Research. Nyborg, Denmark; 2017:247–254
6. Moore B. An Introduction to the Psychology of Hearing. 6th ed. Leiden, Netherlands: Brill; 2013
7. Elko GW. Superdirectional microphone arrays. In: Gay SL, Benesty J, eds. Acoustic Signal Processing for Telecommunication. New York: Springer; 2000:181–237
8. Capon J. High-resolution frequency-wavenumber spectrum analysis. Proc IEEE 1969;57(08):1408–1418
9. Cox H, Zeskind R, Owen M. Robust adaptive beamforming. IEEE Trans Acoust Speech Signal Process 1987;35(10):1365–1376
10. Wiener N. Extrapolation, Interpolation, and Smoothing of Stationary Time Series, with Engineering Applications. Cambridge: MIT Press; 1949
11. Ephraim Y, Malah D. Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Trans Acoust Speech Signal Process 1984;32(06):1109–1121
12. Gannot S, Vincent E, Markovich-Golan S, Ozerov A. A consolidated perspective on multimicrophone speech enhancement and source separation. IEEE/ACM Trans Audio Speech Lang Process 2017;25(04):692–730
13. Kjems U, Boldt JB, Pedersen MS, Lunner T, Wang D. Role of mask pattern in intelligibility of ideal binary-masked noisy speech. J Acoust Soc Am 2009;126(03):1415–1426
14. Doclo S. Multi-microphone noise reduction and dereverberation techniques for speech applications. PhD thesis. KU Leuven, Leuven, Belgium; 2003
15. Simmer KU, Bitzer J, Marro C. Post-filtering techniques. In: Brandstein M, Ward D, eds. Microphone Arrays: Signal Processing Techniques and Applications. Springer; 2001:39–60
16. Kjems U, Jensen J. Maximum likelihood based noise covariance matrix estimation for multi-microphone speech enhancement. In: Proceedings of the 20th European Signal Processing Conference (EUSIPCO). Bucharest, Romania; 2012:295–299
17. Jensen J, Pedersen MS. Analysis of beamformer directed single-channel noise reduction system for hearing aid applications. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). South Brisbane, Queensland, Australia; 2015:5728–5732
18. Boldt J, Kjems U, Pedersen MS, Lunner T, Wang D. Estimation of the ideal binary mask using directional systems. In: Proceedings of the 11th International Workshop on Acoustic Echo and Noise Control. Seattle, Washington, USA; 2008
19. Neher T, Wagener KC. Investigating differences in preferred noise reduction strength among hearing aid users. Trends Hear 2016;20:20
20. Kim G, Loizou PC. Gain-induced speech distortions and the absence of intelligibility benefit with existing noise-reduction algorithms. J Acoust Soc Am 2011;130(03):1581–1596
21. Dillon H. Hearing Aids. Thieme; 2000
22. Kuklasiński A, Doclo S, Jensen SH, Jensen J. Maximum likelihood PSD estimation for speech enhancement in reverberation and noise. IEEE/ACM Trans Audio Speech Lang Process 2016;24(09):1599–1612
23. Goodfellow I, Bengio Y, Courville A. Deep Learning. MIT Press; 2016
24. Bishop C. Pattern Recognition and Machine Learning. Springer; 2006
25. Brown T, Mann B, Ryder N, et al. Language models are few-shot learners. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Advances in Neural Information Processing Systems. 2020;33:1877–1901
26. Kolbæk M, Tan ZH, Jensen J. Speech intelligibility potential of general and specialized deep neural network based speech enhancement systems. IEEE/ACM Trans Audio Speech Lang Process 2017;25(01):153–167
27. Healy EW, Yoho SE, Wang Y, Wang D. An algorithm to improve speech recognition in noise for hearing-impaired listeners. J Acoust Soc Am 2013;134(04):3029–3038
28. Healy EW, Yoho SE, Chen J, Wang Y, Wang D. An algorithm to increase speech intelligibility for hearing-impaired listeners in novel segments of the same noise type. J Acoust Soc Am 2015;138(03):1660–1669
29. Chen J, Wang Y, Yoho SE, Wang D, Healy EW. Large-scale training to increase speech intelligibility for hearing-impaired listeners in novel noises. J Acoust Soc Am 2016;139(05):2604–2612
30. Kim G, Lu Y, Hu Y, Loizou PC. An algorithm that improves speech intelligibility in noise for normal-hearing listeners. J Acoust Soc Am 2009;126(03):1486–1494
31. Xu Y, Du J, Dai L, Lee C. An experimental study on speech enhancement based on deep neural networks. IEEE Signal Process Lett 2014;21(01):65–68
32. Minnaar P, Albeck SF, Simonsen CS, Søndersted B, Oakley SAD, Bennedbæk J. Reproducing real-life listening situations in the laboratory for testing hearing aids. In: Audio Engineering Society Convention 135; Paper 8951; 2013
33. Kirkeby O, Nelson PA, Hamada H, Orduna-Bustamante F. Fast deconvolution of multichannel systems using regularization. IEEE Trans Speech Audio Process 1998;6(02):189–194
34. Brumm H, Zollinger S. The evolution of the Lombard effect: 100 years of psychoacoustic research. Behaviour 2011;148(11/13):1173–1198
35. Berouti M, Schwartz R, Makhoul J. Enhancement of speech corrupted by acoustic noise. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP); 1979:208–211
36. Andersen AH, Jensen J, Pedersen MS, et al. Hearing device comprising a noise reduction system. United States Patent Application Publication No. US 2020/0260198 A1
37. Hagerman B, Olofsson Å. A method to measure the effect of noise reduction algorithms using simultaneous speech and noise. Acta Acust United Acust 2004;90(02):356–361
38. Wardenga N, Batsoulis C, Wagener KC, Brand T, Lenarz T, Maier H. Do you hear the noise? The German matrix sentence test with a fixed noise level in subjects with normal hearing and hearing impairment. Int J Audiol 2015;54(Suppl 2):71–79
39. Buus S, Florentine M. Growth of loudness in listeners with cochlear hearing losses: recruitment reconsidered. J Assoc Res Otolaryngol 2002;3(02):120–139
40. Ohlenforst B, Zekveld AA, Jansma EP, et al. Effects of hearing impairment and hearing aid amplification on listening effort: a systematic review. Ear Hear 2017;38(03):267–281
41. Ohlenforst B, Wendt D, Kramer SE, Naylor G, Zekveld AA, Lunner T. Impact of SNR, masker type and noise reduction processing on sentence recognition performance and listening effort as indicated by the pupil dilation response. Hear Res 2018;365:90–99
42. Alickovic E, Lunner T, Wendt D, et al. Neural representation enhanced for speech and reduced for background noise with a hearing aid noise reduction scheme during a selective attention task. Front Neurosci 2020;14:846
43. Lunner T, Alickovic E, Graversen C, Ng EHN, Wendt D, Keidser G. Three new outcome measures that tap into cognitive processes required for real-life communication. Ear Hear 2020;41(Suppl 1):39S–47S
44. Alickovic E, Ng EHN, Fiedler L, Santurette S, Innes-Brown H, Graversen C. Effects of hearing aid noise reduction on early and late cortical representations of competing talkers in noise. Front Neurosci 2021;15:636060
45. Fiedler L, Seifi Ala T, Graversen C, Alickovic E, Lunner T, Wendt D. Hearing aid noise reduction lowers the sustained listening effort during continuous speech in noise—a combined pupillometry and EEG study. Ear Hear; in press. DOI: 10.1097/AUD.0000000000001050
46. Fiedler L, Wöstmann M, Herbst SK, Obleser J. Late cortical tracking of ignored speech facilitates neural selectivity in acoustically challenging conditions. Neuroimage 2019;186:33–42
47. O’Sullivan J, Herrero J, Smith E, et al. Hierarchical encoding of attended auditory objects in multi-talker speech perception. Neuron 2019;104(06):1195–1209.e3
48. Zion Golumbic EM, Ding N, Bickel S, et al. Mechanisms underlying selective neuronal tracking of attended speech at a “cocktail party”. Neuron 2013;77(05):980–991
49. Mesgarani N, Chang EF. Selective cortical representation of attended speaker in multi-talker speech perception. Nature 2012;485(7397):233–236
50. Seifi Ala T, Graversen C, Wendt D, Alickovic E, Whitmer WM, Lunner T. An exploratory study of EEG alpha oscillation and pupil dilation in hearing-aid users during effortful listening to continuous speech. PLoS One 2020;15(07):e0235782
51. Pichora-Fuller MK, Kramer SE, Eckert MA, et al. Hearing impairment and cognitive energy: the framework for understanding effortful listening (FUEL). Ear Hear 2016;37(Suppl 1):5S–27S