Backdoor: Making Microphones Hear Inaudible Sounds: Nirupam Roy, Haitham Hassanieh, Romit Roy Choudhury

BackDoor: Making Microphones Hear Inaudible Sounds
Nirupam Roy, Haitham Hassanieh, Romit Roy Choudhury
University of Illinois at Urbana-Champaign
ABSTRACT
Consider sounds, say at 40kHz, that are completely outside the human's audible range (20kHz), as well as a microphone's recordable range (24kHz). We show that these high frequency sounds can be designed to become recordable by unmodified microphones, while remaining inaudible to humans. The core idea lies in exploiting non-linearities in microphone hardware. Briefly, we design the sound and play it on a speaker such that, after passing through the microphone's non-linear diaphragm and power-amplifier, the signal creates a “shadow” in the audible frequency range. The shadow can be regulated to carry data bits, thereby enabling an acoustic (but inaudible) communication channel to today's microphones. Other applications include jamming spy microphones in the environment, live watermarking of music in a concert, and even acoustic denial-of-service (DoS) attacks. This paper presents BackDoor, a system that develops the technical building blocks for harnessing this opportunity. Reported results achieve upwards of 4kbps for proximate data communication, as well as room-level privacy protection against electronic eavesdropping.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
MobiSys '17, June 19-23, 2017, Niagara Falls, NY, USA.
© 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-4928-4/17/06...$15.00
DOI: http://dx.doi.org/10.1145/3081333.3081366

1. INTRODUCTION
This paper shows the possibility of creating sounds that humans cannot hear but microphones can record. This is not because the sound is too soft or just at the periphery of human's frequency range. The sounds we create are actually 40kHz and above, completely outside both human's and microphone's range of operation. However, given microphones possess inherent non-linearities in their diaphragms and power amplifiers, it is possible to design sounds that exploit this property. To elaborate, we shape the frequency and phase of sound signals and play them through ultrasound speakers; when these sounds pass through the non-linear amplifier at the receiver, the high frequency sounds are expected to create a low-frequency “shadow”. The “shadow” is within the filtering range of the microphone and thereby gets recorded as normal sounds. Figure 1 illustrates the effect. Importantly, the microphone does not require any modification, enabling billions of phones, laptops, and IoT devices to leverage the capability. This paper presents BackDoor, a system that develops the technical building blocks for harnessing this opportunity, leading to new applications in security and communications.

Figure 1: The main idea underlying BackDoor. (Diagram labels: inaudible tone pair, shadow, signal inside microphone, microphone filter. Axes: amplitude vs. frequency, marked at 10K, 20K, 24K, 40K, and 50K, spanning audible sound, near ultrasound, and ultrasound.)

Security: Given microphones record these inaudible sounds, it should be possible to silently jam spy microphones from recording. Military and government officials can secure private and confidential meetings from electronic eavesdropping; cinemas and concerts can prevent unauthorized recording of movies and live performances. We also realized the possibility of security threats. Denial-of-service (DoS) attacks on sound devices are typically considered difficult as the jammer can be easily detected. However, BackDoor shows that inaudible jammers can disable hearing aids and cellphones without getting detected. For example, during a robbery, the perpetrators can prevent people from making 911 calls by silently jamming all phones' microphones.

Communications: Ultrasound systems today aim to achieve inaudible data transmissions to the microphone [34]. However, they suffer from limited bandwidth, around 3kHz, since they must remain above human hearing range (20kHz) and below the microphone's cutoff frequency (24kHz). Moreover, FCC imposes strict power restrictions on these bands since they are partly audible to infants and pets [20]. BackDoor is free of these limitations. Using an ultrasound-based transmitter, it can utilize the entire microphone spectrum for communication. Thus, IoT devices could find an alternative channel for communication, reducing the growing load on Bluetooth (BLE). Museums and shopping malls could use acoustic beacons to broadcast information about nearby art pieces or products. Various ultrasound ranging schemes, that compute time of flight of signals, could benefit from the substantially higher bandwidth in BackDoor.
This paper focuses on developing the technical primitives that enable these applications. In the simplest case, BackDoor plays two tones at say 40kHz and 50kHz. When these tones arrive together at the microphone's power amplifier, they are amplified as expected, but also multiplied due to fundamental non-linearities in the system. Multiplication of frequencies $f_1$ and $f_2$ results in frequency components at $(f_1 - f_2)$ and $(f_1 + f_2)$. Given that $(f_1 - f_2)$ is 10kHz in magnitude in this case, well within the microphone's range, the signal passes unaltered through the low pass filter (LPF). Human ears, on the other hand, do not exhibit such non-linearities and completely filter out the 40kHz and 50kHz sounds.

While the above is a trivial case of sending a tone, BackDoor intends to load data on transmitted carrier signals and demodulate the “shadow” after receiving through the microphone. This entails challenges. First, the non-linearities we intend to exploit are not unique to the microphone; they are also present in speakers that transmit the sounds. As a result, the speaker also produces a “shadow” within the audible range, making its output audible to humans. We address this by using multiple speakers and isolating the signals in frequency across the speakers. We show, both analytically and empirically, that none of these isolated sounds create a “shadow” as they pass through the speaker's diaphragm and amplifier. However, once these sounds arrive and combine non-linearly inside the microphone, the “shadow” emerges within the audible range.

Second, for communication applications, standard modulation and coding schemes cannot be used directly. Section 4.1 shows how appropriate frequency modulation, combined with inverse filtering, resonance alignment, and ringing mitigation, is needed to boost achievable data rates.

Finally, for security applications, jamming requires transmitting noisy signals that cover the entire audible frequency range. With audible jammers, this requires speakers to operate at very high volumes. Section 4.2 describes how BackDoor is designed to achieve equally effective jamming, but in complete silence. We leverage the automatic gain control (AGC) in microphones, in conjunction with selective frequency distortion, to improve jamming at modest power levels.

The final BackDoor prototype is built on customized ultrasound speakers and evaluated for both communication and security applications across different types of mobile devices. Our results reveal the following:

• 100 different sounds played to 7 individuals confirmed that BackDoor was completely inaudible.
• BackDoor attained data rates of 4 kbps at a distance of 1 meter, and 2 kbps at 1.5 meters; this is 2× higher in throughput and 5× higher in distance than systems that use the near-ultrasound band.
• BackDoor is able to jam and prevent the recording of any conversation within a radius of 3.5 meters (and potentially a room-level coverage with higher power [25]). When 2000 English words were played back to 7 humans and a speech recognition software [2], less than 15% of the words were decoded correctly. Audible jammers, aiming at comparable performance, would need to play white noise at a loudness of 97 dBSPL, considered seriously harmful to human ears [19].

In sum, this paper makes the following contributions:

• Exploits non-linearities in off-the-shelf microphones to enable a “backdoor” from high to low frequencies. This backdoor permits playback of high frequency sounds that are inaudible to humans and yet recordable through microphones.
• Builds enabling primitives for applications in acoustic communication and privacy. The acoustic radio outperforms today's near-ultrasound systems, while jamming raises the bar against eavesdropping.

The subsequent sections expand on these contributions. We begin with an acoustic primer, followed by intuitions, system design, and evaluation.

2. ACOUSTIC SYSTEMS PRIMER

Common Microphone Systems
Any sound recording system requires two main modules: a transducer and an analog-to-digital converter (ADC). The transducer contains a “diaphragm” that vibrates due to sound pressure, producing a proportional change in voltage. The ADC measures this voltage variation (at a fixed sampling frequency) and stores the samples in memory. These samples represent the recorded sound in the digital domain. A practical microphone needs two more components between the diaphragm and the ADC, namely a pre-amplifier and a low pass filter. Figure 2 shows the pipeline. The pre-amplifier's task is to amplify the output of the transducer by a gain of around 10× so that the ADC can measure the signal effectively using its predefined quantization levels. Without this amplification, the signal is too weak (around tens of millivolts).

Figure 2: The sound recording signal flow. (Pipeline: sound → mic → voltage signal → pre-amp → amplified signal → low-pass → band-limited signal → ADC → digital samples.)

As per Nyquist's law, if the ADC's sampling frequency is $f_s$ Hz, the sound must be band limited to $f_s/2$ Hz to avoid aliasing and distortions. Since natural sound can spread over a wide band of frequencies, it needs to be low pass filtered (i.e., frequencies greater than $f_s/2$ removed) before the A/D conversion. Since ADCs in today's microphones operate at 48kHz, the low pass filters (LPFs) are designed to cut off signals at 24kHz. Figure 3 shows the effect of the low pass (or anti-aliasing) filter on the recorded sound spectrum.

Figure 3: The digital spectrum with and without the (anti-aliasing) low-pass filter. (Panels: input spectrum fed directly to the ADC, with aliasing noise below $f_s/2$; input spectrum passed through the low-pass filter before the ADC. Axes: amplitude vs. frequency.)
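The Nyquist argument above can be made concrete with a small calculation. The following is a minimal sketch, not part of the BackDoor system: the helper names `alias_frequency` and `passes_lpf` are hypothetical, and the anti-aliasing filter is assumed to be an ideal brick-wall filter at $f_s/2$. It shows where a tone would land in the digital spectrum if it reached the ADC unfiltered, and which tones an ideal LPF lets through.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a pure tone at f_hz after sampling at fs_hz
    with no anti-aliasing filter (the tone folds back into [0, fs/2])."""
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))


def passes_lpf(f_hz, fs_hz):
    """True if the tone lies at or below the Nyquist cutoff fs/2.
    Assumes an ideal brick-wall low-pass filter (illustration only)."""
    return f_hz <= fs_hz / 2


FS = 48_000  # ADC sampling rate quoted in the text

for tone in (10_000, 40_000, 50_000):
    print(tone, "Hz -> passes LPF:", passes_lpf(tone, FS),
          "| unfiltered alias at", alias_frequency(tone, FS), "Hz")
# 10000 Hz -> passes LPF: True | unfiltered alias at 10000 Hz
# 40000 Hz -> passes LPF: False | unfiltered alias at 8000 Hz
# 50000 Hz -> passes LPF: False | unfiltered alias at 2000 Hz
```

Under these assumptions, the 40kHz and 50kHz tones never reach the ADC in a filtered microphone, which is why the rest of the paper relies on non-linear mixing rather than aliasing to get a signal into the recordable band.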
Sound Playback through Speakers
Sound playback is simply the reverse of recording. Given a digital signal as input, the digital-to-analog converter (DAC) produces the corresponding analog signal and feeds it to the speaker. The speaker's diaphragm oscillates to the applied voltage producing varying sound pressures in the medium, which is then audible to humans.

Linear and Non-linear Behavior
Modules inside a microphone are mostly linear systems, meaning that the output signals are linear combinations of the input. In the case of the pre-amplifier, if the input sound is $S$, then the output can be represented by

$$S_{out} = A_1 S$$

Here $A_1$ is a complex gain that can change the phase and/or amplitude of the input frequencies, but does not generate spurious new frequencies. This behavior makes it possible to record an exact (but higher-power) replica of the input sound and playback without distortion.

In practice, however, acoustic amplifiers maintain strong linearity only in the audible frequency range; outside this range, the response exhibits non-linearity. The diaphragm also exhibits similar behavior. Thus, for $f > 25$kHz, the net recorded sound $S_{out}$ may be expressed in terms of the input sound $S$ as follows:

$$S_{out}\Big|_{f>25} = \sum_{i=1}^{\infty} A_i S^i = A_1 S + A_2 S^2 + A_3 S^3 + \dots$$

While in theory the non-linear output is an infinite power series, the third and higher order terms are extremely weak and can be ignored. BackDoor finds opportunities to exploit the second order term, which can be manipulated by designing the input signal $S$.

3. CORE INTUITION AND VALIDATION
As mentioned earlier, our core idea is to operate the microphone at high (inaudible) frequencies, thereby invoking the non-linear behavior in the diaphragm and pre-amplifier. This is counter-intuitive because most researchers and engineers strive to avoid non-linearity. In our case, however, we intend to create an inlet into the audible frequency range and non-linearity is essentially the “backdoor”. We sketch the basic technique next, followed by some measurements to validate assumptions.

To operate the microphone in its non-linear range, we use an off-the-shelf ultrasound speaker and play a sound $S$, composed of two inaudible tones $S_1$ and $S_2$ at 40kHz and 50kHz. Mathematically, $S = \sin(2\pi 40 t) + \sin(2\pi 50 t)$. After passing through the diaphragm and pre-amplifier of the microphone, the output $S_{out}$ can be modeled as:

$$\begin{aligned}
S_{out} &= A_1 (S_1 + S_2) + A_2 (S_1 + S_2)^2 \\
&= A_1 \bigl(\sin(\omega_1 t) + \sin(\omega_2 t)\bigr) + A_2 \bigl(\sin^2(\omega_1 t) + \sin^2(\omega_2 t) + 2\sin(\omega_1 t)\sin(\omega_2 t)\bigr)
\end{aligned}$$

where $\omega_1 = 2\pi 40$ and $\omega_2 = 2\pi 50$ (frequencies in kHz).

Now, the first order terms produce frequencies $\omega_1$ and $\omega_2$, which lie outside the microphone's cutoff. The second order terms, however, are a multiplication of signals, resulting in various frequency components, namely, $2\omega_1$, $2\omega_2$, $(\omega_1 - \omega_2)$, and $(\omega_1 + \omega_2)$. Mathematically,

$$A_2 (S_1 + S_2)^2 = A_2 \Bigl[\, 1 - \tfrac{1}{2}\cos(2\omega_1 t) - \tfrac{1}{2}\cos(2\omega_2 t) + \cos\bigl((\omega_1 - \omega_2)t\bigr) - \cos\bigl((\omega_1 + \omega_2)t\bigr) \Bigr]$$

With the microphone's cut off at 24kHz, all of the above frequencies in $S_{out}$ get filtered out by the LPF, except $\cos((\omega_1 - \omega_2)t)$, which is essentially a 10kHz tone. The ADC is oblivious of how this 10kHz signal was generated and records it like any other sound signal. We call this the “shadow” signal. The net effect is that a completely inaudible frequency has been recorded by unmodified off-the-shelf microphones.

3.1 Measurements and Validation
For the above idea to work with unmodified off-the-shelf microphones, two assumptions need validation. (1) The diaphragm of the microphone should exhibit some sensitivity at the high-end frequencies (> 30kHz). If the diaphragm does not vibrate at such frequencies, there is no opportunity for non-linear mixing of signals. (2) The second order coefficient $A_2$ needs to be adequately high to achieve a meaningful signal-to-noise ratio (SNR) for the shadow signal, while the third and fourth order coefficients ($A_3$, $A_4$) should be negligibly weak. We verify these next.

(1) Sensitivity to High Frequencies: Figure 4 reports the results when a 60kHz sound was played through an ultrasonic speaker and recorded with a programmable microphone circuit. To verify the presence of a response at this high frequency, we “hacked” the circuit using an FPGA kit, and tapped into the signal before it entered the low pass filter (LPF). Figure 4(a) shows the clear detection of the 60kHz tone, confirming that the diaphragm indeed vibrates to ultrasounds. We also measured the channel frequency response at the output of the pre-amplifier (before the LPF): Figure 4(b) illustrates the results. The take away message is that the analog components indeed operate at a much wider bandwidth; it is the digital domain that restricts the operating range.

Figure 4: (a) Microphone signals (measured before the LPF) confirm the diaphragm and pre-amplifier's sensitivity to ultrasound frequencies. (b) Full freq. response at the output of the amplifier. (Panel (a): magnitude (dBV) vs. frequency (KHz), peak annotated at (60KHz, -47dB). Panel (b): power (dB/Hz) vs. frequency (KHz), curves for frequency response and noise.)

(2) Magnitude of Non-linear Coefficients: Figure 5(a) shows the entire spectrum after the non-linear mixing has occurred, but before the low pass filter (LPF). Except for the shadow at $(\omega_1 - \omega_2)$, we observe that all other frequency spikes are above the LPF's 24kHz cutoff frequency. Similarly, the nonlinear effect on a single frequency, shown in Figure 5(b), only produces integer multiples of the original frequency, i.e., $\omega$, $2\omega$, $3\omega$, and so on. These two types of non-linear distortions are called intermodulation and harmonic
distortions, respectively. Importantly, the shadow signal is still conspicuous above the noise floor, while the third order distortion is marginally above noise. This confirms the core opportunity to leverage the shadow.

Figure 5: (a) The intermodulation distortion of signal. (b) Harmonic distortion. (Both panels: magnitude (dB) vs. frequency (KHz). Panel (a) marks the 1st order and 2nd order components; panel (b) marks the fundamental and its harmonics.)

3.2 Hardware Generalizability
Before concluding this section, we report measurements to confirm that non-linearities are present in different kinds of hardware (not just a specific make or model). To this end, we played high frequency sounds and recorded them across a variety of devices, including smartphones (iPhone 5S, Samsung Galaxy S6), smartwatch (Samsung Gear2), video camera (Canon PowerShot ELPH 300HS), hearing aids (Kirkland Signature 5.0), laptop (MacBook Pro), etc. Figure 6 summarizes the SNR for the shadow signals for each of these devices. The SNR is uniformly conspicuous across all the devices, suggesting potential for widespread applicability.

Figure 6: Consistent shadow at 5kHz (in response to 45 and 50kHz ultrasound tones) confirms non-linearity across various microphone platforms. (Bar chart: BackDoor signal (dB), scale 0 to 60, for Android phone, smart-watch, hearing aids, camera, iPhone, and laptop.)

4. SYSTEM DESIGN
This section details the two technical modules in BackDoor: communication and jamming.

4.1 Communication
Thus far, the shadow signal is a trivial tone carrying one bit of information (presence or absence). While this was useful for explanation, our actual goal is to modulate the high frequency signals at the speaker and demodulate the shadow at the microphone to achieve meaningful data rates. We discuss the challenges and opportunities in developing this communication system.

Failure of Amplitude Modulation (AM)
Our first idea was to modulate a single ultrasound tone, a data carrier, with a message signal $m(t)$. Assuming amplitude modulation [23, 27], this results in $m(t)\sin(\omega_c t)$, where $\omega_c$ is a high frequency, ultrasound carrier. Now, if $m(t) = a\sin(\omega_m t)$, then the speaker should produce this signal:

$$S_{AM} = a\,\sin(\omega_m t)\sin(\omega_c t)$$

Now, when this signal arrives at the microphone and passes through the non-linearities, the squared components of the amplifier's output will be:

$$\begin{aligned}
S^{2}_{out,AM} &= A_2 \bigl(a\,\sin(\omega_m t)\sin(\omega_c t)\bigr)^2 \\
&= A_2 \frac{a^2}{4} \bigl(\cos(\omega_c t - \omega_m t) - \cos(\omega_c t + \omega_m t)\bigr)^2 \\
&= -A_2 \frac{a^2}{4} \cos(2\omega_m t) + (\text{terms with frequencies above } \omega_c \text{ and DC})
\end{aligned}$$

The result is a signal that contains a $\cos(2\omega_m t)$ component. So long as $\omega_m$, the frequency of the data signal, is less than 10kHz, the corresponding shadow at $2\omega_m$ stays below 20kHz and is within the LPF cutoff. Thus, the received sound data can be band pass filtered in software, and the data signal correctly demodulated.

Importantly, the above phenomenon is reminiscent of coherent demodulation in conventional radios, where the receiver would have multiplied the modulated signal ($a\sin(\omega_m t)\sin(\omega_c t)$) with the frequency and phase-synchronized carrier signal $\sin(\omega_c t)$. The result would be the $m(t)$ signal in baseband, i.e., the carrier frequency $\omega_c$ eliminated. Our case is somewhat similar: the carrier also gets eliminated, and the message signal appears at $2\omega_m$ (instead of $\omega_m$). This is hardly a problem since the signal can be extracted via band pass filtering. Thus, the net benefit is that the microphone's non-linearity naturally demodulates the signal and translates it to within the LPF cutoff, requiring no changes to the microphone. Put differently, non-linearity may be a natural form of self-demodulation and frequency translation, the root of our opportunity.

Unfortunately, the ultrasound transmitter, a speaker with a diaphragm, also exhibits non-linearity. The above property of self-demodulation triggers on the transmitter side as well, resulting in $m(t)$ becoming audible. Figure 7 shows the output of the speaker as visualized by the oscilloscope; a distinct audible component appears due to amplitude modulation. In fact, any modulation that generates waveforms with non-constant envelopes [45] is likely to suffer this problem. This is unacceptable and brings forth the first design question: how to cope with transmitter-side non-linearity?

Bypassing Transmitter Non-linearity
The design goal at this point is to modulate the carrier signal with data without affecting the envelope of the transmitted signal. This raises the possibility of angle modulation (i.e., modulating the phase or frequency but leaving amplitude untouched). However, we recognized that phase modulation (PM) is also unsuitable in this application because of unpredictable noise from phone movements. In particular, the shorter wavelengths of ultrasonic signals are easily affected by phase noise, and demodulation involves complicated receiver-side schemes. Therefore, we choose the other alternative of angle modulation: frequency modulation (FM). Of
course, FM modulation is not without tradeoffs; we discuss them and address the design questions step by step.

Figure 7: The AM signal produces an audible frequency due to self-demodulation, shown in this oscilloscope screenshot. (Screenshot labels: piezo sensor signal (time domain), signal FFT, AM signal, audible component.)

FM: No Frequency Translation
FM modulated signals, unlike AM, do not get naturally demodulated or frequency-translated when they pass through the non-linear transmitter. Assuming $\cos(\omega_m t)$ as the message signal, we have the input to the speaker as:

$$S_{fm} = \sin(\omega_c t + \beta \sin(\omega_m t))$$

Note that the phase of the FM carrier signal should be the integral of the message signal, hence it is $\sin(\omega_m t)$. Now when $S_{fm}$ gets squared due to non-linearity, the result is of the form $\tfrac{1}{2}\bigl(1 - \cos(2\omega_c t + \text{otherTerms})\bigr)$, i.e., a DC component and another component at $2\omega_c$. Hence, along with the original $\omega_c$ frequency, the transmitter output contains a frequency at $2\omega_c$, both above the audible cut-off. Thus nothing gets recorded by the microphone. The advantage, however, is that the output of the speaker is no longer audible. Moreover, as the speaker typically has a low response at high frequencies near $2\omega_c$, the output signal is dominated by the data signal at $\omega_c$ as in the original $S_{fm}$.

Second Carrier for Frequency Translation
To get the message signal recorded, we need to frequency-shift the signal at $\omega_c$ to the microphone's audible range, without affecting the signal transmitted from the speaker. To achieve this, BackDoor introduces a second ultrasound signal transmitted from a second speaker collocated with the first speaker. Let us call this second signal the secondary carrier, $\omega_s$. Since $\omega_s$ does not mix with $\omega_c$ at the transmitter, the signal that arrives at the microphone diaphragm is simply of the form:

$$S^{Rx}_{fm} = A_1 \sin(\omega_c t + \beta \sin(\omega_m t)) + A_1 \sin(\omega_s t)$$

Note that the first term comes from the FM modulated $\omega_c$ signal, and the second term from the $\omega_s$ secondary carrier. Now, at the receiver, the microphone's non-linearity essentially squares this whole signal as $(S^{Rx}_{fm})^2$. Expanding this mathematically results in a set of frequencies centered at $(\omega_c - \omega_s)$, and the others at $(\omega_c + \omega_s)$, $2\omega_c$, and $2\omega_s$. If we design $\omega_c$ and $\omega_s$ to have a difference less than the LPF cutoff, the microphone can record the signal.

Choosing $\omega_c$ and $\omega_s$:
As we considered the requirements of the system, the choice of $\omega_c$ and $\omega_s$ became clear. First, note that the FM-modulated signal has a bandwidth of, say $2W$, ranging from $(\omega_c - W)$ to $(\omega_c + W)$. Thus, assuming that the microphone's LPF cutoff is 20kHz, we should translate the center frequency to 10kHz; this maximizes the $W$ that can be recorded by the microphone. Immediately, we know that $|\omega_c - \omega_s| = 10$kHz.

Second, the microphone's diaphragm exhibits resonance at certain frequencies; $\omega_c$ and $\omega_s$ should leverage this to improve the strength of the recorded signal. Figure 8 plots the normalized power of the translated signal for different values of $\omega_c$ and $\omega_s$. Given $|\omega_c - \omega_s| = 10$kHz, the resonance effects demonstrate the maximum response when $\omega_c$ is 40kHz, and $\omega_s$ is 50kHz.

Figure 8: Resonance for various $\omega_c - \omega_s$ values. (Heat map: primary carrier (kHz) vs. secondary carrier (kHz), each from 35 to 60; color scale in dB/Hz.)

Coping with the “Ringing” Effect
The piezo-electric material in the speaker, which actually vibrates to create the sound, behaves as an oscillatory inductive-capacitive circuit. This loosely means that the actual vibration is a weighted sum of input sound samples (from the recent past), and hence, the piezo-electric material has a heavy-tailed impulse response (shown in Figure 9). Mathematically, the output of the speaker can be computed as a convolution between this impulse response and the input signal. Unfortunately, the non-linearity of the speaker impacts this convolution process as well, and generates low frequency components similar to the natural demodulation effect discussed earlier. The result is a “ringing effect”, i.e., the transmitted sound becomes slightly audible even with FM modulation.

Figure 9: (a) The prolonged oscillation in an ultrasonic transmitter following a 40kHz sine burst input. (b) The impulse response of the ultrasonic transmitter. (Panel (a): input and output amplitude (V) vs. time (ms). Panel (b): normalized amplitude vs. time (ms).)
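The frequency-translation argument from the second-carrier discussion can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: the gains `A1` and `A2`, the modulation index `BETA`, the 1kHz message tone, and the 400kHz simulation rate are all hypothetical values, and the same quadratic model $A_1 x + A_2 x^2$ is used for both speaker and microphone. It evaluates a direct DFT on a grid of audible frequencies for three cases: the FM carrier alone, the secondary carrier alone, and their sum as it would arrive at the microphone.

```python
import cmath
import math

FS_SIM = 400_000                      # simulation rate standing in for the analog domain
N = 2000                              # 5 ms of samples -> 200 Hz bin spacing
F_C, F_S, F_M = 40_000, 50_000, 1_000  # primary carrier, secondary carrier, message tone (Hz)
BETA = 2.0                            # FM modulation index (illustrative)
A1, A2 = 1.0, 0.1                     # hypothetical first- and second-order gains


def fm_carrier(n):
    """Sample n of sin(w_c t + beta * sin(w_m t))."""
    t = n / FS_SIM
    return math.sin(2 * math.pi * F_C * t + BETA * math.sin(2 * math.pi * F_M * t))


def secondary_carrier(n):
    """Sample n of the unmodulated secondary carrier sin(w_s t)."""
    return math.sin(2 * math.pi * F_S * n / FS_SIM)


def nonlinear(x):
    """Second-order model of a diaphragm/amplifier: A1*x + A2*x^2."""
    return A1 * x + A2 * x * x


def audible_spectrum(samples, f_lo=1_000, f_hi=20_000, step=200):
    """Power at each audible grid frequency, from a direct DFT (DC excluded)."""
    spectrum = {}
    for f in range(f_lo, f_hi + 1, step):
        w = -2j * math.pi * f / FS_SIM
        acc = sum(x * cmath.exp(w * n) for n, x in enumerate(samples))
        spectrum[f] = abs(acc / len(samples)) ** 2
    return spectrum


def audible_power(samples):
    """Total power between 1 kHz and 20 kHz."""
    return sum(audible_spectrum(samples).values())


def spectral_centroid(spectrum):
    """Power-weighted mean frequency of a {freq: power} spectrum."""
    total = sum(spectrum.values())
    return sum(f * p for f, p in spectrum.items()) / total


# Each speaker alone (transmitter side) versus both signals mixing at the microphone.
fm_only = [nonlinear(fm_carrier(n)) for n in range(N)]
sec_only = [nonlinear(secondary_carrier(n)) for n in range(N)]
mixed = [nonlinear(fm_carrier(n) + secondary_carrier(n)) for n in range(N)]

if __name__ == "__main__":
    print("audible power, FM speaker alone:       ", audible_power(fm_only))
    print("audible power, secondary speaker alone:", audible_power(sec_only))
    print("audible power, mixed at microphone:    ", audible_power(mixed))
    print("centroid of mixed shadow (Hz):         ", spectral_centroid(audible_spectrum(mixed)))
```

In this toy model, each signal on its own leaves essentially no energy between 1kHz and 20kHz, while the mixed signal produces a shadow whose power is centered near the 10kHz difference frequency. The model deliberately omits the speaker's impulse response, so it does not reproduce the ringing effect described above.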
To explain the self-demodulation effect, we assume a simplified impulse response $h$:

$$h = \sum_{i=0}^{\infty} k_i\,\delta(t - i) \approx k_0\,\delta(t) + k_1\,\delta(t - 1)$$

When an angle modulated (FM/PM) signal $S$ is convolved with $h$, the output $S_{out}$ is:

$$\begin{aligned}
S_{out} &= S * h \\
&= \sin(\omega_c t + \beta\sin(\omega_m t)) * \bigl(k_0\,\delta(t) + k_1\,\delta(t - 1)\bigr) \\
&= k_0 \sin(\omega_c t + \beta\sin(\omega_m t)) + k_1 \sin\bigl(\omega_c (t - 1) + \beta\sin(\omega_m (t - 1))\bigr)
\end{aligned}$$

While $S_{out}$ contains only high frequency components (since convolution is linear), the non-linear counterpart $S^2_{out}$ mixes the frequencies in a way that has lower frequency components (or shadows):

$$S^2_{out} = k_0 k_1 \cos\Bigl(\omega_c + 2\beta \sin\bigl(\tfrac{\omega_m}{2}\bigr) \cos\bigl(\omega_m t - \tfrac{\omega_m}{2}\bigr)\Bigr) + (\text{terms with frequencies over } 2\omega_c \text{ and DC})$$

Figure 10 shows the spectrum of $S_{out}$ and $S^2_{out}$, with and without the convolution. Observe the low frequency “shadow” that appears due to the second order term for the convolved signal; this shadow causes the ringing and is noticeable to humans.

Figure 10: The spectrogram of $S_{out}$ and $S^2_{out}$, with and without the convolution. The shadow signal appears due to second-order non-linear effects on the convolved signal. (Panel columns: 1st order, 2nd order. Panel rows: w/o convolution, w/ convolution. Axes: power (dB/Hz) vs. frequency (KHz); the audible frequencies are marked in the 2nd order panel with convolution.)

In most speakers, this “shadow” signal is weak; some expensive speakers even design their piezo-electric materials to be linear in a wider operating region, precluding this possibility. However, we intend to be functional across all speaker platforms (even the cheapest ones) and aim to be completely free of any ringing whatsoever. Hence, we adopt an inverse filtering approach to remove ringing.

Inverse Filtering to Eliminate Ringing
Our core idea draws inspiration from pre-coding in wireless communication, i.e., we modify the input signal $S_{fm}$ so that it remains the same after convolution. In other words, if the modified signal $S_{mod} = h^{-1} * S_{fm}$, then the impact of convolution on $S_{mod}$ results in $h * h^{-1} * S_{fm}$, which is $S_{fm}$ itself. With $S_{fm}$ as the output of the speaker, we do not experience ringing. Of course, we need to compute $h^{-1}$, i.e., learn the coefficients of the impulse response. For this, we monitor the current passing through the ultrasonic transmitter at different frequencies and calculate the $(k_0, k_1, k_2, \dots)$. Fortunately, unlike wireless channels, the response of the transmitter does not vary over time and hence the coefficients of the inverse filter can be pre-calculated. Figure 11(a) shows the frequency response of one of our ultrasound speakers, while Figure 11(b) shows how our inverse filtering scheme curbs the ringing effect.

Figure 11: (a) Freq. response of the ultrasonic speaker. (b) Inverse filtering method almost eliminates ringing effect compared to Figure 9. (Panel (a): power (dB/Hz) vs. frequency (KHz). Panel (b): input and output amplitude (V) vs. time (ms).)

Receiver Design
This completes the transmitter design and the receiver is now an unmodified microphone (from off-the-shelf phones, cameras, laptops, etc.). Of course, to extract the data bits, we need to receive the output signal from the microphone and decode it in software. For example, in smartphones, we have used the native recording app, and operated on the stored signal output. The decoding steps are as follows.

We begin by band pass filtering the signal as per the modulating bandwidth. Then, we need to convert this signal to its baseband version and calculate the instantaneous frequency to recover the modulating signal $m(t)$. This signal contains the negative-side frequencies that overlap with the spectrum-of-interest during the baseband conversion. To remove the negative frequencies, we Hilbert Transform the signal, producing a complex signal [29]. Now, for baseband conversion, we multiply this complex signal with another complex signal $e^{-j2\pi(\omega_s - \omega_c)t}$. Here $(\omega_s - \omega_c)$ is 10kHz, i.e., the shifted carrier frequency. This operation brings the modulated spectrum to baseband, centered around DC. The differentiation of its phase gives the instantaneous frequency [40], which is then simply mapped to data bits. Section 5 will present performance evaluation, but before that, we present the techniques for inaudible voice jamming.

4.2 Jamming
Imagine military applications in which a private conversation needs to be held in an untrusted environment, potentially bugged with spy microphones. We envision turning on one or a few BackDoor devices in that room. The device will broadcast appropriately designed ultrasound signals that will not interfere with human conversation, but will jam microphones in the vicinity. This section targets 2 jamming techniques towards this goal: (1) passive gain suppression, and (2) active frequency distortion. Together, the techniques mitigate electronic eavesdropping.

(1) Passive Gain Suppression
Our core idea is to leverage the automatic gain control (AGC) circuit [31, 38, 44] in the microphone to suppress voice conversations. By transmitting a narrowband ultrasound frequency at high amplitude, we expect to force the
microphone to alter its dynamic range, thereby weakening 40
the SNR of the voice signal. We elaborate next, beginning
Power reduction (dB)

30
with a brief primer on AGC.
AGC Primer:
Our acoustic environment has large variations in volume levels ranging from soft whispers to loud bangs. While human ears seamlessly handle this dynamic range, it poses one of the major difficulties in microphones. Specifically, when a microphone is configured at a fixed gain level, it fails to record a soft signal below the minimum quantization limit, while a loud sound above the upper range is clipped, causing severe distortions. To cope, microphones use an Automatic Gain Control (AGC) (as a part of the amplifier circuit) that adjusts the signal amplitude to fit well within the ADC's lower and upper bounds. As a result, the signal covers the entire range of the ADC, offering the best possible signal resolution.

Figure 12 demonstrates the AGC operation in a common MEMS microphone (ADMP401) connected to the line-in port of a Linux laptop running the ALSA sound driver. We simultaneously play 5kHz and 10kHz tones through two different (but collocated) speakers and display the power spectrum of the received sound. Figure 12(a) reports both the signals at around −20dB. However, when we increase the power of the 10kHz signal to reach its AGC threshold (while keeping the 5kHz signal unaltered), Figure 12(b) shows how the microphone reduces the overall gain to accommodate the loud 10kHz signal. This results in a 25dB reduction of the unaltered 5kHz signal.

[Figure 12 plots: power (dB/Hz) vs. frequency (kHz), 4 to 12kHz. Panel (a): peaks at (5kHz, −20dB) and (10kHz, −18dB). Panel (b): peaks at (5kHz, −45dB) and (10kHz, −3dB).]
Figure 12: Automatic Gain Control: (a) The 5kHz tone is at −20dB when the amplitude of the 10kHz frequency is at comparable power. (b) The 5kHz tone reduces to −45dB when the amplitude of the 10kHz tone is made to exceed the AGC threshold. Some spurious frequencies also appear due to non-linearities.

Voice Suppression via AGC:
In line with the above idea, when our ultrasound signal at $\omega_c$ passes through the AGC (i.e., before this frequency is removed by the low pass filter), it alters the AGC gain configuration and significantly suppresses the voice signals in the audible frequency. Figure 13 shows the reduction in the received sound power in a Samsung Galaxy S6 smartphone when ultrasound tones are played at different frequencies from a piezo-electric speaker. As is evident from the plot, the maximum reduction is due to the signal at 40kHz; this is because 40kHz is the resonance frequency of the piezo-electric transducer, which thereby delivers the highest power. In that sense, using the resonance frequency offers double gains, one towards increasing the SNR of our communication signal, and the other for jamming.

[Figure 13 plot: power reduction (dB) vs. sound frequency (kHz) and ultrasound frequency (kHz).]
Figure 13: The reduction in sound power due to the AGC: reduction maximum for the 40kHz tone due to the speaker's resonance at this frequency.

This reduction in signal amplitude results in low resolution when sampled with discrete quantization levels at the ADC. In fact, an adequately loud ultrasonic tone can completely prevent the microphone from recording any meaningful voice signal by reducing its amplitude below the minimum quantization level. However, as the electrical noise level is usually higher than the minimum quantization level of the ADC, it is sufficient to reduce the signal power below that noise floor.

Figure 14 shows the reduction in the signal power of a recorded voice segment for 3 different power levels of the 40kHz tone. In practice, an absolute amplitude reduction is difficult unless the speaker uses high power. Importantly, high power speakers are possible with BackDoor since the jamming signal is inaudible. On the other hand, regular white noise audio jammers must operate below strict power levels to not interfere with human conversation/tolerance. This is a key advantage of jamming with BackDoor. Nonetheless, we still attempt to lower the power requirement by injecting additional frequency distortions at the eavesdropper's microphone.

[Figure 14 plots: frequency (kHz) vs. time (sec), three panels.]
Figure 14: The reduction in signal power of recorded voice segment for 3 power levels (darker is lower power).

(2) Injecting Frequency Distortion
A traditional jamming technique is to add strong white noise to reduce the SNR of the target signal. We first implement a similar technique, but with inaudible band-limited Gaussian noise. Specifically, we modulate the $\omega_c$ carrier with white noise, bandpass filtered to allow frequencies between 40kHz and 52kHz only. The 52kHz $\omega_s$ carrier shifts this noise to [0, 12]kHz, which is sufficient to affect the voice signal.

To improve, we then shape the white noise signal to boost power in frequencies that are known to be important for voice. Note that these distortions are designed in the ultrasound bands (to maintain inaudibility), and hence they are
played through the ultrasound speakers. Section 5 will report results on word legibility, as a function of the separation between the jammer and the spy microphone.

Figure 15: BackDoor experimental setup: (a) Two ultrasonic speakers mounted on a circuit board for data communication. (b) A 2Watt speaker array system for jamming applications. (c) The FPGA based setup for probing into individual components of the microphone.

5. EVALUATION
BackDoor was evaluated on 3 main metrics: (1) human audibility, (2) throughput, packet error rates (PER) and bit error rates (BER) for data communication, and (3) the efficacy of jamming. We summarize the key results here, followed by details.

• Table 1 reports human perception of audibility for BackDoor for various frequencies, modulations, and SNR levels. Except for amplitude modulation (AM), all the human volunteers reported complete silence.
• Figures 17 and 18 report the variation of throughput against increasing distance, different phone orientations, and impact of acoustic interference. The results show throughput of 4 kbps at 1 meter away, which is 2× to 4× higher than today's mobile ultrasound communication systems.
• Figure 19 compares the jamming radius for BackDoor and audible white noise-based jammers. To achieve the same jamming effect (say, < 15% words legible by humans), we find that the audible jammer requires a loudness of 97 dBSPL, which is similar to a jackhammer and can cause severe damage to humans [19]. BackDoor, on the other hand, remains completely silent. Conversely, when the white noise sound level is made tolerable, the legibility of the words was 76%.

We elaborate on these results below, starting with details on our implementation platform.

5.1 Implementation
(1) Transmitter Speakers: Figure 15(a) and (b) show two different transmitter prototypes we have developed, the first one for communication and the other for jamming. The communication transmitter consists of two ultrasonic piezoelectric speakers [33]; each transmits a separate frequency as described in Section 4. A programmable waveform generator (Keysight 33500b series) drives the speakers with frequency modulated signals. The signals are amplified using an NE5535AP op-amp based non-inverting amplifier, permitting signals up to 150kHz. The jamming transmitter in Figure 15(b) is composed of two speaker arrays, each array with 9 piezoelectric speakers connected in parallel to generate a 2Watt jamming signal. The signals driving these arrays are first amplified using an LM380 op-amp based power amplifier separately powered from a constant DC-voltage source. Figure 16 shows the circuit diagram of the speaker array.

[Figure 16 schematic: modulated input signal, LM380 amplifier (supply VS 10-22V, heat sink), 9-element ultrasonic speaker array.]
Figure 16: The circuit diagram of the jamming transmitter.

(2) Receiver Microphones: We experiment with two types of receivers. The first is an off-the-shelf Samsung Galaxy S6 smartphone (released in Aug, 2015) running Android OS 5.1.1. Signals are recorded through a custom Android app using the standard APIs. The second receiver, shown in Figure 15(c), is a more involved setup that was mainly used for the micro-benchmarks reported earlier in Sections 3 and 4. This allowed us to tap into different components of the microphone pipeline, and analyze signals in isolation. The system runs on a high bandwidth data acquisition ZedBoard, a Xilinx Zynq-7000 SoC based FPGA platform [12], that offers a high-rate internal ADC (up to 1 Msample/sec). A MEMS microphone (ADMP401) is externally connected to this ADC, offering undistorted insights into higher frequency bands of the spectrum.

5.2 Human Audibility Results
We played BackDoor signals to a group of 7 users (ages between 27 and 38) seated around a table 1 to 3 meters away from the speakers. Each user reported the perceived loudness of the sound on a scale of 0-10, with 0 being perceived silence. As a baseline, we also played audible sounds and
Reference Mic.   2kHz Tone           5kHz Tone           FM                  AM                  White Noise
SNR (dB)         BackDoor  Audible   BackDoor  Audible   BackDoor  Audible   BackDoor  Audible   BackDoor  Audible
25               0         0.75      0         3.33      0         1.2       0         0.46      0         0.1
30               0         1.5       0         4.08      0         2.3       0.1       1.36      0         0.26
35               0         2         0         4.91      0         3.5       0.1       1.85      0         0.5
40               0         2.67      0         5.42      0         4.2       0.16      2.4       0         0.8
45               0         3.17      0         6.17      0         4.8       0.68      3.06      0         1.24
Table 1: Perceived loudness of BackDoor in comparison to audible sounds.
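The "2kHz Tone" and "5kHz Tone" columns of Table 1 refer to the audible tone that an ultrasonic pair such as <40, 42>kHz or <40, 45>kHz produces inside the microphone. A minimal sketch of that effect is given below, assuming an idealized memoryless quadratic non-linearity; the coefficients a1 and a2, the sample rate, and the function name are hypothetical choices for illustration, not measured values.

```python
import numpy as np

def shadow_frequency(f1_hz, f2_hz, fs=192_000, a1=1.0, a2=0.1):
    """Return the strongest audible-band frequency after a toy quadratic non-linearity.

    Assumed model: out = a1*s + a2*s**2, with s the sum of two ultrasonic tones.
    """
    t = np.arange(fs) / fs                      # one second, so FFT bins are 1 Hz apart
    s = np.cos(2 * np.pi * f1_hz * t) + np.cos(2 * np.pi * f2_hz * t)
    out = a1 * s + a2 * s**2                    # squaring creates sum and difference tones

    spectrum = np.abs(np.fft.rfft(out)) / len(out)
    freqs = np.fft.rfftfreq(len(out), 1 / fs)

    audible = (freqs > 20) & (freqs < 20_000)   # ignore DC and everything above 20 kHz
    return freqs[audible][np.argmax(spectrum[audible])]

print(shadow_frequency(40_000, 42_000))  # difference tone of the <40, 42> kHz pair
print(shadow_frequency(40_000, 45_000))  # difference tone of the <40, 45> kHz pair
```

Under this model the squared term contributes components at 2·f1, 2·f2, f1 + f2 and f2 − f1; only the difference tone falls in the audible band, which is why the two inputs themselves can stay above the range of human hearing.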
[Figure 17 plots. Axes: throughput (Kbps) vs. distance (meter); throughput (Kbps) per communication scheme; packet error rate (%) vs. orientation (Y, −Y, X, −X, Z, −Z). Legend entries: coding rate 3/4, coding rate 1/2, primary mic, secondary mic.]
Figure 17: BackDoor Communication Results: (a) Throughput vs. Distance, (b) Throughput comparison against related P2P communication schemes. (c) Packet error rate vs. Orientation. (d) Phone orientations.
asked the users to report the loudness levels. A reference microphone is placed at 1m from the speaker to record and compute the SNR (Signal to Noise Ratio) of all the tested sounds. We varied the SNR and equalized them at the microphone for fair comparison between audible and inaudible (BackDoor) sounds.

Four types of signals were played:
(1) Single Tone Unmodulated Signals: In the simplest form, we transmitted multiple pairs of ultrasonic tones (<40, 42> and <40, 45>) that generate a single audible frequency tone in the microphone. As baseline, we separately played a 2kHz and 5kHz audible tone.
(2) Frequency Modulated Signals: We modulated the frequency of a 40kHz primary carrier with a 3kHz signal. We also transmitted a 45kHz secondary carrier on the second speaker, producing a 3kHz FM signal centered at 5kHz in the microphone. As baseline, we played the equivalent audible FM signal on the same speakers.
(3) Amplitude Modulated Signals: Similar to FM signals, we created these AM signals by modulating the amplitude of a 40kHz signal with a 3kHz tone.
(4) White Noise Signals: Finally, we generated white Gaussian noise with zero mean and variance proportional to the transmitted power, at a bandwidth of 8kHz, band-limited to [40, 48]kHz. We also transmit a 40kHz tone on the second speaker to frequency shift the white noise to the audible range of the speaker. As baseline, we create audible white noise with the same properties band-limited to [0, 8]kHz and played it on the speakers.

Audibility Vs. SNR
Table 1 summarizes the average of perceived loudness that users reported for both BackDoor and audible signals as a function of the SNR measured at the reference microphone. For all types of signals except amplitude modulation (AM), BackDoor is completely inaudible to all the users. AM signals are audible due to speaker non-linearity, as described earlier. However, the perceived loudness of BackDoor is significantly lower than that of audible signals. Thus, so long as we avoid AM, BackDoor signals remain inaudible to humans but produce audible signals inside microphones with the same SNR as loud audible signals.

5.3 Communication Results
The BackDoor transmitter is the 2-speaker system while the receiver is the Samsung smartphone. The recorded acoustic signal is extracted and processed in MATLAB; we compute bit error rate (BER), packet error rate (PER) and throughput under varying parameters. Overall, 40 hours of acoustic transmission was performed to generate the results.

Throughput
Figure 17(a) reports BackDoor's net end-to-end throughput for increasing separation between the transmitter and the receiver. BackDoor can achieve a throughput of 4kbps at 1m, 2kbps at 1.5m and 1kbps at 2m. Figure 17(b) compares BackDoor's performance in terms of throughput and range with state-of-the-art mobile acoustic communication systems (in both commercial products [1, 13] and research [34, 22]). The figure shows that BackDoor achieves 2× to 80× higher throughput. This is because these systems are constrained to a very narrow communication band whereas BackDoor is able to utilize the entire audible bandwidth.

Impact of Phone Orientation
Figure 17(c) shows the packet error rate (PER) when data is decoded by the primary and secondary microphones in the phone, placed in 6 different orientations (shown in Figure 17(d)). The aim here is to understand how real-world use of the phone impacts data delivery. To this end, the phone was held at a distance of 1m away from the transmitter, and the orientation changed after each transmission session. The plot shows that except Y and −Y, the other orientations are comparable. This is because the Y/−Y orientations align the two receivers and transmitters in almost a straight line, resulting in maximal SNR difference. Hand blockage of the further-away microphone makes the
SNR gap pronounced. It should be possible to compare the SNR at the microphones and select the better microphone for minimized PER (regardless of the orientation).

Impact of Interference
Figure 18(a) reports the bit error rate (BER) variation against 3 different audible interference sources. To elaborate, we played audible interference signals (a presidential speech, an orchestral music, and white noise) from a nearby speaker, while the data transmission was in progress. The intensity of the interference at the microphone was at 70 dBSPL, equaling the level of volume one hears on average in face-to-face conversations. This is certainly much louder than average ambient noise, and hence, this serves as a strict test for BackDoor's resilience to interference. Also, the smartphone receiver was placed 1m away from the speaker, and transmissions were at 2kbps and 4kbps.

As is evident from the graph, voice and music have minimal impact on the communication error. On the other hand, white noise can severely degrade performance. Figure 18(b) plots the power spectral density of each interference; the decay beyond 4kHz for voice and music explains the performance plots. Put differently, since BackDoor operates around 10kHz frequency, voice and music signals do not affect the band as much as white noise, which remains flat over the entire spectrum.

[Figure 18 plots: (a) BER for no interference, voice, music, and white noise at bit-rates 2K and 4K; (b) PSD (dB/Hz) vs. frequency (kHz) for voice, music, and white noise.]
Figure 18: (a) BER vs. Interference. (b) Spectral density of interfering signals.

5.4 Jamming Results
Setup: Consider the case where Bob is saying a secret to Alice and Eve has planted a microphone in the vicinity, attempting to record Bob's voice. In suspicion, Bob places a BackDoor jammer in front of him on the table. We intend to report the efficacy of jamming in such a situation. Specifically, we extract the jammed signal from Eve's microphone and play it to an automatic speech recognizer (ASR), as well as to a group of 7 human users. We define Legibility as the percentage of words correctly recognized by each. We plot $L_{asr}$ and $L_{human}$ for increasing jamming radius, i.e., for increasing distance between Alice and Eve's microphone.

We still need to specify another parameter for this experiment: the loudness with which Bob is speaking. Acoustic literature suggests that at social conversations, say between two people standing at arm's length at a corridor, the average loudness is 65 dBSPL (dB of sound pressure level). We design our situation accordingly, i.e., when Bob speaks, his voice at Alice's location 1m away is made to be 70 dBSPL (i.e., Bob is actually speaking louder than general social conversations).

In the actual experiment, we pretend that a smartphone is a spy microphone. Another smartphone's speaker is a proxy for Bob, and the words played are derived from Google's Trillion Word Corpus [10]; we pick the 2000 most frequent words, prescribed as a good benchmark [35]. As mentioned earlier, the volume of this playback is set to 70 dBSPL at 1m away. Now, the BackDoor prototype plays an inaudible jamming signal through its ultrasonic speakers to jam these speech signals.

Baseline: Our baseline comparison is essentially against audible white noise-based jammers in today's markets. Assuming BackDoor jams up to a radius of R, we compute the loudness needed by white noise to jam the same radius. All in all, 14 hours of sound was recorded and a total of 25,000 words were tested. The ASR software is the open-source Sphinx4 library (pre-alpha version) published by CMU [2, 21]. We present the results next.

Audible and Inaudible Jamming Radius
Figure 19(a) plots $L_{asr}$ and $L_{human}$ for increasing jamming radius. Even with a 1W power, a radius of 3.5m (around 11 feet) can be jammed around Bob. We compare against audible noise jammers presented in Figure 19(b). For jamming at the same radius of 3.5m, the loudness necessary for the audible white noise is 97 dBSPL, which is the same as a jackhammer and can cause damage to the human ear [19]. Conversely, we find that when the audible white noise is made tolerable (comparable to a white noise smartphone app playing at full volume), the legibility becomes 76%. Thus, BackDoor is a clear improvement over audible jammers. More importantly, increasing the power of BackDoor jammers can increase the radius proportionally. This can be easily achieved. In fact, current portable Bluetooth speakers already transmit 10× to 20× higher power than BackDoor [4, 3]. Audible jammers cannot increase their power to boost the range since they are already intolerable to humans.

Impact of Selective Frequency Distortion
Figure 19(c) shows results when the jamming signal is simply a white noise, without the deliberate distortions of voice-centric frequencies (fricatives, phonemes, and harmonics). Evidently, the performance is substantially weaker, indicating the importance of signal shaping and jamming. Finally, Figure 19(d) shows the confidence scores from ASR for all correctly recognized words. Results show quite low confidence on a large fraction of words, implying that voice fingerprinting and other voice-controlled systems would be easy to DoS-attack with a BackDoor-like system.

6. POINTS OF DISCUSSION
Needless to say, there is much room for further work and improvement. We discuss a few points here.

• Jamming Range: BackDoor's restriction in the jamming range stems from the attenuation of ultrasound in air and the limited amplitude at which the ultrasound speakers can vibrate, producing low power signals. We have demonstrated a proof-of-concept with 9 speakers that boosts the jamming power level; direct materials cost is around $4. It should certainly be possible to develop a bigger speaker array to significantly increase the power [25]. In some cases (e.g. movie theater) multiple short-range jammers can be used to sufficiently cover the space. The jammers could be wall powered where necessary, and yet, will remain inaudible.
[Figure 19 plots: accuracy (%) vs. jammer distance (1.0m to 5.0m, and no jamming) for human users and ASR software; sound pressure (dBSPL) of sound sources (laptop, vacuum cleaner, audible jammer, jackhammer, jet engine) against the hearing damage threshold (over long-term exposure); CDF of confidence score for no jamming and distances of 2m, 3m, 4m, and 5m.]
Figure 19: Jamming results: (a) BackDoor jams a radius of 3.5m at 2W power. (b) White noise power needed to match BackDoor is intolerable. (c) Jamming radius when BackDoor uses inaudible white noise, showing importance of selectively jamming voice-centric harmonics. (d) Confidence of speech recognizer.
• Smarter Spy: We have assumed a fairly simple attacker planting a single microphone in the vicinity. Multiple microphones, perhaps even with various beamforming capabilities, may be able to extract out the voice from the jamming signal. However, greater sophistication in jamming should be feasible too, such as variation in the jamming signal to prevent channel estimation; even some movements of the speakers. We leave this to future work.

• Interference with Phone Calls: Data communication with BackDoor can interfere with people talking on the phone nearby. To this end, data communication applications will inherently need to be proximate and at low power. One possibility is an acoustic NFC, but at greater ranges of 1 or 2 feet. Alternatively, the communication could be made spread spectrum so that the interference remains below the noise floor. Our ongoing work is investigating these unresolved issues.

7. RELATED WORK
Literature in Acoustic Non-linearity: The literature in acoustic signal processing and communication is extremely rich. The notion of exploiting non-linearity was originally studied in 1957 by Westervelt's seminal theory [43, 42], which later triggered a series of research. The core vision was that non-linearities of the air can naturally self-demodulate signals; when combined with directional propagation of ultrasound signals, it may be possible to deliver audible information over large distances using relatively low power [17, 14, 46]. Recently, there has been a revival of these efforts with AudioSpotlight [5], SoundLazer [9, 8], and other projects [47, 11, 36]. Our work, however, is the opposite of these efforts: we are attempting to retain the inaudible nature of ultrasound while making it recordable inside electronic circuits.

Medical Devices: Human bones have also been shown to exhibit non-linearities that self-modulate signals, resulting in applications in bone conduction ultrasound hearing aids for severely deaf individuals [28, 15, 16, 37, 32]. Even bone conduction headphones are being considered that exploit similar non-linearities [24].

Assorted Topics Related to BackDoor: A set of recent works bears some degree of relevance to BackDoor. Dhwani [34] explores in-air sound signals as a short range, ad-hoc data transfer modality. Chirp [1] and Zoosh [39, 13] have rolled out commercial products using sound for a secure data exchange medium. GhostTalk [26] explores various attack scenarios on consumer electronics using high power electromagnetic interference. Another thread of recent work has looked into watermarking audio-visual media. Dolphin [41] enables speaker-microphone communication by embedding data bits on the sound. It adapts the signal parameters in real-time to keep the embedded signal imperceptible to human ears while achieving 500 bps data rate. Kaleido [48] proposes a video precoding based solution to prevent videotaping an on-screen show in a theater or on a website. It precodes distortions in the video such that it is invisible to humans but severely distorts videotaping (due to specific limitations of the camera). Finally, sound maskers have also been used for protecting private conversation; however, these techniques have been limited to audible frequencies [18, 30, 6, 7]. BackDoor differs from the above in the sense that it exploits discrepancies between humans and electronics, ultimately enabling a new capability to the best of our knowledge.

8. CONCLUSION
Device non-linearity has been conventionally viewed as a peril. This paper breaks away from this point of view and discovers various opportunities to harness non-linearity. By carefully designing ultrasound signals, we demonstrate that such signals remain inaudible to humans but are recordable by unmodified off-the-shelf microphones. This translates to new applications including inaudible data communication, privacy, and acoustic watermarking. While our ongoing work is focused on deeper understanding of these capabilities and applications, our longer term goal is focused on generalization to other platforms, such as wireless radios and inertial sensors.

Acknowledgement
We sincerely thank the anonymous reviewers for their valuable feedback. We are grateful to the Joan and Lalit Bahl Fellowship, Qualcomm, IBM, and NSF (award number: 1619313) for partially funding this research.

9. REFERENCES
[1] Chirp technology. http://www.chirp.io. Last accessed 28 November 2016.
[2] Cmu sphinx. http://cmusphinx.sourceforge.net. Last accessed 6 December 2015.
[3] High power bluetooth speaker: 12watt. https://www.cnet.com/products/jbl-pulse/specs/. Last accessed 28 November 2016.
[4] High power bluetooth speaker: 38watt. http://www.fugoo.com/fugoo-tough-xl/. Last accessed 28 November 2016.
[5] Holosonics webpage. https://holosonics.com. Last accessed 28 November 2016.
[6] Sound masking device. http://www.oeler.com/sound-masking-systems/. Last accessed 28 November 2016.
[7] Sound masking solutions. https://www.speechprivacysystems.com. Last accessed 28 November 2016.
[8] Soundlazer kickstarter. https://www.kickstarter.com/projects/richardhaberkern/soundlazer. Last accessed 28 November 2016.
[9] Soundlazer webpage. http://www.soundlazer.com. Last accessed 28 November 2016.
[10] Top 10000 words from google's trillion word corpus. https://github.com/first20hours/google-10000-english. Last accessed 6 December 2015.
[11] Woody norris ted talk. https://www.ted.com/speakers/woody_norris. Last accessed 28 November 2016.
[12] Zedboard. http://zedboard.org. Last accessed 28 November 2016.
[13] Zoosh technology. http://www.bdti.com/insidedsp/2011/07/28/naratte. Last accessed 28 November 2016.
[14] Bjørnø, L. Parametric acoustic arrays. In Aspects of Signal Processing. Springer, 1977, pp. 33–59.
[15] Deatherage, B. H., Jeffress, L. A., and Blodgett, H. C. A note on the audibility of intense ultrasonic sound. The Journal of the Acoustical Society of America 26, 4 (1954), 582–582.
[16] Dobie, R. A., and Wiederhold, M. L. Ultrasonic hearing. Science 255, 5051 (1992), 1584–1585.
[17] Fox, C., and Akervold, O. Parametric acoustic arrays. The Journal of the Acoustical Society of America 53, 1 (1973), 382–382.
[18] Goubran, R., and Botros, R. Adaptive sound masking system and method, June 5 2003. US Patent 20,030,103,632.
[19] Hamby, W. Ultimate sound pressure level decibel table, 2004.
[20] Heffner, H. E., and Heffner, R. S. Hearing ranges of laboratory animals. Journal of the American Association for Laboratory Animal Science 46, 1 (2007), 20–22.
[21] Huggins-Daines, D., Kumar, M., Chan, A., Black, A. W., Ravishankar, M., and Rudnicky, A. I. Pocketsphinx: A free, real-time continuous speech recognition system for hand-held devices. In Proceedings of ICASSP (2006).
[22] Iannucci, P. A., Netravali, R., Goyal, A. K., and Balakrishnan, H. Room-area networks. In Proceedings of the 14th ACM Workshop on Hot Topics in Networks (2015), ACM, p. 9.
[23] Jacobs, I. M., and Wozencraft, J. Principles of communication engineering.
[24] Kim, S., Hwang, J., Kang, T., Kang, S., and Sohn, S. Generation of audible sound with ultrasonic signals through the human body. In Consumer Electronics (ISCE), 2012 IEEE 16th International Symposium on (2012), IEEE, pp. 1–3.
[25] Kumar, S., and Furuhashi, H. Long-range measurement system using ultrasonic range sensor with high-power transmitter array in air. Ultrasonics 74 (2017), 186–195.
[26] Kune, D. F., Backes, J., Clark, S. S., Kramer, D., Reynolds, M., Fu, K., Kim, Y., and Xu, W. Ghost talk: Mitigating emi signal injection attacks against analog sensors. In Security and Privacy (SP), 2013 IEEE Symposium on (2013), IEEE, pp. 145–159.
[27] Lee, E. A., and Messerschmitt, D. G. Digital communication. Springer Science & Business Media, 2012.
[28] Lenhardt, M. L., Skellett, R., Wang, P., and Clarke, A. M. Human ultrasonic speech perception. Science 253, 5015 (1991), 82–85.
[29] Lyons, R. G. Understanding Digital Signal Processing, 3/E. Pearson Education India, 2004.
[30] McCalmont, A. M. Voice privacy system with amplitude masking, Mar. 25 1980. US Patent 4,195,202.
[31] Mercy, D. A review of automatic gain control theory. Radio and Electronic Engineer 51, 11.12 (1981), 579–590.
[32] Nakagawa, S., Okamoto, Y., and Fujisaka, Y.-i. Development of a bone-conducted ultrasonic hearing aid for the profoundly sensorineural deaf. Transactions of Japanese Society for Medical and Biological Engineering 44, 1 (2006), 184–189.
[33] Nakamura, T. Piezoelectric speaker, June 3 1986. US Patent 4,593,160.
[34] Nandakumar, R., Chintalapudi, K. K., Padmanabhan, V., and Venkatesan, R. Dhwani: secure peer-to-peer acoustic nfc. In ACM SIGCOMM Computer Communication Review (2013), vol. 43, ACM, pp. 63–74.
[35] Nation, P., and Waring, R. Vocabulary size, text coverage and word lists. Vocabulary: Description, acquisition and pedagogy 14 (1997), 6–19.
[36] Norris, E. Parametric transducer and related methods, May 6 2014. US Patent 8,718,297.
[37] Okamoto, Y., Nakagawa, S., Fujimoto, K., and Tonoike, M. Intelligibility of bone-conducted ultrasonic speech. Hearing research 208, 1 (2005), 107–113.
[38] Pérez, J. P. A., Pueyo, S. C., and López, B. C. Agc fundamentals. In Automatic Gain Control. Springer, 2011, pp. 13–28.
[39] Sherif, M. H. Protocols for secure electronic commerce. CRC press, 2016.
[40] Tretter, S. A. Communication System Design Using DSP Algorithms: With Laboratory Experiments for the TMS320C6713TM DSK. Springer Science & Business Media, 2008.
[41] Wang, Q., Ren, K., Zhou, M., Lei, T., Koutsonikolas, D., and Su, L. Messages behind the sound: real-time hidden acoustic signal capture with smartphones. In Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking (2016), ACM, pp. 29–41.
[42] Westervelt, P. J. The theory of steady forces caused by sound waves. The Journal of the Acoustical Society of America 23, 3 (1951), 312–315.
[43] Westervelt, P. J. Scattering of sound by sound. The Journal of the Acoustical Society of America 29, 2 (1957), 199–203.
[44] Whitlow, D. Design and operation of automatic gain control loops for receivers in modern communications systems. Microwave Journal 46, 5 (2003), 254–269.
[45] Xiong, F. Digital modulation techniques. Artech House, 2006.
[46] Yang, J., Tan, K.-S., Gan, W.-S., Er, M.-H., and Yan, Y.-H. Beamwidth control in parametric acoustic array. Japanese Journal of Applied Physics 44, 9R (2005), 6817.
[47] Yoneyama, M., Fujimoto, J.-i., Kawamo, Y., and Sasabe, S. The audio spotlight: An application of nonlinear interaction of sound waves to a new type of loudspeaker design. The Journal of the Acoustical Society of America 73, 5 (1983), 1532–1536.
[48] Zhang, L., Bo, C., Hou, J., Li, X.-Y., Wang, Y., Liu, K., and Liu, Y. Kaleido: You can watch it but cannot record it. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking (2015), ACM, pp. 372–385.
