Detection of nearby UAVs using CNN and spectrograms

Aldrich A Cabrera-Ponce1, Jose Martinez-Carranza1,2 and Caleb Rascon3
Abstract
In this work, we address the problem of detecting a UAV flying nearby another UAV. Usually, computer vision could be used to face this problem by placing cameras onboard the patrolling UAV. However, visual processing is prone to false positives, sensitive to light conditions and potentially slow if the image resolution is high. Thus, we propose to carry out the detection by using an array of microphones mounted onboard the patrolling UAV. To achieve our goal, we convert audio signals into spectrograms and use them in combination with a CNN architecture that has been trained to learn when a UAV is flying nearby and when it is not. Clearly, the first challenge is the presence of ego-noise derived from the patrolling UAV itself through its propellers and motors' noise. Our proposed CNN is based on Google's Inception v.3 network. The Inception model is trained with a dataset created by us, which includes examples of when an intruder UAV flies nearby and when it does not. We conducted experiments for off-line and on-line detection. For the latter, we managed to generate spectrograms from the audio stream and process them with an Nvidia Jetson TX2 mounted onboard the patrolling UAV.
Keywords
UAV detection, audio classification, spectrograms, microphones
Date received: 15 March 2020; accepted: 7 April 2020
1 Instituto Nacional de Astrofisica, Optica y Electronica, Puebla, Mexico
2 Computer Science Department, University of Bristol, Bristol, UK
3 Instituto de Investigaciones en Matematicas Aplicadas y Sistemas, Universidad Nacional Autonoma de Mexico, Ciudad de Mexico, Mexico

Corresponding author:
J Martinez-Carranza, INAOE Luis Enrique Erro 1, Sta. Ma. Tonantzintla, San Andres Cholula, Puebla 72840, Mexico.
Email: carranza@inaoep.mx

Introduction

Recently, autonomous UAVs have grown in popularity in aerial robotics since they are vehicles with multiple capabilities, with the help of on-board sensors such as inertial measurement units (IMU), laser, ultrasonics, and cameras (both monocular and stereo). Visual sensors can be used to generate maps for 3D reconstruction, autonomous navigation, search and rescue, and security applications. However, these applications face serious problems when attempting to detect another UAV in circumstances where the visual range is lacking, which can cause collisions, putting bystanders at risk in public places. Thus, it is necessary to have strategies that employ modalities other than vision to ensure the discovery of an intruder UAV. One such modality can be audio.

Audio processing has been a topic of research for years, which includes the challenge of recognising the source of an audio signal. In aerial robotics, the signals usually tend to present noise that disturbs the original signal, making the recognition an even more difficult task. However, if this is successful, it can be used to find the relative direction of a sound source (such as another UAV) as well as to detect other sounds in different distance ranges. A useful manner with which audio is represented in this type of application is in the time–frequency domain, in which the spectrogram of the signal is manipulated as if it were an image. These images allow a detailed inspection of the noise of the rotors to analyse vibration and prevent future failures in the motors. By detecting features inside the
spectrogram, sound source detection and localisation may be possible over a UAV.

Recent works employ deep learning strategies (such as Convolutional Neural Networks, CNN) to classify sound sources, and many of these methods aim to learn features from a spectrogram. In this work, we propose to use a CNN to classify a spectrogram, aiming at detecting if an intruder UAV flies nearby a patrolling UAV (see Figure 1).

We base our CNN-based classification model on Google's Inception v.3 architecture. The information is separated into two different classes: with and without a UAV. Each class has 3000 spectrograms for training. Each spectrogram is manipulated as though it were an image, with each pixel representing a time–frequency bin, and its colour representing its energy magnitude. Moreover, our approach aims to classify with a high level of performance over different aerial platforms. To assess our approach, we have carried out off-line and on-line experiments. The latter means that we have managed to carry out the detection in real-time onboard the patrolling UAV during flight operation.

This paper is organised as follows: the next section provides related works which detect sources in the environment with aerial vehicles; the section after that describes the hardware used; the subsequent section provides a detailed description of the proposed approach; the analysis of the spectrograms for each class is then described; the penultimate section presents the classification results using the proposed approach; and conclusions and future work are outlined in the last section.

Related work

As mentioned earlier, UAVs that solely employ vision may be limited when detecting aerial vehicles in an environment near a flight zone. Thus, works with radars have used the micro-Doppler effect to detect a target1 or different targets.2 This is used for classification due to changes in the velocity of the propellers,3,4 as well as other features.5 Additionally, when this effect is represented by its cadence frequency spectrum (CFS), it can be used to recognise other descriptors like shape and size, achieving the classification of multiple classes of aerial vehicles.6

As for audio processing techniques, they have been used in aerial robotics for classification, detection, and
Figure 1. Classification of audio in two different environments. Left: spectrogram of an intruder aerial vehicle nearby. Right:
spectrogram without an intruder vehicle nearby. https://youtu.be/B32_uYbL62Y
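As described above, each spectrogram is handled as an ordinary image: every pixel is a time–frequency bin and its colour encodes energy magnitude (Figure 1). The following is a minimal sketch of that conversion; the colormap, dB scaling and output size are our own illustrative assumptions, not the paper's settings:

```python
# Sketch: map a 2D spectrogram (freq x time) to an RGB image a CNN can ingest.
import numpy as np
from matplotlib import cm
from PIL import Image

def spectrogram_to_image(Sxx, size=(224, 224)):
    """Turn a spectrogram matrix into an RGB image via a colormap."""
    log_s = 10 * np.log10(Sxx + 1e-10)                     # energy in dB
    norm = (log_s - log_s.min()) / (log_s.ptp() + 1e-10)   # scale to [0, 1]
    rgba = cm.viridis(norm)                                # one colour per bin
    rgb = (rgba[..., :3] * 255).astype(np.uint8)
    return Image.fromarray(rgb).resize(size)
```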
Figure 2. General overview of the process used to record audio in two different environments and generate a dataset.
Figure 5. Comparison between spectrograms of manual control (top) and intruder UAV (bottom).
Figure 6. Spectrograms generated from activated motors (left) and an intruder UAV flight (right).
CNN architecture

The convolutional neural network (CNN) proposed for the classification is based on the architecture of the GoogLeNet Inception v.3 (Figure 4), using Keras 2.1.4 and Tensorflow 1.4.0. We employed a transfer-learning strategy, using a model that was already trained on the ImageNet corpus20 and augmented it with a new top layer to be trained with our dataset. The resulting model is focused on recognising the spectrogram images of our application: detecting an intruder UAV flying near another.

The training data set was arranged in folders, each representing one class, with approximately 3000 images to generate a model from recorded audios, and 4000 images obtained from audio streaming in real-time. The models inherited the input requirement of the Inception v.3 architecture, receiving as input an RGB image of size 224 × 224 pixels. The network was trained with 4500 iterations for the model with recorded audios and 15,000 iterations with the dataset of the audio streaming. Since the softmax layer can contain N labels, the training process corresponds to learning a new set of weights; that is to say, it is technically a new classification model.

Spectrograms

The spectrograms generated from the recorded audio files and from the audio streaming were processed for a preliminary analysis, to observe if there is a distinguishable change in both the audio and the spectrograms when an intruder UAV is present or not. The analysis was made in the Audacity software21 to visualise the spectrograms while we also listened to the audios. For those spectrograms generated from the audio streaming in real-time, we compared the
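As a hedged illustration of the transfer-learning setup described in the CNN architecture section above, the following sketch builds an ImageNet-pretrained Inception v.3 base in Keras and replaces the top with a two-way softmax. The folder layout, optimiser, batch size and epoch count are illustrative assumptions, not the paper's settings:

```python
# Minimal transfer-learning sketch (Keras with a TensorFlow backend).
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator

# Base model pre-trained on ImageNet, without its original top layer.
base = InceptionV3(weights='imagenet', include_top=False,
                   input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # keep the pre-trained weights fixed

# New top layer: a 2-way softmax ("intruder UAV" vs. "no UAV").
x = GlobalAveragePooling2D()(base.output)
out = Dense(2, activation='softmax')(x)
model = Model(inputs=base.input, outputs=out)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# One folder per class, as described in the text (folder name is hypothetical).
gen = ImageDataGenerator(preprocessing_function=preprocess_input)
train = gen.flow_from_directory('spectrograms/', target_size=(224, 224),
                                batch_size=32, class_mode='categorical')
model.fit_generator(train, steps_per_epoch=train.samples // train.batch_size,
                    epochs=10)
```

Freezing the base means only the new top layer's weights are learned, which matches the remark above that the result is technically a new classification model while keeping training efficient.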
Figure 8. Set-up for the real-time UAV classification: spectrogram acquisition and classification synchronised by the robot operating system (ROS).
The experiments were carried out onboard the Nvidia Jetson TX2, whose GPU architecture improved the classifier's performance in terms of processing time, thus enabling real-time detection. In Figure 8, we present the architecture related to the acquisition of the spectrograms and their classification in real-time using ROS.
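The loop in Figure 8 could be approximated with a ROS node along the following lines. This is a sketch under stated assumptions, not the authors' implementation: the topic names, model file and class ordering are hypothetical, and the preprocessing must match whatever was used at training time.

```python
# Hypothetical ROS node: receive spectrogram images, run the CNN,
# publish the predicted class.
import rospy
import numpy as np
from sensor_msgs.msg import Image
from std_msgs.msg import String
from keras.models import load_model

rospy.init_node('uav_audio_classifier')
model = load_model('uav_detector.h5')  # trained Inception-based model (assumed file)
pub = rospy.Publisher('/uav_detection/class', String, queue_size=1)

def on_spectrogram(msg):
    # Convert the ROS image (assumed rgb8, 224x224) into a batch of one.
    img = np.frombuffer(msg.data, dtype=np.uint8)
    img = img.reshape((msg.height, msg.width, 3)).astype('float32') / 255.0
    probs = model.predict(img[np.newaxis])[0]
    # Class order is an assumption; it depends on the training folder order.
    label = 'intruder UAV' if probs[0] > probs[1] else 'no intruder'
    pub.publish(label)

rospy.Subscriber('/spectrogram_image', Image, on_spectrogram)
rospy.spin()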
Table 8. Real-time results onboard the Matrice 100.

Class              Flight 1 accuracy   Flight 2 accuracy
Intruder UAV       0.8409              0.2421
No-intruder UAV    0.1590              0.7578
Average time (s)   1.2526              1.2525

Note: The class with the highest accuracy is highlighted in bold for each flight test.

Table 9. Real-time intruder UAV detection for different distances.

Class                         Flight 1 accuracy   Flight 2 accuracy   Flight 3 accuracy
Intruder UAV                  0.8356              0.7205              0.6677
No-intruder UAV               0.1643              0.2794              0.3322
Average time (s)              1.2432              1.2520              1.2986
No. of incorrect detections   31                  53                  64

Note: The class with the highest accuracy is highlighted in bold for each flight test.

Figure 9. Image sequence that shows the process of real-time classification to detect an intruder UAV flying nearby the Matrice 100 vehicle, which carried the microphones and computer hardware for the processing, including the Nvidia Jetson TX2 computer.

We performed two flights, one with the intruder UAV flying nearby the Matrice 100 during 335 s and the other without a UAV nearby during 300 s. The first flight consisted of the intruder having to fly around, over and to the side of the Matrice 100, while the CNN ran in real-time, classifying the spectrograms to determine whether an intruder UAV was detected or not. For this flight, the Matrice 100 flew in hovering, while the intruder UAV kept a distance of between 30 cm and 2 m w.r.t. the Matrice 100. The second flight was performed the same way for the Matrice 100, that is, hovering, but without the intruder UAV flying around. Table 8 shows the results obtained for these flights with the real-time detection in terms of accuracy and average time of the classification using the Nvidia Jetson TX2.

For these real-time experiments, we report an accuracy of 0.8409 for the intruder UAV detection scenario. This result is suitable to detect an intruder UAV located at 2 m of distance. For the second flight, the CNN obtained an accuracy of 0.7578 for the no-intruder detection scenario. In comparison with the model trained off-line, on-line detection is slightly lower. We argue that this can be improved by adding more training data. We could also exchange the Hann window for one with more samples, to generate a spectrogram of higher quality.
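On the Hann-window remark above: a longer analysis window (more samples) refines frequency resolution at the cost of time resolution. A minimal sketch, assuming a hypothetical mono WAV recording and illustrative window lengths rather than the paper's values:

```python
# Sketch of the Hann-window trade-off when generating spectrograms.
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, audio = wavfile.read('flight_audio.wav')  # hypothetical mono recording

for nperseg in (512, 1024, 4096):
    f, t, Sxx = spectrogram(audio, fs=rate, window='hann',
                            nperseg=nperseg, noverlap=nperseg // 2)
    # The frequency step shrinks (fs / nperseg) while the time step grows.
    print(nperseg, 'freq bins:', len(f), 'time frames:', len(t))
```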
In addition to the above, we performed three other flights with the intruder UAV near the Matrice 100. The motivation for these experiments was to assess the performance of the classifier when detecting the intruder UAV flying from different points near the Matrice 100. The first and second flights were performed over and to the side of the Matrice 100. For the third flight, the intruder UAV was located at a distance of 5 m. The results are presented in Table 9, showing the accuracy, the average time and the number of spectrograms that were classified incorrectly during the 180 s of the flight, which is indicated as No. of incorrect detections.

As noted in Table 9, the nearer the intruder flew to the Matrice 100, the better the classification accuracy. However, flight 3 shows that the CNN struggles when the intruder is farther (5 m) from the Matrice 100. Yet, a detection accuracy of 0.667 was obtained, which could still be exploited to detect suspicious activity. In Figure 9, we show some images of the experiment performed in real-time using the classifier to detect the intruder UAV.

Conclusion

In this work, we have presented a CNN-based classifier for the detection of an intruder UAV flying nearby another patrolling UAV. The main goal was to carry out the detection by using only audio signals recorded with an array of microphones mounted on the patrolling UAV. A time–frequency spectrogram was used as the signal representation, which is compatible with known CNN-based architectures. We employed a transfer-learning strategy, with which the top layer of a pre-trained Google Inception v.3 model was modified and trained, which made the training process very efficient.

We conducted experiments outdoors to assess the performance of our classifier in off-line and on-line mode. For the former, a database of spectrograms was produced from recorded raw audio signals. For the latter, the audio stream was processed directly to produce spectrograms in real-time, which were used for training and later on for classification in real-time during a flight. In sum, for off-line detection, our CNN obtained an accuracy of 0.97 for the intruder detection and 0.82 for the no-intruder detection. For the real-time experiments, we achieved an accuracy of 0.84 and 0.75, respectively, with an average time of 1.2 s to generate the spectrogram from the audio stream and classify it with the CNN.

We found that the real-time detection decays when the intruder UAV flies farther than 5 m. However, we believe this can be improved by using higher quality microphones and more training data. Our future work includes detecting the direction of the intruder UAV and its distance w.r.t. the patrolling UAV.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Aldrich A Cabrera-Ponce https://orcid.org/0000-0002-9998-7444
Caleb Rascon https://orcid.org/0000-0002-1176-6365

References

1. Ritchie M, Fioranelli F, Borrion H, et al. Multistatic micro-Doppler radar feature extraction for classification of unloaded/loaded micro-drones. IET Radar Sonar Navig 2016; 11: 116–124.
2. Harmanny R, De Wit J and Cabic GP. Radar micro-Doppler feature extraction using the spectrogram and the cepstrogram. In: 11th European radar conference, Rome, Italy, 8-10 October 2014, pp. 165–168. Piscataway, NJ: IEEE.
3. Fioranelli F, Ritchie M, Griffiths H, et al. Classification of loaded/unloaded micro-drones using multistatic radar. Electron Lett 2015; 51: 1813–1815.
4. De Wit J, Harmanny R and Molchanov P. Radar micro-Doppler feature extraction using the singular value decomposition. In: International radar conference, Lille, France, 13-17 October 2014, pp. 1–6. Piscataway, NJ: IEEE.
5. Molchanov P, Harmanny RI, de Wit JJ, et al. Classification of small UAVs and birds by micro-Doppler signatures. Int J Microwave Wireless Technol 2014; 6: 435–444.
6. Zhang W and Li G. Detection of multiple micro-drones via cadence velocity diagram analysis. Electron Lett 2018; 54: 441–443.
7. Christian AW and Cabell R. Initial investigation into the psychoacoustic properties of small unmanned aerial system noise. In: 23rd AIAA/CEAS aeroacoustics conference, Denver, CO, USA, 5-9 June 2017, p. 4051.
8. Furukawa K, Okutani K, Nagira K, et al. Noise correlation matrix estimation for improving sound source localization by multirotor UAV. In: IEEE/RSJ international conference on intelligent robots and systems, Tokyo, Japan, 3-7 November 2013, pp. 3943–3948. Piscataway, NJ: IEEE.
9. Ohata T, Nakamura K, Mizumoto T, et al. Improvement in outdoor sound source detection using a quadrotor-embedded microphone array. In: IEEE/RSJ international conference on intelligent robots and systems, Chicago, IL, USA, 14-18 September 2014, pp. 1902–1907. Piscataway, NJ: IEEE.
10. Park S, Shin S, Kim Y, et al. Combination of radar and audio sensors for identification of rotor-type unmanned aerial vehicles (UAVs). In: IEEE sensors, Busan, South Korea, 1-4 November 2015, pp. 1–4. Piscataway, NJ: IEEE.
11. Misra P, Kumar AA, Mohapatra P, et al. Aerial drones with location-sensitive ears. IEEE Commun Mag 2018; 56: 154–160.
12. Harvey B and O'Young S. Acoustic detection of a fixed-wing UAV. Drones 2018; 2: 4.
13. Harvey B and O'Young S. A harmonic spectral beamformer for the enhanced localization of propeller-driven aircraft. J Unmanned Vehicle Syst 2019; 7: 156–174.
14. Ruiz-Espitia O, Martinez-Carranza J and Rascon C. AIRA-UAS: an evaluation corpus for audio processing in unmanned aerial system. In: International conference on unmanned aircraft systems (ICUAS), Dallas, TX, USA, 12-15 June 2018, pp. 836–845. Piscataway, NJ: IEEE.
15. Hoshiba K, Washizaki K, Wakabayashi M, et al. Design of UAV-embedded microphone array system for sound source localization in outdoor environments. Sensors 2017; 17: 2535.
16. Kaleem Z and Rehmani MH. Amateur drone monitoring: state-of-the-art architectures, key enabling technologies, and future research directions. IEEE Wireless Commun 2018; 25: 150–159.
17. Jeon S, Shin JW, Lee YJ, et al. Empirical study of drone sound detection in real-life environment with deep neural networks. In: 25th European signal processing conference (EUSIPCO), Kos, Greece, 28 August - 2 September 2017, pp. 1858–1862. Piscataway, NJ: IEEE.
18. Kim BK, Kang HS and Park SO. Drone classification using convolutional neural networks with merged Doppler images. IEEE Geosci Remote Sens Lett 2017; 14: 38–42.
19. Grondin F, Letourneau D, Ferland F, et al. The ManyEars open framework. Autonom Robots 2013; 34: 217–232.
20. Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition, Miami, FL, USA, 20-25 June 2009, pp. 248–255. Piscataway, NJ: IEEE.
21. Mazzoni D and Dannenberg R. Audacity [software]. The Audacity Team, Pittsburgh, PA, USA, 2000.
15. Hoshiba K, Washizaki K, Wakabayashi M, et al. Design
of UAV-embedded microphone array system for sound