Abstract
This paper proposes a new approach for detecting human survivors in devastated environments
using an autonomous robot. The proposed system uses a passive infrared sensor to detect the existence of living humans and a low-cost camera to acquire snapshots of the scene. The images are
fed into a feed-forward neural network, trained to detect the existence of a human body or part of
it within an obstructed environment. This approach requires a relatively small number of images
to be acquired and processed during the rescue operation, which considerably reduces the cost of
image processing, data transmission, and power consumption. The results of the conducted expe-
riments demonstrated that this system has the potential to achieve high performance in detecting
living humans in obstructed environments relatively quickly and cost-effectively. The detection
accuracy ranged between 79% and 91% depending on a number of factors such as the body posi-
tion, the light intensity, and the relative color matching between the body and the surrounding
environment.
Keywords
Urban Search and Rescue, Human-Robot Interaction, Autonomous Systems, Intelligent Interface,
Neural Networks
1. Introduction
In light of recent catastrophes, whether natural, such as earthquakes, hurricanes, floods, and fires, or man-
made, such as wars, bombings, and terrorist attacks, there is an urgent need for new ways to rescue survivors who
are trapped under rubble as quickly as possible. In the field of Urban Search and Rescue (USAR), the probability of saving a victim is high within the first 48 hours of the rescue operation; after that, it drops to nearly zero [1].
During the rescue operation, various personnel are deployed, such as fire fighters, policemen, and medical
assistants. All of them are exposed to very dangerous situations caused by the devastated environment
they work in, such as collapsed buildings, landslides, and craters. Hence, a rescuer may become a victim who
needs to be rescued. Therefore, the rescue operation imposes a substantial risk on the rescue personnel themselves. From this point of view, alternatives to human rescuers have been in great demand.
Trained dogs have also been used in this field because of their high sensitivity to any slight motion or human
presence. However, in such situations, it is hard to depend totally on them because even if they can detect the
presence of a living victim, they still cannot judge the situation or relay the information quickly and systemati-
cally. For that reason, dogs cannot work independently but rather side by side as assistants to human rescuers.
Therefore, what is needed is a fully or partially autonomous alternative to the human rescuer.
Meanwhile, robots have been making steady progress in other fields such as education, industry, the military, and medicine, and have proven their robustness and efficiency in most of the tasks they are designed for.
Therefore, robots are expected to play an important role in replacing humans in many of these fields.
Even though USAR is a very challenging field for robots, they have already entered it, as they have many other fields. The first real attempt to use USAR robots was during the rescue operation at the World Trade
Center (WTC) disaster in New York in 2001. Since then, worldwide interest in using robots in rescue operations has been rapidly increasing.
During the WTC rescue operation, six remotely-controlled robots were used to assist human rescuers in finding and locating trapped victims [1]. Each robot was equipped with a camera and a wireless communications link and was supposed to transmit images of the scenes it traversed. However, the attempt faced several critical problems due to the remotely-controlled nature of the robots:
1) More than 25% of the communications between the wireless robots and the control unit were extremely noisy
and therefore useless. This eventually led to a loss of communication between the robots and the operators. As a result, the robots stopped working entirely and got lost in the rubble because they did not have any
autonomous intelligence to continue on their own.
2) A very high communication cost was incurred due to the large number of images transmitted to the operator,
especially since the communication link was very error-prone.
3) A very high processing cost was also incurred by capturing, storing, and transmitting such a large number of
images.
4) There was a continuous need for illumination due to the dark nature of these environments. Therefore, a con-
tinuous light source would be needed in order to be able to acquire and send images. This requires a rela-
tively large power supply, which is not feasible in such situations.
As a result, the attempt of using robots at the WTC failed to accomplish even a small part of its mission [1].
This research is an attempt to tackle the problems faced by the USAR robots during the WTC rescue operation
by equipping the robot with some autonomous intelligence, towards the ultimate goal of designing a fully autonomous robot.
In this research, a new approach for detecting surviving humans in devastated environments using an autonomous USAR robot is proposed. The proposed system uses a passive infrared (PIR) sensor to detect the existence of living humans and a low-cost camera to acquire snapshots of the scene as needed. When the PIR sensor detects a sign of a living human, it triggers the camera to acquire an image of the scene. The image is fed into
a feed-forward neural network (NN), trained to detect the existence of a human body or part of it in the image
for different positions within an obstructed environment. This approach requires a relatively small number of
images to be acquired and processed during the rescue operation. In this way, the real-time cost of image
processing and data transmission is considerably reduced. The transmitted information may include any relevant
phenomena that the robot is prepared to sense such as the location, the temperature, the level of certain types of
gases, etc. The information may also include the image itself in order for the rescue team to have a better evalu-
ation of the situation.
The robot is assumed to have the capability to determine its current location in real-time, to wirelessly communicate with the rescue team, and to locally store the status and location information about the trapped victims.
2. Related Work
Image processing, neural networks (NN), and passive infrared sensing have been used to detect the presence or
shape of the human body in several applications such as surveillance and emergency rescue services. However,
to the best of our knowledge, a combined use of all of them in the way proposed here has not been reported.
In this section, a brief discussion of some of the previously proposed techniques in the literature is presented.
Moradi presented a method that exploits the infrared radiation emitted by the human body, using an infrared camera that acquires thermal images of the surrounding environment [2]. This system includes six main parts: Image
Acquisition, Initial Processing, Feature Extraction, Acquisition Storage, Knowledge Base, and Recognition. The
features within the images are then classified using the Dystall NN. The number of nodes and layers in this sys-
tem depends on the desired accuracy. The difference between the training stage and the testing stage of the NN
is compared with a pre-defined threshold. If the difference is more than that threshold, then a human is detected.
This method has a number of advantages that distinguish it from other similar methods such as: real-time
processing speed, high efficiency, and fast learning. However, there are several factors that complicate the
process of the detection and recognition such as: the absorbency of thermal energy by the environment and the
victim's clothes, the existence of identical thermal sources such as fire or another living creature (e.g. an animal),
and that the images taken by the IR camera usually have relatively low resolution. In addition, the cost of
a thermal camera is relatively high.
Burion presented a project that aimed to provide a sensor suite for human detection for the USAR robots [3].
This study evaluated several types of sensors for detecting humans such as pyroelectric sensor, USB camera,
microphone, and IR camera. The pyroelectric sensor was used to detect the human body radiation, but its limita-
tion was its binary output. The USB camera was used for motion detection, but its limitation was its sensitivity
to changes in light intensity. The microphone was used for long duration and high amplitude sound detection,
but it was severely affected by noise. Lastly, the IR camera was used to detect humans by their heat image, but it
was affected by other nearby hot objects. The algorithm was based on collecting data from all these sensors as
an attempt to improve the robustness of the final results. The main idea was to detect a change in the image
scene by checking the values of the pixels. Several images for the scene were acquired and subtracted from each
other to discover whether motion had occurred. The technique used was fairly effective in detecting the victims.
However, the robot was not fully autonomous and depended on the operator. No neural network was used
in the control program, and no detection of the human shape was performed.
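As an illustration of this kind of motion check, the following is a minimal frame-differencing sketch in Python. The threshold values are illustrative assumptions, not those used in [3].

```python
import numpy as np

def motion_detected(prev_frame, curr_frame, pixel_thresh=25, area_thresh=0.01):
    """Frame-differencing motion check in the spirit of the sensor-suite approach.

    `pixel_thresh` (intensity change per pixel) and `area_thresh` (fraction of
    changed pixels) are illustrative values only.
    """
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    changed = diff > pixel_thresh          # pixels that changed noticeably
    return changed.mean() > area_thresh    # motion if enough pixels changed
```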
Nakajima et al. proposed a system that learns from examples in order to recognize a person's body in
images taken indoors [4]. The images were represented by color-based and shape-based features. The recogni-
tion process is carried out by using the linear Support Vector Machine (SVM) classifiers. This system works in
real-time and it can achieve high recognition rate on normalized color histograms of people’s clothes. However,
the main limitation of this system is that it demonstrated high performance rates only when both the training and
test images were recorded during the same day. When the test set contained images from a day that was not represented in the training set, the performance of the system dropped to about 53%, due to the change of clothing
a person may wear from day to day.
Mohan et al. presented a hierarchical technique for developing a system that locates people in images [5]. In
this Adaptive Combination of Classifiers (ACC) technique, the learning occurs in multiple stages. The system is
first trained to find the four components of the human body: the head, legs, left arm, and right arm separately
from each other. After ensuring that these parts are present in a proper geometric configuration, the system
combines the results in order to classify a pattern as either a “person” or a “non-person”. The obtained results
demonstrated that this system performed significantly better than a similar full-body person detector [4]. More-
over, the system handles the variations in lighting and noise in an image better than a full-body detector.
Engle and Whalen presented a simulation study of multiple robots foraging in a simulated environment [6].
The robots were supposed to gather two different sets of resources from a walled environment containing ob-
stacles. Each set of resources must be returned to a specific goal. The robots were equipped with laser range
finder for long range obstacle detection, a thermal sensor which detects heat around resources, and a color sen-
sor to discriminate between different types of resources. The robots were supposed to search for humans by their
body heat and to distinguish humans from non-human objects by using the color sensor. The final simulation was
done using just one robot due to time constraints and implementation complexity, and the navigation algorithm used could not cover the whole environment. In addition, the financial cost of the robot was relatively high.
Cavalcanti and Gomes presented an approach for determining the presence of people in still images [7]. It starts
by segmenting the skin areas in the image using a skin filter in the YCbCr color space with a set of
experimentally determined thresholds. Then, the detected skin areas are grouped together to form body regions. This technique was applied to a database of still images and achieved 77.2% correct classification. The advantage
of this approach is its simplicity and relatively low processing cost. However, higher-level analysis would be
needed in order to achieve higher detection performance.
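As an illustration of this kind of skin filtering (not the thresholds of [7], which were tuned experimentally), a minimal numpy sketch can convert an RGB image to YCbCr and keep pixels inside a commonly used chrominance box:

```python
import numpy as np

def skin_mask(rgb):
    """Return a boolean mask of likely skin pixels for an HxWx3 uint8 RGB image.

    Uses the standard full-range RGB -> YCbCr conversion and an illustrative
    chrominance box (Cb in [77, 127], Cr in [133, 173]); the thresholds used
    in [7] may differ.
    """
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)
```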
Several research papers [8]-[12] were published regarding the performance of the National Institute of Standards and Technology (NIST) standards. The papers list the criteria for designing standard arenas and victim
models and their characteristics. The arenas are supposed to simulate the destruction caused by the real catas-
trophes. In addition, they are supposed to provide tangible, realistic, and challenging environments for mobile
robot researchers interested in urban search and rescue applications. The arenas were modeled for buildings in
various stages of collapse, where the robots were allowed to traverse and were tested for obstacle avoidance and
victim detection by repeatedly testing their sensory, navigation, mapping, and planning capabilities. These are-
nas differ in their level of complexity. They are divided into three levels: the Yellow arena, the Orange arena,
and the Red arena. The Yellow arena, which is the simplest one, simulates a home after an earthquake or a similar
disaster. The floor of this arena is easy to navigate so that most robots can handle it. There is no loose paper, no
ramps, etc. The Orange arena is harder than the Yellow arena. The major distinguishing feature of the Orange
arena is a ramp with a chicken-wire. In addition, the floor is covered with paper, which is much harder to tra-
verse by small wheeled robots. The Red arena is much harder than the other two. There are rubble and obstacles
all over the place. There is also enough metal to cause serious problems to wireless communication. Moreover,
simulated victims with various signs of life can be found throughout the three arenas. Some of these signs are
human form, motion, audio, thermal, and chemical signatures that represent victims of various states. The robots
are awarded points for accurate mapping and victim detection according to the predefined performance metrics.
Trierscheid et al. proposed a new technique for detecting the presence of a victim [13]. The technique relies on
spectral imaging that analyzes invisible light from the human body in the near-infrared (NIR) band, which has
shorter wavelengths than thermal radiation. NIR imaging has also been used to address the problem of dust covering victims in rescue
environments. The advantage of this technique is that it does not depend on temperature, unlike thermal imaging,
such that cold bodies (i.e. dead persons) can also be detected. The algorithm was responsible for locating victims
and marking them in a given image.
Zhao et al. presented a technique for detecting the breathing and heartbeat signals of a living human [14].
This system can detect passive victims who are either completely trapped or too weak to respond to the existing
traditional detection systems. The system depends on sending a low-intensity microwave beam to penetrate
through the rubble. If a living human exists, his/her small amplitude body-vibrations due to the breathing and
heartbeat will modulate the backscattered microwave signal. After that, the reflected wave from the body is
demodulated, and the breathing and heartbeat signals can be extracted in order to locate the buried human. This
system was able to detect humans buried under rubble of nearly 90 cm thickness.
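A toy numerical illustration of this principle follows (a sketch only; the carrier frequency, breathing rate, and modulation depth are made-up values, not those of [14]): the chest motion phase-modulates the reflected carrier, and the low-frequency breathing component can be recovered from the demodulated phase.

```python
import numpy as np

fs = 1000.0                                   # sample rate [Hz]
t = np.arange(0, 30, 1 / fs)                  # 30 s observation window

breath = 0.3 * np.sin(2 * np.pi * 0.3 * t)    # assumed chest-motion phase [rad], 0.3 Hz breathing
fc = 10.0                                     # stand-in (intermediate) carrier frequency [Hz]
rx = np.cos(2 * np.pi * fc * t + breath)      # backscattered signal, phase-modulated by breathing

# Quadrature demodulation: mix with the carrier and low-pass by block averaging.
i = rx * np.cos(2 * np.pi * fc * t)
q = -rx * np.sin(2 * np.pi * fc * t)
block = int(fs / fc)                          # average over one carrier period
i_lp = i[: len(i) // block * block].reshape(-1, block).mean(axis=1)
q_lp = q[: len(q) // block * block].reshape(-1, block).mean(axis=1)
phase = np.unwrap(np.arctan2(q_lp, i_lp))     # recovered phase ~ breathing waveform

spectrum = np.abs(np.fft.rfft(phase - phase.mean()))
freqs = np.fft.rfftfreq(len(phase), d=block / fs)
print(freqs[spectrum.argmax()])               # ~0.3 Hz, the assumed breathing rate
```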
3. System Implementation
In order to emulate the proposed search and rescue robotic system and evaluate its performance, a laptop PC,
which represents the main processing and control unit (MPCU) of the system, was attached to a navigation me-
chanism (e.g. a large toy car) via a navigation control unit (NCU).
The NCU consists of electronic circuitry that interfaces the sensors (as system inputs) and the navigation me-
chanism (as a system output) to the main processing and control unit. The system was equipped with three types
of sensors:
1) A PIR sensor to detect the presence of a living human based on body radiation.
2) An Infra-Red (IR) range sensor to detect the obstacles in the way of the robot.
3) An image sensor (a traditional web camera) used to acquire still images on demand from the control program
when a need arises.
The MPCU consists of the following components (see Figure 1):
1) The navigation algorithm, which is responsible for steering the robot throughout the search area and around
obstacles, based on the readings of the PIR and the IR sensors while searching for survivors.
2) The location tracking algorithm, which is responsible for calculating the real-time coordinates of the robot in
order to report the locations of the survivors to the rescue personnel. The real implementation of a search and
rescue robot is assumed to be equipped with a Global Positioning System (GPS) or a GPS-like system for
location identification.
3) The image processing and neural networks unit (IPNNU), which is responsible for detecting a human being
in a picture received from the camera. This unit is the core of this research and will be discussed in detail in
the following sections.
4) The wireless communication unit, which is responsible for providing a reliable communication link with the rescue
personnel in order to report the locations of the survivors, in addition to any other information the system
may be designed to provide, such as a snapshot of the scene, so that the rescue personnel may be able to assess
the situation of the survivors and prioritize their rescue tasks. In our system, the laptop PC was equipped with
a wireless local area networking adapter, which was used for communications.
It should be noted that the design of the navigation and location tracking algorithms is beyond the scope of
this research. Therefore, a simple obstacle-avoidance navigation algorithm and simple step-counting location identification were used in the system.
The workflow of the system is as follows:
1) The MPCU continuously reads the IR range sensor via the NCU. When the IR sensor is triggered (i.e., an
obstacle is detected), the MPCU signals the NCU to change the direction of movement based on the navigation algorithm, and keeps track of the current location and the status of the communication link.
2) The MPCU periodically reads the PIR sensor. If the PIR sensor is triggered, the MPCU signals the NCU to
stop the robot and the camera to acquire an image.
3) The image (i.e., a still frame) is passed to the IPNNU, where it is preprocessed by the image processing algorithm (e.g., converted to grayscale, resized, and filtered to cancel some background effects) and is then fed to the
pre-trained NN.
4) The NN determines whether a human shape or a body part exists in the image and returns the result back to
the MPCU by categorizing the image based on the training sets used.
5) If a human is detected, the MPCU reports the location to the rescue center. If the communication link is not
available, the MPCU may decide to buffer the information and continue its normal operation until the com-
munication is resumed. It may choose to move back to a nearby location where the communication link is
known to be usable before it continues. The MPCU might also decide to send the image as well if a need
arises. However, the cost of transmitting an image is relatively high in terms of delay, processing, and com-
munication. Therefore, sending an image must be well justified. Otherwise, the battery lifetime of the robot
will be reduced significantly. The MPCU may also have the capability to compress the image before sending
it, which may reduce the cost significantly, depending on the resources available at the MPCU.
Figure 2 shows the simplified flowchart of the system operation.
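To make this workflow concrete, the following is a minimal Python sketch of the MPCU main loop. It is an illustration only, under assumed interfaces: the `ncu`, `camera`, `classifier`, `reporter`, and `location_tracker` objects and their methods are hypothetical placeholders for the units described above, not the authors' implementation, and the preprocessing details (target size, normalization) are assumptions.

```python
import time
import numpy as np

def preprocess(frame, size=(64, 64)):
    """Stand-in for the preprocessing step: grayscale, resize, simple normalization."""
    gray = frame[..., :3].mean(axis=2)                       # naive grayscale
    ys = np.linspace(0, gray.shape[0] - 1, size[0]).astype(int)
    xs = np.linspace(0, gray.shape[1] - 1, size[1]).astype(int)
    small = gray[np.ix_(ys, xs)]                             # nearest-neighbor resize
    return (small - small.mean()) / (small.std() + 1e-8)     # crude background leveling

def main_loop(ncu, camera, classifier, reporter, location_tracker):
    """Simplified MPCU control loop following the flowchart of Figure 2."""
    while not reporter.mission_done():
        # Step 1: obstacle avoidance based on the IR range sensor.
        if ncu.ir_obstacle_detected():
            ncu.change_direction()
        else:
            ncu.move_forward_step()
        location_tracker.update()                            # e.g. step counting

        # Step 2: poll the PIR sensor; stop and image when body radiation is sensed.
        if ncu.pir_triggered():
            ncu.stop()
            frame = camera.acquire_image()

            # Steps 3-4: preprocess the frame and classify it with the pre-trained NN.
            if classifier.detect_human(preprocess(frame)):
                # Step 5: report the location (and optionally the image); the reporter
                # is assumed to buffer data when the wireless link is unavailable.
                reporter.report(location_tracker.position())

        time.sleep(0.05)                                     # loop pacing
```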
Before the experiments were conducted, the detection ranges and coverage angles of the PIR sensor and the
camera had to be well-tested and unified in order to make sure that the camera is capable of “seeing” the survivor
that has triggered the PIR sensor with enough quality for processing and detecting. The test results were as follows:
The maximum detection range of the PIR sensor was found to be 5 m at 180˚.
The maximum distance between the camera and the object, Dmax, at which an image can be acquired with
acceptable quality, was found to be 2 m.
The radius, R, of the image taken at 2 m by the camera was found to be about 1.15 m (see Figure 3(a)).
Based on the detection range measurements of the camera, the coverage angle, θ, was calculated as follows:

θ = 2·tan⁻¹(R / Dmax) = 2·tan⁻¹(1.15 / 2) ≈ 60˚    (1)
Due to the incompatibility in the detection range and the coverage angle between the PIR sensor and the cam-
era, the following procedure was performed in order to obtain a unified coverage of both:
1) In order to reduce the coverage angle of the PIR sensor, an aluminum cap was used. The length of the cap, L,
was determined from the diameter of the PIR sensor lens, d, which was 9.5 mm, as follows (see Figure 3(b)):

tan(θ/2) = d / L    (2)

L = d / tan(30˚) = 9.5 mm / tan(30˚) ≈ 16.5 mm    (3)
Figure 2. The simplified flowchart of the system operation (Start → Navigate → PIR signal? → Stop → Acquire an image → Human detected? → Send location and image to rescue team → Done? → End).
Figure 3. (a) The coverage angle of the camera; (b) the aluminum cap on the PIR sensor.
2) In order to decrease the effective detection range of the PIR sensor from 5 m to 2 m, the PIR sensor was
tilted down towards the ground by a calculated angle. Assuming that the PIR sensor is positioned on the
robot such that the far edge of the cap is at a certain height, H, above the ground, the tilting angle, τ, of the
sensor is calculated as follows, assuming that H = 0.5 m:

α = tan⁻¹(H / Dmax)    (4)

τ ≈ θ/2 + α    (since d ≪ Dmax)    (5)

α = tan⁻¹(0.5 m / 2 m) ≈ 14˚    (6)

τ = 60˚/2 + 14˚ = 44˚    (7)
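As a quick numerical check of Equations (1)-(7), the following short Python sketch reproduces the quoted values from the measured quantities R = 1.15 m, Dmax = 2 m, d = 9.5 mm, and H = 0.5 m:

```python
import math

R, D_max = 1.15, 2.0                                  # image radius and camera range [m]
d_mm, H = 9.5, 0.5                                    # PIR lens diameter [mm], cap height [m]

theta = 2 * math.degrees(math.atan(R / D_max))        # Eq. (1): coverage angle, ~60 deg
L = d_mm / math.tan(math.radians(theta / 2))          # Eqs. (2)-(3): cap length, ~16.5 mm
alpha = math.degrees(math.atan(H / D_max))            # Eqs. (4), (6): ~14 deg
tau = theta / 2 + alpha                               # Eqs. (5), (7): tilt angle, ~44 deg

print(round(theta), round(L, 1), round(alpha), round(tau))   # 60 16.5 14 44
```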
[Figure: structure of the fully connected feed-forward neural network used by the IPNNU, with weighted connections from the image inputs through a hidden layer to the output categories t1, t2, and t3.]
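The network depicted above is a fully connected feed-forward classifier mapping the preprocessed image pixels to a small set of output categories. The exact layer sizes are not fully recoverable here, so the following numpy sketch uses assumed dimensions (256 inputs, 255 hidden units, 3 outputs) purely for illustration of the forward pass; it is not the authors' trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed layer sizes, chosen only to mirror the diagram.
n_in, n_hidden, n_out = 256, 255, 3

W1 = rng.normal(0, 0.1, (n_hidden, n_in))   # input -> hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (n_out, n_hidden))  # hidden -> output weights
b2 = np.zeros(n_out)

def forward(x):
    """Single forward pass of the feed-forward classifier."""
    h = np.tanh(W1 @ x + b1)                # hidden layer activations
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()                      # softmax over the output categories

x = rng.random(n_in)                        # stand-in for a flattened 16x16 image
print(forward(x))                           # probabilities over the categories
```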
[Figure: detection accuracy, A(n), versus the number of training images, n, for n from 0 to 500.]
The detection accuracy, A(n), was observed to saturate for n > 350 at about 84% accuracy at 500 images, which indicates a saturating behavior. Therefore,
experimental regression analysis was performed on the training results and the trend of A(n) was estimated as:

A(n) = 1 − e^(−n/180)    (8)
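The regression step can be sketched with scipy's curve_fit as below. The (n, accuracy) pairs are synthetic placeholders that merely follow the shape of Equation (8); the actual measured training accuracies are not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(n, k):
    """Saturating accuracy model of the same form as Equation (8)."""
    return 1.0 - np.exp(-n / k)

# Synthetic placeholder measurements, roughly following A(n) = 1 - exp(-n/180).
n_images = np.array([50.0, 100.0, 200.0, 300.0, 400.0, 500.0])
accuracy = np.array([0.24, 0.43, 0.67, 0.81, 0.89, 0.94])

(k_hat,), _ = curve_fit(model, n_images, accuracy, p0=[100.0])
print(round(k_hat))   # ~180, the constant appearing in Equation (8)
```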
it is either almost totally covered (only allowing the body radiation to be sensed) or is in a position where none
of the human body features is clearly visible. Therefore, the detection of a human in these cases is considered a false positive type of decision. The body-position cases tested included, among others: Case 1, a full body laid down facing the camera with no hidden parts; Case 2, a full body laid on the side of the detection area with no hidden parts; and Case 6, a human head placed against the camera such that most other parts are invisible.
2) The environmental characteristics around the body, such as the lighting level and whether there are features
or objects around it. The features and objects were emulated by pieces of furniture for the indoor experiments
and by rocks and trees for the outdoor experiments. The different types of environments used and their characteristics are listed in Table 2. This combination of features was used in order to test the robustness of the system
in detecting the human body when it is either surrounded or partially covered with debris or rocks or when it
is lying in a dark area, etc.
3) The color level of the clothes worn by the human being rescued, ranging from very dark to very bright. This
factor was added in order to test the robustness of the system in recognizing the features of the human
body under different combinations of color contrast between the body itself and the surrounding environment. The color contrast levels used are listed in Table 3.
Table 3. Clothes color contrast levels.
VD   Very dark
D    Dark
M    Moderate
B    Bright
VB   Very bright
[Figure: overall detection error rate statistics per environment type (Env. 1 to Env. 4 and the case average) for Cases 1 to 6.]
Figure 8. Overall detection error rate statistics per clothes color level.
The very dark and very bright clothes color levels provide the worst conditions for detection accuracy, with 22.5% to 24% error rates, while the medium to dark clothes color levels
provide the best conditions for detection accuracy in all cases and environment conditions, with only 6% to 11%
error rates.
Table 4. Detailed detection results per case, clothes color level, and environment type. Each cell shows the number of trials (out of five) in which a human was detected / not detected.

Clothes Color   Env. 1     Env. 2     Env. 3     Env. 4
Case 1
  VD            0 / 5      5 / 0      5 / 0      5 / 0
  D             4 / 1      5 / 0      5 / 0      5 / 0
  M             5 / 0      5 / 0      5 / 0      5 / 0
  B             5 / 0      5 / 0      5 / 0      5 / 0
  VB            5 / 0      0 / 5      5 / 0      5 / 0
  SUM           19 / 6     20 / 5     25 / 0     25 / 0
Case 2
  VD            0 / 5      5 / 0      5 / 0      4 / 1
  D             4 / 1      5 / 0      5 / 0      4 / 1
  M             5 / 0      5 / 0      5 / 0      5 / 0
  B             5 / 0      5 / 0      5 / 0      5 / 0
  VB            5 / 0      0 / 5      5 / 0      5 / 0
  SUM           19 / 6     20 / 5     25 / 0     23 / 2
Case 3
  VD            0 / 5      5 / 0      2 / 3      4 / 1
  D             2 / 3      5 / 0      4 / 1      5 / 0
  M             4 / 1      4 / 1      4 / 1      5 / 0
  B             5 / 0      4 / 1      5 / 0      5 / 0
  VB            5 / 0      1 / 4      5 / 0      5 / 0
  SUM           16 / 9     19 / 6     20 / 5     24 / 1
Case 4
  VD            0 / 5      5 / 0      3 / 2      4 / 1
  D             2 / 3      4 / 1      4 / 1      4 / 1
  M             4 / 1      4 / 1      4 / 1      4 / 1
  B             5 / 0      4 / 1      4 / 1      5 / 0
  VB            5 / 0      0 / 5      5 / 0      5 / 0
  SUM           16 / 9     17 / 8     20 / 5     22 / 3
Case 5
  VD            0 / 5      0 / 5      0 / 5      0 / 5
  D             0 / 5      0 / 5      0 / 5      0 / 5
  M             0 / 5      0 / 5      0 / 5      1 / 4
  B             0 / 5      0 / 5      1 / 4      1 / 4
  VB            1 / 4      1 / 4      1 / 4      2 / 3
  SUM           1 / 24     1 / 24     2 / 23     4 / 21
Case 6
  VD            0 / 5      1 / 4      0 / 5      0 / 5
  D             0 / 5      0 / 5      0 / 5      0 / 5
  M             0 / 5      0 / 5      0 / 5      1 / 4
  B             0 / 5      0 / 5      1 / 4      1 / 4
  VB            1 / 4      0 / 5      0 / 5      2 / 3
  SUM           1 / 24     1 / 24     1 / 24     4 / 21
Table 5. Summary of the experimental detection error rates per environment type.

Env.   C1    C2    C3    C4    C5    C6    Avg.
E3     0%    0%    20%   20%   8%    4%    9%
Table 6. Summary of the experimental detection error rates per clothes color level.

Color  C1    C2    C3    C4    C5    C6    Avg.
M      0%    0%    15%   20%   5%    5%    8%
6. Conclusions
In this paper, a new system and methodology for detecting surviving humans in devastated environments using
an emulated autonomous robot is proposed. The proposed system applies existing image processing and neural
network techniques to detect a human body or body parts within the camera's field of view. The performance of the system was validated and evaluated via experimentation.
A number of experiments were designed and conducted and a large number of scenarios were tested. For each
scenario, the tests were conducted within a number of designed environments that differ in their illumination
degree and existing surrounding features. In each environment, different positions of the human body were
tested.
The experiments demonstrated that the proposed system can achieve a relatively high detection accuracy of
up to 91% in certain conditions and an overall average detection accuracy of 86% for all experiments and envi-
ronments tested.
This research demonstrates that using relatively simple image processing and neural network techniques in
critical applications such as urban search and rescue is effective in concept and has the potential to play
a significant role in more sophisticated urban search and rescue systems.
References
[1] Shah, B. and Choset, H. (2004) Survey on Urban Search and Rescue Robotics. Journal of the Robotics Society of Ja-
pan, 22, 582-586. http://dx.doi.org/10.7210/jrsj.22.582
[2] Moradi, S. (2002) Victim Detection with Infrared Camera in a Rescue Robot. IEEE International Conference on Ar-
tificial Intelligence Systems. http://ce.sharif.ac.ir/~rescuerobot/downloads/victim_detection.pdf
[3] Burion, S. (2004) Human Detection for Robotic Urban Search and Rescue. Infoscience Database of the Publications
and Research Reports, Technical Report.
[4] Nakajima, C., Pontil, M., Heisele, B. and Poggio, T. (2003) Full-Body Person Recognition System. Pattern Recognition, 36, 1997-2006. http://dx.doi.org/10.1016/s0031-3203(03)00061-x
[5] Mohan, A., Papageorgiou, C. and Poggio, T. (2001) Example-Based Object Detection in Images by Components. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 23, 349-361. http://dx.doi.org/10.1109/34.917571
[6] Engle, S. and Whalen, S. (2003) Autonomous Multi-Robot Foraging in a Simulated Environment. UC Davis Report.
[7] Cavalcanti, C. and Gomes, H. (2005) People Detection in Still Images Based on a Skin Filter and Body Part Evidence.