ECAI 2016 1777
G.A. Kaminka et al. (Eds.)

© 2016 The Authors and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-672-9-1777
Planning Search and Rescue Missions for UAV Teams

Chris A. B. Baker and Sarvapali Ramchurn and W. T. Luke Teacy¹ and Nicholas R. Jennings²

¹ Agents Interactions and Complexity Group, University of Southampton, UK, email {cabb1g08, sdr, wtlt}@ecs.soton.ac.uk
² Department of Computing, Imperial College London, UK, email n.jennings@imperial.ac.uk

Abstract. The coordination of multiple Unmanned Aerial Vehicles (UAVs) to carry out aerial surveys is a major challenge for emergency responders. In particular, UAVs have to fly over kilometre-scale areas while trying to discover casualties as quickly as possible. To aid in this process, it is desirable to exploit the increasing availability of data about a disaster from sources such as crowd reports, satellite remote sensing, or manned reconnaissance. In particular, such information can be a valuable resource to drive the planning of UAV flight paths over a space in order to discover people who are in danger. However, challenges of computational tractability remain when planning over the very large action spaces that result. To overcome these, we introduce the survivor discovery problem and present, as our solution, the first example of a continuous factored coordinated Monte Carlo tree search algorithm. Our evaluation against state of the art benchmarks shows that our algorithm, Co-CMCTS, is able to localise more casualties faster than standard approaches by 7% or more on simulations with real-world data.

1 Introduction
The increased prevalence of low-cost, robust, commercially available Unmanned Aerial Vehicles (UAVs) has led to concerted efforts to utilise these platforms in order to aid first responders with collecting sensory data without putting human lives at risk [1]. In particular, work has focused on developing autonomous systems to minimise the involvement of overstretched first responder personnel, and to ensure UAVs can take action quickly [22, 24]. Key to this work is the idea of enabling coordinated UAVs to explore a disaster space to discover the spatial location of casualties: a difficult task given the spatial extent to explore and the continuous action-space represented by a UAV's axes of motion.³

³ Although this work is framed in terms of disaster response, the same coordination algorithms could be applied to other UAV applications, such as geophysical surveying, security operations, or ecological monitoring.

To enable this exploration, advances in data collection have created new sources of information about disaster scenarios that contribute to increased awareness of the situation on the ground during a disaster. In particular, data gathered by crowd-sourcing is becoming more readily available because of the speed with which it can be generated, and its ability to directly reflect the experiences of the people on-scene; people who can often give a very accurate report on the hazards in their vicinity and the number of potential casualties [21]. However, despite these advancements, at present there is little work that seeks to use information on danger and the spatial locations of people to inform the paths of UAVs through a disaster space, in order to maximise the number of observations made of possible casualties.

Currently, the state of the art for UAV path planning algorithms focuses on areas such as target tracking for surveillance [5, 17]. Although related to the exploration of a disaster space, these algorithms are designed to find a known number of targets that are in motion, rather than an unknown number of survivors distributed over an area: a very different problem, since algorithms must be able to make predictive estimations of people that might be present in as-yet unvisited locations. Other developments in path planning focus on trying to reach a set goal location [9, 19], or working with single autonomous vehicles [8, 18]; neither of which fulfils the need for algorithms that coordinate multiple vehicles in an explorative traversal of the disaster space, rather than aiming for a particular final location. Moreover, specific challenges exist in coordinating multiple vehicles. For example, there is often no benefit to multiple UAVs providing sensor data of the same location: there must be coordination between the vehicles to allow them to find survivors in a disaster, without all attending the same locations. Additionally, UAVs must be able to coordinate over actions temporally: visiting the same location at the same time might be straightforward to avoid, but planning a UAV's current action to account for the future action of other nearby UAVs is a non-trivial problem, particularly as the number of UAVs in an environment increases.

Against this background, some work has recently been performed on planning problems that involve the coordination of multiple vehicles, including in disaster scenarios [2, 4]. However, as yet both these approaches require some degree of simplification before planning can commence, by discretising the environment into a number of cells to be examined. Since locations in the real world are not discretised in this way, this requires some additional processing of incoming data before UAVs can begin their exploration. Furthermore, by simplifying data in this way it is inevitable that some information is lost when a UAV can only be considered as visiting single cells, rather than being able to plan according to a continuous range of motion.

In this paper we seek to address these shortcomings in planning UAV searches of disaster areas, specifically with a view to future applications and field-tests as part of the MOSAIC collaboration⁴ and ongoing work with Rescue Global.⁵ In particular we make the following contributions to the state of the art:

⁴ An EPSRC funded collaboration between the University of Southampton and the University of Sheffield: http://www.mosaicproject.info
⁵ Information available at: http://www.rescueglobal.org

1. We introduce a novel formulation of the survivor discovery problem; specifically modelled on a likely real-world scenario with the goal of locating an unknown number of people, over a wide area, by detection of mobile phone signals and where the diminishing survivability of the people with time is incorporated into the reward function.
2. We develop a novel decentralised algorithm that allows multiple UAVs to coordinate to explore a large continuous disaster space
under spatial and temporal constraints. Our approach utilises a belief map (a term referring to the mapping of spatial locations onto some function that represents numerical data) to represent the presence of people and danger in the environment, and to form the basis of the rewards calculated in the planning algorithm.
3. Furthermore, in order to demonstrate the applicability of our scenario to potential future disasters, we test and evaluate our approach on real-world data (gathered from the 2010 Haiti earthquake) simulating a very large action space, showing consistent gains in survivor discovery of at least 7% compared to benchmarks, with higher gains of around 20% for scenarios with additional UAVs.

In the following sections we first outline the current state of the art areas of research in autonomous path-planning and exploration. Next, we introduce the specific formulation of the environment model and UAV behaviour considered in our simulations, before introducing the continuous form of our coordinated Monte Carlo tree-search algorithm. Following this, we present empirical evidence to substantiate the benefits of this approach, before concluding and discussing future work and applications we will explore.

2 Background
In order to best use UAVs to aid responders in disasters they must be able to plan paths autonomously, as a group, in a decentralised manner. This is particularly important given the implications of a UAV failing in the field: any centralised system that relies on a single UAV (or other central coordinator) will fail entirely if that central point fails; whereas a decentralised system can, in principle, continue to function as UAVs are removed. Furthermore, as we have already indicated, it is beneficial to use prior information about the area to inform the flight paths of UAVs in order to maximise the likelihood of discovering survivors. Currently, work on path planning in robotics focusses primarily on reaching goal locations and frequently formulates path planning as a control problem [14]. Conversely, in a disaster scenario there need not be any final end-point to a UAV's path planning; rather the length of the exploration may be constrained by, for example, battery life, and the number of people to be discovered must be maximised over the length of the path. Alternatively, much work has also been done to enable the use of vision algorithms and belief data to track mobile targets or map an area [20]. However, this area of research often focusses on locating a known number of targets, or covering a bounded space for the purposes of mapping. These foci are not relevant in a scenario where an unknown number of people are distributed over an already-mapped (but very large) area, as is frequently the case with displaced populations during and after disasters.

Against this background, we find closer similarities with work on solving Markov Decision Processes (MDPs); specifically where locality of UAVs can be used to reduce calculation overheads. In particular, the generality of MDP formulations suits our construction of a simulation environment where we are provided with numerical data used for a belief map; and MDPs in general have a number of well-explored algorithmic solutions available. Specifically, work by [2] utilises factored tree-searches for partially observable MDP solutions; exploiting problem structure to allow factorisation in a way reflective of local state spaces and interactions. In a similar way, we use factored trees in this paper to represent the available actions of UAVs in a disaster environment, factoring the value of locating people between UAVs within spatial proximity of each other. However, our work deals with a continuous state-space (in this case representing the continuous range of actions available to a UAV at any given time); as well as a detailed model of the scenario specifically geared towards UAV applications. This requires significant changes to the existing approach in [2] and [4], primarily in applying sampling of the continuous action-space while still allowing UAVs to coordinate with one another. Crucially, discrete approaches to MDP solutions require some form of recognition that a particular set of actions has been entirely explored: clearly where a continuum of actions exists this cannot be said to be true, since the number of individual actions available for selection is infinite. As such we have had to adapt different approaches usually applied to planners dealing with continuous spaces [11, 25] into the regime of coordinated exploration.

3 Scenario Model
The overarching aim of disaster response work is to minimise the loss of human life in the disaster area. Currently, we perceive there to be a lack of suitable environment models that fully characterise this problem and express it in terms of sensing technology and UAV behaviour that already exists. To this end, we introduce a novel formulation for the exploration of a disaster situation where we require that UAVs focus on areas of perceived danger, but also where these regions intersect with likely occupation by people. The rationale here is that data about a region containing known hazards (for example, high levels of radiation or presence of fire) is only useful in preserving human life when it is known or believed that there are likely to be persons in the vicinity of a hazard; or will be at some future time. We give the example of radiation as a possible manifestation of danger in a disaster scenario. In principle, we can extend this to any phenomenon that is present over an extended area and represents some risk to human life. This general approach could then represent several types of hazard in a disaster area, such as flooding, chemical spills, or risk from earthquake-damaged buildings. At this point, we consider the location of such hazards as static. Such an assumption can be justified in scenarios with slow-changing conditions (relative to the time taken by the UAVs to explore).

We will now outline the formulations describing the state of the environment, and the actions available to the UAVs in our simulations.

3.1 Environment Model
In considering a model for the distribution in space of a number of civilian casualties, we exploit the ubiquity of mobile phone ownership and assume the use of mobile phone signals as proxies for the presence of a person. As well as having precedent in previous use in disaster scenarios [15, 26], this has the specific advantage of allowing identification of individual sources using unique identifiers associated with each handset. While a priori knowledge of the number of victims in a given area might be unavailable, first responders can maintain a belief distribution over unobserved victims while also attempting to isolate signals that have been observed, in order to reduce the uncertainty in the location of victims.

As a result, we explicitly envisage a scenario where UAVs are equipped with some form of detector capable of providing a (noisy) estimate of the range of individual unique phone signals. Specifically, we seek to localise the expected position of victims in order to reduce the time taken by search and rescue teams to find (and subsequently rescue) them. In more detail, we associate the uncertainty of a person's location with time taken to search the area for that person. By moving the detectors around the space, the expected location
value can be determined with higher precision; effectively reducing the area to be covered (and thus the time taken) by rescue services from a large initial area to a much smaller location.

We consider a search area containing a number of signals $s \in S$ indicating the presence of people in some danger (mapped spatially by a two dimensional scalar function $D : \mathbb{R}^2 \to [0, 1]$), corresponding to their expected likelihood of dying within the next time-step $t$. The reward we gain $R$ is related to the number of people we hope to observe, their likelihood of survival, and a discovery time $t$ indicating how long it would take to rescue any victim:

$$R = \sum_{p_i, d_i \,\forall s_i \in S} p_i \cdot (1 - d_i)^{t}$$

where $d_i \in D$, and $p_i \in P$ represents the expected number of people for a given signal $s_i \in S$. Each signal $s_i$ is mutually distinguishable from other signals, and the magnitude of each can be sensed by UAVs within a set radius. In the first instance we assume a relatively flat prior belief of victim position, implying a long time to locate an individual. However, with a set of observations ($O$) of, for example, the strength of a mobile phone signal, some estimate can be made of the location of a person; effectively reducing the time to locate them. We denote this using a time to find parameter $t_f$ that decreases linearly with the estimated area of the location of an individual phone signal.

As such, at any given time our reward is then:

$$R = \sum_{p_i, d_i \,\forall s_i \in S} p_i \cdot (1 - d_i)^{t_f(O)}$$

whereas projecting reward to any arbitrary time in the future we have for a series of observations at time $t$:

$$R_{known} = \sum_{p_i, d_i \,\forall s_i \in S} p_i \cdot (1 - d_i)^{t_f(O_t) + t}$$

By collecting information on signal sources we can use a population Monte-Carlo (PMC) [7] to model the likely locations of a person, which increases in precision as more measurements are taken, explicitly reducing $t_f$. Thus reward is fundamentally a function related to the danger at a given location believed to contain a victim, and the precision with which the location of that person is known. Planning can be performed by simulating the result of measurements on the probability distribution for each person and extrapolating the effect on $t_f$ in each instance.

To account for signals we have yet to observe, we include a term for the expected number of victims outside of the range of observations. In principle the search area for such victims would be the entirety of the area over which observations have yet to be recorded, since the positions of the victims are known with no localisation whatsoever. Thus for a continuous distribution of expected people ($p_e$):

$$R_{unknown} = \int p_e(x, y) \cdot (1 - d(x, y))^{t_f} \, dx \, dy$$

The global reward function at time $t$ then simply becomes the sum of the two components:

$$R_{total} = R_{known} + R_{unknown} \qquad (1)$$

It is worth noting that since the formulation of reward from the population Monte-Carlo simulation is essentially separate from the tree-search function outlined below, the two are (in general cases) separately applicable.

3.2 UAV Behaviour Formulation
We consider simple UAV flight dynamics (including minimal constraints on performance) since the focus of our work is on planning rather than constraint optimisation, and because restrictions on UAV behaviour can be included in subsequent iterations of the model as constraints on the reward function. Thus the set of UAVs $U = \{u_1, \ldots, u_m\}$ traverse the space in iterations of a fixed distance $\delta$ per time-step $t$ (i.e. at fixed speeds and altitudes); with a continuous domain of angles available to determine the direction of the next action. The action vector enabling UAV $u_k$ to move at the next time step is defined as $a_k = (a_{k\alpha}, a_{k\beta}, a_{k\gamma}, \ldots)$ where each Greek index can be interpreted as an angle between 0° and 360°. In theory the cardinality of $a_k$ is infinite, but as detailed below we use continuous space tree-search methods to restrict our search to finite subsets. Each UAV selects a sequence of actions to produce its trajectory $T_k = [a_k(t = 1), a_k(2), \ldots, a_k(t_{end})]$ (for a trajectory that ends at time $t_{end}$); which together form the set of all trajectories $\mathbf{T} = \{T_1, \ldots, T_m\}$. Thus the collective goal of the UAVs is to plan a set of trajectories to satisfy: $\mathbf{T}^* = \arg\max_{\mathbf{T}} R(\mathbf{T})$.

4 The Coordinated Continuous Monte Carlo Tree-Search Algorithm
In choosing Monte-Carlo tree search as the basis for our solution, we note its ability to sample very quickly from large state spaces (traditionally used in solving games), and the flexibility with which it can be applied to general problems [6]. To do this we exploit locality between UAVs to factor the search space into local joint-action trees. Furthermore, we allow trees to coordinate over shared factors (that is, UAVs in multiple trees) using the max-sum algorithm [23] by exchanging messages to express the local reward gained by UAVs taking particular actions at future times. In other words, when selecting which node in a tree to expand, the individual trees are coordinated over their shared UAVs to select mutually nonconflicting actions that are maximally beneficial to both trees.

At this stage, we design the algorithm to plan on-line and recalculate the optimum action at each time-step. In this way we have built-in robustness to temporal changes in the map (as well as not requiring a priori knowledge of future coordination requirements). We currently run simulations in a centralised fashion (insofar as they are performed on a single computer) but with allowances for multiple parallel threads representing the different individual calculations for each portion of the factored utility. In addition, we note that the nature of the max-sum coordination is such that UAVs are not required to have perfect information: it is sufficient that they know their local utility and are able to share this with local neighbours.

Specifically, we introduce an additional step to the standard MCTS process of tree growth. This growth is typically summarised: node selection, expansion, rollout or simulation, and backpropagation [6, 16]. Most significantly, we modify the selection process to determine which node to expand by coordinating in parallel between trees via max-sum. We detail our approach in the following subsections.

4.1 Tree Construction
At each timestep in the simulation, the coordinated MCTS (Co-MCTS) algorithm begins by calculating which UAVs require coordination with their neighbours, leading to the form of the UAV-based factor graph constructed in the joint-action creation function $J$ (Line 3), detailed fully in Algorithm 1. This is performed to establish whether coordination is needed in a given UAV's locality: where
a UAV is spatially isolated from neighbouring UAVs, a local tree is grown. The resulting groups of UAVs will form the basis of the factor graph used in the max-sum calculation (Line 13 in Algorithm 1). The result of $J$ is represented formally by a set $N = \{n_1, \ldots, n_f\}$ that represents the domain of the factor nodes to be coordinated. Specifically, each member of $N$ contains a set of actions corresponding to a group of UAVs that require coordination. In this case, the corresponding element in $N$, say $n_i$, would be the set of actions available to these UAVs: $n_i = \{a_1, a_2, \ldots, a_k\}$. Trees are grown for each $n_i$ in $N$, each of which in turn represents the factors in the max-sum graph connected to the variables representing the available actions of the UAVs. In more detail, trees are grown for each joint-utility between interacting UAVs, and coordination between trees is performed when trees share access to a given UAV. Individual nodes in the tree $n_i$ will be indicated as $n_i^{(k)}$, or from any arbitrary tree by $n^{(k)}$.

Figure 1. An example of four UAVs ($u_{1-4}$) interacting via a max-sum factor graph. Trees are grown for $n_1$ and $n_2$.

An example max-sum factor graph is represented in Figure 1. Here, four UAVs may interact at the next time-step in the following sub-sets: $\{u_1, u_2, u_3\}$ and $\{u_3, u_4\}$. As such, the algorithm maintains two joint trees for these sets, represented as the utility nodes $n_1$ and $n_2$, which must coordinate over the action of the UAV common to both: $u_3$. Framing this in terms of the action selected as a result of the tree-search, the coordination serves to ensure the two trees select an action $a_3$ that is beneficial to both factored reward functions and is also the same in both trees; since $u_3$ can only take a single action at the next time step, the two trees must "agree" on what this action is.

4.2 Node Selection and Expansion
Algorithm 1 begins with the creation of the root nodes representative of each factor seen in Line 6, which are recorded in the set $N_r$ (Line 4). Following this, the creation and growth of branches is performed $\Delta$ times inside the loop beginning at Line 8. This begins by exploring down each tree, starting from the root node, to determine which node to branch on next. Node selection is performed in accordance with standard progressive widening [11, 25], where we sample randomly from the action set of a node up to a limit of $K$ actions per node, with $K$ defined as in [10] as a function of constants $C > 0$ and $\alpha \in (0, 1)$ and the time of simulation $t$: $K = C t^{\alpha}$.

Line 9 introduces the current set of nodes (across all trees) to be expanded next, $N_{next}$, and Line 10 creates the set of previously expanded nodes $N_{prev}$. At Line 13 the max-sum algorithm is used to maximise the value of rewards (as per Equation 1) for the actions over each $n_i$, returning a vector of favourable actions $a^* = (a^*_1, a^*_2, \ldots, a^*_m \mid a^*_k \in a_k)$. Since each $n_i$ depends on a subset of actions, the function select($n^{(k)}$, $a^*$) serves to return only the actions corresponding to a given $n^{(k)}$. This is then used as the argument to create the new expansion to a node in $N_{next}$ in Line 16.

Algorithm 1 Coordinated MCTS
CoMCTS(U, D, t = 0)
 1. for t in [1, ..., t_end]
 2.   //Creation of factor graphs given UAV locations//
 3.   N ← J(U)
 4.   N_r ← ∅
 5.   for n_i in N
 6.     append(N_r) ← n_i^(0)
 7.   endfor
 8.   for eachstep in [1, ..., Δ]
 9.     N_next ← N_r
10.     N_prev ← ∅
11.     while N_next ≠ ∅
12.       //Max-sum coordination returns best actions for shared factors//
13.       a* ← maxsum(N, N_next)
14.       for n^(k) in N_next
15.         //Actions relevant to n^(k) selected//
16.         n_new^(k) ← expand(n^(k), select(n^(k), a*))
17.         if expansions(n^(k)) ≥ K
18.           remove(n^(k), N_next)
19.           append(n^(k), N_prev)
20.         endif
21.       endfor
22.     endwhile
23.     for n^(k) in N_prev
24.       //Simulation (rollout) and backpropagation of results//
25.       rollout(n_new^(k))
26.       backpropagate(n_new^(k))
27.     endfor
28.   endfor
29.   for n_i in N
30.     a*_i = (bestactions(n_i))
31.   endfor
32.   t ← t + 1
33. endfor

Although not explicitly tested, we note that this approach ensures communications between UAVs need not be excessive. We note existing literature has shown at length that max-sum in particular is robust to low bandwidth and irregular message-passing [12, 13, 23]. In practice we envisage that where UAVs share a tree, a single UAV will handle the growth and planning of the joint actions.

4.3 Rollout and Backpropagation
The rollout portion of the MCTS (Line 25 in Algorithm 1) is traditionally a coarse estimate of the effect of future actions as the result of exploring a particular node in the action space, although some more recent work has focussed on principled simulations using existing MCTS techniques [3]. In this example, we base the rollout on a random-walk through the action space starting at the node just expanded, biased in the direction of the last action taken. This method has the benefit of showing not just the contribution of any random series of actions, but of taking more actions similar to the one represented by the frontier node (for each UAV). Intuitively, a purely random rollout from one node in a joint action tree will be insignificantly different from a rollout from any similar node. Conversely, our rollout policy contributes to the exploration value of a node by indicating possible future reward through continued tree expansion with a preference for repetitions of the action itself.

Finally the rewards calculated at the leaf nodes are backpropagated up the tree towards the root by iteratively updating cumulative average rewards for each upstream node (Line 26). This is unchanged from classic MCTS.
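The rollout and backpropagation steps of Section 4.3 can be illustrated with a minimal single-UAV sketch. It is not the authors' implementation: the `Node` class, the Gaussian perturbation around the node's heading (one possible way to bias the walk towards the last action taken), and the parameters `steps`, `step_len` and `spread` are all assumptions made for illustration. The `reward_at` callable stands in for whatever reward model is being evaluated.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node for one UAV: where it is and how it got there."""
    position: Tuple[float, float]
    heading: float                      # radians; direction of the last action
    parent: Optional["Node"] = None
    visits: int = 0
    mean_reward: float = 0.0


def biased_rollout(node: Node,
                   reward_at: Callable[[float, float], float],
                   steps: int = 20,
                   step_len: float = 10.0,
                   spread: float = 0.5,
                   rng: Optional[random.Random] = None) -> float:
    """Random walk from `node`; every step heads roughly along node.heading.

    Assumption: headings are drawn from a Gaussian centred on the heading of
    the action that created the node, so the walk prefers repeating it.
    """
    rng = rng or random.Random()
    x, y = node.position
    total = 0.0
    for _ in range(steps):
        heading = node.heading + rng.gauss(0.0, spread)
        x += step_len * math.cos(heading)
        y += step_len * math.sin(heading)
        total += reward_at(x, y)
    return total


def backpropagate(node: Node, reward: float) -> None:
    """Update the running average reward of the node and all its ancestors."""
    current: Optional[Node] = node
    while current is not None:
        current.visits += 1
        current.mean_reward += (reward - current.mean_reward) / current.visits
        current = current.parent
```

Setting `spread` to zero makes the walk a straight line along the last action, while a large `spread` approaches the purely random rollout that the text argues is uninformative for neighbouring nodes.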
5 Empirical Evaluation
To verify the performance of our algorithm on data relevant to real-world disaster scenarios, we used data from the Ushahidi project [21] produced from crowd-sourced information during the 2010 Haiti earthquake to generate $D$ and $S$.⁶ This dataset was selected as it represents one of the largest available sources of information about spatial distributions of people and damage to buildings from any recent natural disaster. Furthermore, with increased interest in simple systems that allow crowds to provide data using their phones very quickly after a disaster takes place, the prevalence of such datasets will invariably increase in future; further underscoring the importance of testing our algorithm on this type of data. Specifically, we extracted the level of damage and coordinates of buildings in a 2 km square centred on the capital, Port-au-Prince. Damage was rated based on crowd reports on a scale from 1 to 5, with 5 being most severe.

⁶ Available from http://www.ushahidi.com

Figure 2. Danger $D$ as a function of position, created from Ushahidi dataset centred over Port-au-Prince. Dimensions of 2 km along each side.

We then constructed a decomposed grid world of size 200 × 200 of 10 m × 10 m cells, to form the basis of the danger function $D$. Since damaged buildings represent an estimate of the damage in an area and thus the danger to the victims on the ground, we formed a belief map of danger to the populace by summing the total number of buildings above a threshold level (set to a crowd report of damage 3 and above) in each cell, before multiplying by a common factor to convert the data into a map representative of expected fatalities (noting the constraint imposed by the domain of $D$). The environment is displayed in Figure 2 with a scale showing the value of $d$ in each location. In calculating values for use in the reward functions, the value of danger is based on the mean expected position of the signal based on the data collected. Where the spatial location is not yet clearly established, we have found empirically that the change in spatial danger was smooth enough that nearby values tended to be close to the final estimated value of $d_i$ in most cases.

A UAV speed of 10 m s⁻¹, typical of quad rotor vehicles, amounts to the traversal of one action of moving $\delta = 10$ m in one timestep of one second. We typically simulate UAV searches over time horizons of $t_{end} = 1000$.

The performance metric used is the percentage reduction in the total cumulative discovery time $t_f$ (averaged over the number of UAVs) since it best reflects the ability of the UAV search to pinpoint victims for rescue. We benchmark against a similarly coordinated, but discrete, MCTS implementation, where the action space of the UAVs is restricted to moving between the cells forming the danger-function environment. This scenario poses similar challenges of coordination in large action spaces but benefits from existing work that deals with factored finite-space tree-search [2], and has already shown its efficacy in planning over the Ushahidi dataset [4].

An initial simulation with four UAVs in randomised start locations on the map shown in Figure 2 is compared against a discretised coordinated MCTS implementation (as described in [4]) and a simple lawnmower-style sweep search over the area. This shows a gain of around 7% over a discretised search space (Figure 3). We note that computation time of tree growth on the order of hundreds of nodes typically took less than half a second, demonstrating that computational complexity is not excessive.

Figure 3. Result of randomised starting position tests for each of the continuous coordinated MCTS, discretised (cellular) coordinated MCTS and a simple lawnmower sweep-search; performed 106 times. Results indicate reduction in $t_f$ averaged over the four UAVs in the scenario.

Furthermore, we are able to demonstrate the consistency of our approach on addition of further UAVs to the simulation. Intuitively the reward gained by each UAV in a well-coordinated algorithm should suffer fewer diminishing returns when adding more to the scenario. In detail, this is because any additional UAVs should still localise close to the same number of people as other UAVs in the environment, if they coordinate the exploration task effectively as a group. If they do not, one would expect additional UAVs to explore the same regions of the disaster space as those already present: which, as discussed previously, offers negligible improvements to the global reward function when compared to localising previously unseen casualties. We demonstrate in Figure 4 that adding UAVs results in a slower decrease in observed reward than in the discretised action space. Most notably at 5 UAVs the difference in performance per vehicle is approximately 18% in favour of the continuous algorithm. This benefit is due to the continuum of actions available being less restrictive than in a cellular decomposition of the search area; allowing more effective coordination.
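The construction of the danger map described earlier in this section (counting buildings at or above a damage threshold in each 10 m cell, then scaling into the domain of D) can be sketched as below. This is an illustrative sketch only. The function name `danger_grid`, the `(x, y, damage)` input format, and in particular the default scaling by the peak cell count are assumptions, since the text does not state the common factor that was actually used.

```python
from typing import Iterable, List, Optional, Tuple

Building = Tuple[float, float, int]  # (x in metres, y in metres, damage 1..5)


def danger_grid(buildings: Iterable[Building],
                cells: int = 200,
                cell_size: float = 10.0,
                threshold: int = 3,
                scale: Optional[float] = None) -> List[List[float]]:
    """Return a cells x cells grid of danger values in [0, 1].

    Each cell counts the buildings whose reported damage is at or above
    `threshold`. Counts are multiplied by a common factor `scale`; if none
    is given, 1 / (largest count) is used. That default is an assumption,
    not the factor used in the paper.
    """
    counts = [[0] * cells for _ in range(cells)]
    for x, y, damage in buildings:
        col = int(x // cell_size)
        row = int(y // cell_size)
        inside = 0 <= row < cells and 0 <= col < cells
        if inside and damage >= threshold:
            counts[row][col] += 1

    peak = max(max(row) for row in counts)
    if scale is None:
        scale = 1.0 / peak if peak else 0.0
    # Clamp so the result respects the domain of D, i.e. [0, 1].
    return [[min(1.0, c * scale) for c in row] for row in counts]
```

A lookup such as `grid[int(y // 10)][int(x // 10)]` then gives the danger value for a position inside the 2 km square.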
Figure 4. Comparison of continuous and discrete coordinated MCTS in a t_end = 1000 simulation of varying numbers of UAVs. The continuous space approach is not only better than the discretised approximation, it is more consistent in its reward per-UAV added to the scenario. Results here are averaged per-UAV in the simulation.

6 Conclusions

Motivated by increased availability of belief-data about disaster environments, we have introduced an implementation of a decentralised, factored, coordinated Monte Carlo tree search algorithm for the purpose of discovering survivors in a simulated UAV path planning scenario. Tests were carried out on real-world data from the 2010 Haiti earthquake via the Ushahidi platform; an environment with a continuous action space over a large area. We demonstrated the capability of our Co-CMCTS algorithm in sampling this space and planning paths, and demonstrated consistent performance gains over a discretised algorithm in the number of survivors discovered of up to 18%. Future work will seek to extend these solutions to different densities of survivors, time-varying belief maps, and, as part of ongoing collaborative efforts, will attempt field-trials of the algorithms proposed above on real-world platforms to further demonstrate the efficacy and real-world applicability of our contributions.

References

[1] S. M. Adams and C. J. Friedland, ‘A Survey of Unmanned Aerial Vehicle (UAV) Usage for Imagery Collection in Disaster Research and Management’, in Proceedings of the Ninth International Workshop on Remote Sensing for Disaster Response, volume 9, Stanford, MA, (2012).
[2] C. Amato and F. A. Oliehoek, ‘Scalable Planning and Learning for Multiagent POMDPs’, in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 1995–2002, Austin, Texas, (2015). AAAI.
[3] H. Baier and M. H. M. Winands, ‘Nested Monte-Carlo Tree Search for Online Planning in Large MDPs’, in European Conference on Artificial Intelligence (ECAI), volume 242, pp. 109–114. IOS, (2012).
[4] C. A. B. Baker, S. D. Ramchurn, W. L. Teacy, and N. R. Jennings, ‘Planning Search and Rescue Missions for Unmanned Aerial Vehicle Teams’, in ICAPS Proceedings of the 4th Workshop on Distributed and Multi-Agent Planning (DMAP-2016), London, (2016). AAAI.
[5] S. Bernardini, M. Fox, and D. Long, ‘Planning the Behaviour of Low-Cost Quadcopters for Surveillance Missions’, in Proc. of 24th Int. Conference on Automated Planning and Scheduling, pp. 445–453, Portsmouth, NH, (2014). AAAI.
[6] C. B. Browne, E. J. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton, ‘A Survey of Monte Carlo Tree Search Methods’, Transactions on Computational Intelligence and AI in Games, 4(1), 1–43, (2012).
[7] O. Cappé, A. Guillin, J. M. Marin, and C. P. Robert, ‘Population Monte Carlo’, Journal of Computational and Graphical Statistics, 13(4), 907–929, (2004).
[8] M. Cashmore, M. Fox, T. Larkworthy, D. Long, and D. Magazzeni, ‘AUV Mission Control via Temporal Planning’, in 2014 IEEE International Conference on Robotics and Automation, pp. 6535–6541, Hong Kong, China, (2014). IEEE.
[9] Y.-b. Chen, G.-c. Luo, Y.-s. Mei, J.-q. Yu, and X.-l. Su, ‘UAV path planning using artificial potential field method updated by optimal control theory’, International Journal of Systems Science, (October), 1–14, (jun 2014).
[10] A. Couëtoux, J.-B. Hoock, N. Sokolovska, O. Teytaud, and N. Bonnard, ‘Continuous Upper Confidence Trees’, in Proceedings of the 5th International Conference on Learning and Intelligent Optimization, ed., C. A. Coello-Coello, number 5, pp. 433–445, Rome, Italy, (2011). Springer.
[11] R. Coulom, ‘Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search’, in 5th International Conference on Computer and Games, eds., P. Ciancarni and H. J. V. D. Herik, volume 4630, pp. 72–83, Turin, Italy, (2006).
[12] F. M. Delle Fave, A. Farinelli, A. Rogers, and N. R. Jennings, ‘A Methodology for Deploying the Max-Sum Algorithm and a Case Study on Unmanned Aerial Vehicles’, in The Twenty-Fourth Innovative Applications of Artificial Intelligence Conference, pp. 2275–2280, Toronto, (2012). AAAI.
[13] A. Farinelli, A. Rogers, and N. R. Jennings, ‘Agent-based decentralised coordination for sensor networks using the max-sum algorithm’, in Autonomous Agents and Multi-Agent Systems, volume 28, pp. 337–380, (2014).
[14] C. Goerzen, Z. Kong, and B. Mettler, ‘A Survey of Motion Planning Algorithms from the Perspective of Autonomous UAV Guidance’, Journal of Intelligent and Robotic Systems, 57(1-4), 65–100, (nov 2009).
[15] A. Goetz, S. Zorn, R. Rose, G. Fischer, and R. Weigel, ‘A Time Difference of Arrival System Architecture for GSM Mobile Phone Localization in Search and Rescue Scenarios’, in Positioning Navigation and Communication (WPNC), 2011 8th Workshop on., pp. 1–4. IEEE, (2011).
[16] L. Kocsis and C. Szepesvari, ‘Bandit based Monte-Carlo Planning’, Machine Learning: ECML 2006, 4212, 282–293, (2006).
[17] A. Kolling and A. Kleiner, ‘Multi-UAV Motion Planning for Guaranteed Search’, in Autonomous Agents and Multiagent Systems, pp. 79–86, (2013).
[18] M. Kothari and I. Postlethwaite, ‘A Probabilistically Robust Path Planning Algorithm for UAVs Using Rapidly-Exploring Random Trees’, Journal of Intelligent and Robotic Systems, 71(2), 231–253, (sep 2012).
[19] M. Kothari, I. Postlethwaite, and D.-W. Gu, ‘Multi-UAV Path Planning in Obstacle Rich Environments Using Rapidly-exploring Random Trees’, in Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference, pp. 3069–3074, Shanghai, (dec 2009). IEEE.
[20] Y. C. Liu and Q. H. Dai, ‘A Survey of Computer Vision Applied in Aerial Robotic Vehicles Yu-chi’, in OPEE 2010 - 2010 International Conference on Optics, Photonics and Energy Engineering, number 201, pp. 277–280, Wuhan, China, (2010). IEEE.
[21] N. Morrow, N. Mock, A. Papendieck, and N. Kocmich, ‘Independent Evaluation of the Ushahidi Haiti Project’, Technical report, Ushahidi, (2011).
[22] R. R. Murphy, ‘A Decade of Rescue Robots’, in IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5448–5449, Vilamoura, Portugal, (2012). IEEE.
[23] A. Rogers, A. Farinelli, R. Stranders, and N. R. Jennings, ‘Bounded approximate decentralised coordination via the max-sum algorithm’, Artificial Intelligence, 175(2), 730–759, (2011).
[24] United Nations Foundation, ‘Disaster relief 2.0’, Technical report, United Nations, (2011).
[25] Y. Wang, J.-Y. Audibert, and R. Munos, ‘Algorithms for Infinitely Many-Armed Bandits’, in Advances in Neural Information Processing Systems, eds., D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, 1729–1736, NIPS, 21 edn., (2008).
[26] S. Zorn, R. Rose, A. Goetz, and R. Weigel, ‘A Novel Technique for Mobile Phone Localization for Search and Rescue Applications’, in 2010 International Conference on Indoor Positioning and Indoor Navigation, number September, pp. 15–17, Zurich, Switzerland, (2010). IEEE.

