
Building Behavior Trees from Observations in Real-Time Strategy Games

Glen Robertson and Ian Watson
Department of Computer Science, University of Auckland
Auckland, New Zealand, 1010
Email: glen@cs.auckland.ac.nz, ian@cs.auckland.ac.nz

Abstract—This paper presents a novel use of motif-finding techniques from computational biology to find recurring action sequences across many observations of expert humans carrying out a complex task. Information about recurring action sequences is used to produce a behavior tree without any additional domain information besides a simple similarity metric – no action models or reward functions are provided. This technique is applied to produce a behavior tree for strategic-level actions in the real-time strategy game StarCraft. The behavior tree was able to represent and summarise a large amount of information from the expert behavior examples much more compactly. The method could still be improved by discovering reactive actions present in the expert behavior and encoding these in the behavior tree.

I. INTRODUCTION

An ongoing challenge in Artificial Intelligence (AI) is to create problem-solving agents that are able to carry out some task by selecting a series of appropriate actions to get from a starting state to achieve a goal – the field of planning. Ideally these agents would be able to be applied to the many practical problems that require a sequence of actions in order to carry out a task, such as robotic automation, game playing, and autonomous vehicles. However, applying a classical planning agent to a new domain typically requires significant knowledge engineering effort [1]. It would be preferable if domain knowledge could be learned automatically from examples, but current automated planning systems capable of learning domain knowledge are generally designed to operate under strong assumptions that do not hold in complex domains [2]. Conversely, case-based planning systems capable of acquiring domain knowledge can make few assumptions about the domain, but can have difficulty reacting to failures or exogenous events [3].

In many potential application areas, a planner capable of transitioning from any starting state to any goal state is not actually required; instead it is sufficient or even desirable to have an agent capable of robustly carrying out a specific task or behavior. For example, in game playing, there is usually a very similar starting state and goal for each match or activity within a game – in board games this is the starting board layout and object of the game, and in video games this could be the starting and win conditions of a match or the daily activities of a non-player character. In the genre of real-time strategy games, video game industry developers tend to use scripting and finite state machines instead of more complex approaches because those techniques are well understood, easy to customise, and sufficient to produce the desired behavior [4], [5].

Since their introduction by the video game industry in 2005 [6], Behavior Trees (BTs) have become increasingly common in the industry for encoding agent behavior [4], [7], [8]. They have been used in major published games [6] and are supported by major game engines such as Unity¹, Unreal Engine², and CryEngine³. BTs are hierarchical goal-oriented structures that appear somewhat similar to Hierarchical Task Networks (HTNs), but instead of being used to dynamically generate plans, BTs are static structures used to store and execute plans [4], [9]. This is a vital advantage for game designers because it allows them fine control over agent behavior by editing the BT, while still allowing complex behavior and behavior reuse through the hierarchical structure [4], [9]. Although they have a fixed structure, BTs produce reactive behaviour through the interaction of conditional checks and success and failure propagation within the hierarchy. Various types of nodes (discussed further in section V) can be composed to produce parallel or sequential behavior, or to choose amongst different possible behaviors based on the situation [9].

¹ Unity (Behavior Designer): https://www.assetstore.unity3d.com/en/#!/content/15277
² Unreal Engine (Behavior Trees): https://docs.unrealengine.com/latest/INT/Engine/AI/BehaviorTrees/
³ CryEngine (Modular Behavior Tree): http://docs.cryengine.com/display/SDKDOC4/Modular+Behavior+Tree

We are creating a system able to automatically learn domain knowledge from examples of expert behavior, with few assumptions about the domain, and be able to quickly react to changes in state during execution. This would combine some of the benefits of learning systems in automated planning and case-based planning. Instead of learning a set of planning operators, we aim to automatically learn to carry out a single complex task within a domain, creating a less flexible but still widely applicable planning system. The learned knowledge will be represented and acted upon in the form of a BT, which is ideal for a single task within a domain. Furthermore, the resulting BT can be hand-customised, so this approach could be used as an initial step, followed by human refinement, in the process of defining new behavior for an agent.

In the remainder of this paper we start by outlining related work on automatically learning planning knowledge in the form of HTNs, case-based planners, and BTs. We concretely define the challenging problem of learning a task from observations of expert behaviour, and outline the domain of the Real-Time Strategy (RTS) game StarCraft as our motivating example. We then present our approach to the first part of the learning system: using a motif-finding technique to find and collapse repeated patterns of actions. We present some results from the current system and discuss its limitations. Finally, we discuss potential future directions and conclude the paper.
II. RELATED WORK

Early automated planning systems such as STRIPS [10] made strong assumptions about the domain in order to operate, such as a fully observable, deterministic world that changes only due to agent actions, and actions that are sequential and instantaneous, with known preconditions and effects. More recent work has aimed to make planning more practically applicable by automatically learning action models [1], [11]–[13], task models [14], [15], or hierarchical structure [16], [17]. Some work also expands the applicability of planners by relaxing assumptions from classical planning, addressing learning with nondeterminism [18], [19], partial observability [13], [20]–[22], or durative actions [2]. Almost all of this work on learning in automated planning learns by observing plan executions, including observations of the world state, as carried out by an external expert, allowing the learner to get a good coverage of the common cases in what could be a huge (or infinite) space of possible actions and observations. All of this work still requires strong assumptions about the domain, or domain knowledge to be provided, usually in the form of accurate action models.

An alternative approach to learning from examples of expert behavior is case-based planning, which finds solution action sequences by retrieving and adapting previously-encountered solutions to similar problems. Unlike learning in automated planning, which focuses on acquiring logical building blocks for the planner, case-based planners learn associations between initial states and partial or complete plans as solutions. Case-based planners can operate with very little domain knowledge and few assumptions about the domain, but because they do most of the processing at runtime (for the retrieval and adaptation parts of the case-based planning process), they can have efficiency issues when problems have large case bases or time constraints [23]. Case-based planning also faces difficulty in adapting solutions for particular circumstances – long solutions may react slowly to unexpected outcomes during execution, while short solutions may react excessively to small differences in state or have difficulty reasoning about action ordering [3], [24]. A possible remedy to these issues is to introduce conditional checks and hierarchical structure into cases [3], [23], [24].

Other work has examined building probabilistic behavior models using Hidden Markov Models [25] and Bayesian models [26] from examples. These approaches require very little domain knowledge and are capable of recognising or predicting plans. However, they are not designed to be used for creating plans – their predictions could be extrapolated into a plan, but this would likely lead to increasing error and cyclic behavior. There are also task-learning methods based on explanation-based learning [15], in which the agent explores the domain but also interacts with a human teacher in order to learn. This requires an action model for the exploration phase, and a human operator with domain knowledge in the interaction phase.

Instead of focusing on learning from examples, some work has used genetic algorithms to evolve BTs in an exploratory process [27], [28]. These approaches hold promise but can become prohibitively computationally expensive for complex domains. They also require the addition of a fitness function for evaluating evolved BTs. To the best of the authors' knowledge, no prior work has investigated automatically building BTs from examples of expert behavior.

Probably the most closely related work to ours involves automatically learning domain-specific planners from example plans [23]. These domain-specific planners are static structures for solving specific planning problems, and are made up of programming components such as loops and conditionals, combined with planning operators. The system is provided with accurate action models in order to build the plans, and implicitly assumes fully observable, deterministic domains.

III. PROBLEM

We propose a problem definition that relaxes the assumptions of the classical planning restricted model (as defined in [29]) in order to more closely reflect the real world. In this problem there are a potentially infinite number of states, which may be partial observations of the complete system. The system may be nondeterministic and may change without agent actions. Actions may occur in parallel, may have a duration, and need not occur at fixed intervals. A policy is learned instead of action models or a plan library, in order to allow robust reactive behaviour in a dynamic environment without expensive replanning [15].

However, we do restrict the problem to learning to carry out a single task or achieve a single goal that is being carried out in the examples, instead of the more general automated planning requirement of being able to form a plan for any specified goal. This reduces the burden on the learner so that it is not forced to depend upon accurate action models for these complex domains. Thus, we define the problem of learning a single task by observation:

  Given a set of examples of experts carrying out a single high-level task, {E_1, E_2, ..., E_n},
    where an example is a sequence of cases ordered by time, E_i = (C_i1, C_i2, ..., C_im),
    where a case is an observation and action pair, C_ij = (O_ij, A_ij),
    where an observation and an action are arbitrary information available to the agent (e.g. a key-value mapping);
  Given a similarity metric between pairs of observations and pairs of actions, M(O_ij, O_kl) ∈ [0, 1] and M(A_ij, A_kl) ∈ [0, 1];
  Find a policy that will decide the next action given the previous cases and the current observations, π((C_i1, C_i2, ..., C_i(j−1)), O_ij) → A_ij.
  This policy should be able to reproduce the input action sequences and generalise well to unseen action sequences.
  This policy should have low run-time cost for selecting actions so that it is applicable for embedded or real-time applications.
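To make this formulation concrete, the following is a minimal sketch (ours, not part of the paper or its implementation) of how the examples, cases, similarity metrics, and policy interface could be typed; all class and function names here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol

# An observation or an action is arbitrary key-value information (illustrative).
Observation = Dict[str, str]
Action = Dict[str, str]


@dataclass
class Case:
    """A single observation-action pair C_ij = (O_ij, A_ij)."""
    observation: Observation
    action: Action


# An example E_i is a time-ordered sequence of cases.
Example = List[Case]

# A similarity metric M(., .) returning a value in [0, 1].
SimilarityMetric = Callable[[Dict[str, str], Dict[str, str]], float]


def simple_action_similarity(a: Action, b: Action) -> float:
    """The simple metric used in the paper's experiment: 1 if names match, else 0."""
    return 1.0 if a.get("name") == b.get("name") else 0.0


class Policy(Protocol):
    """pi((C_i1, ..., C_i(j-1)), O_ij) -> A_ij: choose the next action."""
    def __call__(self, history: List[Case], observation: Observation) -> Action: ...
```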
Note that no information is given about the preconditions or effects of actions, or any conceptual reasoning or task structure behind groups of cases. There is also limited information about failure, as possible actions considered but unused by experts will not be observed, and subsequences of actions which had a negative outcome are observed just like other actions. Experts are assumed to have made appropriate actions, but there may not be one optimal action for a given situation.

IV. THE DOMAIN

The domain motivating our problem is the Real-Time Strategy (RTS) video game StarCraft. RTS games are essentially a simplified military simulation, in which players indirectly control many units to gather resources, build infrastructure and armies, and manage units in battle against an opponent. RTS games present some of the toughest challenges for AI agents, making it a difficult area for developing competent AI [30]. It is a particularly attractive area for AI research because of how quickly human players can become adept at dealing with the complexity of the game, with experienced humans outplaying even the best agents from both industry and academia [31].

StarCraft is a very popular RTS game which has recently been increasingly used as a platform for AI research [5]. Due to the popularity of StarCraft, there are many expert players available to provide knowledge and examples of play, and it also has the advantage of the Brood War Application Programming Interface (BWAPI), which provides a way for external code to programmatically query the game state and execute actions as if it were a player in a match. In terms of complexity, StarCraft has real-time constraints, hidden information, minor nondeterminism, long-term goals, multiple levels of abstraction and reasoning, a vast space of actions and game states, durative actions, and long-term action effects [3], [30]–[32]. In order to make the domain slightly more manageable, we have chosen to deal with only the strategic-level actions: build, train, morph, research, and upgrade actions. We also assume that only successfully executed actions are shown, not all inputs from the human, because in StarCraft most professional players very rapidly repeat action inputs until they are executed, in order to make actions execute as soon as possible.

V. BEHAVIOR TREES

As mentioned earlier in the paper, Behavior Trees (BTs) are being used to represent and act upon the knowledge learned by our system, so this section provides a short overview of BTs. BTs have a hierarchical structure in which top levels generally represent abstract tasks, and subtrees represent different subtasks and behavior for achieving each task. Deeper subtrees represent increasingly specific behaviors, and leaf nodes represent conditions and primitive actions that interact with the agent's environment (Fig. 1). Although conceptually represented as trees, it is common for task behaviors to be reused at different places in the tree, so the resulting structure is really a directed acyclic graph [6].

Fig. 1 (tree diagram omitted). An example BT, showing the order in which each node would execute. Asterisks indicate nodes which are not executed. Execution begins at the root selector node. Next the sequence node begins execution – assuming the leftmost child is selected first – and executes its children until a failure is returned by the decorator node. The sequence node returns a failure and the selector node executes its next child. The parallel node executes both children simultaneously and successfully returns, allowing the selector to return successfully.

Execution of a BT is essentially a depth-first traversal of the directed graph structure, but there are four main non-leaf node types that control the flow of execution in a BT: sequence, selector, parallel, and decorator nodes [7]. Each node type has a different effect on the execution of its children, and responds differently to failures reported by its children. Sequence nodes run their children in sequence, and usually return with a failure status if any of their children fail. Selector nodes run their children in a priority order, switching to the next child if one of their children fails, and usually return with a success status if any of their children succeed. Selector nodes may alternatively be set to cancel the execution of a child if a higher-priority child becomes executable. Parallel nodes run all their children in parallel, and usually return with a success status if a certain number of their children succeed, or a failure status if a certain number of their children fail. Finally, decorator nodes add extra modifiers or logical conditions to other nodes, for example always returning a success status, or executing the decorated node only when it has not run before. The specific behavior and even the types of nodes can vary depending on the needs of the user.
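As an illustration of these node semantics (not code from the paper), a minimal sketch of sequence, selector, and parallel composites might look like the following; the Status enum, class names, and the simple always-succeeding ActionNode are our assumptions.

```python
from enum import Enum
from typing import List


class Status(Enum):
    SUCCESS = 1
    FAILURE = 2


class Node:
    def tick(self) -> Status:
        raise NotImplementedError


class ActionNode(Node):
    """Leaf node wrapping a primitive action; here it simply succeeds."""
    def __init__(self, name: str):
        self.name = name

    def tick(self) -> Status:
        print(f"executing {self.name}")
        return Status.SUCCESS


class Sequence(Node):
    """Runs children in order; fails as soon as one child fails."""
    def __init__(self, children: List[Node]):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            if child.tick() == Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


class Selector(Node):
    """Tries children in priority order; succeeds as soon as one child succeeds."""
    def __init__(self, children: List[Node]):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            if child.tick() == Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


class Parallel(Node):
    """Ticks all children; succeeds if at least `required` of them succeed."""
    def __init__(self, children: List[Node], required: int):
        self.children = children
        self.required = required

    def tick(self) -> Status:
        successes = sum(child.tick() == Status.SUCCESS for child in self.children)
        return Status.SUCCESS if successes >= self.required else Status.FAILURE


# Hypothetical usage: fall back to scouting only if the build sequence fails.
root = Selector([Sequence([ActionNode("Train Probe"), ActionNode("Build Pylon")]),
                 ActionNode("Scout")])
root.tick()
```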
VI. METHOD

The first stage in being able to build BTs is to be able to locate areas of commonality within action sequences, as these likely represent common or repeated sub-behaviors. The overall method for creating the behavior tree is an iterative process, as follows (Fig. 2). First, a maximally specific BT is created from the given example case sequences. The BT is then iteratively reduced in size by finding and combining common patterns of actions. When no new satisfactory patterns are found, the process stops. By merging similar action patterns, we are forced to generalise the BT and can find where common patterns diverge, so we can attempt to infer the reasons for different actions being chosen. Reducing the size of the BT will also help to make it more understandable if people wish to read and edit it.⁴

⁴ The code implementation of this method is available online at https://github.com/phoglenix/bt-builder.
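The overall loop just described (and shown in Fig. 2) could be sketched as follows. This is our illustration only: the injected callables find_pattern, merge_alignments, and attach_to_tree stand in for the GLAM2-based steps detailed in sections VI-A and VI-B, and the assumption that alignments carry a numeric score attribute is ours.

```python
from typing import Callable, List, Tuple


def reduce_behavior_tree(
    tree: object,
    find_pattern: Callable[[object], Tuple[object, List]],   # motif search over sequence nodes
    merge_alignments: Callable[[object, List], object],      # build a generalised sequence node
    attach_to_tree: Callable[[object, object, List], None],  # splice it back into the tree
    score_threshold: float,
) -> object:
    """Sketch of the iterative reduction loop: keep collapsing common patterns
    until no pattern scores above the threshold (helpers are injected, not defined here)."""
    while True:
        pattern, alignments = find_pattern(tree)
        good = [a for a in alignments if a.score >= score_threshold]
        if not good:
            break  # no satisfactory pattern found: stop reducing
        merged = merge_alignments(pattern, good)
        attach_to_tree(tree, merged, good)
    return tree
```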
Fig. 2 (flow diagram omitted: input examples → maximally-specific BT → find common pattern → merge into new sequence → attach to tree). Overview of the general BT construction process. Input examples are converted into a maximally-specific BT. The BT is then iteratively reduced by finding common patterns, merging them into new sequences, and attaching them to the tree. When no more patterns are found, the process stops.

A. Creating the original BT

The process of creating a maximally-specific BT from a set of examples is actually fairly trivial. All actions in an example can simply be made children of a single sequence node (potentially with a special "delay" action between them if timing is known). All of these sequence nodes can then be joined by adding a selector node as their parent to make a complete BT. The selector node can be set to choose randomly among its children, or its children can be compared with the current state using the observation similarity metric at runtime to select the most-similar option. We call this tree maximally-specific because it exactly represents the input example sequences without any generalisation or other processing. This tree is clearly extremely over-fit to the example data, so it needs to be reduced by finding common patterns.
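A minimal sketch of this construction step (ours, assuming cases are dictionaries with an "action" key and nodes are plain dictionaries) could be:

```python
from typing import Any, Dict, List


def create_maximally_specific_bt(examples: List[List[Dict[str, Any]]]) -> Dict[str, Any]:
    """Each example becomes one sequence node over its actions; a root selector joins
    all examples. Node representation here is purely illustrative."""
    sequences = [
        {"type": "sequence",
         "children": [{"type": "action", "action": case["action"]} for case in example]}
        for example in examples
    ]
    return {"type": "selector", "children": sequences}
```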
B. Reducing the BT

Now we iteratively reduce the tree by identifying and merging common subsequences within the sequence nodes and rearranging the tree to share these common sections (Fig. 3). The core of the BT reducing method relies on local sequence alignment techniques. These techniques are commonly used to compare two strings, especially DNA strings in computational biology, to find the indices at which one string aligns best with another. The best alignment is defined by a scoring system that rewards matching or similar characters at a position, penalises mismatching characters, and, importantly, allows but penalises extra or missing characters. For strings of length m and n, efficient implementations of this algorithm run in O(mn) time. In order for this algorithm to be used in our situation, we can extract sequences of actions on different branches of the BT and compare them using the similarity metric given in the problem statement as a scoring system.
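For reference, a local alignment score of this kind can be computed with a Smith-Waterman-style dynamic program; the sketch below is ours, with the match bonus and gap penalty constants chosen only for illustration (a 0/1 action similarity then scores +0.5 for a match and −0.5 for a mismatch).

```python
from typing import Callable, Sequence


def local_alignment_score(
    a: Sequence, b: Sequence,
    similarity: Callable[[object, object], float],
    gap_penalty: float = 0.5,
) -> float:
    """Best local alignment score between sequences a and b in O(mn) time.
    Matches contribute similarity(x, y) - 0.5; gaps cost gap_penalty."""
    m, n = len(a), len(b)
    # score[i][j] = best alignment score for an alignment ending at a[i-1], b[j-1]
    score = [[0.0] * (n + 1) for _ in range(m + 1)]
    best = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = score[i - 1][j - 1] + similarity(a[i - 1], b[j - 1]) - 0.5
            delete = score[i - 1][j] - gap_penalty   # skip an element of a
            insert = score[i][j - 1] - gap_penalty   # skip an element of b
            score[i][j] = max(0.0, match, delete, insert)
            best = max(best, score[i][j])
    return best
```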
Fig. 3 (flow diagram omitted). Reducing the BT. Sequences are passed in to GLAM2 for pattern discovery and alignment. Aligned regions are then merged to form a new sequence. The merged sequence then replaces the aligned regions of the original patterns. Finally, any sequences following the aligned region are joined by a selector node.

While the local alignment algorithm is effective for aligning entire sequences against one another, in this case we are trying to find similar subsequences in cases where the sequence as a whole may not be similar. For this task we can make use of another technique: motif finding. Specifically, we use the Gapped Local Alignment of Motifs (GLAM2) software [33]. This software uses a simulated annealing-based approach to gradually select and refine a short pattern that matches well (scores highly when locally aligned) to many sequences at once. The BT is converted into sequences by simply taking each sequence node separately, and passed to GLAM2. When a pattern has not been improved by GLAM2 for a set number of iterations, it is returned along with the alignments and scores for each sequence. GLAM2 always returns a pattern, so the quality of the pattern and alignments must be checked. We check that all aligned sections have a score above a set threshold, and any alignments with a score below the threshold are discarded. This threshold can be set by informally testing and inspecting the alignments and scores, or it can be more rigorously informed by shuffling the input sequences and checking the scores found, or by concatenating sequences with shuffled versions of themselves and checking that the alignments fall more often in the unshuffled regions [33]. If no aligned sequences have a score above the threshold, the BT reduction process stops.

Using the pattern and alignments found by GLAM2, we begin to construct a new sequence node. This sequence is a generalisation of all of the matching aligned sections of the sequences. For each position in the matched pattern, all nodes at that position in the alignment are merged. This merging produces a weighted combination of the attributes of the nodes. For example, if five nodes had an action with a "name" attribute set to "Train Protoss Probe" and two with "Train Protoss Zealot", the merged node would have a "name" attribute with "Train Protoss Probe"×5 and "Train Protoss Zealot"×2. Any attribute values that were seen in just one node of the merge are discarded, because they likely represent unique identifiers or unusual values.
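The merging of aligned nodes into a weighted combination of attribute values could be sketched as follows (our illustration; the dictionary-of-Counter representation is an assumption, not the paper's data structure).

```python
from collections import Counter
from typing import Dict, List


def merge_nodes(nodes: List[Dict[str, str]]) -> Dict[str, Counter]:
    """Merge aligned action nodes into one generalised node: each attribute becomes a
    weighted multiset of observed values, and values seen in only one node are dropped
    as likely unique identifiers or unusual values."""
    merged: Dict[str, Counter] = {}
    for node in nodes:
        for attr, value in node.items():
            merged.setdefault(attr, Counter())[value] += 1
    for attr, counts in merged.items():
        merged[attr] = Counter({v: c for v, c in counts.items() if c > 1})
    return merged


# The paper's example: five "Train Protoss Probe" and two "Train Protoss Zealot" actions
# merge into {"name": Counter({"Train Protoss Probe": 5, "Train Protoss Zealot": 2})}.
```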
Next, insertion and deletion positions in the sequence are checked for possible transpositions, where actions have occurred in different orders in different sequences. These are detected if an action is almost always inserted either before or after another sequence of actions, but not both (or, equivalently, deleted from the pattern and inserted somewhere nearby). In these cases, a parallel node is added with the transposed action (or action sequence) as one child and the sequence it would move around as another child. In cases where insertions and deletions are not detected as parallel or unordered, conditional decorators are added with records of the state observations. When executing these actions, the decorator will be able to check the stored and current state observations in order to decide whether to execute, based on the similarity metric.
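One way such a conditional decorator could perform that check is sketched below (ours, not the paper's implementation); the threshold value and class name are assumptions.

```python
from typing import Callable, Dict, List

Observation = Dict[str, str]


class ConditionDecorator:
    """Guards a node: it should execute only when the current observation is similar
    enough to the observations recorded with this action during tree building."""

    def __init__(self, recorded: List[Observation],
                 similarity: Callable[[Observation, Observation], float],
                 threshold: float = 0.8):
        self.recorded = recorded      # observations stored while building the tree
        self.similarity = similarity  # the metric from the problem statement
        self.threshold = threshold    # illustrative cut-off, not taken from the paper

    def should_execute(self, current: Observation) -> bool:
        # Compare stored observations against the current one at execution time.
        return any(self.similarity(stored, current) >= self.threshold
                   for stored in self.recorded)
```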
Finally, the newly constructed sequence can be used to replace the aligned regions of the original sequences. For each matched sequence, the regions before and after the aligned region are separated. For each section before the aligned region, the new sequence is added as the final node. Next, a selector node is added to the end of the newly constructed sequence. For each section after the aligned region, the section is added as a child of the selector node. This will allow the node to select the sequence with the most-similar state observations at execution time. At this point, each sequence node in the tree is passed back to GLAM2 for analysis. Because the previously-found pattern has been collapsed into one sequence node, it will be far less common, so a new pattern will be found.
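The splicing step could be sketched as follows (our illustration, reusing the dictionary node representation assumed earlier; wrapping each "after" section in a sequence node is also our choice). Note that the same merged node is attached after every "before" section, which is consistent with the earlier observation that the resulting structure is really a directed acyclic graph.

```python
from typing import Any, Dict, List, Tuple


def attach_merged_sequence(merged: Dict[str, Any],
                           matched_regions: List[Tuple[List, List, List]]) -> None:
    """Splice the generalised sequence back into the tree. Each matched region is given
    as (before, aligned, after) lists of nodes: the merged sequence follows every
    `before` section, and a selector over the `after` sections chooses the continuation
    with the most-similar stored observations at run time."""
    selector = {"type": "selector", "children": []}
    merged.setdefault("children", []).append(selector)      # selector ends the merged sequence
    for before, _aligned, after in matched_regions:
        before.append(merged)                                # before-region now leads into merged
        selector["children"].append({"type": "sequence", "children": after})
```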
VII. EXPERIMENT

In order to experiment with building behavior trees for StarCraft, we used an existing dataset from prior work [34] consisting of 389 matches of expert human players using the "Protoss" team against another expert "Protoss" team player. The matches were recorded from each player's perspective for a total of 778 examples. The game state observations were sampled every second, and actions were paired with the most recent observations to produce cases. A very simple action similarity metric was used, producing a score of 1 if the action names were the same (for example "Train Protoss Probe"), and 0 otherwise.

Although there is nothing fundamentally preventing this algorithm from being implemented with support for an arbitrary similarity metric between actions or observations, GLAM2 operates only on strings. This meant that the action sequences needed to be encoded as characters in an extended character set before being run through GLAM2, and the results decoded in order to be used as actions again. Even so, GLAM2 was able to find clear motifs in the dataset, and the process was able to drastically reduce the total number of nodes required to represent the tree, from 218,832 in the original tree down to 71,294 in the final tree (Fig. 4). The opening actions in each game, in particular, were always discovered as a strong motif early in the process, as these are very similar in every game.

Fig. 4 (plot omitted: number of BT nodes, falling from roughly 220,000 to about 70,000, against reduction iteration 0–12). Number of nodes in the BT throughout a run reducing the StarCraft "Protoss vs Protoss" dataset.
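The encoding step mentioned above could look roughly like the following (our sketch; the starting code point is arbitrary, and the details of actually invoking the GLAM2 tool are not shown).

```python
from typing import Dict, List, Tuple


def encode_action_sequences(sequences: List[List[str]]) -> Tuple[List[str], Dict[str, str]]:
    """Map each distinct action name to a single character so sequences can be given
    to GLAM2, which operates on strings; returns the encoded strings and a decoding table."""
    symbol_of: Dict[str, str] = {}
    next_code = 0x21  # start in printable ASCII, spilling into extended characters as needed
    encoded: List[str] = []
    for seq in sequences:
        chars = []
        for action in seq:
            if action not in symbol_of:
                symbol_of[action] = chr(next_code)
                next_code += 1
            chars.append(symbol_of[action])
        encoded.append("".join(chars))
    decode = {c: a for a, c in symbol_of.items()}
    return encoded, decode
```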
The resulting trees have not yet been tested at actually playing the game, because they would be unable to perform the low-level unit commands required to complement the high-level strategic commands. These responsibilities are separated out into different modules in many StarCraft bots due to the difficulty of multi-scale reasoning [35]. This may be possible to test in future by using the low-level unit control modules from an existing StarCraft bot such as one of [36] or [37].

VIII. DISCUSSION AND FUTURE WORK

This paper presents a promising start to automatically producing a reactive AI system from expert examples, but this approach clearly has some significant limitations. Despite the approach managing to collapse large amounts of repeated or similar sequences of actions, it is not sophisticated enough to separate out most parallel or reactive actions. Parallel actions can be seen in some motifs, in which certain actions are placed before or after other actions in the motif. Currently, GLAM2 cannot detect transpositions, so such variability appears in the alignment as one or more insertion and deletion pairs. This could potentially be mitigated by searching matched regions for inserted and deleted items appearing at a similar frequency across all matched regions, or by analysing the action sequences for ordering relations or a lack thereof. Reactive actions are somewhat captured by the context-sensitive selector nodes that are inserted after each merged region, but the trees produced by this method do not make good use of the behavior tree's potential for reactive behavior. Certain motifs found may very well be reactions to conditions in a game, but they are currently treated as if they are part of the normal sequence of actions, instead of an interruption to the normal sequence. Analysis of the observed conditions leading up to each discovered motif could allow the addition of conditional nodes to trigger the reactive behavior dynamically, which would make the BT much more robust to changes as well as a better and more compact representation. Ideally, we could continuously process the tree to have fewer and fewer nodes while still representing the original information, similar to [19] or [23].
The way in which the trees are reduced necessarily removes information, so it is possible that important information is lost in the process. This is particularly true of the steps in which patterns are used to merge nodes and subsequently join the sequence back to the original locations of the matched regions. In the merging process, only unique attribute values are discarded, but more attention could be paid to generalising these values. Numerical values, in particular, may often be unique, but could be generalised to a range or distribution. There may also be correlations between attributes, which are lost if multiple values for those attributes are merged. This situation might actually be an indication that the node should be merged into multiple nodes instead of just one. For the joining of merged regions back to subsequent behavior, this could potentially break or incorrectly connect longer sequences for which the merged region was just an interruption. This issue would be solved with better identification of reactive actions, as discussed above.

A limitation in the way GLAM2 works is that it always finds at most one pattern match per input sequence. This means that sequences may have to be split up before a pattern that repeats within one sequence will become common enough to be found as the most prominent motif. A related issue is that the algorithm becomes less effective at finding motifs as sequences get shorter and more numerous, which is what happens naturally as they are broken up by the tree-building process. This is not too major a problem, because GLAM2 is still able to find motifs quite effectively for many iterations, but it does limit its usefulness. Finally, GLAM2 complicates the process due to its use of character encodings for comparison, instead of a more flexible similarity metric. This is understandable, as it was designed to work with DNA and nucleotide sequences and is being extended to work in this scenario.

A useful extension to this work would be to integrate an unsupervised data mining approach to inferring action preconditions and effects, such as [38]. Even a partial understanding of the preconditions and effects of actions could help to guide the BT building process, without having to strongly rely on accurate action models as in HTN planning. As an addition to the problem, or possibly an alternative to the similarity metric, a fitness metric could be provided to the agent to encourage a more search-based strategy of learning. Additionally, a goal description could potentially be supplied in some form to make the problem easier, or a similarity metric could be left for the agent to infer. As an evaluation mechanism, the similarity metric could potentially be used to gauge how close a proposed action was to the expected action, because it may be impossible to predict the exact action details used by the expert.

IX. CONCLUSION

In this paper we introduced a new planning problem, in which no action models were supplied and very few assumptions were made about the domain. In this problem, the planning system is able to receive information about the domain only through observing examples of expert behavior in the domain, and must be able to complete the same task that the experts were undertaking in the examples. We introduced behavior trees as a potential mechanism for representing and executing knowledge about the problem and the appropriate actions to take to complete the task. We then introduced our mechanism for producing a solution behavior tree, which involves searching for common motifs among sequences of actions, joining the sequences found, and following them with selector nodes that allow some reactivity to the current game state. The behavior tree learning mechanism was shown to reduce the number of nodes needed to represent sequences of player actions from the real-time strategy game StarCraft by about two-thirds (from 218,832 nodes to 71,294).

REFERENCES

[1] O. Ilghami, D. S. Nau, H. Muñoz-Avila, and D. W. Aha, "Learning preconditions for planning from plan traces and HTN structure," Computational Intelligence, vol. 21, no. 4, pp. 388–413, 2005.
[2] J. Lanchas, S. Jiménez, F. Fernández, and D. Borrajo, "Learning action durations from executions," in Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), 2007.
[3] S. Ontañón, "Case acquisition strategies for case-based reasoning in real-time strategy games," in Proceedings of the International Florida Artificial Intelligence Research Society (FLAIRS) Conference, 2012.
[4] G. Florez-Puga, M. Gomez-Martin, P. Gomez-Martin, B. Diaz-Agudo, and P. Gonzalez-Calero, "Query-enabled behavior trees," IEEE Trans. Computational Intelligence and AI in Games, vol. 1, no. 4, pp. 298–308, Dec. 2009.
[5] G. Robertson and I. Watson, "A review of real-time strategy game AI," AI Magazine, vol. 35, no. 4, pp. 75–104, 2014.
[6] D. Isla, "Handling complexity in the Halo 2 AI," in Proceedings of the Game Developers Conference, March 2005. [Online]. Available: http://www.gamasutra.com/view/feature/130663/gdc_2005_proceeding_handling_.php
[7] A. Champandard, "Getting started with decision making and control systems," in AI Game Programming Wisdom. Charles River Media, 2008, vol. 4, pp. 257–264.
[8] R. Palma, P. González-Calero, M. Gómez-Martín, and P. Gómez-Martín, "Extending case-based planning with behavior trees," in Proceedings of the International FLAIRS Conference, 2011, pp. 407–412.
[9] A. J. Champandard, "Behavior trees for next-gen game AI," Video, December 2007, retrieved 15 November 2012. [Online]. Available: http://aigamedev.com/open/article/behavior-trees-part1/
[10] R. E. Fikes and N. J. Nilsson, "STRIPS: A new approach to the application of theorem proving to problem solving," Artificial Intelligence, vol. 2, no. 3–4, pp. 189–208, 1971.
[11] X. Wang, "Learning by observation and practice: An incremental approach for planning operator acquisition," in Proceedings of the International Conference on Machine Learning (ICML), 1995, pp. 549–557.
[12] Q. Yang, K. Wu, and Y. Jiang, "Learning action models from plan examples using weighted MAX-SAT," Artificial Intelligence, vol. 171, no. 2, pp. 107–143, 2007.
[13] H. H. Zhuo, D. H. Hu, C. Hogg, Q. Yang, and H. Munoz-Avila, "Learning HTN method preconditions and action models from partial observations," in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2009, pp. 1804–1810.
[14] C. Hogg, H. Munoz-Avila, and U. Kuter, "HTN-MAKER: Learning HTNs with minimal additional knowledge engineering required," in Proceedings of the AAAI Conference on AI, 2008, pp. 950–956.
[15] S. Mohan and J. E. Laird, "Learning goal-oriented hierarchical tasks from situated interactive instruction," in Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) Conference, 2014.
[16] N. Mehta, "Hierarchical structure discovery and transfer in sequential decision problems," Ph.D. dissertation, Oregon State University, 2011.
[17] N. Nejati, P. Langley, and T. Konik, "Learning hierarchical task networks by observation," in Proceedings of the International Conference on Machine Learning, 2006, pp. 665–672.
[18] C. Hogg, U. Kuter, and H. Munoz-Avila, "Learning hierarchical task networks for nondeterministic planning domains," in Proceedings of the IJCAI Conference, 2009.
[19] H. Pasula, L. S. Zettlemoyer, and L. P. Kaelbling, "Learning probabilistic relational planning rules," in Proceedings of the ICAPS Conference, 2004, pp. 73–82.
[20] M. D. Schmill, T. Oates, and P. R. Cohen, "Learning planning operators in real-world, partially observable environments," in Proceedings of the Artificial Intelligence Planning and Scheduling Conference, 2000, pp. 246–253.
[21] D. Shahaf and E. Amir, "Learning partially observable action schemas," in Proceedings of the AAAI Conference, 2006, pp. 913–919.
[22] Q. Yang, K. Wu, and Y. Jiang, “Learning actions models from plan
examples with incomplete knowledge,” in Proceedings of the ICAPS
Conference, 2005, pp. 241–250.
[23] E. Winner and M. Veloso, “Distill: Learning domain-specific planners
by example,” in Proceedings of the International Conference on
Machine Learning, 2003, pp. 800–807.
[24] R. Palma, A. Sánchez-Ruiz, M. Gómez-Martín, P. Gómez-Martín, and
P. González-Calero, “Combining expert knowledge and learning from
demonstration in real-time strategy games,” in Case-Based Reasoning
Research and Development, ser. Lecture Notes in Computer Science,
A. Ram and N. Wiratunga, Eds. Springer Berlin / Heidelberg, 2011,
vol. 6880, pp. 181–195.
[25] E. Dereszynski, J. Hostetler, A. Fern, T. Dietterich, T. Hoang, and
M. Udarbe, “Learning probabilistic behavior models in real-time
strategy games,” in Proceedings of the Artificial Intelligence and
Interactive Digital Entertainment (AIIDE) Conference. AAAI Press,
2011, pp. 20–25.
[26] G. Synnaeve and P. Bessière, “A Bayesian model for plan recognition
in RTS games applied to StarCraft,” in Proceedings of the AIIDE
Conference. AAAI Press, 2011, pp. 79–84.
[27] C. Lim, R. Baumgarten, and S. Colton, “Evolving behaviour trees
for the commercial game DEFCON,” in Applications of Evolutionary
Computation, ser. Lecture Notes in Computer Science, C. Chio,
S. Cagnoni, C. Cotta, M. Ebner, A. Ekárt, A. Esparcia-Alcazar, C.-K.
Goh, J. Merelo, F. Neri, M. Preuß, J. Togelius, and G. Yannakakis,
Eds. Springer Berlin / Heidelberg, 2010, vol. 6024, pp. 100–110.
[28] R. Kadlec, “Evolution of intelligent agent behavior in computer
games,” Master’s thesis, Faculty of Mathematics and Physics, Charles
University in Prague, 2008.
[29] M. Ghallab, D. Nau, and P. Traverso, Automated planning: theory &
practice. Elsevier, 2004, chapter 11.
[30] M. Buro and T. M. Furtak, “RTS games and real-time AI research,” in
Proceedings of the Behavior Representation in Modeling and Simulation
Conference. Citeseer, 2004, pp. 63–70.
[31] M. Buro and D. Churchill, “Real-time strategy game competitions,” AI
Magazine, vol. 33, no. 3, pp. 106–108, Fall 2012.
[32] B. Weber, M. Mateas, and A. Jhala, “Building human-level AI for
real-time strategy games,” in Proceedings of the AAAI Fall Symposium
Series. AAAI, 2011, pp. 329–336.
[33] M. C. Frith, N. F. W. Saunders, B. Kobe, and T. L. Bailey, “Discov-
ering sequence motifs with arbitrary insertions and deletions,” PLoS
Computational Biology, vol. 4, no. 5, p. e1000071, 2008.
[34] G. Robertson and I. Watson, “An improved dataset and extraction
process for StarCraft AI,” in Proceedings of the FLAIRS Conference,
2014.
[35] B. Weber, P. Mawhorter, M. Mateas, and A. Jhala, “Reactive planning
idioms for multi-scale game AI,” in Proceedings of the IEEE
Conference on Computational Intelligence and Games. IEEE, 2010,
pp. 115–122.
[36] S. Ontañón, G. Synnaeve, A. Uriarte, F. Richoux, D. Churchill, and
M. Preuss, “A survey of real-time strategy game AI research and
competition in StarCraft,” IEEE Trans. Computational Intelligence and
AI in Games, vol. 5, no. 4, pp. 293–311, 2013.
[37] S. Wender and I. Watson, “Integrating case-based reasoning with
reinforcement learning for real-time strategy game micromanagement,”
in PRICAI 2014: Trends in Artificial Intelligence. Springer, 2014, pp.
64–76.
[38] M. A. Leece and A. Jhala, “Sequential pattern mining in StarCraft:
Brood War for short and long-term goals,” in Proceedings of the
AIIDE Conference, 2014.
