Artificial Intelligence
AI Theory
A Modular Framework for Artificial Intelligence Based on Stimulus Response Directives, by Charles Guy [Added: 2/21/2000]. An approach to modeling the biological nervous system without using neural nets.

Searching for Solutions in Games and AI, by L.V. Allis [Added: 7/16/1999]. Allis' Ph.D. thesis.

Does the Top-down or Bottom-Up Approach Best Model the Human Brain?, by J. Matthews [Added: 7/30/1999]. Huge essay (7000+ words) detailing many aspects of AI - including conceptual representation, mentalese, the CYC project and COG. This was written as my Extended Essay for the International Baccalaureate. Abstract, diagrams, bibliography all included!

Introduction to Machine Vision, by James Matthews [Added: 7/31/1999]. An introduction to machine vision. Edge detection and prototyping are discussed.

Logic Programming with Fuzzy Sets, by Ludek Matyska [Added: 7/31/1999]. In zipped PDF format.

Philosophical Arguments For and Against AI, by J. Matthews [Added: 7/31/1999]. Philosophy often helps artificial intelligence in many fields, yet when looking at the concept of AI as a whole, it often criticizes the very fundamentals of AI - questioning whether intelligent machines will ever be possible. This essay looks at some of the different schools of thought that agree and disagree with AI.

Production Systems, by S. Hsiung [Added: 7/31/1999]. Most AI systems today are production systems; in fact, many can argue that all computer programs are production systems. How are production systems different from the rest? What are their strengths and weaknesses? How do they work? Many task-structured programs, from puzzle solvers to chess-playing programs to medical diagnosis expert systems to the monsters in Quake 2, are production systems.

Project AI, by Mark Lewis Baldwin and Bob Rakosky [Added: 7/28/1999]. This document covers the basics of designing an artificial intelligence for a strategy game.

The Intuitive Algorithm, by Abraham Thomas [Added: 10/7/1999]. This is an essay concerning the idea that intuition may be a pattern recognition algorithm. Due to its scientific nature, it can be a difficult read for some.

The Natural Mind: Consciousness and Self-Awareness, by S. Hsiung [Added: 7/31/1999]. Perhaps another one of the greatest challenges in AI is to create something that has knowledge of having knowledge.
Applications
AI in Gaming, by James Matthews [Added: 7/30/1999]. Discusses current and future AI in the shoot-em-up, flight simulator and board game genres.

Applications in Music, by J. Matthews [Added: 7/30/1999]. This essay looks at AI and music, and more specifically the guitar: the problems that arise as effects are added, as speed increases, and as more and more instruments are added to a piece of music.
Documentation
Hierarchical AI, by Andrew Luppnow [Added: 9/7/1999]. This document proposes an approach to the problem of designing the AI routines for intelligent computer wargame opponents.

Multilayer Feedforward Networks and the Backpropagation Algorithm, by S. Hsiung [Added: 7/31/1999]. A guide to creating multilayer feedforward networks, with tips and details on how to implement the backpropagation algorithm in C++ and heavy theoretical discussion.
Gaming
A Practical Guide to Building a Complete Game AI: Volume I, by Geoff Howland [Added: 10/12/1999]. Part 1 of this series covers state machines, unit actions, and grouping.

A Practical Guide to Building a Complete Game AI: Volume II, by Geoff Howland [Added: 10/12/1999]. Part 2 covers unit goals and pathfinding.

Game AI: The State of the Industry, Part Two, by David C. Pottinger and Prof. John E. Laird [Added: 11/13/2000]. The second installment of Game Developer magazine's annual investigation into game AI presents two more experts discussing this ever-evolving field.

Game Developers Conference 2001: An AI Perspective, by Eric Dybsand [Added: 5/11/2001]. Highlights some of the more salient points from the computer game AI related sessions he attended during GDC 2001.

Machine Learning, Game Play, and Go, by David Stoutamire [Added: 7/31/1999].

More AI in Less Processor Time: Egocentric AI, by Ian Wright [Added: 6/21/2000]. Presents techniques to control and manage real-time AI execution, techniques that open up the possibility of future hardware acceleration of AI.

Recognizing Strategic Dispositions, by Steve Woodcock [Added: 7/5/2000]. A compilation of a 1995 newsgroup thread about creating an AI that recognizes strategic situations.

Searching for Solutions in Games and Artificial Intelligence, by L.V. Allis [Added: 7/31/1999]. In zipped PostScript format.
Genetic Algorithms
Application of Genetic Programming to the Snake Game, by Tobin Ehlis [Added: 8/10/2000]. This article covers the development and analysis of a successful function set that allows the evolution of a genetic program with which the AI can attain maximal performance.

GA Playground, by Ariel Dolan [Added: 9/7/1999]. A genetic algorithm toolkit for use with Java.

Genetic Algorithm Example: Diophantine Equation, by S. Hsiung [Added: 7/30/1999]. A step-by-step example of how genetic algorithms can be used to solve Diophantine equations. Now with accompanying C++ code!

Genetic Algorithms Tutorials, by Darrell Whitley [Added: 7/31/1999]. In zipped PostScript format.

Representing Trees in Genetic Algorithms, by C. Palmer and A. Kershenbaum [Added: 7/31/1999]. In zipped PDF format.
Introduction
An Introduction to Artificial Intelligence, by S. Hsiung [Added: 7/30/1999]. A history-oriented introduction to AI: how thought and beliefs surrounding AI have evolved since the 1950s, and what we can hope for in the future.
Neural Networks
Back Propagation Neural Network Engine source, by Patrick Ko Shu-pui [Added: 7/31/1999]. (.ZIP, 25Kb)

Four Neural Networks, by various authors [Added: 7/31/1999]. A collection of four neural networks in a .ZIP file.

Neural Netware, by André LaMothe [Added: 10/7/1999]. A nice introduction to neural nets.

Neural Network FAQ, by Warren S. Sarle [Added: 9/7/1999]. FAQ from comp.ai.neural-nets. Updated monthly.

Neural Network FAQ, by Lutz Prechelt [Added: 7/31/1999].

Neurons and Neural Networks: The Most Abstract View, by Michael A. Arbib [Added: 7/31/1999].
A Modular Framework for Artificial Intelligence Based on Stimulus Response Directives
by Charles Guy
Gamasutra, November 10, 1999
Contents: The Biological Model for Artificial Intelligence; Overview of Data Flow / Data Structures; The Navigator and Goal Servos

There are three fundamental technologies used in modern computer games: graphics, physics, and artificial intelligence (AI). But while graphics and physics have shown great progress in the last five years, current AI still continues to display only simple repetitive behavior, which is of little replay value. This deficiency, unfortunately, is often sidestepped, and the emphasis switched to multi-player games, which take advantage of real human intelligence.

In this article, I have attempted to model AI based on the functional anatomy of the biological nervous system. In the pure sense of the word, a biological model of AI should use neural networks for all stimulus encoding and motor response signal processing. Unfortunately, neural networks are still difficult to control (for game design) and very computationally expensive. Therefore I have chosen a hybrid model, which uses a "biological" signal path framework in conjunction with more traditional heuristic methods for goal selection. The main features of this model are:

1. Stimulus detection based on signal strength thresholds.

2. Target goal selection based on directives, known goals and acquired goals.

3. Target goals acquired by servo feedback loops that drive the body.
4. Personalities constructed from sets of directives. Because the directives are modular, it is fairly
straightforward to construct a wide range of distinctive personalities. These personalities can
display stereotypical behavior while still retaining enough flexibility to exercise "judgment" and
adapt to unique situations. The framework of this model should be useful to a wide range of
applications because of its generic nature.
Some Background
This AI model was developed for use in
SpecOps II, a tactical infantry combat
simulation of Green Beret covert
missions. While the emphasis of the
project has been on realism and squad
level tactics, it still falls under the
category of a first-person shooter. The
original SpecOps project was based on
the U.S. Army Ranger Corps. and was one
of the first "photo-realistic" tactical
combat simulators released for the
computer gaming market. The combination of high-quality motion capture data, photo-digitized texture maps and sound effects recorded from authentic sources produces a rather compelling combat experience. Although
the original game was fun to play, it was
justifiably criticized for having poor AI.
Therefore one of the major goals for
Another typical sequence might begin when a player issues a "demolish position" command to a squad
member. The AI will then navigate to the position goal, place a satchel charge and yell out: "fire in the
hole!" The "get away from explosive" directive will then cause him to move outside of the danger
radius of the explosive. I have observed an interesting case where the initial evasive maneuver lead to
a dead end, followed by backtracking towards the explosive object. Eventually the navigator got the AI
a safe distance away from the explosive in time.
Overview of Data Flow / Data Structures
Overview of Data Flow

The data flow begins with the Stimulus Detection Unit, which filters sound events and visible objects and updates the Known Goals queue. The Goal Selector then compares the Known Goals and Acquired Goals against the personality and commander directives and then selects Target Goals. The navigator determines the best route to get to a position goal using a path finding algorithm. The direction and position goal servos drive the body until the Target Goals are achieved, and then the Acquired Goals queue is updated.
Data Structures
The primary data structures used by this brain model are: BRAIN_GOAL
and DIRECTIVE. AI personalities are represented by an array of Directive structures and other
parameters. The following is a typical personality declaration from SpecOps II:
PERSONALITY_BEGIN( TeammateRifleman )
    PERSONALITY_SET_FIRING_RANGE( 100000.0f )         // must be this close to fire gun (mm)
    PERSONALITY_SET_FIRING_ANGLE_TOLERANCE( 500.0f )  // must point this accurately to fire (mm)
    PERSONALITY_SET_RETREAT_DAMAGE_THRESHOLD( 75 )    // retreat if damage exceeds this amount (percent)
    DIRECTIVES_BEGIN
        DIRECTIVE_ADD( TEAMMATE_FIRING_GOAL,    AvoidTeammateFire,        BaseWeight+1, AvoidTeammateFireDecay )
        DIRECTIVE_ADD( EXPLOSIVE_GOAL,          GetAwayFromExplosive,     BaseWeight+1, NoDecay )
        DIRECTIVE_ADD( HUMAN_TAKES_DAMAGE_GOAL, BuddyDamageVocalResponce, BaseWeight,   AcquiredGoalDecay )
        DIRECTIVE_ADD( DEMOLISH_POSITION_GOAL,  DemolishVocalResponce,    BaseWeight,   AcquiredGoalDecay )
        DIRECTIVE_ADD( SEEN_ENEMY_GOAL,         StationaryAttackEnemy,    BaseWeight-1, SeenEnemyDecayRate )
        DIRECTIVE_ADD( HEARD_ENEMY_GOAL,        FaceEnemy,                BaseWeight-2, HeardEnemyDecayRate )
        DIRECTIVE_ADD( UNCONDITIONAL_GOAL,      FollowCommander,          BaseWeight-3, NoDecay )
        DIRECTIVE_ADD( UNCONDITIONAL_GOAL,      GoToIdle,                 BaseWeight-4, NoDecay )
    DIRECTIVES_END
PERSONALITY_END
Among the fields carried by these structures are:

● Response function pointer (called if the directive's priority weight is best; assigns target goals)
● Decay rate (allows older goals to become less important over time)
● Goal object pointer (void *, cast to typed pointer based on object type)
● Goal position type (i.e. dynamic object position, fixed position, offset position etc.)
● Movement mode (Forward, Forward slow, Sidestep left, Sidestep right etc.)
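The article does not give the full declarations, but the field list above suggests something roughly like the following C++ sketch (every type and member name not quoted in the article is an assumption, not the actual SpecOps II code):

// Hypothetical reconstruction from the field list above.
enum GoalPositionType { DYNAMIC_OBJECT_POSITION, FIXED_POSITION, OFFSET_POSITION };
enum MovementMode { FORWARD, FORWARD_SLOW, SIDESTEP_LEFT, SIDESTEP_RIGHT };

struct BRAIN_GOAL {
    int              objectType;       // e.g. SEEN_ENEMY_GOAL, EXPLOSIVE_GOAL
    void*            goalObject;       // cast to a typed pointer based on objectType
    GoalPositionType positionType;     // dynamic object, fixed or offset position
    float            position[3];      // last known X, Y, Z
    float            timeOfDetection;  // refreshed when the goal is re-detected
    bool             previouslyKnown;  // rising-edge flag for directives
    MovementMode     movementMode;     // how the body should move toward it
};

struct DIRECTIVE {
    int   goalType;                 // stimulus type this directive responds to
    void (*response)(BRAIN_GOAL&);  // called if its priority weight is best
    float priorityWeight;           // e.g. BaseWeight+1
    float decayRate;                // lets older goals fade in importance
};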
Modeling stimulus detection in a physical way can achieve symmetry and help fulfill the user's
expectations (i.e. if I can see him, he should be able to see me). This also prevents the AI from
receiving hidden knowledge and having an unfair advantage. The stimulus detection unit models the
signal strength of an event as a distance threshold. For example, the HeardGunFire event can be
detected within a distance of 250 meters. This threshold distance can be attenuated by a number of
factors. If a stimulus event is detected, it is encoded into a BRAIN_GOAL and added to the known
goals queue. This implementation of stimulus detection considers only three sensory modalities:
visual, auditory and tactile.
Visual stimulus detection begins by considering all humans and objects within the field of view of the
observer (~180 degrees). A scaled distance threshold is then computed based on the size of the
object, object illumination, off-axis angle and tangential velocity. If the object is within the scaled
distance threshold, a ray cast is performed to determine if the object is not occluded by the world. If
all these tests are passed, the object is encoded into a BRAIN_GOAL. For example, a generic human can be encoded into a SeenEnemyGoal, or a generic object into a SeenExplosiveGoal.
As sounds occur in the game, they are added to the sound event queue. These sound events contain
information about the source object type, position and detection radius. Audio stimulus detection
begins by scanning the sound event queue for objects within the distance threshold. This distance
threshold can be further reduced by an extinction factor if the ray from the listener to the sound
source is blocked by the world. If a sound event is within the scaled distance threshold, it is encoded
into a BRAIN_GOAL and sent to the known goals queue.
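As a rough illustration, the audio pass just described might look like this (a minimal sketch; the helper names and stub bodies are assumptions):

#include <cmath>
#include <vector>

struct SoundEvent { int sourceType; float pos[3]; float detectionRadius; };

// Stubs standing in for the real world-geometry query and goal encoding.
static bool RayBlockedByWorld(const float*, const float*) { return false; }
static void AddToKnownGoals(int /*objectType*/, const float* /*pos*/) {}

void DetectSounds(const float listener[3], const std::vector<SoundEvent>& queue,
                  float extinctionFactor /* e.g. 0.5f when the ray is blocked */)
{
    for (const SoundEvent& s : queue) {
        float dx = s.pos[0] - listener[0];
        float dy = s.pos[1] - listener[1];
        float dz = s.pos[2] - listener[2];
        float dist = std::sqrt(dx * dx + dy * dy + dz * dz);

        // Attenuate the detection threshold if the world blocks the ray
        // from the listener to the sound source.
        float threshold = s.detectionRadius;
        if (RayBlockedByWorld(listener, s.pos))
            threshold *= extinctionFactor;

        if (dist <= threshold)
            AddToKnownGoals(s.sourceType, s.pos);  // encode as a BRAIN_GOAL
    }
}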
When the known goals queue is updated with a BRAIN_GOAL, a test is made to determine if it was
previously known. If it was previously known, the matching known goal is updated with a new time of
detection and location. Otherwise the oldest known goal is replaced by it. The PREVIOUSLY_KNOWN
flag of this known goal is set appropriately for directives that respond to the rising edge of a detection
event.
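A minimal sketch of that update logic, assuming a fixed-size queue and the names used above:

#include <vector>

struct KnownGoal {
    int objectType; void* object;
    float pos[3]; float timeOfDetection;
    bool previouslyKnown;
};

void UpdateKnownGoals(std::vector<KnownGoal>& knownGoals, const KnownGoal& incoming)
{
    // If the goal was previously known, refresh its time and location.
    for (KnownGoal& g : knownGoals) {
        if (g.objectType == incoming.objectType && g.object == incoming.object) {
            g.timeOfDetection = incoming.timeOfDetection;
            for (int i = 0; i < 3; ++i) g.pos[i] = incoming.pos[i];
            g.previouslyKnown = true;   // not a rising edge
            return;
        }
    }
    if (knownGoals.empty()) return;     // sketch assumes a pre-sized queue
    // Otherwise replace the oldest known goal.
    KnownGoal* oldest = &knownGoals[0];
    for (KnownGoal& g : knownGoals)
        if (g.timeOfDetection < oldest->timeOfDetection) oldest = &g;
    *oldest = incoming;
    oldest->previouslyKnown = false;    // rising edge of a detection event
}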
Injuries and collisions can generate tactile stimulus detection events. These are added to the acquired
goals queue directly. Tactile stimulus events are primarily used for the generation of vocal responses.
The Goal Selector
The goal selector chooses target goals based on stimulus response directives. The grammar for the
directives is constructed as a simple IF THEN statement:
IF I detect an object of type X (and priority weight Y is best) THEN call target goal function Z.
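Read literally, that grammar maps onto a small selection loop. One way it could look (structure names are assumptions; as described below, the real response function can still perform additional logic before accepting the goal):

#include <vector>

struct BrainGoal { int objectType; };
struct Directive { int goalType; float weight; void (*respond)(BrainGoal&); };

void SelectGoal(std::vector<Directive>& directives, std::vector<BrainGoal>& knownGoals)
{
    Directive* best = nullptr;
    BrainGoal* match = nullptr;
    for (Directive& d : directives)
        for (BrainGoal& g : knownGoals)
            if (g.objectType == d.goalType &&        // IF I detect type X...
                (!best || d.weight > best->weight))  // ...and weight Y is best
            { best = &d; match = &g; }
    if (best) best->respond(*match);                 // THEN call function Z
}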
The process of goal selection starts by evaluating each active directive for a given personality. The
known goals queue or the acquired goals queue is then tested to find a match for this directive object
type. If a match is found and the priority weight is the highest in the list, then the target goal function
is called. This function can perform additional logic to determine if this BRAIN_GOAL is to be chosen as
The Navigator

Once a position goal has been selected, the navigator must find a path to get there. The navigator first determines if the target can be acquired directly (i.e. can I walk straight to it?). My initial implementation of this test used a ray cast from the current location to the target location. If the ray was blocked, then the target was not directly accessible. The ray cast method has two problems:

1. Where an intervening drop off or obstacle did not block the ray and
Figure 2. Side view of linear ray cast vs. step-wise walk through obstacle
detection.
If a position goal is not blocked by the world, the position goal servo goes directly to the target.
Otherwise a path finding algorithm is used to find an alternate route to get to the target position. The
path finding algorithm that is used in SpecOps II is based on Navigation Helper Nodes that are placed
in the world by the game designers. These nodes are placed at the junctions of doors, hallways, stairs
and boundary points of obstacles. There are typically a few hundred Navigation Helper Nodes per level.
The first step in the path finding process is to update the known goals queue with all Navigation Helper
Nodes that are not blocked by the world. Because the step-wise walk through obstacle test is fairly
time expensive, it is distributed over a number of frame intervals. Once the known goals queue has been
updated with all valid navigation helper nodes, the next position goal can be selected. This selection is
based on when the Navigation Helper was last visited and how close it is to the target position. When a
Navigation Helper Node is acquired by the position goal servo, it is updated in the acquired goals
queue with the time of arrival. Selecting only Navigation Helper Nodes that have not been visited, or that have the oldest time of arrival, ensures that the path finder will exhaustively scan all nodes until the target can be reached directly. When two Navigation Helper Nodes have the same age status,
the one closer to the target position is selected.
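That selection rule (never-visited or oldest first, ties broken by distance to the target) is easy to sketch; the names here are assumptions:

#include <cmath>
#include <vector>

struct NavNode { float pos[3]; float lastVisitTime; bool visited; };

static float Dist(const float a[3], const float b[3])
{
    float dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Prefer unvisited nodes (treated as infinitely old), then the oldest
// time of arrival; break age ties by distance to the target position.
NavNode* SelectNextNode(std::vector<NavNode>& nodes, const float target[3])
{
    NavNode* best = nullptr;
    for (NavNode& n : nodes) {
        if (!best) { best = &n; continue; }
        float ageN = n.visited ? n.lastVisitTime : -1.0f;
        float ageB = best->visited ? best->lastVisitTime : -1.0f;
        if (ageN < ageB ||
            (ageN == ageB && Dist(n.pos, target) < Dist(best->pos, target)))
            best = &n;
    }
    return best;
}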
The direction and position goal servos take an X, Y, Z position as their goal. This position is
transformed into local coordinates by translation and rotation. The direction servo drives the local X
component to 0 by applying the appropriate yaw velocity. The local Y component is driven to 0 by
Most actions are communicated to the body through a 128 bit virtual keyboard called the action flags.
These flags correspond directly to keys the player can press to control his avatar. Each action has an
enumerated type for each bit mask (i.e. IO_FIRE, IO_FORWARD, IO_POSTURE_UP,
IO_USE_INVENTORY etc.) These action flags are then encoded into animation states. Because the
body is articulated, rotation is controlled by separate scalar fields for body yaw velocity, chest yaw
angle, bicep pitch angle and head yaw/pitch angle. These allow for partially orthogonal direction goals
(i.e. the head and gun can track an enemy while the body is pointing at a different position goal).
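In code, the virtual keyboard could be as simple as a 128-bit set plus the articulation scalars. A sketch (only the IO_* names quoted above come from the article):

#include <bitset>

enum ActionBit { IO_FIRE, IO_FORWARD, IO_POSTURE_UP, IO_USE_INVENTORY /* ... */ };

struct BodyInput {
    std::bitset<128> actionFlags;  // one bit per key the player could press
    float bodyYawVelocity;         // the articulated parts get their own
    float chestYawAngle;           // scalar fields, so the head and gun can
    float bicepPitchAngle;         // track one goal while the body moves
    float headYaw, headPitch;      // toward another
};

void PressForwardAndFire(BodyInput& io)
{
    io.actionFlags.set(IO_FORWARD);  // the AI "presses" keys rather than
    io.actionFlags.set(IO_FIRE);     // moving the body directly
}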
Commands
Because of their modular nature, directives can be given to an AI by a commander at runtime. Each
brain has a special slot for a commander directive and a commander goal. This allows the commander
to tell one of his buddies to attack an enemy that is only visible to himself. Commands can be given to
a whole squad or to an individual. Note that it is very easy to create directives for commander AI's to
issue commands to their teammates. The following is a list of commander directives used in SpecOps
II:
Future Improvements
Because this brain model is almost entirely data driven, it would be fairly easy to have it learn from
experience. For example, the priority weights for each directive could be modified as a response to
victories or defeats. Alternatively, an instructor could punish (reduce directive priority weight) or
reward (increase directive priority weight) responses to in-game events. The real problem with
teaching an AI during game play is the extremely short life span (10-60 seconds). However, each
personality could have a persistent communal brain, which could learn over the course of many lives.
In my opinion, the real value of dynamic learning in game AI is not to make a stronger opponent, but
to make a continuously changing opponent. It is easy to make an unbeatable AI opponent; the real
goal is to create AIs that have distinctive personalities, and these personalities should evolve over
time.
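The punish/reward idea amounts to a one-line weight update. A sketch, with the step size and clamping as assumptions:

// An instructor rewards or punishes a response by nudging the priority
// weight of the directive that produced it.
struct DirectiveWeight { float priorityWeight; };

void Reinforce(DirectiveWeight& d, bool rewarded, float step = 0.1f)
{
    d.priorityWeight += rewarded ? step : -step;
    if (d.priorityWeight < 0.0f) d.priorityWeight = 0.0f;  // keep weights sane
}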
Does the Top-down or Bottom-Up Approach Best Model the Human Brain?
by J. Matthews

Introduction
Throughout the history of artificial intelligence, one question has always been asked when given a problem: should the solution be found via the top-down method or the bottom-up method? There are many different areas of artificial
intelligence where this question arises — but none more so than in the areas of Natural Language Processing (NLP) and robotics.
As we grow up we learn the language of our parents, making mistakes at first, but slowly growing used to using language to
communicate with others. Humans definitely learn through a bottom-up approach — we all start with nothing when we are born. It is
through our own intellect and learning that we master language. In the field of computing, though, such methods cannot always be
utilised.
The two approaches to the problems are called top-down and bottom-up, according to how they tackle the problems. Top-down takes
pre-programmed knowledge (like a large knowledge base) and uses symbolic creation, manipulation, linking and analysis to perform
the calculations. The top-down approach is the one most commonly used in the field of classical (and neo-classical) artificial intelligence.
A CYC entity is not necessarily limited to one word; often it represents a group of words, or a concept. Look at this example taken
from the CYC KB:
;;; #$Food-ReadyToEat
(#$isa #$Food-ReadyToEat #$ProductType)
(#$isa #$Food-ReadyToEat #$ExistingStuffType)
(#$genls #$Food-ReadyToEat #$FoodAndDrink)
(#$genls #$Food-ReadyToEat #$OrganicStuff)
(#$genls #$Food-ReadyToEat #$Food)
You can see how ‘food that is ready to eat’ is represented in CYC as a group of IS-A relationships and GENLS relationships. The
IS-A relationships are an ‘element of’-relationship whereas the GENLS relationships are a ‘subset of’-relationship. This hierarchy
creates a large linked concept for a very simple term. For example, the Food-ReadyToEat concept IS-A ExistingStuffType, and
in the CYC KB, ExistingStuffType is represented as:
;;; #$ExistingStuffType
(#$isa #$ExistingStuffType #$Collection)
(#$genls #$ExistingStuffType #$TemporalStuffType)
(#$genls #$ExistingStuffType #$StuffType)
With the following comment about it: "…A collection of collections. Each element of #$ExistingStuffType is a collection of
things (including portions of things) which are temporally and spatially stufflike; they may also be stufflike in other ways, e.g., in
some physical property. Division in time or space does not destroy the stufflike quality of the object…" It is apparent how generic
many of the concepts get as they rise in the CYC hierarchy.
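The isa/genls distinction is easy to model: both are labelled edges in a concept graph, with genls (subset-of) being transitive. A toy C++ sketch using the entries quoted above:

#include <map>
#include <set>
#include <string>

std::map<std::string, std::set<std::string>> isa;    // 'element of' links
std::map<std::string, std::set<std::string>> genls;  // 'subset of' links

// A concept is (transitively) a kind of 'super' if genls links reach it.
bool IsKindOf(const std::string& c, const std::string& super)
{
    if (c == super) return true;
    for (const std::string& parent : genls[c])
        if (IsKindOf(parent, super)) return true;
    return false;
}

int main()
{
    isa["Food-ReadyToEat"]   = { "ProductType", "ExistingStuffType" };
    genls["Food-ReadyToEat"] = { "FoodAndDrink", "OrganicStuff", "Food" };
    genls["Food"]            = { "OrganicStuff" };
    return IsKindOf("Food-ReadyToEat", "OrganicStuff") ? 0 : 1;  // reachable: yes
}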
Evidently, such a huge KB would generate a large concept for a small entity, but such a large concept is necessary. For example, the
CYC team created a sample program in 1994 that fetched images given search criteria. Given a request to search for images of seated
people, the program retrieved an image with the following caption: "There are some cars. They are on a street. There are some trees
on the side of the street. They are shedding their leaves. Some of them are yellow taxicabs. The New York City skyline is in the
background. It is sunny." The program had deduced that cars have seats, in which people normally sit, when the car is in motion.
Parsing
A look at parsing and its two approaches is necessary at this point. Parsers generally take information and convert it into a data
structure that the computer can manipulate. With reference to Artificial Intelligence, a parser is generally a program (or module of the
program) that takes a natural language sentence and converts it into a group of symbols. There are generally two methods of parsing,
bottom-up and top-down. The bottom-up method takes each word separately, matches the word to its syntactic category, does this for
the following word, and attempts to find grammar rules that can join these words together. This procedure continues until the whole
sentence is parsed, and the computer has represented the sentence in a well-formed structure. The top-down method, on the other
hand, starts with the various grammar rules and then tries to find instances of the rules within the sentence.
Here the bottom-up and top-down relationship is slightly different. Nevertheless, a parallel can be drawn if the grammar of a sentence
can be seen as the base of language (like commonsense is the base of cognitive intelligence). Both approaches have problems largely
due to the large amount of computational time both require. With the bottom-up approach, a lot of time is wasted looking for
combinations of syntactic categories that do not exist in the sentence. The same problem appears in the top-down approach, although
it is looking for instances of grammar that are not present that wastes the time.
So which method represents the human mind more closely? Neither is perfect, because both methods simply involve syntactic analysis.
Take these two examples:
Carrie’s box of strawberries was edible.
Carrie’s love of Kevin was intense.
If a program syntactically analyzed these two statements, it would come to the correct conclusion that the strawberries were edible,
but the incorrect conclusion that Kevin was intense. Despite the syntactic structures of the two sentences being identical, the
meaning is different. Nevertheless, even if a syntactical approach is used, it can be used to point the computer to the correct meaning.
As you will see with conceptual representation, if prior knowledge is known about the word ‘love’ then the computer can create the
correct data structure to represent the sentence. This still does not answer the question of what type of parser the brain is closer to. In
Schank’s words, ‘Does a human who is trying to understand look at what he has or does he look to fulfill his expectations?’ The
answer seems to be both; a person not only handles the syntax of the sentence, but also does a certain degree of prediction. Take the
following incomplete sentence:
John: I’ve played my guitar for over three hours and my fingers feel like ——
Looking at the syntax of the sentence it is easy to see that the next word will be a verb (‘dying’) or a noun (‘jelly’). It is easy,
therefore, to predict the conceptual structure of a sentence. Problems arise when meaning has to be predicted too. We have the
problem of context, for instance. The fingers could be very worn out; they could be very callused from playing, or they could feel hot
from playing for so long.
Prediction is easier when there is more information to go on, for example, if John had said "and my poor fingers," from the context of
the sentence, we could have gathered that the fingers do not feel so good. This kind of prediction is called conversational prediction.
Another type of prediction is based upon the listener’s knowledge. If the listener knows John to be an avid guitar player, then he
might expect a positive comment, but if he knows John’s parents force him to play the guitar, the listener could expect a negative
remark.
All these factors are constituents when a human listens to someone talking. With all this taken into account, Schank sums up the
answer the following way:
"…We can therefore say that it would seem to be reasonable to claim that a human is a top-down parser with respect to some
well-defined world model. The hearer, however, is a bottom-up parser in that he hears a given word he tries to understand
what it is rather than decide whether it satisfied his ordered list of expectations…"
Types of Concepts
A concept can be any one of three different types — a nominal, an action, or a modifier. Nominals are concepts that do not need to be
explained further. Schank refers to nominals as picture-producers, or PPs, because he says that nominals produce a picture relating to
the concept in the mind of the hearer. An action is what a nominal can do, or more specifically, what an animate nominal can perform
on some object. Therefore, a verb like ‘hit’ is classified as an action, but ‘like’ is not, since no action is performed. Finally, a modifier
is a descriptor of a nominal or an action. In English, modifiers could be given the names adverbs and adjectives, yet since CR is
supposedly independent of any language, the non-grammatical terms PA (picture aiders – for modifiers of nominals) and AA (action
aiders – for modifiers of actions) are used by Schank.
These three categories can all relate to each other; such relationships are called dependencies. Dependencies are well described by
Schank:
"…A dependency relation between two conceptual items indicates that the dependent item predicts the existence of the
governing item. A governor need not have a dependent, but a dependent must have a governor. The rule of thumb in
establishing dependency relations between two concepts is whether one item can be understood without the other. A governor
can be understood by itself. In order for a conceptualisation to exist, however, even a governor must be dependent on some
other concept in that conceptualisation…"
Therefore, nominals and actions are always governors, and the two types of modifiers are dependents. This does not mean, though,
that a nominal or an action cannot also be a dependent. For instance, some actions are derived from other actions, take for example
the imaginary structure CR-STEAL (conceptual type for stealing). Since stealing is really swapping of possession (with one party not
wanting that change of possession), it can be derived from a simpler concept of possession change.
C-Diagrams
C-Diagrams are the graphical equivalent of the structures that would be created inside a computer, showing the different relationships
between the concepts. C-Diagrams can get extremely complicated, with many different associations between the primitives; this
essay will cover the basics. Below is an example of a C-Diagram:
The above represents the sentence "John hit his little dog." ‘John’ is a nominal since it does not need anything further to describe it.
‘Hit’ is an action, since it is something that an animate nominal does. The dependency is said to be a ‘two-way dependency’ since
both ‘John’ and ‘hit’ are required for the conceptualisation — such a dependency is denoted by a ⇔ in the diagram. ‘Dog’ is also a
governor in this conceptualisation, yet it does not make sense within this conceptualization without a dependency to the action ‘hit.’
Such a dependency is called an ‘objective dependency’ — this is denoted by an arrow (the ‘o’ above it denotes the objectivity). Now
all the governors are in place, and we have created "John hit a dog" as a concept. We have to further this by adding the dependencies
— ‘little’ and ‘his’. Little is called an attribute dependency, since it is a PA for ‘dog’. Attributive dependencies are denoted by a ↑ in
the diagram. Finally, the ‘his’ has to be added — since his is just a pronoun, another way of expressing John, you would think it
would be dependent of ‘dog.’ It is not this simple, though, since ‘his’ also implies possession of ‘dog’ to ‘John.’ This is called a
prepositional dependency, and is denoted by a ⇑, followed by a label indicating the type of prepositional dependency. POSS-BY, for
instance, denotes possession.
With all this in mind lets look at a more complicated C-Diagram. Take the sentence, "I gave the man a book." Firstly, the ‘I’ and
‘give’ relationship is a two-way dependency, so a ⇔ is used. The ‘p’ above the arrow is to denote that the event took place in the
past. The ‘book’ is objectively dependent on ‘give’, so the arrow is used to denote this. Now, though, we have a change in possession; this is represented in the diagram by two arrows, with the arrow pointing toward the governor (the originator), and the arrow pointing away (the actor). The final diagram would look like:
You can see, through conceptual representation, how a computer could create, store and manipulate symbols or concepts to represent
sentences. How can we be sure that such methods are accurate models of the brain? We cannot be certain, but we can look at
philosophy for theories that can support such computational, serial models of the brain.
Introduction to Mentalese
Despite this movement away from GOFAI by some researchers, the majority of scientists carried on with the classical approach. One
such pioneer of the ‘computational model’ field was Jerry A. Fodor. He says the following:
"…I assume that psychological laws are typically implemented by computational processes. There must be an implementing
mechanism for any law of a non-basic science, and the putative intentional generalisations of psychology are not
exceptions…Computational processes are ones defined over syntactically structured objects; viewed in extension, computations
are mappings from symbol to symbol; viewed in intension, they are mappings of symbols under syntactic description to
symbols under syntactic description…"
This quote is very reminiscent of conceptual representation and its methodology. Fodor argues that since sciences all have laws
governing their phenomena, psychology and the workings of the brain are not an exception.
Consistency
How many different types of cells are our brains composed of? Essentially, the brain uses just one type — the neurone. How can the brain
both exhibit serial and parallel behaviours with only one type of cell? The obvious answer to this is that it does not. This is a fault in
the overall integrity of both approaches.
For example, we have parallel, bottom-up neural networks that can successfully detect pictures, but cannot figure out the algorithm
for an XOR bit-modifier. We have serial, top-down CR programs that can read newspaper articles, make inferences, and even learn
from these inferences — yet, such programs often make bogus discoveries, due to the lack of background information and general
knowledge. We have robots that can play the guitar, walk up stairs, aid in bomb disposal, yet nothing gets close to a program with
overall intellect equal to that of a human.
Top-Down Approach
One of the best ways to support the top-down approach and its similarities to the brain is to look at just how similar Mentalese and
conceptual representation are. Mentalese assumes that there is a language of the brain completely independent of the natural language
(or languages) of the individual. CR assumes the exact same thing, creating data structures with more abstract names that are
independent of the language, but rely on the parser for their input. This can explain the ease with which programs utilising CR convert between two languages.
Both the ideas of Mentalese and CR must have been formulated from the same direction and perspective of the brain. Both assume a
certain independence that the brain has over the rest of the language/communication areas of the brain. Both assume that
computational manipulation is performed only after the language has been transformed into this language of the brain. It is about this
area that Mentalese and conceptual representation diverge — conceptual representation is limited to language (or has not yet been
applied to other areas), whereas Fodor maintains that this mental language applies to cognitive and perceptive areas of the brain too.
A fault in the Mentalese theory is that Fodor says Mentalese is universal. It seems hard to imagine that as we all grow up, learning
Bottom-Up Approach
The main advantage of the bottom-up approach is its simplicity (somewhat), and its flexibility. Using structures such as neural
networking, programs have been created that can do things that would be impossible to do with conventional, serial approaches. For
example, Hopfield networks can recognise partial, noisy, shrunken, or obscured images that it has ‘seen’ before relatively quickly.
Another program powered by a neural network has been trained to accept present tense English verbs and convert them to past tense.
The strongest argument for the bottom-up approach is that the learning processes are the same that any human undergoes when they
grow up. If you expose the robot or program to the outside world, it will learn things and adapt itself to them. As the COG Team
asserts, the key to human intelligence is the four traits they spelled out: developmental organisation, social interaction, embodiment,
and integration.
The bottom-up approach also models human behaviour (emotions inclusive) due to the chaotic nature of parallel processing and neural
networking:
"…Neural networks combine both chaotic behaviour, since they are a nonlinear system, and reasonable, if unexpected
behaviour since this nonlinearity is controlled by so-called basins of attraction in the memory formed in the connection weight
values…"
The main downfall of the bottom-up approach is its practicality. Building a robot’s knowledge base from the ground up every time
one is made might be reasonable for a research robot, but for (once created) commercial robots or even very intelligent programs,
such a procedure would be out of the question.
Conclusion
In conclusion, a definite approach to the brain has not yet been developed, but the two approaches (top-down and bottom-up)
describe different aspects of the brain. The top-down approach seems like it can explain how humans use their knowledge in
conversation. What the top-down approach does not solve however, is how we get that initial knowledge to begin with — the
bottom-up approach does. Through ideas such as neural networking and parallel processing, we can see how the brain could possibly
take sensory information and convert it into data that it can remember and store. Nevertheless, these systems have so far only
demonstrated an ability to learn, and not sufficient ability to manipulate and change data in the way that the brain and programs
utilising a top-down methodology can.
These attributes of the approaches led to their dispersion within the two fields of AI. Natural Language Processing took up the
top-down approach, since that had all the necessary data manipulation required to do the advanced analysis of languages. Yet, the
large amount of storage space required for a top-down program and the lack of a good learning mechanism made the top-down
approach too cumbersome for robotics. They adopted the bottom-up approach, which proved to be good for things such as face
recognition, motor control, sensory analysis and other such ‘primitive’ human attributes. Unfortunately, any degree of higher-level
complexity is very hard to achieve with a bottom-up approach.
Now we have one approach modelling the lower level aspects of the human brain, and another modelling the higher levels — so
which models the brain overall the best? Top-down approaches have been in development for as long as AI has been around, but
serious approaches to the bottom-up methodologies have only really started in the last twenty years or so. Since bottom-up
approaches are looking at what we know from neurobiology and psychology, not so much from philosophy like GOFAI scientists do,
there may be a lot more we have yet to discover. These discoveries, though, may be many years in the future. For the meanwhile, a
compromise should be reached between the two levels to attempt to model the brain consistently given the current technology. The
object-orientated approach might be one answer, research into neural networks trained to create and modify data structures similar to
those used in conceptual representation might be another.
Artificial Intelligence is the science of the unknown — trying to emulate something we cannot understand. GOFAI scientists have
always hoped that AI might one day explain the brain, instead of the other way around — connectionist scientists do not, and perhaps
this is the key to creating flexible code that can react given any environment, just like the human brain — real Artificial Intelligence.
Bibliography.
Altman, Ira. The Concept of Intelligence: A Philosophical Analysis. Maryland: 1997.
Brooks, R. A., Breazeal (Ferrell), C., Irie, R., Kemp, C. C., Marjanović, M., Scassellati, B. & Williamson, M. M. (1998), Alternative Essences of Intelligence, in ‘Proceedings of the American Association of Artificial Intelligence (AAAI-98)’.
Brooks, R. A., Breazeal (Ferrell), C., Irie, R., Kemp, C. C., Marjanović, M., Scassellati, B. & Williamson, M. M. (1998), The Cog Project: Building a Humanoid Robot.
Churchland, Patricia and Sejnowski, Terrence. The Computational Brain. London: 1996.
Churchland, Paul. The Engine of Reason, the Seat of the Soul: A Philosophical Journey into the Brain. London: 1996.
Crane, Tim. The Mechanical Mind: A Philosophical Introduction to Minds, Machines and Mental Representation. London: 1995.
Fodor, Jerry A. Elm and the Expert: An Introduction to Mentalese and Its Semantics. Cambridge: 1994.
Hahn, Udo. Schacht, Susanne. Bröker, Norbert. Concurrent, Object-Orientated Natural Language Parsing: The ParseTalk Model. Arbeitsgruppe Linguistische Informatik/Computerlinguistik. Freiburg: 1995.
Penrose, Roger. The Emperor’s New Mind: Concerning Computers, Minds and The Laws of Physics. Oxford: 1989.
Schank, Roger. The Cognitive Computer: On Language, Learning and Artificial Intelligence. Reading: 1984.
Schank, Roger and Colby, Kenneth. Computer Models of Thought and Language. San Francisco: 1973.
Introduction to Machine Vision
by James Matthews

Data size
We will be looking at the picture at the right throughout the essay. We will be making a few
changes though - we will say that the picture is an 8-bit 640x480 image (not the 200x150
24-bit image it actually is) since this is the "standard" size and colour-depth of a computer
image.
Why is this important? Well, the first consideration/problem of vision systems is the sheer size
of the data it has to deal with. Doing the math, we have 640x480 pixels to begin with
(307,200). This is multiplied by three to account for the red, green and blue (RGB) data
(921,600). So, with just one image we are looking at 900K of data!
So, if we are looking at video of this resolution we would be dealing with 23Mb/sec (or
27Mb/sec in the US) of information! The solution to this is fairly obvious - we just cannot deal with this sort of resolution at this
speed at this colour depth! Most vision systems will work with greyscale video at perhaps 200x150. This greatly reduces the data rate -
from 23Mb/sec to 0.72Mb/sec! Most modern-day computers can manage this sort of rate very easily.
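The arithmetic behind those figures, spelled out (25 frames/sec is assumed for the 23Mb figure, 30 for the US figure, and 24 for the greyscale figure):

#include <cstdio>

int main()
{
    const double rgbFrame  = 640.0 * 480 * 3;  // 921,600 bytes, roughly 900K
    const double greyFrame = 200.0 * 150;      // 30,000 bytes
    std::printf("RGB 640x480 @ 25 fps: %.1f MB/sec\n", rgbFrame  * 25 / 1e6);
    std::printf("RGB 640x480 @ 30 fps: %.1f MB/sec\n", rgbFrame  * 30 / 1e6);
    std::printf("Grey 200x150 @ 24 fps: %.2f MB/sec\n", greyFrame * 24 / 1e6);
    return 0;
}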
Of course, receiving the data is the smallest problem that vision systems face - it is processing it that takes the time. So how can we
simplify the data down further? I'll present two simple methods - edge detection and prototyping.
Edge Detection
Most vision systems will be determining where and what something is, and for the most part, detecting the edges of the various shapes in the image should be sufficient to help us on our way. Let us look at two edge detections of our picture:
The left picture is generated by Adobe Photoshop's "Edge Detection" filter, and the right picture is generated by Generation5's
ED256 program. You can see that both programs picked out the same features, although Photoshop has done a better job of
Edge detection of this kind is done by convolving the image with a small kernel; the 3x3 kernel shown here is the standard eight-neighbour Laplacian:

 1  1  1
 1 -8  1
 1  1  1
Now, let us imagine we are looking at a pixel that is in a region bordering a black-to-white block. So the pixel and its surrounding 8
neighbours would have the following values:
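The worked example with the actual pixel values appeared as an image in the original and is not reproduced here, but the operation itself is a plain 3x3 convolution. A minimal C++ sketch, assuming a greyscale image stored row by row:

#include <cstdlib>
#include <vector>

// Convolve a greyscale image with the 3x3 Laplacian kernel shown above.
// Uniform regions sum to zero; region borders (edges) do not.
std::vector<int> EdgeDetect(const std::vector<int>& img, int w, int h)
{
    static const int k[3][3] = { {1, 1, 1}, {1, -8, 1}, {1, 1, 1} };
    std::vector<int> out(img.size(), 0);
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            int sum = 0;
            for (int ky = -1; ky <= 1; ++ky)
                for (int kx = -1; kx <= 1; ++kx)
                    sum += k[ky + 1][kx + 1] * img[(y + ky) * w + (x + kx)];
            out[y * w + x] = std::abs(sum);  // 0 in flat areas, large at edges
        }
    return out;
}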
Prototyping
Prototyping came about through a data classification technique called competitive learning. Competitive learning is employed throughout different fields in AI, especially in neural networks, or more specifically self-organizing networks. Competitive learning is meant to create x-number of prototypes given a data set. These prototypes are meant to be approximations of groups of data within the dataset.

Somebody thought it would be neat to apply this sort of technique to an image to see if there are data patterns within an image. Obviously it is different for every image, but on the whole, areas of the image can be classified very well using this technique. Here is a more specific overview of the algorithm:
Prototyping Algorithm
1. Take x samples of the image (x is a high number like 1000). In our case, these samples would consist of a small region of the image (perhaps 15x15 pixels).
2. Create y prototypes (y is normally a smaller number like 9). Again, these prototypes would consist of 15x15 groups of pixels.
3. Initialize these prototypes to random values (noisy images).
4. Cycle through our samples, and try to find the prototype that is closest to each sample. Now, alter that prototype to be a little closer to the sample. This is normally done by a weighted average; ED256 brings the chosen prototype 10% closer to the sample.
5. Do this many times - around 5000. You will find that the prototypes now actually represent groups of pixels that are predominant in the image.
6. Now, you can create a simpler image made up of only y colours by classifying each pixel according to the prototype it is closest to.
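Steps 3 to 5 are classic competitive learning. A minimal C++ sketch of the training loop (the 15x15 patch size, ~5000 iterations and 10% rate come from the description above; everything else is an assumption):

#include <cstdlib>
#include <vector>

using Patch = std::vector<float>;  // 15x15 = 225 greyscale values

static float Dist2(const Patch& a, const Patch& b)
{
    float d = 0;
    for (size_t i = 0; i < a.size(); ++i) { float e = a[i] - b[i]; d += e * e; }
    return d;
}

// Nudge the winning (closest) prototype a fraction of the way toward
// each randomly chosen sample; assumes both vectors are non-empty.
void Train(std::vector<Patch>& prototypes, const std::vector<Patch>& samples,
           int iterations = 5000, float rate = 0.1f /* 10%, as in ED256 */)
{
    for (int it = 0; it < iterations; ++it) {
        const Patch& s = samples[std::rand() % samples.size()];
        Patch* winner = &prototypes[0];
        for (Patch& p : prototypes)
            if (Dist2(p, s) < Dist2(*winner, s)) winner = &p;
        for (size_t i = 0; i < winner->size(); ++i)
            (*winner)[i] += rate * (s[i] - (*winner)[i]);
    }
}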
Here is our picture in greyscale and another that has been fed through the prototyping algorithm built into ED256. We use greyscale
to make prototyping a lot simpler. I've also enlarged the prototypes and their corresponding colours to help you visualize the process:
Notice how the green corresponds to pixels that have predominantly white surroundings; most pixels are red because they are similar to the "brick" prototype. Very dark areas (look at the far right window frame) are classified as dark red.
For another example, look at this picture of an F-22 Raptor. Notice how the red corresponds to the edges on the right wing (and the left too, for some reason!) and the dark green to the left trailing edges/intakes and right vertical tail. Dark blue is for horizontal
edges, purple for the dark aircraft body and black for the land.
Conclusion
How do these techniques really help machine vision systems? It all boils down to simplifying the data that the computer has to deal
with. The less data there is, the more time can be spent extrapolating features. The trade-off is between data size and retaining the features within
the image. For example, with the prototyping example, we would have no trouble spotting the buildings in the picture, but the tree
and the car are a lot harder to differentiate. The same applies with a computer.
In general, edge detection helps when you need to fit a model to a picture - for example, spotting people in a scene. Prototyping helps
to classify images, by detecting their prominent features. Prototyping has a lot of uses since it can "spot" traits of an image that
humans do not.
Philosophical Arguments For and Against AI
by J. Matthews

A deep explanation of all of these is beyond the scope of this essay (see the Robotics Essays for more information), but basically what this is saying is that people have often tried to model the brain on a computer by modelling the brain on the computer! That is, they use the computer as the analogy when trying to figure out how the brain works - therefore, our ideas of the brain are often
distorted, oversimplified, or merely too computationally based.
Mimicking Intelligence?
A wonderful topic to throw about is the topic of mimicking intelligence. Can you mimic intelligence? Deep Blue beat Kasparov in
the game that often signified man's intelligence. Did Deep Blue exhibit intelligence? It played (and won) a game that requires a
significant amount of 'thought' and 'planning.' Deep Blue analyzes the board through immense computational power, so what does
and does not constitute intelligence? If a human made a list of all plausible moves given the board diagram, then started removing
options using a set of rules until he came up with what he thought was the best move, would that be classified as intelligence?
Definitely - this type of approach to problems is taken in many fields (granted, not chess) and the question of whether intelligence is
being used is never raised. Now, take a computer and do the same thing and ask yourself the same question. Why do people find it so
hard to see the same thing?
Some people will say that it is merely mimicking intelligence. What is the formal definition of mimicking intelligence? If forced into
answering this question, my second reply would be "the ability to display all traits of intelligence" - my first reply would be that no
such thing exists. To me, 'mimicking intelligence' is an oxymoron. If all traits of intelligence are exhibited (for the meanwhile, let us
assume traits of intelligence reduce down to the ability to have a meaningful conversation with a human) then intelligence exists! I
cannot see how humans as a race can classify how intelligence should be defined when we do not know how our own brains work.
The human race is easily threatened because we have been on top for so long that we have never had to deal with something
potentially superior to ourselves. Indeed, Kasparov (and many others) saw the match between himself and Deep Blue as a way for the
human race to "help defend our dignity." The inherent narcissistic tendencies of the human race have been deflated slowly since
Copernicus told us we weren't the centre of the universe, then again by Darwin, who said we'd evolved from protozoans. Now, perhaps, by IBM's Deep Blue team telling us it's not just us who can play chess!
Conclusion
Artificial Intelligence is fraught with philosophical questions, since many questions about the brain and its workings remain unanswered. In this essay, I merely moved from topic to topic as I wrote. These topics are by NO means the only ones brought up by Artificial Intelligence. Also, as AI advances (and I believe it will) toward completely humanoid robots, many more philosophical, moral, ethical and indeed even theological questions will arise. If computers and their apparent lack of intelligence aren't being battered, their apparent lack of a consciousness (or indeed, soul) is. Now there's some food for thought...
"...You can't do without philosophy, since everything has its hidden meaning which we must know..."
- Maxim Gorky, The Zykovs, 1918.
Production Systems
Symbolic AI Systems vs Connectionism
Symbolic AI systems manipulate symbols, instead of numbers. Humans, as a matter of fact, reason symbolically (in the most general
sense). Children must learn to speak before they are able to deal with numbers for example. More specifically, these systems operate
under a set of rules, and their actions are determined by these rules. They always operate in task-oriented environments, and are
wholly unable to function in any other case. You can think of symbolic AI systems as "specialists". A program that plays 3d tic tac
toe will not be able to play PenteAI (a game where 5 in a row is a win, but the players are allowed to capture two pieces if they are
sandwiched by the pieces of an opposing player). Although symbolic AI systems can't draw connections between meanings or
definitions and are very limited with respect to types of functionality, they are very convenient to use for tackling task-centered
problems (such as solving math problems, diagnosing medical patients etc.). The more flexible approach to AI involves neural
networks, yet NN systems are usually so underdeveloped that we can't expect them to do "complex" things that symbolic AI systems
can, such as playing chess. While NN systems can learn more flexibly, and draw links between meanings, our traditional symbolic AI
systems will get the job done fast.
An example of a programming language designed to build symbolic AI systems is LISP. LISP was developed by John McCarthy
during the 1950s to deal with symbolic differentiation and integration of algebraic expressions.
Production Systems
(This model of production systems is based on chapter 5 of Stan Franklin's book, The Artificial Mind, the example of the 8-puzzle
was also based on Franklin's example)
Production systems are applied to problem solving programs that must perform a wide range of searches. Production systems are symbolic AI systems. The difference between these two terms is only one of semantics. A symbolic AI system may not be restricted
to the very definition of production systems, but they can't be much different either.
Production systems are composed of three parts, a global database, production rules and a control structure.
The global database is the system's short-term memory. These are collections of facts that are to be analyzed. A part of the global
database represents the current state of the system's environment. In a game of chess, the current state could represent all the
positions of the pieces for example.
Production rules (or simply productions) are conditional if-then branches. In a production system, whenever a condition in the system is satisfied, the system is allowed to execute or perform a specific action which may be specified under that rule. If the rule is not fulfilled, it may perform another action. This can be simply paraphrased:
WHEN (condition) IS SATISFIED, PERFORM (action)
For a scenario where a production system is attempting to solve a puzzle, pattern matching is required to tell whether or not a
condition is satisfied. If the current state of a puzzle matches the desired state (the solution to the puzzle), then the puzzle is solved.
However, if this case is not so, the system must attempt an action that will contribute to manipulating the global database, under the
production rules in such a way that the puzzle will eventually be solved.
In order to take a closer look at control structures, let us look at a problem involving the eight puzzle. The eight puzzle contains eight
numbered squares laid in a three-by-three grid, leaving one square empty. Initially it appears in some obfuscated state. The goal of
the production system is to reach some final state (the goal). This can be obtained by successively moving squares into the empty
position. This system changes with every move of the square, thus, the global database changes with time. The current state of the
system is given as the position and enumeration of the squares. This can be represented for example as a 9 dimensional vector with
components 0, 1, 2, 3,..., 8, NULL, the NULL object being the empty space.
In this puzzle, the most general production rule can be simply summed up in one sentence:
If the empty square isn't next to the left edge of the board, move it to the left
However, in order to move the empty square to the left, the system must first make room for the square to move left. For example,
from the initial state (refer to above figure) the 1-square would be moved down 1 space, then, the 8-square right 1 space, then the
6-square up one space in order for the empty square to be moved left (i.e., a heuristic search). All these sequences require further
production rules. The control system decides which production rules to use, and in what sequence. To reiterate, in order to move the
empty square left, the system must check if the square is towards the top, or somewhere in the middle or bottom before it can decide
what squares can be moved to where. The control system thus picks the production rule to be used next in a production system
algorithm (refer to the production system algorithm figure above).
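Stripped to its essentials, the three-part structure described above - global database, production rules and a control structure - fits in a few lines of C++. A generic sketch (not Franklin's formulation; this control strategy simply fires the first applicable rule):

#include <vector>

// A production: WHEN (condition) IS SATISFIED, PERFORM (action).
template <typename State>
struct Production {
    bool (*condition)(const State&);  // pattern match against the database
    void (*action)(State&);           // manipulate the global database
};

// The control structure: keep firing applicable rules until the goal
// state is reached or no rule applies (the search is stuck).
template <typename State>
void Run(State& database, const std::vector<Production<State>>& rules,
         bool (*goalReached)(const State&))
{
    while (!goalReached(database)) {
        bool fired = false;
        for (const Production<State>& r : rules)
            if (r.condition(database)) { r.action(database); fired = true; break; }
        if (!fired) return;
    }
}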
Another example of a production system can be found in Ed Kao's 3-dimensional tic-tac-toe program. The rules and conditions for
the AI are conveniently listed just like for the 8 puzzle.
The question that faces many artificial intelligence researchers is how capable is a production system? What activities can a
production system control? Is a production system really capable of intelligence? What can it compute? The answer lies in Turing
machines...
Programs
These programs are examples of production systems.
3d Tic Tac Toe - E. Kao
PenteAI - J. Matthews
Project AI GameDev.net
See Also:
Artificial Intelligence:AI Theory
Project AI
by Mark Lewis Baldwin and Bob Rakosky
Introduction
When designing an artificial intelligence (AI) for a strategy game, one must keep clearly in mind the final goal,
i.e. winning the game (actually the goal is to entertain the customer, but the sub-goal of trying to win the
game seems more appropriate here). Normally, winning the game is accomplished by reaching a set of victory
conditions. To achieve these victory conditions, the computer needs to control a divergent set of resources
(units and other decisions) in a coordinated and sophisticated manner.
In order to give a frame of reference for this discussion, we will be discussing Project AI from the point of view
of a strategic wargame, which has multiple units. Each needs to make separate movement decisions in each
turn of the game play. However, this system is not restricted to wargames. It's applicable in any game in
which a large number of decisions have to be made controlling a number of resources, which work best in
coordination with each other.
There are a number of ways to approach the problem and what will be discussed is by no means the only
approach. Project AI is a methodology that allows the computer to solve this problem at a strategic as well as
tactical level. But first, we need to build up to it by discussing the levels of AI decision making upon which it is
based. Each level described below is built upon the previous levels.
First level...
Approach: In each turn (or cycle), examine each unit to be moved (i.e. each node of decision making), build a list
of possible decisions and pick one randomly. In other words, look around and move somewhere. It doesn't
matter where. Note that an action of doing nothing is still an action.
Problems: This does not direct the computer AI's resources toward the goal of victory, other than at a noise
level. It will, however, confuse the opponent something awful.
Second level...
Approach: When selecting from the possible moves (the decision list) for each unit, pick the move that achieves
the victory conditions. To be specific, look around. Are there any victory goals achievable by the unit (i.e. can
the unit move into a victory location)? If so, implement it.
Problems: When there are multiple actions which achieve the same goal, there is no filter for differentiating
between equal (or nearly equal) actions. Also, most victory conditions cannot be achieved by any one decision,
placing us back at the first level.
Third level...
Approach: Evaluate each alternative move for a unit by how well it moves us toward the final victory goal. Each
move needs to be evaluated not as TRUE/FALSE but as a numeric value, analyzing how well the target victory
goal is approached. For example, an action which would reach a victory location in two turns is worth more
than an action that would reach a victory location in 20 turns.
Problems: This method does not allow for actions that support reaching the final victory conditions but do not
in and of themselves achieve victory (e.g. killing another unit when that is not part of the victory goals).
Fourth level...
Approach: Define specific sub-goals which assist the artificial intelligence in achieving the victory conditions,
but in and of themselves are not victory conditions. When a unit is making a decision, it evaluates the
possibility of achieving these sub-goals. Such sub-goals might include killing enemy units, protecting friendly
units, maintaining a defensive line, achieving a strategic position, etc. Accomplishment of each of the sub-goals
is then factored into the evaluation of the decision tree, and a decision is made upon it. This process can
actually produce a semi-intelligent game (a sketch of such scoring follows below).
Problems: Each unit makes its decisions independently of all the others. It's like pre-Napoleonic warfare. It works
well until the opponent starts coordinating their forces.
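A possible shape for the level three/four evaluation, offered as a hedged sketch rather than the authors' code; every field name and weight below is invented and would be game-specific:

#include <vector>

// Score one candidate action numerically instead of TRUE/FALSE.
struct Move {
    int turnsToVictoryLocation;  // level three: progress toward victory (0 = none)
    int enemyUnitsKilled;        // level four sub-goal
    int friendlyUnitsProtected;  // level four sub-goal
};

double evaluate(const Move& m) {
    double score = 0.0;
    if (m.turnsToVictoryLocation > 0)
        score += 100.0 / m.turnsToVictoryLocation;  // two turns beats twenty
    score += 10.0 * m.enemyUnitsKilled;             // weighted sub-goals
    score += 5.0 * m.friendlyUnitsProtected;
    return score;
}

// Pick the best-scoring move from a unit's (non-empty) decision list.
const Move& pickMove(const std::vector<Move>& moves) {
    const Move* best = &moves.front();
    for (const Move& m : moves)
        if (evaluate(m) > evaluate(*best)) best = &m;
    return *best;
}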
Fifth level...
Approach: Allow a unit making decisions to examine the decisions of other friendly units in weighing its own
decision. Weigh the possible outcomes of the other units' planned actions, and balance those results with the
current unit's action tree.
Problems: This allows for coordination, but not strategic control of the resources. However, this level is
actually beyond what many computer AIs do today. It can also lead to iterative cycling, in which each unit
keeps modifying its decision based on the others, in a vicious circle.
Sixth level...
Approach: Create a strategic (or grand tactical) decision making structure that can control units in a
coordinated manner.
This leads us to the problem of how one coordinates diverse resources to reach a number of sub-victory
goals. This question may be described as strategic level decision making. One solution would be to look at
how the problem is solved in reality, i.e. on the battlefield or in business.
The solution on the battlefield is several layers of hierarchical command and control. For example, squads
1, 2 and 3 are controlled by Company A, which is in turn controlled by a higher layer of the hierarchy.
Information mostly flows up the hierarchy (information about a squad 20 miles away is not
relayed down to the local squad), while control mostly flows down the hierarchy. On occasion, information
and control can cross the hierarchy, and although this happens more now than 50 years ago, it is still
relatively infrequent.
As a result, the lowest level unit must depend on its hierarchical commander to make strategic decisions. It
cannot do so itself because a) it doesn't have as much information to make the decision with as its commander,
and b) it is capable of making a decision, based on known data, different from that of others with the same data,
causing chaos instead of coordination.
OK, first cut solution. We build our own hierarchical control system, assigning units to theoretical larger units
or in the case where the actual command/control system is modeled (V for Victory), the actual larger units.
Allow these headquarters to control their commands and work with other headquarters as some type of
'mega-unit'. These in turn could report to and be controlled by some larger unit.
Note that this was actually my first approach to the problem.
But there seem to be some problems here. A hierarchical command system modeled on the real world
does not make optimal use of the resources. Because of the hierarchical structure, too many resources may
be assigned to a specific task, or resources in parallel hierarchies will not cooperate. For example, two units
might be able to easily capture a nearby victory location, but because they each belong to a separate
command hierarchy (mega-unit) they will not coordinate to do so; if by chance they did belong to the
same hierarchy, they would be able to accomplish the task. In other words, this artificial structure can be too
constraining and might produce sub-optimal results.
And the human player does not have these constraints.
First, we have to ask ourselves, if the hierarchical command and control structure is not the best solution, why
is it used by business and the military? The difference is in the realities of the situations. As we previously
pointed out, in the battlefield, information known at one point in the decision making structure might not be
known at another point in the hierarchy. In addition, even if all information was known everywhere, identical
decisions might not be made from the same data. However, in game play there is only one decision maker
(either the human or the AI), and all information known is known by that decision maker. This gives the
decision maker much more flexibility in controlling and coordinating her resources than the military
hierarchy allows.
In other words, the military and business system of strategic decision making is not our best model. Its
solution exists because of constraints on communication. But those constraints do not exist in strategy games
(command and control is perfect), and therefore military command and control decision making is
not the right model for solving the problem in game play AI.
And we want the best technique of decision making we can construct for our AI. So below is an alternative
Sixth Level attack on the problem...
The basic idea behind Project AI is to create a temporary mega-unit control structure (called a Project)
designed to accomplish a specific task. Units (resources) are assigned to the Project on an as needed basis,
used to accomplish the project and then released when not required. Projects exist temporarily to accomplish
a specific task, and then are released.
Therefore, as we cycle through the decision making process of each unit, we examine the project the unit is
assigned to (if it is assigned to one). The project then contains the information needed for the unit to
accomplish its specific goal within the project structure.
Note that these goals are not the final victory conditions of the game, but very specific sub-goals that can lead
to game victory. Capturing a victory location is an obvious goal here, but placing a unit in a location with a
good line of sight could also be a goal, although less valuable.
Let's get a little more into the nitty gritty of the structure of such projects, and how they would interact.
- Type of project -- What is the project trying to accomplish, for example, defend a city, kill an enemy unit,
capture a geographical location, etc.
- Formula for calculating the incremental value of assigning a unit to a project -- In other words, given a unit
and a large number of projects, how do we discern which project to assign the unit to? This formula might take
into account many different factors, including how effective the unit might be on this project, how quickly the
unit can be brought in to support the project, what other resources have already been allocated to the project,
what the value of the project is, etc. In practice, we have associated the formula with the project type, and
each project has simply carried specific constants that are plugged into the formula (see the sketch below). Such constants might
include enemy forces opposing the project, minimum forces required to accomplish the project, and
probability of success.
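One way the project record and its value formula might look in C++; this is a guess at a reasonable shape, with invented names and a made-up formula, not the authors' implementation:

#include <algorithm>

enum class ProjectType { DefendCity, KillEnemyUnit, CaptureLocation };

struct Project {
    ProjectType type;
    double value;          // what accomplishing the project is worth
    double enemyStrength;  // a constant carried by the project (unused below)
    double minimumForce;   // minimum force required to accomplish it
    double forceAssigned;  // resources already allocated

    // Incremental value of adding a unit of the given strength that is
    // turnsAway turns from the project: its contribution toward the
    // minimum force, discounted by travel time. Purely illustrative.
    double incrementalValue(double unitStrength, int turnsAway) const {
        double shortfall = std::max(0.0, minimumForce - forceAssigned);
        double contribution = std::min(unitStrength, shortfall);
        return value * (contribution / minimumForce) / (1 + turnsAway);
    }
};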
OK, now how do we actually use these 'projects'? Here is one approach...
1) For every turn, examine the domain for possible projects, updating the data on current projects, deleting
old projects that no longer apply or have too low a priority to be of value, and initializing new projects that
present themselves. For example, if we have just spotted a unit threatening one of our towns, we create a new
project to defend the town; if the project already existed, we might have to re-evaluate its value and the
resources required, considering the new threat.
2) Walk through all units one at a time, assigning each unit to the project that gives the best incremental
value for the unit. Note that this may actually take an iterative process, since assigning/releasing units to a
project can change the value of assigning other units to a project. Also, some projects may not
receive enough resources to accomplish their goal, and may then release the resources previously assigned.
3) Reprocess all units, designing their specific move orders taking into account which Project they are assigned
to, and what other units assigned to the project are planning to do. Again, this may be an iterative
process.
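Continuing the Project sketch above (same caveats, and with pathfinding stubbed out), the three per-turn steps could reduce to something like:

#include <vector>

struct Unit {
    double strength;
    Project* assigned = nullptr;
    int turnsTo(const Project&) const { return 1; }  // stub for pathfinding
};

void planTurn(std::vector<Project>& projects, std::vector<Unit>& units) {
    // 1) Updating, deleting and creating projects is omitted here.

    // 2) Assign each unit to the project with the best incremental value.
    //    A real system may have to iterate, since every assignment or
    //    release changes the value of assigning the remaining units.
    for (Unit& u : units) {
        Project* best = nullptr;
        double bestValue = 0.0;
        for (Project& p : projects) {
            double v = p.incrementalValue(u.strength, u.turnsTo(p));
            if (v > bestValue) { bestValue = v; best = &p; }
        }
        if (u.assigned) u.assigned->forceAssigned -= u.strength;
        u.assigned = best;
        if (best) best->forceAssigned += u.strength;
    }

    // 3) Designing each unit's concrete move orders is likewise omitted.
}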
The result of this Project structure is a very flexible floating structure that allows units to coordinate between
themselves to meet specific goals. Once the goals have been met, the resources can be reconfigured to meet
other goals as they appear during the game.
One of the implementation problems that Project AI can generate is that of oscillations of units between
projects. In other words, a unit gets assigned to one project in one turn, thus making a competing project
more important, grabbing the unit the next turn, etc. This can result in a unit wandering between two goals
and never going to either. The designer needs to be aware of this possibility and protect against it. Although there
can be several specific solutions to the problem, there is at least one generic solution. Previously, we
mentioned a formula for calculating the incremental value of adding a unit to a project. The solution lies in
this formula. To be specific, a weight should be added to the formula when a unit is looking at a project it is
already assigned to (i.e., a preference is given to remaining with a project instead of jumping projects). The
key problem here is assigning a weight large enough that it stops the oscillation problem, but small enough
that it doesn't prevent necessary jumps between projects. So one may have to massage the weights several times
before a satisfactory value is achieved.
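In terms of the sketches above, this generic fix is only a few lines; the bonus name and its magnitude are invented and would need exactly the tuning described:

// Prefer the project a unit is already assigned to, damping oscillation.
// The bonus must be large enough to stop the jitter but small enough to
// allow necessary jumps between projects.
double biasedValue(const Project& p, const Unit& u, double stickinessBonus) {
    double v = p.incrementalValue(u.strength, u.turnsTo(p));
    if (u.assigned == &p) v += stickinessBonus;
    return v;
}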
Seventh Level
One can extrapolate past the "Project" structure just as we built up to it. One extrapolation might be a
multilayer level of projects and sub-projects. There are other possibilities as well to explore.
© Copyright 1997 - Mark Baldwin and Robert Rakosky
You can find out more about me and the services provided by Baldwin Consulting at
http://members.aol.com/markb01
See Also:
Artificial Intelligence:AI Theory
A new algorithm. Describes an algorithm, which successfully diagnoses diseases. Essentially, it reverses the logic of
the search process from selection to elimination, to achieve remarkably speedy results.
Instant recognition. When presented with unique links, the algorithm achieves instant recognition in massive search
spaces. It logically handles uncertainty, avoids stupid questions and is holistic. It also ignores the age-old reasoning
chains of science, travelling a new avenue in the application of inductive logic.
The nerve cell and recognition. Currently, nerve cells are believed to be computational devices. A new recognition
role is suggested for neurons. They may recognise incoming patterns. Recognition may explain such phenomena as the
modification of pain, the focus of attention, awareness and consciousness.
Memory. Recognition is "the establishment of an identity". It may be achieved by comparing the features of an entity
to those in memory. Recognition may mandate memory. Nerve cells may carry such memory. Feelings may be nerve
impulses, the recognition of which may provide context for the recall of memory.
Recognition of objects. Nerve channels project from point to point, observing neighbourhood relationships. Such
mapping may suggest a matrix type transmission. Intuition may be the instant recognition of such cyclic transmitted
pictures. Cortical association regions recognise objects and may transmit pictures, for recognition by the system.
Motor control. Instant intuitive recognition of pictures may empower motor control functions. Persisting iterating
patterns may form the basis for achieving objectives. Such goal patterns may be triggered by feelings. Habitual
activities may be recalled through intuitive and iterative pattern recognition by the cerebellum.
Event recognition. Intuitive iterating patterns are suggested as enabling the recognition of events. Event recognition
may be the key to complex thought processes. Event recognition may automatically trigger feelings.
The goal drive. Iterating goal patterns may provide basic drives and long term goals and may represent the "purpose"
of the system. Purpose is set by the current feeling. The will of the system may be decided by the limbic system which
may determine the "current feeling".
The mind. Consciousness may be an independent intelligence, which expresses judgment and will, and resides in a
restricted group of nerve channels. The limbic system may overrule will to determine the current feeling and hence set
goals for the system.
An expert system shell. Details of the design of an AI shell program, which can be utilised to create expert systems.
Explains a simple method of knowledge input. Suggests areas in which expert systems can be helpful.
References.
memory, recalled, processed and then translated into an acceptable output mode. In AI, problems are translated into
specialised languages. Problem-specific languages assist programs to play chess, or diagnose diseases. This need for
specialised languages partitions AI solutions into compartments. There is no single way in which problems can be
represented in AI to tackle chess, diagnostics, chemical analysis and banking. While the ultimate goal of AI may be to
become a single equivalent to human intelligence, its own languages fail to communicate with each other. As opposed
to this, the internal language used by the mind appears to fathom the whole world as we know it. This essay seeks
to address this mystery using the logic of a new algorithm. The logic may point to a single internal
representation, for use by the mind. This may be its own interior language of communication.
Pattern Recognition. The second issue that has baffled AI researchers is the problem of how to identify a problem as
belonging to the field of mathematics, vision, or game playing, even before attempting to solve it. With its abstract
qualities, one can see difficulties in identifying a problem. Indeed, AI efforts have failed to identify even a tangible
physical object, such as a face, let alone a problem. Today, in spite of huge advances in technology, a computer cannot
identify a particular face as belonging to a particular person. The difficulty is that all recognisable objects and events in
our environment have innumerable shared qualities. For a computer, they form trillions of patterns, which overlap each
other. Establishing the identity of a single pattern among a range of overlapping patterns is called pattern recognition.
The recognition of a known face is a pattern recognition task. In AI, a computer algorithm may follow a logical
procedure to solve this problem. A pattern recognition algorithm may attempt to establish the identity of a seen pattern
through a sequence of logical steps. It may seek to identify a seen face as one belonging to a known person.
An exact match an impossibility. Current AI algorithms attempt to identify a pattern by matching its characteristics
strictly with that of a known pattern. The characteristics of known patterns can be stored in the memory of computers
for recall. Consider the problems in the recognition of a face. There are billions of faces in the world. They share
thousands of common features. The characteristics of colour, skin texture, facial features and makeup overlap each
other on a virtually infinite scale. People age, grow beards or change appearances with moods. The changes caused by
light and shade add further complexity. In such an environment, where patterns themselves have millions of shifting
characteristics, it is virtually impossible to find an exact match even if patterns are matched at the microscopic level of
detail. This essay suggests an algorithm which can establish the identity of a pattern in such a complex and changing
environment.
The problem of uncertainty. The third issue which has posed problems for AI programs is the factor of "uncertainty".
Computers work with a "Yes or No" logic. A characteristic belongs to a pattern, or it does not. A pattern can be
selected, or rejected on this basis. Unfortunately many characteristics have vague relationships to patterns. They are
only sometimes present. "Fuzzy logic" attempts to handle vagueness by giving grades to a characteristic, such as short,
medium height, tall and very tall. While this helps to define a characteristic in greater detail, it fails to handle
identification of a person who sometimes wears spectacles. A computer can match "wears glasses", or "does not wear
glasses". It cannot handle both. Unfortunately most patterns have such variable qualities. This essay attempts to show
how such uncertainty can still help pattern recognition.
Instant identification of context. The fourth issue, which has frustrated AI research is the inadequacy of available
tools to gauge the awesome size of the search space. When an AI program attempts machine translation of a word in
context, it must store contextual data and recall this through a search process. It is like searching for a needle on the
beach. The mind instantly identifies context. Every seen object or event fetches its own contextual background. When
the word "pool" is used with "swim", it suggests one meaning and quite another when used with "cartel". As we read,
specific meanings, which exactly suit the context, are instantly recalled. The mind holds a lifetime of memories and
associative thoughts. Yet it instantly identifies a single contextual meaning from such a gargantuan search space.
Computers seek an item in memory through a serial match. One characteristic of the perceived object is compared with
the characteristic of an item in memory. If this matches, the second characteristic is compared and so on, in a
systematic search.
An intractable search problem. The search space is enormous. In AI, a systematic search brings related problems as
to where to begin a search, and the direction of the search. "Heuristics" is a term used for determining a search
direction. If one is searching for a needle on the beach, heuristics would suggest a search to the North to locate it. But
such solutions work only in small search spaces. In spite of many attempted shortcuts, all such search algorithms
eventually face the problem of a "combinatorial explosion". The back and forth search paths become intractably
prolonged and cumbersome. While it takes milliseconds for the mind to locate a memory in context, the AI search and
match algorithm would take years, if it was to recall a single memory from a lifetime of memories. This essay suggests
an algorithm which can make instant identification practical for the mind in the context of a large search space.
A slower processing mechanism. The fifth puzzle is that the human nervous system is known to process data far
slower than a computer. (1) While messages in integrated circuits travel at the speed of light, nerve impulses travel just
a few yards per second. While computers process information in millions of cycles per second, the mind runs at
between 50 and 10,000 cycles per second. When one considers the enormous size of the memory bank of the mind,
how does a slower processing system achieve such incredible speed in locating one memory from trillions of memory
traces? This process of instant identification is usually called intuition, a hitherto unexplained and mysterious capability
of the mind. Parallel processing by the billions of nerve cells in the nervous system does explain some of the
complexity of the mind. Even then, no known search algorithm can achieve such precision with such speed. This essay
suggests a search algorithm which could be used by the mind to practically achieve the speed of intuition, even within
the limitations of the slower processing speeds of the mind.
No chain of reasons. The sixth issue is the mystery surrounding the reasoning processes of the mind. AI programs
attempt to give "backward chaining". When a solution is offered for a problem, step by step reasoning is provided for
the final conclusions. A chain of reasons links the premise to the conclusion. Yet, the average person detects a mistake
in the syntax of a sentence, without necessarily knowing anything about nouns, verbs, prepositions, deep structure, or
other intricacies of grammar. When a person pays attention to a sentence, errors are detected, without always knowing
why they are errors. Thus the reasoning processes used by AI do not appear to be the methods used by the mind. This
essay suggests that the mind may be constructed around a pattern recognition model, which does not apply reasoning
chains to draw its conclusions.
Where does memory reside ? The seventh issue that has baffled scientific research is the scarcity of data concerning
the location of human memory. (2) Classic experiments carried out in the early part of this century on the memories of
rats concluded that no particular location of the brain stored memories and that memories were somehow stored in a
distributed fashion across the entire network. Current theory supports this hypothesis that memory is a network
phenomenon. Research from the seventies in "neural networks" suggested that a network could be induced to carry a
memory through its tendency to balance the relationships between various nodes. By providing "weightage" to
nodes, it was possible for units of memory to be stored. Such an explanation implied that the nodes were devices which
received inputs, carried out certain computation and sent out nerve signals. Opposing this theory, this essay suggests a
recognition rather than a computational role for nerve cells. In the process, the paper suggests a location for human
memory.
A New Algorithm
Recognition and intelligence. Consider the process of reading. The words are just black and white patterns on paper.
Recognition of the patterns conveys the purpose of the author to the reader. A single message on paper can move an
army. The act of recognition of the patterns on the paper provides a powerful, but invisible link. If we did not
comprehend the recognition process, the arrival of a march order would appear to have a puzzling response. The
nervous system appears a mysterious network, with billions of inter-linked communicating nodes. The process of
becoming conscious, or of paying attention appear as baffling activities of the system, without any rational explanation.
This essay shows how instant recognition of patterns by neural processes can reasonably trigger intelligent activity in
real time. Recognition appears to be the key to intelligence.
The Intuitive Algorithm (IA). While the geography and functions of the human nervous system are well known and
well documented, the mind remains a mysterious entity. The key insight behind the answers suggested in this essay comes
from a diagnostic expert system which uses a new pattern recognition algorithm. It logically achieves virtually instant
recognition in a large search space - the suspected quality of intuition. A similar logic can enable intuition to achieve
the equivalent of instantly finding a needle on the beach. It removes the mystery surrounding intuition. It can be viewed
as a practical process which can identify a single item from an astronomically large database. It grants the mind the
ability of timely recognition in context. The insight opens to view the awesome range and power of an intelligently
interactive mind. The concept begins with the expert system. It uses a singular algorithm. Let us call it the Intuitive
Algorithm (IA).
The conventional expert system. When presented with a list of indicated symptoms, a diagnostic expert system
identifies a disease. Its database contains hundreds of diseases and their symptoms, including many commonly shared
symptoms. If a disease is a pattern, the objective is to identify a single pattern in a collection of interweaving patterns.
As explained before, traditional expert systems achieve this with an open ended search, based on indicated symptoms.
The database is searched for a disease that exhibits the first symptom. The first located disease having the first
symptom is tested for the second symptom. If the test fails, a new disease with the first symptom is located and the
second symptom is again tested. Each new symptom brings new diseases into evaluation. The search ends when all the
presented symptoms match the indicators of a single disease.
The IA process. IA uses a different approach in a logical search of a database. Each disease is stored with one of three
("Yes" (Y), "Neutral" (U), or "No" (N) ) relationships to each symptom question. Y means a positive link - the
symptom is always present in the disease. U means the symptom is sometimes present. And N means the symptom is
absent for the disease. After each answer to a presented symptom question, the Y/U/N relationships of all diseases are
tested in a single step, just the way all cells in a spreadsheet are instantly recalculated. The Y/U/N relationships are
entered specifically for their negative impact. A "Yes" answer eliminates all "N" diseases. If the problem is unilateral,
all bilateral eye diseases are eliminated. A "No" answer eliminates all "Y" diseases. If visual acuity is not affected, all
eye diseases which impact on visual acuity are eliminated. IA also purges questions which have "Y" relationships only
to eliminated diseases. The questioning process begins with the question which has the maximum number of "Y"
relationships. It ends when the presented symptoms eliminate all but a single disease. Specific questions can then
confirm the diagnosis. If all diseases are eliminated, the conclusion is that the presented symptoms do not match any
disease in the database. For IA, it is then an unknown disease. Such a problem solving approach gives IA some
exceptional capabilities.
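A toy rendering of this elimination step in C++ may make it concrete; the diseases, symptoms and encoding details here are invented, and only the Y/U/N sweep reflects the essay's description:

#include <iostream>
#include <string>
#include <vector>

enum class Rel { Y, U, N };  // always / sometimes / never present

struct Disease {
    std::string name;
    std::vector<Rel> symptom;  // one Y/U/N entry per symptom question
    bool eliminated;
};

// Apply one answer to every disease in a single pass, spreadsheet-style.
void answer(std::vector<Disease>& db, int symptomIndex, bool yes) {
    for (Disease& d : db) {
        if (d.eliminated) continue;
        Rel r = d.symptom[symptomIndex];
        if (yes && r == Rel::N) d.eliminated = true;   // "Yes" kills all N
        if (!yes && r == Rel::Y) d.eliminated = true;  // "No" kills all Y
        // U diseases survive either answer: uncertainty is retained.
    }
}

int main() {
    std::vector<Disease> db = {
        {"disease A", {Rel::Y, Rel::U}, false},
        {"disease B", {Rel::N, Rel::Y}, false},
    };
    answer(db, 0, true);  // symptom 0 is present: eliminates disease B
    for (const Disease& d : db)
        if (!d.eliminated) std::cout << d.name << " still possible\n";
}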
IA circumvents "stupid questions". Normal search algorithms serially seek to match a symptom with a single
disease. IA narrows the search faster by evaluating the entire database concerning the current answer. IA is holistic.
Doctors know that the lack of a particular symptom clearly indicates the absence of a particular disease. So, a
subsequent query which suggests the possibility of that disease is a "stupid question". If a patient reports a lack of pain,
a subsequent question posing the possibility of a disease which always presents a powerful pain symptom is, naturally,
considered stupid. Such a question annoys the user. With their "back and forth, open ended" serial searches, a
traditional expert system is blind to the global impact of a previous answer on subsequent questions. Additional steps
are required to correct this defect. IA avoids "stupid questions" by purging all "Y" questions which relate only to
diseases eliminated by the process.
IA logically manages "uncertainty". When a disease exhibits a symptom only occasionally, (a "U" condition), it is
retained within the database regardless of whether the answer to the symptom question is "Yes" or "No". The disease is
not eliminated. It remains available for "further consideration". IA continues the elimination process. Each answer
eliminates "Y" or "N" diseases as per the entered relationships, taking IA ever closer to the answer. IA achieves the
subtle objective of making a decision on an uncertain piece of information. While the disease with the uncertain
condition is "retained", every answer continues the elimination process. On the other hand, an uncertain condition is
"garbage" for a traditional expert system, which cannot "match" a disease which has a "maybe" relationship to a
symptom. Since IA does not seek an exact match, it logically handles "uncertainty". For correctly entered relationships,
the IA logic is flawless in diagnosis. Traditional expert systems are slowed down through the exponential growth of
their back and forth search steps. They ask a tediously long series of questions, including stupid ones. They fail to
handle uncertainty. IA is generations ahead of current expert systems. Doctors certify that IA is fast and never asks
stupid questions.
Inductive logic. But, IA follows the logic that a person does not have a particular disease if he does not have a
particular symptom. This is not a conventional logical derivation. In any diagnostic process, we can use deductive, or
inductive reasoning. In deductive reasoning, a generally accepted principle is used to draw a specific conclusion. All
men are mortal. Socrates is a man. Therefore Socrates is mortal. When a person uses a number of established facts to
draw a general conclusion, he uses inductive reasoning. For instance, the observation of swans over the centuries has
led to the conclusion that all swans are white. This is the kind of logic which is normally used in the sciences. An
inductive argument, however, is never final. It is always open to the possibility of being falsified. The discovery of one
black swan would falsify "the white swan theory". Inductive reasoning is always subject to revision if new facts are
discovered. The sciences progress through this process of induction and falsification.
Exclusion is also a logical process. Inductive reasoning has traditionally been based on the principle of inclusion. The
white swan theory is a result of experience over time. If we saw a white bird, we would move one step forward in
identifying it as a swan. But logic is equally sound in exclusion. If the bird was black, we could conclude that it is not a
swan. Subsequent discovery of a black swan would make this induction wrong. But, if the reasoning that all swans are
white was true, then the induction that a black bird is not a swan would be equally true. The white swan theory can
logically lead to both conclusions. In a similar manner, if a symptom is always present for a particular disease,
inductive logic also implies that an absence of the symptom excludes that disease from further consideration. This is
not a conventional conclusion, but is accurate and unassailable.
IA avoids an exact match and uses elimination. A conventional search algorithm seeks an exact match between
indicated symptoms and the symptoms in memory for a known disease. The objective of IA is not to find an exact
match, but to eliminate those diseases which fail to meet the search criteria. Both "Yes" and "No" answers are
specifically encoded to eliminate unrelated diseases. Consider a patient with a disease, who approaches a computer
diagnostic session. Let us say the computer has a list of 200 diseases, which can be identified by 1000 symptom-specific
questions stored in the system. (Many diseases will share common symptoms.) In practice, on average, each
disease may answer "Yes" to 20 of the 1000 questions.
More clues in elimination. But, up to 200 "Yes" answers may justify the elimination of the disease, since most
symptoms will promptly point to specific groups of diseases, excluding others. The conventional expert system looks
only for "Yes" answers. It will match the answers for the disease of the patient to just 20 of the 1000 questions. For this
patient, 980 answers will not take the search forwards. But for IA, every "Yes" answer can eliminate up to 20 percent
of the diseases. Elimination of a disease also removes its related questions. The elimination process will yield speedy
results even for "No" answers. IA will identify the disease long before the 20 relevant questions for the disease are
exhausted by swiftly purging any remaining alternatives. In pattern recognition, an elimination procedure is
unbelievably faster than one which seeks an exact match.
Instant Recognition
A logic for instant recognition. The speed of the elimination process is even more striking for IA in a special
situation. When IA identifies a special condition, its recognition process is virtually instantaneous. Its memory stores
the relationships of all diseases to symptoms. Suppose only one disease has a "Y" relationship and all others, an "N"
relationship to an exceptional symptom. The symptom is unique to the disease. Then, a "Yes" answer to this symptom
eliminates all "N" diseases, leading immediately to recognition. The symptom indicates the disease. It is recognised in
a single step of massive elimination. The process is logical. It evaluates every disease in its database against a single
clue from one symptom. A doctor may walk into a surgery and instantly attend to a patient suffering from a heart
attack. He may not even ask a question. With minimum visual clues, he instantly identifies a single disease from his
"known database" of thousands of diseases. He instantly recognises a single pattern in a maze of interweaving patterns.
IA may be imitating the logic of this recognition process.
Unique features can identify a pattern. The IA logic does not seek an exact match, but concentrates on the
elimination of alternate possibilities. Elimination is most effective when there are unique features. It is a practical
strategy for recognition in nature. All the recognised objects in our environment are unique. Despite millions of shared
characteristics, they also have individual qualities. Even where patterns shift constantly, some characteristics remain
stable. Consider a face in a newspaper cartoon. It contains the barest minimum of information - a few lines which
define the edges of facial features. But a public figure is identified by just the curve of a nose. The context of being in
the newspaper eliminates all ordinary people. The turn of the nose eliminates all politicians with straight noses. Unique
features and elimination can determine the outcome. Massive amounts of data are not evaluated; a few clues suffice, and
recognition is virtually instant. Elimination based on uniqueness can achieve logical and acceptable recognition.
IA imitates parallel processing. With the invention of the spreadsheet, it became possible for computers with single
processors to imitate one characteristic of parallel processing. Even if a spreadsheet has thousands of cells, a single
entry in one cell is instantly reflected in all the related cells. Thousands of serial calculations appear to the user as a
single parallel calculation. Logically, the spreadsheet can have billions of cells and a sufficiently powerful processor
can still deliver this result. The spreadsheet is holistic, since every cell reflects the current re-calculated position. IA is
similar. By evaluating the results of a single answer on all the diseases in its database, it is holistic and imitates parallel
processing. Logically, IA too can produce instant recognition in any size of search space. Any unique symptom can
enable IA to instantly identify one among several thousand diseases. If IA is to attempt a problem on the scale of the
human nervous system, the only limitation will be the practical problem of data entry.
IA compared to intuition. Consider the steps followed by IA. It stores details of all diseases, their characteristics and
the relationships between them in memory. It receives inputs concerning symptoms through "Yes/No" answers. It
simulates parallel processing to globally evaluate the current input. It is encoded negatively to use all inputs to
eliminate unrelated diseases. If an input indicates any unique symptom, it achieves instant recognition by eliminating
all except the related disease. It follows an algorithm which results in instant identification. Compare IA to the
recognition process of the mind. When a face is familiar, there is instant recognition. Let us call it intuition. Such
recognition, of thousands of such objects, is repeated by people world-wide millions of times every day. Like most
other events in nature, such a process must follow an orderly set of instructions to achieve results in a finite number of
steps. In essence, intuition must also follow an algorithm.
Memory and relationships. A comparison of IA with the current knowledge of the mind reveals some similarities
and several unexplained enigmas. This essay attempts to fill in the gaps to create a composite view of the mind. Firstly,
IA stores the names of all diseases in memory. It is logical to assume that the mind stores data on all known faces in
memory. But the mechanics of memory remains unknown. This essay suggests a sound possibility. Secondly, IA infers
that certain symptoms are present, or absent, based on simple "Yes/No" answers to queries. There is considerable
evidence that the mind isolates thousands of characteristics of any seen object. Obviously, the mind must perceive the
characteristics of faces to be present, or absent. Thirdly, IA stores the relationships between symptoms and diseases. In
recognising a face, the mind establishes its identity. Identification demands a link between a face and its known
characteristics. One must know that the face is oval, or round. It is reasonable to presume that the mind must have such
links. But how the mind stores such links remains a mystery. This essay suggests how nerve cells can establish and
store such relationships.
Nerve cells eliminate alternative possibilities. Fourthly, IA encodes a negative relationship between diseases and
their symptoms. It is deliberately coded to eliminate. Deliberate elimination of alternatives is a well documented
feature of the nervous system. (3) Nerve cells have a powerful system of parallel inhibition of surrounding neurons
when a particular group of neurons start to send information. This inhibition is strongest for those immediately adjacent
to the excited neurons. Throughout the nervous system there are neural circuits which switch off other circuits when
their own areas are energised. There is evidence that the mind carries such systematic elimination beyond logic. This is
illustrated in the popular vision experiment, where a drawing can be interpreted as a vase, or two faces facing each
other. The mind eliminates one interpretation to recognise the other - a vase, or two faces. Evidently each recognition
path acts powerfully to inhibit the other. Recognition is firmed up by eliminating even logical alternative solutions.
The coding of elimination by nerve cells. The mind is known to have specialised networks which perform unique
functions. There is a network to identify the edges of a seen object. Another to detect the beginning and end of
movements by muscles. This essay gives some examples of how such intelligence can be achieved through recognition
based on the memory codes of neurons. In fact, the key theme of this essay is that such recognition can give
intelligence to a network. Such a tool can give neural networks the capability of achieving a variety of intelligent tasks.
It is assumed that neurons may be suitably coded, to facilitate elimination of less viable alternatives. This essay does
not suggest any probable process the mind may use to determine such elimination. But, elimination, as a neural
process, remains a well documented and practically experienced event.
Parallel links for speed. Definitive research suggests that the brain simultaneously isolates every incoming sensory
image into myriad characteristics. (4) The visual image alone is divided into several hundred million separate
characteristics of light, shade, colour, outline and movement. We do not know how all this information gets organised
and processed. But, each nerve cell in the system is known to have a hundred to a quarter of a million links with other
cells. (5) The average nerve cell is known to respond within about 5 milliseconds of receiving a message. Since all cells
work in parallel, any message received by any cell can reach any other cell in the system within just five or six steps -
in just one fiftieth of a second. Currently, science does not know how such a process can rapidly transfer information in
the system. Recognition may provide the pivotal link. It can link every cell to the system. If so, every cell in the
network can recognise and respond to every flash of incoming information. If we assume a recognition role for the
nerve cell, global interpretation of incoming information and instant response becomes feasible for the system.
IA imitates intuition. IA has classic simplicity and power in its logic. The elimination process is logical. It is discrete
and does not leave a fuzzy answer. Yet it has the ability to evaluate possibilities with vague qualities. If a face is known
to occasionally wear spectacles, all faces which never wear spectacles can be eliminated. A vague characteristic is
productive for IA. As opposed to this, a search and match algorithm finds the "occasional use" type of information
futile. IA logic is holistic, since it evaluates its entire database, with each input. Every answer updates its perspective,
by eliminating all elements that fail the search criteria. Every answer narrows its focus. It creates in IA the equivalent
of "global awareness" of the mind. As against this, a search and match algorithm ambles about in the vast search space
without a clue as to the global picture and appears stupid. Finally, IA instantly identifies a pattern, if it indicates even a
single unique quality, through simultaneous elimination. In conclusion, IA is logical. It imitates intuition in being
holistic, avoiding "stupid questions", handling uncertainty and in providing instant recognition.
The firing of a nerve cell may indicate a single event. It may fire a volley of impulses when the event is recognised. The all-or-nothing response of the nerve cell
may be a form of Boolean logic. In Boolean algebra, all objects are divided into separate classes, each with a given
property. Each class may be described in terms of the presence or absence of the same property. An electrical circuit,
for example, is either on or off. Boolean algebra has been applied in the design of binary computer circuits and
telephone switching equipment. These devices make use of Boole's two-valued (presence or absence of a property)
system. Firing by each neuron may represent the presence, or absence of a distinct property. The entire nervous system
may recognise an input from a cell as a perception of the presence of a property. Alternatively, the system may
recognise firing by a cell and respond with a specific activity, such as a muscle movement.
Recognition at the input level. For sensory inputs, the firing of a nerve cell is known to indicate recognition. The
entire information input into the human nervous system is through cells called receptors, which convert sensory
information into nerve impulses. (8) Chemoreceptors in the nose and tongue report on molecules which provide
information on taste and smell. Other receptors are massed together to form sense organs such as the eye and the ear.
There are receptors which report on pressure, touch, pulling and stretching. Nociceptors report on cutaneous pain.
Peripheral nerves connect these sensory receptors to the central nervous system. At the entire input level, nerve
impulses indicate recognition of the occurrence of millions of isolated events. The whole system recognises the firing
by each one of these cells as the perception of a single microscopic event. At the input level, the firing of a cell
indicates an act of recognition and not one of computation.
Motor events at the output level. At the output level, individual nerve impulses control motor outputs. There are
motor areas in the cortex, the wrinkled surface layer of the cerebral hemispheres of the human brain. (9) Careful
electrical stimulation of these areas sends nerve impulses which invoke flexion or extension at a single finger joint,
twitching at the corners of the mouth, elevation of the palate, protrusion of the tongue and even involuntary cries or
exclamations. The nerve fibres carrying inputs to and outputs from the cortex pass through the thalamus, a major neural
junction in the brain. This junction plays a key role in this explanation of the activities of the mind. The nerve impulses
passing through follow a form of Boolean logic. They report the presence or absence of individual events, or else
activate, or remain quiescent towards, isolated motor functions. Each action potential indicates, at the input and output levels, the perception
or the triggering of a property - a distinctive event.
Nerve cells cannot add apples to pears. At the input and output levels, the firing of a nerve cell indicates an event.
Current theory admits the Boolean function at these levels. But scientists imagine computation by nerve cells at
subsequent levels, where these messages are interpreted and transmitted further. While it has a single "all or nothing"
output, a typical neuron receives thousands of inputs from other nerve cells. Numeric computation (adding, subtracting,
dividing, or multiplying) of widely varying inputs is quite improbable. The inputs are distinctly different events such as
sound, light, pressure, or smell. The outputs are complex muscle movements. It is wildly chaotic to include all this into
an integrated computation. It is like adding apples to pears, or subtracting the sense of touch from the sense of pain. It
is more realistic to assume that a pain cell recognises touch and reacts by despatching or inhibiting a pain message.
Recognition can evaluate varied inputs and trigger an appropriate output. Recognition may provide the key to
understanding intelligence.
Recognition the first step to intelligence. Throughout the nervous system there are networks of cells, which appear to
act intelligently. These events have been assumed to be some form of network intelligence - a mysterious mental
capability. But such intelligence can be explained if we assume that nerve cells recognise incoming information and
respond with action potentials through their axons. A typical unexplained act of intelligence is the baffling capability of
the mind to modify the sensation of pain on its route to the cortex. The sensation of pain is known to be reported,
enhanced or suppressed, under varying conditions. Consider the following explanation. A neuron which reports
cutaneous pain may receive inputs from its primary pain sensory neuron (P), along with other dendritic inputs from
neighbouring (sympathetic) pain (SP) and touch sensory (T) cells. The cell may report pain and sympathetic pain. It
may ignore the sense of touch to report pain. It may also inhibit sympathetic pain giving priority to the sense of touch.
In such a context, the cell responses to the listed inputs may be as follows:
P - Fire. Reports pain.
SP - Fire. Reports sympathetic pain.
P+T - Fire. Reports pain, ignoring the sense of touch.
SP+T - Inhibited. Gives priority to the sense of touch.
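Written as code (purely a toy rendering of the idea, not a biological model), the response table above becomes a recognition function over the cell's current inputs:

// The pain-reporting cell checks which combination of inputs it is
// seeing and fires or is inhibited accordingly.
bool painNeuronFires(bool p, bool sp, bool t) {
    if (sp && t) return false;  // SP+T: inhibited, touch takes priority
    if (p && t)  return true;   // P+T: fires, ignoring the sense of touch
    if (p)       return true;   // P: reports pain
    if (sp)      return true;   // SP: reports sympathetic pain
    return false;               // otherwise quiescent
}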
Memory
Current knowledge regarding memory is limited. There is little current knowledge about how memory is stored in
the brain. (12) Some researchers suggest that memory is stored in specific sites and others that memories involve
network functions, with many regions working together. This essay suggests a method for the storage of human
memory and a mechanism of its recall. This explanation forms an enabling requirement to support the insight that
instant recognition is a key function of the mind. This follows the hypothesis that nerve cells act as primary recognition
devices at the most fundamental level. Such a premise can explain how memory enables nerve cells to support
intelligent networks, recognition of entities and habitual motor functions. This view of memory structure is vital for all
the functions of the mind, as described in this essay. This section provides an overview of how a nerve cell may store a
memory and how the nervous system may recall a memory.
Recognition requires memory. At the input and output levels, the firing by a nerve cell signifies a finite event.
Receptor cells interpret these sensory inputs and send impulses. These impulses are relayed to the cortex in several
stages. At an intermediate stage, a cell may receive messages from multiple locations representing multiple categories
of such information. The modification of the sensation of pain, or the focusing of attention were suggested to act
through the recognition of incoming messages by reporting cells. This essay suggests that a cell fires when it receives a
distinct pattern which it recognises. To "recognise" is to establish an identity. The identity of any entity can be
established only when it has a known relationship to certain characteristics. Knowledge requires consistency. If a cell
knows a relationship, it must fire every time the relationship is recognised. So, the cell must store a memory of this
relationship, if it is to recognise it. If a cell has the power of recognition, it must have a memory. It is suggested that
such memory may be an ability to selectively recognise different combinations of incoming nerve impulses.
The structure of memory. A nerve cell, with say, 26 dendritic inputs coded from A to Z may have a memory for
combinations of simultaneous inputs, such as CDE, DXZ, etc. The neuron can be said to store a memory for each
combination, if it fires (or is inhibited) on receiving simultaneous impulses at C, D and E, or at D, X and Z. Each
combination becomes a relationship which the cell remembers. Each cell has a functional specialisation. When it fires,
it reports, or triggers a finite and unique event. The combination represents the relationship of this event to other events
(CDE, or DXZ) it perceives. As suggested earlier, the pain reporting neuron fires for pain (P), sympathetic pain (SP) or
pain and touch (P+T). It is inhibited by (SP+T). Each remembered combination becomes a unit of memory, which
triggers a dependable response from the cell.
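A minimal sketch of this idea (mine, not the essay's): encode each combination of the 26 inputs as a bitmask, let the cell store a set of remembered combinations, and let "recognition" be membership in that set:

#include <cstdint>
#include <unordered_set>

struct Cell {
    std::unordered_set<uint32_t> memory;  // one bit per input A..Z

    // Turn a combination such as "CDE" into a 26-bit mask.
    static uint32_t combo(const char* inputs) {
        uint32_t m = 0;
        for (const char* c = inputs; *c; ++c) m |= 1u << (*c - 'A');
        return m;
    }

    void learn(const char* inputs) { memory.insert(combo(inputs)); }
    bool fires(const char* inputs) const { return memory.count(combo(inputs)) > 0; }
};

// Usage: a cell taught "CDE" and "DXZ" fires for either pattern of
// simultaneous inputs, and for nothing else.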
A massive memory. Perception of each unit of memory may cause the cell to fire, or to be inhibited. 26 characters can
be arranged in millions of unique combinations. For a nerve cell with just 26 inputs, there can be millions of such units
of memory. The cell may selectively respond to millions of combinations. Recognition on this basis may give massive
selective intelligence to the nerve cell. Contemporary research has so far failed to locate a physical location for human
memory. The possibility suggested here can point to incredible memory capabilities in individual nerve cells. If an
individual cell can have such a large memory, imagine the total memory capacity of 100 billion cells! The concept may
also highlight the problem of memory recall. There may be as many units of memory as the number of grains of sand
on a beach. The task may truly be the equivalent of locating a needle on a beach.
A memory at a synapse. High frequency stimulation of the dendrites of a neuron has been known to improve the
sensitivity of the synaptic junctions. This phenomenon (13) is called long-term potentiation (LTP). Since such activity
is seen to be "remembered" by the cell through greater sensitivity at specific inputs, LTP is considered to be a hopeful
direction for research in locating human memory. This essay suggests that memory derives from a pattern recognition
function. It may follow from the cyclic recognition of the unique features of the multitudes of dendritic inputs of a
neuron. A neuron may become more sensitive to an individual input through LTP. Neurochemicals at the synaptic
junctions have also been known to increase such sensitivity. But, memory may derive from the global pattern
recognised by the nerve cell rather than from a greater sensitivity to a specific dendritic input.
Cell memory feasible. Each microscopic living cell contains the DNA molecule which carries within it the entire
blueprint for a human being. Recognition codes in cells interact in the handling of the millions of chemical interactions
in the body. The immune system is also known to use powerful code recognition systems. Under the circumstances, it
is feasible that the protein neuroreceptors which mediate neuronal interactions (or the innumerable chemical synaptic
intermediaries) contain sufficiently powerful memories and code recognition systems for the sustenance of a practically
limitless memory in each nerve cell. If such a massive memory exists within each one of billions of nerve cells, there is
the possibility of an astronomically large human memory - trillions of trillions of megabytes in computer terms.
Acceptance of the presence of such an immense memory may take us a step further in understanding the awesome
power of the mind. It may also create a massive barrier to AI in its efforts to imitate human intelligence.
The memory of nerve cells may be for patterns. Recognition requires a memory for the cell. Instead of just 26
inputs, many nerve cells have thousands, or even hundreds of thousands of incoming dendrites. 26 inputs can be
represented as characters on a page and each unit of memory as a group of characters, such as ABC or CDE. But, with
hundreds of thousands of inputs, the closer equivalent is a pattern of dots on a screen - a picture. With Boolean logic,
the pattern would consist of dots, which are either on, or off, with a defined frequency. The memory of a nerve cell
would be its ability to store in memory and so recognise multiple patterns of dots - the pattern of incoming dendritic
impulses on a cyclic basis. This cyclic pattern of dots is the equivalent of a black and white picture. Recognition of a
picture triggers an impulse from the cell, indicating that the current incoming information has relevance to this
particular cell. Each nerve cell may have a memory for millions of such pictures, recognising individual pictures to
respond with impulses, or with inhibition.
Memory must be recalled in context. Wherever memory may be stored, it concerns a whole lifetime of activity and is
available for instant recall. A threatened animal carries a potent memory bank of past perilous experiences. It has
memories of initial sensory indications of danger, of muscular responses for battle and of escape routes from the battle
zone. With contextual memory recalled within fractions of a second, the whole power of experience is brought to focus
on the ongoing task of survival. A contextual filing system for memories is a vital requirement of life. Contextual use
of memory existed from the beginning of evolution. (14) In the early aeons, "Nosebrains" recalled memories for smells
to decide if an object was edible and to be consumed, or inedible and to be avoided. Smells became the file pockets
which triggered physical activity. Simple odour based filing systems in vertebrates evolved to more sophisticated
feeling based systems in mammals. Feelings provided context for many subtle shades of activities, including leisure,
play, upbringing of the young, and mild hostility, or deadly combat. This essay suggests that feelings may provide the
key to the recall of memory.
Feelings and emotions are real. But, for centuries, feelings were discarded by scientists as not being part of the
rational modern mind, a throwback from primitive times. It was Charles Darwin who first suggested that emotions have
a real world existence, visibly expressed in the behaviour of humans and lower animals. The existence of an emotion
could be derived from an angry face, or even a bad feeling in the stomach. Later theory suggested that each emotional
experience is generated by a unique set of bodily and visceral responses. Visceral responses switch the nervous system
between the sympathetic system which supports energetic activities and the parasympathetic system, which supports
relaxation. (15) Subsequently, this view was disputed by W.B. Cannon. He countered that emotions do not follow artificial stimulation of visceral responses. Emotional behaviour was still present when the viscera were surgically or accidentally isolated from the central nervous system.
Nerve impulses can represent feelings. This view that emotions have an independent existence is supported by
current research. Euphoric states of mind are created by drugs. (16) Electrical excitation of certain parts of the temporal
lobe of the brain produces intense fear in patients. Excitation of other parts causes feelings of isolation, loneliness or sometimes of disgust. (17) The feeling of pleasure has been shown to be located in the septal areas of the brain in rats. The animals were observed when they were able to stimulate themselves, by pressing a lever, through electrodes
implanted in the septal area. They continued pressing the lever till they were exhausted, preferring the effect of
stimulation to normally pleasurable activities such as consuming food. All experimental evidence over the years
suggests that nerve impulses can trigger feelings. This fits in with the reasoning that nerve impulses represent finite
events. In such a case, a group of fibres which carry feeling impulses can be viewed as a picture in a channel,
representing the real time feelings in the system.
The limbic system - a feeling centre. (18) In 1937 Papez postulated that the functions of central emotion may be
elaborated and emotional expression supported by a region of the brain called the limbic system. This system is a ring
of interconnected neurons containing over a million fibres. These fibres also pass through the thalamus, the main nerve
junction to the cortex mentioned earlier. The limbic system is a feedback ring with impulses travelling in both
directions. (19) This essay suggests that the pattern of impulses in this million fibre channel of the nervous system may represent our global feelings - a feeling channel. For a system which is constantly interpreting nerve impulses, the cell
of origin of the impulse indicates whether the impulse represents a point of light, a pitch of sound, an element of pain
or a twinge of disgust. Feelings are triggered as nerve impulses which represent measurements of the parameters of the
system. They are ever present. The pattern in this channel reflects the current feeling and may provide the context for
the recall of memories by the mind. Feelings may be expressed as a picture with a million dots. This essay suggests that
each subtle variation of the picture could recall a specific memory.
A sensory map on the cortex. It was reasoned that nerve cells store memories in the context of their relationships.
Such data must be stored somewhere to be recalled. It is widely known that the brain physically isolates each pixel of
sensory information. (20) When light enters the eye, it passes through the lens and focuses its image onto the retina.
The light is received by special cells in the retina called rods and cones. Light-sensitive chemicals in the rods and cones
react to specific wavelengths of light and trigger nerve impulses. About 125 million rods perceive only light and dark
tones in an image. 6 million cones receive colour sensations. The light from a single rod is perceived as a microscopic
spot of light when impulses reach the visual cortex. (21) Similarly, the tones heard by the ear reach a region of the
cortex called Heschl's gyrus. There is a spatial representation with respect to the pitch of sounds in this region. Like a
piano keyboard, tones of different pitch or frequency produce signals at measurably different locations of the cortex.
Each pixel of sensory information terminates in a specialised complex on the cortex. The entire sensory input to the mind impinges as a picture in a region of the cortex. Consider the possibility that the memory of each sensory image is
stored exactly where it is received. There is experimental evidence of this possibility.
A Barrel to store memory. Each of the millions of sensory signals is finally known to reach a specialised barrel of
cells in the cortex. (22) In 1959 Powell and Mountcastle identified this complex as the elementary functional unit in the
cortex. Each unit is unique. It is a vertical column of thousands of nerve cells within a diameter of 200 to 500 microns,
extending through all layers of the cortex. Let us call this unit a Barrel. Research has demonstrated the functional
specialisation of each Barrel. Each Barrel represents a single pixel of sensory information. The neurons of one Barrel
are related to the same receptor field and are activated by the same peripheral stimulus. All the cells of the Barrel
discharge at more or less the same latency following a brief peripheral stimulus. The activation of one Barrel indicates
the arrival of one finite element of information to the cortex. A single rod reports the incidence of light on a
microscopic spot on the retina. The impulses from this cell are carried through the optic nerve to a single Barrel in the
visual centre in the cortex. The firing of a Barrel in the primary visual cortex signifies the perception of a point source
of light by the mind. This essay reasons that memories may be stored in the same Barrels.
Barrel - logical location for memory. The firing of one Barrel represents a single pixel of the global sensory
information. The location of the Barrel defines it as a point of light, a pitch of sound or a pressure point on the skin.
The firing of a pattern of Barrels is interpreted by the mind as a sensory image. The Barrels will fire when the image is
received. If the same Barrels fire again, a memory of the same image will be recalled. It was reasoned that a memory
may be recalled in its context. Feelings may provide that context. Feelings are the logical filing references for the recall
of memory. Feelings form a picture in the feeling channel. It was reasoned that nerve cells store memories of
relationships. These relationships were stored as pictures. It is now suggested that such a memory may be recorded into
a Barrel. The current feeling may be recorded into the memory of all Barrels which receive the current sensory
perception. Each Barrel recalls the relationship of this feeling and fires. When this feeling is recalled again, the same
Barrels fire and the sensory memory is recalled. For this reasoning to be plausible, feelings must have access to each
Barrel.
A "non-specific" access. If feelings trigger the firing of Barrels and the resultant recall of memory, then the feeling
channel must have access to each Barrel. Current research supports the view that there could be such an access. (23)
The nerve cells in the Barrels of the cortical layer are known to have both radial and parallel fibres. Radiating
downwards from the cortex are millions of fibres which directly link Barrels through the thalamus to all sensory and
motor functions. This link is called the "specific link". The cortex also has a surface layer which runs a thick network
of fibres parallel to the surface. These fibres are also known to be linked to the thalamus. This link is called
"non-specific thalamo-cortical link". The link was recognised when it was discovered that stimulation of the
"non-specific nuclei" of the thalamus led to wide-spread "recruiting activity" in the outer layers of the cortex. This
essay suggests that this "recruiting activity" could be the process of recalling memory.
Feelings have access to Barrels. The feeling channel in the limbic system passes through the thalamus. The impulses
in this channel may be broadcast through the "non-specific thalamo-cortical link" to the cortical Barrels. The complex
of cells in each Barrel may receive dendritic inputs from the million fibre feeling channel through the surface layer of
the cortex. Each Barrel may instantly recognise feeling patterns. Recognition of a feeling may cause a pattern of
Barrels to fire. The firing inhibits Barrels with weaker recall. Firm firing by a contour of Barrels recalls the original
sensory image. There is evidence that strong feelings result in more powerful memory traces. When strong feelings are
experienced during a sensory event, each Barrel stores a more intense feeling pattern. As a result, more Barrels recall
the image and a more vivid memory of the event is recalled.
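As a rough sketch of this Barrel hypothesis, the toy Python below (all names invented for the purpose) gives each Barrel a store of feeling patterns with intensities. Presenting a feeling makes the Barrels that remember it strongly fire, while lateral inhibition silences weaker recalls, so the stored sensory image reappears as a pattern of firing.

    # A toy model: each Barrel maps remembered feelings to an intensity.
    # Recall of a feeling fires the Barrels that stored it strongly;
    # weaker recalls are inhibited by the stronger neighbours.
    def recall_image(barrels, feeling):
        scores = [b.get(feeling, 0.0) for b in barrels]
        threshold = 0.5 * max(scores) if any(scores) else 0.0
        return [1 if s > threshold else 0 for s in scores]

    # A four-pixel "cortex": an image was stored under the feeling "joy".
    barrels = [{"joy": 0.9}, {"joy": 0.8}, {}, {"fear": 0.7}]
    print(recall_image(barrels, "joy"))       # [1, 1, 0, 0] - the image returns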
Memory of a flower. As explained earlier, when light from a flower enters the eye, it passes through the lens and
focuses its image onto the retina. The light is broken into millions of pixels. The impulses representing each pixel are
carried through the optic nerve to a single Barrel in the visual centre in the cortex. Each Barrel is a complex of cells
with a vast number of inputs. It is suggested that each such Barrel also receives a feeling image - the feelings
experienced when viewing the flower. Each Barrel, which receives a pixel of the image of the flower, records the
current feeling picture. Later, if the feeling was strong and is recalled, the Barrel fires. Firing by the relevant Barrels
inhibits weaker recognition paths. When all Barrels which recognise this feeling fire, the image of the flower is
recalled. (24) This hypothesis concerning the location of sensory memory is also supported by a recent discovery. In
1988, Kosslyn reported that the recall of a visual image involves activity in the same areas where visual perceptions are
received. Effectively, the same Barrels fired when an object was perceived and when its memory was recalled.
A gargantuan memory. Consider the impact of this view of memory storage and recall. The mass of nerve cells in
each Barrel in the sensory region may store memories of all the feelings one has ever experienced whenever it fired.
The recall of any relevant feeling causes the Barrel to fire. If recognition is weak, it is inhibited by the stronger
recognition of neighbouring Barrels. Firm firing by a pattern of Barrels recalls a clear sensory image. This implies that
any perception is stored as millions of microscopic pixels of the global sensory image, in the context of a relevant
feeling. Such a memory would be widely distributed through all sensory Barrels. This could explain why scientists could not remove memory by ablating portions of the brain in their experiments on rats. The findings of Kosslyn that the recall of vision involved activity in the same cortical region as visual perception also support this view of memory.
Such a system could store a lifetime of sensory memories and instantly assemble a single contextual image. This
process could explain your ability to recall an image from everything you have ever read, seen, or heard, in the blink of
an eye. But, the tendency of the system to inhibit weaker recognition paths may also prevent the recall of weaker
memory traces and appear as the fading of memory.
The cell memory can be inherited, or instantly acquired. It is reasoned that the memory required for pattern
recognition by nerve cells, as envisaged in this essay, can be both inherited and acquired. Inherited processes may be
seen in the visual processing regions of the cortex. The varying attributes of a visual image are analysed in different
regions of the visual cortex. One of these locations analyses the orientation of the outlines of a visual image. The cells
are arranged into distinct modules, with orientation selective cells which fire only when an edge or bar in their
fields is held at a particular orientation. While all the cells in one column of cells respond to one orientation, and an
adjacent column responds to an orientation a few degrees off from the first and so on, till all possibilities are covered. If
a column of cells is to select a single orientation, it must receive inputs concerning all orientations and then select one.
Selection implies choice. From multiple received pictures, a single row of cells select a single picture. This is a
consistent response. Evidently, they remember the picture. Such responses by cells has to be inherited. The recognised
pattern may be its inherited memory. Evidence of such automatic responses by many neural systems provides proof of
a cell memory for patterns.
Sensory memories are, of course, acquired. As against a wide range of inherited responses by nerve cells, new
sensory memories are continually recorded. Every day, events that provoke feelings record thousands of images into
memory. When a Barrel fires to recall a new memory, the pattern of feeling impulses which triggered recall has already
been recorded afresh into the memory of the complex of cells. Since the cells can be sensitive to inputs from even a single dendrite, recording a memory can be as simple as storing the current incoming picture on receiving such an instruction from any source. This essay assumes that cells can have both inherited and acquired
memories.
Recognition of Objects
Channels carrying pictures. We have assumed that nerve cells recognise received pictures. A feeling channel carrying
a picture through a million fibres has also been suggested. There is a vital difference between a picture and a parcel of
messages, when transmitted through a bunch of fibres. Take the 32 bit parallel connection in a computer cable
connecting two parallel ports. A computer can recognise only two states, on or off, in each of its millions of circuit
switches. But when 32 switches are linked together, it can recognise, in a single cycle, over four billion distinct values.
But, such connections must maintain integrity in neighbourhood relationships at the sending and receiving ends. Only
cyclic information received simultaneously through all the inter-related switches can be interpreted. Compare this to a
glass fibre channel transmitting information. If each fibre carried an individual message, the relative location of the
fibres would not matter. But suppose each fibre in the channel carries a single pixel of a picture. Then, if the relative
positions of the fibres change between the sending and receiving ends, the picture will be lost. It is in such a context that the computer cable transmits a primitive 32 dot picture. The relative position of the dots must be maintained at the
sending and the receiving ends. The feeling channel may similarly transmit a million dot picture.
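The point about neighbourhood integrity can be made concrete. The short Python fragment below (purely illustrative) shows that 32 linked switches can convey over four billion distinct pictures, but that re-ordering the fibres in transit destroys the picture even though every individual dot arrives intact.

    # 32 linked on/off switches convey one of 2**32 possible "pictures",
    # but only if the relative position of every dot is preserved.
    picture = 0b10110001111000011010101011001100   # one 32-dot picture
    print(2 ** 32)                                 # 4294967296 possibilities

    bits = [(picture >> i) & 1 for i in range(32)]
    shuffled = bits[16:] + bits[:16]               # fibres re-ordered in transit
    print(bits == shuffled)                        # False: the picture is lost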
Neighbourhood relationships critical for a picture. If a channel is to accurately transmit a picture, the fibres must be
projected and "mapped" at the receiving end. Such projection and mapping is done in the nervous system. (25)
Throughout their growth, axons extend and map on to specific target regions. Each area of the somato-sensory cortex is
proportionally linked to the number of nerve endings in the corresponding part of the body. Similar parallel projections
exist in many other regions. Proximity relationships are critical for these connections. They maintain integrity in the
relative location of the fibres in a transmission. The principle of projection suggests that relative location has meaning
for the nervous system. The transmission is essentially a matrix of precisely located dots. Millions of such dots, in a
magazine, are recognised by us as a picture. If the relative locations of dots change, the picture is seen to change. Just
as it decodes information from the characters on this page, the mind instantly decodes the information in such a matrix
of dots. In this essay, a picture is defined as a cyclic transmitted pattern of dots, in a fixed matrix, which is recognised
at the receiving end.
Pictures may be the language of the mind. The nerve fibres reporting pain to the cortex form a pain reporting channel.
Each fibre recognises incoming patterns to be inhibited or to report both pain and sympathetic pain. The mass and
location of dots in the picture in the channel reports the precise location and severity of perceived pain. This essay
suggests that recognition of such pictures is the basic capability of the nerve cell and of the nervous system. More
reasons are given to support the view that information may move in the nervous system as such pictures in dedicated
channels. The capability of recognising pictures may apply not just to visual images, but to all messages transmitted in
system. The meaning of such messages may be defined as the information carried by them. On its own, a single
pixel of a picture has little meaning. Meaning is derived from its contextual whole. An arrangement of dots can
represent a character of text. Higher and higher orders of meaning can be conveyed by a single character, a word, a
sentence, and a paragraph of text. A picture is said to carry more meaning than a thousand words. It is at the apex of the
hierarchy in the conveyance of meaning. A multi-million dot picture channel can carry an infinite range of information.
Neural channels may convey the most powerful meaning at this level. The remainder of this essay assumes that the
nervous system transmits such meaning.
From primary to secondary, then to association areas. When we assume that channels in the system carry
meaningful pictures, the flow of information reveals awesome order and purpose. The regions and pathways of the
human nervous system have been extensively mapped. Each receiving region performs a function and transmits the
pictures further. (26) The areas of the cortex which receive sensory information are called the primary areas. They were
seen to perceive and recall sensory images. The sensory pictures proceed from primary to secondary areas, which
co-ordinate those from similar sensory receptors in the other half of the body. Neuron channels from the primary areas
send pictures only to the secondary areas. All secondary areas in both hemispheres of the brain are inter-connected. The
secondary areas are known to deal with more complex functions such as binocular vision and stereophonic sound. The
pictures proceed from secondary areas to the so called "association areas" of the cortex. These areas receive the
consolidated pictures from all secondary regions. The association areas appear to be the principal pattern recognition
engines of the mind. They perceive and recognise entities.
Many categories of recognition. Each association area recognises an entity in the context of its received sensory
information. (27) The primary somesthetic area of the cortex receives pictures of the sense of touch. If this area is intact
and there is damage to the somesthetic association area, a patient can feel a common object, such as a pair of scissors
held in the hand, while his eyes are closed, but is unable to identify it. The picture in the somesthetic areas enables the
sense of touch and that in the somesthetic association area enables recognition of the touched object. Failure of each
association area causes failure of a particular recognition ability. The visual association area impacts on visual
recognition. Tactile categorisation affects the recognition of an object by its feel. When the speech association area is
damaged, a person knows the object, but is unable to name it. The association areas appear to perform individual acts
of recognition.
A picture to represent the recognition of an object. The premise is that pictures transmit information in the system.
Pictures imply a distinctive pattern of dots in a fixed matrix. A visual image may be recalled through the firing of the
same Barrels in the cortex which received the original image. Groups of Barrels are also known to transfer information
between different regions of the cortex. The nerve fibres from the Barrels in the somesthetic association area can also be reasoned to be sending pictures. Damage to this area implies loss of ability of the nerve cells in this region to send
these pictures. Subsequent failure to recognise a pair of scissors suggests that this picture represents a pair of scissors to
the mind. Such a recognition is a stable repeatable event. For recognition to be stable, this picture must be consistent
for this object. If pictures transmit information, the same picture must fire every time scissors are recognised. Each
object that is recognised would require equally consistent pictures. This would require a process which imprints such
pictures in this channel.
A recognition image. This essay suggests two routes for intelligent activity through instantaneous recognition of
patterns by billions of individual nerve cells. One is to recall an image in the exact geographic format in which it is
recorded as in the recall of a visual memory. The second is to recall a reference image, imprinted at the point of
recognition. A recognition picture fired by the association channel could be any random arrangement of dots. It would
be more logical to think that this arrangement is obtained from the system. Since feelings are always present, feeling
patterns could provide random reference points. If the geographic map of the association channel duplicates that of the
feeling channel, and has a parallel link to it, the channel can fire the same pattern as the feeling experienced when the
recognition of an object is first imprinted. Subsequent recognition would fire the same picture and the system would
recognise the same object. Stability of recognition may be provided by nerve cells in Barrels which act to trigger
inhibition of weaker recognition paths. In conclusion, each time the recognition of an object is imprinted, a picture is
imprinted in the association channel. Later when the channel perceives the object, it fires this picture in recognition.
The elimination algorithm to recognise an object. The mind instantly recognises one object from thousands of
known objects through a sense of touch. But the number of identifiable objects is finite. Each identification triggers a
picture by the Barrels of the somesthetic association channel. The Barrels receive integrated sensory pictures from both
halves of the body. They receive the global touch sensory information concerning an object. Assume that a random
group (X) of Barrels store the sensory picture at the point of recognition. X, a random reference, is provided by the
system. Later, during recognition, all Barrels perceive the object. The X Barrels which recognise some unique element
of the object trigger inhibition of Barrels which fail in such recognition. The X picture fires. Firm recognition would
imply a consistent firing of X. Active inhibition of all unrelated Barrels would eliminate other categories, leaving X,
indicating recognition of a single object. Sometimes the name of a recognised object may remain at the boundary of
consciousness, to be lost suddenly. Such "tip of the tongue" feelings may be derived from solutions that are eliminated
at the last moment.
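A minimal sketch may clarify the algorithm. The Python below is an illustration only, with an invented feature list standing in for sensory pictures: every perceived feature inhibits the known objects that lack it, and recognition occurs when a single candidate survives.

    # The elimination algorithm as a toy: each perceived feature inhibits
    # all known objects that lack it, until one candidate remains.
    known = {
        "scissors": {"metal", "flat", "hinged", "two_rings"},
        "key":      {"metal", "flat", "toothed"},
        "pencil":   {"wood", "long", "pointed"},
    }

    def recognise(perceived):
        candidates = set(known)                    # the finite, fixed list
        for feature in perceived:                  # each cycle of perception
            candidates = {o for o in candidates
                          if feature in known[o]}  # inhibit non-matchers
            if len(candidates) == 1:
                return candidates.pop()            # firm recognition
        return None                                # still ambiguous

    print(recognise(["metal", "hinged"]))          # scissors

Note that the cost of elimination depends on the size of the fixed list and the features perceived, not on a search back through a lifetime of memories.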
Elimination from a fixed list.">
Elimination from a fixed list. No one will dispute the thesis that the human memory has gargantuan proportions. A
search and match algorithm would just go further and further back into memory to identify an object. Such a search
would be endless. But the elimination algorithm presumes a fixed list of known objects - a limited memory. From the
first recognition of its mother by an infant, the mind continually expands its list of differentiated entities. But at
any point in time, the list must be fixed. It is logical to conclude that the list is finite. Secondly, if the list of
known objects was an amorphous mass, the mind would attach little importance to a new addition to the list. But the
first recognition of a new category is more vividly remembered than the second or the third. The Xerox machine and
the Polaroid camera are typical new objects, which are better remembered. The principle of "positioning" in advertising
depends on finding new categories against which products can be "hooked". The marketing world attaches importance
to creating a "new niche" in the customer's mind, because the context of imprinting a new category is better
remembered. The importance attached to the creation of new categories implies a finite list of known categories.
Elimination from this list brings the recognised category into focus.
Analytical logic vs. IA pattern recognition. Initial AI efforts assumed that computation was the key to intelligence.
That the mind was just a sophisticated calculator. Later, it was acknowledged that any intelligent action requires a
knowledge based response. The mind uses a store of knowledge to respond differently to varying circumstances. AI
scientists attempt to assemble the knowledge and the related responses using the tools of analytical and inductive logic.
Data is chunked into categories which follow particular rules. Known relationships need to be generalised to fit rules
for such categories. This need to generalise limits analytical thinking. It cannot cope with infinitely differentiated steps.
It misses galaxies of fine detail. So, it fails to distinguish between charm and dignity, or anger and enmity. It ends up as a
subset of the human thought processes. IA suggests a pattern recognition process which also follows the principles of
inductive logic. It also uses a store of knowledge to draw conclusions from past experience. While analytical logic
requires a series of steps from premise to conclusion, IA pattern recognition leaps logically from perception to
conclusion based on unique category links in memory.
Fine logical differentiation. The IA logic implies a multi-million dot recognition picture, which can represent trillions
of categories. Each dot in such a picture (a Barrel) results from the evaluation of millions of stored multi-million dot
pictures. Such pattern recognition pinpoints millions of categories with precision, by identifying the unique quality of
each category. It views huge masses of data, using an astronomically large memory. In such a process, it instantly distinguishes marble from jade, or tea from coffee. With the capability for massive discrimination, it conquers the
subtleties of language, poetry, art, and music. Such pattern recognition handles both analysis and the highest levels of
subtlety. While analytical logic fails to evaluate a great painting, pattern recognition identifies it as a work of art. It also
instantly recognises the stupid question in an analytical AI process. The only tool available to AI for modelling the
knowledge relationships of the mind was a logically analytical one. Unfortunately it was less sensitive, pitifully slow
and stupid. This became a barrier to an understanding of the mind. The IA pattern recognition model is logical and
capable of fine and massive differentiation. Above all, it functions in real time. It can sift mountains of data in
milliseconds. IA can better help to explain the vast capabilities of the human mind.
Motor Control
Mind's control of the body. A complex mind must interact with the physical world. Nerve impulses must transform
thoughts into actions. The intelligence involved in such a process has appeared mysterious, with almost spiritual
overtones. It was as if the body followed the instructions of a phantom spirit, which resided somewhere in the brain. As
against such obscurity, this essay has argued that apparently mysterious activities of the intellect can be explained, if
we acknowledge the act of recognition. That recognition of pictures by the nervous system is the key element of
intelligence. This section suggests how the mind may control actions of the body through such a process. A network of
fibres, which convert sensory perceptions into thoughts, may consciously manage fluent physical skills through the
phenomenon of recognition by nerve cells.
The motor control process. This analysis of how the mind may control the body begins with an outline of the
subordinate motor control system, validating intelligence at this level. It goes on to propose how a pattern recognition
system can communicate decisions and objectives at higher levels. An explanation is offered for the logic of
consciousness, which is the pre-requisite for purposive activity. It tries to show how feelings can control conscious
motor activity. It adds the description of a very special organ which mediates to store and retrieve skilled activities.
Such managed recall of skilled physical activity is shown to virtually create the modern human being, with the finely
honed skills of a gymnast, or a concert pianist. The explanation hints at the awesome power and finesse of a control
system which is empowered by sensitive pattern recognition. It ends with more thoughts on a neural channel which
may contextually choose intelligent physical goals for the body.
Habitual and purposive responses to feelings. The recall of a memory and the recognition of an object are
instantaneous acts. As opposed to this, motor activities persist over time. Since motor control impulses may fire up to
10,000 times a second, sequences of millions of impulses command the act of writing a letter, or of playing a game.
Current knowledge is that these controls have both purposive and habitual components, which interact seamlessly.
Experimental evidence has shown that purposive controls come from the cortical areas. Habitual controls are known to
come from an organ known as the cerebellum. These two control systems co-operate to achieve the objectives of the
mind. It is reasonable to assume that the process achieves the satisfaction of felt needs through physical activity. The
million fibre feeling channel was suggested to have access to information on the needs of the system. This section
suggests how pictures in this channel can control motor activity.
Control and intelligent response. At the highest level, impulses that control muscle movements originate in the
Barrels of the motor area of the cortex. Electrical stimulation of the cortical motor areas triggers, through 60,000 or so
motor neurons, specific acts of muscle contraction. These are supported by controls from the cerebellum. This organ is
known to co-ordinate motor activity. When these signals are despatched to subordinate levels, a single motor neuron
processes further lower level information from up to 20,000 dendritic inputs (28) from other neurons. These are supportive controls. In an airliner, a pilot expresses a purpose by moving a cockpit control lever. This act switches in a
series of hydraulic and electrical motors which finally achieve the intention of the pilot. There is purposive movement
at higher levels and intelligent support at lower levels. Similarly, in the human system, signals from the cortex and the
cerebellum provide high level controls. These are converted into smooth activity by the co-ordination of numerous
muscle groups. The 20,000 inputs add intelligent support to cortical purpose.
An inherited cell memory for low level intelligence. Current research does not assign a recognition role to neuronal
inputs. So 20,000 inputs become a mysterious network which achieves co-ordination. Any contracted muscle remains
contracted unless pulled back by an opposing muscle. Smooth activity requires co-ordination between muscles. Such
co-ordination is aided by sophisticated receptors which report back with nerve impulses on pressure, stretching of skin
and the initiation and cessation of movements. Let us assume that neuronal interactions achieve intelligence through
recognition. That each input combination triggers a remembered "fire or inhibit" response. If so, the significance of 20,000 inputs becomes obvious. Recognition of an impulse indicating the contraction of one muscle requires automatic
inhibition of impulses which contract an opposing muscle. Imagine the interactions in the ordinary act of sitting down.
It involves numerous muscle groups. Each muscle co-ordinates this simple act with the activities of other muscles and
movement related parameters. Changes in the responses of opposing muscles must be immediately reckoned. There
may be an inherited mechanical logic in such responses. Each microscopic muscle movement may inform other
neurons with nerve impulses. The impulses reaching each input may be unique in their information content. There can be
millions of combinations of such inputs. With the intuitive IA process, every decision may inhibit all irrelevant
activities to achieve a single choice. Each motor neuron may recall its memory codes to resolve 20,000 independent
inputs for a single decision on movement for the next instant. Imagine this to be the inherited intelligence at the lowest
level of the system.
A decision must persist. The final co-ordinated output of impulses in the motor channel triggers muscle movements
through the 60,000 motor neurons. Pictures are reasoned to be the language of intelligence. The final motor activity is
the result of a cyclic output picture in the motor channel. Obviously, the highest cortical levels originate this activity
through similar pictures. A cortical purpose picture is intelligently transformed into a motor output picture. It is now
suggested that decisions of the system can also be conveyed as pictures. The complexity of such decisions can be
imagined. The performance of a concert pianist is a product of such decisions. But, pictures can convey meaning at the
highest levels of complexity. Such decision pictures can be recognised by the motor channel to control muscle
movement. But, any decision to act is instantaneous. The impact of a decision must persist till its objective is achieved.
Muscle movements extend over time, while any decision to act occurs in a flash. If an objective is to be achieved, the
muscle must move till the desired action is completed. A decision to sit down must persist till the act of sitting down is
over. If a picture represents a decision, the picture must remain until the task is completed.
An iterating decision picture. In a cyclic system, a feasible solution of the need for persistence is for a decision
picture to iterate, till the task is completed. In a television set, the channel number that appears on the screen is a
constant iterating image, while images on the remainder of the screen change with each cycle. A stationary symbol is
produced in a cyclic system. Any decision, which requires a fixed objective, can be represented by a picture - the
channel number. A fixed picture is needed till an objective is achieved. If a television set could recognise the channel
number on the screen, it could respond with desired programs in the channel. If the channel number changes, the image
events could change. The change in channel number is instantaneous, while the program persists. This essay suggests
that a channel, with iterating signals, may convey the decisions of the nervous system to the motor control regions.
That such pictures iterate in a goal channel. One practical method for the mind to control the body would be to
consciously produce such iterating goal pictures.
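The television analogy can be expressed as a trivial control loop. In the Python sketch below (invented names, no claim about the neural machinery), the decision is instantaneous, but the goal picture iterates on every cycle until feedback reports that the act is complete.

    # A toy goal channel: the goal picture iterates each cycle until
    # the motor system reports that the objective has been achieved.
    def run_goal(goal, posture, target):
        cycle = 0
        while posture != target:                  # objective not yet met
            print(f"cycle {cycle}: goal picture '{goal}' iterates")
            posture += 1                          # the body moves one step
            cycle += 1
        print("goal achieved; the picture ceases to iterate")

    run_goal("sit", posture=0, target=3)          # persists over three cycles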
The feeling to goal link. Let us assume that feelings represent needs of the system. Their constantly changing patterns
represent demands from the system in real time. If impulses in this channel could control motor activity, felt needs
could be satisfied. Feeling stimuli, like all nerve impulses, are cyclic patterns, which must trigger decisions. Any
decision is an instantaneous event. But it must initiate a persisting activity. Let us assume that feelings trigger goal
pictures. These pictures persist till goals are met. Motor activities are again cyclic, needing to be continually triggered.
Thus, a goal channel establishes a link between an instantaneous decision following a felt need, and continuing motor
activity to meet the need, by providing a persisting objective. The location of the feeling channel has already been
indicated. The goal channel is suggested as a necessary adjunct to a cyclic pattern recognition system which achieves
continuing intelligent behaviour. A possible physical location for this channel is indicated later in this essay.
The wellspring of consciousness. The mind controls the body when it is conscious. This theory of how the mind can
learn to control the body requires an explanation of consciousness. The source of this phenomenon may be evaluated
from a special context where a person becomes unconscious. This is known to occur when there is damage to a region
of the brain called the reticular formation. Damage to most other regions of the brain, (including the removal of half of
the cortex in an operation called hemispherectomy to remove tumours), causes only selective defects. (29) But, serious
damage to the reticular formation results in prolonged coma. Cutaneous or olfactory stimuli to the reticular formation are
known to restore a person from a fainting fit. Electrical stimulation of the reticular formation is also known to induce
sleep in animals. Activity in this region can both raise the levels of consciousness and alertness as well as induce sleep.
The reticular formation appears to be critical to consciousness.
A consciousness control channel. (11) It was shown how an executive attention centre could increase awareness by
causing inhibited sensory inputs to fire. It was suggested that a similar channel may wake us into consciousness, with
global awareness. It is now suggested that the reticular formation has initiating links to the sensory input and goal
channels. That these links form a consciousness control channel. That these channels recognise cyclic impulses from
the reticular formation and wake us into consciousness. That these control signals provide an ongoing consciousness
drive. Cyclic impulses keep us conscious. The consciousness drive opens sensory channels and people become aware
of their surroundings. It also triggers activity in the goal channel, which generates pictures defining objectives of the
system. Goal pictures are automatically interpreted into motor activity.
Purposive movement through learned pattern recognition. In conscious activity, primitive animals may have
inherited links between felt needs and purposive activity. Inherited memories may enable a need felt by a primitive
brain to lead to an activity, which automatically satisfies that need. But human systems have highly differentiated
purposes and even new ones. New purpose cannot be an inherited memory. Pattern recognition also implies the
recognition of a pattern in memory. A pattern recognition system can only recognise a known pattern. Human beings
learn through play and experimentation. These can lead to the successful achievement of goals. Feelings related to such
successes can become memories for subsequent recall. Imagine that a contextual feeling is recorded against a goal
picture which achieved a desired result. If this feeling is recalled later in a similar context, it can trigger the same goal
picture, resulting in the needed motor activity. If a goal channel records and recalls successful goal pictures in the
context of feelings, feelings can then trigger goals. The goal channel must learn to record contextual feelings against
activities which led to successful goals.
Learning purposive movement. Consider this explanation of how the goal channel learns to recognise feelings to
trigger goal pictures. It may be a process which begins in the cradle, with the intense activity of an infant. As the baby
wakes up, the consciousness drive initiates the goal and sensory input channels. The goal channel produces random
images. These symbols trigger erratic hand and leg movements. The active infant sees an object in its field of vision. Its
waving hand touches the object. A feeling of satisfaction is experienced. This feeling is recorded in the goal channel
against the goal picture which achieved this goal. A subsequent view of the object recalls this feeling. The goal channel
recognises the feeling to trigger the goal picture. This picture results in the hand movement. Contextual recall of the
feeling enables the child to move its hand. An ongoing learning process continually adds memories of similar feeling-to-goal relationships in the goal channel. In a continuing process of repeated play and experimentation, the child learns
to move its hand towards seen objects.
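This cradle learning loop can be caricatured in a few lines of Python. The sketch below is purely illustrative: goal pictures are chosen at random until one satisfies the felt need, at which point the feeling is recorded against that goal; thereafter the feeling recalls the goal directly.

    # A toy of the cradle learning loop: random goal pictures drive
    # movement until one satisfies a need; the feeling is then recorded
    # against that goal picture and recalls it directly thereafter.
    import random

    goal_memory = {}                                # feeling -> goal picture

    def act(feeling, goals):
        if feeling in goal_memory:                  # learned: feeling recalls goal
            return goal_memory[feeling]
        goal = random.choice(goals)                 # random infant movement
        if goal == "reach":                         # suppose this one satisfies
            goal_memory[feeling] = goal             # imprint feeling against goal
        return goal

    while "sees_toy" not in goal_memory:            # repeated play and experiment
        act("sees_toy", ["reach", "kick", "wave"])
    print(goal_memory["sees_toy"])                  # 'reach'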
An ongoing learning process. Each achieved goal increases control. Pattern recognition permits fine discrimination of
feelings to choose ever more precise goals. Practice and millions of similar memory refinements later, the child learns to
reach out and grasp a pencil. As it learns to control its movements, the feeling channel takes charge of the goal channel
and the random activities of the infant cease. Feelings control the movements of the child. They become purposive.
Ultimately, such purpose covers the entire range of human activity, including speech. Speech, in fact, becomes one of
the best expressions of an individual's feelings. Each of these learned activities is the result of the memory of a goal and
a learned activity, recalled in the context of a feeling. The cerebellum may store and recall memories of learned
activities. Such a store of memory of habitual movements is known to interface seamlessly with purposive cortical
movements.
Computation and pattern recognition for movement. It is argued that the body achieves daily routines through the
instant recall of memories of habitual movements. A proof of this becomes powerful support for the theory that pattern recognition (not computation) is the key to an understanding of the mind. Both computation and remembered
control systems can enable a robot arm to touch an object. It can compute the precise location of the object and make a
sequence of inter-related decisions. It moves a joint at the shoulder to an optimum position and locks. The movement
then switches to the elbow and subsequently to the wrist. Such an activity would be a precise, mechanically computed
movement. Alternatively, the arm could be guided directly to the target. The complex joint movements which result
can be recorded into the system memory. Subsequently, when the target is indicated to the system, it could recall its
memory to follow this "learned" path to reach the object. The first process involves a complex computational capability
in the control system and the second, a powerful memory. Pattern recognition implies the use of memory for intelligent
activity.
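The two strategies for the robot arm can be contrasted in code. The sketch below (hypothetical names throughout) omits the computational route entirely and shows only the remembered one: a trajectory recorded while the arm is guided is simply replayed when the target is named again.

    # The remembered strategy: record the joint trajectory while the arm
    # is guided to a target, then replay it on demand - no computation.
    recorded_paths = {}                             # target -> joint trajectory

    def guide_and_record(target, trajectory):
        recorded_paths[target] = trajectory         # teach mode

    def reach(target):
        for shoulder, elbow, wrist in recorded_paths[target]:
            print(f"shoulder={shoulder:3d}  elbow={elbow:3d}  wrist={wrist:3d}")

    guide_and_record("cup", [(10, 5, 0), (25, 30, 10), (40, 60, 25)])
    reach("cup")                                    # smooth remembered movement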
A filing cabinet for habitual movements. There is an organ of the mind which appears to store such memory. The
cerebellum is a miniature single purpose brain. It is laid out to assist cortical motor functions. It inserts habitual
movements into purposive activity. Electrical stimulation of the proper primary motor areas of the cortex invokes simple
actions such as the extension at a finger joint. While cortical control is simple, the cerebellum (30) is "necessary for
smooth, co-ordinated, effective movement". Failure of the cerebellum causes movements of a patient to become jerky.
With cerebellar problems, the patient converts a movement which requires simultaneous actions at several joints into a
series of movements, each involving a single joint. (31) When asked to touch his nose, with a finger raised above his
head, the patient will first lower his arm and then flex the elbow to reach his nose. This problem is called
"decomposition of movement". Such behaviour is remarkably similar to the actions of a primitive robot. Complex
joint movements for directly touching an object are forgotten and cortical purpose just manages a tedious joint by joint
movement. Purposive action continues to be achieved without the cerebellum. The patient can still reach the goal. But
presence of the organ achieves the same goals smoothly.
Sequential control of motor functions. Research has shown (32) that the cerebellar cortex has all motor control functions, with sensory inputs related to motor locations spread over its cortical layer with topographic precision. The
cerebellum receives all the information that is needed for motor activity. As it takes over control of habitual motor
functions from the cortex, (33) the entire outputs of nerve impulses from the cerebellum are through a type of nerve
cells called Purkinje cells. Each Purkinje cell is known to have hundreds of thousands of dendritic inputs, with inputs
from global sensory and motor functions. Each input evaluates a single parameter. In 1967, V.Braitenberg suggested
the possibility of control of sequential events by the cerebellum. (34) The organ appeared to have an accurate biological
clock. Impulses in fibres which link successive Purkinje cells reach the cell dendrites at intervals of one ten-thousandth
of a second. Alternate rows of Purkinje cells are excited, while in-between rows are inhibited. The cells fire
sequentially.
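Braitenberg's sequencing suggestion can be rendered as a toy timetable. The fragment below (illustrative only) paces six rows of Purkinje cells at one ten-thousandth of a second, with alternate rows excited and the rows between them inhibited.

    # A toy timetable of sequential Purkinje firing, paced at 0.1 ms,
    # with alternate rows excited and in-between rows inhibited.
    TICK = 1e-4                                    # one ten-thousandth of a second

    for row in range(6):
        state = "excited" if row % 2 == 0 else "inhibited"
        print(f"t = {row * TICK:.4f} s  row {row}: {state}")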
A memory for habits. The cerebellum is known to perform a co-ordinating function. It has access to the entire range
of contextual motor control information, an accurate pace setting mechanism and is purposively controlled by the
cortex. The only outputs from the cerebellum are the Purkinje cells. They control habitual motor functions. Such motor
activity meets cortical goals. This essay suggests that each Purkinje cell records a microscopic motor activity for a
single motor neuron in the context of current global motor control data and the current cortical goal. The cell fires again
whenever this picture is recognised. It recalls a memory to generate motor activity. Consider the habitual controls from
the cerebellum in the simple act of sitting down. It is, essentially, a complex movement, controlled by both cortical
purpose and habitual controls from the cerebellum. The height and position of a chair provide cortical goal information.
The cerebellum manages the objectives of many muscle groups to achieve the cortical goal. It is reasoned that, with
trillions of contextual pictures in its memory, the Purkinje cells may sensitively recognise each microscopic motor and
goal prospect to support habitual acts.
A goal channel into the cerebellum. The cerebellum provides some hints of the existence of a goal channel as
suggested in this essay. The cerebellar cortex receives a major input from a nucleus of cells called the olivary nucleus.
Fibres from many regions of the cortex reach the inferior olivary complex and are distributed to all parts of the
cerebellar cortex. (35) Damage to this group of fibres is equivalent in effect to damage to the cerebellum, causing
severe loss of co-ordination of all movements. The cerebellum is known to insert a massive range of learned
movements into normal activities. These inserted activities meet cortical goals. It is suggested that these fibres could
form a goal channel, which continually informs the cerebellum of the current goals of the system.
A seamless interface. The cerebellum could be acting as a memory store, switching controls between it and the motor
areas of the cortex. Neurons in the cerebellum could learn and reproduce remembered movements, becoming inhibited
when cortical intercession takes place in any habitual movement. A person who reaches for an object could be moving his hand smoothly to a point close to the object through cerebellar controls, with conscious controls taking over for a brief instant to adjust the hand. The cerebellum could again take over to grasp the object smoothly. The Purkinje cells
also have inputs from stretch receptors which report increased muscle tension enabling the organ to hand back habitual
movements to purposive cortical controls.
Evaluation of a quarter million parameters for an imperceptible muscle shift. Children take years to learn to walk,
run, or ride a bicycle. At ten thousand frames per second for each of sixty thousand motor neurons, the actions learned by this organ to meet cortical goals could represent astronomical memory capacities. Habitual acts are unique to each individual. If they were computed movements, they would have been similar for everyone. Being idiosyncratic yet repetitive, it is more probable that these are remembered actions. Even the movements of a skilled
gymnast are learned with painstaking practice. It is training (requiring memory) which achieves the unique, but
co-ordinated movements of many muscles to precisely meet cortical objectives. This essay reasons that finely
discriminative feelings trigger sensitive pictures in the goal channel to recall myriad learned movements from the
cerebellum. The Purkinje cells represent a system which evaluates a quarter of a million parameters to generate a one
ten-thousandth of a second movement of a single muscle. Multiply that by 10,000 decisions every second and again by
60,000 motor neurons. Imagine the power of such a system.
Event Recognition
Intelligence through recognition. Recognition has been proposed as the key to understanding intelligence. It was
shown how recognition may cause the feeling of pain to be heightened or suppressed. That it may enable signals from
the reticular formation to create consciousness, or those from EAC to focus attention. That recognition by association
channels may enable the identification of objects. That recognition by Purkinje Cells may enable the mind of a skilled
gymnast to finely control his feats. Recognition, in reality, even provides the links, beyond the level of an individual
intellect, for the highest levels of integration of a modern society. The lifeline for today's technological world is the link
created by the recognition of spoken and written messages. If there was no recognition, intelligence would not exist. It
is now suggested that, just as it identifies objects, the mind may have a capacity to recognise the events around it.
The process of event recognition may produce an altogether nobler level of intelligence.
Event recognition exists. Recognition of a pattern, which triggers a sequence of events is normal for computers. A
"sort" command sorts a column of figures. A "copy" command copies a document from one file to another. The
computer recognises the command to set in motion a train of events. In a reverse process, a computer in a bank may
evaluate a sequence of transactions to trigger an alarm concerning a suspicious, or fraudulent event. A sequence of
activities becomes the cause rather than the result of the recognition of a pattern. Computers can be programmed to
recognise such events. We know that the mind recognises events with enormous power and subtlety. Words and
sentences in language identify and define events in the environment in all their complexity. There is also simultaneous
recognition of multiple events. When one drives through traffic, there is awareness of the movements of many objects
in the field of vision. Cars move in various lanes. People cross the road. Signals flash. Each event has a distinct context,
history and future possibilities. One recognises where a pedestrian comes from and where he is likely to go. Event
recognition implies a complex past, a present and a future.
The time span for event recognition. Events are continuously perceived by the mind. It also absorbs complex
information through the process of reading. It may be absorbing such information in discrete packets. The structure of
language may indicate such a process. Each sentence has an acceptable length and every sentence closes with a period.
A working memory (36) may store the first part of a sentence, while sense is made of the second part. While memory
capacity is astronomically large, the working memory of the mind is just comfortable with a single telephone number.
This may suggest a chunking of information into comparatively manageable segments before it is absorbed. A sentence
contains data regarding objects and their static and dynamic relationships. It would be reasonable to assume that the
time taken to absorb an average comprehensible sentence spans the time period for an event to be learned for storage in
memory for subsequent recall and recognition. While even the instantaneous click of a camera lens is a recognised
event, the structure of our literature suggests a period of ten to fifteen seconds for the absorption of a more complex
event by the mind. The information in a sentence may be assimilated in this period.
An event picture. While an object can be identified instantly, an event needs to be evaluated over time, to achieve
recognition. Even the simple act of running generates a sequence of complex patterns. A sequence of images is needed to represent the action. Even so, the event can still be represented by the simple word "run". The word covers
the sequence of activities. Words and sentences are static images. If they are considered symbols, then events can be
identified through them. Symbols can be represented by pictures. Fitting into the IA pattern recognition model of this
essay, an event can be represented by a picture. It is reasoned that such pictures may be imprinted in a goal association
channel. Compare the process to imprinting an iterating image "run" on every frame of a movie of a running person.
Subsequent recall of any single frame of the movie will contain the name of the event. The name symbolises the act
and enables recognition. The process imprints an iterating image on every frame of a sequence of images. Subsequent
recognition is achieved by identifying any unique quality of any one of these images. The imprinted iterating image
identifies the event.
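The imprinting of an iterating name on every frame can be sketched directly. In the toy Python below (frame names invented), the event label is attached to each frame of the sequence, so recall of any single frame is enough to identify the whole event.

    # A toy of the iterating event image: the label "run" is imprinted
    # on every frame, so any one frame identifies the whole event.
    frames = ["stride_left", "stride_right", "arms_pumping"]
    event_memory = {frame: "run" for frame in frames}

    def recognise_event(seen_frame):
        return event_memory.get(seen_frame)        # any frame names the event

    print(recognise_event("stride_right"))         # 'run'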
IA for event recognition. In IA, a pattern is recognised by identifying its unique qualities, which are absent in any
other known pattern. Events are sequences of patterns. It is suggested that unique qualities differentiate every
recognised event at the microscopic level. A fleeting smile is instantly recognised because of some infinitesimally
differentiated and unique quality. That quality differentiates a smile from a grin or a smirk. While it may be impossible
to define the characteristics, the ability to recognise and differentiate such events remains a normal human ability. It is
known that event images are analysed by the mind into thousands of characteristics. If we assume an astronomically
large memory, which stores profoundly small variations between the characteristics of events, finite event recognition
may be practical for the nervous system. Once this capability is assumed, much of the mystery surrounding thought
processes disappears. Most cognitive processes may revolve around the recognition of an event, its recall from memory
and a visualisation of its consequences.
A goal association channel. This essay suggests that events may be identified by the mind as pictures in a goal
association channel. An iterating image (like a channel number on a TV screen) is suggested, since events cover a
sequence of images spread over a period of time. Association channels have access to sensory perceptions. All Barrels may perceive, say, a person sitting down. The goal association channel is suggested to have a parallel link to the goal
channel. Certain Barrels (triggered by the current goal picture) may record the sensory image of the event.
Subsequently, when the action is again perceived, these Barrels dip into recorded memory to trigger the recognition of
the event "Sit". Barrels which lack the images in memory are inhibited. The event is recognised and an event picture
assembled in the channel. Just as a picture can represent many objects, event pictures in the channel may represent
multiple events. The equivalent may be visualised as a matrix of "iterating channel numbers" on a single screen,
representing several simultaneous events. Each can be represented by an independently changing number. A single
picture may identify several events, in parallel. An astronomically large and finely differentiated mass of such pictures
may represent our understanding of events.
Assembling event images. A musical composition is also an event. The process of combining such events to create a
unique new event in memory was described by Mozart. (37) According to his narration, when feeling well and in good
humour, melodies crowded into his mind. He retained those that pleased him. Once he selected a theme for a new
composition, another melody came, linking with the first one. Each part fitted into the whole. Mozart continued that,
when the composition was finished, he saw it as a beautiful whole in his head. Just as he could see a beautiful picture,
he saw the whole composition at "a single glance". This essay suggests that each melody may be an event, recalled by
Mozart, evaluated and stored again into his memory as an event picture. These are again combined into a complex
event picture, which covered the entire composition. Even though the composition is played over a long period of time,
Mozart saw the complex event "at a glance" as a single picture in his head. This narration may support the concept of
pictures representing events. The mind does recognise pictures "at a glance"; the very phrase "read at a
glance" carries the same implication.
Thinking in symbols. Running, sitting down, or walking are simple events, recognisable from a single picture.
Recognition of such events and their linkages is reflected in our language. A combination of multiple events is
understood from "John ran home and got into bed". The mind visualises the events "ran" and "got into" in the context
of the objects "John", "home" and "bed", to create a new, recognised event. Recognition of each event may fire a
picture in a goal association channel. With IA, the mind may have access to its global meaning. Language enables the
combination of such symbols into a complex event, which is also assimilated. Language has certain universal elements
in its construction. This may imply that the goal association process, which finally comprehends language, is itself
divided into different segments, which are then assembled according to specific rules. Nouns, and verbs may be
recognised separately and combined according to rules for recognition to generate meaning for the whole. The whole
process of comprehending events has been suggested as being the function of a complex goal association channel, with
many components.
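The combination rule hinted at here can be caricatured in a few lines of Python. Nouns and verbs are recognised separately from a tiny lexicon and assembled into actor-action-object triples. The lexicon, the rule and the sentence are invented for illustration; nothing here models the brain's actual grammar.

    # A toy sketch of the suggested combination rule: nouns and verbs are
    # recognised separately, then assembled into event triples. The lexicon
    # and the sentence are illustrative only.

    LEXICON = {
        "John": "noun", "home": "noun", "bed": "noun",
        "ran": "verb", "got": "verb",
    }

    def events(sentence):
        """Pair each recognised verb with the nouns around it."""
        words = [w for w in sentence.split() if w in LEXICON]
        triples, actor = [], None
        for i, word in enumerate(words):
            if LEXICON[word] == "noun" and actor is None:
                actor = word                      # first noun acts as subject
            elif LEXICON[word] == "verb" and i + 1 < len(words):
                triples.append((actor, word, words[i + 1]))
        return triples

    print(events("John ran home and got into bed"))
    # -> [('John', 'ran', 'home'), ('John', 'got', 'bed')]
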
Event pictures within event pictures. A logical extension of the recognition of events is an event picture, which
stores within it a sequence of event pictures. The musical composition of Mozart appears to him as a single composition,
which contains many pieces within. A shopping trip can be recalled as a sequence of many sub events within the main
event. Such event recognition implies a combination of current inputs with memories from the past. These can
represent rising hierarchies of understanding. Since pattern recognition, as envisaged in this essay, permits infinitely
differentiated steps, event recognition can explain virtually any type of human intelligence from planning a strategy for
war to comprehending the theory of relativity. They are hierarchies of events, which contain millions of images. Since
pattern recognition requires only uniquely remembered and not logically connected links, any pattern can be linked to
any other in complex associations. Any alteration of a component image leads to a new understanding and a fine
difference of infinite subtlety in a new image of recognition.
Inherited emotional event recognition. The management of feelings has played a critical role in this essay. It dealt
with the concept that nerve impulses represent different shades of feeling. Feelings include those generated by bodily
demands and those fired by intellectual perceptions. Thirst is triggered by a bodily demand. Fear, a sense of isolation,
loneliness, or disgust are feelings which result from intellectual perceptions. This essay suggests that certain feelings
may be triggered by the recognition of event pictures from the goal association channel. Impulses which represent fear,
sorrow, or jealousy may be triggered by specific, recognised categories of events. These impulses may form the signals
in the feeling channel. Each Barrel may recognise event pictures specific to an emotion. The Barrels which fire to
trigger a feeling of disgust may recognise an event picture which represents a revolting event. Such memories may be
inherited at birth in the Barrels of the feeling channel to enable the system to respond with feelings to events.
Event memories. There is reason to believe that our memories of events are recorded as the feelings which we
experienced during the event. When we recall a conversation, we can recall what was said, but not the exact words. But
we can remember the tone and the meaning. Feelings can convey such meaning. The recall of visual memories was
reasoned to be triggered by the Barrels of the visual cortex, and these memories were recalled in the context of
feelings. It is suggested that the Barrels of the feeling channel may trigger feelings when events are recognised, or
recalled, and that the goal association channel recognises events as event pictures. It is suggested that each Barrel in the
feeling channel receives and stores memories of these event pictures, which triggered the related emotion. Subsequent
recall of the event recalls the related feeling. Since event pictures have been suggested to be iterating images, the
sequence of feelings related to an event are recalled when an event is remembered. Iteration may give a time dimension
to experienced feelings, enabling the mind to recall events in sequential detail. The feelings, in turn, may recall visual
and sensory memories.
Feelings as an accurate record. Modern society communicates a major portion of its sophisticated messages through
the medium of language. Text books, novels and scientific articles convey complex meaning to readers. A language,
with about a quarter of a million words, conveys a majority of the information between human beings. It is suggested that
the sequence of images in a million dot feeling channel can convey all this and more within the mind. When we narrate
an event, the recalled event pictures convey the concept of the event through a sequence of recalled feeling images. The
goal association channel recognises and translates it through the speech mechanism into language. As a person
expresses the words related to a feeling, the mind has access to its entire memory store in the context of that feeling. IA
selects the words which exactly suit the context with precision. When an inner voice speaks the ideas in the mind, the
speech mechanism is merely translating the current sequence of feelings.
A summary of the event to feeling link. Event recognition has so far not been suggested by AI research. Infinitely
graded categorisation has not been considered a possibility, so event recognition was never visualised. But IA makes such
profoundly sensitive pattern recognition possible. It is but a step further to imagine a time dimensioned pattern
recognition system which can recognise events. Human experience and the clarity of language clearly indicate that
events can be recognised with precision. This process may also use the mind's language of pictures. Events can be
represented through words. Pictures convey more meaning than words. Iterating pictures in a goal association channel
may represent events. Events could be absorbed in brief time capsules, as in the time span in which the mind grasps a
sentence. They may be recorded for recall as an event picture by the goal association channel. The mind is known to be
aware of several simultaneous events. An event picture could represent several such concurrent events just as an
ordinary picture could represent many objects. Combinations of events could also become complex event pictures,
which represent sophisticated concepts such as war, or democracy. And, finally, event memories may be stored as
sequences of feelings in the feeling channel, which could be recalled by the iterating event pictures. The recalled
feelings would, in turn, trigger the recall of sensory memories of the event.
An interface for purpose. A person may wish to copy a file on a computer. He conveys this purpose to the computer as a
typed-in "copy" command. The computer then executes a series of steps which achieve his objective. This essay
suggests that the goal channel may be a special interface for purpose. It may interpret feelings (the needs of the system)
to determine purpose and trigger motor activity. For this, it may store the knowledge of the system concerning groups
of motor objectives which achieve each purpose. Such purpose may, ultimately, be the driving force of the system.
Purpose as a route map. A nosebrain, which recognised the smell of an object, issued additional instructions to
consume or avoid the food. These instructions were followed by its motor systems. The mammalian feelings system
permits a wider range of options. The cerebellum does not provide cortical purpose, but is known to assist in its
achievement through sequences of recalled motor activities. Just as a route map recalls the physical directions to reach
a destination, the goal channel may recall and set the physical goals which control motor activity to achieve an
objective. If the objective is to leave the room, the goal channel may identify the door as a physical goal. The
cerebellum may co-operate with muscle movements in a stroll to the door. If the objective is to escape from danger, the
channel may contextually select the easiest escape route. During a drive, the goal channel may determine the correct
turns, in context, to reach a destination. The channel may contextually respond with the next most suitable physical
goal for motor activity, to meet a particular objective. This objective may meet the needs of the current feeling. Such
physical goals may be in the channel memory in the context of the past achievement of similar objectives.
A goal channel with intuitive intelligence. The channel may have access to an adequate inflow of information to
intuitively choose physical goals for the system. Society teaches an individual how to achieve objectives from driving a
car to building a house through a range of pre-defined physical activities. The channel may build up a massive memory
of physical goals as responses to feelings. The channel may set sequences of physical goals for complex objectives - to
flee, attack, or negotiate. The choice of goals in response to feelings may be established at a young age. Such goals may
be learned gradually from infancy, forming sequences of physical activities, to be recalled instantly. Many patterns of
social interaction may be learned in playing fields, where each feeling may result in a particular fashion of personal
contact.
Inherited responses to feelings. A goal picture may have many components. Geographically, the channel may be
widely distributed. The levels below the thalamus are known to have substantial powers of self management of basic
life support systems, including feeding, drinking, apparent satiation and copulatory responses. The interpretation of
feelings and the issue of such control instructions may be perennial elements of a goal picture. Some bodily responses
to feelings are automatic. The cerebellum was shown to control habitual movements under cortical guidance, and the
process may be learned by the complex of cells surrounding the Purkinje cells in the cerebellum. It is now suggested
that over and above such learned movements, the cerebellum may respond with specific physical activity to the
interpretation of feelings by the goal channel. The cold sweat of fear, or the shuddering sobs of sorrow may be the
inherited responses triggered through the cerebellum by the goal channel when it recognises specific feelings.
A goal picture may be the primary drive. The purposive element of the channel may provide a mechanical interface
between feeling and motor activity. Feelings compel action. The individual may not be conscious of the many small
subsidiary motor activities which achieve a goal. The next subconscious objective that meets a feeling may be selected
and acted upon without significant conscious input. A child goes into a tantrum. A man commits a violent act.
Recognition of an event causes strong feelings to be experienced. These may automatically trigger goal pictures and
resultant motor activity. The process may be stopped only if the goal is changed. Once a goal decision is made, the
body is compelled to achieve the goal. The concept of a goal channel may explain the powerful drives that impel an
individual. Many day to day activities may also involve goals that are constant over hundreds of sleep and waking
cycles. Childhood feelings may set long term goals, providing contexts for the launch of current feelings. Such
elements of the goal picture may compel one to continue, consistently keeping a focus on primary objectives, over the
years.
Feedback loops co-ordinate output. It has been reasoned that the current feeling determines the goals and hence the
activity of an individual. From thousands of competing wants, the system must select a single one for action. Intuition,
as implied by IA, may be uniquely fit to contextually pick the single most germane selection. This capability is best
illustrated in the motor channel. Each one of 60,000 motor neurons has up to 20,000 inputs from other neurons as it
travels down the spinal cord. Feedback loops use information from lower levels to modify inputs at higher levels.
Every muscle has an opposing one and many muscles must co-ordinate to achieve even the simplest task. Any selection
may instantly inhibit conflicting demands. Such decisions occur thousands of times a second. It is logical to conclude
that these feedback circuits may have a singular ability to instantly consolidate the backward and forward interactions
of millions of simultaneous inputs. Galaxies of parameters may be processed, using phenomenal intelligence
concerning their interactive impact. After such assessment, a single final picture delivers smooth muscle movement. It
is suggested that a similar process may determine the current feeling of the system.
The limbic system may decide the current feeling. Experimental evidence attributes a significant role for the limbic
system in the realm of emotions. This essay suggested that the output of the limbic system may represent the current
feeling. It is a ring passing through the thalamus, consisting of over a million fibres, which acts in both directions. A
process similar to that in the motor channel may take place in the limbic system. It may evaluate millions of received
parameters to determine, instant by instant, the final output. As in the case of the motor channel, where many muscle
movements oppose each other, many feelings may also be in conflict with each other. Suppose a person is reading a story
and an unexpected sound occurs in the background. The sound generates a feeling. The feeling related to the situation in the
story dominates the system. At some point, suddenly, the feeling generated by the background sound obtrudes. This
feeling may now set system goals. The attention of the mind now changes focus to the sound. It is suggested that the
limbic system may continually process myriad feelings generated by bodily needs and intellectual perceptions to
generate the current feeling. This feeling may inhibit conflicting emotions to dominate the system and trigger goal
images.
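The switching process suggested here can be caricatured as a winner-take-all selection: competing feelings carry changing intensities, the strongest is switched in as the current feeling, and its rivals are inhibited. The sketch below, with invented feeling names and intensity values, is a deliberately crude model of that idea.

    # A toy winner-take-all model of the selection the essay attributes to
    # the limbic system. Feelings and intensities are invented.

    def select_current_feeling(intensities):
        """Pick the dominant feeling; inhibit (zero out) all competitors."""
        current = max(intensities, key=intensities.get)
        for name in intensities:
            if name != current:
                intensities[name] = 0.0   # conflicting feelings are inhibited
        return current

    # While reading a story, the story-related feeling dominates the system.
    feelings = {"story_suspense": 0.8, "background_sound": 0.3}
    print(select_current_feeling(feelings))   # -> story_suspense

    # When the background sound grows strong enough, it obtrudes and becomes
    # the current feeling, setting new system goals.
    feelings = {"story_suspense": 0.8, "background_sound": 0.9}
    print(select_current_feeling(feelings))   # -> background_sound
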
When a wish becomes an act of will. William James (38) narrates the internal conflicts of a person while getting up
from a warm bed on a cold winter morning. One lies unable to brace oneself to get out of bed. Then there is a sudden
decision. One may think of some thought connected with the day's activities. He calls it a lucky idea, which "awakens
no contradictory or paralysing suggestions, and consequently produces immediately its appropriate motor effects...."
Suddenly there are no negative feelings and one gets quickly out of bed. He calls it a shift from "wish" to an act of
"will". It is suggested that the lucky thought may have been triggered by that segment of the goal channel which
manages longer term goals. It may have called up images which create a feeling of the need to achieve the day's duties.
The limbic system may evaluate competing feelings, and balance them to determine the current feeling. This "lucky"
emotion may inhibit opposing feelings. It may bring appropriate context and memories. The goal channel may
recognise the feeling to initiate "appropriate motor effects". The "act of will" may have been a sophisticated decision
by the limbic system.
A subtle feeling to goal relationship. It is suggested that the distinction between feelings and goals may be hard to
draw in some areas. Some feelings may be subconscious, with their impact triggering goal pictures and resultant visible
activity. Thus the impulses that trigger many feelings, such as curiosity, or playfulness may be subconscious, but may
produce resulting activity which meets the parameters of the feeling. The event recognition process is reasoned to have
an inherited code to trigger specific feelings when a particular type of event is recognised. As an example, when an
object or event evokes interest and cannot be recognised, a feeling of curiosity may be triggered. This feeling may, in
turn trigger goal images which facilitate investigation. The person may then follow those activities which assist in
recognition of the significance of the event.
The Mind
A composite picture. The IA concept visualises intuition as a process of infinitely graded category recognition, which
enables a supersession of the "understanding" of science by the "wisdom" of the mind. Such wisdom is reasoned to be
the property of neural channels. The channels are assumed to be electrical circuits with powerful memories, with
intuitive intelligence of a very high order. Biochemical messages may further aid this process. The mind does not
appear as a single network intelligence. Myriad separate intelligences seem to operate independently from thousands of
specialised and geographically identifiable neural channels. These channels may be distinct entities, mutually
exchanging and recognising unique and perceptive messages. The picture theory of internal exchange of information is
offered as their medium of communication. A holistic, real time interaction is made feasible in such a system by the
swiftness of IA. Such circuits may further explain certain mysterious functions, such as drives, consciousness, will and
judgement. An attempt is made, in this section, to combine these ideas and functions to present a composite picture of
the mind.
Reasoning chains for understanding. Some scientists may dispute the superiority of human wisdom over modern
scientific understanding. Science is founded on inductive, analytical logic. Logical analysis chunks information into
minimums that fit a specific rule, or reason. Science assembles facts that fit these rules to create understanding. It
presumes that any understanding must be built on a logical structure of underlying reasons. Reasoning chains underpin
science. A phenomenon is presumed to be understood only when the underlying causes are well defined. But the
information inflow into the scientific world overwhelms its capability for providing supporting reasoning chains. Every
science has spawned a dozen more. The vastness of the universe, billions of years of history, the complexity of living
things and the miniature worlds in the cell and the atom dwarf scientific ability to provide reasons. Over centuries of
research, the reasoning chains proposed by the scientific community, even those underlying its most fundamental
beliefs, have also been overturned by new discoveries. Reasoning chains have fallen far behind in providing
understanding.
A wisdom which supersedes an understanding. Intuition, as implied by IA, uses inductive logic to identify the
unique elements which link two patterns through a process of elimination. Such recognition of unique links between
complex patterns is reasoned here to be the basis for human wisdom. Intuition instantly recognises the link of the
pattern "the eyes" to the pattern "look friendly". This is human (or even animal) wisdom. As against this, a reasoning
chain would be hard pressed to explain the causes, since analysis of the two patterns may yield an astronomically large
number of categories with vague relationships. Each element of vagueness further weakens a reasoning chain. The
intuitive link, on the other hand, may be based on powerfully accurate and logical perceptions of concrete experience.
Instead of seeking underlying reasons, the intuitive process may find unique links from a vast storehouse of experience.
Intuition may as often be just as wrong as scientific reasoning. But it girdles a wider horizon and is a powerful weapon
for coping with the environment. This essay suggests that the wisdom created by pattern recognition may be superior to
the analytical understanding created by science. Science assists such wisdom with reasoning chains. This essay
attempts to be one such reasoning chain.
Many intelligences in a federal system. Current neural network theory may be compared to the effect of ripples
created by a sequence of pebbles dropped into a still pool. The ripples interact and can be expected to indicate the
global outcome of every dropped pebble. The theory suggests a similar global intelligence, with every portion of the
network reflecting every event that occurs in the nervous system. As against such a single intelligence, this essay
suggests that many intelligent regions may perform independent functions in the nervous system. Such regions, their
functions and the nerve fibre links between them have been extensively charted by science. These regions may
communicate internally through intelligent pictures. The evidence for the "picture mode" of transmission is provided
by the phenomenon of point to point "mapping" between the myriad neural channels. It was reasoned that the
association region may inform the prefrontal regions that a pair of scissors has been recognised through a picture. This
message is an independent communication between two finite intelligences and not "signals that balance" an entire
network. Medical evidence also supports this essay's concept of myriad independent intelligences. These
intelligent circuits are known to form a hierarchy of interactive subsystems, each demanding only critical inputs from
higher levels. The management has been likened to a federal government. At the lowest levels, people manage their
affairs by themselves. Higher level decisions are made by the communities, by the state governments and finally, by the
central government.
A self managed system at lower levels. This decision making system is revealed in the "homeostasis" of animals in
the survival process. Homeostasis (39) is the achievement of a relatively constant state within the body, in a changeable
environment. It is naturally maintained. It is brought about by various sensing, feedback and control systems,
supervised by a hierarchy of control centres. The concept that these centres mediate these controls is based on a wide
base of experimental evidence, gathered by studying the impact of destruction of localised topographical targets in
animals. As the cut is made higher, so that more levels remain connected to the spinal cord, more effective controls are
retained. The thalamus is the major nerve junction sitting at the apex of this survival hierarchy. The levels below can
sustain a wide range of activities including feeding, drinking, apparent satiation and copulatory responses in a wide
range of adverse conditions. Obviously, an incredibly high level of intelligence and self management exists at these
lower levels.
Selective awareness. But, are we just mechanically constructed objects which respond with electrical and chemical
impulses to the external environment? All of us have a deep down knowledge of being free of the mechanisms that
generate the impulses. We can vividly see visual images and powerfully experience a multitude of sensations and
feelings. Unlike a television camera or a microphone, we are independently conscious that we are seeing and hearing
the world around us. If something is seen, surely there must be someone who sees it - a ghost, or a soul? But, while
neural impulses pulse through every part of our body, we have the sensation of seeing only when these impulses
impinge on the visual cortex. Nerve impulses in the Heschl gyrus alone cause us to hear sounds. Are these portals into
the soul? This essay suggests that among the myriad pictures evaluated by the nervous system, consciousness involves
a limited group of pictures of which a human being is conscious. Like every group in the nervous system with its own
intelligence, the conscious intelligence may be an independent entity, constituting a group of neural circuits. It may feel
and act as an independent entity. It is suggested that such an intelligence may operate in the region around the
pre-frontal lobe of the cortex.
Pre-frontal regions and a sense of self. The geography of nerve channels pinpoints many functions, which
inter-communicate. While all other regions of the cortex interact mostly within finite regions, the prefrontal lobes have
abundant connections (40) with the association regions of the three sensory lobes. The association regions are known to
perform the most important act of recognising perception. The message of recognition is carried to the prefrontal
regions. These connections may be one to one projections. Recalled memories, recognition of multiple objects and
complex events may travel as pictures to the prefrontal cortex. This region may be the conscious mind that sees and
knows that it sees. Suppose a computer is constructed to receive, categorise and store received sensory images.
Suppose parallel processing enables a second internal system to receive all such information, including its own
operational parameters. The second system may truthfully say "I can see and hear you. My speech mechanism is
functioning at optimum efficiency". An autonomous intelligence in this region may independently evaluate the system
to enhance our impression that we are independent of ourselves. Consciousness and the sense of self may be moulded
by the circuits in the pre-frontal regions.
Consciousness may provide context. The conscious mind receives sensory inputs, feels emotions, recalls memories,
focuses attention, recognises objects and events, visualises and evaluates alternatives, and wills motor activity. But,
while all sensory inputs are monitored, only a small fraction enters consciousness. The motor functions, stored and
recalled by the cerebellum, remain subconscious. Even the act of will does not enter consciousness. Only if the mind
is questioned does it reveal a decision to sit down, or to go to the water cooler. Many feelings which trigger goal events
also may not enter consciousness. From an astronomically large volume of information, and a wide range of options,
intuition forces the elimination of all alternatives to pin down a single choice. While the mind may be processing many
feelings, the conscious mind may experience only a single dominant feeling. The feeling may provide the context for
the recall of a memory of an event. It may provide a file pocket and reference point. A single hook, a focal point, is vital
for context in recalling memories. The conscious mind may provide a critical focusing point for context. Since the
volume of information manipulated by the nervous system is massive, nature may have restricted stored memories to
those entering a limited region of consciousness.
Pre-frontal regions pass judgement. It has been suggested that motor activities may be triggered by feelings. Animals
are known to sustain a wide range of activities including feeding, drinking, apparent satiation and copulatory responses
in a wide range of adverse conditions, in spite of being disconnected from the levels above the thalamus. As such, it is
reasonable to presume that a wide range of feelings which trigger these activities may be generated by levels below the
thalamus. The prefrontal regions appear to generate a different set of feelings. Some years ago, (41) a procedure called
prefrontal lobotomy was applied for patients with intractable pain, or in attempts to modify the behaviour of severely
psychotic patients. The surgery disconnected the prefrontal zones from the regions around the thalamus by cutting
nerve fibre connections. It was noted that such patients were "tactless and unconcerned in social relationships, with an
inability to maintain a responsible attitude". These patients were seen to "lack judgement". Presumably, judgement may
result from the more intellectual feelings triggered by the prefrontal regions.
Cutting off judgement. The geographic differentiation between perception and action is seen in prefrontal lobotomy.
Judgement is a process which evaluates the impact of a proposed course of action. This essay suggests that any
proposed action, even a rude one, will trigger a goal picture. A goal picture is a planned event. The event may be
recognised by the prefrontal area to generate feelings related to its outcome. Normally, a person recognises the impact
of rudeness, to generate a feeling of impropriety. If the limbic system received this message, it may instantly select it as
the current feeling. If the current feeling was negative, the rude action would be instantly inhibited. With pre-frontal
lobotomy, this feeling may not be conveyed to the limbic system. This essay suggests that event recognition by the
prefrontal regions may trigger feelings concerning complex human interactions. Without access to such feelings, the
limbic system may permit the execution of tactless actions. While such intellectual feelings may be generated in the
region of consciousness, the so-called primeval urges may be generated from regions below the thalamus.
When will is bypassed. Even while the system is incredibly sensitive to one's needs, one is aware of the difference
between voluntary and involuntary actions. This essay has suggested that many intelligences operate in the system. The
conscious mind may be one of these. It may appear as the "self" and "the master". The system seeks to be sensitive to
"the needs of the master". But it may not always yield control. Any planned course of action generates a feeling. If it is
acceptable to the system, action is triggered. The limbic system may select the current feeling from a range, including
the "wishes" of the conscious mind. It may have inherited code recognition parameters, which even prohibit the
dominance of self destructive feelings. If a feeling is unacceptable, conscious will may be ignored and the action
inhibited. While an individual may "will" the movement of a limb, such will may be over ruled if it does not conform
to a "WASP" formula. The action should be Worthwhile, Appropriate, Safe and Practical. One gets up out of bed if one
feels it is worthwhile. No ordinary person can will himself to take an action which is inappropriate, unsafe, or
impractical. This can be seen when a person freezes on the high diving board, in spite of his "wish" to dive.
Limited intellectual control. The outcome of a proposed activity may be instantly transmitted to the pre-frontal
regions as a picture in the goal association channel. Recognition triggers related feelings. One wishes to bring one's
knee up. A goal association picture would inform the pre-frontal regions of the outcome of this move. One's wish,
expressed as a feeling, faithfully triggers motor activity through an appropriate goal picture. The knee comes up
dependably. But what happens if one had this ridiculous wish while standing in a crowded lift on the way to the office?
The event recognition picture instantly transmits the impact of this move on a neighbour. The picture would trigger a
powerful feeling that it would be inappropriate. This feeling immediately triggers a goal picture which inhibits such
motor activity. When one sets out to do anything, one instantly knows of the social impact of that action. This
knowledge exists in the prefrontal regions. Evidently, if the pre-frontal regions are disconnected, such controls are
disconnected from the system, and one's activities lack judgement. This may also be the reason why an individual may
sense a lack of control. The conscious mind may reside in a region which has only an advisory role, while major
decisions are taken elsewhere.
Decision by the system. Let us consider the process that converts a judgement into a motor activity - the decision
making process. We don't merely respond to sensory inputs. Beyond mere recognition and evaluation, we have the
powerful ability to initiate, cause, activate, begin, create events. Who initiates all this activity? Is there a free will,
which is exercised by the individual to control his actions? This essay suggests that the consciousness drive continually
triggers activity in the feeling, awareness and goal channels. The most powerful indicator of free will is
one's ability to move one's muscles, or to focus attention. This initiation may be only an automatic mechanism
which merely triggers the next highest priority activity of the system, while there is consciousness. The "initiation"
could merely be a switching process by the limbic system, which selects the most powerful feeling as the current motor
control option. That feeling becomes the will of the mind. The water balance in the body reduces. A feeling of thirst is
triggered. The limbic system switches the feeling in as the highest system priority of the moment. A goal picture is
triggered. The cerebellum assists the cortical decision in a habitual trip to the water dispenser. A series of motor events
meet the goal. Thirst is quenched. A high level goal picture triggers a reminder of the "urgent" file demanding
attention. The next feeling arrives to trigger a quick trip back.
"Will" may be an illusion. One can focus attention wherever one wishes. It is an act obviously seen to be willed by an
individual. This process is controlled by the executive attention centre (EAC). In reality, the process may be the result
of an intuitive search. The creative process demands focus on new contexts to find solutions. An idea or object which
becomes the focus of attention may be contextually the most appropriate in the light of the current goal. The goal
channel may select the focus. Since it precisely meets one's objectives, one is deluded into believing that one "willed"
the focus of attention. Imagine a slave who is so sensitive to his master's needs that he meets them instantly. The master
may believe that his will controls the slave. The truth may be that the slave is voluntarily following the will of the
master. It is one of the key themes of this essay that a pattern recognition system can be so microscopically sensitive to
the demands of the nervous system that its need (will) becomes its command. This sensitivity may give one the illusion
that one is in command of one's body. It may be the equivalent of believing that one controls one's shadow.
Even animals are creative. A search process, which enables the mind to seek information to assist the achievement of
goals may be a powerful subconscious process. Konrad Lorenz (1972) describes a chimpanzee in a room (42) which
contains a banana suspended from the ceiling just out of reach, and a box elsewhere in the room. "The matter gave him
no peace, and he returned to it again. Then, suddenly - and there is no other way to describe it - his previously gloomy
face 'lit up'. His eyes now moved from the banana to the empty space beneath it on the ground, from this to the box,
then back to the space, and from there to the banana. The next moment he gave a cry of joy, and somersaulted over to
the box in sheer high spirits. Completely assured of his success, he pushed the box below the banana. No man watching
him could doubt the existence of a genuine 'Aha' experience in anthropoid apes". This brilliant insight implies that
creative effort is not necessarily a human prerogative, but an essential nervous system process existing in all animals.
Creativity as a pattern recognition process. The mind has the unique ability to question itself. What is to be its next
course of action to meet a particular goal? The act of selecting an option may be considered a genuine act of will. That
act may come from a feeling. An element of uncertainty precedes such a feeling. This is an interim subconscious period
of search. It is suggested that there may be an intelligent search process in the nervous system, which continually
evaluates alternative contexts against a visualised goal image. Goal pictures control ongoing motor activity. A
sequential test of all perceived contexts for an answer to the current objective may merely be another motor activity.
Instead of despatching sequential impulses to manage muscle movements, such impulses may manage a continuing test
of current context against current goals. Such testing may occur constantly in the subconscious, bringing on the "Aha!"
experience of discovery, when a set of imagined events is perceived to meet all the parameters required for achieving a
singular goal.
Creativity from an algorithm. The old adage is that a computer can never be original, since it only spews out what
has been programmed into it. Computers follow algorithms. Creativity of the human mind has been the most powerful
argument against an algorithmic explanation of the mind. But, if a sophisticated computer could keep experimenting in
its memory with multitudes of combinations, with the goal of achieving a desired result, it could arrive at a new and
original solution. A computer can be programmed to "recognise" an "imagined" event which can achieve a specific
goal. The chimpanzee manipulated many images in its mind, chancing on the possibilities following the position of the
box below the banana. It instantly perceived the sequence of events which could achieve its goal. As of this
writing, the memory capabilities of computers and their capacity for manipulating images are woefully limited. Using
its massive memory based on experience, the mind may create myriad images in imagination. Some of these may link
in exotic combinations to create brand new inventions. If a prodigious memory and sensitive pattern recognition is
assumed for the human system, it may explain the development of imaginative and exciting concepts, products and
processes. An algorithmic (and intuitive) recognition process may be primary to this capability.
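Such goal-directed experimentation can be sketched in a few lines. The program below "imagines" executing each possible ordering of its known actions and recognises the ordering which reaches the goal, a toy encoding of the chimpanzee scenario. The state, action and goal names are invented for illustration, not a model of any actual cognitive process.

    # A minimal sketch of goal-directed search: try imagined combinations of
    # known actions until one is recognised as meeting the goal. The toy
    # encoding of the chimpanzee scenario is invented.

    from itertools import permutations

    GOAL = "has_banana"

    # action -> (preconditions, effect)
    ACTIONS = {
        "push_box_under_banana": (set(),                "box_under_banana"),
        "climb_box":             ({"box_under_banana"}, "on_box"),
        "grab_banana":           ({"on_box"},           "has_banana"),
    }

    def achieves_goal(plan):
        """Imagine executing the plan; report whether the goal is reached."""
        state = set()
        for action in plan:
            preconditions, effect = ACTIONS[action]
            if not preconditions <= state:
                return False               # the imagined sequence breaks down
            state.add(effect)
        return GOAL in state

    # Experiment with orderings until one is recognised as meeting the goal.
    for plan in permutations(ACTIONS):
        if achieves_goal(plan):
            print(" -> ".join(plan))       # the "Aha!" moment
            break
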
A wide field of possibilities. Expert systems can assist millions of users to access key information regarding computer
software, which grows more complex by the day. The legal aspects of commercial activities cover taxation, company
law and constitutional law. Speedy access to particular case laws is a vital need for the legal profession. Computer
diagnosis of diseases can assist hospitals, general practitioners and students to find vital information in specialised
fields. Expert systems can guide staff in large organisations which have thousands of pages of manuals concerning
complex procedures. Diagnostics can assist in problems related to machinery and equipment. In all these fields,
existing manuals could be entered into expert systems, if only the process were simple and straightforward.
Simplification of procedures. Traditional expert systems require knowledge engineers, who understand the logical
reasoning in a diagnostic session and can encode this logic into "If, then, else" rules. When the database is large,
questioning priorities may need to be supported by probability estimates of likely enquiries or heuristic assessment of
enquiry directions. Such rule based systems also become complex and intractable when the size of the knowledge base
expands. This section describes an Expert System Shell based on the Intuitive Algorithm (IA). The IA shell requires
merely the categorised entry of data and the design of questions which can identify these categories. The shell isolates
categories, taking uncertainty into account - a question may or may not identify a particular category. The shell avoids
the perennial AI problem of asking stupid questions. The shell prioritises questions and produces answers based on the
IA elimination process.
General terminology. The Shell follows a certain terminology in its diagnostic processes. There are: Objects. Objects
have Properties. Properties suffer Alterations. Alterations are induced by Causes. The Relationship between Causes and
Alterations form Patterns. Causes, Alterations and the Patterns of their Relationships are stored in the memory of an
Expert System. Typical Applications: Object: Person. Property: Health. Alteration: Symptom. Cause: Disease.
Objective: Recognise Disease from an evaluation of Symptoms, using the Pattern of their Relationships. Similarly, an
Object could be a Legal Entity. Property: Freedom. Alteration: Civil Activity. Cause: Legislation, or Case Laws.
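Under this terminology, the knowledge base of an expert system can be pictured as three tables: Alterations with their questions, Causes with their identifying statements, and a matrix of Yes/No/Maybe Relationships between them. The Python sketch below shows one possible layout under these assumptions; the field names and the medical entries are illustrative and do not reflect the shell's actual file format.

    # One possible data model for the shell's terminology. The layout and
    # field names are assumptions for illustration.

    from dataclasses import dataclass, field

    @dataclass
    class ExpertSystem:
        alteration_type: str                 # e.g. "Symptom"
        cause_type: str                      # e.g. "Disease"
        alterations: dict = field(default_factory=dict)   # name -> question
        causes: dict = field(default_factory=dict)        # name -> statement
        # relationships[(alteration, cause)] -> "Y", "N" or "M"
        relationships: dict = field(default_factory=dict)

    medical = ExpertSystem("Symptom", "Disease")
    medical.alterations["Fever"] = "Does the patient have a fever?"
    medical.causes["Influenza"] = "Viral infection of the respiratory tract."
    medical.relationships[("Fever", "Influenza")] = "Y"
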
The Shell Program. An Expert inputs Knowledge into the Shell Program to create a User Program. The User inputs
Y/N answers to onscreen Alteration Queries which help to identify Causes. The general functions are as follows:
*Type Names. A 40 Character Alteration Type Name and Cause Type Name for data entry reference. For
a Medical Program: Alteration Type Name = Symptom. Cause Type Name = Disease. Further references
in the Program will be to Symptom and Disease.
*Alterations. A 20 Character Alteration Name. An 80 Character Question to User. Each screen holds 64
Alteration Entries, so that the Expert can have a global view of the questioning process. A 4000 Character
description screen permits the end user to obtain details concerning the question covered by the Alteration.
All data entry can be edited.
*Causes. A 20 Character Cause Name. An 80 Character Identifying Statement. Each screen holds 64
Cause entries. A 4000 Character description screen permits the end user to obtain details of the Cause. All
data entry can be edited.
*Hypertext. The Shell allows the Expert to create hypertext links between Causes, allowing the User to
search through the database, by clicking on highlighted words.
*Relationships. The Shell screen permits the entry of the Relationship between an Alteration and a Cause.
Yes/No/Maybe entries can be entered with a single keystroke. "Yes" is entered when the Alteration is
positively present for the Cause and absence of the Alteration clearly indicates absence of the Cause. "No"
is entered when the Alteration is absent for the Cause and presence of the Alteration indicates that this
Cause can be eliminated from further consideration. "Maybe" is entered when presence, or absence of the
Alteration does not indicate presence or absence of the Cause.
*Preparation of the expert system. The Shell is designed to enable the Expert to view the global range of
Causes and design Alteration questions which efficiently slice the matrix of Causes in multiple directions.
Other inputs include the Title of the Expert System, Introductory opening screens and Menu screens. Data
in the completed program is compressed and the program is compiled producing a .EXE file.
*User interaction. The User is presented with the option to carry out a word search, a menu search, or an
expert system search. The expert system choice presents the User with a sequence of questions, with
Yes/No/Skip options to arrive at a list of Probable Causes. The User can get further details of each
selected Cause to verify the diagnosis. The User can also backtrack the questioning process and alter the
Y/N/S entries.
*The process. An "Yes" answer eliminates all Causes which have been entered with a "No" relationship
to the Alteration question. A "No" answer eliminates all Causes which have been entered with a "Yes"
relationship to the Alteration question. The program chooses questioning priority by selecting Alteration
with highest number of "Y" relationships. The program also eliminates all Alteration questions, which
have "Y" relationships only to eliminated Causes. When there are less than 4 remaining Causes, the
program presents a list of Probable Causes.
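The elimination loop itemised above can be captured in a short, runnable sketch. The code below implements the described elimination, questioning-priority and question-pruning steps over an invented three-disease table. Because the toy table holds only three Causes, the demonstration lowers the "fewer than 4 remaining Causes" threshold to 2; the data structures and names are assumptions for illustration.

    # A sketch of the IA elimination loop, assuming a relationship table
    # REL[alteration][cause] -> "Y" | "N" | "M". All data is invented.

    REL = {
        "Fever":    {"Influenza": "Y", "Migraine": "N", "Allergy": "N"},
        "Headache": {"Influenza": "M", "Migraine": "Y", "Allergy": "N"},
        "Sneezing": {"Influenza": "Y", "Migraine": "N", "Allergy": "Y"},
    }

    def diagnose(ask, max_causes=4):
        """Eliminate Causes until fewer than max_causes remain.

        ask(question) must return "Y", "N" or "S" (skip)."""
        causes = {c for row in REL.values() for c in row}
        pending = set(REL)                   # questions not yet asked
        while len(causes) >= max_causes and pending:
            # Priority: the Alteration with the highest number of "Y"
            # relationships to the Causes still under consideration.
            question = max(pending,
                           key=lambda a: sum(REL[a][c] == "Y" for c in causes))
            pending.remove(question)
            answer = ask(question)
            if answer == "Y":    # present: eliminate Causes entered with "N"
                causes = {c for c in causes if REL[question][c] != "N"}
            elif answer == "N":  # absent: eliminate Causes entered with "Y"
                causes = {c for c in causes if REL[question][c] != "Y"}
            # "S" (skip) and "M" relationships eliminate nothing: uncertainty
            # is tolerated and elimination proceeds on the remaining answers.
            # Never ask a "stupid question": drop Alterations whose "Y"
            # relationships point only to already-eliminated Causes.
            pending = {a for a in pending
                       if any(REL[a][c] == "Y" for c in causes)}
        return causes                        # the Probable Causes

    # Scripted session: priority asks "Sneezing" first (two "Y" links), then
    # "Fever"; answering Yes then No leaves Allergy as the Probable Cause.
    answers = iter(["Y", "N"])
    print(diagnose(lambda q: next(answers), max_causes=2))   # -> {'Allergy'}
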
Unlimited rules. Since it is not necessary to design complex reasoning chains, there is no theoretical limit to the size of
the database which can be handled by the IA system. Each Cause is eliminated based on a logical relationship. Such
logical relationships are entered as "rules" in the traditional expert system. While such systems will be prone to error
when the number of rules exceeds a thousand, the IA system can accurately work with even a hundred thousand rules.
This opens the possibility of using AI in voluminous subjects which have never been attempted because of the
complexity of rule based expert systems.
Uncertainty. An extremely powerful part of the program is its ability to handle questions which may or may not have
an impact on the outcome. A particular symptom may or may not be present for a disease. The program will still
eliminate those diseases which have a positive or negative impact, depending on the answer. In spite of the uncertainty,
the elimination proceeds with power. The ability to deal logically with uncertainty is an exceptional feature,
rarely present in other types of computer-based logic.
Stupid questions. If an answer clearly indicates the absence of a related disease, a further question which indicates the
disease is called a "stupid question". Traditional expert systems struggle with the problem of trying to avoid stupid
questions. In the IA system, when a Cause is eliminated, the program also eliminates any Alteration question which has
a "Y" relationship only to the Cause. So, the program will never ask a stupid question and the expert does not need to
design the program to cover this eventuality.
Commercial value and optimal size. Speedy access to data has a commercial value in all those areas where people
routinely use computers. Expert systems which use IA can provide a third level of help for commercial computer
programs. The experience of the author is that expert systems, which solve problems in other areas, require an optimal
size to be of value. They should not appear to be toys. Speedy access to all the information in a 400-page manual may
not enthuse users. They may consider such information to be basic. A 3000-page database may be considered more
useful. In India, Constitutional Law can be summed up in about 400 pages. Related case laws may cover 3000 pages. A
law practitioner may consider the extraction of a Constitutional Law Provision as too basic, but would value the
extraction of a related Case Law. An expert system may be planned only for areas of commercial value and should be
of optimal size.
Unutilised potential. AI researchers have tended to focus on the need for codification of knowledge from experts. But
in all commercially viable fields in today's world, expertise is already recorded in research papers, reference books and
manuals. It is more practical to design an expert system from published data and use the expert only to verify the
accuracy of the data and the acceptability of the questions. The scarcity of expert systems
for public use is a clear indication of the rule size limitations, complexity and impracticality of current rule based
expert systems. There is an urgent need for the use of practical AI solutions in thousands of areas for problems which
people encounter in their daily lives.
References
1.The nerve impulse is a sudden change in the permeability of the membrane to sodium ions. The sodium
ions carry a positive charge and displace the potassium ions, raising the voltage. This increased voltage is
carried through the axon in successive steps. The speed is a mere 0.5 to 120 metres per second. A volley of
such nerve impulses are carried by the axon in a single direction only. The Oxford Companion to The
Mind, 1987, Richard L. Gregory, Nervous System, P.W. Nathan, Page 517.
2.Experiments by Karl Lashley in the 1940s showed that the skills learned by rats in maze running could
not be obliterated by removal of particular cortical areas. The results of such ablations were generalised
deficits proportional to the amount and not the region of the cortex removed. The Oxford Companion to
The Mind, 1987, Richard L.Gregory, Memory: Biological Basis, Steven Rose, Page 458.
3.If the touch of a single hair is critical information, all surrounding sensory inputs are shut off to highlight
the message. Similar automatic emphasising of contrasts takes place for visual, auditory and other sensory
inputs. The brain actively participates in closing off irrelevant sensory inputs. Gray's Anatomy, 1989, 37th
Edition, Neural Organisation, Inhibitory Circuits, Page 865.
4.The visual system categorises the perceived images in terms of edges, orientation of lines, and even in
terms of isolation of moving lines. Human Neuroanatomy, 1975, 6th Edition, Raymond C. Truex and
Malcolm B. Carpenter, The Cerebral Cortex, The Primary Visual Area, Page 562-565.
5.The average nerve cell responds within about 5 milliseconds of receiving a message. Gray's Anatomy,
1989, 37th Edition, Physiological Properties of Neurons, Page 878-879.
6.Current understanding is that there is a step by step conversion of dendritic input impulses into output
impulses by a nerve cell. According to this understanding, a neuron has a resting voltage of about 80 mV,
inside negative. This resting voltage can change gradually, by "graded potentials" or suddenly, through
"action potentials". Gradual changes occur across membranes of dendrites, and the cell body. Such
changes can go up, or down. They can inhibit the cell, or trigger an impulse from it. Action potentials
reverse polarity across the membranes of axons. It is an all-or-none response, completed in about 5
milliseconds. Once initiated, the action potential spreads rapidly down the axon, travelling as an impulse and
maintaining a specific frequency. Gray's Anatomy, 1989, 37th Edition, Physiological Properties of
Neurons, Page 879.
7."Of the numerous synaptic terminals clustered on dendrites and soma of a multipolar neuron, some are
excitatory while those from other sources are inhibitory. Depending on the activity or quiescence of such
sources, the ratio of active excitatory and inhibitory synapses continuously varies. Their effects
summate ..., an action potential is generated and spreads along the axon as a nerve impulse." Gray's
Anatomy, 1989, 37th Edition, Neural Organisation, Neurons, Page 864.
8.There are receptors for pressure, touch, pulling and stretching. There's even one to detect hair
movement. Peritrichial receptors are cage-like formations that surround hair follicles. A single axon
receives data from many hair follicles and each follicle reports to two to twenty axons. Some receptor
branches encircle the follicle and others run parallel to its long axis. Nociceptors are free nerve endings
which convert energy from substances released by damaged cells into pain impulses. The Human Nervous
System, 1983, 4th Edition, Murray L. Barr and John A. Kiernan, Introduction and Neurohistology,
Peripheral Nervous System, Cutaneous Sensory Endings, Physiological Correlates, Page 37.
9.Careful stimulation of the proper motor areas can invoke flexion or extension at a single finger joint,
twitching at the corners of the mouth, elevation of the palate, protrusion of the tongue, and even
involuntary cries or exclamations. Human Neuroanatomy, 1975, 6th Edition, Raymond C. Truex and
Malcolm B. Carpenter, The Cerebral Cortex, Efferent Cortical Areas, The Primary Motor Area, Page 571.
10."Everyone knows what attention is. It is the taking possession by the mind, in clear and vivid form, of
one out of what seem several simultaneous objects or trains of thought. Focalisation, concentration of
consciousness are its essence". The Principles of Psychology, 1890, William James. Quoted in: In the
Theater of Consciousness, 1997, Bernard J. Baars, Page 95.
11."In landmark work using cognitive and brain imaging techniques, Michael Posner and his coworkers
recently discovered a network of brain centres involved in visual and executive attention". In the Theater
of Consciousness, 1997, Bernard J. Baars, Page 100.
12."Little is known about the physiology of memory storage in the brain. Some researchers suggest that
memories are stored at specific sites, and others that memories involve widespread brain regions working
together; both processes may be involved". "Memory," Microsoft Encarta 97 Encyclopedia.
13.Long-term potentiation (LTP) is "the enduring facilitation of synaptic transmission that occurs
following the activation of a synapse by high-frequency stimulation of the presynaptic neuron." This
phenomenon (LTP) has been found to occur in the mammalian hippocampus. Researchers believe the
hippocampus to be one of the major brain regions responsible for processing memories. Pinel, J. (1993).
Biopsychology,(2nd Edition) Allyn & Bacon: Toronto.
14.In the early periods of evolution, "Nosebrains" dominated decision making systems of lower
vertebrates. The smell of an object decided whether it was edible and could be consumed. If the odour was
wrong, it was inedible and had to be avoided. The Human Nervous System 1983, 4th Edition, Murray L.
Barr and John A. Kiernan, Introduction and Neurohistology, Telencephalon, Page 8.
15.In the late 1920s, W.B.Cannon published a paper which suggested that emotional behaviour was still
present when the viscera were surgically or accidentally isolated from the central nervous system. Different
emotions had similar patterns of visceral responses. Perceptions of visceral responses were non-specific.
Emotional responses were far quicker than visceral responses. Emotions do not follow artificial
stimulation of visceral responses as a matter of course. The Oxford Companion to The Mind, 1987,
Richard L.Gregory, Emotion, George Mandler, Pages 219-220.
16.Scar tissue in the cerebral cortex is one of the causes of epilepsy. When operating to remove the scar
tissue, the surgeon has to stimulate the brain electrically on the conscious patient to locate the problem
area. Excitation of certain parts of the temporal lobe produces intense fear in the patient. Other parts cause
feelings of isolation, of loneliness or sometimes of disgust. The Oxford Companion to The Mind, 1987,
Richard L.Gregory, Nervous System, P.W.Nathan, Page 527.
17.The septal area has been shown to be a pleasure zone for rats. Experiments were conducted on the
animals with electrodes planted in this area, where they could stimulate themselves by pressing on a
lever. They were observed to continue until they were exhausted, preferring the effect of stimulation to
normally pleasurable activities such as consuming food. The Oxford Companion to The Mind, 1987,
Richard L.Gregory, Centers in The Brain, O.L.Zangwill, Page 129.
18.The limbic system of the brain contains a ring of interconnected neurons containing over a million
fibres connecting the thalamus, the hippocampus, the septal areas and the amygdaloid body. The ring
transmits impulses in both directions. In 1937 Papez postulated that these parts of the brain constitute a
harmonious mechanism which may elaborate functions of central emotion as well as participate in
emotional expression. Bilateral removal of the hippocampal formation and amygdaloid bodies in monkeys
is followed by docility and lack of emotional responses such as fear or anger. The Human Nervous
System, 1983, 4th Edition, Murray L. Barr and John A. Kiernan, Regional Anatomy of the Central
Nervous System, Circuits of the Limbic System, Page 268.
19.Medical experts currently believe the limbic system to be intimately
involved in seeking and capturing prey, courtship, mating, rearing of young, subjective and expressive
elements in emotional responses and the balance between aggressive and communal behaviour. Gray's
Anatomy, 1989, 37th Edition, The Limbic Lobe and Olfactory Pathways, Page 1028.
20."The total number of rods in the human retina has been estimated at 110-125 million and of the cones
at 6.3-6.8 million (Osterberg 1935)." Gray's Anatomy, 1989, 37th Edition, The Visual Apparatus, Page
1197.
21.When mapping activity in the cerebral cortex, the tones heard by the ear were noted to be processed
within a region of the cortex called the Heschl gyrus. This auditory area of the brain receives fibres from
the medial geniculate nucleus in the thalamus. There is a spatial representation in the auditory area with
respect to the pitch of sounds. Tones of different pitch or frequency produce brain signals at measurably
different locations within the Heschl gyrus, laid out like a piano keyboard. A report by
Dr. Christopher Gallen of the Scripps Clinic in La Jolla, California.
22.A study in 1959 by Powell and Mountcastle indicated that a vertical column of cells extending across
all cellular layers in the somatic sensory cortex constitutes the elementary functional cortical unit. The
columns form a barrel, varying in diameter from 200 microns to 500 microns, with a height equal to the
thickness of the cortex. Neurons of one barrel are related to the same receptive field, are activated as a rule
by the same peripheral stimulus and all the cells of a vertical column discharge at more or less the same
latency following a brief peripheral stimulus. A barrel represents a piece of the cortex activated by a single
axon from one of the specific thalamic nuclei. Similar barrels also exist for association and commissural
fibres, which transfer information between different regions of the cortex. Human Neuroanatomy, 1975,
6th Edition, Raymond C. Truex and Malcolm B. Carpenter, The Cerebral Cortex, Sensory Areas of the
Cerebral Cortex, Page 555-556. The Human Nervous System, 1983, 4th Edition, Murray L. Barr and John
A. Kiernan, Regional Anatomy of the Central Nervous System, Histology of the Cerebral Cortex,
Intracortical Circuits, Page 228.
23.In the early forties, Dempsey and Morison reported that repeated electrical stimuli into the
"non-specific" nuclei of the thalamus resulted in widespread activity in the outermost cortical layers. The
activity appeared to be of a "recruiting" nature. In 1960 Jasper again suggested that the synaptic
termination of the fibres of the "non-specific" system in the cortex travels parallel to the surface and is
widely distributed in all layers, but the principal functional processes appear to be within the outermost
layers. Human Neuroanatomy, 1975, 6th Edition, Raymond C. Truex and Malcolm B. Carpenter, The
Cerebral Cortex, Nonspecific Thalamocortical Relationships, Page 582-584.
24.Stephen Kosslyn and Martha Farah have shown extensively that visual imagery elicits activity in the
same parts of the cortex as visual perception (Kosslyn, 1980). In the Theater of Consciousness, 1997,
Bernard J. Baars, Page 74.
25.Throughout the growth of the nervous system, axons grow from one region to another and "map" on to
specific target regions. The Oxford Companion to The Mind, 1987, Richard L.Gregory, Brain
Development, Colwyn Trevarthen, Pages 101-110.
26.The information proceeds from primary areas of the cortex to secondary areas which co-ordinate the
information from similar sensory receptors in the other half of the body. Neurons in the primary areas
connect only to the secondary areas. All secondary areas in both hemispheres of the brain are
interconnected. These areas assist binocular vision and stereo-phonic sound. The association areas receive
information from every other secondary sensory region. The Human Nervous System, 1983, 4th Edition,
Murray L. Barr and John A. Kiernan, Regional Anatomy of the Central Nervous System, Medullary
Center, Internal Capsule and Lateral Ventricles, Medullary Center, Page 242.
27. All sensory inputs are first received in the primary somesthetic area. Electrical stimulation of this area
gives modified tactile senses, such as tingling, or numb sensations. If this area gets damaged, the related
sensory inputs cannot be felt. If the somesthetic area is intact and there is damage in the somesthetic
association area, awareness of general senses persists but significance of information with reference to
previous experience is elusive. It is impossible to correlate the surface texture, shape, size, and weight of
the object or to compare the sensations with previous experience. A patient is unable to identify a common
object such as a pair of scissors held in the hand while his eyes are closed. The Human Nervous System,
1983, 4th Edition, Murray L. Barr and John A. Kiernan, Regional Anatomy of the Central Nervous
System, Functional Localisation in the Cerebral Cortex, The Somesthetic Association Cortex, Page
232-233.
28.Each of the 30,000 motor neurons, which control motor activity, receives approximately 20,000
synaptic contacts. The greatest number are from interneurons in the spinal tract. They run up and down the
spinal pathway and synapse with the motor neurons. The Human Nervous System, 1983, 4th Edition,
Murray L. Barr and John A. Kiernan, Regional Anatomy of the Central Nervous System, Spinal Cord,
Ventral Horn, Page 71.
29.Situated in the brain stem, the reticular formation is an early predecessor to the brain. The reticular
formation is the recipient of data from most of the sensory systems. While damage to most other regions
of the brain causes only selective defects, serious damage to the reticular formation results in prolonged
coma. Cutaneous and olfactory stimuli to the reticular formation appear to be especially important in
maintaining consciousness. The latter stimuli may be the reason for the success of smelling salts in
restoring a person from a fainting fit. Experimental results show that electrical stimulation of the reticular
formation can also induce sleep in animals. While there are processes in the reticular formation which
raise the level of consciousness and alertness, there may be a co-existing process that induces sleep. The
Human Nervous System, 1983, 4th Edition, Murray L. Barr and John A. Kiernan, Regional Anatomy of
the Central Nervous System, Reticular Formation, Page 145, 152.
30.Medical research confirms that the cerebellum is "necessary for smooth, co-ordinated, effective
movement". Gray's Anatomy, 1989, 37th Edition, Cerebellar Dysfunction, Page 978.
31.Terminations of movements are affected by damage to the cerebellum. For a normal person, when the
elbow is made to flex against resistance and the arm is released suddenly, contraction of opposing muscle
fibres prevents overflexion. In cerebellar disease, flexion is uncontrolled and the patient may hit himself in
the face or chest. This is called the "rebound phenomenon". With cerebellar problems, the patient converts
a movement which requires simultaneous actions at several joints into a series of movements, each
involving a single joint. When asked to touch his nose, with a finger raised above his head, the patient will
first lower his arm and then flex the elbow to reach his nose. This problem is called "decomposition of
movement". Human Neuroanatomy, 1975, 6th Edition, Raymond C. Truex and Malcolm B. Carpenter,
The Cerebellum - Functional Considerations, Page 434.
32.Each half of the body is represented in the cerebellar cortex. The cerebellum has an arrangement that
represents all motor control functions spread over its cortical layer, with topographic precision.
Researchers have mapped out localised areas on the cerebellar cortex for the control of leg, arm and facial
movements which they found were identical with tactile receiving areas. Motor and sensory functions
were integrated in the cerebellum. Human Neuroanatomy, 1975, 6th Edition, Raymond C. Truex and
Malcolm B. Carpenter, The Cerebellum - Functional Considerations, Page 439.
33.The only fibres leaving the cerebellar cortex are the axons of a specialised group of neurons called the
Purkinje cells. The Human Nervous System, 1983, 4th Edition, Murray L. Barr and John A. Kiernan,
Regional Anatomy of the Central Nervous System - Cerebellum, Gross Anatomy, Cerebellar Cortex,
Cortical Layers, Page 159.
34.In 1967, V. Braitenberg suggested the possibility of control of sequential events by the cerebellum.
These neural relationships appear to create, in the cerebellum, an accurate biological clock. Impulses in
fibres which link successive Purkinje cells reach the cell dendrites at intervals of about one
ten-thousandth of a second. Alternate parallel rows of Purkinje cells are excited, while the in-between rows
are inhibited. Gray's Anatomy, 1989, 37th Edition, Mechanisms of the Cerebellar Cortex, Page 974.
35.The inferior olivary complex is the source of climbing fibres to all regions of the cerebellar cortex. In
1940 Brodal noted that in young cats and rabbits, all regions of the cerebellar cortex receive exquisitely
marked out projections from the olivary nucleus. Destruction of this olivary neuron branch to the
cerebellar cortex results in severe loss of co-ordination of all movements. Such damage appears to cause
problems very similar to those caused by damage to the cerebellum, even though this bundle of nerves is
only one of the many nerve tracts connecting the cerebellum. Human Neuroanatomy, 1975, 6th Edition,
Raymond C. Truex and Malcolm B. Carpenter, The Cerebellum, Olivocerebellar Fibers, Page 422.
36.Sensory events occurring within a tenth of a second merge into a single conscious sensory experience,
suggesting a 100-millisecond scale. But working memory, the domain in which we talk to ourselves or use
our visual imagination, stretches over roughly 10 second steps. In the Theater of Consciousness, 1997,
Bernard J. Baars, Page 48.
37.Mozart, Wolfgang Amadeus. (Based on his quotation in Hadamard 1945, Page 16). Taken from The
Emperor's New Mind, 1989, Roger Penrose, Page 547.
38.The Principles of Psychology, 1890, William James. Quoted in: In the Theater of Consciousness, 1997,
Bernard J. Baars, Page 130.
39.Homeostasis is the naturally maintained, relatively constant state within the body, maintained in a
changeable environment. It is brought about by various sensing, feedback and control systems, supervised
by a hierarchy of control centres. The frontal cortex, limbic system, hypothalamus, reticular formation and
spinal cord constitute some of the components of this hierarchy. The concept that these centres mediate
these controls is based on a wide base of experimental evidence gathered by studying the impact of
destruction of localised topographical targets in animals. The more of the higher centres that remain connected
to the spinal cord below the transection, the more effective the controls that are retained. With transection below the hypothalamus,
minor reflex adjustments of cardiovascular, respiratory and alimentary systems survive, but are not
integrated and normal temperature is not maintained. With transection above the hypothalamus, separating
it from the limbic system, effective controls are maintained within a moderate range of conditions. Innate
drives and motivated behaviour are preserved, including feeding, drinking, apparent satiation, and
copulatory responses. But such controls fail if environmental stresses exceed a certain range e.g.,
persistently high or low temperatures. Animals may attack, try to eat, drink or copulate with inappropriate
objects. But if the connections between the limbic system and the hypothalamus survive and only the
frontal cortex is cut off, normal homeostasis is preserved even in a wide range of adverse conditions.
Gray's Anatomy, 1989, 37th Edition, Functions of the Hypothalamus, Page 1011.
40.The prefrontal area forms a part of the frontal lobe of the cortex including much of the frontal gyri,
orbital gyri, most of the medial frontal gyrus and the anterior part of the cingulate gyrus. While all other
regions of the cortex communicate mostly within finite regions, the prefrontal lobe has abundant
connections with the association cortex of the three sensory lobes. Human Neuroanatomy, 1975, 6th
Edition, Raymond C. Truex and Malcolm B. Carpenter, The Cerebral Cortex, Prefrontal Cortex, Page 587.
41.Medical evidence suggests that patients with extensive frontal lobe damage show disregard for the
general tenets of behaviour and a marked lack of concentration. Some years ago, a procedure called
prefrontal lobotomy, or leucotomy was widely used, either for patients with intractable pain or in attempts
to modify the behaviour of severely psychotic patients. The basic operation disconnected the prefrontal
area from the lower regions by cutting its nerve fibre connections. Many institutionalised patients were
able to return home and even to resume their former activities. The results of these operations were
evaluated in a number of publications. While there was abolition of morbid anxiety and obsessional states,
Freeman and Watts noted a lessening of the consciousness of self. The patients were "tactless and
unconcerned in social relationships, with an inability to maintain a responsible attitude". Human
Neuroanatomy, 1975, 6th Edition, Raymond C. Truex and Malcolm B. Carpenter, The Cerebral Cortex,
Prefrontal Cortex, Page 588.
42.Lorenz, Konrad, 1972. As quoted in The Emperor's New Mind, 1989, Roger Penrose, Page 551.
Conclusion
Like David Chalmers, Donald Griffin believes that consciousness results from patterns of activity involving thousands or millions of
neurons. Perhaps this claim is ambiguous, but it is a promising lead. If we modeled this neural activity on a machine, could it then
be conscious - at least to some degree? We will have to wait and see.
Turing Machines
During the 1930s-1950s, many researchers debated what was and was not computable, arguing over formal
approaches to computability. In 1937, Alan Turing, the British mathematician now considered the father of computing and
artificial intelligence, sought an answer to this dilemma. He constructed the theory of the Turing machine. The resulting claim (the
Church-Turing thesis) states that
Any effective procedure (or algorithm) can be implemented through a Turing machine.
So what are Turing machines? Turing machines are abstract mathematical entities composed of a tape, a read-write head, and
a finite-state machine. The head can either read or write symbols onto the tape - basically an input-output device. The head can
change its position by moving left or right. The finite-state machine is a memory/central processor that keeps track of which of
finitely many states it is currently in. By knowing which state it is currently in, the finite-state machine can determine which state to
change to next, what symbol to write onto the tape, and which direction the head should move (left or right). (Note: the tape shall be
assumed to be as large as is necessary for the computation it is assigned.) Input on the tape is drawn from some finite alphabet (in
this case 0, 1, and blank). Thus, the Turing machine can do three possible things.
1. It can write a new symbol at its current position on the tape.
2. It can assume a new state.
3. It can move the head one position to either the left or the right.
This machine is (by the Church-Turing thesis) capable of performing any computation. The thesis is not a provable theorem (though it
has yet to be disproved), nor a strictly formal definition; it is based on our intuition of what computation is about.
By understanding what Turing machines can compute, we can also gain a better grasp of the potential of production systems for
computing.
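To make this concrete, below is a minimal sketch of a Turing machine simulator in C++. The transition table shown - a one-state
program that flips every bit until it reaches a blank and then halts - is my own illustration, not a standard example.

#include <iostream>
#include <map>
#include <utility>

// A transition: given (state, symbol) -> (next state, symbol to write, head move).
struct Action { int nextState; char write; int move; }; // move: -1 left, +1 right

int main() {
    // Illustrative program: flip every bit; on a blank, halt (state -1).
    std::map<std::pair<int,char>, Action> program = {
        { {0,'0'}, { 0,'1',+1} },
        { {0,'1'}, { 0,'0',+1} },
        { {0,' '}, {-1,' ', 0} }     // blank: halt
    };
    std::map<int,char> tape = { {0,'1'}, {1,'0'}, {2,'1'} }; // tape grows as needed
    int state = 0, head = 0;
    while (state != -1) {
        char symbol = tape.count(head) ? tape[head] : ' ';
        Action a = program[{state, symbol}];
        tape[head] = a.write;        // 1. write a new symbol
        state = a.nextState;         // 2. assume a new state
        head += a.move;              // 3. move the head left or right
    }
    for (auto& cell : tape) std::cout << cell.second; // prints "010" plus the blank cell
    std::cout << '\n';
}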
AI in Gaming
Artificial Intelligence in games is slowly getting better. With the advent of games like Half-Life and Unreal, even the notoriously
dumb AI engines in first-person shooters are gradually getting more intelligent! Is it due to neglect that games have taken
so long to get half-intelligent enemies? Perhaps, but the incredible complexity of advanced AI engines has also put a
lot of programming groups off the effort and research needed to create one. This essay deals with the techniques often used in
game AI engines, and with possible uses for other AI paradigms.
Finite-state machines are a good way to create a quick, simple, and sufficient AI model for the games they are incorporated in. The
"fun factor" in first-person shooters comes from the sheer numbers of enemies, combined with (in modern ones) stunning 3D
graphics. The new first-person shooters allow for add-ons, like bots. A lot of bots incorporate much more advanced AI techniques,
like A* pathfinding (run on information they have dynamically learnt about the level), and better dodging, jumping and
general fighting code.
There is also a derivative of the state-based engine, used in more complicated games such as flight simulators and
MechWarrior: the goal-based engine. Each entity within the game is assigned a certain goal, be it 'protect base', 'attack
bridge' or 'fly in circles'. As the game world changes, so do the goals of the various entities.
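As a rough sketch (the goal names and update logic are invented for illustration), a goal-based entity can be as simple as an
enumeration plus a routine that re-evaluates it each frame:

// Minimal goal-based entity sketch; each "think" re-evaluates the goal as
// the game world changes, and a separate routine per goal issues orders.
enum Goal { PROTECT_BASE, ATTACK_BRIDGE, FLY_CIRCLES };

struct Entity {
    Goal goal;
    void think(bool baseUnderAttack, bool bridgeStanding) {
        if (baseUnderAttack)     goal = PROTECT_BASE;
        else if (bridgeStanding) goal = ATTACK_BRIDGE;
        else                     goal = FLY_CIRCLES;
    }
};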
Possible AI Applications
AI techniques such as genetic algorithms and neural networks can be applied to gaming, and are increasingly making their
way into some AI engines. Generation5 has interviewed both Steve Grand (the lead programmer of Creatures, a program that
utilizes both neural networks and genetic emulation) and Andre LaMothe (a famous computer programmer and author) about their AI
programming methods.
Genetic Algorithms
Genetic Algorithms are excellent at searching very large problem spaces, and also at evolutionary development. For example, an
idea I was going to implement was to create a large structure of possible traits for Quake II monsters (aggressiveness, probability of
running away when low on health, probability of running away when few in numbers, etc.), then use a genetic algorithm to find the
best combination of these traits to beat the player. So, the player would go through a small level, and at the end, the program
would pick the monsters that fared the best against the player and use those in the next generation. Slowly, after a lot of playing,
the monster population would evolve to counter that particular player's style.
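A minimal sketch of that idea might look like the following; the trait names, mutation size and fitness measure are all
illustrative assumptions:

#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Illustrative monster traits; fitness could be damage dealt before dying.
struct Traits {
    float aggressiveness;
    float fleeWhenHurt;   // probability of running away when low on health
    float fleeWhenAlone;  // probability of running away when few in numbers
    float fitness;
};

// Keep the best half of the population and refill it with mutated copies.
void nextGeneration(std::vector<Traits>& pop) {
    std::sort(pop.begin(), pop.end(),
              [](const Traits& a, const Traits& b) { return a.fitness > b.fitness; });
    for (std::size_t i = pop.size() / 2; i < pop.size(); ++i) {
        pop[i] = pop[i - pop.size() / 2];                       // copy a survivor
        pop[i].aggressiveness += (rand() % 100 - 50) / 500.0f;  // small mutations
        pop[i].fleeWhenHurt   += (rand() % 100 - 50) / 500.0f;
        pop[i].fleeWhenAlone  += (rand() % 100 - 50) / 500.0f;
    }
}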
Neural Networks
Neural networks can be used to evolve the game's AI as the player progresses through the game. LaMothe suggests that a
neural network could be used to decide what fighting moves to make in a 3D fighting game (such as Virtua Fighter). The best
thing about neural networks is that they will continually adapt to suit the player, so even if the player changes his tactics, before long
the network will pick up on it. The biggest problem with NN programming is that no formal definition of how to construct an
architecture for a given problem has been discovered, so producing a network to perfectly suit your needs takes a lot of trial and error.
Conclusion
I'm very pleased with the direction that Artificial Intelligence is heading - it is slowly taking over more and more of the game loop as
computers get quicker. Players who are getting used to network play are looking for intelligent opponents when playing offline too.
As artificial intelligence techniques get formalized and become more mainstream, we can expect to see some excellent games
emerging over the next 3 to 5 years.
Coming soon: Essays on how to apply AI to your games. Hopefully, I'll get a few essays up on minimax trees (I know the above
explanation wasn't the best), A* pathing, and FSMs.
Applications in Music
The Artificial Intelligence applications in music are endless - unfortunately, at present there is very little to show for it. Artificial
Intelligence and music sit at opposite ends of the spectrum, AI being seen as the epitome of computer science, and music the epitome
of art and abstractness.
Dynamic and autonomous music creation has endless possibilities - pieces could be composed in seconds for garden-centre, elevator
and trance music! Computers could provide brilliant 'jam' partners for guitarists and blues/jazz players, helping them develop their
style. Computers could provide piano accompaniment for orchestral practices. The possibilities are endless! Yet how can all of this be
achieved?
Fractal Music
What do fractals have to do with music? Ever since the work of Joseph Schillinger in the 1920s, music has been recognized to have a
chaotic and recursive nature. Many other studies have asked why we find certain music pleasing and other music mere cacophony. It
has been shown that music often has a spectral density of 1/f (the concept of spectral density is unimportant here), and that most
fractals fall into a similar 1/f category too. Fractals have been used to generate music in several ways - you can select a row of the
image and use each pixel to represent a certain note, or create the music in the same order that the fractal is drawn, with each
pixel position representing a certain note. For me, the best example of a fractal-generated piece has been one created from a Mandelbrot
set:
mandel1.mp3 (347Kb)
This is a very chaotic piece, with occasional breaks - nevertheless, it has a certain Eastern tone to it. There is a lot of evidence that
fractal music could provide us with some very real, very entertaining pieces in the near future. If you are interested in finding out
more about fractal music, please see our links section.
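As an illustration of the pixel-to-note idea, here is a small sketch that walks along one row of the Mandelbrot set and maps each
point's escape count onto a major scale. Both the row chosen and the mapping are arbitrary choices of mine, not a standard recipe:

#include <complex>
#include <iostream>

int main() {
    const int scale[7] = {0, 2, 4, 5, 7, 9, 11};   // major scale intervals
    for (double re = -2.0; re <= 0.5; re += 0.05) {
        std::complex<double> c(re, 0.35), z(0.0, 0.0);
        int n = 0;
        while (std::abs(z) < 2.0 && n < 64) { z = z * z + c; ++n; } // escape count
        int midiNote = 60 + scale[n % 7] + 12 * ((n / 7) % 2);      // around middle C
        std::cout << midiNote << ' ';   // one note per "pixel" along the row
    }
    std::cout << '\n';
}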
Automated Transcription
Automated transcription would indeed revolutionize the music industry - imagine putting in a CD, pressing a button and having the
computer create a perfect score of the piece. Transcription can be incredibly difficult, especially for pieces like canons, or
highly layered modern pieces such as the works of Frank Zappa or Steve Vai. The human ear (well, the human brain) has the ability to listen
in on a certain sound, ignoring (or at least paying less attention to) the other sounds. For example, when you listen to a song
you can listen to the words without being distracted by the music, because you can concentrate on the singer's voice. If you want,
though, you can listen to the guitar, bass, or drums without any trouble.
Creating a program able to home in on these sounds is inherently difficult, since we have no idea how the brain is able to
distinguish sounds within sounds. The area of voice recognition may some day lead us to answers, since voice recognition is
basically the study of finding meaning in one sound - automated transcription would be finding 'meaning' (individual instruments) in a
multitude of sounds.
Other Applications
While these two are the main areas of use for Artificial Intelligence in music, as a guitar player I see other areas. Roland released a
software package a few years ago - a MIDI program for guitarists. It displayed MIDI files in terms of piano keys or standard notation,
or showed the fretboard and the positions played (essentially, tablature). Now, the one complaint was that the tablature feature was
terrible, since the notes and positions the program suggested had no 'logical' order. For non-guitarists: on a guitar you can play a note
in several places - in fact, on my guitar I can play a certain note (an E) in 6 different places. Therefore, when playing a piece, you
can play a certain note sequence in different areas, and these areas can make the piece easier or harder to play, depending on the
stretches, jumps and string skipping required. I have often contemplated creating a MIDI-to-TAB program that takes a guitar riff and
uses a genetic algorithm to find the best tablature to play it with.
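A sketch of the fitness side of such a program follows; the representation (one string/fret pair per note) and the penalty weights are
assumptions for illustration. A GA would then breed tablatures that minimize these penalties:

#include <cstddef>
#include <cstdlib>
#include <vector>

struct FretPos { int string; int fret; };   // string 0-5, fret 0-24

// Score a candidate tablature: penalize hand jumps, string skipping and
// awkward high positions. Higher fitness = easier to play.
double fitness(const std::vector<FretPos>& tab) {
    double penalty = 0.0;
    for (std::size_t i = 1; i < tab.size(); ++i) {
        penalty += std::abs(tab[i].fret   - tab[i-1].fret)   * 1.0;  // jumps
        penalty += std::abs(tab[i].string - tab[i-1].string) * 0.5;  // skipping
        if (tab[i].fret > 12) penalty += 0.25;                       // high positions
    }
    return -penalty;
}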
Military Applications of AI
The military and the science of computers have always been incredibly closely tied - in fact, the early development of computing was
virtually exclusively limited to military purposes. The very first operational use of a computer was the gun director used in the
Second World War to help ground gunners predict the path of a plane given its radar data. Famous names in AI, such as Alan Turing,
were scientists heavily involved with the military. Turing, recognized as one of the founders of both contemporary computer
science and artificial intelligence, helped create a machine (called the Bombe, based on previous work done by Polish mathematicians)
to break the German Enigma code.
As computing power increased and pragmatic programming languages were developed, more complicated algorithms and
simulations could be realized. For instance, computers were soon utilized to simulate nuclear escalations and wars or how arms races
would be affected by various parameters. The simulations grew powerful enough that the results of many of these 'wargames' became
classified material, and the 'holes' that were exposed were integrated into national policies.
Artificial Intelligence applications in the West began to be extensively researched after the Japanese announced in 1981 that
they were going to build a 5th Generation computer, capable of logical deduction and other such feats.
Inevitably, the 5th Generation project failed, due to the inherent problems that AI is faced with. Nevertheless, research continued
around the globe to integrate more 'intelligent' computer systems into the battlefield. Emphatic generals foresaw battle by hordes of
entirely autonomous buggies and aerial vehicles - robots that would have multiple goals and whose missions might last for months,
driving deep into enemy territory. The problems in developing such systems are obvious - the lack of functional machine vision
systems has led to problems with object avoidance, friend/foe recognition, target acquisition and much more. Problems also occur in
getting the robot to adapt to its surroundings, the terrain, and other environmental aspects.
Nowadays, developers seem to be concentrating on smaller goals, such as voice recognition systems, expert systems and advisory
systems. The main military value of such projects is to reduce the workload on a pilot. Modern pilots work in incredibly complex
electronic environments - receiving information not only from their own radar, but from many others (the principle behind J-STARS).
Not only is the information load high, but the multi-role aircraft of the 21st century have highly complex avionics, navigation,
communications and weapon systems. All this must be organized in a highly accessible way. Through voice recognition, systems
could be checked, modified and altered without the pilot looking down into the cockpit. Expert/advisory systems could predict what
the pilot wants in a given scenario and automatically decrease the complexity of a given task.
Aside from research in this area, various AI paradigms have been successfully applied in the military field - for example, using an
EA (evolutionary algorithm) to evolve algorithms that detect targets given radar/FLIR data, or neural networks that differentiate
between mines and rocks given a submarine's sonar data. I will look into these two examples in depth below.
Genetic Programming
Genetic programming is an excellent way of evolving algorithms that will map data to a given result when no set formula is known.
Mathematicians and programmers can normally find algorithms to deal with a problem of 5 or so variables, but when the problem
grows to 10, 20 or 50 variables it becomes close to impossible to solve by hand. Briefly, a GP-powered program works by
generating a series of random expression trees that represent various formulas. These trees are then tested against the data;
poor ones are discarded, good ones kept and bred. Mutation, crossover, and all of the other elements of genetic algorithms are used
to breed the 'highest-fitness' tree for the given problem. At best, this tree will perfectly map the variables to the answer; other times
it will generate an answer very close to the desired one. (For a more in-depth look at GP, read the case study.)
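For illustration, here is one minimal way to represent and evaluate such expression trees; this is a generic sketch, not the
representation used by any particular system:

#include <memory>
#include <vector>

// An expression-tree node: either a variable leaf or a binary operator.
struct Node {
    char op;                           // '+', '*', or 'v' for a variable leaf
    int varIndex;                      // which input variable, when op == 'v'
    std::unique_ptr<Node> left, right;

    double eval(const std::vector<double>& vars) const {
        switch (op) {
            case 'v': return vars[varIndex];
            case '+': return left->eval(vars) + right->eval(vars);
            case '*': return left->eval(vars) * right->eval(vars);
        }
        return 0.0;
    }
};
// Evolution then works on a population of such trees: evaluate each against
// the training data, discard the poor ones, and breed the rest via crossover
// (swapping random subtrees) and mutation (replacing a random subtree).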
A notable example of such a program is e, an evolutionary algorithm designed by Steve Smith at SDI. e has been used by SDI to
research algorithms for the radars of modern helicopters such as the AH-64D Longbow Apache and RAH-66 Comanche. e is presented
with a mass of numbers generated by a radar and perhaps a low-resolution television camera or FLIR (forward-looking infra-red)
device. The program then attempts to find (through various evolutionary means) an algorithm to determine the type of vehicle, or to
differentiate between an actual target and mere "noisy" data.
Basically, the EA is fed a list of 42 different variables collected from the two sensors, together with a truth value specifying whether
the test data was clutter or a target. The EA then generates a series of expression trees (much more complicated than those normally
used in GP programs). When a new best program is discovered, the EA uses a hill-climbing technique to get the best possible result
out of the new tree. The tree is then subjected to a heuristic search for further optimization.
Once the best possible tree is found, e will output the program as pseudocode, C, Fortran or Basic.
Once the EA had been trained on the training data, it was put to work on some test data. The results were quite impressive:
Percent Errors
Type       Training Data   Test Data
Radar      2.5%            8.3%
Imaging    2.0%            8.0%
Fused      0.0%            4.2%
While the algorithms performed well on the training data, performance decreased considerably on the test data.
Nevertheless, the fused detection algorithm (using both radar and FLIR information) still achieved a decent error percentage.
An additional plus of this technique is that the EA itself could be programmed into the weapon systems (not just the outputted
algorithm), so that the system could dynamically adapt to the terrain and other mission-specific parameters.
Neural Networks
Neural networks (NNs) are another excellent technique for mapping numbers to results. Unlike the EA, though, they will only output
certain results. A NN is normally pre-trained with a set of input vectors and a 'teacher' to tell it what the output should be for each
given input. The NN can then adapt to a series of patterns. Thus, when fed with information after being trained, the NN will output the
result whose trained input most closely resembles the input being tested.
This was the method that some scientists took to identify sonar sounds. Their goal was to train a network to differentiate between
rocks and mines - a notoriously difficult task for human sonar operators to accomplish.
The network architecture was quite simple: it had 60 inputs, one hidden layer with 1-24 units, and two output units. The output
would be <0,1> for a rock and <1,0> for a mine. The large number of input units was needed to incorporate 60 normalized energy
levels of frequency bands in the sonar echo. What this means is that a sonar echo would be detected and subsequently fed into a
frequency analyzer, which would break down the echo into 60 frequency bands. The energy levels of these bands were measured and
each converted into a number between 0 and 1.
A simple training method (gradient descent) was used as the network was fed examples of mine echoes and rock echoes. After
the network had made its classifications, it was told whether it was correct or not. Soon, the network could differentiate as well
as or better than a human operator.
The network also beat standard data classification techniques. Data classification programs could successfully detect mines
50% of the time by using parameters such as the frequency bandwidth, onset time, and rate of decay of the signals. Unfortunately, the
remaining 50% of sonar echoes do not follow the rather strict heuristics that the data classification used. The network's power
lay in its ability to focus on the more subtle traits of the signal and use them to differentiate.
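As a sketch of the setup just described: the hidden-layer size of 12 and the weight layout below are placeholder assumptions (the
experiments tried 1-24 hidden units), and the weights would of course come from training, not be hand-set.

#include <cmath>
#include <cstddef>
#include <vector>

double sigmoid(double y) { return 1.0 / (1.0 + std::exp(-y)); }

// Compute one layer: weighted sums of the inputs plus a bias, then sigmoid.
std::vector<double> layer(const std::vector<double>& in,
                          const std::vector<std::vector<double> >& w,
                          const std::vector<double>& bias) {
    std::vector<double> out(w.size());
    for (std::size_t j = 0; j < w.size(); ++j) {
        double sum = bias[j];
        for (std::size_t i = 0; i < in.size(); ++i) sum += w[j][i] * in[i];
        out[j] = sigmoid(sum);
    }
    return out;
}
// Usage: feed the 60 normalized band energies through a 60-to-12 hidden
// layer, then a 12-to-2 output layer. A rock is targeted as <0,1> and a
// mine as <1,0>; the output unit with the higher activation wins.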
Conclusion
The applications of AI in the military are wide and varied, yet due to the robustness, reliability, and durability required of most
military programs and hardware, AI is not yet an integral part of the battlefield. As techniques are refined and improved, more and
more AI applications will filter into the war scene - after all, silicon is cheaper than a human life.
Hierarchal AI
Newsgroup: comp.ai.games
From: andrew@cs.uct.ac.za (Andrew Luppnow)
Date: Fri, 2 Dec 1994 10:10:50 +0200 (SAT)
This document proposes an approach to the problem of designing the AI routines for intelligent computer wargame
opponents. It is hoped that the scheme will allow the efficient, or at least feasible, implementation of opponents
which are capable of formulating strategy, rather than behaving predictably according to fixed sets of simple rules.
In the text below, "DMS" is an abbreviation for "decision-making-system". I use the term very loosely to denote
any programming subsystem which accepts, as input, a "situation" and which generates, as output, a "response".
The DMS may be a simple neural network, a collection of hard-coded rules, a set of fuzzy logic rules, a simple
lookup table, or whatever you want it to be! Its most important feature is that it must be SIMPLE and TRACTABLE
- in particular, it must accept input from a small, finite set of possible inputs and generate output which belongs in
a similarly small, finite set of possible outputs.
Some time ago I asked myself how a programmer might begin to implement the AI of a wargame which requires
the computer opponent to develop a sensible military strategy. I eventually realized that simply feeding a SINGLE
decision-making system with information concerning the position and status of each friendly and enemy soldier is
hopelessly inefficient - it would be akin to presenting a general with such information and expecting him to dictate
the movement of each soldier!
But in reality a general doesn't make that type of decision, and neither does he receive information about the
precise location of each soldier on the battlefield. Instead, he receives strategic information from his commanders,
makes strategic decisions and presents the chosen strategy to the commanders. The commanders, in turn, receive
tactical information and make tactical decisions based on (1) that information and (2) the strategy provided by the
general.
And so the process continues until, at the very bottom level, each soldier receives precise orders about what he
and his immediate comrades are expected to accomplish.
The important point is that the whole process can be envisaged in terms of several 'levels'. Each level receives
information from the level immediately below it, 'summarises' or 'generalises' that information and presents the
result to the level immediately above it. In return, it receives a set of objectives from the level above it and uses
(1) this set of objectives and (2) the information from the lower level to compute a more precise set of objectives.
This latter set of objectives then becomes the 'input from above' of the next lower level, and so on. In summary:
information filters UP through the levels, becoming progressively more general, while commands and objectives
filter DOWN through the levels, becoming progressively more detailed and precise.
I decided that this paradigm might represent a good conceptual model for the implementation of the AI procedures
in a complex strategy-based game: a "tree of DMS's" can be used to mimic the chain of command in a military
hierarchy. Specifically, one might use one or more small, relatively simple DMS's for each level. The inputs for a
DMS of level 'k' would be the outputs of a level (k+1) DMS and the information obtained by 'summarising' level
(k-1) information. The outputs of the level k DMS would, in turn, serve as inputs for one or more level (k-1)
DMS's. Outputs of the level zero DMS's would be used to update the battlefield.
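A skeletal rendering of this scheme in code might look as follows; the type and method names are my own, chosen only to mirror
the description above:

#include <vector>

struct Situation { /* summarised view of the level below */ };
struct Objective { /* orders handed down from the level above */ };

// One DMS node in the "tree of DMS's": information filters up through
// summarise(), objectives filter down through command().
struct DMS {
    DMS* parent;                  // level k+1, or null at the top
    std::vector<DMS*> children;   // level k-1

    Situation summarise() {
        Situation s;
        // Combine the children's summaries into a coarser picture for the parent.
        for (DMS* c : children) { Situation cs = c->summarise(); /* merge cs into s */ }
        return s;
    }
    void command(const Objective& fromAbove) {
        // Refine the parent's objective, using local information, into more
        // precise objectives for each child; level-zero DMS's move the units.
        for (DMS* c : children) { Objective o; /* refine fromAbove for c */ c->command(o); }
    }
};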
The main advantage of this scheme is that it allows the "higher levels" of the hierarchy to formulate strategy,
without being overwhelmed by the immense and intractably large number of possibilities which the computer AI
would have to consider if it possessed only information about individual soldiers. Indeed, at the topmost level,
decisions would involve rather abstract options such as
● "direct all military activity towards seizing territory X", or
Summary
The problem with the perceptron is that it cannot express non-linear decisions. The perceptron is basically a linear threshold device
which returns a certain value, 1 for example, if the dot product of the input vector and the associated weight vector plus the bias
surpasses the threshold, and another value, -1 for example, if the threshold is not reached.
When this function - the dot product of the input vector and the associated weight vector plus the bias,
f(x1,x2,...,xn) = w1x1 + w2x2 + ... + wnxn + wb - is graphed in the x1,x2,...,xn coordinate plane/space, one will notice that it is
obviously linear. More than that, however, this function separates the space into two categories: all the input vectors that give a
value of f(x1,x2,...,xn) greater than the threshold fall into one region, and those that do not fall
into another.
The obvious problem with this model, then, is: what if the decision cannot be linearly separated? The failure of the perceptron to learn
the XOR function and to distinguish between even and odd almost led to the demise of faith in neural network research. The solution
came, however, with the development of neuron models that apply a sigmoid function to the weighted sum
(w1x1+w2x2+...+wnxn+wb) to make the activation of the neuron non-linear, scaled and differentiable (continuous). An example of a
commonly used sigmoid function is the logistic function given by o(y)=1/(1+e^(-y)), where y=w1x1+w2x2+...+wnxn+wb. When these
"sigmoid units" are arranged layer by layer, with each layer acting as the input vector for the layer downstream of it, the multilayer
feedforward network is created.
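In code, a single sigmoid unit is only a few lines (a sketch, with invented names):

#include <cmath>
#include <cstddef>
#include <vector>

// One sigmoid unit: the logistic function applied to the weighted sum of
// the inputs plus the bias weight wb.
double sigmoidUnit(const std::vector<double>& x,
                   const std::vector<double>& w, double wb) {
    double y = wb;
    for (std::size_t i = 0; i < x.size(); ++i)
        y += w[i] * x[i];                  // w1x1 + w2x2 + ... + wnxn
    return 1.0 / (1.0 + std::exp(-y));     // o(y): non-linear, scaled, differentiable
}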
Multilayer feedforward networks normally consist of three or four layers: there is always one input layer and one output layer, and
usually one hidden layer, although in some classification problems two hidden layers may be necessary - a rare case, however.
The term 'input layer neurons' is something of a misnomer: no sigmoid function is applied to the value of these neurons. Their raw
values are fed into the layer downstream of the input layer (the hidden layer). Once the neurons of the hidden layer are computed,
their activations are fed downstream to the next layer, until all the activations eventually reach the output layer, in which each output
layer neuron is associated with a specific classification category. In a fully connected multilayer feedforward network, each neuron in
one layer is connected by a weight to every neuron in the layer downstream of it. A bias is also associated with each of these weighted
sums. Thus, to compute the value of each neuron in the hidden and output layers, one takes the weighted sum plus the bias and then
applies the sigmoid function f(sum) to obtain the neuron's activation.
How then does the network learn the problem at hand? By modifying all the weights, of course. If you know calculus then you
might have already guessed that by taking the partial derivative of the network's error with respect to each weight, we learn a
little about the direction in which the error is moving. In fact, if we take the negative of this derivative (i.e. the rate of change of the
error as the value of the weight increases) and add it to the weight, the error will decrease until it reaches a local minimum.
This makes sense: if the derivative is positive, the error is increasing as the weight increases, so the obvious thing to do is to add a
negative value to the weight - and vice versa if the derivative is negative. The actual derivation will
be covered later. Because these partial derivatives are taken and applied starting with the output-to-hidden-layer weights and then
the hidden-to-input-layer weights (as it turns out this order is necessary, since updating a set of weights requires the partial
derivatives calculated in the layer downstream), this algorithm has been called the "backpropagation algorithm".
How is the error of the network computed? In most classification networks, the output neuron that achieves the highest activation is
what the network classifies the input vector to be. For example, if we wanted to train our network to recognize 7x7 binary images of
the numbers 0 through 9, we would expect our network to have 10 output neurons, with each output neuron corresponding to one
number. Thus if the first output neuron is most activated, the network classifies the image (which had been converted to an input
vector and fed into the network) as "0"; for the second neuron, "1"; etc. In calculating the error we create a target vector consisting of
the expected outputs. For example, for the image of the number 7, we would want the eighth output neuron to have an activation of 1.0
(the maximum for a sigmoid unit) and all other output neurons to achieve an activation of 0.0. Now, starting from the first output
neuron and ending at the tenth, calculate the squared error by squaring the difference between the target value (the expected value for
the output neuron) and the actual output value. Take the average of all these squared errors and you have the network error.
The error is squared so as to make the derivative easier to work with.
Once the error is computed, the weights can be updated one by one. This process continues from image to image until the network is
finally able to recognize all the images in the training set.
Training
Recall that training basically involves feeding training samples as input vectors through a neural network, calculating the error of the
output layer, and then adjusting the weights of the network to minimize the error. Each training step involves one exposure of the
network to a training sample from the training set, and one adjustment of the weights of the network, layer by layer. (A full pass
through the whole training set is conventionally called an "epoch".)
Selection of training samples from the training set may be random (I would recommend this method, especially if the training set is
particularly small), or selection can simply involve going through each training sample in order.
Training can stop when the network error dips below a particular error threshold. (This is up to you; a threshold of .001 squared error
is good, but it varies from problem to problem - in some cases you may never even get .001 squared error or less.) It is important to
note, however, that excessive training can have damaging results in problems such as pattern recognition. The network may become
too closely adapted to the samples in the training set, and thus unable to accurately classify samples outside of the training
set. For example, if we overtrained a network with a training set consisting of sound samples of the words "dog" and "cog", the
network might become unable to recognize the word "dog" or "cog" said by an unusual voice unlike the sound samples in the
training set. When this happens we can either include such samples in the training set and retrain, or set a more lenient error
threshold.
These "outside" samples make up the "validation" set, and this is how we assess our network's performance. We cannot expect to
assess network performance based solely on the success of the network in learning an isolated training set; tests must be done to
confirm that the network is also capable of classifying samples outside of the training set.
Backpropagation Algorithm
The first step is to feed the input vector through the network and compute every unit in the network. Recall that this is done by
computing the weighted sum coming into the unit and then applying the sigmoid function. The 'x' vector is the activation of the
previous layer.
The second step is to compute the squared error of the network. Recall that this is done by taking the sum of the squared error of
every unit in the output layer. The target vector involved is associated with the training sample (the input vector).
The third step is to calculate the error term 'delta' of each output unit: delta = o(1 - o)(t - o), where o is the unit's activation and t its
target value.
The fourth step is to calculate the error term of each of the hidden units: delta = h(1 - h) multiplied by the sum, over every unit k
downstream, of w(k) * delta(k), where h is the hidden unit's activation.
The fifth step is to compute the weight deltas: delta-w = eta * delta * x, where x is the activation feeding into the weight. 'Eta' here is
the learning rate. A low learning rate can ensure more stable convergence; a high learning rate can speed up convergence in some
cases.
The final step is to add the weight deltas to each of the weights. I prefer adjusting the weights one layer at a time. This method
involves recomputing the network error before the next weight layer error terms are computed.
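Putting the six steps together, here is a compact sketch for one hidden layer. It assumes the standard error-term formulas for sigmoid
units given above, and the layer sizes and learning rate are placeholders; initialize the weights to small random values before training.

#include <cmath>

const int NIN = 2, NHID = 3, NOUT = 1;
double wHid[NHID][NIN + 1];   // hidden-layer weights; last column is the bias
double wOut[NOUT][NHID + 1];  // output-layer weights; last column is the bias

double sigmoid(double y) { return 1.0 / (1.0 + std::exp(-y)); }

void trainOne(const double x[NIN], const double target[NOUT], double eta) {
    double hid[NHID], out[NOUT];
    // Step 1: feed the input vector through the network.
    for (int j = 0; j < NHID; ++j) {
        double sum = wHid[j][NIN];                              // bias
        for (int i = 0; i < NIN; ++i) sum += wHid[j][i] * x[i];
        hid[j] = sigmoid(sum);
    }
    for (int k = 0; k < NOUT; ++k) {
        double sum = wOut[k][NHID];                             // bias
        for (int j = 0; j < NHID; ++j) sum += wOut[k][j] * hid[j];
        out[k] = sigmoid(sum);
    }
    // Step 2: the squared error, the sum of (t - o)^2 over the output layer,
    // would be accumulated here to monitor training progress.
    // Step 3: error terms of the output units.
    double deltaOut[NOUT];
    for (int k = 0; k < NOUT; ++k)
        deltaOut[k] = out[k] * (1.0 - out[k]) * (target[k] - out[k]);
    // Step 4: error terms of the hidden units, using the downstream deltas.
    double deltaHid[NHID];
    for (int j = 0; j < NHID; ++j) {
        double sum = 0.0;
        for (int k = 0; k < NOUT; ++k) sum += wOut[k][j] * deltaOut[k];
        deltaHid[j] = hid[j] * (1.0 - hid[j]) * sum;
    }
    // Steps 5 and 6: weight deltas (eta * delta * input) added to the weights.
    for (int k = 0; k < NOUT; ++k) {
        for (int j = 0; j < NHID; ++j) wOut[k][j] += eta * deltaOut[k] * hid[j];
        wOut[k][NHID] += eta * deltaOut[k];                     // the bias input is 1
    }
    for (int j = 0; j < NHID; ++j) {
        for (int i = 0; i < NIN; ++i) wHid[j][i] += eta * deltaHid[j] * x[i];
        wHid[j][NIN] += eta * deltaHid[j];
    }
}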
The Neuron
Although it has been proposed that there are anything between 50 and 500 different types of neurons in our brain, they are mostly just
specialized cells based upon the basic neuron. The basic neuron consists of synapses, the soma, the axon and dendrites. Synapses are
connections between neurons - not physical connections, but minuscule gaps across which signals pass from neuron to neuron.
Incoming signals are carried by the dendrites to the soma, which performs some operation and sends out its own electrical signal
along the axon. The axon then distributes this signal to the synapses, where it passes to the dendrites of other neurons, and the
cycle repeats.
Just as there is a basic biological neuron, there is a basic artificial neuron. Each neuron has a certain number of inputs, each of which
has a weight assigned to it. The weights are simply an indication of how 'important' the incoming signal on that input is. The net
value of the neuron is then calculated - the net is simply the weighted sum: the sum of all the inputs multiplied by their specific
weights. Each neuron has its own unique threshold value, and if the net is greater than the threshold, the neuron fires (outputs a 1);
otherwise it stays quiet (outputs a 0). The output is then fed to all the neurons it is connected to.
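As a minimal sketch, the basic artificial neuron just described reduces to a few lines of C++ (the names are invented):

#include <cstddef>
#include <vector>

// The basic artificial neuron: compare the weighted sum of the inputs
// against the neuron's threshold.
int fire(const std::vector<double>& inputs,
         const std::vector<double>& weights, double threshold) {
    double net = 0.0;                        // the net value: the weighted sum
    for (std::size_t i = 0; i < inputs.size(); ++i)
        net += inputs[i] * weights[i];
    return net > threshold ? 1 : 0;          // fires, or stays quiet
}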
Learning
All this talk of weights and thresholds leads to an obvious question: how are all these values set? There are nearly as many
training methods as there are network types (a lot!), but some of the more popular ones include back-propagation, the delta rule and
Kohonen learning.
As architectures vary, so do the learning rules, but most rules can be categorized into two areas - supervised and unsupervised.
Supervised learning rules require a 'teacher' to tell them what the desired output is for a given input. The learning rule then adjusts all
the necessary weights (this can be very complicated in large networks), and the whole process starts again until the data can be
correctly analyzed by the network. Supervised learning rules include back-propagation and the delta rule. Unsupervised rules do not
require a teacher, because they produce their own output, which is then further evaluated.
Architecture
This area of neural networking is the "fuzziest" in terms of a definite set of rules to abide by. There are many types of networks -
ranging from simple boolean networks (Perceptrons), to complex self-organizing networks (Kohonen networks), to networks
modelling thermodynamic properties (Boltzmann machines)! There is, though, a standard network architecture.
The network consists of several "layers" of neurons, an input layer, hidden layers, and output layers. Input layers take the input and
distribute it to the hidden layers (so-called hidden because the user cannot see the inputs or outputs for those layers). These hidden
layers do all the necessary computation and output the results to the output layer, which (surprisingly) outputs the data to the user.
Now, to avoid confusion, I will not explore the architecture topic further. To read more about different neural nets, see the
Generation5 essays.
Even after discussing neurons, learning and architecture, we are still unsure about what exactly neural networks do!
Conclusion
Hopefully, by now you have a good understanding of the basics of neural networks. Generation5 has recently had a lot of information
added on neural networking, both in essays and in programs. We have examples of Hopfield networks, perceptrons (2 example
programs), and even some case-studies on back-propagation. Please browse through the site to find out more!
State Machines
Finite State Machine
A finite state machine (FSM) is a system that has a limited number of states of operation. A real-world example could be a light
switch, which is either on or off, or an alarm clock, which is either idling (telling the time), ringing an alarm, or having its time or
alarm set. Any system that has a limited number of possibilities, where something can be defined by one state (even a combination
of conditions), can be represented as a finite state machine.
Finite state machines are natural for any type of computer program and understanding how to use them effectively to create game
AI is only as hard as understanding the system you are trying to represent, which is as detailed or simple as you make it.
struct GameLevelState {
    int alert;             // Alert status of enemies //
    struct Position shot;  // Position of last shot fired //
    int shotTime;          // Game cycle last shot was fired //
    int hostage;           // Hostage rescued //
    int explosives;        // Explosives set or not //
    int tank;              // Tank destroyed //
    int dialogue;          // Dialogue variable //
    int complete;          // Mission completed //
};
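A sketch of how an enemy's routine might read this shared state each game cycle follows; the 100-cycle threshold and the
behaviours in the comments are invented for illustration.

void UpdateEnemy(struct GameLevelState *state, int currentCycle) {
    if (state->alert && currentCycle - state->shotTime < 100) {
        /* a shot was fired recently: investigate state->shot */
    } else if (state->hostage) {
        /* the hostage has been rescued: guard the exits */
    } else {
        /* nothing has happened: keep patrolling */
    }
}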
Flexibility
Keeping your AI flexible is extremely important. The more modular you make your routines, the more you will be able to expand
them as you go on. It's important to understand that designing a game AI is very much an iterative process; you need to try things
out and build upon them.
The goal in creating a good AI is to have units that react in ways that seem realistic, in an environment they seem to be
interacting with. If you box in what your units will be able to do too early, it will be difficult to expand the breadth of their actions
later on, when you decide to augment the game world to feel more complete or interactive.
Unit Actions
In a game where the player controls units, it is all-important to have meaningful and well organized information on them. Without
this, adapting the units to the players will become difficult and the user's interface with the game could suffer. If the player doesn't
feel he is controlling the units and getting appropriate information back from them, then all he is doing is clicking around in an
interface and all immersive aspects the game held will be lost - meaning the player won't be having any fun and may well become
frustrated.
Anatomy 101
To get a sense of what kind of information you may want to provide to the player, let's take a look at a sample data structure.
struct Character {
    struct Position pos;              // Map position //
    int screenX, screenY;             // Screen position //
    int animDir, animAction, animNum; // Animation direction, action and frame number //
    int rank;                         // Rank //
    int health;                       // Health //
    int num;                          // Number of unit //
    int group;                        // Group number //
};
Grouping
To group or not to group?
If you are creating a First Person Shooter game then it comes as no big surprise that grouping isn't for you. However, if you are
creating a RTS or a game that has the player controlling more than one unit at a time then you have a question to ask yourself.
Do you need your units to act in a coordinated way?
If the answer is yes, then there is a good chance that grouping is for you. If the answer is no there still may be advantages to
grouping but you will have to sort those out on your own as they will no doubt be totally dependent on exactly the kind of actions
you want your units to perform.
Benefits of Grouping
1. Units can move in a formation while accessing only one master list of movement information. The advantage here is that you do
not have to propagate information to every unit in the group when a destination or target changes, as they all get their
movement information from the group source.
2. Multi-unit coordinated actions, such as surrounding a building, can be controlled at a central location instead of at each unit
individually.
One of many
While we are ultimately looking for a group to act as a coordinated system, the system is definitely made up of individual pieces,
and it's important to keep track of what each unit is doing individually so that, when we need the group to break formation and move
about as separate entities with common or individual purposes, we can. For this goal I created a structure similar to the one below.
struct GroupUnit {
    int unitNum;                  // Character number //
    struct Unit *unit;            // Unit character data //
    struct Position waypoint[50]; // Path in waypoints for unit //
    int action[50];               // Actions by waypoints //
    int stepX, stepY;             // Step for individual units, when in cover mode //
    int run, walk, sneak, fire, hurt, sprint, crawl; // Actions //
    int target;                   // Targets for units //
    struct Position targetPos;    // Target position //
};
Explanations of the variables:
The unitNum is the number of the unit in the group. If there is a maximum of 10 units in a group, then there will be 10 possible slots
that could hold units; the first unit would be unitNum 0 and the last unitNum 9.
The unit is a pointer to the unit's character data, which holds information like the character's current position, health and every other
piece of information on the individual. It's important that a unit's vital signs and other information can be monitored from the group,
so that you can easily check whether a group member has been wounded and communicate this to the other members, along with a
myriad of other possibilities.
The waypoint array contains all the places that the unit has to move in a queue. All actions and waypoints are only specified in the
GroupUnit structure if the group is not in a formation and units need to move about on their own.
The action array contains actions that are associated with the movements to waypoints. This allows you to create more detailed
command chains: telling units to sneak across one area and then sprint across another opens up far more strategic,
thought-out movement for the player.
The stepX, stepY information can be used for simple velocity: every frame, move this unit this many world-position units along each axis.
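A minimal sketch of applying those step values each frame (the Position fields and the per-frame update loop are assumptions, not from the article):

#include <stdio.h>

struct Position { int x, y; };

struct CoverUnit {
    struct Position pos;   // World position //
    int stepX, stepY;      // Per-frame step, as in GroupUnit //
};

// Called once per frame while the unit is moving in cover mode. //
void ApplyStep(struct CoverUnit *u)
{
    u->pos.x += u->stepX;
    u->pos.y += u->stepY;
}

int main(void)
{
    struct CoverUnit u = { { 0, 0 }, 2, -1 };
    for (int frame = 0; frame < 3; frame++)
        ApplyStep(&u);
    printf("(%d, %d)\n", u.pos.x, u.pos.y);   // Prints (6, -3) //
    return 0;
}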
Mob mentality
The ultimate goal, of course, is to have a centralized location for as much of the data as possible, to keep lookups and
processing to a minimum and keep things simple. Let's take a look at a sample data structure for doing this.
struct Group {
    int numUnits;                     // Units in group //
    struct GroupUnit unit[4];         // Unit info //
    int formation;                    // Formation information for units and group //
    struct Position destPos;          // Destination (for dest. circle) //
    int destPX, destPY;               // Destination Screen Coords //
    struct Position wayX[50];         // Path in waypoints for group //
    float formStepX, formStepY;       // Formation step values for group movements //
    int formed;                       // If true, find cover and act as individuals;
                                      // otherwise move in formation //
    int action, plan;                 // Group action and plans //
    int run, walk, sneak, sprint, crawl, sniper; // Actions //
    struct Position spotPos;          // Sniper Coords //
    int strategyMode;                 // Group strategy mode //
    int orders[5];                    // Orders for group //
    int goals[5];                     // Goals for group //
    int leader;                       // Leader of the group //
    struct SentryInfo sentry;         // Sentry List //
    struct AIState aiState;           // AI State //
};
The numUnits variable refers to the number of units in the group and the unit array holds the GroupUnit information. For this
particular group, the maximum number of units has been hard-coded to 4.
The formation flag determines what type of formation the group is in. They could be formed in a column, wedge or diamond shape
easily by just changing this variable and letting the units reposition themselves appropriately.
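One way that repositioning might work in practice (a sketch only; the formation constants and the slot-offset table below are hypothetical, not from the article) is for each unit to derive its position from the group's position plus a per-slot offset:

#include <stdio.h>

struct Position { int x, y; };

// Hypothetical formation types //
enum { FORM_COLUMN, FORM_WEDGE, FORM_DIAMOND };

// Per-slot offsets from the group's position, one row per formation. //
static const struct Position formOffset[3][4] = {
    { { 0, 0 }, {  0, -2 }, { 0, -4 }, {  0, -6 } },  // column //
    { { 0, 0 }, { -2, -2 }, { 2, -2 }, {  0, -4 } },  // wedge //
    { { 0,-2 }, { -2,  0 }, { 2,  0 }, {  0,  2 } },  // diamond //
};

// Where a given slot belongs for the group's current formation. //
struct Position FormationSlot(struct Position groupPos, int formation, int slot)
{
    struct Position p = formOffset[formation][slot];
    p.x += groupPos.x;
    p.y += groupPos.y;
    return p;
}

int main(void)
{
    struct Position g = { 10, 10 };
    for (int slot = 0; slot < 4; slot++) {
        struct Position p = FormationSlot(g, FORM_WEDGE, slot);
        printf("slot %d at (%d, %d)\n", slot, p.x, p.y);
    }
    return 0;
}

Changing the formation variable then simply selects a different offset row, and the units drift to their new slots.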
The destPos and destPX, destPY fields all keep track of the final destination of the group and relay that
information quickly to the player. The waypoints and steps work in the same manner as for individuals, except that when units are in a
formation they will all have the same speed so they stay in formation. There is no need to update each unit by its own speed value,
as the group's can be used.
The formed variable is one of the most important as it determines whether the units act in formation or as individuals. The concept
is that if the group is formed then all the units will have the same operations performed on them each cycle. If there is a reason that
they can't all move the same way, such as enemies attacking or a necessary break in formation to get past an obstacle, then the units
need to move on their own.
The actions are the same as for individuals, and you'll notice that there is a sniper variable that is not in the GroupUnit structure, as
there is no reason to have one unit in a group be a sniper while the rest are off running around. It is logical to split that unit into its
own group and then control the sniper activities at the group level. This is the kind of planning you need to do to figure out what
information is best served in what section of your structures.
The strategyMode is a quick variable that determines how the units respond to enemies. Is it aggressive, aggressive with cause,
defensive, or run-on-sight? Having an easy-to-access overview variable that controls basic responses is a good way to cut out a lot of redundant per-unit decision logic.
Putting it together
Now that we have some structures for our groups, what's next? The next step is to figure out how your game's routines are going to
use this information.
It's crucial that you carefully plan for your AI routines to be modular and flexible so that you can add on to them later and easily
call different pieces. The concept here, as in data structure organization, is to only do something in one function, and to make that
function limited so that it does a specific thing. Then if you need to do that thing again you can call the routine that is already tested
and is a known single point of operation on that data. Later if you run into problems with your AI and need to debug it, you don’t
have to go hunting all over the place for the offending routine, because there is only one routine that would operate on that data, or
at least in that manner.
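As an illustration of that single-point-of-operation idea (the function and structure names here are hypothetical, not taken from the article), all destination changes for a group could be forced through exactly one routine:

#include <stdio.h>

struct Position { int x, y; };

struct Group {
    struct Position destPos;   // Destination //
    int numWaypoints;          // Waypoints queued //
};

// The ONLY routine allowed to change a group's destination.   //
// Any bug in destination handling must therefore live here.   //
void GroupSetDestination(struct Group *g, struct Position dest)
{
    g->destPos = dest;
    g->numWaypoints = 0;       // A new order invalidates the old path. //
}

int main(void)
{
    struct Group g = { { 0, 0 }, 3 };
    struct Position d = { 42, 17 };
    GroupSetDestination(&g, d);
    printf("dest (%d, %d), waypoints %d\n", g.destPos.x, g.destPos.y, g.numWaypoints);
    return 0;
}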
Tips
Walk before you run. Learn the basics, create the basics, add more advanced routines for your particular game as you need them.
Never fall into the trap of using a routine because it is popular and everyone else seems to be using it. Often other people are using a
routine because its benefits fit their needs, but it may not fit yours. Furthermore, people often use routines just because they are the
standard, even if they are actually not the best routines for their situation.
Your concern should always be on getting the best results, not having the current fashionable routines; if it works, use it. The game
developer mantra used to be, and always should be, "If it looks right, it is right." Don’t let the people who are interested in
designing real world total physics simulations make you feel bad for creating a simple positional system where you add X to the
position each frame. If that is what works in your particular situation, then do it. Correct physics has its place in some games, but
not all of them and there are other ways of achieving nearly the same results.
You can never build a perfect replica of reality. That is just a fact. So you need to draw your own line on where good enough is and
then make your good enough reality.
Volume II: Unit Goals and Path Finding
See Also:
Artificial Intelligence:Gaming
Movement
Moving, in its most basic form, consists of simply advancing
from one set of coordinates to another set over a period of
time. This can be performed easily by finding the normalized
direction vector to the destination and multiplying it by the
unit's speed and the time since we last calculated the position.
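A minimal sketch of that calculation (float coordinates, a dt in seconds, and all names here are assumptions):

#include <math.h>
#include <stdio.h>

struct Vec2 { float x, y; };

// Move 'pos' toward 'dest' at 'speed' units/second over 'dt' seconds. //
struct Vec2 MoveToward(struct Vec2 pos, struct Vec2 dest, float speed, float dt)
{
    float dx = dest.x - pos.x;
    float dy = dest.y - pos.y;
    float dist = sqrtf(dx * dx + dy * dy);
    float step = speed * dt;

    if (dist <= step || dist == 0.0f)   // Close enough: snap to destination. //
        return dest;

    pos.x += dx / dist * step;          // Normalized direction * distance covered. //
    pos.y += dy / dist * step;
    return pos;
}

int main(void)
{
    struct Vec2 pos = { 0.0f, 0.0f }, dest = { 10.0f, 0.0f };
    pos = MoveToward(pos, dest, 5.0f, 0.5f);   // Half a second at speed 5. //
    printf("(%.2f, %.2f)\n", pos.x, pos.y);    // Prints (2.50, 0.00) //
    return 0;
}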
Because we are working from a mouse based input system, we
don't expect the user to have to make all the movements
around obstacles like they would in a joystick or first-person
shooter. The way to keep the user from having to click their
way around obstacles is to create an action queue so that we
can have more than one action in a row completed. This way if
a path has to avoid an obstacle we can add the additional paths
in front of the final destination to walk the unit around the
obstacle without player intervention.
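One possible shape for such an action queue (a sketch; the fixed-size ring buffer and all names are assumptions):

#include <stdio.h>

struct Position { int x, y; };

#define MAX_ACTIONS 50

struct ActionQueue {
    struct Position waypoint[MAX_ACTIONS];
    int head, count;
};

// Append a waypoint; returns 0 if the queue is full. //
int QueuePush(struct ActionQueue *q, struct Position p)
{
    if (q->count >= MAX_ACTIONS) return 0;
    q->waypoint[(q->head + q->count) % MAX_ACTIONS] = p;
    q->count++;
    return 1;
}

// Pop the next waypoint into *p; returns 0 if the queue is empty. //
int QueuePop(struct ActionQueue *q, struct Position *p)
{
    if (q->count == 0) return 0;
    *p = q->waypoint[q->head];
    q->head = (q->head + 1) % MAX_ACTIONS;
    q->count--;
    return 1;
}

int main(void)
{
    struct ActionQueue q = { { { 0, 0 } }, 0, 0 };
    struct Position detour = { 5, 5 }, final = { 10, 0 }, next;
    QueuePush(&q, detour);   // Extra path added around the obstacle... //
    QueuePush(&q, final);    // ...in front of the final destination.   //
    while (QueuePop(&q, &next))
        printf("move to (%d, %d)\n", next.x, next.y);
    return 0;
}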
Patrolling
Patrolling consists of moving to a series of specified positions in order. When a unit has reached a
destination and has nowhere else to go, we can compare his current position to his list of patrol points and set his new
destination to the point after the one he is closest to.
There isn't a lot to this, but having units moving on the screen, as opposed to standing still and waiting, makes the
world look a lot more alive, gives the units less chance of being snuck up on, and gives them a better chance of catching intruders.
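A sketch of that closest-point logic (the function names and the squared-distance helper are assumptions):

#include <stdio.h>

struct Position { int x, y; };

static int DistSq(struct Position a, struct Position b)
{
    int dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Return the patrol point AFTER the one closest to 'pos', wrapping around. //
struct Position NextPatrolPoint(struct Position pos,
                                const struct Position *points, int numPoints)
{
    int best = 0;
    for (int i = 1; i < numPoints; i++)
        if (DistSq(pos, points[i]) < DistSq(pos, points[best]))
            best = i;
    return points[(best + 1) % numPoints];
}

int main(void)
{
    struct Position route[3] = { { 0, 0 }, { 10, 0 }, { 10, 10 } };
    struct Position unit = { 9, 1 };   // Closest to route[1]. //
    struct Position next = NextPatrolPoint(unit, route, 3);
    printf("patrol to (%d, %d)\n", next.x, next.y);   // (10, 10) //
    return 0;
}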
Obstacle Avoidance
In the example to the left we have a very thin 4 point poly that is
between the unit and the destination. In this case we move away
from the closest vertex to the destination and move several units
away in the perpendicular angle from the unit's collision. This gives
us the buffer space we need to move around the obstacle.
Obstacle avoidance has to be determined by the type of the obstacles
you are going to be providing. In this simple example we are going
to use convex 4 point polygons, which will usually be in a diamond
or square shape. Because of these obstacle limitations we can simply
find the edge closest to the destination which the unit is trying to get to
and move out from the obstacle a little, thereby creating a simple way
to avoid obstacles that works fairly well as long as obstacles don't
get too close together.
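A sketch of one way to compute such a buffer point for a convex four-point obstacle (pushing outward from the obstacle's center through the closest vertex is my own assumption; the article only says to move out perpendicular from the collision):

#include <math.h>
#include <stdio.h>

struct Vec2 { float x, y; };

// Pick the obstacle vertex closest to the destination and return a    //
// detour point pushed 'buffer' units outward from the obstacle's      //
// center through that vertex, creating the buffer space to walk by.   //
struct Vec2 DetourPoint(const struct Vec2 quad[4], struct Vec2 dest, float buffer)
{
    struct Vec2 center = { 0.0f, 0.0f };
    int best = 0;
    float bestD = 1e30f;

    for (int i = 0; i < 4; i++) {
        float dx = quad[i].x - dest.x, dy = quad[i].y - dest.y;
        float d = dx * dx + dy * dy;
        center.x += quad[i].x * 0.25f;
        center.y += quad[i].y * 0.25f;
        if (d < bestD) { bestD = d; best = i; }
    }

    float ox = quad[best].x - center.x, oy = quad[best].y - center.y;
    float len = sqrtf(ox * ox + oy * oy);
    struct Vec2 out = quad[best];
    if (len > 0.0f) {                  // Push outward to create buffer space. //
        out.x += ox / len * buffer;
        out.y += oy / len * buffer;
    }
    return out;
}

int main(void)
{
    struct Vec2 quad[4] = { { 4, 0 }, { 6, 2 }, { 4, 4 }, { 2, 2 } }; // Diamond. //
    struct Vec2 dest = { 10.0f, 2.0f };
    struct Vec2 d = DetourPoint(quad, dest, 1.5f);
    printf("detour via (%.2f, %.2f)\n", d.x, d.y);   // (7.50, 2.00) //
    return 0;
}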
When you want to get into some more advanced path finding
algorithms, you should look into A*, which is a popular algorithm
for finding the shortest path through very maze-like areas. Beyond
A* there are various steering algorithms for gradually moving
around obstacles and other more hard-coded situations such as creating funneling intersections that can be used to
get to different areas of the map.
Targeting Enemies
Targeting for other units will greatly depend on what you want
your player to be doing in your game. You may want your
player's units to automatically fire on enemies they see, so the
player can devote themselves to the big picture. Or you may want
your units to attack only when specifically told to, keeping the
player's attention on the units and their surroundings.
Either way, you will want your enemies to be on the lookout
for the player's units to provide the challenge of on-the-ball
enemies.
In a situation where you have split up your directions 8 ways,
you can assume that for a unit to be facing another unit, within
vision range, they must be either directly in front of the unit or
in one of the adjacent directions. A simple test to determine
whether the unit is within maximum sight distance and is in one of
these three directions can give you good results with a minimal
number of test cases.
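A sketch of that three-direction vision test (the clockwise direction numbering and the use of squared distances are assumptions):

#include <stdio.h>
#include <stdlib.h>

// Eight compass directions, numbered clockwise from north. //
#define NUM_DIRS 8

// True if 'targetDir' is the facing direction or one of its neighbors, //
// and the target is within sight range (both distances squared).       //
int CanSee(int facing, int targetDir, int distSq, int sightRangeSq)
{
    int diff = abs(facing - targetDir) % NUM_DIRS;
    if (diff > NUM_DIRS / 2)
        diff = NUM_DIRS - diff;       // Wrap around, e.g. dir 7 vs dir 0. //
    return diff <= 1 && distSq <= sightRangeSq;
}

int main(void)
{
    printf("%d\n", CanSee(0, 7, 25, 100));   // Adjacent direction, in range: 1 //
    printf("%d\n", CanSee(0, 4, 25, 100));   // Directly behind the unit: 0 //
    return 0;
}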
Of course you will want to add in a test for obstacles to see if the units are blocked; most likely you can use the same
test as for obstacle avoidance, as you usually either have a clear path or not. Adding in height to visibility
testing will of course totally change the nature of these tests, but that is for another article.
Pursuit
Once your enemies have found a target, you won't want them to just wander around aimlessly if they lose sight of
their prey. At this point, though, you need to make a choice about how you wish to handle your searching. Up until
now we have only talked about spotting units based on actually being able to see them. In some cases this may get a
little tricky when pursuing an enemy, so you may opt to cheat and just set the destination of the unit being tracked as
the tracker's destination.
If you wish to keep things more realistic and do less "cheating", then you
need to store the last position the target was seen in to give a place to start
searching for them. For our example we will just take a random search
approach. First you would set the last position the target was seen as the
first destination. Then, as the next destination, you would pick a
point a random distance farther along the direction the target was
from the unit before he lost sight of it.
In this way we assume that the target ran away from the unit and if we are
correct, the unit will hopefully find him quickly after passing the first
destination. In case the unit did not find his target, we can make a back up
plan of setting a patrol at random distances around where the first
destination was. So the unit will go back to the spot his target was last
seen and will walk in a pattern searching for him.
While this doesn't cover a lot of possibilities, it does give us a reasonable
response given the situation.
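A sketch of queuing up that two-step search (the direction encoding and the 3..10 unit range are arbitrary assumptions):

#include <stdio.h>
#include <stdlib.h>

struct Position { int x, y; };

// After losing sight of a target, queue (1) the last seen position   //
// and (2) a point a random distance farther in the direction the     //
// target was from the pursuer.                                        //
void PlanPursuit(struct Position lastSeen, int dirX, int dirY,
                 struct Position out[2])
{
    int dist = 3 + rand() % 8;        // Random 3..10 unit guess. //
    out[0] = lastSeen;                // Start where the target vanished. //
    out[1].x = lastSeen.x + dirX * dist;
    out[1].y = lastSeen.y + dirY * dist;
}

int main(void)
{
    struct Position lastSeen = { 20, 20 }, plan[2];
    PlanPursuit(lastSeen, 1, 0, plan);   // Target was fleeing east. //
    printf("search (%d,%d) then (%d,%d)\n",
           plan[0].x, plan[0].y, plan[1].x, plan[1].y);
    return 0;
}

If neither destination turns up the target, the fallback patrol around the last seen position can be built with the same patrol-point routine shown earlier.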
Conclusion
The secret to implementing all game AI is understanding the cases
you are trying to deal with and what you want the results to look like. If you can picture what you want the
actions to look like and formulate an algorithm to make them turn out that way you are 90% of the way done.
However the last 10%, getting it to work, can easily take 10 times as long as figuring out how to do it…
-Geoff Howland
Lupine Games
The first article: Practical Guide to Building a Complete Game AI: Volume I
AI for Games and Animation: A Cognitive Modeling Approach
by John Funge
Gamasutra, December 6, 1999
Excerpted from AI for Games and Animation (AK Peters, 1999)
Making Them Think
Modeling for computer games addresses the challenge of automating a variety of difficult development
tasks. An early milestone was the combination of geometric models and inverse kinematics to simplify
keyframing. Physical models for animating particles, rigid bodies, deformable solids, fluids, and gases
have offered the means to generate copious quantities of realistic motion through dynamic simulation.
Biomechanical modeling employs simulated physics to automate the lifelike animation of animals with
internal muscle actuators. Research in behavioral modeling is making progress towards self-animating
characters that react appropriately to perceived environmental stimuli. It has remained difficult,
however, to instruct these autonomous characters so that they satisfy the programmer's goals. Hitherto
absent in this context has been a substantive apex to the computer graphics modeling pyramid (Figure 1),
which we identify as cognitive modeling.
Cognitive models go beyond behavioral models, in that they govern what a character knows, how that
knowledge is acquired, and how it can be used to plan actions. Cognitive models are applicable to
instructing the new breed of highly autonomous, quasi-intelligent characters that are beginning to find
use in interactive computer games. Moreover, cognitive models can play subsidiary roles in controlling
cinematography and lighting. See the color plates at the end of this article for some screenshots from
two cognitive modeling applications.
We decompose cognitive modeling into two related sub-tasks: domain knowledge specification and
character instruction. This is reminiscent of the classic dictum from the field of artificial intelligence
(AI) that tries to promote modularity of design by separating out knowledge from control.
knowledge + instruction = intelligent behavior
Domain (knowledge) specification involves administering knowledge to the character about its world and how that world can change.
The situation calculus is the mathematical logic notation we will be using and it has many advantages
in terms of clarity and being implementation agnostic, but it is somewhat of a departure from the
repertoire of mathematical tools commonly used in computer graphics. We shall therefore overview in
this section the salient points of the situation calculus, whose details are well-documented in the book
[Funge99] and elsewhere [LRLLS97,LLR99]. It is also worth mentioning that from a user's point of
view the underlying theory can be hidden. In particular, a user is not required to type in axioms
written in first-order mathematical logic. To this end, we have developed an intuitive high-level
interaction language, CML (Cognitive Modeling Language), whose syntax employs descriptive keywords,
but which has a clear and precise mapping to the underlying formalism (see the book [Funge99], or
the website www.cs.toronto.edu/~funge, for more details).
The situation calculus is an AI formalism for describing changing worlds using sorted first-order logic.
A situation is a "snapshot" of the state of the world. A domain-independent constant s0 denotes the
initial situation. Any property of the world that can change over time is known as a fluent. A fluent is a
function, or relation, with a situation term (by convention) as its last argument. For example,
Broken(x, s) is a fluent that keeps track of whether an object x is broken in a situation s.
Primitive actions are the fundamental instrument of change in our ontology. The sometimes
counter-intuitive term "primitive" serves only to distinguish certain atomic actions from the "complex",
compound actions that we will define later. The situation s' resulting from doing action a in situation
s is given by the distinguished function do, so that s' = do(a,s). The possibility of performing action a
in situation s is denoted by a distinguished predicate Poss(a,s). Sentences that specify what the state
of the world must be before performing some action are known as precondition axioms. For example,
it is possible to drop an object x in a situation s, if and only if a character is holding it:

    Poss(drop(x), s) ⇔ Holding(x, s)
The effects of an action are given by effect axioms. They give sufficient conditions for a fluent to take
on a given value after performing an action. For example, the effect of dropping a fragile object x is
that the object ends up being broken:

    Fragile(x, s) → Broken(x, do(drop(x), s))
Surprisingly, a naive translation of effect axioms into the situation calculus does not give the expected
results. In particular, stating what does not change when an action is performed is problematic. This is
called the "frame problem" in AI. That is, a character must consider whether dropping a cup, for
instance, results in, say, a vase turning into a bird and flying about the room. For mindless animated
characters, this can all be taken care of implicitly by the programmer's common sense. We need to
give our thinking characters this same common sense. They need to be told that they should assume
things stay the same unless they know otherwise. Once characters in virtual worlds start thinking for
themselves, they too will have to tackle the frame problem. The frame problem has been a major
reason why approaches like ours have not previously been used in computer animation or until
recently in robotics. Fortunately, the frame problem can be solved provided characters represent their
knowledge with the assumption that effect axioms enumerate all the possible ways that the world can
change. This so-called closed world assumption provides the justification for replacing the effect
axioms with successor state axioms. For example, the following successor state axiom says that,
provided the action is possible, then a character is holding an object if and only if it just picked up the
object or it was holding the object before and it did not just drop the object:

    Poss(a, s) → [Holding(x, do(a, s)) ⇔ a = pickup(x) ∨ (Holding(x, s) ∧ a ≠ drop(x))]
Character Instruction
We distinguish two broad possibilities for instructing a character on how to behave: predefined
behavior and goal-directed behavior. Of course, in some sense, all of a character's behavior is defined
in advance by the animator/programmer. Therefore, to be more precise, the distinction between
predefined and goal-directed behavior is based on whether the character can nondeterministically
select actions or not.
What we mean by nondeterministic action selection is that whenever a character chooses an action it
also remembers the other choices it could have made. If, after thinking about the choices it did make,
the character realizes that the resulting sequence of actions will not result in a desirable outcome, then
it can go back and consider any of the alternative sequences of actions that would have resulted from a
different set of choices. It is free to do this until it either finds a suitable action sequence, or exhausts
all the (possibly exponential number of) possibilities.
A character that can nondeterministically select actions is usually a lot easier to instruct, but has a
slower response time. In particular, we can tell a cognitive character what constitutes a "desirable
outcome" and leave it to search out a suitable sequence of actions for itself.
Predefined Behavior
There are many convenient techniques we can use to predefine a character's behavior. In this article,
however, we are more interested in techniques for which the character's behavior is not completely
determined in advance. Therefore, we shall not attempt a comprehensive survey of techniques for
predefining behavior. Instead, we shall take a brief look at two particularly popular approaches:
reactive behavior rules, and hierarchical finite-state machines (HFSM).
Reactive Behavior Rules
We will use the term reactive behavior when a character's behavior is based solely on its perception of
the current situation. What we mean by this is that the character has no memory of previous situations
it has encountered. In particular, there is no representation of its own internal state and so it will
always react in the same way to the same input stimuli, regardless of the order in which the inputs are
received. A simple way to encode reactive behavior is as a set of stimulus-response rules. This has a
number of important advantages:
● Although the set of rules might be short, and each of the rules very simple, that doesn't
necessarily mean the behavior that results from the character following the rules is simple at all.
That is, we can often capture extremely sophisticated behavior with some simple rules.
● We can usually evaluate the rules extremely quickly so there should be no problem obtaining
real-time response from our characters.
● There is no need to worry about various knowledge representation issues that arise when
characters start thinking for themselves. That is, the characters are not doing any thinking for
themselves; we have done it all for them, in advance.
The use of reactive behavior rules was also one of the first approaches proposed for generating
character behaviors, and it is still one of the most popular and commonplace techniques. Great
success has been obtained in developing rule sets for various kinds of behavior, such as flocking and
collision avoidance. As an example of a simple stimulus-response rule that can result in extremely
sophisticated behavior, consider the following rule:

    If you can turn left, do so and move forward; otherwise, if you can move forward, do so; otherwise, turn right.
Believe it or not, this simple "left-hand rule" will let a character find its way through a maze. It is an
excellent example of how one simple little rule can be used to generate highly complex behavior. The
character that follows this rule doesn't need to know it is in a maze, or that it is trying to get out. It
blindly follows the rule and the maze-solving ability simply "emerges". Someone else did all the
thinking about the problem in advance and managed to boil the solution down to one simple
instruction that can be executed mindlessly. This example also shows how difficult thinking up these
simple sets of reactive behavior rules can be. In particular, it is hard to imagine being the one who
thought this rule up in the first place, and it even requires some effort to convince oneself that it
works.
We can thus see that despite some of the advantages, there are also some serious drawbacks to using
sets of reactive behavior rules:
● The biggest problem is thinking up the correct set of rules that leads to the behavior we want. It
can require enormous ingenuity to think of the right set of rules and this can be followed by
hours of tweaking parameters to get things exactly right.
● The difficult and laborious process of generating the rules will often have to be repeated, at least
in part, every time we want to effect even a slight change in the resulting behavior.
● Since the behavior rules are deterministic, once an action is chosen, there is no way to
reconsider the choice. There are many cases when a cognitive character could use its domain
knowledge to make a better choice.
It is often easier to write a controller if we can maintain some simple internal state information for the
character. One popular way to do this is with HFSM that we discuss in the next section.
Hierarchical Finite-state Machines (HFSM)
Finite-state machines (FSMs) consist of a set of states (including an initial state), a set of inputs, a set
of outputs, and a state transition function. The state transition function takes the input and the current
state and returns a single new state and a set of outputs. Since there is only one possible new state,
FSMs are used to encode deterministic behavior. It is commonplace, and convenient, to represent
FSMs with state transition diagrams. A state transition diagram uses circles to represent the states and
arrows to represent the transitions between states. Figure 2 depicts an FSM that keeps track of which
compass direction a character is heading each time it turns "left".
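A sketch of the Figure 2 machine in C (encoding it as an enum with a transition table is my own choice; the article presents it only as a diagram):

#include <stdio.h>

// Compass directions, clockwise. //
enum Dir { NORTH, EAST, SOUTH, WEST };

// State transition: each "left" input rotates the heading counterclockwise. //
enum Dir TurnLeft(enum Dir d)
{
    static const enum Dir next[4] = { WEST, NORTH, EAST, SOUTH };
    return next[d];
}

int main(void)
{
    enum Dir d = NORTH;
    const char *name[4] = { "north", "east", "south", "west" };
    for (int i = 0; i < 4; i++) {
        d = TurnLeft(d);
        printf("heading %s\n", name[d]);   // west, south, east, north //
    }
    return 0;
}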
As the name implies, an HFSM is simply a hierarchy of FSMs. That is, each node of an HFSM may itself
be an HFSM. Just like functions and procedures in a regular programming language, this provides a
convenient way to make the design of an FSM more modular. For example, if a character is at
coordinates (x,y), Figure 3 depicts an HFSM that uses the FSM in Figure 2 as a sub-module to calculate
the new cell after turning "left", or moving one cell ahead.
HFSMs are powerful tools for developing sophisticated behavior and it is easy to develop graphical user
interfaces to assist in building them. This has made them a popular choice for animators and game
developers alike.
HFSMs maintain much of the simplicity of sets of reactive-behavior rules but, by adding a notion of
internal state, make it easier to develop more sophisticated behaviors. Unfortunately, they also have
some of the same drawbacks. In particular, actions are chosen deterministically and there is no explicit
separation of domain knowledge from control information. This can lead to a solution which is messy,
hard to understand and all but impossible to maintain. Just like reactive-behavior rules, there can also
be a large amount of work involved if we want to obtain even slightly different behavior from an HFSM.
________________________________________________________
Goal-Directed Behavior
The first step in describing goal-directed behavior is to come up with a way to define a cognitive
character's goals. The situation calculus provides a simple and intuitive theoretical framework to
explain how this can be done. In particular, a character's goals can be expressed in terms of the
desired value of various relevant fluents. A goal can therefore be expressed as a defined fluent, i.e.,
a fluent defined in terms of other fluents. For example, suppose we have two characters, call them
Dognap and Jack, such that Dognap is armed with a gun, and wants to kill Jack. Then, we can state that

    goal(s) ⇔ Dead(Jack, s)

Clearly, Dognap will have achieved this goal in any situation s' for which goal(s') is true. We recall
that any situation is either the initial situation s0, or of the form do(a, s) for some action a and
situation s. Therefore, if goal(s0) is not true, then Dognap must search for a sequence of n actions,
a0, ..., an-1, such that

    goal(do(an-1, ..., do(a1, do(a0, s0)) ...))

is true.
Situation Tree
To explain how characters can automatically search for sequences of actions that meet their goals, we
will introduce the idea of a situation tree. In particular, we can think of the actions and effects as
describing a tree of possible future situations. The root of the tree is the initial situation s0, each
branch of the tree is an action, and each node is a situation. Figure 4 shows an example of a tree with
n actions, a0, a1, ..., an-1.
The value of the fluents at each node (situation) is determined by the effect axioms. Figure 5 shows a
simple concrete example using the Dognap and Jack example, and the corresponding effect axioms,
that we described earlier.
Figure 5 shows a concrete example of a situation tree. A goal situation is a situation in which the fluent
goal is true. For example, in Figure 5 we can see that if the goal is still to kill Jack then there are
many goal situations in the tree.
In general, however, there is no guarantee that a goal situation exists at all.
If a goal situation does exist, then any action sequence that leads to one of the goal situations is called
a plan.
Figure 6 shows a simple abstract situation tree with just three actions, and three goal situations. We
will use this figure to illustrate how a character can search the tree to automatically find a plan (a
path) that leads from the initial situation (the root) to a goal situation. Depending on how we choose
to search the tree we will find different plans (paths). In particular, we can see some common search
strategies being applied. We can see that a bounded depth-first search strategy finds the plan
[a0,a2,a0], whereas a breadth-first search finds [a1,a2].
A breadth-first search tries exhaustively searching each layer of the tree before proceeding to the next
layer. That is, it considers all plans of length 0, then all plans of length 1, etc. Thus, a breadth-first
search is guaranteed to find a plan if there is one. Moreover it will find the shortest such plan.
Unfortunately, a breadth-first search requires an exponential amount of memory as the character has
to remember all the previous searches.
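A sketch of a bounded depth-first search over such a tree, using a toy world where the "situation" is a single integer (all names and the toy goal are assumptions; a real planner would apply effect axioms instead):

#include <stdio.h>

#define NUM_ACTIONS 3
#define MAX_DEPTH   4

// Toy world: each action adds 1..3 to the state; the goal is to reach 5. //
static int Apply(int state, int action) { return state + action + 1; }
static int IsGoal(int state)            { return state == 5; }

// Bounded depth-first search; records the plan in plan[] and returns //
// its length, or -1 if no plan exists within the depth bound.        //
int FindPlan(int state, int depth, int plan[])
{
    if (IsGoal(state)) return 0;
    if (depth == MAX_DEPTH) return -1;
    for (int a = 0; a < NUM_ACTIONS; a++) {
        int len = FindPlan(Apply(state, a), depth + 1, plan + 1);
        if (len >= 0) { plan[0] = a; return len + 1; }   // Found: prepend action. //
    }
    return -1;   // Subtree exhausted: backtrack. //
}

int main(void)
{
    int plan[MAX_DEPTH];
    int len = FindPlan(0, 0, plan);
    printf("plan length %d:", len);
    for (int i = 0; i < len; i++) printf(" a%d", plan[i]);
    printf("\n");
    return 0;
}

Unlike breadth-first search, this needs only memory proportional to the depth bound, at the cost of not necessarily finding the shortest plan.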
In the worst case, the situation tree does not contain any goal situations. If this is the case, then any
exhaustive search algorithm will take an exponential amount of time to respond that there is no plan
available to achieve the goal. This is one of the major limitations of planning and is something we will
look at in more detail in the next section. In the meantime, we mention that looking for different
search algorithms is an important topic in AI research and the interested reader should consult the
further reading section. One of the most interesting new developments is the use of stochastic search
algorithms.
It should also now be apparent how choosing actions nondeterministically entails searching for
appropriate action sequences in a search space that potentially grows exponentially. This corresponds
to the usual computer science notion of computational complexity. Another interesting point to note is
that CPU processing power is also growing exponentially. Therefore, according to Moore's law, our
computer characters can be expected to be able to search one layer deeper in the situation tree every
eighteen months or so.
________________________________________________________
The Middle Ground
As we explained, for predefined behaviors the character doesn't have to do any searching for actions
that achieve its goals. It simply follows the instructions it was given and ends up at a goal situation.
In effect, for a given set of inputs, the path through the tree of possible situations has been
determined in advance. If the predefined behaviors were defined properly, then the path that they
specify through the tree will lead to a goal situation.
In this section, the question we want to ask is whether there is some middle ground between asking the
character to do all the work at run-time and asking the programmer to do all the work at compile time.
In particular, consider that on the one hand we have predefined behavior, which corresponds to a single
path through the situation tree, and on the other hand we have goal-directed behavior, which corresponds
to searching the whole tree. Clearly, the middle ground has to be searching some subset of
the tree.
Note that this "middle ground" is still technically goal-directed behavior, but we now have control over
how much nondeterminism is allowed in the behavior specification. Only in the limiting case, when we
have removed all the nondeterminism, does the behavior reduce to deterministic predefined behavior.
Precondition Axioms
Although we might not have realized it, we have already seen one way to exclude parts of the
situation tree from the search space. In particular, precondition axioms prune off whole chunks of the
tree by stating that not all actions are possible in all situations. Figure 7 shows an example of an
abstract tree in which it is not possible to do an action a2 because an action a1 changed something
which made it impossible.
While preconditions are important for cordoning off parts of the situation tree, they are a clumsy way
to try and coerce a character to search a particular portion of the tree. In particular, we need a way to
give a character general purpose heuristics to help it find a goal faster. For example, we might want to
give the character a heuristic that will cause it to look at certain groups of actions first, but we do not
want to absolutely exclude the other actions.
We would like to provide a character with a "sketch plan" and have it responsible for filling in the
remaining missing details. In this way, we salvage some of the convenience of the planning approach
while regaining control over the complexity of the planning tasks we assign the character. We will
show how we can use the idea of complex actions to write sketch plans.
The actions we discussed previously, defined by precondition and effect axioms, are referred to as
primitive actions. (The term "primitive action" is only meant to indicate an action is an atomic unit,
and not a compound action. Unfortunately, the term can be misleading when the action actually refers
to some sophisticated behavior, but we will stick with the term as it is widely used in the available
literature). Complex actions are abbreviations for terms in the situation calculus; they are built up
from a set of recursively defined operators. Any primitive action is also a complex action. Other
complex actions are composed using various operators and control structures, some of which are
deliberately chosen to resemble a regular programming language. When we give a character a
complex action a, there is a special macro Do that expands a out into terms in the situation calculus.
Since complex actions expand out into regular situation calculus expressions, they inherit the solution
to the frame problem for primitive actions.
Complex actions are defined by the macro Do(a,s,s'), such that s' is a situation that results from doing the
complex action a in situation s. The complete list of operators for the (recursive) definition of Do is given
below. Together, the operators define an instruction language we can use to issue directions to
characters. The mathematical definitions can be difficult to follow, and the reader is encouraged to
consult the book [Funge99], in which we explain the basic ideas more clearly using numerous
examples of complex actions (note there are two freely available implementations of complex actions
that can be studied for a more practical insight into how the macro expansion works--see
www.cs.toronto.edu/~funge/book).
The macro expansion Do(a,s,s') specifies a relation between two situations s and s', such that s' is a
situation that results from doing the complex action a in situation s. In general, there is not a unique
s', so if we have some initial situation s0, a complex action "program", and a bunch of precondition
and effect axioms, then Do(program, s0, s') specifies a subset of the situation tree. Figure 8 shows a
quick example of how a complex action can be used to limit the search space to some arbitrary subset
of the situation tree. The other thing we can see from the figure is that the mathematical syntax can
be rather cryptic. Therefore, in the appendix, we introduce some alternative syntax for defining
complex actions that is more intuitive and easy to read.
On its own, just specifying subsets of the situation tree is not particularly useful. Therefore, we would
normally explicitly mention the goal within the complex action. We shall see many examples of this in
what follows. For now, suppose the complex action "program" is such a complex action. If we can find
any s' = do(an-1, ..., do(a1, do(a0, s0)) ...) such that Do(program, s0, s') holds, then the plan of
length n, represented by the actions a0, ..., an-1, is
the behavior that the character believes will result in it obtaining its goals. Finding such an s' is just a
matter of searching the (pruned) situation tree for a suitable goal situation. Since we still end up
searching, research in planning algorithms is just as relevant to this section as to the straight
goal-directed specification section.
Implementation
Note that we defined the notion of a situation tree to help us visualize some important ideas. We do
not mean to suggest that in any corresponding implementation there need be (although, of
course, there may be) any data structure that explicitly represents this tree. In particular, if we
explicitly represent the tree, then we need a potentially exponential amount of memory. Therefore, it
makes more sense to simply build portions of the tree on demand, and delete them when they are no
longer required. In theorem provers and logic programming languages (e.g., Prolog), this is exactly
what happens continually behind the scenes.
Logic programming languages also make it straightforward to under-specify the domain knowledge.
For example, it is perfectly acceptable to specify an initial state that contains a disjunction, e.g.
OnTable(cup,s0) ∨ OnFloor(cup,s0). Later on, we can include information that precludes a previously
possible disjunct, and the character will still make valid inferences, without us having to go back and
revise anything we stated earlier.
A Simple Tutorial Example: Maze Solving
We already looked at some predefined behavior for solving a maze. Let's take a look at a goal-directed
approach to the problem. Of course, since there are well-known predefined behaviors for maze solving,
we would not suggest using a goal-directed approach in a real application. Therefore, this section is
simply meant as a tutorial example to show how some of the different pieces fit together.
Let us suppose we have a maze defined by a predicate Free(c), that holds when, and only when, the grid
cell c is "free". That is, it is within range and is not occupied by an obstacle:

    Free(c) ⇔ ¬Occupied(c) ∧ 0 ≤ x < sizex ∧ 0 ≤ y < sizey, where c = (x, y)

Occupied(c), sizex, and sizey each depend upon the maze in question. In addition, there are two
maze-dependent constants, start and exit, that specify the entry and exit points of a maze. Figure 9
shows a simple maze and the corresponding definition.
We also need to define some functions that describe a path within the maze. We say that the adjacent
cell "North" of a given cell is the one directly above it, similarly for "South", "East", and "West".
Figure 10 shows the possible directions a character can move when in two different situations.
A fluent is completely specified by its initial value and its successor-state axiom. For example, the
initial position is given as the start point of the maze and the effect of moving to a new cell is to
update the position accordingly.
So for example, in Figure 9, if the character has previously been to the locations marked with the filled
dots, and in situation s the character moves north to the unfilled dot, then we have that position(s) =
(2,0) and that position(do(move(north), s)) = (2,1).
The list of cells visited so far is given by a defined fluent. It is defined recursively on the situation to
be the list of all the positions in previous situations (we use standard Prolog list notation).
We have now completed telling the character everything it needs to know about the concept of a
maze. Now we need to move on and use complex actions to tell it about its goal and any heuristics
that might help it achieve those goals. As a first pass, let's not give it any heuristics, but simply
provide a goal-directed specification of maze-solving behavior. Using complex actions we can express
this behavior elegantly as follows:

    while (position ≠ exit) do (π d) move(d)
Just like a regular "while" loop, the above program expands out into a sequence of actions. Unlike a
regular "while" loop, it expands out, not into one particular sequence of actions, but into all possible
sequences of actions. The precondition axioms that we previously stated, and the exit condition of the
loop, define a possible sequence of actions. Therefore, any free path through the maze, which does
not backtrack and ends at the exit position, meets the behavior specification.
In the initial situation we have position(s0) = start, and start ≠ exit. Thus the guard of the "while" loop
holds and we can try to expand the nondeterministic choice of direction. However, from the action
preconditions for move and the definition of the maze, we can see that this leaves us with
s = do(move(north), s0) ∨ s = do(move(east), s0). That is, there are two possible
resulting situations. That is why we refer to this style of program as nondeterministic.
In contrast, in situation s = do(move(north), s0) there is only one possible resulting situation: we have
Do((π d) move(d), s, s') expanding out into s' = do(move(north), s).
If we expand out the macro in full, we enumerate every possible sequence of moves.
So, as depicted in Figure 11, our "program" does indeed specify all paths through the maze.
Although we disallow backtracking in the final path through the maze, the character may use
backtracking when it is reasoning about valid paths. In most of the mazes we tried, the character can
reason using a depth-first search to find a path through a given maze quickly. For example, Figure 12
shows a path through a reasonably complicated maze that was found in a few seconds.
To speed things up, we can start to reduce some of the nondeterminism by giving the character some
heuristic knowledge. For example, we can use complex actions to specify a "best-first" search
strategy. In this approach, we will not leave it up to the character to decide how to search the possible
paths, but constrain it to first investigate paths that head toward the exit. This requires extra lines of
code, but could result in faster execution.
For example, suppose we add an action goodMove(d), such that it is possible to move in a direction d
if it is possible to "move" to the cell in that direction and the cell is closer to the goal than we are now.
Now we can rewrite our high-level controller as one that prefers to move toward the exit position
whenever possible.
At the extreme, there is nothing to prevent us from coding in a simple deterministic strategy such as
the "left-hand" rule. For example, if we introduce a defined fluent dir that keeps track of the direction
the character is traveling, and a function ccw that returns the compass direction counterclockwise to
its argument, then the following complex action implements the left-hand rule.
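For comparison, here is a sketch of the same left-hand rule in plain C (the grid encoding, the maze, and the helper names are all my own assumptions, not the article's CML formulation):

#include <stdio.h>

enum Dir { NORTH, EAST, SOUTH, WEST };

static const int DX[4] = { 0, 1, 0, -1 };   // Per-direction x steps. //
static const int DY[4] = { 1, 0, -1, 0 };   // Per-direction y steps. //

static enum Dir CCW(enum Dir d) { return (enum Dir)((d + 3) % 4); }
static enum Dir CW(enum Dir d)  { return (enum Dir)((d + 1) % 4); }

// 5x5 maze, 1 = free cell. //
static const int FREE[5][5] = {
    { 1,1,1,0,1 }, { 0,0,1,0,1 }, { 1,1,1,1,1 }, { 1,0,0,0,1 }, { 1,1,1,1,1 },
};
static int IsFree(int x, int y)
{
    return x >= 0 && x < 5 && y >= 0 && y < 5 && FREE[y][x];
}

// Left-hand rule: prefer the counterclockwise direction; otherwise //
// keep rotating clockwise until an adjacent cell is free, then step. //
void Step(int *x, int *y, enum Dir *dir)
{
    enum Dir d = CCW(*dir);
    while (!IsFree(*x + DX[d], *y + DY[d]))
        d = CW(d);
    *x += DX[d];
    *y += DY[d];
    *dir = d;
}

int main(void)
{
    int x = 0, y = 0;
    enum Dir dir = NORTH;
    for (int i = 0; i < 8; i++) {
        Step(&x, &y, &dir);
        printf("(%d,%d)\n", x, y);
    }
    return 0;
}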
The important point is that using complex actions does not rule out any of the algorithms one might
consider when writing the same program in a regular programming language. Rather, it opens up new
possibilities for high-level specifications of behavior at a cognitive level of abstraction.
_______________________________________________________________
Discussion
Complex actions provide a convenient tool for giving a character "advice" in the form of heuristic rules
that will help it solve problems faster. In general, the search space will still be exponential, but
reducing the search space can make the difference between a character that can plan 5 steps ahead, say,
and one that can plan 15 steps ahead. That is, we can get characters that appear a lot more intelligent.
The possibility also exists for incremental refinement of a specification, perhaps from a high-level
specification to the point where it more closely resembles a controller written using a conventional
imperative programming language. That is, we can quickly create a working prototype by relying
heavily on goal-directed specification. If this prototype is too slow, we can use complex actions to
remove more and more of the nondeterminism. If required, we can even do this to the point where the
behavior is completely predefined.
To sum up, if we can think of, or look up, a simple predefined way to produce the behavior we are
interested in, then it makes a lot of sense to use it. This is especially so if we don't think the behavior
will need to be modified very often, or at least if the anticipated modifications are minor ones. It is not
surprising, therefore, that a lot of simple reactive behavior is implemented using simple reactive
behavior rules. For simple reactive behavior, like collision avoidance, it is not hard to think of a small
set of reactive behavior rules that will do the job. Moreover, once we have this set of rules working, it
is unlikely that we will need to modify it.
We have tried to make it clear that one type of behavior can be implemented using a variety of
techniques. We have, therefore, chosen not to classify behavior according to what the character is
trying to achieve, but rather on the basis of the technique used to implement it. The reader should
note however that some others do try to insist that behavior in the real world is of a certain type, and
its virtual world counterpart must therefore be implemented in a particular way. Unfortunately, this
leads to lots of confusion and disagreement among different research camps. In particular, there are
those who advocate using predefined behavior rules for implementing every kind of behavior, no
matter how complex. In the sense that, given enough time and energy it can be done, they are
correct. However, they are somewhat like the traditional animator who scoffs at the use of physical
simulators to generate realistic-looking motion. That is, to the traditional animator a physical simulator
is an anathema. She has an implicit physical model in her head and can use this to make realistic
motion that looks just as good (if not better), and may only require the computer to do some simple
"inbetweening". Compared to the motion that needs a physical simulator to execute, the key-framed
approach is lightning fast. If we could all have the skill of a professional animator there would not be
so much call for physical simulators. Unfortunately, most of us do not have the skill to draw
physically-correct looking motion and are happy to receive all the help we can get from the latest
technology. Even artists who can create the motion themselves might prefer to expend their energies
elsewhere in the creative process.
In the same vein, many of us don't have any idea of how to come up with a simple set of
stimulus-response rules that implement some complex behavior. Perhaps, we could eventually come
up with something, but if we have something else we'd rather do with our time it makes sense to get
the characters themselves to do some of the work for us. If we can tell them what we want them to
achieve, and how their world changes, then perhaps they can figure it out for themselves.
We should also point out that there are those who advocate a cognitive modeling approach for every
kind of behavior, even simple reactive ones. This view also seems too extreme as, to coin a phrase,
there is no point "using a sledgehammer to crack a nut". If we have a simple reactive behavior to
implement, then it makes sense to look for a simple set of predefined rules. Also, if lightning-fast
performance is an absolute must, then we might be forced to use a predefined approach, no matter
how tough it is to find the right set of rules.
Notes
For some basic information on FSMs see [HU79]. For more in-depth information on predefined behavior
techniques, consult [Maes90, BBZ91, Tu99]. There are even some commercial character development packages
that use HFSMs to define character behavior. See [Nayfeh93] for a fascinating discussion on maze-solving
techniques. Many of the classic papers on planning can be found in [AHT90]. See [SK96] for some work on
the use of stochastic techniques for planning. Prolog is the best known nondeterministic programming
language and there are numerous references, for example see [Bratko90].
The complex action macro expansion is closely related to work done in proving properties of computer
programs [GM96]. Our definitions are taken from those given in [LRLLS97]. A more up-to-date version,
that includes support for concurrency, appears in [LLR99]. See [Stoy77] for the Scott-Strachey least
fixed-point definition of (recursive) procedure execution.
References
[AHT90] J. Allen, J. Hendler, and A. Tate, editors. Readings in Planning. Morgan Kaufmann, San
Mateo, CA, 1990.
[BBZ91] N.I. Badler, B.A. Barsky, and D.Zeltzer, editors. Making Them Move: Mechanics, Control, and
Animation of Articulated Figures. Morgan Kaufmann, San Mateo, 1991.
[Bratko90] I. Bratko. PROLOG Programming for Artificial Intelligence. Addison Wesley, Reading, MA,
1990.
[Funge99] J. Funge. AI for Games and Animation: A Cognitive Modeling Approach. A. K. Peters. Natick,
MA, 1999.
[GM96] J. A. Goguen and G. Malcolm. Algebraic Semantics of Imperative Programs. MIT Press,
Cambridge, MA, 1996.
[HU79] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and
Computation. Addison-Wesley, Reading, MA, 1979.
[LLR99] Y. Lespérance, H. J. Levesque, and R. Reiter. A Situation Calculus Approach to Modeling and
Programming Agents. In A. Rao and M. Wooldridge, editors, Foundations and Theories of Rational
Agency. Kluwer, New York, 1999. (See also: www.cs.toronto.edu/cogrobo)
[LRLLS97] H. Levesque, R. Reiter, Y. Lespérance, F. Lin, and R. Scherl. Golog: A Logic Programming
Language for Dynamic Domains. Journal of Logic Programming, 31:59-84, 1997.
[Maes90] P. Maes (editor). Designing Autonomous Agents: Theory and Practice from Biology to
Engineering and Back. MIT Press, Boston, 1990.
[Nayfeh93] B. A. Nayfeh. "Using a Cellular Automata to Solve Mazes." Dr. Dobb's Journal, February
1993.
[SK96] B. Selman and H. Kautz. "Knowledge compilation and theory approximation." Journal of the
ACM, 43(2):193-224, 1996.
[Stoy77] J. E. Stoy. Denotational Semantics: The Scott-Strachey Approach to Programming Language
Theory. MIT Press, Cambridge, MA, 1977.
[Tu99] X. Tu. Artificial Animals for Computer Animation: Biomechanics, Locomotion, Perception, and Behavior. Springer-Verlag, 1999.
________________________________________________________
See Also:
Artificial Intelligence:Gaming
AI In Empire-Based Games
Courtesy of Amit Patel
http://www-cs-students.stanford.edu/~amitp
points for star systems owned, followed by planets, and then ships/defenses/factories. [So an obvious decision-weight
factor comes to mind: conquering a system is higher priority than building more ships, unless you've got lots of
ships.]
In an enemy system, one must first destroy the protecting fleet/defenses. Then you must destroy the enemy troops
occupying the planets. Every turn you have un-conquered planets, the enemy can destroy your ships, possibly
reducing the occupation fleet enough that the system overthrows your rule.
So a typical game for me starts out scouting nearby systems while building up my fleet. I try to find the nearest
'neutral' (non-player-occupied) system that has high defenses (usually an indication of a large number of existing
factories; since it costs 5 points per *existing* factory to build a new one, the more already in a system, the better).
Or if there are any nearby enemy systems I send raid fleets to get points to build with [the player's home system has
no production limit; that is if you have extra points you can build as many of X with those points as you can,
whereas other systems can only build as many X as they have factories].
One of the tricks the computer opponent might do is to wreck factories to build stealthships. Since production in the
home system is not limited by number of factories, 1 factory can build several hundred stealthships from the points
recovered by wrecking the other factories. Then the computer can easily conquer several nearby systems, and use
those systems' factories to build. The computer opponent only seems to do this early in the game if there are lots of
nearby neutral systems. I haven't decided why the opponent decides to wreck factories later in the game.
A weakness of the computer opponent is to send most of the fleet to attack a new system, leaving an old system
relatively unprotected. If the computer has a small enough fleet, it's possible to occupy the old system with little fear
of successful return take-over.
There are some other parts of the game, but that's pretty much it in a nutshell. The authors have produced a Windows
version that has different rules for some of the above (e.g., it's harder to raid). Part of my motivation for making my
own version is that I think their Windows interface is a dog, I'd like to learn, and the version I have has some
annoying bugs (like with a large game [26 star systems, 5 players] the game will tend to have field overwrite
problems, so that all of a sudden one player has got -32000 ships and is completely unconquerable).
Some potential additions include having systems that are rich in metals versus good crop planets, taking the time to
mine planets, colonization versus conquest, spy satellites, more ship types, trade, diplomacy, etc. But I'd like to get
my clone working first and then extend it.
>I, too, am creating space strategy game. Only part working 100% now is computer AI, :).
Care to share details? I'm rather lost when it comes to the AI part. Both Amit Patel and Robert Eaglestone have
expressed interest or ideas wrt the AI.
At this point I've done nothing on the AI (leave the hard part for last :). I haven't even thought much on the potential
computer operations, much less how the computer makes decisions between them [nor even how the computer
gathers the data to make the decisions, but that should be easier].
Is your AI data-driven? What computer operations/decision-points do you have, and how does the computer decide
between them? Don't feel that you have to give everything away, any input at all would be helpful at this point.
Thanks!
Kevin
easiest conquest in terms of range/defense, it merges them to fleet before attacking. If planetary defenses were last
time better than the fleet, it just flies to the system & waits until there is great enough force to wipe out the planet.
(Planetary defenses cannot attack, as name implies)
--
Markus Stenberg / fingon@nullnet.fi / Finland / Europe
Public PGP key available on request.
AI Madness: Using AI to Bring Open-City Racing to Life
by Joe Adzima
Gamasutra, January 24, 2001
Angel Studios' Midtown Madness 2 for PC and Midnight Club for Playstation 2 are open racing games in
which players have complete freedom to drive where they please. Set in "living cities," these games
feature interactive entities that include opponents, cops, traffic, and pedestrians. The role of
artificial intelligence is to make the behaviors of these high-level entities convincing and immersive:
opponents must be competitive but not insurmountable. Cops who spot you breaking the law must diligently
try to slow you down or stop you. Vehicles composing ambient traffic must follow all traffic laws while
responding to collisions and other unpredictable circumstances. And pedestrians must go about their
routine business, until you swerve towards them and provoke them to run for their lives. This article
provides a strategy for programmers who are trying to create AI for open city racing games, which is
based on the success of Angel Studios' implementation of AI in Midtown Madness 2 and Midnight Club. The
following discussion focuses on the autonomous architecture used by each high-level entity in these
games. As gameplay progresses, this autonomy allows each entity to decide for itself how it's going to
react to its immediate circumstances. This approach has the benefit of creating lifelike behaviors along
with some that were never intended, but add to gameplay in surprising ways.
AI Map: Roads, Intersections, and Open Areas
The road object contains all the data representing a street, in terms of lists of 3D vertices. The main
definition of the road includes the left/right boundary data, the road's centerline, and orientation
vectors defined for each vertex in the definition. Other important road data includes the traffic lane
definitions, the pedestrian sidewalk definition, road segment lengths, and lane width data. A minimum
of four 3D vertices are used to define a road, and each list of vertices (for example, center vertices,
boundary vertices, and so on) has the same number of vertices.
The intersection object contains a pointer to each connected shortcut and road segment. At
initialization, these pointers are sorted in clockwise order. The sorting is necessary for helping the
ambient traffic decide which is the correct road to turn onto when traversing an intersection. The
intersection object also contains a pointer to a "traffic light set" object, which, as you might guess, is
responsible for controlling the light's sequence between green and red. Other important tasks for this
object include obstacle management and stop-sign control.
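Put together, the road and intersection objects described above might be laid out roughly as follows. This is a sketch only; the type and field names here are illustrative, not Angel's actual code.

struct Vec3 { float x, y, z; };
struct TrafficLightSet;       // sequences the lights between green and red

struct Road {
    int    numVertices;       // at least four per list
    Vec3*  center;            // centerline vertices
    Vec3*  leftBoundary;      // same vertex count as center
    Vec3*  rightBoundary;
    Vec3*  orientation;       // one orientation vector per vertex
    float* segmentLengths;    // length of each road subsegment
    float  laneWidth;
    // traffic lane and pedestrian sidewalk definitions omitted
};

struct Intersection {
    int              numRoads;
    Road**           roads;   // pointers sorted clockwise at initialization
    TrafficLightSet* lights;  // plus obstacle management and stop-sign control
};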
Big-city solutions: leveraging the City Tool and GenBAI Tool. Angel's method for creating
extremely large cities uses a very sophisticated in-house tool called the City Tool. Not only does this
tool create the physical representation of the city, but it also produces the raw data necessary for the
AI to work. The City Tool allows the regeneration of the city database on a daily basis. Hence, the city
can be customized very quickly to accommodate new gameplay elements that are discovered in
prototyping, and to help resolve any issues that may emerge with the AI algorithms.
The GenBAI Tool is a separate tool that processes the raw data generated from the City Tool into the
format that the run-time code needs. Other essential tasks that this GenBAI Tool performs include the
creation of the ambient and pedestrian population bubbles and the correlation of cull rooms (discrete
regions of the city) to the components of the road map.
Based on the available AI performance budget and the immense size of the cities, it's impossible to
simulate an entire city at once. The solution is to define a "bubble" that contains a list of all the road
components on the city map that are visible from each cull room in the city, for the purpose of culling
the simulation of traffic and pedestrians beyond a certain distance. This collection of road components
essentially becomes the bubbles for ambient traffic and pedestrians.
The last function of the GenBAI tool is to create a binary version of the data that allows for superfast
load times, because binary data can be directly mapped into the structures.
Data files: setting up races. The AI for each race event in the game is defined using one of two
files: the city-based AI map data file or the race-based AI map data file. The city file contains defaults
to use for all the necessary AI settings at a city level. Each race event in the city includes a race-based
AI map data file. This race file contains replacement values to use instead of the city values. This
approach turns out to be a powerful design feature, because it allows the game designer to set
defaults at a city level, and then easily override these values with new settings for each race.
Some examples of what is defined in these files are the number and definition of the race's opponents,
cops, and hook men. Also defined here are the models for the pedestrians and ambient vehicles to use
for a specific race event. Finally, exceptions to the road data can be included to change the population
fill density and speed limits.
Curves Ahead: Creating Traffic
Following rails and cubic spline curves. During normal driving conditions, all the ambient vehicles are positioned and
oriented by a 2D spline curve. This curve defines the exact route the ambient traffic will drive in the XZ-plane. We used
Hermite curves because the defining parameters, the start and end positions, and the directional vectors are easy to
calculate and readily available.

Since the lanes for ambient vehicles on each road are defined by a list of vertices, a road subsegment can easily be
created between each vertex in the list. When the ambient vehicle moves from one segment to the next, a new spline
is calculated to define the path the vehicle will take. Splines are also used for creating recovery routes back to the main
rail data. These recovery routes are necessary for recovering the path after a collision or a player-avoidance action sent
the ambient vehicle off the rail. Using splines enables the ambient vehicles to drive smoothly through curves typically
made up of many small road segments and intersections.
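A 2D Hermite curve of the kind described can be evaluated with the standard Hermite basis functions. Here is a generic sketch; the article does not show Angel's actual implementation.

struct Vec2 { float x, z; };

// Cubic Hermite interpolation between two lane vertices.
// p0/p1 are the segment endpoints, t0/t1 the direction vectors
// at each end, and t runs from 0 to 1 along the subsegment.
Vec2 Hermite(Vec2 p0, Vec2 p1, Vec2 t0, Vec2 t1, float t)
{
    float t2 = t * t, t3 = t2 * t;
    float h00 =  2*t3 - 3*t2 + 1;    // basis for start position
    float h10 =      t3 - 2*t2 + t;  // basis for start tangent
    float h01 = -2*t3 + 3*t2;        // basis for end position
    float h11 =      t3 - t2;        // basis for end tangent
    Vec2 r;
    r.x = h00*p0.x + h10*t0.x + h01*p1.x + h11*t1.x;
    r.z = h00*p0.z + h10*t0.z + h01*p1.z + h11*t1.z;
    return r;
}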
Setting the road velocity: the need for speed. Each road in the AI map has a speed-limit parameter for determining
how fast ambient vehicles are allowed to drive on that road. In addition, each ambient vehicle has a random value for
determining the amount it will drive over or under the road's speed limit. This value can be negative or positive to allow
the ambient vehicles to travel at different speeds relative to each other.

When a vehicle needs to accelerate, it uses a randomly selected value between 5 and 8 m/s^2. At other times, when an
ambient vehicle needs to decelerate, perhaps because of a stop sign or red light, then the vehicle calculates a
deceleration value based on attaining the desired speed in 1 second. The deceleration is calculated by

a = (V^2 - V0^2) / (2(X - X0))

where V is the target velocity, V0 is the current velocity, and (X - X0) is the distance required to
perform the deceleration.
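This is the standard kinematic relation V^2 = V0^2 + 2a(X - X0) solved for a. In code, a minimal sketch (the function name is hypothetical) might be:

// Deceleration needed to go from speed v0 to speed v over the
// remaining distance d = (X - X0); a negative result means braking.
float ComputeDeceleration(float v, float v0, float d)
{
    return (v * v - v0 * v0) / (2.0f * d);
}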
Detecting collisions. With performance times being so critical, each ambient vehicle can't test all the other ambient
vehicles in its obstacle grid cell. As a compromise between speed and comprehensiveness, each ambient vehicle
contains only a pointer to the next ambient vehicle directly in front of it in the same lane. On each frame, the ambient
checks whether it is following the next ambient vehicle too closely. If it is, the ambient in back will slow down to the
speed of the ambient in front. Later, when the ambient in front becomes far enough away, the one in back will try to
resume a different speed based on the current road's speed limit.

By itself, this simplification creates a problem with multi-car pileups. The problem can be solved by stopping the
ambient vehicles at the intersections preceding the crash scene.
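A sketch of that per-frame headway check, with hypothetical structure and constant names:

struct Road { float speedLimit; };

struct Vehicle {
    Vehicle* nextInLane;    // the one vehicle we test against
    Road*    road;
    float    roadDistance;  // distance traveled along the current road
    float    speed;
    float    targetSpeed;
    float    speedBias;     // per-vehicle offset over/under the limit
};

const float kFollowDistance = 10.0f;   // hypothetical tuning constant

void UpdateFollowing(Vehicle* me)
{
    Vehicle* ahead = me->nextInLane;
    if (!ahead) return;
    float gap = ahead->roadDistance - me->roadDistance;
    if (gap < kFollowDistance)
        me->targetSpeed = ahead->speed;                      // match the car in front
    else
        me->targetSpeed = me->road->speedLimit + me->speedBias;
}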
Crossing the intersection. Once an ambient vehicle reaches the end of a road, it must traverse an
intersection. To do this, each vehicle needs to successfully gain approval from the following four
functional groups.
First, the ambient vehicle must get approval from the intersection governing that road's "traffic
control." Each road entering an intersection contains information that describes the traffic control for
that road. Applicable control types are NoStop, AllwaysStop, TrafficLight, and StopSign (see
Figure 2). If NoStop is set, then the ambient vehicle gets immediate approval to proceed through the
intersection. If AllwaysStop is set, the ambient never gets approval to enter the intersection. If
TrafficLight is set, the ambient is given approval whenever its direction has a green light. If
StopSign is set, the ambient vehicle that has been waiting the longest time is approved to traverse
the intersection.
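The first approval stage maps naturally onto a switch over the control types named in the text. A sketch follows; the helper parameters are assumptions, not the game's actual interface.

enum TrafficControl { NoStop, AllwaysStop, TrafficLight, StopSign };

// First approval stage: may this vehicle enter the intersection?
bool TrafficControlApproval(TrafficControl tc, bool lightIsGreen,
                            bool waitedLongest)
{
    switch (tc) {
    case NoStop:       return true;          // immediate approval
    case AllwaysStop:  return false;         // never approved
    case TrafficLight: return lightIsGreen;  // approved on green
    case StopSign:     return waitedLongest; // longest waiter goes first
    }
    return false;
}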
The second approval group is the accident manager. The accident manager keeps track of all the
ambient vehicles in the intersection and the next upcoming road segment. If there are any accidents
present in these AI map components, then approval to traverse the intersection is denied. Otherwise,
the ambient vehicle is approved and moves on to the third stage.
The third stage requires that the road which the ambient is going to be on after traversing the
intersection has the road capacity to accept the ambient vehicle's entire length, with no part of the
vehicle sticking into the intersection.
The fourth and final approval comes from a check to see if there are any other ambient vehicles trying
to cross at the same time. An example of why this check is necessary is when an ambient vehicle is
turning from a road controlled by a stop sign onto a main road controlled by a traffic light. Since the
approval of the stop sign is based on the wait time at the intersection, the vehicle that's been waiting
longest would have permission to cross the intersection -- but in reality that vehicle needs to wait until
the cars that have been given permission by the traffic light get out of the way.
Selecting the next road. When an ambient vehicle reaches the end of the intersection, the next
decision the vehicle must make is which direction to take. Depending on its current lane assignment,
the ambient vehicle selects the next road based on the following rules (see Figure 2):
● If a vehicle is in the far-left lane, it can go either left or straight.
● If it's in the far-right lane, it can go either right or straight.
● If it's in any of the center lanes, then it must go straight.
● If it's on a one-way road, then it picks randomly from any of the outgoing roads.
● If it's on a freeway intersection where on-ramps merge with the main freeway traffic, then it
must always go right.
● U-turns are never allowed, mostly because a splined curve in this situation would not look
natural.
Since the roads are sorted in clockwise order, this simplifies selection of the correct road. For example,
to select the road to the left, just add 1 to the current road's intersection index value (the ID number
of that road in the intersection road array). To pick the straight road, add 2. To go right, just subtract
1 from the road's intersection index value.
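In code, the clockwise index arithmetic might look like the following sketch, assuming a four-way intersection and wrapping the index with modular arithmetic:

// Pick the outgoing road using the clockwise sort order described above.
// 'index' is this road's position in the intersection's road array.
int LeftRoad(int index, int numRoads)     { return (index + 1) % numRoads; }
int StraightRoad(int index, int numRoads) { return (index + 2) % numRoads; }
int RightRoad(int index, int numRoads)    { return (index - 1 + numRoads) % numRoads; }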
Changing lanes. On roads that are long enough, the ambient vehicles will change lanes in order to
load an equal number of vehicles into each lane of the road. When the vehicle has traveled to the point
that triggers the lane change (usually set at 25 percent of the total road length), the vehicle will
calculate a spline that will take it smoothly from its current lane to the destination lane.
The difficulty here is in setting the next-vehicle pointer for collision detection. The solution is to have a
next-vehicle pointer for each possible lane of the road. During this state, the vehicle is assigned to two
separate lanes and therefore is actually able to detect collision for both traffic lanes.
Once a vehicle completes the lane change, it makes another decision as to which road it wants to turn
onto after traversing the upcoming intersection. This decision is necessary because the vehicle is in a
City People: Simulating Pedestrians
In real cities, pedestrians are on nearly every street corner. They walk and go about their business, so it should be no
different in the cities we create in our games. The pedestrians wander along the sidewalks and sometimes cross streets.
They avoid static obstacles such as mailboxes, streetlights, and parking meters, and also dynamic obstacles such as
other pedestrians and the vehicles controlled by the players. And no, players can't run over the pedestrians, or get
points for trying! Even so, interacting with these "peds" makes the player's experience as a city driver much more
realistic and immersive.

Simulation bubbles for pedestrians. Just as the ambient traffic has a simulation bubble, so do the pedestrians. And
while the pedestrian bubble has a much smaller radius, both types are handled similarly. During initialization, the
pedestrians are created and inserted into the pedestrian pool. When the player is inserted into the city, the pedestrians
are populated around him. During population, one pedestrian is added to each road in the bubble, round-robin style,
until all the pedestrians in the pool are exhausted.
Pedestrians are initialized with a random road distance and side distance based on an offset to the center of the
sidewalk. They are also assigned a direction in which to travel and a side of the street on which to start. As the
pedestrians get to the edge of the population bubble, they simply turn around and walk back in the opposite direction
from which they came.

Wandering the city. When walking the streets, the pedestrians use splines to smooth out the angles
created by the road subsegments. All the spline calculations are done in 2D to increase the
performance of the pedestrians. The Y value for the splines is calculated by probing the polygon the
pedestrian is walking on in order to give the appearance that the pedestrian is actually walking on the
terrain underneath its feet.
Each pedestrian has a target point for it to head toward. This target point is calculated by solving for
the location on the spline path three meters ahead of the pedestrian. In walking, the ped will turn
toward the target point a little bit each frame, while moving forward and sideways at a rate based on
the parameters that control the animation speed. As the pedestrian walks down the road, the ped
object calculates a new spline every time it passes a sidewalk vertex.
Crossing the street. When a pedestrian gets to the end of the street, it has a decision to make. The ped either follows
the sidewalk to the next street or crosses the street. If the ped decides to cross the street, then it must decide which
street to cross: the current or the next. Four states control ped navigation on the streets: Wander, PreCrossStreet,
WaitToCrossStreet, and CrossStreet (see Figure 3). The first of these, Wander, is described in the previous section,
"Wandering the City." PreCrossStreet takes the pedestrian from the end of the street to a position closer to the street
curb, WaitToCrossStreet tells the pedestrian waiting for the traffic light that it's time to cross the street, and
CrossStreet handles the actual walking or running of the pedestrian to the curb on the other side of the street.
Animating actions. The core animation system for the pedestrians is skeleton-based. Specifically,
animations are created in 3D Studio Max at 30FPS, and then downloaded using Angel's proprietary
exporter. The animation system accounts for the nonconstant nature of the frame rate.
For each type of pedestrian model, a data file identifies the animation sequences. Since all the
translation information is removed from the animations, the data file also specifies the amount of
translation necessary in the forward and sideways directions. To move the pedestrian, the ped object
simply adds the total distance multiplied by the frame time for both the forward and sideways
directions. (Most animation sequences have zero side-to-side movement.)
Two functions of the animation system are particularly useful. The Start function immediately starts
the animation sequence specified as a parameter to the function, and the Schedule function starts the
desired animation sequence as soon as the current sequence finishes.
Avoiding the speeding player. The main rule for the pedestrians is to always avoid being hit. We
accomplish this in two ways. First, if the pedestrian is near a wall, then the ped runs to the wall, puts
its back against it, and stands flush up against it until the threatening vehicle moves away (see Figure
4).
Alternatively, if no wall is nearby, the ped turns to face the oncoming vehicle, waits until the vehicle is
close enough, and then dives to the left or right at the very last moment (see Figure 5).
The pedestrian object determines that an oncoming vehicle is a threat by taking the forward
directional vector of the vehicle and performing a dot product with the vector defined by the ped's
position minus the vehicle's position. This calculation measures the side distance. If the side distance
is less than half the width of the vehicle, then a collision is imminent.
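One way to compute that side distance in 2D is to project the vehicle-to-ped vector onto the perpendicular of the vehicle's forward vector. A sketch, using the half-width threshold from the text:

#include <math.h>

struct Vec2 { float x, z; };

static float Dot(Vec2 a, Vec2 b) { return a.x * b.x + a.z * b.z; }

// Returns true if the ped lies within half a vehicle width of the
// vehicle's forward line of travel (a collision course).
bool OnCollisionCourse(Vec2 vehPos, Vec2 vehForward,  // forward is unit length
                       Vec2 pedPos, float vehWidth)
{
    Vec2 toPed = { pedPos.x - vehPos.x, pedPos.z - vehPos.z };
    Vec2 side  = { -vehForward.z, vehForward.x };     // perpendicular to forward
    float sideDistance = Dot(side, toPed);
    return fabsf(sideDistance) < 0.5f * vehWidth;
}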
The next calculation is the time it will take the approaching vehicle to collide with the pedestrian. In
this context, two distance zones are defined: a far and a near. In the far zone, the pedestrian turns to
face the vehicle and then goes into an "anticipate" behavior, which results in a choice between shaking
with fear and running away. The near zone activates the "avoid" behavior, which causes the pedestrian
to look for a wall to hug. To locate a wall, the pedestrian object shoots a probe perpendicular to the
sidewalk for ten meters from its current location. If a wall is found, the pedestrian runs to it.
Otherwise, the ped dives in the opposite direction of the vehicle's rotational momentum. (Sometimes
the vehicle is going so fast, a superhuman boost in dive speed is needed to avoid a collision.)
Avoiding obstacles. As the pedestrians walk blissfully down the street, they come to obstacles in the
road. The obstacles fall into one of three categories: other wandering pedestrians; props such as trash
cans, mailboxes, and streetlights; or the player's vehicle parked on the sidewalk.
In order to avoid other pedestrians, each ped checks all the pedestrians inside its obstacle grid cell. To
detect a collision among this group, the ped performs a couple of calculations. First, it determines the
side distance from the centerline of the sidewalk to itself and the other pedestrian. The ped's radius is
then added to and subtracted from this distance. A collision is imminent if there is any overlap
between the two pedestrians.
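The overlap test reduces to comparing two intervals along the sidewalk's side axis; a minimal sketch:

// Peds are compared by lateral offset from the sidewalk centerline.
// Each ped occupies the interval [side - radius, side + radius];
// any overlap of the two intervals means a collision is imminent.
bool PedsWillCollide(float sideA, float radiusA,
                     float sideB, float radiusB)
{
    float minA = sideA - radiusA, maxA = sideA + radiusA;
    float minB = sideB - radiusB, maxB = sideB + radiusB;
    return (minA <= maxB) && (minB <= maxA);
}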
In order to help them avoid each other, one of the pedestrians can stop while the other one passes. One way to do this
is to make the pedestrian with the lower identification number stop, while the other ped sets its target point far enough
to the left or right to miss the stopped ped. The ped will always choose
left if it's within the sidewalk boundary; otherwise it will go to the right. If the right target point is also
past the edge of the sidewalk, then the pedestrian will turn around and continue on its way. Similar
calculations to pedestrian detection and avoidance are performed to detect and avoid the props and
the player's vehicle.
Simulating Vehicles with Full Physics
AI Map: Roads, The full physics simulation object, VehiclePhysics, is a base class with the logic for navigating the
Intersections, and
city. The different entities in the city are derived from this base class, including the RouteRacer object
Open Areas
(some of the opponents) and the PoliceOfficer object (cops). These child classes supply the
Curves Ahead: additional logic necessary for performing higher-level behaviors. We use the term "full-physics
Creating Traffic vehicles" because the car being controlled for this category behaves within the laws of physics. These
cars have code for simulating the engine, transmission, and wheels, and are controlled by setting
City People: values for steering, brake, and throttle. Additionally, the VehiclePhysics class contains two key public
Simulating Pedestrians
methods, RegisterRoute and DriveRoute.
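The hierarchy described might be sketched as follows. Only the class and method names come from the article; the member variables are assumptions.

class VehiclePhysics {
public:
    void RegisterRoute();   // route from a file or computed in real time
    void DriveRoute();      // follow the registered route each frame
protected:
    float steering;         // -1.0 .. 1.0, same model the player uses
    float brake;
    float throttle;
};

class RouteRacer    : public VehiclePhysics { /* opponent behaviors */ };
class PoliceOfficer : public VehiclePhysics { /* pursuit behaviors  */ };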
Registering a route. The first thing that the navigation algorithm needs is a route. The route can either be created
dynamically in real time or defined in a file as a list of intersection IDs. The real-time method always returns the
shortest route. The file method is created by the Race Editor, another proprietary in-house tool that allows the game
designer to look down on the city in 2D and select the intersections that make up the route. The game designer can
thereby create very specific routes for opponents. Also, the file method eliminates the need for some of the AI entities
to calculate their routes in real time, which in turn saves processing time.

Planning the route. Once a route to a final destination has been specified, a little bit more detailed planning is needed
for handling immediate situations. We used a road cache for this purpose, which stores the most immediate three roads
the vehicle is on or needs to drive down next (see Figure 6). At any given moment, the vehicle knows the next
intersection it is trying to get to (the immediate target), so the vehicle can identify the road connecting this target
intersection with the intersection immediately before the target. If the vehicle is already on this "hint road," then the
cache is filled with
Choosing the best route. When all the possible routes have been enumerated, the best route for the
CCP can be determined. Sometimes one or more of the routes will take the vehicle onto the sidewalk.
Taking the sidewalk is a negative, so these routes are less attractive than those which stay on the
road. Also, some routes will become completely blocked, with no way around the obstacles present,
making those less attractive as well. The last criterion is minimizing the amount of turning required to
drive a path. Taking all these criteria into account, the best route is usually the one that isn't blocked,
stays on the road, and goes as straight as possible.
Setting the steering. The CCP vehicle simulated with full physics uses the same driving model that
the player's vehicle uses. For example, both vehicles take a steering parameter between -1.0 and 1.0.
This parameter is input from the control pad for the player's vehicle, but the CCP must calculate its
steering parameter in real time to avoid obstacles and reach its final destination. Rather than planning
its entire route in advance, the CCP simplifies the problem by calculating a series of Steering Target
Points (STPs), one per frame in real time as gameplay progresses. Each STP is simply the next point
the CCP needs to steer towards to get one frame closer to its final destination. Each point is calculated
with due consideration to navigating the road, navigating sharp turns, and avoiding obstacles.
Setting the throttle. Most of the time a CCP wants to go as fast as possible. There are two
exceptions to this rule: traversing sharp turns and reaching the end of a race. Sharp turns are defined
as those in which the angle between two road subsegments is greater than 45 degrees, and can occur
anywhere along the road or when traversing an intersection. Since the route through a sharp turn is
circular, it is easy to calculate the maximum velocity through the turn by the formula

V = sqrt(u * g * R)

where V is equal to the velocity, u is the coefficient of friction for the road surface, g is the value of
gravity, and R is the radius of our turn. Once the velocity is known, all that the CCP has to do is slow
________________________________________________________
AI Uncertainty GameDev.net
Newsgroups: comp.ai.games
From: smt@cs.monash.edu.au (Scott M Thomson)
Subject: Uncertainty in AI
Summary: This posting will hopefully publicise Bayesian networks, which provide a formalism for modelling and an inference mechanism for reasoning under uncertainty
Keywords: AI
Date: Fri, 31 Mar 1995 04:56:58 GMT
Have you ever played peek-a-boo with a small child? Why is it that it works? What is it that engages the
child's delight? Why doesn't it work with older people?
The game of peek-a-boo takes advantage of the limited cognitive development of the child, when we hide
ourselves behind an object the child's mind no longer registers our presence. When we pop out from hiding
the child's mind is delirious at the magic of our rematerialization.
A complicated challenge for artificial intelligence since its inception has been knowledge representation in
problems with uncertain domains. What a system can't see is, nonetheless, of possible importance to its
reasoning mechanisms. What is unknown is also often still vital to common sense reasoning. This posting will hopefully
publicise Bayesian networks, which provide a formalism for modelling and an inference mechanism for reasoning under
uncertainty, and initiate discussion about uncertainty problems and probabilistic reasoning in game AIs.
Sun Tzu was a Chinese general who lived approximately 2400 years ago. His work, "The Art of War",
describes the relationships between warfare, politics, economics, diplomacy, geography and astronomy.
Such modern generals as Mao Zedong have used his work as a strategic reference.
Sun Tzu's philosophy on war can be summed up in this statement, "to win one hundred victories in one
hundred battles is not the acme of skill. To subdue the enemy without fighting is the supreme excellence"
[11]. In computer games utilising cheats for toughening computer AI there is no skill in a computer player's
victory. If a computer player can beat a human on even terms then we may start to discuss the skill of the
AI designer and any human victory is that much more appreciated.
The difficulty in representing uncertainty in any game AI is in the vast numbers of combinations of actions,
strategies and defences available to each player. What we are left with is virtually impossible to represent in
tables or rules applicable to more than a few circumstances. Amongst the strategies expounded by Sun Tzu
are enemy knowledge, concealment and position[11].
Enemy knowledge is our most obvious resource. Another player's units or pieces inform us about possible
future actions or weaknesses by location, numbers and present vectored movement. They suggest
possibilities for defensive correction, offensive action and bluffing. Sun Tzu states that we should, "Analyse
the enemy's plans so that we will know his shortcomings as well as strong points. Agitate him in order to
ascertain the pattern of his movement"[11].
Concealment may be viewed as the art of both hiding one's own strategy and divining one's opponent's. By
considering our opponent's past history and placing our current situation in that context we hope to
discover something about what is hidden in their mind. Conversely, our actions must be designed to convey
as little as possible about the true strength or weakness of our positions.
The position of units refers to their terrain placement in the game. Those terrains that grant defensive or
offensive bonuses to computer players units should be utilised to the best advantage. In addition computer
units should strike where the enemy is weakest and where the most damage can be inflicted at the least
loss. Impaling units on heavily fortified positions for nominal gain is best left to real generals in real war and
is not a bench mark of intelligent behaviour.
To combine everything we need to play a good game in the face of a deceptive and hostile opponent is not
a trivial task. Sun Tzu believed, "as water has no constant form, there are no constant conditions in war.
One able to win the victory by modifying his tactics in accordance with the enemy situation may be called a
divine!" [11]. Our aim in designing game AI's is to obtain a mechanism for moderate strategic competence,
not a program with a claim to god-hood.
Debate on the mechanism for the representation of uncertainty has settled into two basic philosophies,
extensional and intensional systems [19, p3]. Extensional systems deal with uncertainty in a context free
manner, treating uncertainty as a truth value attached to logic rules. Being context free they do not
consider interdependencies between their variables. Intensional systems deal with uncertainty in a context
sensitive manner. They try to model the interdependencies and relevance relationships of the variables in
the system.
The difference between these two systems boils down to a trade-off between semantic accuracy and computational
feasibility. Extensional systems are computationally efficient but semantically clumsy. Intensional systems, on the
other hand, were thought by some to be computationally intractable even though they are semantically clear.
Both MYCIN (1984) and PROSPECTOR (1978) are examples of extensional systems. MUNIN (1987) is an
example of an intensional system.
MYCIN is an expert system which diagnoses bacterial infections and recommends prescriptions for their
cure. It uses certainty factor calculus to manipulate generalised truth values which represent the certainty
of particular formulae. The certainty of a formula is calculated as some function of the certainty of its subformulae.
MUNIN is an expert system which diagnoses neuromuscular disorders in the upper limbs of humans. It uses
a causal probabilistic network to model the conditional probabilities for the pathophysiological features of a
patient[1].
Some of the stochastic infidelity of extensional systems arises in their failure to handle predictive or
abductive inference. For instance, there is a saying, "where there's smoke there's fire". We know that fire
causes smoke but it is definitely not true that smoke causes fire. How then do we derive the second from
the first? Quite simply, smoke is considered evidence for fire, therefore if we see smoke we may be led to
believe that there is a fire nearby.
In an extensional approach to uncertainty it would be necessary to state the rule that smoke causes fire in
order to obtain this inferencing ability. This may cause cyclic updating, which leads to overconfidence in
the belief of both fire and smoke, from a simple cigarette. To avoid this dilemma most extensional systems
do not allow predictive inferencing. An example of predictive inferencing in a strategic game is the
consideration of a player's move in reasoning about their overall strategy.
Even those authors that support extensional systems as a means for reasoning under uncertainty
acknowledge their semantic failures. "There is unfortunately a fundamental conflict between the demands of
computational tractability and semantic expressiveness. The modularity of simple rule-based systems aid
efficient data update procedures. However, severe evidence independence assumptions have to be made for
uncertainties to be combined and propagated using strictly local calculations"[5].
Although computationally feasible, these systems lack the stochastic reliability of plausible reasoning. The problem with
certainty factors or truth values being attached to formulae is that certainty measures visible facts, whereas uncertainty
is related to what is unseen: that which is not covered by the formulae.
The semantic merits of intensional systems are also the reason for their computational complexity. In the example
if P(B|A) = m,
we cannot assert anything unconditional about B, even given complete knowledge about A. The rule says only that if A is
true and is the only thing that is known to be relevant to B, then the probability of B is 'm'. When we discover new
information relevant to B we must revoke our previous beliefs and calculate P(B|A,K), where K is the new knowledge.
The stochastic fidelity of intensional systems leaves them impotent unless they can
determine the relevance relationships between the variables in their domain. It is necessary to use a
formalism for articulating the conditions under which variables are considered relevant to each other, given
what is already known. Using rule-based systems we quickly get bogged in the unwieldy consideration of all
possible probable interactions. This leads to complex and computationally infeasible solutions.
Bayesian networks are a mechanism for accomplishing computational efficacy with a semantically accurate
intensional system. They have been used for such purposes as, sensor validation [9], medical diagnoses[1,
2], forecasting [3], text understanding [6] and naval vessel classification [7].
The challenge is to encode the knowledge in such a way as to make the ignorable quickly identifiable and
readily accessible. Bayesian networks provide a mathematically sound formalism for encoding the
dependencies and independencies in a set of domain variables. A full discussion is given in texts devoted to
this topic [10].
Bayesian networks are directed acyclic graphs in which the nodes represent stochastic variables. These
variables can be considered as a set of exhaustive and mutually exclusive states. The directed arcs within
the structure represent probabilistic relationships between the variables. That is, their conditional
dependencies and by default their conditional independencies.
We have then, a mechanism for encoding a full joint probability distribution, graphically, as an appropriate
set of marginal and conditional distributions over the variables involved. When our graphical representation
is sparsely connected we require a much smaller set of probabilities than would be required to store a full
joint distribution.
Each root node within a Bayesian network has a prior probability associated with each of its states. Each
other node in the network has a conditional probability matrix representing probabilities, for that variable,
conditioned on the values of its parents.
After a network has been initialised according to the prior probabilities of its root nodes and the conditional
probabilities of its other variables, it is possible to instantiate variables to certain states within the network.
The network, following instantiation, already has posteriors associated with each node as a result of the
propagation during initialisation. Instantiation leads to a propagation of probabilities through the network to
give posterior beliefs about the states of the variables represented by the graph.
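As a concrete miniature, the smoke-and-fire example from earlier can be cast as a two-node network, Fire -> Smoke, in which instantiating Smoke propagates a posterior belief back to Fire by Bayes' rule. The probabilities below are invented purely for illustration.

#include <stdio.h>

int main(void)
{
    double pFire = 0.01;              // prior on the root node
    double pSmokeGivenFire   = 0.90;  // conditional probability matrix
    double pSmokeGivenNoFire = 0.05;

    // Marginal probability of seeing smoke at all.
    double pSmoke = pSmokeGivenFire * pFire
                  + pSmokeGivenNoFire * (1.0 - pFire);

    // Posterior belief in fire after instantiating Smoke = true.
    double pFireGivenSmoke = pSmokeGivenFire * pFire / pSmoke;

    printf("P(Fire | Smoke) = %.3f\n", pFireGivenSmoke);   // ~0.154
    return 0;
}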
In conclusion, I am not proposing that Bayesian networks are some God-given solution to all of AI's problems. It is
plain that quite a few problems push the bounds of computational feasibility even for Bayesian networks. It is my hope
that by posting this I may play some game in the future that "reasons" in a remotely intelligent way about strategies
for victory. Perhaps incorporating the concepts of probabilistic reasoning into a hybrid system is a feasible path to a
competent strategic AI.
Here is a list of some references I used in my Honours thesis. Numbers 8 and 10 are texts devoted to
Bayesian Networks.
[1]
Andreassen, S; et al.
"MUNIN - an Expert EMG Assistant."
{Computer-Aided Electromyography and Expert Systems}, 1989.
[2]
Berzuini, C; Bellazzi, R; Spiegelhalter, D.
"Bayesian Networks Applied to Therapy Monitoring."
{Uncertainty in Artificial Intelligence},
Proceedings of the Seventh Conference (1991) p35.
[3]
Dagum, P; Galper, A; Horvitz, E.
"Dynamic Network Models for Forecasting."
{Uncertainty in Artificial Intelligence},
Proceedings of the Eighth Conference (1992) p41.
[4]
Findler, N.
"Studies in Machine Cognition using th Game of Poker."
{Communications of the ACM}, v20, April 1977, p230.
[5]
Fox, J; Krause, P.
"Symbolic Decision Theory and Autonomous Systems."
{Uncertainty in Artificial Intelligence},
Proceedings of the Seventh Conference (1991) p103.
[6]
Goldman, R; Charniak, E.
"A Probabilistic Approach to Language Understanding."
{Tech Rep CS-90-34}, Dept Comp Sci, Brown University 1990.
[7]
Musman, SA; Chang, LW.
"A Study of Scaling in Bayesian Networks for Ship Classification."
{Uncertainty in Artificial Intelligence},
Proceedings of the Ninth Conference (1993) p32.
[8]
Neapolitan, RE.
{"Probabilistic Reasoning in Expert Systems, Theory and Algorithms."}
John Wiley and Sons, 1989.
[9]
Nicholson, AE; Brady, JM.
"Sensor Validation using Dynamic Belief Networks."
{Uncertainty in Artificial Intelligence},
Proceedings of the Eighth Conference (1992) p207.
[10]
Pearl, J.
{"Probabilistic Reasoning in Intelligent Systems, Networks of Plausible Inference."}
Morgan Kaufmann Publishers, Inc, 1988.
[11]
Wordsworth Reference.
{"Sun Tzu, The Art of War."}
Sterling Publishing Co Inc, 1990.
Scott Thomson
(Note that this date does not necessarily correspond to the date the article was written)
Artificial Emotion: Simulating Mood and Personality
By Ian Wilson
Gamasutra, May 7, 1999
Characters that display emotion are critical to a rich and believable simulated environment, especially when those
characters interact with real people possessing real emotions. Emotion is the essential element that creates the
difference between robotic behavior and lifelike, engaging behavior. Traditionally, animators have painstakingly created
these behaviors for prerendered animations. This approach, however, is not possible when we wish to use autonomous,
interactive characters that possess their own unique personalities and moods. Truly interactive characters must
generate their behavior autonomously through techniques based upon what I call artificial emotion (AE).
Why do we have real emotion?
As human beings, we have an innate understanding of what emotions are. However, outside of
academia, we rarely hear discussions on how emotions are produced and, more importantly, on why
we have emotions. Within academia, these issues are subject to much contention and debate. That
said, allow me to offer my own thoughts on these issues.
When attempting to simulate natural systems, we first need to ask, "What is the nature of this system
and what is its purpose or reason for being?" Very few, if any, systems in the natural world exist for no
reason.
Emotions are an integral part of our decision-making systems. Emotions tune our decisions according
to our personalities, moods, and momentary emotions to give us unique responses to situations
presented by our environment. But why do we need unique responses to situations? Why don’t we all
have the same responses? To answer this question, we need to look beyond the individual at humanity
as a group or society of individuals. I believe personality has evolved as a problem-solving mechanism.
Our unique personalities determine that we all think and hence solve problems in unique and different
ways. In an evolutionary sense, this diverse method of solving problems is highly effective. If we had
only one method of problem solving there would be a large, if not infinite, number of solutions that
would be outside of our problem solving capabilities. So personality has evolved as a way of attacking
problems from many different angles: from bold high-risk solutions to cautious and precise
incremental solutions; from solutions discovered through deep thought and reflection to solutions
discovered by gaining knowledge from others (socializing).
Emotion is, to a large degree, an emergent system. Its use must be looked at in terms of its
interaction with society rather than in isolation to gain a better understanding of its reason for being.
Layers of emotion
Fundamental to our AE-based behavior system is the notion that emotions comprise three layers of behavior. At the top
level are what we term momentary emotions; these are the behaviors that we display briefly in reaction to events. For
example, momentary emotions occur when we smile or laugh at a joke or when we are surprised to see an old friend
unexpectedly. At the next level are moods. Moods are prolonged emotional states caused by the cumulative effect of
momentary emotions. Underlying both of these layers and always present is our personality; this is the behavior that
we generally display when no momentary emotion or mood overrides it (Figure 1).
These levels have an order of priority. Momentary emotions have priority over mood when determining which behavior
to display. One's mood, in turn, has priority over one's personality (Figure 2).

Figures 1 and 2 show the various layers of emotional behavior. Momentary emotions are brief reactions to events that
assume the highest priority when we select our behavior. These momentary behaviors are short-lived and decay quickly.
Moods are produced by momentary emotions, usually by the cumulative effects of a series of momentary emotions.
Moods can gradually increase in prominence even after the momentary emotions have subsided. The development of
moods depends on whether the momentary emotions are positive or negative (punishments or rewards in a
reinforcement sense). If a character were to receive a stream of negative momentary emotions, then the mood would
obviously be bad and would decay slowly. The personality layer is always present and has a consistent level of
prominence.
The behavior that a character displays depends upon each emotional layer’s prominence. The more
prominent the layer, the higher the probability of that behavior being selected.
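One plausible way to realize this probability-weighted selection is roulette-wheel sampling over the three layers' prominence values. This sketch is mine, not the article's.

#include <stdlib.h>

enum Layer { MOMENTARY, MOOD, PERSONALITY };

// Pick the layer that drives this frame's behavior; a layer's chance
// of being chosen is proportional to its prominence.
Layer SelectLayer(float momentary, float mood, float personality)
{
    float total = momentary + mood + personality;
    float pick  = ((float)rand() / (float)RAND_MAX) * total;
    if (pick < momentary)        return MOMENTARY;
    if (pick < momentary + mood) return MOOD;
    return PERSONALITY;
}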
Where can we use AE?
With the notable exceptions of P.F. Magic’s Catz and Dogz series, Fujitsu’s fin fin, and Cyberlife’s
Creatures series, autonomous AE of any significant depth is rarely seen in the world of interactive
entertainment. Why is this the case?
The field of interactive entertainment is dominated by genres that require the user to either conquer
and/or to kill everything in his or her path. Little emotion is required by the opposition, besides
perhaps a little hard-coded fear or aggression that manifests itself in simple movement patterns.
Emotion primarily serves a social function in interactive entertainment. Emotional responses are used
to make the characters that we encounter believable and engaging. For example, if we were to walk
into a virtual bar and all of the characters in the bar had distinct personalities, the scene would be a
very immersive and believable social situation. If the characters showed no emotion, our suspension of
disbelief would be immediately broken and we would be reminded that we were in a
computer-generated simulation rather than in our own fantasy world. Of course, if all of the bar’s
customers had guns and our sole purpose was to dispatch them to a simulated afterlife, then this
really wouldn’t constitute a social situation and emotion might not be required.
A key to the use of AE, then, is the context of situations in which it is used. An important area of
growth is in the field of girls’ entertainment, pioneered by Purple Moon and its friendship adventures
built on Brenda Laurel’s excellent research into girls’ play behavior and girls and sport. For more
information on Ms. Laurel’s research, see
http://www.purple-moon.com/cb/laslink/pm?stat+corp+play_behavior and
http://www.purple-moon.com/cb/laslink/pm?stat+corp+girl_sport.
Social cooperation is a key element in this area and as such is an ideal place to use autonomous
characters with AE. In these situations, the characters’ emotional states and their emotional responses
to the players’ actions are what make the experience enjoyable, interesting, and entertaining. After
playing the first of Purple Moon’s titles, I was a little disappointed to find that it used only static
animations, which limited its sense of immersion. A full, living, 3D world would have increased its
impact (and cost) dramatically.
Of course, processor overhead is always a problem with an element as computationally complex as AE.
The reason that Catz, Dogz, and Creatures succeed in displaying characters with believable emotional
How can we use AE?

Artificial emotion produces two fundamental components as output: gestures and actions. Actions are a general
category and are dependent upon the context of the situation in which the character exists. A simulation's movement
system uses AE to select and/or modify an action. When selecting an action, AE indicates what actions are appropriate
to the character's personality and current mood. So a timid character is unlikely to do anything aggressive, for example.
When modifying an action, AE can help to determine how an action is carried out. An outgoing, extroverted character
might perform an action enthusiastically, although this probably wouldn't be the case for an extreme introvert. Our
primary use of AE, however, is in driving gestures, namely hand, body, and facial gestures. Gestures are the way in
which we communicate our emotions to the outside world. Without them, we would seem cold, flat, and unemotional,
rather like a computer character. These AE-driven gestures are tied directly to our characters' personalities and moods
and follow definite patterns.
This body language adds an extra dimension to a character's behavior, giving life and depth to simulations populated
by autonomous characters that now possess unique personalities. We are all used to seeing environments populated by
characters that all have identical motions or body language. They all stand stiffly upright and move like clockwork toys.
Would it not be refreshing to see a sad-looking fellow, shoulders hunched over, arms hanging limply and walking slowly
as he makes his way through our environment? This idea immediately introduces all sorts of theatrical and cinematic
possibilities, such as populating our environment with a whole cast of unique characters. Our viewer's experience would
be enriched as well. "Who is that guy? Why does he look so sad? What's his story? Should I go and ask him?" The kinds
of questions that occur to the viewer of a truly interactive experience are simply irrelevant without AE.

(Image: P.F. Magic's Dogz)

(It should be noted that I could also substitute the acting term character in place of my term personality. Character
might be a more appropriate term, but could confuse the reader because I'm using character to indicate an autonomous
agent in this article. The terms are, however, interchangeable.)
The future of AE
I can imagine a scene: I'm searching for a lost city in a wild, remote jungle with my trusted
autonomous companion Aeida. Suddenly, we find the entrance to the city and walk in. It’s still
inhabited. The inhabitants’ body language changes when they see us, reacting to our sudden intrusion.
Some become fearful, backing away and curling into a nonthreatening posture. Others do the
opposite, standing upright, shoulders back, chest out, and fists clenched — looks like trouble. We
stand motionless for a time, until a very jovial character smiles broadly at us, laughs, then comes over
to greet us, telling the other inhabitants to do likewise. The inhabitants’ interactive behavior, and more
importantly their individual behavior, creates a living world for us to explore and within which to
entertain ourselves. This environment would be socially oriented; our decisions and actions would be
based upon the personalities and moods of the characters that we encounter. Essentially, the
characters’ decisions and actions would be interactively based upon ours; nothing would be prescripted
(unless the designer of the experience wished it that way, as in interactive theatre).
Such a world would require that designers spend a good deal of time designing their characters for
deep and engaging roles. Designers will need to add the skills of scriptwriting and storytelling to their
growing repertoire of talents. Interactive theatre and cinema is a relatively new area that is emerging
around autonomous characters. Those who are interested in participating in its development would be
wise to start their reading now. A great place to start looking is the web site composed by Andrew
Deterministic Algorithms
Deterministic algorithms are the simplest of the AI techniques used in games. These algorithms use a set of variables
as the input and then use some simple rules to drive the computer-controlled enemies or game objects based on these
inputs. We can think of deterministic algorithms as reflexes or very low-level instincts. Activated by some set of
conditions in the environment, the algorithms then perform the desired behavior relentlessly without concern for the
outcome, the past, or future events.
The chase algorithm is a classic example of a deterministic algorithm. The chase algorithm is basically a method of
intelligence used to hunt down the player or some other object of interest in a game by applying the spatial
coordinates of the computer-controlled object and the object to be tracked. Figure 2 illustrates a good example of
this. It depicts a top view of a typical battleground, on which three computer-controlled bad guys and one player are
fighting. The question is, how can we make the computer-controlled bad guys track and move toward the player?
One way is to use the coordinates of the bad guys and the coordinates of the player as inputs into a deterministic
algorithm that outputs direction changes or direction vectors for the bad guys in real time.
Let's use bad guy one as the example. We see that he is located at coordinates (bx1,by1) and the player is located at
coordinates (px,py). Therefore, a simple algorithm to make the bad guy move toward the player would be:
// process x-coords
if (px>bx1) bx1++;
else if (px<bx1) bx1--;
// process y-coords
if (py>by1) by1++;
else if (py<by1) by1--;
That's all there is to it. If we wanted to reverse the logic and make the bad guy run away, then the conditional logic
could be inverted or the outcome increment operators could be inverted. As an example of deterministic logic, Listing 2 is a
complete program that will make a little computer-controlled dot chase a player-controlled dot. Use the numeric
keypad to control your player and press ESC to exit the program.
Now let's move on to another typical behavior, which we can categorize as random logic.
Random Logic
Sometimes an intelligent creature exhibits almost random behaviors. These random behaviors may be the result of
any one of a number of internal processes, but there are two main ones that we should touch upon--lack of
information and desired randomness.
The first premise is an obvious one. Many times an intelligent creature does not have enough information to make a
decision or may not have any information at all. The creature then simply does the best it can, which is to select a
random behavior in hopes that it might be the correct one for the situation. For example, let's say you were dropped
into a dungeon and presented with four identical doors. Knowing that all but one meant certain death, you would
simply have to randomly select one!
The second premise that brings on a random selection is intentional. For example, say you are a spy trying to make a
getaway after acquiring some secret documents (this happens to me all the time). Now, imagine you have been seen,
and the bad guys start shooting at you! If you run in a straight line, chances are you are going to get shot. However,
if during your escape you make many random direction changes and zigzag a bit, you will get away every time!
What we learn from that example is that random logic and selections are often good because they make it harder for
the player to determine what the bad guys are going to do next, and they are a good way to help the bad guys make a
selection when there isn't enough information to use a deterministic algorithm. Motion control is a typical place to
apply random logic in bad-guy AI. You can use a random number or probability to select a new direction for the bad
guy as a function of time. Let's envision a multiplayer game with a single, computer-controlled bad guy surrounded
by four human players. This is a great place to apply random motion, using the following logic:
The position of the bad guy is translated by a random amount in both X and Y, which in this case is +/-5 pixels or
units.
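A minimal sketch of that logic:

#include <stdlib.h>

// Translate the bad guy by a random amount in the range -5..+5
// in both X and Y each frame.
void RandomMove(int* bx, int* by)
{
    *bx += (rand() % 11) - 5;   // random X translation
    *by += (rand() % 11) - 5;   // random Y translation
}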
Of course, we can use random logic for a lot of other things besides direction changes. Starting positions, power
levels, and probability of firing weapons are all good places to apply random logic. It's definitely a good technique
that adds a bit of unpredictability to game AI. Listing 3 is a demo of random logic used to control motion. The demo
creates an array of flies and uses random logic to move them around. Press ESC to exit the demo.
Now let's talk about patterns.
● Turn left
● Turn right
● Move forward
● Move backward
● Sit still
● Fire weapon
Even though we only have six selections, we can construct quite a few patterns with a short input list of 16 elements
as in the example. In fact there are 6^16 different possible patterns, or roughly 2.8 trillion different behaviors. I think
that's enough to make something look intelligent! So how can we use encoded lists and patterns in a game for the
AI? One solid way is to use them to control the motion of a bad guy or game object. For example, a deterministic
algorithm might decide it's time to make a bad guy perform some complex motion that would be difficult if we used
standard conditional logic. Thus, we could use that pattern, which simply reads an encoded list directing the bad guy
to make some tricky moves. For example, we might have a simple algorithm like this:
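A minimal sketch of such a pattern player follows; the pattern data here is made up for illustration.

// A pattern is an encoded list of X and Y translations that is
// played back one element per frame.
struct Move { int dx, dy; };

Move pattern[16] = { {0,-1},{0,-1},{1,0},{1,0},{0,1},{0,1},{-1,0},{-1,0},
                     {0,-1},{1,0},{0,1},{-1,0},{0,0},{0,0},{1,1},{-1,-1} };

void RunPattern(int* bx, int* by)
{
    for (int i = 0; i < 16; i++) {
        *bx += pattern[i].dx;   // apply the encoded translation
        *by += pattern[i].dy;
        // ... render and wait one frame here ...
    }
}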
You'll notice that the encoded pattern is made up simply of X and Y translations. The pattern could just as well have
contained complex records with a multitude of data fields. I've written detailed code that will create an example of
patterns and list processing, a demo of an ant that can process one of four patterns selected by the keys 1-4.
Unfortunately, it's too long to print here. Go to the Game Developer ftp site, though (ftp://ftp.mfi.com/gdmag/src),
and you can download it there.
Now we're starting to get somewhere, but we need an overall control unit with some form of memory, and we must
select the appropriate types of behaviors.
The FSM's transition diagram is shown in Figure 4. We can see that if the bad guy is within 50 units of the player,
then the bad guy moves into State 2 and simply attacks. If the bad guy is in the range of 51 to 100 units from the
player, then the bad guy goes into State 3 and moves in a pattern. Finally, if the bad guy is farther than 100 units
from the player then chances are the bad guy can't even see the player (in the imaginary computer universe). In that
case, the bad guy moves into State 1, which is random motion.
So how can we implement this simple FSM machine? All we need is a variable to record the current state and some
conditional logic to perform the state transitions and outputs. Listing 4 shows a rough algorithm that will do all this.
Note that S0 (the new state) does not trigger any behavior on the part of the opponent. Rather, it acts as a state
"switchbox," to which all states (except itself) transition. This allows you to localize in a single control block all the
decision making about transitions.
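A rough sketch of such an FSM with the S0 switchbox (the function and state names are illustrative; the distance thresholds come from the description of Figure 4):

enum State { S0, S1_RANDOM, S2_ATTACK, S3_PATTERN };

State state = S0;

void UpdateBadGuy(float dist)   // distance from bad guy to player
{
    switch (state) {
    case S0:                     // all transition logic lives here
        if      (dist <= 50.0f)  state = S2_ATTACK;
        else if (dist <= 100.0f) state = S3_PATTERN;
        else                     state = S1_RANDOM;
        break;
    case S1_RANDOM:  /* random motion  */ state = S0; break;
    case S2_ATTACK:  /* attack player  */ state = S0; break;
    case S3_PATTERN: /* pattern motion */ state = S0; break;
    }
}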
Although this requires two cycles through the FSM loop to create one behavior, it's well worth it. In the case of a
small FSM, the entire loop can stay in the cache, and in the case of a large FSM loop, the localization of the
transition logic will more than pay for the performance penalty. If you absolutely refuse to double-loop, you can
handcraft the transitions between states. A finite-state machine diagram will vividly illustrate, in the form of
spaghetti transitions, when your transition logic is out of control.
Now that we have an overall thought controller, that is, an FSM, we should discuss simulating sensory excitation in a
virtual world.
Environmental Sensing
One problem that plagues AI game programming is that it can be very unfair-- at least to the player. The reason for
this is that the player can only "see" what's on the computer screen, whereas the computer AI system has access to all
variables and data that the player can't access.
This brings us to the concept of simulated sensory organs for the bad guys and game objects. For example, in a three-
dimensional tank game that takes place on a flat plain, the player can only see so far based on his or her field of
view. Further, the player can't see through rocks, buildings, and obstacles. However, because the game logic has
access to all the system variables and data structures, it is tempting for it to use this extra data to help with the AI for
the bad guys.
The question is, is this fair to the player? Well, of course not. So how can we make sure we supply the AI engine of
the bad guys and game objects with the same information the player has? We must use simulated sensory inputs such
as vision, hearing, vibration, and the like. Figure 5 is an example of one such imaginary tank game. Notice that each
opponent and the player has a cone of vision associated with it. Both the bad guys and the player can only see objects
within this cone. The player can only see within this cone as a function of the 3D graphics engine, but the bad guys
can only see within this cone as a function of their AI program. Let's be a little more specific about this.
Since we know that we must be fair to the player, what we can do is write a simple algorithm that scans the area in
front of each bad guy and determines if the player is within view. This scanning is similar to the player viewing the
viewport or looking out the virtual window. Of course, we don't need to perform a full three-dimensional scan with
ray tracing or the like--we can simply make sure the player is within the view angle of the bad guy in question by
using trigonometry or any other technique we wish.
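As one possible implementation (a 2D sketch with invented parameter names), the test can be done with a dot
product: the player is visible if he is within range and within the half-angle of the bad guy's view cone.

    // Cone-of-vision test, 2D. (headingX, headingY) is the unit vector the
    // bad guy is facing; returns true if the player is inside the cone.
    public class Vision {
        static boolean canSee(double bx, double by, double headingX, double headingY,
                              double px, double py, double halfAngleDeg, double range) {
            double dx = px - bx, dy = py - by;
            double dist = Math.sqrt(dx * dx + dy * dy);
            if (dist == 0) return true;               // standing on the bad guy!
            if (dist > range) return false;           // too far away to see
            // Cosine of the angle between the heading and the player direction.
            double cos = (dx * headingX + dy * headingY) / dist;
            return cos >= Math.cos(Math.toRadians(halfAngleDeg));
        }
    }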
Based on the information obtained from each bad guy scan, the proper AI decision can be made in a more uniform
and fair manner. Of course, we may want to give the computer-controlled AI system more advantage than the human
player to make up for the AI system itself being rather primitive when compared to the 100-billion-cell neural
network it is competing against, but you get the idea.
Finally, we might ask, "Can we perform other kinds of sensing?" Yes. We can create simulated light detectors, sound
detectors, and so forth. I have been experimenting with an underwater game engine, and in total darkness the only
way the enemy creatures can "see" you is to listen to your propulsion units. Based on the power level of the player's
engines the game AI determines the sound level that the bad guys hear and moves them toward the sound source or
sources.
Memory and Learning
Memory and learning give the bad guys a record of the past to act on. In one of my games, for example, I kept a
count of the number of times an alien found energion in each geographical region of the game. (Figure 6 illustrates
one such memory map.) Then, when an alien was power hungry, instead of randomly bouncing around, the alien would
refer to this memory data structure, select the geographical region with the highest probability of finding
energion, and set its trajectory for this region.
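A sketch of such a memory map in Java, assuming the world is carved into a simple grid of regions (the grid
size is invented):

    // Memory map: one counter per geographical region, bumped whenever an
    // alien finds energion there. A hungry alien heads for the best region.
    public class EnergionMemory {
        int[][] found = new int[4][4];   // 4x4 grid of regions (illustrative)

        void recordFind(int row, int col) { found[row][col]++; }

        // Returns {row, col} of the region with the most recorded finds.
        int[] bestRegion() {
            int[] best = {0, 0};
            for (int r = 0; r < found.length; r++)
                for (int c = 0; c < found[r].length; c++)
                    if (found[r][c] > found[best[0]][best[1]])
                        best = new int[] {r, c};
            return best;
        }
    }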
The previous example is a simple one, but as we can see, memory and learning are actually very easy to implement.
Moreover, we can make the computer AI learn much more than where energion is. It could learn the most common
defensive moves of the player and use this information against the player.
Well that's enough for basic AI techniques. Let's take a quick look at how we can put it all together.
The Future
I see AI as the next frontier to explore. Without a doubt, most game programmers have focused so much on graphics
that AI hasn't received much attention. The irony is that researchers have been making leaps and bounds in AI and
in Artificial Life, or A-Life.
I'm sure you've heard the common terms "genetic algorithms" and "neural networks." Genetic algorithms are simply
a method of representing some aspect of a computer-based AI model with a set of "genes," which can represent
whatever we wish--aggressiveness, maximum speed, maximum vision distance, and so on. Then, a population of
creatures is generated using an algorithm that adds a little randomness in each of the output creatures' genes.
Our game world is then populated with these gene-based creatures. As the creatures interact in the environment, they
are killed, survive, and are reborn. The biological analog comes into play during the rebirthing phase. Either
manually or by some other means, the computer AI engine "mates" various pairs of creatures and mixes their genes.
The resulting offspring then survive another generation and the process continues. This causes the creatures to
evolve so that they are most adapted for the given environment.
Neural networks, on the other hand, are computer abstractions of a collection of brain cells that have firing
thresholds. You can enhance or diminish these thresholds and the connections between cells. By "teaching" a neural
network, that is, by strengthening and weakening these connections, the neural net can learn something. We can then
use these nets to help make decisions and even discover new strategies.
Andre LaMothe is the author of the best-selling Tricks of the Game Programming Gurus (SAMS Publishing, 1994)
and Teach Yourself Game Programming in 21 Days (SAMS Publishing, 1994). His latest creation is the Black Art of
3D Game Programming (Waite Group Press, 1995).
What We Need
In order to play chess, a computer needs a certain number of software components. At the very least, these include:
● Some way to represent a chess board in memory, so that it knows what the state of the game is.
● Rules to determine how to generate legal moves, so that it can play without cheating (and verify that its human
opponent is not trying to pull a fast one on it!)
● A technique to choose the move to make amongst all legal possibilities, so that it can choose a move instead of
being forced to pick one at random.
● A way to compare moves and positions, so that it makes intelligent choices.
● A user interface, so that a human being can actually play against it.
This series will cover all of the above, except the user interface, which is essentially a 2D game like any other. The
rest of this article describes the major issues related to each component and introduces some of the concepts to be
explored in the series.
Board Representations
In the early days of chess programming, memory was extremely limited (some programs ran in 8K or less) and the
simplest, least expensive representations were the most effective. A typical chessboard was implemented as an 8x8
array, with each square represented by a single byte: an empty square was allocated value 0, a black king could be
represented by the number 1, etc.
When chess programmers started working on 64-bit workstations and mainframes, more elaborate board
representations based on "bitboards" appeared. Apparently invented in the Soviet Union in the late 1960's, the bit
board is a 64-bit word containing information about one aspect of the game state, at a rate of 1 bit per square. For
example, a bitboard might contain "the set of squares occupied by black pawns", another "the set of squares to which
a queen on e3 can move", and another, "the set of white pieces currently attacked by black knights". Bitboards are
versatile and allow fast processing, because many operations that are repeated very often in the course of a chess
game can be implemented as 1-cycle logic operations on bitboards.
Part II of this series covers board representations in detail.
Move Generation
The rules of the game determine which moves (if any) the side to play is allowed to make. In some games, it is easy
to look at the board and determine the legal moves: for example, in tic-tac-toe, any empty square is a legal move.
For chess, however, things are more complicated: each piece has its own movement rules, pawns capture diagonally
and move along a file, it is illegal to leave a king in check, and the "en passant" captures, pawn promotions and
castling moves require very specific conditions to be legal.
In fact, it turns out that move generation is one of the most computationally expensive and complicated aspects of
chess programming. Fortunately, the rules of the game allow quite a bit of pre-processing, and I will describe a set
of data structures which can speed up move generation significantly.
Part III of this series covers this topic.
Search Techniques
To a computer, it is far from obvious which of many legal moves are "good" and which are "bad". The best way to
discriminate between the two is to look at their consequences (i.e., search sequences of moves, say 4 for each side, and
look at the results.) And to make sure that we make as few mistakes as possible, we will assume that the opponent is
just as good as we are. This is the basic principle underlying the minimax search algorithm, which is at the root of
all chess programs.
Unfortunately, minimax' complexity is O(b^n), where b ("branching factor") is the number of legal moves available
on average at any given time and n (the depth) is the number of "plies" you look ahead, where one ply is one move
by one side. This number grows impossibly fast, so a considerable amount of work has been done to develop
algorithms that minimize the effort expended on search for a given depth. Iterative-deepening Alphabeta, NegaScout
and MTD(f) are among the most successful of these algorithms, and they will be described in Part IV, along with the
data structures and heuristics which make strong play possible, such as transposition tables and the history/killer
heuristic.
Another major source of headaches for chess programmers is the "horizon effect", first described by Hans Berliner.
Suppose that your program searches to a depth of 8-ply, and that it discovers to its horror that the opponent will
capture its queen at ply 6. Left to its own devices, the program will then proceed to throw its bishops to the wolves
so that it will delay the queen capture to ply 10, which it can't see because its search ends at ply 8. From the
program's point of view, the queen is "saved", because the capture is no longer visible... But it has lost a bishop, and
the queen capture reappears during the next move's search. It turns out that finding a position where a program can
reason correctly about the relative strength of the forces in presence is not a trivial task at all, and that searching
every line of play to the same depth is tantamount to suicide. Numerous techniques have been developed to defeat
the horizon effect; quiescence search and Deep Blue's singular extensions are among the topics covered in Part V on
advanced search.
Evaluation
Finally, the program must have some way of assessing whether a given position means that it is ahead or that it has
lost the game. This evaluation depends heavily upon the rules of the game: while "material balance" (i.e., the
number and value of the pieces on the board) is the dominant factor in chess, because being ahead by as little as a
single pawn can often guarantee a victory for a strong player, it is of no significance in Go-Moku and downright
misleading in Othello, where you are often better off with fewer pieces on the board until the very last moment.
Developing a useful evaluation function is a difficult and sometimes frustrating task. Part VI of this series covers the
efforts made in that area by the developers of some of the most successful chess programs of all time, including
Chess 4.5, Cray Blitz and Belle.
Conclusion
Now that we know which pieces we will need to complete the puzzle, it is time to get started on that first corner.
Next month, I will describe the most popular techniques used to represent chess boards in current games. See you
there!
Bit Boards
For many games, it is hard to imagine better representations than the simple one-square, one-slot array. However,
for chess, checkers and other games played on a 64-square board, a clever trick was developed (apparently by the
KAISSA team in the Soviet Union) in the late 60's: the bit board.
KAISSA ran on a mainframe equipped with a 64-bit processor. Now, 64 happens to be the number of squares on a
chess board, so it was possible to use a single memory word to represent a yes-or-no or true-or-false predicate for the
whole board. For example, one bitboard might contain the answer to "Is there a white piece here?" for each square
of the board.
Therefore, the state of a chess game could be completely represented by 12 bitboards: one each for the presence of
white pawns, white rooks, black pawns, etc. Adding two bitboards for "all white pieces" and "all black pieces"
might accelerate further computations. You might also want to hold a database of bitboards representing the squares
attacked by a certain piece on a certain square, etc.; these constants come in handy at move generation time.
The main justification for bit boards is that a lot of useful operations can be performed using the processor's
instruction set's 1-cycle logical operators. For example, suppose you need to verify whether the white queen is
checking the black king. With a simple square-array representation, you would need to:
● Find the queen's position, which requires a linear search of the array and may take 64 load-test cycles.
● Examine the squares to which it is able to move, in all eight directions, until you either find the king or run out
of possible moves.
This process is always time-consuming, more so when the queen happens to be located near the end of the array, and
even more so when there is no check to be found, which is almost always the case!
With a bitboard representation, you would:
● Load the "white queen position" bitboard.
● Use the queen's square to index the attack bitboard database, retrieving the set of squares she attacks.
● Logical-AND that attack bitboard with the one for "black king position".
If the result is non-zero, then the white queen is checking the black king. Assuming that the attack bitboard database
is in cache memory, the entire operation has consumed 3-4 clock cycles!
Another example: if you need to generate the moves of the white knights currently on the board, just find the attack
bitboards associated with the positions occupied by the knights and AND them with the logical complement of the
bitboard representing "all squares occupied by white pieces" (i.e, apply the logical NOT operator to the bitboard),
because the only restriction on knights is that they can not capture their own pieces!
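In Java, whose long type happens to be exactly 64 bits, both of these examples reduce to one-liners; in this
sketch the attack tables are assumed to have been precomputed before the game starts.

    // Bitboard sketches. Bit i of a long corresponds to square i (0..63).
    public class Bitboards {
        // Assumed precomputed: attack masks for every piece on every square.
        static long[] queenAttacks  = new long[64];
        static long[] knightAttacks = new long[64];

        // Is the white queen on square q checking the black king?
        static boolean queenChecks(int q, long blackKing) {
            return (queenAttacks[q] & blackKing) != 0;
        }

        // Pseudo-legal knight moves: attacked squares minus our own pieces.
        static long knightMoves(int from, long whitePieces) {
            return knightAttacks[from] & ~whitePieces;
        }
    }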
For a (slightly) more detailed discussion of bitboards, see the article describing the CHESS 4.5 program developed at
Northwestern University, in Peter Frey's book Chess Skill in Man and Machine; there are at least two editions of this
book, published in 1977 and 1981.
Note: To this day, few personal computers use true 64-bit processors, so at least some of the speed advantages
associated with bitboards are lost. Still, the technique is pervasive, and quite useful.
Transposition Tables
In chess, there are often many ways to reach the same position. For example, it doesn't matter whether you play 1.
P-K4 ... 2. P-Q4 or 1. P-Q4... 2. P-K4; the game ends up in the same state. Achieving identical positions in different
ways is called transposing.
Now, of course, if your program has just spent considerable effort searching and evaluating the position resulting
from 1. P-K4 ... 2. P-Q4, it would be nice if it were able to remember the results and avoid repeating this tedious
work for 1. P-Q4... 2. P-K4. This is why all chess programs, since at least Richard Greenblatt's Mac Hack VI in the
late 1960's, have incorporated a transposition table.
A transposition table is a repository of past search results, usually implemented as a hash dictionary or similar
structure to achieve maximum speed. When a position has been searched, the results (i.e., evaluation, depth of the
search performed from this position, best move, etc.) are stored in the table. Then, when new positions have to be
searched, we query the table first: if suitable results already exist for a specific position, we use them and bypass the
search entirely.
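As a sketch (the entry layout here is typical rather than canonical, and the hash key and "lock" scheme is
described further down), such a table might look like this in Java:

    import java.util.HashMap;
    import java.util.Map;

    // Skeleton transposition table keyed on a 64-bit position hash.
    public class TranspositionTable {
        static class Entry {
            long lock;      // second hash, used to detect collisions
            int  depth;     // depth of the search that produced this result
            int  score;     // evaluation found at that depth
            int  bestMove;  // encoded best move, useful for move ordering
        }

        private final Map<Long, Entry> table = new HashMap<>();

        void store(long key, Entry e) { table.put(key, e); }

        // Usable only if the stored search was at least as deep as we need.
        Entry probe(long key, long lock, int minDepth) {
            Entry e = table.get(key);
            if (e != null && e.lock == lock && e.depth >= minDepth) return e;
            return null;
        }
    }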
There are numerous advantages to this process, including:
● Speed. In situations where there are lots of possible transpositions (i.e., in the endgame, when there are few
pieces on the board), the table quickly fills up with useful results and 90% or more of all positions generated
will be found in it.
● Free depth. Suppose you need to search a given position to a certain depth; say, four-ply (i.e., two moves for
each player) ahead. If the transposition table already contains a six-ply result for this position, not only do you
avoid the search, but you get more accurate results than you would have if you had been forced to do the
work!
● Versatility. Every chess program has an "opening book" of some sort, i.e., a list of well-known positions and
best moves selected from the chess literature and fed to the program to prevent it from making a fool out of
itself (and its programmer) at the very beginning of the game. Since the opening book's modus operandi is
identical to the transposition table (i.e., look up the position, and spit out the results if there are any), why not
initialize the table with the opening book's content at the beginning of the game? This way, if the flow of the
game ever leaves the opening book and later transposes back into a position that was in it, there is a chance that
the transposition table will still contain the appropriate information and be able to use it.
The only real drawback of the transposition table mechanism is its voracity in terms of memory. To be of any use
whatsoever, the table must contain several thousand entries; a million or more is even better. At 16 bytes or so per
entry, this can become a problem in memory-starved environments.
Other uses of transposition tables
CHESS 4.5 also employed hash tables to store the results of then-expensive computations which rarely changed in
value or alternated between a small number of possible choices:
● Pawn structure. Indexed only on the positions of pawns, this table requires little storage, and since there are
comparatively few possible pawn moves, it changes so rarely that 99% of positions result in hash table hits.
● Material balance, i.e., the relative strength of the forces on the board, which only changes after a capture or a
pawn promotion.
This may not be as useful in these days of plentiful CPU cycles, but the lesson is a valuable one: some measure of
pre-processing can save a lot of computation at the cost of a little memory. Study your game carefully; there may be
room for improvement here.
Computing Hash Keys
The standard technique, due to Zobrist, is to assign a fixed random number to each combination of piece and square.
A position's hash key is then computed as follows:
● Start with a key of 0.
● Scan the board; when you encounter a piece, XOR its random number into the current hash key. Repeat until
the entire board has been examined.
An interesting side effect of the scheme is that it will be very easy to update the hash value after a move, without
re-scanning the entire board. Remember the old XOR-graphics? The way you XOR'ed a bitmap on top of a
background to make it appear (usually in distorted colors), and XOR'ed it again to make it go away and restore the
background to its original state? This works similarly. Say, for example, that a white rook on H1 captures a black
pawn on H4. To update the hash key, XOR the "white rook on H1" random number once again (to "erase" its
effects), the "black pawn on H4" (to destroy it) and the "white rook on H4" (to add a contribution from the new rook
position).
Use the exact same method, with different random numbers, to generate a second key (or "hash lock") to store in the
transposition table along with the truly useful information. This is used to detect and avoid collisions: if, by chance,
two boards hash to the exact same key and collide in the transposition table, odds are extremely low that they will
also hash to the same lock!
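Here is what the whole scheme can look like in Java; the 12-piece encoding and the seed value are illustrative.

    import java.util.Random;

    // Zobrist hashing: one random 64-bit number per (piece type, square) pair.
    public class Zobrist {
        static final long[][] RAND = new long[12][64];  // 6 piece types x 2 colors
        static {
            Random rng = new Random(20001001);          // fixed seed: reproducible keys
            for (int p = 0; p < 12; p++)
                for (int sq = 0; sq < 64; sq++)
                    RAND[p][sq] = rng.nextLong();
        }

        // Full scan, used once at the start of the game.
        // board[sq] holds a piece code 0..11, or -1 for an empty square.
        static long fullKey(int[] board) {
            long key = 0;
            for (int sq = 0; sq < 64; sq++)
                if (board[sq] >= 0) key ^= RAND[board[sq]][sq];
            return key;
        }

        // Incremental update for "white rook on H1 takes black pawn on H4":
        //   key ^= RAND[W_ROOK][h1] ^ RAND[B_PAWN][h4] ^ RAND[W_ROOK][h4];
    }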
History Tables
The "history heuristic" is a descendant of the "killer move" technique. A thorough explanation belongs in the article
on search; for now, it will be sufficient to say that a history table should be maintained to note which moves have
had interesting results in the past (i.e., which ones have cut-off search quickly along a continuation) and should
therefore be tried again at a later time. The history table is a simple 64x64 array of integer counters; when the search
algorithm decides that a certain move has proven useful, it will ask the history table to increase its value. The values
stored in the table will then be used to sort moves and make sure that more "historically powerful" ones will be tried
first.
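The table itself is tiny; here is a sketch (weighting rewards by the depth of the cutoff is one common
choice, not the only one):

    // History heuristic: moves are indexed by (from, to) squares only.
    public class HistoryTable {
        private final int[][] score = new int[64][64];

        // Called when a move causes a cutoff; deeper cutoffs matter more.
        void reward(int from, int to, int depth) { score[from][to] += depth * depth; }

        // Used as a sort key when ordering moves: higher is searched first.
        int value(int from, int to) { return score[from][to]; }
    }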
Pre-Processed Move Databases
Finally, my program will use a pre-processed move database, built on the following observations:
● There are 64 x 5 = 320 combinations of major piece and square from which to move, 48 squares on which a
black pawn can be located (they can never retreat to the back rank, and they get promoted as soon as they
reach the eighth rank), and 48 where a white pawn can be located.
● Let us define a "ray" of moves as a sequence of moves by a piece, from a certain square, in the same
direction. For example, all queen moves towards the "north" of the board from square H3 make up a ray.
● For each piece on each square, there are a certain number of rays along which movement might be possible.
For example, a king in the middle of the board may be able to move in 8 different directions, while a bishop
trapped in a corner only has one ray of escape possible.
● Prior to the game, compute a database of all rays for all pieces on all squares, assuming an empty board (i.e.,
movement is limited only by the edges and not by other pieces).
● When you generate moves for a piece on a square, scan each of its rays until you either reach the end of the
ray or hit a piece. If it is an enemy piece, this last move is a capture. If it is a friendly piece, this last move is
impossible.
With a properly designed database, move generation is reduced to a simple, mostly linear lookup; it requires virtually
no computation at all. And the entire thing holds within a few dozen kilobytes; mere chump change compared to the
transposition table!
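A sketch of the lookup, assuming the ray database has been filled in before the game starts (nearest squares
first along each ray):

    import java.util.ArrayList;
    import java.util.List;

    // Ray-based move generation sketch. rays[piece][square] holds, for each
    // direction, the squares reachable on an empty board, nearest first.
    public class RayMoveGen {
        static int[][][][] rays;   // assumed filled in before the game starts

        // ours / theirs are simple per-square occupancy flags here.
        static List<Integer> movesFor(int piece, int from,
                                      boolean[] ours, boolean[] theirs) {
            List<Integer> moves = new ArrayList<>();
            for (int[] ray : rays[piece][from]) {
                for (int to : ray) {
                    if (ours[to]) break;       // friendly piece: stop, no move
                    moves.add(to);             // empty square or capture
                    if (theirs[to]) break;     // capture: stop after this move
                }
            }
            return moves;
        }
    }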
All of the techniques described above (bit boards, history, transposition table, pre-processed move database) will be
illustrated in my own chess program, to be posted when I finish writing this series. Next month, I will examine
move generation in more detail.
François Dominic Laramée, June 2000
The Dilemma
No matter how you slice it, chess is a complicated game, especially for a computer.
In any given situation, a player may have 30 or more legal moves to choose from, some good, some suicidal. For
trained humans, it is easy to characterize the majority of these moves as foolish or pointless: even beginners learn
that they had better come up with a solid plan before leaving their queen in a position where she can be captured, and
masters know (more through instinctive pattern matching than by conscious effort) which 1-2 moves are likely to be
the strongest in the position.
However, coding this information (especially the unconscious type!) into a computer has proven spectacularly
difficult, and the strongest programs (except, to some extent, Hans Berliner's Hitech and its siblings) have given up
on this approach, instead relying on "brute force": if you can analyze all possible moves fast enough and predict their
consequences far enough down the road, it doesn't matter whether or not you start with a clear idea of what you are
trying to accomplish, because you'll discover a good move eventually. Therefore, move generation and search
should be made as fast as possible, so as to minimize the loss of effort required by the brute force method.
Search will be discussed in Parts IV and V of this series; this month, we will concentrate on move generation.
Historically, three major strategies have been used in this area:
● Selective generation: Examine the board, come up with a small number of "likely" moves and discard
everything else.
● Incremental generation: Generate a few moves, hoping that one will prove so good or so bad that search along
the current line of play can be terminated before generating the others.
● Complete generation: Generate all moves, hoping that the transposition table (discussed in Part II) will contain
relevant information on one of them and that there will be no need to search anything at all.
Selective generation (and its associated search technique, called forward pruning) have all but disappeared since the
mid 1970's. As for the other two, they represent two sides of the same coin, trading off effort in move generation vs
search. In games where move generation is easy and/or there are lots of ways to transpose into the same positions
(i.e., Othello and GoMoku), complete generation may be most efficient, while in games where move generation rules
are complicated, incremental generation will usually work faster. Both strategies are sound and viable, however.
In the early days of chess programming, two competing approaches to this trade-off were proposed:
● Examine every legal move and every legal reply, to some fixed depth ("brute force").
● Only examine the "best" moves, as determined from a detailed analysis of a position, and then only the "best"
replies to each, recursively.
At first, the second alternative seemed more likely to succeed. After all, this is how human players do it, and it
seems logical to assume that looking at only a few moves at each node will be faster than looking at all of them.
Unfortunately, the results disproved the theory: programs using selective search just didn't play very well. At best,
they achieved low to mid-level club player ratings, often committing humiliating blunders at the worst possible
time. Beating a world champion (or even playing reasonably well on a consistent basis) was beyond their reach.
The problem is that, for a "best move generator" to be any good, it has to be almost perfect. Suppose that a program
is equipped with a function that looks for the 5 best moves in a position, and that the objective best move is among
those 5 at least 95% of the time. (That, by the way, is a big assumption.) This means that the probability that the
generator's list will always contain the best choice at all times, during a 40-move game, is less than 13%. Even a
god-like generator with 99% accuracy will blunder at least once in about a third of its games, while a more
reasonable 90%-accurate function will play an entire game without messing up less than 1.5% of the time!
In the mid 1970's, the legendary Northwestern team of Slate and Atkin decided to do away with the complicated
best-move generator; it turned out that the time they saved in avoiding costly analysis during move generation was
enough to cover the added expense of a full-width search (i.e., examining all possible moves). To all intents and
purposes, this discovery buried forward pruning for good.
Botvinnik's work
An extreme example of a forward pruning algorithm was developed in the Soviet Union, in the 1970's and early
1980's, under the tutelage of former World chess champion Mikhail Botvinnik. Botvinnik was convinced that the
only way for a computer to ever play grandmaster-level chess was to play like a grandmaster, i.e., examine only a
few moves, but in great depth and detail. His program sought to identify and implement the sort of high-level plans
and patterns which a world-class player might come up with during a game. While it led to some fascinating books,
revealing insights into the master's mind which only Botvinnik could provide, this work unfortunately did not reach
its lofty goals.
Generating all moves at once consists of:
● Computing every legal move in the position.
● Ordering them in some way, hopefully speeding up search by picking an advantageous order.
● Searching them all one at a time, until all moves have been examined or a cutoff occurs.
Early programs, for example Sargon, did this by scanning the board one square at a time, looking for pieces of the
moving side, and computing possible move destinations on the fly. Memory being at a premium, the expenditure of
CPU time required to re-compute these moves every time was a necessary evil.
These days, a pre-processed data structure like the one I described last month can avoid a considerable amount of
computation and code complexity, at the cost of a few dozen kilobytes. When this super-fast move generation is
combined with transposition tables, an added bonus may fall into the programmer's lap: if even one of the moves has
already been searched before, and if its evaluation (as retrieved from the table) is such that it triggers a cutoff, there
will be no need to search any of the moves at all! Obviously, the larger the transposition table, and the higher the
probability of a transposition given the rules of the game, the bigger the average payoff.
Not only is this technique conceptually simple, it is also the most "universal": while there are easy ways to segregate
chess moves into different categories (i.e., captures and non-captures), other games like Othello do not provide such
convenient tools to work with. Therefore, if you intend your program to play more than one game, you should
probably pick this technique instead of the one described in the next section.
Of course, if you knew which moves were best, you wouldn't be searching in the first place. Still, there are ways to
"guess" which moves are more likely to be good than others. For example, you might start with captures, pawn
promotions (which dramatically change material balance on the board), or checks (which often allow few legal
responses); follow with moves which caused recent cutoffs at the same depth in the tree (so-called "killer moves"),
and then look at the rest. This is the justification for iterative deepening alphabeta, which we will discuss in detail
next month, as well as the history table we talked about last time. Note that these techniques do not constitute
forward pruning: all moves will be examined eventually; those which appear bad are only delayed, not eliminated
from consideration.
A final note: in chess, some moves may be illegal because they leave the King in check. However, such an
occurrence is quite rare, and it turns out that validating moves during generation would cost a tremendous amount of
effort. It is more efficient to delay the check until the move is actually searched: for example, if capturing the King
would be a valid reply to Move X, then Move X is illegal and search should be terminated. Of course, if search is
cut off before the move has to be examined, validation never has to take place.
My Choice
For my chess program, I have chosen to generate all moves at the same time. These are only some of the reasons
why:
● I intend to use the program as a basis for several other games, most of which have no direct counterparts to
chess captures.
● I have plenty of memory to play with.
● The code required to implement this technique is simpler to write and to understand; you will thank me when
you see it.
● There are several freeware programs that implement piece-meal move generation; the curious reader should
look at Crafty, for example, as well as James Swafford's Galahad.
While overall performance may be slightly less stellar than otherwise, my program (written in Java, no less) wouldn't
exactly provide a challenge to Deep Blue even in the best case, so I won't feel too bad!
Next Month
Now, we are ready to delve into the brains of a chess-playing program, with search techniques. This is such a large
topic that it will require two articles. We will begin with the basic search algorithms common to all games, before
continuing with new developments and chess-specific optimizations in the next installment.
Why Search?
Well, basically, because we are not smart enough to do without it.
A really bright program might be able to look at a board position and determine who is ahead, by how much, and
what sort of plan should be implemented to drive the advantage to fruition. Unfortunately, there are so many patterns
to discern, so many rules and so many exceptions, that even the cleverest programs just aren't very good at this sort
of thing. What they are good at, however, is computing fast. Therefore, instead of trying to figure out good moves
just by looking at a board, chess programs use their brute force to do it: look at every move, then at every possible
countermove by the opponent, etc., until the processor melts down.
Deep searches are an easy way to "teach" the machine about relatively complicated tactics. For example, consider
the knight fork, a move which places a knight on a square from which it can attack two different pieces (say, a rook
and the queen). Finding a way to represent this type of position logically would require some effort, more so if we
also had to determine whether the knight was itself protected from capture. However, a plain dumb 3-ply search will
"learn" the value of a fork on its own: it will eventually try to move the knight to the forking square, will test all
replies to this attack, and then capture one of the undefended pieces, changing the board's material balance. And
since a full-width search looks at everything, it will never miss an opportunity: if there is a 5-move combination,
however obscure, that leads to checkmate or to a queen capture, the machine will see it if its search is deep enough.
Therefore, the deeper the search, the more complicated the "plans" which the machine can stumble upon.
Grandpa MiniMax
The basic idea underlying all two-agent search algorithms is Minimax. It dates back to the Dark Ages; I believe
Von Neumann himself first described it over 60 years ago.
Minimax can be defined as follows:
● Assume that there is a way to evaluate a board position so that we know whether Player 1 (whom we will call
Max) is going to win, whether his opponent (Min) will, or whether the position will lead to a draw. This
evaluation takes the form of a number: a positive number indicates that Max is leading, a negative number,
that Min is ahead, and a zero, that nobody has acquired an advantage.
● Max's job is to make moves which will increase the board's evaluation (i.e., he will try to maximize the
evaluation).
● Min's job is to make moves which decrease the board's evaluation (i.e., he will try to minimize it).
● Assume that both players play flawlessly, i.e., that they never make any mistakes and always make the moves
that improve their respective positions the most.
How does this work? Well, suppose that there is a simple game which consists of exactly one move for each player,
and that each has only two possible choices to make in a given situation. The evaluation function is only run on the
final board positions, which result from a combination of moves by Min and Max.
Max Move Min Move Evaluation
A C 12
A D -2
B C 5
B D 6
Max assumes that Min will always play perfectly. Therefore, he knows that, if he makes move A, his opponent will
reply with D, resulting in a final evaluation of -2 (i.e., a win for Min). However, if Max plays B, he is sure to win,
because Min's best move still results in a positive final value of 5. So, by the Minimax algorithm, Max will always
choose to play B, even though he would score a bigger victory if he played A and Min made a mistake!
The trouble with Minimax, which may not be immediately obvious from such a small example, is that there is an
exponential number of possible paths which must be examined. This means that effort grows dramatically with:
● The number of possible moves by each player, called the branching factor and noted B.
● The depth of the look-ahead, noted d, and usually described as "N-ply", where N is an integer number and
"ply" means one move by one player. For example, the mini-game described above is searched to a depth of
2-ply, one move per player.
In Chess, for example, a typical branching factor in the middle game would be about 35 moves; in Othello, around 8.
Since Minimax' complexity is O( B^n ), a mere 4-ply search of a chess position would need to explore about 1.5 million
possible paths! That is a LOT of work. Adding a fifth ply would make the tree balloon to about 50 million nodes,
and a sixth, to an impossible 1.8 billion!
Luckily, there are ways to cut the effort by a wide margin without sacrificing accuracy.
Alphabeta
The most important of these is the alpha-beta algorithm. The insight is simple: once one reply has been found that
refutes a move (i.e., proves it worse than an alternative we have already examined), the move is dead, and the rest
of its sub-tree can be pruned away without affecting the final result. Instead of examining all combinations of
moves, alphabeta turns on the Warp drive: in the best case, it will only need to examine roughly twice the square
root of the number of nodes searched by pure Minimax, which is about 2,500 instead of 1.5 million in the example
above.
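A compact Java sketch of depth-limited alphabeta, in its "negamax" formulation (one function serves both Min
and Max by flipping signs); the Position interface is a placeholder for your own game code:

    // Negamax formulation of alphabeta: each side maximizes the score from
    // its own point of view, so a single function handles both players.
    public class Search {
        interface Position {
            int[] legalMoves();
            Position play(int move);
            int evaluate();   // static score, from the side to move's view
        }

        static int alphabeta(Position p, int depth, int alpha, int beta) {
            int[] moves = p.legalMoves();
            if (depth == 0 || moves.length == 0) return p.evaluate();
            for (int m : moves) {
                int score = -alphabeta(p.play(m), depth - 1, -beta, -alpha);
                if (score >= beta) return beta;   // cutoff: this line is refuted
                if (score > alpha) alpha = score; // new best move so far
            }
            return alpha;
        }
    }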
Next Month
In Part V, we will discuss the limitations of straight, fixed-depth alphabeta search, and how to improve playing
strength using techniques like the null-move heuristic, quiescence search, aspiration search and MTD(f), and the
"singular extensions" which made Deep Blue famous. Hold on, we're almost done!
François Dominic Laramée, August 2000
Why Bother?
So far, all of the search algorithms we have looked at examine a position's consequences to a fixed "depth".
However, this is rarely a good thing. For example, suppose that your program uses an iterative-deepening alpha-beta
algorithm with maximum depth 5-ply. Now look at these cases:
● Along a certain line of play, you discover a position where one of the players is checkmated or stalemated at
depth 3. Obviously, you don't want to keep searching, because the final state of the game has been resolved.
Not only would searching to depth 5 be a colossal waste of effort, it may also allow the machine to finagle its
way into an illegal solution!
● Now, suppose that, at depth 5, you capture a pawn. The program would be likely to score this position in a
favorable light, and your program might decide that the continuation leading to it is a useful one. However, if
you had looked one ply further, you might have discovered that capturing the pawn left your queen
undefended. Oops.
● Finally, suppose that your queen is trapped. No matter what you do, she will be captured by the opponent at
ply 4, except for one specific case where her death will happen at ply 6. If you search to depth 5, the
continuations where the queen is captured at ply 4 will be examined accurately, and scored as likely disasters.
However, the unique line of play where the queen is only captured at ply 6 (outside of the search tree) doesn't
reveal the capture to the machine, which thinks that the queen is safe and gives it a much better score! Now, if
all you have to do to push the queen capture outside of the search tree is delay the opponent with a diversion,
doing so may be worth the risk: although it could damage your position, it might also cause the opponent to
make a mistake and allow the queen to escape. But what if you can only delay the queen capture by
sacrificing a rook? To the machine, losing a rook at ply 4 is less damaging than losing a queen, so it will
merrily throw its good piece away and "hide" the too-horrible-to-mention queen capture beyond its search
horizon. (During its next turn, of course, the machine will discover that the queen must now be captured at ply
4 in all cases, and that it has wasted a rook for no gain.) Hans Berliner described this "horizon effect" a long
time ago, and it is the most effective justification for the "quiescence search" described in the next section.
The bottom line is this: a great many positions in chess (and in other games as well) are just too chaotic to be
evaluated properly. An evaluation function can only be applied effectively to "quiet" positions where not much of
importance is likely to happen in the immediate future. How to identify these is our next topic.
Quiet, Please!
There are two ways to assess a position's value: dynamic evaluation (i.e., look at what it may lead to) and static
evaluation (i.e., see what it looks like on its own, irrespective of consequences). Dynamic evaluation is performed
through search; as we have just mentioned, static evaluation is only feasible when the position is not likely to
undergo an overwhelming change of balance in the near future. Such relatively stable positions are called "quiet" or
"quiescent", and they are identified via "quiescence search".
The basic concept of Quiescence Search is the following: once the program has searched everything to a fixed depth
(say, 6-ply), we continue each line of play selectively, by searching "non-quiescent" moves only, until we find a
quiescent position, and only then apply the evaluator.
Finding a quiet position requires some knowledge about the game. For example, which moves are likely to cause a
drastic change in the balance of power on the board? For chess, material balance tends to be the overwhelming
consideration in the evaluator, so anything that changes material is fair game: captures (especially those of major
pieces) and pawn promotions certainly qualify, while checks may also be worth a look (just in case they might lead
to checkmate). In checkers, captures and promotions also seem like reasonable choices. In Othello, every single
move is a capture, and "material balance" can change so much in so little time that it might be argued that there are
no quiet positions at all!
My own program uses a simple quiescence search which extends all lines of play (after a full-width search to depth
X) by looking exclusively at captures. Since there are usually not that many legal captures in a given position, the
branching factor in the quiescence search tends to be small (4-6 on average, and quickly converging to 0 as pieces
are eaten on both sides). Nevertheless, the quiescence search algorithm is called on a LOT of positions, and so it
tends to swallow 50% or more of the entire processing time. Make sure that you need such a scheme in your own
game before committing to it.
Only when no capture is possible does my program apply its evaluator. The result is a selectively-extended search
tree which is anything but fixed-depth, and which defeats most of the nasty consequences of the "horizon effect".
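In the same negamax style as the alphabeta sketch shown earlier, the capture-only scheme looks something like
this (legalCaptures() is an assumed helper; as in the text, the evaluator only runs once no capture is possible):

    // Capture-only quiescence search, grafted onto the leaves of the
    // fixed-depth search.
    public class Quiescence {
        interface Position {
            Position play(int move);
            int[] legalCaptures();   // assumed helper: capture moves only
            int evaluate();          // static score, side to move's view
        }

        static int quiesce(Position p, int alpha, int beta) {
            int[] captures = p.legalCaptures();
            if (captures.length == 0) return p.evaluate();  // quiet: evaluate
            for (int m : captures) {
                int score = -quiesce(p.play(m), -beta, -alpha);
                if (score >= beta) return beta;
                if (score > alpha) alpha = score;
            }
            return alpha;
        }
    }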
The Null-Move Heuristic
In most positions, making a move is better than doing nothing; the null-move heuristic puts this observation to
work in at least two ways:
● Suppose that, in a given position, you let the opponent play twice in a row (i.e., you play a "null move",
which is not legal in chess) and search the result to reduced depth. If your position's assets are still too
strong even without playing (i.e., it creates a cutoff), you have saved 97% of your effort; if not, you must
examine your own legal moves as usual, and have only wasted an extra 3%. On average, the gain is enormous.
● Now, suppose that, during quiescence search, you reach a position where your only legal capture is
rook-takes-pawn, which is immediately followed by the opponent's knight-takes-rook. You'd be a lot better
off not making the capture, and playing any other non-capture move, right? You can simulate this situation by
inserting the null move into the quiescence search: if, in a given position during quiescence search, it is
revealed that the null move is better than any capture, you can assume that continuing with captures from this
position is a bad idea, and that since the best move is a quiet one, this is a position where the evaluation
function itself should be applied!
Overall, the null-move heuristic can save between 20% and 75% of the effort required by a given search. Well worth
the effort, especially when you consider that adding it to a program is a simple matter of changing the "side to play"
flag and adding less than a dozen lines of code in the quiescence search algorithm!
MTD(f)
The most recent of the alphabeta variants is Aske Plaat's MTD(f), which homes in on a position's exact value
through a sequence of zero-width alphabeta searches. The mathematical analysis behind it is beyond the scope of
this article (I would sooner teach Relativity to orang-utangs than get into that mess), but Plaat insists that
MTD(f) is the most efficient algorithm in existence today and I'll take his word for it. My own program uses
MTD(f); you'll be able to marvel at the algorithm's simplicity very shortly!
Singular Extensions
One last thing before we leave the topic of search: in chess, some moves are obviously better than others, and it may
not be necessary to waste too much time searching for alternatives.
For example, suppose that after running your iterative algorithm to depth N-1, you discover that one of your moves
is worth +9000 (i.e., a capture of the opponent's queen) and all others are below 0. If saving time is a consideration,
like in tournaments, you may want to bypass the whole depth N search and only look at the best move to depth N
instead: if this extra ply does not lower its evaluation much, then you assume that the other moves won't be able to
catch up, and you stop searching early. (Remember: if there are 35 valid moves at each ply on average, you may
have just saved 97% of your total effort!)
Deep Blue's team has pushed this idea one step further and implemented the concept of "singular extensions". If, at
some point in the search, a move seems to be a lot better than all of the alternatives, it will be searched an extra ply
just to make sure that there are no hidden traps there. (This is a vast oversimplification of the whole process, of
course, but that's the basic idea.) Singular extensions are costly: adding an extra ply to a node roughly doubles the
number of leaves in the tree, causing a commensurate increase in the number of calls to the evaluator; in other words,
Deep Blue's specialized hardware can afford it, my cranky Java code can't. But it's hard to argue with the results,
isn't it?
Next Month
In Part VI, we wrap up the series with a discussion of evaluation functions, the code which actually tells your
program whether a given board position is good or bad. This is an immense topic, and people can (and do) spend
years refining their own evaluators, so we will have to content ourselves with a rather high-level discussion of the
types of features which should be examined and their relative importance. If everything goes according to plan, I
should also have some Java code for you to sink your teeth into at about that time, so stick around, won't you?
François Dominic Laramée, September 2000
Material Balance
To put it simply, material balance is an account of which pieces are on the board for each side. According to chess
literature, a queen may be worth 900 points, a rook 500, a bishop 325, a knight 300 and a pawn 100; the king has
infinite value. Computing material balance is therefore a simple matter: a side's material value is equal to
MB = Sum( Np * Vp )
where Np is the number of pieces of a certain type on the board and Vp is that piece's value. If you have more
material on the board than your opponent, you are in good shape.
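In code, with the values quoted above (the piece indexing is illustrative):

    // Material balance: Sum(Np * Vp) for one side, minus the same for the other.
    public class Material {
        // Indexed by piece type: pawn, knight, bishop, rook, queen.
        static final int[] VALUE = { 100, 300, 325, 500, 900 };

        // counts[side][piece] holds how many pieces of each type are on the board.
        static int balance(int[][] counts, int side) {
            int us = 0, them = 0;
            for (int p = 0; p < VALUE.length; p++) {
                us   += counts[side][p]     * VALUE[p];
                them += counts[1 - side][p] * VALUE[p];
            }
            return us - them;   // positive means "side" is ahead on material
        }
    }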
Sounds simple, doesn't it? Yet, it is by far the overwhelming factor in any chess board evaluation function. CHESS
4.5's creators estimate that an enormous advantage in position, mobility and safety is worth less than 1.5 pawns. In
fact, it is quite possible to play decent chess without considering anything else!
Sure, there are positions where you should sacrifice a piece (sometimes even your queen) in exchange for an
advantage in momentum. These, however, are best discovered through search: if a queen sacrifice leads to mate in 3
moves, your search function will find the mate (provided it looks deep enough) without requiring special code. Think
of the nightmares if you were forced to write special-case code in your evaluator to determine when a sacrifice is
worth the trouble!
Few programs use a material evaluation function as primitive as the one I indicated earlier. Since the computation is
so easy, it is tempting to add a few more features into it, and most people do it all the time. For example, it is a
well-known fact that once you are ahead on material, exchanging pieces of equal value is advantageous. Exchanging
a single pawn is often a good idea, because it opens up the board for your rooks, but you would still like to keep
most of your pawns on the board until the endgame to provide defense and an opportunity for queening. Finally, you
don't want your program to panic if it plays a gambit (i.e., sacrifices a pawn) from its opening book, and therefore
you may want to build a "contempt factor" into the material balance evaluation; this allows your program to think it's
ahead even though it is behind by 150 points of material or more, for example.
Note that, while material balance is highly valuable in chess and in checkers, it is deceiving in Othello. Sure, you
must control more squares than the opponent at the end of the game to win, but it is often better to limit his options
by having as few pieces on the board as possible during the middlegame. And in other games, like Go-Moku and all
other Connect-N variations, material balance is irrelevant because no pieces are ever captured.
Development
An age-old maxim of chess playing is that minor pieces (bishops and knights) should be brought into the battle as
quickly as possible, that the King should castle early and that rooks and queens should stay quiet until it is time for a
decisive attack. There are many reasons for this: knights and bishops (and pawns) can take control of the center,
support the queen's attacks, and moving them out of the way frees the back rank for the more potent rooks. Later on
in the game, a rook running amok on the seventh rank (i.e., the base of operations for the opponent's pawns) can
cause a tremendous amount of damage.
My program uses several factors to measure development. First, it penalizes any position in which the King's and
Queen's pawns have not moved at all. It also penalizes knights and bishops located on the back rank where they
hinder rook movement, tries to prevent the queen from moving until all other pieces have, and gives a big bonus to
positions where the king has safely castled (and smaller bonuses to cases where it hasn't castled yet but hasn't lost the
right to do so) when the opponent has a queen on the board. As you can see, the development factor is important in
the opening but quickly loses much of its relevance; after 10 moves or so, just about everything that can be measured
here has already happened.
Note that favoring development can be dangerous in a game like Checkers. In fact, the first player to vacate one of
the squares on his back rank is usually in trouble; avoiding development of these important defensive pieces is a very
good idea.
Pawn Formations
Chess grandmasters often say that pawns are the soul of the game. While this is far from obvious to the neophyte, the
fact that great players often resign over the loss of a single pawn clearly indicates that they mean it!
Chess literature mentions several types of pawn features, some valuable, some negative. My program looks at the
following:
● Doubled or tripled pawns. Two or more pawns of the same color on the same file are usually bad, because
they hinder each other's movement.
● Pawn rams. Two opposing pawns "butting heads" and blocking each other's forward movement constitute an
annoying obstacle.
● Passed pawns. Pawns which have advanced so far that they can no longer be attacked or rammed by enemy
pawns are very strong, because they threaten to reach the back rank and achieve promotion.
● Isolated pawns. A pawn which has no friendly pawns on either side is vulnerable to attack and should seek
protection.
● Eight pawns. Having too many pawns on the board restricts mobility; opening at least one file for rook
movement is a good idea.
A final note on pawn formations: a passed pawn is extremely dangerous if it has a rook standing behind it, because
any piece that would capture the pawn is dead meat. My program therefore scores a passed pawn as even more
valuable if there is a rook on the same file and behind the pawn.
Next Month
Well, there ain't no next month. This is it.
If I wanted to drag this series even longer, I could write about opening books, endgame libraries, specialized chess
hardware and a zillion other things. I could, I could. But I won't. Some of these topics I reserve for the book chapter I
will be writing on this very topic later this Fall. Others I just don't know enough about to contribute anything useful.
And mostly, I'm just too lazy to bother.
Still, I hope you enjoyed reading this stuff, and that you learned a useful thing or two or three. If you did, look me up
next year at the GDC or at E3, and praise me to whomever grants freelance design and programming contracts in
your company, will ya?
Cheers!
François Dominic Laramée, October 2000
Designing Need-Based AI for Virtual Gorillas
by Ernest Adams, Gamasutra, December 22, 2000
Now that I'm freelance, I get quite a variety of projects to work on. One of the most interesting involves updating
the Virtual Gorilla exhibit at Zoo Atlanta, in Georgia. Longtime readers of The Designer's Notebook might remember
that I wrote about this project in "The VR Gorilla/Rhino Test" a couple of years ago. When you use the exhibit, you
put on a VR headset and experience the zoo's gorilla enclosure as if you were a gorilla yourself. Now the zoo's
management has asked me to help them port the exhibit over to new hardware and incorporate new AI for the virtual
gorillas.
At the moment, the gorillas don't do much; walking around and exhibiting dominance behavior is
about the extent of it. The dominance behavior occurs when a low-status gorilla wanders into a
higher-status gorilla's personal space, and it consists of an escalating series of aggressive displays
until the low-status gorilla is scared off. It's accurate as far as it goes, but we'd like to extend the
behavior model to involve a variety of actions: eating, drinking, sleeping, playing. This will involve
creating a model of "gorilla needs" which are fulfilled by these activities in a reasonably realistic
manner. That got me thinking about need mechanisms, and they're the subject of this month's
column.
A need mechanism consists of several elements. The main one is a variable which describes how much
of a given needed object or substance the organism has at the moment. Various events in the world,
or activities on the part of the organism, can cause this variable to go up and down. A very simple
example is the remaining ammunition in a first-person shooter. Firing your weapon consumes
ammunition and lowers the amount remaining; picking up clips off the floor raises it again.
However, this creates a problem. Suppose you establish a threshold for remaining ammunition of
10%. Below that threshold our space marine will go look for some ammo; above it he'll continue
exploring. What will happen is that as soon as his ammunition drops below it, he'll go pick up a clip,
but once he's got that clip, he'll go back to exploring again, even if there is a second clip right there.
After he's fired a few bullets the count will drop and he'll start looking around for another clip. He'll be
in a very tight behavioral loop, always using clips and picking up new ones one at a time. Similarly, a
hungry gorilla would eat one bite, wander off for a few minutes, then come back and eat one more
bite, and so on. That's not the way gorillas, or space marines, behave.
What's needed is a second threshold—higher than the first—that tells when to stop the fulfillment
behavior. We want the gorilla to continue to eat until she is sated, not just until she is no longer
hungry. Similarly we want the space marine to go on picking up ammunition until he's sure he's got
enough to last him a while. This mechanism with the two thresholds, one to trigger the behavior and
one to inhibit it, occurs quite often in the natural world and in other kinds of devices as well. It's called
hysteresis, and it's the reason that the furnace in your house doesn't start and stop every 30 seconds.
When you set the thermostat at 68 degrees, the furnace comes on at 68, but it actually goes off at 72.
The thermostat doesn't display the inhibitory threshold, but it's built into the machinery.
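A sketch of the two-threshold mechanism in Java (all the numbers are invented):

    // Hysteresis: one threshold starts the behavior, a higher one stops it.
    public class HungerNeed {
        double food = 100;                 // how full the gorilla is (0..100)
        boolean eating = false;

        static final double TRIGGER = 40;  // start eating below this
        static final double SATED   = 90;  // stop eating above this

        void update() {
            if (!eating && food < TRIGGER) eating = true;   // got hungry
            if (eating  && food > SATED)   eating = false;  // finally sated
        }
    }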
But now we have another problem. Suppose our space marine has picked up all the available clips, or
our gorilla has eaten all the available food. She's above her hunger threshold, but not yet up to her
satiety threshold. With the current mechanism, she'll sit there forever, waiting for more food to come
along. Our space marine will continue to search for clips endlessly, even though he's got them all. We
need to place an artificial limit on the fulfillment behavior if it's no longer successful.
Exactly how this is done depends on how the needed item is distributed, and how smart you want the
organism to be. Human beings who know for a fact that there's only so much of the item around should stop
immediately when it runs out - if you eat all the food in the fridge, you don't hang around the refrigerator hoping
more food will magically appear. In the case of the gorilla,
though, I would expect her to wait hopefully by the food distribution point for a little while, maybe
searching around a bit before giving up. In the case of the space marine, although he may have found
all the clips he can see, he also knows that clips tend to be hidden in a variety of places. He shouldn't
necessarily stop looking for them as soon as he's picked up all the ones in a room; he should continue
to hunt around a little longer, and give up only after he hasn't found any for a while. In each case, we
want to set a timer (I'll call it the "persistence timer") every time the fulfillment occurs - the gorilla
eats something, or the space marine picks up a clip. It starts to run down while they look for more. If
they haven't obtained any by the time the timer runs out, then the behavior is apparently unsuccessful
and we interrupt it and return to other things.
However, even this isn't completely straightforward; it depends on the urgency of the need. A gorilla
who's eaten enough to not be hungry any more, but is not yet sated, should probably give up and
wander off fairly soon. But a gorilla who's still hungry might search for longer, and a starving gorilla
might search indefinitely. Depending on the importance of the item, you should set the length of the
persistence timer proportionately to the urgency of the need. A space marine who's completely out of
ammo should make finding more his first priority, as long as he's not actually under attack.
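One hypothetical way to wire the persistence timer to urgency, reusing the Need sketch above (the scaling factor is an assumption, not something given in the text):

    #include <algorithm>

    // Persistence timer, scaled by urgency: reset it on every successful
    // fulfillment (a bite eaten, a clip picked up); give up when it runs out.
    struct Fulfillment {
        float timer = 0.0f;
        float basePersistence = 5.0f;   // seconds; tune per need

        void onSuccess(const Need& n) {
            // The further below the trigger, the longer we keep trying.
            float urgency = std::max(0.0f, (n.trigger - n.level) / n.trigger);
            timer = basePersistence * (1.0f + 4.0f * urgency);
        }
        bool shouldGiveUp(float dt) {   // call each tick while searching
            timer -= dt;
            return timer <= 0.0f;       // apparently unsuccessful: move on
        }
    };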
You can also set additional trigger thresholds to initiate a variety of different behaviors at different
levels of urgency. A hungry gorilla will go and eat; a very hungry gorilla might try to steal food away
from another one, and a starving gorilla might openly attack another one to get its food. The utmost
extreme, at least in humans, is cannibalism.
This brings me to another point: not everybody behaves the same way. Some people would rather die
than become cannibals; others have no such compunction. I have long felt that we need to make more
of an effort to create unique individuals in computer games. The positions of these trigger thresholds
and the length of the persistence timer should be partially randomized for each individual. Some
marines are risk-takers, and won't start to look for ammo until they're down to their last bullet. Others
are cautious and start looking early. These differences add richness to your game at a trivial cost in
code and CPU time, and I definitely intend to add them to the virtual gorilla exhibit. There's a more
detailed discussion of this in another early column of mine, "Not Just Another Scary Face."
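That per-individual variation can be a few lines at spawn time; a sketch (the plus-or-minus 20% jitter is an arbitrary choice):

    #include <random>

    // Give each individual its own personality: jitter the thresholds
    // and the persistence at creation time.
    void randomizePersonality(Need& n, Fulfillment& f, std::mt19937& rng) {
        std::uniform_real_distribution<float> jitter(0.8f, 1.2f);
        n.trigger         *= jitter(rng);  // risk-takers trigger late
        n.inhibit         *= jitter(rng);  // hoarders stop late
        f.basePersistence *= jitter(rng);  // some give up sooner than others
    }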
It's also possible to have too much of a good thing. Just as there are thresholds and behaviors for
acquiring more of a needed item, we can also implement thresholds and behaviors for getting rid of
excess, at the top end of the scale. In role-playing games, for example, there's often a penalty for
carrying too much weight. When this occurs, we want our NPC to dump low-value items out of his
backpack until the weight is back down to a manageable level. Choosing which items to dump is of
course very tricky, but it's the right response in principle.
The amount of ammunition you have in a shooter is normally only modified by two actions: firing
reduces it; picking up clips raises it again. Otherwise it doesn't change. In the VR gorilla simulator,
we're going to want the gorillas' needs to change automatically over time - gorillas gradually get more
and more hungry regardless of what else happens. Similarly, different activities should affect the
needs at different rates. Gorillas that are very active should get hungry faster than gorillas that are
sedentary. This whole system is of course how The Sims works, quite explicitly and openly to the
player. In the case of the virtual gorillas we're not going to try to train them, so it can all be hidden.
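As a sketch of that activity-dependent drift, again reusing the earlier Need struct (the rates are placeholders):

    // Needs drift on their own; the rate depends on the current activity.
    enum class Activity { Resting, Foraging, Playing };

    float hungerRatePerSecond(Activity a) {
        switch (a) {
            case Activity::Resting:  return 0.001f;
            case Activity::Foraging: return 0.003f;
            case Activity::Playing:  return 0.006f; // active = hungrier faster
        }
        return 0.001f;
    }

    void tickHunger(Need& hunger, Activity a, float dt) {
        hunger.level -= hungerRatePerSecond(a) * dt;  // lower = hungrier
        hunger.update();
    }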
The Sims also nicely illustrates another issue: needs interactions. In The Sims, the simulated people
have several needs: food, sleep, entertainment, hygiene, the toilet and so on, but they can only do
one thing at a time. The needs all compete for the sim's attention, and the sim has to get them all
fulfilled or it becomes unhappy - or worse. To manage this, each sim has a queue of things to do, and
it's ordered by the urgency of the need. Using the toilet is at the top, entertainment is at the bottom.
Because everything in the game takes a long time, often the Sims never get around to having any fun,
and get stressed out as a result.
With multiple needs, you don't necessarily have to wait until the inhibition threshold is reached to stop
the current behavior, or until the persistence timer runs out. If all the behaviors can interrupt one
another and they can all last an indefinite length of time, you can recompute the urgency of each need
every few seconds, and choose a new behavior whenever a new need is more urgent than the one
you're currently fulfilling. However, that could again lead to odd behaviors - a person might eat three
bites, sleep for five minutes, go to the bathroom, sleep for another five minutes, go eat another three
bites, and so on. In The Sims each behavior consists of a series of animations that take a certain
minimum amount of time, and they don't generally interrupt the current behavior until it's finished.
And of course it's all further complicated by the fact that you can give them instructions which override
their own instincts.
The virtual gorilla exhibit won't be as complex as The Sims, but we'll probably want a queue of
behaviors as well, sorted by urgency, and a certain minimum amount of time that any behavior can be
performed. The urgency is a combination of two factors: how far below the trigger threshold the
variable is, and a weighting for the type of need itself. For example, playing is intrinsically less
important than eating, but not in every possible circumstance. I'm sure as a child you've had the
experience of playing for a long time and having so much fun that you didn't notice you were hungry
until there was some interruption. For juvenile gorillas we'll give extra weight to the need to play,
which will cause it to be higher in the queue than eating until the gorilla gets really hungry.
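Putting those two factors together, one plausible shape for the urgency calculation and the behavior switch (a sketch building on the earlier Need struct; the weighting scheme and the minimum-duration rule are assumptions):

    #include <algorithm>
    #include <vector>

    // Urgency = (how far below trigger) x (weight for this kind of need).
    struct WeightedNeed {
        Need  need;
        float weight;   // e.g. play weighted up for juvenile gorillas

        float urgency() const {
            float shortfall = std::max(0.0f, need.trigger - need.level);
            return shortfall * weight;
        }
    };

    // Pick the most urgent need, but let the current behavior run for a
    // minimum time so the creature doesn't dither between activities.
    int chooseBehavior(const std::vector<WeightedNeed>& needs,
                       int current, float timeInCurrent, float minDuration) {
        if (timeInCurrent < minDuration) return current;
        int best = current;
        for (int i = 0; i < (int)needs.size(); ++i)
            if (needs[i].urgency() > needs[best].urgency()) best = i;
        return best;
    }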
One of the classic design questions for computer games is whether urgent needs should interfere with
performance. Clearly, if you're out of ammo, you can't fire your weapons, so you can't shoot anyone
and take his ammunition. If that were the only way to get any, it would create a deadlock. Most
shooter games break the deadlock by providing caches of ammunition that you don't have to shoot
anybody to get, or letting you use non-firearm weapons to dispatch enemies without needing any
ammunition.
With health, however, the situation is different. Obviously somebody who's near death from bullet
wounds shouldn't really be able to run around and fight at top speed, but most shooter games simply
ignore this. If you create a performance penalty for damage taken in an action game, a vicious
feedback loop sets in much too quickly and you don't have a chance. In a war game, on the other hand,
things move more slowly and you can often compensate for damaged forces with sound strategy or
efficient production. Besides, damage is supposed to convey an advantage to the other side; that's the
point of causing it.
I don't anticipate having any such problems with the virtual gorillas. For one thing, they don't kill each
other - their dominance behaviors, while dramatic, are not life-threatening. The only resource the
gorillas can't generate for themselves is food, and we'll make sure it's provided in abundance so that
we don't get any deadlocks. I'd like to create a feedback loop in which a well-rested gorilla becomes a
playful one; a long-playing gorilla becomes a hungry one; and a well-fed gorilla becomes a sleepy one,
so that we'll see a regular round of activities. We'll have to keep a sharp eye out to make sure that it's
a stable loop, though. Too much feedback and they start playing, eating and sleeping faster and
faster; too little and they slow down and do nothing.
It's going to be a fun project.
________________________________________________________
Features
By Steve Woodcock
Gamasutra
August 20, 1999
Game AI: The State of the Industry
This article originally appeared in the August 1999 issue of Game Developer magazine.

It's been nearly a year since my first article outlining the then-current trends in the game
development industry regarding game AI ("Game AI: The State of the Industry," October 1998).
Since that time, another Christmas season's worth of releases has come and gone, and another Game
Developers Conference (GDC) has provided yet another opportunity for AI developers to exchange
ideas. While polls taken at the 1999 GDC indicate that most developers (myself included) felt that
the last year had seen incremental, rather than revolutionary, advances in the field of game AI,
enough interesting new developments have taken place to make an update to my previous article
seem natural.

I'm very pleased to say that good game AI is growing in importance within the industry, with both
developers and marketeers seeing the value in building better and more capable computer
opponents. The fears that multiplayer options on games would make good computer AIs obsolete
appear to have blown over in the face of one very practical consideration — sometimes, you just
don't have time to play with anybody else. The incredible pace of development in 3D graphics cards
and game engines has made awesome graphics an expected feature, not an added one. Developers have
found that one discriminator in a crowded marketplace is good computer AI.

As with last year's article, many of the insights presented herein flow directly from the AI
roundtable discussions at the 1999 GDC. This interaction with my fellow developers has proven
invaluable in the past, and the 1999 AI roundtables proved to be every bit as useful in gaining
insight into what other developers are doing, the problems they're facing, and where they're
going. I'll touch on some of the topics and concerns broached by developers at the 1999
roundtables. I'll also discuss what AI techniques and developments seem to be gaining favor among
developers, the academic world's take on the game AI field, and where some developers think game
AI will be headed in the coming year or two.
Is The Resource Battle Over?

Last year there were signs that development teams were beginning to take game AI much more
seriously than they had in the past. Developers were getting involved in the design of the AI
earlier in the design cycle, and many projects were beginning to dedicate one or more programmers
exclusively to AI development. Polls from the AI roundtables showed a substantial increase in the
number of developers devoted exclusively to AI programming (see Figure 1).

It was very apparent at the 1999 GDC that this trend has continued at a healthy clip, with 60
percent of the attendees at my roundtables reporting that their projects included one or more
dedicated AI programmers. This number is up from approximately 24 percent in 1997 and 46 percent
in 1998, and shows a growing desire on the part of development houses to make AI a more important
part of their game design. If the trend continues, we'll see dedicated AI developers become as
routine as dedicated 3D engine or sound developers.
AI specialists continue to be a viable alternative for many companies that lack internal resources to
________________________________________________________

Features
by Steven Woodcock
November 1, 2000

Game AI: The State of the Industry

This article originally appeared in the August 2000 issue of Game Developer magazine.

In the first of this two-part report on the state of game AI, Steven Woodcock shares what issues
came up while moderating the AI roundtables at the 2000 Game Developers Conference. Next week, in
Part Two, John E. Laird will discuss how academics and developers can better share information
with each other, and Ensemble Studios' Dave Pottinger will peer into the future of game AI.

One thing was made clear in the aftermath of this year's Game Developers Conference: game AI has
finally "made it" in the minds of developers, producers, and management. It is recognized as an
important part of the game design process. No longer is it relegated to the backwater of the
schedule, something to be done by a part-time intern over the summer. For many people, crafting a
game's AI has become every bit as important as the features the game's graphics engine will
sport. In other words, game AI is now a "checklist" item, and the response to both our AI
roundtables at this year's GDC and various polls on my game AI web site (www.gameai.com) bears
witness to the fact that developers are aggressively seeking new and better ways to make their AI
stand out from that of other games.
The technical level and quality of the GDC AI roundtable discussions continue to increase. More
important, however, was that our "AI for Beginners" session was packed. There seem to be a lot of
developers, producers, and artists who want to understand the basics of AI, whether it's so they
can go forth and write the next great game AI or just so they can understand what their
programmers are telling them.
As I've done in years past, I'll use this article to touch on some of the insights I gleaned from the
roundtable discussions that Neil Kirby, Eric Dybsand, and I conducted. These forums are invaluable for
discovering the problems developers face, what techniques they're using, and where they think the
industry is going. I'll also discuss some of the poll results taken over the past year on my web site,
some of which also provided interesting grist for the roundtable discussions.
Resources: The Big Non-issue
Last year's article (Game AI: The State of the Industry) mentioned that AI developers were (finally)
becoming more involved in the game design process and using their involvement to help craft better
AI opponents. I also noted that more projects were devoting more programmers to game AI, and AI
programmers were getting a bigger chunk of the overall CPU resources as well.
This year's roundtables revealed that, for the most part, the resource battle is over (Figure 1). Nearly
80 percent of the developers attending the roundtables reported at least one person working full-time
on AI on either a current or previous project; roughly one-third of those reported that two or more
developers were working full-time on AI. This rapid increase in programming resources has been
evident over the last few years in the overall increase in AI quality throughout the industry, and is
probably close to the maximum one could reasonably expect a team to devote to AI given the realities
of the industry and the marketplace.
Even more interesting was the amount of CPU resources that developers say they're getting. On
average, developers say they now get a whopping 25 percent of the CPU's cycles, which is a 250
percent increase over the average amount of CPU resources developers said they were getting at the
1999 roundtables. When you factor in the increase in CPU power year after year, this trend becomes
even more remarkable.
Many developers also reported that general attitudes toward game AI have shifted. In prior years the
mantra was "as long as it doesn't affect the frame rate," but this year people reported that there is a
growing recognition by entire development teams that AI is as important as other aspects of the
game. Believe it or not, a few programmers actually reported the incredible luxury of being able to say
to their team, "New graphics features are fine, so long as they don't slow down the AI." If that isn't a
sign of how seriously game AI is now being taken, I don't know what is.
Developers didn't feel pressured by resources, either. Some developers (mostly those working on
turn-based games) continued to gleefully remind everyone that they devoted practically 100 percent
of the computer's resources to computer-opponent AI, though they admitted that this generally
allowed deeper play, but not always better play. (It's interesting to note that all of the turn-based
developers at the roundtables were doing strategy games of some kind -- more than other genres,
that market has remained the most resistant to the lure of real-time play.) Nearly every developer was
making heavy use of threads for their AIs in one fashion or another, in part to better utilize the CPU
but also often just to help isolate AI processes from the rest of the game engine.
AI developers continued to credit 3D graphics chips for their increased use of CPU resources. Graphics
programmers simply don't need as much of the CPU as they once did.
Trends Since Last Year
A number of AI technologies noted at the 1998 and 1999 GDCs have continued to grow and accelerate
over the last year. The number of games released in recent months that emphasize interesting AI --
and which actually deliver on their promise -- is a testament to the rising level of expertise in the
industry. Here's a look at some trends.
Artificial life. Perhaps the most obvious trend since the 1999 GDC was the wave of games using
artificial life (A-Life) techniques of one kind or another. From Maxis's The Sims to CogniToy's Mind
Rover, developers are finding that A-Life techniques provide them with flexible ways to create realistic,
lifelike behavior in their game characters.
The power of A-Life techniques stems from their roots in the study of real-world living organisms.
A-Life seeks to emulate the behavior of those organisms through a variety of methods that can use
hard-coded rules, genetic algorithms, flocking algorithms, and so on. Rather than try to code up a huge variety of extremely
complex behaviors (similar to cooking a big meal), developers can break down the problem into
smaller pieces (for example, open refrigerator, grab a dinner, put it in the microwave). These
behaviors are then linked in some kind of decision-making hierarchy that the game characters use (in
conjunction with motivating emotions, if any) to determine what actions they need to take to satisfy
their needs. The interactions that occur between the low-level, explicitly coded behaviors and the
motivations/needs of the characters causes higher-level, more "intelligent" behaviors to emerge
without any explicit, complex programming.
The simplicity of this approach combined with the amazing resultant behaviors has proved irresistible
to a number of developers over the last year, and a number of games have made use of the
technique. The Sims is probably the best known of these. That game makes use of a technique that
Maxis co-founder and Sims designer Will Wright has dubbed "smart terrain." In the game, all
characters have various motivations and needs, and the terrain offers various ways to satisfy those
needs. Each piece of terrain broadcasts to nearby characters what it has to offer. For example, when a
hungry character walks near a refrigerator, the refrigerator's "I have food" broadcast allows the
character to decide to get some food from it. The food itself broadcasts that it needs cooking, and the
microwave broadcasts that it can cook food. Thus the character is guided from action to action
realistically, driven only by simple, object-level programming.
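The article describes smart terrain only in prose; a hypothetical sketch of the broadcast-and-choose loop might look like the following (all names here are invented, not Maxis's code):

    #include <string>
    #include <vector>

    // "Smart terrain" sketch: objects advertise the need they satisfy;
    // a character picks the best advertisement in range for its current need.
    struct Advertisement {
        std::string need;   // e.g. "food"
        float x, y;
        float value;        // how well this object satisfies the need
    };

    const Advertisement* pickTarget(const std::vector<Advertisement>& ads,
                                    const std::string& need,
                                    float cx, float cy, float range) {
        const Advertisement* best = nullptr;
        float bestScore = 0.0f;
        for (const Advertisement& ad : ads) {
            if (ad.need != need) continue;
            float dx = ad.x - cx, dy = ad.y - cy;
            float d2 = dx * dx + dy * dy;
            if (d2 > range * range) continue;
            float score = ad.value / (1.0f + d2);  // prefer near, high value
            if (!best || score > bestScore) { best = &ad; bestScore = score; }
        }
        return best;   // nullptr: nothing nearby offers this need
    }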
Developers were definitely taken with the possibilities of this approach, and there was much discussion
about it at the roundtables. The idea has obvious possibilities for other game genres as well. Imagine a
first-person shooter, for example, in which a given room that has seen lots of frags "broadcasts" this
fact to the NPCs assisting your player's character. The NPC could then get nervous and anxious, and
have a "bad feeling" about the room -- all of which would serve to heighten the playing experience and
make it more realistic and entertaining. Several developers took copious notes on this technique, so
As developers become more comfortable with their pathfinding tools, we are beginning to see complex
pathfinding coupled with terrain analysis. Terrain analysis is a much tougher problem than simple
pathfinding in that the AI must study the terrain and look for various natural features -- choke-points,
ambush locations, and the like. Good terrain analysis can provide a game's AI with multiple
"resolutions" of information about the game map that are well tuned for solving complex pathfinding
problems. Terrain analysis also helps make the AI's knowledge of the map more location-based, which
(as we've seen in the example of The Sims) can simplify many of the AI's tasks. Unfortunately, terrain
analysis is made somewhat harder when randomly generated maps are used, a feature which is
popular in today's games. Randomly generating terrain precludes developers from "pre-analyzing"
maps by hand and loading the results directly into the game's AI.
Several games released in the past year have made attempts at terrain analysis. For example,
Ensemble Studios completely revamped the pathfinding approach used in Age of Empires for its
successor, Age of Kings, which uses some fairly sophisticated terrain-analysis capabilities. Influence
maps were used to identify important locations such as gold mines and ideal locations for building
placement relative to them. They're also used to identify staging areas and routes for attacks: the AI
plots out all the influences of known enemy buildings so that it can find a route into an enemy's
domain that avoids any possible early warning.
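The influence-map idea itself is simple to sketch; the grid, falloff function, and cost coupling below are assumptions for illustration, not Ensemble's implementation:

    #include <cmath>
    #include <vector>

    // Influence map sketch: each known enemy building adds influence that
    // falls off with distance; low-influence cells suggest quiet routes.
    struct InfluenceMap {
        int w, h;
        std::vector<float> cell;
        InfluenceMap(int w_, int h_) : w(w_), h(h_), cell(w_ * h_, 0.0f) {}

        void addSource(int sx, int sy, float strength, float falloff) {
            for (int y = 0; y < h; ++y)
                for (int x = 0; x < w; ++x) {
                    float d = std::sqrt(float((x - sx) * (x - sx) +
                                              (y - sy) * (y - sy)));
                    cell[y * w + x] += strength / (1.0f + falloff * d);
                }
        }
        float at(int x, int y) const { return cell[y * w + x]; }
    };

Pathfinding with cost = base terrain cost + at(x, y) then naturally favors routes that skirt the enemy's early-warning coverage.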
Another game that makes interesting use of terrain analysis is Red Storm's Force 21. The developers
used a visibility graph (see "Visibility Graphs" sidebar) to break down the game's terrain into distinct
but interconnected areas; the AI can then use these larger areas for higher-level pathfinding and
vehicle direction. By cleanly dividing maps into "areas I can go" and "areas I can't get to," the AI is
able to issue higher-level movement orders to its units and leave the implementation issues (such as
not running into things, deciding whether to go over the bridge or through the stream, and so on) to
the units themselves. This in turn has an additional benefit: the units can make use of the A*
algorithm to solve smaller, local problems, thus leaving more of the CPU for other AI activity.
Formations. Closely related to the subject of pathfinding in general is that of unit formations --
techniques used by developers to make groups of military units behave realistically. While only a few
developers present at this year's roundtables had actually needed to use formations in their games,
the topic sparked quite a bit of interest (probably due to the recent spate of games with this feature).
Most of those who had implemented formations had used some form of flocking with a strict overlying
rules-based system to ensure that units stayed where they were supposed to. One developer, who was
working on a sports game, said he was investigating using a "playbook" approach (similar to that used
by a football coach) to tell his units where to go.
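One common shape for that flocking-plus-rules combination is steering toward an assigned formation slot, with a hard rule that snaps stragglers back; a sketch (the snap-radius rule stands in for the "strict overlying rules-based system" and is an assumption):

    #include <cmath>

    // Formation sketch: each unit steers toward its assigned slot; a hard
    // rule makes stragglers beyond the snap radius hurry back at full speed.
    struct Vec2 { float x, y; };

    Vec2 steerToSlot(Vec2 pos, Vec2 slot, float maxSpeed, float snapRadius) {
        Vec2 d{ slot.x - pos.x, slot.y - pos.y };
        float len = std::sqrt(d.x * d.x + d.y * d.y);
        if (len < 0.001f) return Vec2{ 0.0f, 0.0f };    // already in place
        float speed = (len > snapRadius)
                        ? maxSpeed                      // rule: catch up now
                        : maxSpeed * len / snapRadius;  // ease into the slot
        return Vec2{ d.x / len * speed, d.y / len * speed };
    }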
Can AI SDKs Help?

The single biggest topic of discussion at the GDC 2000 roundtables was the feasibility of AI SDKs.
There are at least three software development kits currently available to AI developers:

● Mathématiques Appliquées' DirectIA, an agent-based toolkit that uses state machines to build up
emergent behaviors.

● Louder Than A Bomb's Spark!, a fuzzy-logic editor intended for AI engine developers.

● The Motion Factory's Motivate, which can provide some fairly sophisticated action/reaction state
machine capabilities for animating characters. It was used in Red Orb's Prince of Persia 3D, among
others.
Many developers (especially those at the "AI for Beginners" session) were relatively unaware of
these toolkits and hence were very interested in their capabilities. It didn't seem, however, that
many of the more experienced developers thought these toolkits would be all that useful, though a
quick poll did reveal that one or two developers were in the process of evaluating the DirectIA
toolkit. Most expressed the opinion that one or more SDKs would eventually come to market and
prove them wrong.

In discussing possible features, most felt that an SDK that provided simple flocking or
pathfinding functions might best meet their needs. One developer said he'd like to see some kind
of standardized "bot-like" language for AI scripts, though there didn't seem to be any widespread
enthusiasm for this idea (probably because of fears it would limit creativity). Also discussed
briefly in conjunction with this topic was the matter of what developers would be willing to pay
for such an SDK, should a useful one actually be available. Most felt that price was not a
particular object; developers today are used to paying (or convincing their bosses to pay)
thousands of dollars for toolkits, SDKs, models, and the like. This indicates that if somebody can
develop an AI SDK flexible enough to meet the demands of developers, they should be able to pay
the rent.
Technologies on the Wane

It's become clearer since last year's roundtables that the influence of the more "nontraditional"
AI techniques, such as neural networks and genetic algorithms (GAs), is continuing to wane.
Whereas in previous years developers had many stories to tell of exploring these and other
technologies during their design and development efforts, at this
________________________________________________________

Features
by Dave C. Pottinger
November 8, 2000

Game AI: The State of the Industry, Part Two

This article originally appeared in the August 2000 issue of Game Developer magazine.

Last week in Part One of this article, Steven Woodcock took inventory of the current state of game
AI, based on the roundtables he led at the 2000 Game Developers Conference. Now in Part Two,
Ensemble Studios' Dave Pottinger looks at what the future holds for game AI, and University of
Michigan Professor John E. Laird discusses bridging the gap between AI researchers and game
developers.

As I slowly reclined into the seat of the last E3 bus this spring, I was certain of two things:
some really great games were coming out in the next year, and my feet hurt like hell. A lot of the
games that created a buzz featured excellent AI. Since my fellow Ensembleites assured me
(repeatedly) that no one really cared to hear about my feet, I thought I'd use this space to talk
about some of the games coming out in the next 18 months and the new and improved AI technology
that will be in them.
Better AI Development Processes and Tools

AI has traditionally been slapped together at the eleventh hour in a product's development cycle.
Most programmers know that the really good computer-player (CP) AI has to come at the end, because
it's darn near impossible to develop CP AI until you know how the game is going to be played. As
the use of AI in games has matured, we're starting to see more time and energy spent on developing
AI systems that are modular and built in a way that allows them to be tweaked and changed easily
as the gameplay changes. This allows the AI development to start sooner, resulting in better AI in
the final product. A key component in improving the AI development process is building better
tools to go along with the actual AI.

For Ensemble's third real-time strategy (RTS) game, creatively code-named RTS3, we've spent almost
a full man-year so far developing a completely new expert system for the CP AI. It's been a lot of
work taking the expert system (named, also creatively, XS) from the in-depth requirements
discussions with designers to the point where it's ready to pay off. We've finally hit that payoff
and have a very robust, extensible scripting language.
The language has been so solid and reusable that, in addition to using it to write the CP AI content,
In the early days of first-person shooters, non-player characters (NPCs) had the intelligence of nicely
rounded rocks. But they've been getting much better lately -- look no further than Half-Life's
storytelling NPCs and Unreal Tournament's excellent bot AI. The market success of titles such as these
has prompted developers to put more effort into AI, so it looks as if smarter NPCs will continue to
show up in games.
Grey Matter Studios showed some really impressive technology at E3 with Return to Castle
Wolfenstein. When a player throws grenades at Nazi guards, those guards are able to pick up the
grenades and throw them back at the player, adding a simple but very effective new wrinkle to NPC
interactivity. A neat gameplay mechanic that arises out of this feature is the player's incentive to hold
on to grenades long enough so they explode before the guards have a chance to throw them back.
Thankfully, Grey Matter thought of this and has already made the guards smart enough not to throw
the grenades back if there's no time to do so.
More developers are coupling their AI to their animation/simulation systems to generate characters
which move with more realism and accuracy. Irrational did this with System Shock 2 and other
developers have done the same for their projects. The developers at Raven are doing similar things
with their NPC AI for Star Trek: Elite Force. They created a completely new NPC AI system that's
integrated into their Icarus animation system. Elite Force's animations are smoothly integrated into the
character behavior, which prevents pops and enables smooth transitions between animations. The
result is a significant improvement to the look and feel of the game. I believe that as the use of
inverse kinematics in animation increases, games will rely on advanced AI state machines to control
and generate even more of the animations. As a side benefit, coupling AI to animation gives you the
benefit of more code reuse and memory savings.
Better Communication Using AI
Since the days of Eliza and HAL, people have wanted to talk with their computers. While real-time
voice recognition and language processing are still several years off, greater strides are being made to
let players better communicate with their computer opponents and allies.
For example, in our upcoming Age of Empires: The Age of Kings expansion pack, The Conquerors,
we've enabled a chat communication system that lets you command any computer player simply by
sending a chat message or selecting messages from a menu. Combined with AoK's ability to let you
script your own CP AI, this lets you craft a computer ally that plays on its own and lets you have
conversational exchanges with it in random-map games. This is a small step toward the eventual goal
of having players talk to their computer allies in the same way as to humans. Unfortunately, we still
have to wait a while for technology to catch up to our desire.
________________________________________________________

Features
by Eric Dybsand
April 23, 2001

Game Developers Conference 2001: An AI Perspective

With each year that I attend the conference (I have attended 14 of the 16 conferences that have
been held so far), there are conflicts between sessions, networking opportunities, and the sheer
magnitude of the conference, all of which prevent me from attending every great session that I am
interested in. I am most disappointed about being unable to attend "Design Plunder" (a lecture by
Will Wright) and "Those Darn Sims: What Makes Them Tick?" (a lecture by Jamie Doornbos), both of
which discussed The Sims game AI.

That being said, I was able to attend many other excellent computer game AI related sessions. The
following represents the perspective I obtained from the computer game AI related sessions I did
attend during the recent GDC 2001 in San Jose.
Tuesday, March 20, Tutorial: "Artificial Life for Computer Games"
This tutorial was an update from the tutorial by the same name, presented first at last year's GDC. The
same speakers were present: Bruce Blumberg, John Funge, Craig Reynolds and Demetri Terzopoulos.
With its focus on applications of artificial life techniques, this tutorial offered the new-to-ALife attendee
a comprehensive look at some of the research work of these noted ALife experts.
Since I had sat through the entire tutorial last year, attended only the first hour of this year's
version, and had a conflict with another tutorial at the same time, I can only comment on one
speaker's presentation. That speaker was Bruce Blumberg, who described the latest status of the
work on virtual creatures done by the Synthetic Characters Group at the MIT Media Lab. Specifically,
Blumberg reviewed the status of the research and development on Duncan, a virtual dog that behaves
autonomously. In a session to be held later in the conference (that I discuss later in this report) two of
Blumberg's students presented more detail regarding the architecture and development of Duncan.
Because I had to dash off to the conflicting tutorial, I was not able to attend the presentations by
Funge, Reynolds, or Terzopoulos. So I can only speculate that Reynolds offered an update of his
work with steering behaviors and flocking (for which he is credited with being the "Father of
Flocking"), and probably demonstrated his hockey game application of these low-level behaviors. I
would also guess that Terzopoulos provided an update of his work with physics-based locomotion
learning.
Tuesday, March 20, Tutorial: "Cutting Edge Techniques for Modeling &
Simulation III"
This is the tutorial that conflicted with the Artificial Life for Computer Games tutorial; it was
a presentation by Roger Smith on the status of techniques used in military simulations. I had
missed the GDC 2000 version of this tutorial because I had sat through the complete Artificial
Life for Computer Games tutorial for GDC 2000. [As I mentioned in the beginning of this report,
the sheer magnitude of the GDCs means that conflicts arise in what sessions a person wants to see.]
Much of the first part of this tutorial was more relevant to game design (primarily war game,
flight sim, and FPS game design), as Smith went into some detail regarding the history and design
of military training simulations. In doing so, Smith presented some interesting parallels between
game and simulation development. Smith further reviewed some code models, interface
specifications, and object declarations found in use in today's military simulations.
As Smith discussed modeling concepts, many of his examples came from The Sims computer game by
Maxis, which was the most successful sim game released last year. And as he presented an AI Vehicle
Movement Model, I found myself relating to my own current work in developing an artificial driver for a
soon-to-be-released Trans-Am car racing game.
Probably the part of the presentation that related the most widely to computer game AI was when
Smith reviewed behavioral modeling. The design needs of an intelligent agent for a military simulation
are very much like those for most computer games (where behavior is expected to appear
intelligent). During this part of the presentation, Smith reviewed simple reflex agents,
goal-based agents, and utility-based agents (all of which I personally have seen in use in
computer game AI implementations). Smith further discussed finite state machines (both in singular
and hierarchical usage), expert systems, Markov chains, constraint satisfaction, and fuzzy logic.
All of these concepts are widely used in computer game AI development. (Well, maybe not hidden
Markov chains.) Perhaps the best take-away for an AI programmer attending this tutorial was the
opportunity to see a variety of techniques, and how they were being used, in the way the military
designs and implements its training simulations. As a result, this was certainly a tutorial well
worth attending for this AI programmer.
The section of the tutorial that covered planning was especially interesting to me, as the speakers
described the planning process of a bot playing Quake II. Various selection criteria were reviewed, and
multi-step look-ahead techniques were suggested. Since my AI development tends to produce agents
that are very goal-oriented (a component of planning), this section was very relevant to me.
The speakers concluded the tutorial with a presentation of components of their SOAR-based Quake bot
work. During this section, a variety of 'bot behaviors were described and the SOAR approach
presented. What stood out for me in this section was the SOAR-based bot's ability to anticipate
its enemy's actions.
While this tutorial was interesting and enlightening, I am sure I still would be fragged if I encountered
the SOAR-based bot in a death match, despite now knowing more about the processes that it uses to
make decisions.
________________________________________________________

Features
by Ian Wright and James Marshall
June 19, 2000

More AI in Less Processor Time: 'Egocentric' AI

The design brief for the new game's AI has just been handed to you, and to call it optimistic
would be an understatement. You are charged with developing a real-time living, breathing city,
populated by thousands of pedestrians, hundreds of cars, and dozens of non-player characters. The
'incidental' pedestrians and traffic need to react convincingly to each other and to your actions,
while the NPCs absolutely, positively must act in a believable manner when you encounter them.
It's going to be computationally expensive, but you've only been given 20% of the processor time
each frame, and if you exceed that and the game frames out, you've failed.

Modern games will increasingly make such demands on hardware and programmers. Fortunately, help is
at hand with techniques to control and manage real-time AI execution, techniques that open up the
possibility of future hardware acceleration of AI.
Games Should Be Fun
Games should be fun. This requirement has many consequences. One important consequence is that
games that allow player input at any moment ("arcade-style" games) should run in real-time,
presenting events that occur sufficiently fast to challenge the player's reactions. Lower frame-rates
look bad, reduce the opportunity for interaction, increase player frustration, and do not reflect the
speed of events in the real world. With this firmly in mind, we set out to design a framework for the
general execution of AI code.
The latter stages of a game project involve optimising parts of the game code for processing time reductions.
This includes AI code, which, depending on the type of game, can take up more or less of the available
CPU time. Given this, an important requirement for general AI execution is that (a) it conforms to the
timing constraint of the overall game frame rate. A consequence of (a) is that the AI never exceeds a
maximum per-frame processing time.
Letters to the Editor: AI requires the execution of arbitrarily complex and heterogeneous pieces of code, often grouped
Write a letter
together as behavioural "rules" or "behavioursets" for various game objects or agents, such as the AI
View all letters
code for a trapdoor, obstacle, spring, or the code for an adversary, racing vehicle or character.
Therefore, a further requirement for general AI execution is that (b) it makes no assumptions about
the exact nature of the AI code, including assumptions about how long the code will take to execute.
Rendering code normally has to execute every frame in order to construct the visual scene. The
situation is different for AI code. Consider a soccer player, who may need to check for passing and
shooting opportunities every frame, but need only check its position against the team's formation
every other frame, or only in a dead-ball situation. AI code involves a wide range of execution
frequencies compared to non-AI game code. If all AI code is fully executed every frame when this is
not required then the resulting code is inefficient. Also, some games require different execution
frequencies for objects and agents, in addition to controlling the execution frequencies of their internal
processes. For example, a very slow moving tortoise need not be processed every frame, whereas the
hare may need to be. Hence, a further requirement for general AI execution is (c) it allows different
execution frequencies to be specified both for agents and their constitutive internal processes.
Finally we realised that some AI processes can be extensively time-sliced across many frames,
particularly if the results of the process are not immediately required. For example, if a strategy game
agent needs to plan a route through a terrain, then the planning can potentially take place over many
frames before the agent actually begins to traverse the deduced route. Time slicing allows
computationally expensive processes to be 'smeared' across many frames thereby reducing the per
frame CPU hit. Therefore, a final requirement for general AI execution is (d) it allows AI processes to
be dynamically suspended and reactivated.
An arcade-style game is somewhat like the real world, consisting of both active and passive agents
and events that unfold over time. But the game need not process every object, agent and event in the
virtual world in order to present a believable, entertaining experience. For example, if a truck is within
the player's field of view when it plants a mine, then the game necessarily needs to process the
truck's movement and the mine drop, and the rendering code necessarily needs to draw this event to the
screen. However, if the truck is 'off-screen' the rendering code need not be run, and the AI code
controlling the truck could simply assert the existence of a mine on the road at a certain time, rather
than processing the fine-grained movement of the truck. Virtual game worlds need to present a
believable world to the player, and not necessarily present an accurate simulation of the real world.
Events not 'interactively close' to the human player need not be fully processed. Therefore,
requirement (a) can be satisfied if some AI processes need only be "believable" rather than "accurate".
These kinds of processes can be time-sliced over many frames, executed at a lower frequency, be
replaced with computationally less expensive "default" behaviours, or simply postponed. Furthermore,
what may need to be "accurate" at one time may need to be only "believable" at another, depending
on the current activities of the human player. We call the idea of prioritising the update of parts of the
game world currently most relevant to the player "egocentric processing". Our Process Manager
implements this idea.
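The Process Manager section itself is not reproduced here, but a minimal sketch satisfying requirements (a) through (d) above might look like the following; the interface is an assumption, not the authors' actual system:

    #include <vector>

    // Process Manager sketch: per-process execution frequency (c),
    // suspension for time-slicing and postponement (d), an opaque run()
    // for arbitrary AI code (b), and a hard per-frame budget (a).
    struct AIProcess {
        virtual ~AIProcess() {}
        virtual float run() = 0;   // do one slice of work, return its cost
        int  period    = 1;        // execute every N frames
        bool suspended = false;
    };

    class ProcessManager {
        std::vector<AIProcess*> procs;
        long frame = 0;
    public:
        void add(AIProcess* p) { procs.push_back(p); }

        void update(float budget) {
            ++frame;
            float spent = 0.0f;
            for (AIProcess* p : procs) {
                if (p->suspended || frame % p->period != 0) continue;
                if (spent >= budget) break;   // defer the rest to next frame
                spent += p->run();
            }
        }
    };

Keeping procs sorted so that processes nearest the player run first would implement the egocentric priority the authors describe: distant, "believable-only" processes are the ones deferred when the budget runs out.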
________________________________________________________
The posts are presented essentially as is, with some *minor* editing
on my part for formatting.
Here are the e-mail addresses for those contributors whom I have.
Again, my profound apologies if I missed anybody; PLEASE let me know
and I'll correct this forthwith:
Hopefully this will spark a renewal of the original thread and prove to
be informative to all concerned. I know that *I* have found this to be
the most illustrative and informative thread I've ever seen on the Net;
this is truly where this medium shines.
Enjoy!
Steven
Thanks in Advance.
Andrae Muys.
==============================================================================
: Thanks in Advance.
: Andrae Muys.
Andrae:
Glad to see I'm not the only one wrestling with this problem! ;)
Your approach to break things down into 'front', 'flank', and
'rear' makes sense and seems like a reasonable simplification of the problem.
A first-order definition of each might be:
front -- Where the mass of the enemy units are known to be. The
direction I want to attack.
flank -- Any area in which there are fewer (1/3?) enemy
         units than the area designated as 'front'. (Note this is
         purely arbitrary, based as much on prior experience as
         anything else.) Perhaps also selected based on natural
         defensive terrain (i.e., oceans or mountains).
One problem I can think of off the top of my head is how to handle
multiple front situations; there's at least some possibility of overlapping
definitions, meaning that some precedence order must be established.
Special exceptions will also have to be made for overflying enemy aircraft
and incursions by enemy units of various types. (Example: If the enemy
drops some paratroopers into your 'rear' area, does it automatically become
a 'front'?)
Steven
==============================================================================
In article <3onpj8$lbc@theopolis.orl.mmc.com>,
woodcock@escmail.orl.mmc.com (Steve Woodcock) wrote:
> : extensive with *extremely* basic rules and mechanics. However, as each
> : unit may still move once each turn, and over any of a number of distances, the
> : branching factor puts the CHESS/GO thread to shame. In fact, a lower
> : bound calculated from rules far simpler than I intend to use ended with a
> : branching factor of 2.6^8. A simple PLY search is therefore out of the
> : question. Brainstorming suggested that abstracting the battlefield
> : into three frontal sections and maybe a reserve leaves a basic set of
> : rules with a branching factor of approx 16 (much more manageable).
> : However, to implement this the AI needs to be able to recognise what
> : constitutes a flank/centre/rear/front etc... From practical wargaming
> : experience this is something that is normally arrived at by intuition.
> : Surely it is a problem which has been faced before, and I was wondering if
> : there was any theory/code/code examples which I might use to build
> : such an engine.
> : In the meantime I intend to start on the AI assuming that a way can be
> : found to recognise such strategic dispositions.
>
> : Thanks in Advance.
> : Andrae Muys.
>
> Andrae:
>
>
> Glad to see I'm not the only one wrestling with this problem! ;)
> Your approach to break things down into 'front', 'flank', and
> 'rear' makes sense and seems like a reasonable simplification of the problem.
> A first-order definition of each might be:
you define your 'center', their 'center' and the board 'center'. Anything
closer to your center than theirs is yours and can be taken. If they take
it back then it takes you less time to re-take than it takes them - if
they bother to take it they lose more time than you.
The idea is to
i) move your center towards the center of the board, then
ii) move your center towards their center, always keeping it between
their center and the center of the board. This should push them off the
board.
There is also one other interesting piece of info we've come across. It
involves finding neighbours.
Most people would say that the 'Y' in the following diagram is not between
the 'X's:
X              X

                        Y
Whereas, they would say that the Y is between these two X's:
X              X
       Y
The definition we found for 'between' or 'blocking' (and hence for
'neighbour' -- nothing is between) is as follows.
i) draw a circle between the two items such that the items sit on either
end of a diameter of the circle.
ii) If there is anything inside that circle, it is between the two items;
otherwise it's not.
\x/ill :-}
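[Will's circle test translates directly into code: a point lies inside the circle whose diameter joins the two items exactly when it sees them at an obtuse angle, i.e. when the dot product below is negative. A sketch; the names are invented.]

    // 'Between' test: P is between A and B when P lies inside the circle
    // having segment AB as its diameter (angle APB is then obtuse).
    struct Pt { float x, y; };

    bool isBetween(Pt a, Pt b, Pt p) {
        float dot = (a.x - p.x) * (b.x - p.x) + (a.y - p.y) * (b.y - p.y);
        return dot < 0.0f;   // negative dot product: inside the circle
    }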
==============================================================================
Andrae.
==============================================================================
: front -- Where the mass of the enemy units are known to be. The
: direction I want to attack.
: flank -- Any area in which there are fewer (1/3?) enemy
:          units than the area designated as 'front'. (Note this is
:          purely arbitrary, based as much on prior experience as
:          anything else.) Perhaps also selected based on natural
:          defensive terrain (i.e., oceans or mountains).
:
: These definitions work by identifying the front, then extrapolating
: from that. As enemy units move around, become detected, attack, etc.,
: the 'size' of the front will likely grow and shrink, forcing similar changes
: to the flanks (especially) and perhaps to the rear areas as well.
Identifying the front first and then defining the rest w.r.t. it would seem
to simplify the problem further. I hadn't thought of that; it looks like
a good idea. However, one question to contemplate: where are the fronts
in the following position? {X - YOURS, Y - THEIRS}
: One problem I can think of off the top of my head is how to handle
: multiple front situations; there's at least some possibility of overlapping
: definitions, meaning that some precedence order must be established.
: Special exceptions will also have to be made for overflying enemy aircraft
: and incursions by enemy units of various types. (Example: If the enemy
: drops some paratroopers into your 'rear' area, does it automatically become
: a 'front'?)
This is why I am using a very basic set of game mechanics, and using a
different era (see other post). This way the only way troops can reach
your rear is to march there. Also, there are very few battles in this era
with multiple fronts, although allowance must be made for bent and
twisted fronts, the hinge being a very critical point in an extended line.
With the rules I have in mind, in most cases you will only have mass attacks,
or at least dense fronts. One problem you do have if you try to model a
high-echelon game such as the eastern front (WWII) is what happened next:
the Russian front fragmented, and from one dense front you ended up with
hundreds of small localised fronts, the resulting loss of cohesion being
one of the greatest advantages of blitzkrieg. Because cohesion is so
much more important at a grand strategic level (not that it isn't in
strategies at an operational/tactical level), I feel that a search for a
front may be counterproductive. My gut feeling is that it would be
better to consider areas controlled by your forces, controlled by their
forces, and contested, with an emphasis on your forces maintaining
unbroken contact between spheres of influence. So the insertion of
forces 'behind the lines' would only alter the balance of control in the
local area. A domino effect would be possible, where strategically
inserted forces would weaken a unit's control of an area, weakening a unit
relying on it for its 'cohesive link', weakening its control of another area,
and so on. However, this is what happens in real life, so if anything it
suggests that it may be a good approach.
ditto.
: Steven
Andrae
==============================================================================
In real life, I would imagine one of the main targets in any campaign to
be supply lines. For example, The Dambusters is a movie about some
special bombers with special bombs designed to destroy dams, with the aim
of crippling Germany's iron/steel industry. General Custer was in trouble
because he was surrounded and cut off from supplies and reinforcements
(yes, my knowledge is very sketchy).
I heard someone claim once that war is about economic ruin rather than
outright carnage. Is there any way your AI can calculate the move that
will cause most damage to industry and support, rather than shoot the
most enemy? Of course, these strategies apply to wars, not battles...
Just a thought...
-Alex
==============================================================================
: In real life, I would imagine one of the main targets in any campaign to
: be supply lines. For example, The Dambusters is a movie about some
: special bombers with special bombs designed to destroy dams, with the aim
: of crippling Germany's iron/steel industry. General Custer was in trouble
: because he was surrounded and cut off from supplies and reinforcements
: (yes, my knowledge is very sketchy).
Yes, you are right: one of the major considerations at a strategic level is
supply - how do I attack yours, how do I protect mine? One point
concerning General Custer, however: his problem wasn't so much that his
supply lines were cut, more that he was surrounded with no avenue of
retreat. This is a position which is so inherently poor that any AI
should automatically avoid it without any requirement for a 'special case'.
With the game mechanics we have been considering of late, the AI won't
have to be concerned with most of these problems. I personally can't see
how defining a front 'where you want it to be' is useful, although this is
probably more me not thinking it through properly than a problem with the
idea. What do you mean by it, and is it in any way related to the concept
of critical point/contact point currently being discussed?
: I heard someone claim once that war is about economic ruin rather than
: outright carnage. Is there any way your AI can calculate the move that
: will cause most damage to industry and support, rather than shoot the
: most enemy? Of course, these strategies apply to wars, not battles...
BTW: does anyone know if there is an e-text version of The Art of War
anywhere?
Andrae.
==============================================================================
: : I heard someone claim once that war is about economic ruin rather than
: : outright carnage. Is there any way your AI can calculate the move that
: : will cause most damage to industry and support, rather than shoot the
: : most enemy? Of course, these strategies apply to wars, not battles...
: Personally I prefer Sun Tzu's philosophy. Basically it holds that to win
: without fighting is best, and the aim of war is to capture territory
: without damaging it.
: BTW: does anyone know if there is an e-text version of The Art of War
: anywhere?
Steven
==============================================================================
http://timpwrmac.clh.icnet.uk/Docs/suntzu/szcontents.html
for the Art of War (not the 1960s translation, an older one), and
http://fermi.clas.virginia.edu/~gl8f/paradoxes.html
for George Silver's Paradoxes of Defence, which is probably
in a similar vein, but I have not got around to reading it yet.
==============================================================================
: There is also one other interesting piece of info we've come across. It
: involves finding neighbours.
: most people would say that the 'Y' in the following diagram is not between
: the 'X's:
: X              X
:
:                         Y
: Whereas, they would say that the Y is between these two X's:
: X              X
:        Y
Here I would consider Y to have interdicted the link, or that the X's are
still neighbours but have outflanked, maybe even overrun, Y.
: The definition we found for 'between' or 'blocking' (and hence for
: 'neighbour' -- nothing is between) is as follows.
: i) draw a circle between the two items such that the items sit on either
: end of a diameter of the circle.
: ii) If there is anything inside that circle, it is between the two items;
: otherwise it's not.
Andrae.
==============================================================================
: you define your 'center', their 'center' and the board 'center'. Anything
: closer to your center than theirs is yours and can be taken. If they take
: it back then it takes you less time to re-take than it takes them - if
: they bother to take it they lose more time than you.
: The idea is to
: i) move your center towards the center of the board
: then ii) move your center towards their center, always keeping it between
: their center and the center of the board. This should push them off the
: board.
: There is also one other interesting piece of info we've come across. It
: involves finding neighbours.
: most people would say that the 'Y' in the following diagram is not between
: the 'X's:
: X              X
:
:                         Y
: Whereas, they would say that the Y is between these two X's:
: X              X
:        Y
: The definition we found for 'between' or 'blocking' (and hence for
: 'neighbour' -- nothing is between) is as follows.
: i) draw a circle between the two items such that the items sit on either
: end of a diameter of the circle.
: ii) If there is anything inside that circle, it is between the two items;
: otherwise it's not.
: \x/ill :-}
Hmmm... this does have some merit to it. I like the idea of the 'center'
being arrived at via this circle method; it has an elegance to it that is
also somewhat intuitive.
The only potential problem I can see with this approach is that it will
tend towards great massive brute force engagements by migrating the bulk
of both forces towards a common 'center'. This is fine in the case of
two combatants (i.e., two Bolo tanks) but not so good for two armies.
I think we could solve the multiple front problem if we generalized the
problem to find SEVERAL 'localized centers', thus allowing for multiple axes
of advance along a somewhat more fluid 'front'. In the case of two armies,
you might get something like this:

  x    x    x

  y1   y2   y3   y4

  x    x    x

In this case, each of the x-x pairs make INDEPENDENT determinations
of what lies 'between' them. Then, based on the relative combat strengths
and other factors, you could issue separate orders for each section of
the battlefield. This effectively sets up a variety of 'mini-centers'
(using our terminology from above) and more realistically (IMHO) emulates
realworld operations (i.e., lots of mini-objectives, the possibility for
overlapping objectives, etc.).
Opinions? Comments?
Steven
==============================================================================
In the type of combat I was thinking of, you each have only one active
piece that moves around the board moving the other 'static' pieces. If
Player A surrounds Player B, but Player B has supplies 'inside', then
player B has the
advantage. e.g.
1 2
3 4
Supplies
5 6
7 8
Assume that 1,2,7 and 8 are owned by A and effectively 'surround' B who
owns 3, 4, 5 and 6 (and some supplies that mean he never has to leave his
fort). B can attack 1. When A moves there to defend, B can break off and
attack 8. A has a long way to go to get to 8 (A has to go right around
whereas B can go through the centre) and so will probably lose that
piece. Being more spread out than the opposition can be a big problem.
I agree that this is probably not the type of combat you were thinking
about. For an example of this type of combat look at the mac game Bolo.
\x/ill :-}
P.S. as a side note, Sun Tzu (sp?) in his 'Art of War' recommends against
sieges, which is effectively the situation you have above.
==============================================================================
> I think we could solve the multiple front problem if we generalized the
>problem to find SEVERAL 'localized centers', thus allowing for multiple axes
>of advance along a somewhat more fluid 'front'. In the case of two armies,
>you might get something like this:
>
>
>        x       x       x
>
>    y1      y2      y3      y4
>
>        x       x       x
>
> In this case, each of the x-x pairs make INDEPENDENT determinations
>of what lies 'between' them. Then, based on the relative combat strengths
>and other factors, you could issue separate orders for each section of
>the battlefield. This effectively sets up a variety of 'mini-centers'
>(using our terminology from above) and more realistically (IMHO) emulates
>realworld operations (i.e., lots of mini-objectives, the possibility for
>overlapping objectives, etc.).
Hmmm...
Robert Uhl
==============================================================================
First off, let me just say that I think this is a *great* thread, easily
one of the more interesting I've seen in this newsgroup. *This* is the kind
of brainstorming the Net was made for....
: Identifying the front first and then defining the rest w.r.t. it would seem
: to simplify the problem further. I hadn't thought of that, it looks like
: a good idea. However, one question to contemplate: where are the fronts
: in the following position? {X - YOURS, Y - THEIRS}
I would agree with your assessment of the situation and your breakdown
of the forces into a front. Obviously in this case, X either a.) needs
to rapidly execute a turn of his front or b.) is in the midst of a brilliant
plan that will prove to be Y's undoing. (The challenge, of course, is to
get a computer AI to execute 'b' more often than 'a'.)
: : One problem I can think of off the top of my head is how to handle
: : multiple front situations; there's at least some possibility of overlapping
: : definitions, meaning that some precedence order must be established.
: : Special exceptions will also have to be made for overflying enemy aircraft
: : and incursions by enemy units of various types. (Example: If the enemy
: : drops some paratroopers into your 'rear' area, does it automatically become
: : a 'front'?)
: This is why I am using a very basic set of game mechanics, and using a
: different era (see other post). This way the only way troops can reach
: your rear is to march there. Also there are very few battles in this era
: with multiple fronts. Although allowance must be made for bent and
: twisted fronts. The hinge being a very critical point in an extended line.
Okay; let's go with that simplification for now. It'll certainly make this
easier to think about, and we can always make the AI smarter in Rev 2.0! ;)
: In the rules I have in mind, most cases you will only have mass attacks
: or at least dense fronts. One problem you do have if you try to model a
: high echelon game such as the eastern front (WWII) is what happened next.
: The Russian front fragmented and from one dense front you ended up with
: hundreds of small localised fronts, the resulting loss of cohesion being
: one of the greatest advantages of blitzkrieg. Because cohesion is so
: much more important at a grand strategic level (not that it isn't in
: strategies at an operational/tactical level) I feel that a search for a
: front may be counterproductive. My gut feeling is that it would be
: better to consider area controlled by your forces, controlled by their
: forces, and contested. With an emphasis on your forces maintaining
: unbroken contact between spheres of influence. So the insertion of
: forces 'behind the lines' would only alter the balance of control in the
: local area. A domino effect would be possible where forces strategically
: inserted would weaken a unit's control of an area, weakening a unit relying
: on it for its 'cohesive link', weakening its control of another area and so
: on. However this is what happens in real life so if anything it
: suggests that it may be a good approach.
Okay then, fronts are out. Spheres of influence are in. They do seem
to better reflect the 'domino effect', as you suggest, and to offer a
natural way of handling the above situation. Based on what we've discussed
so far, I would envision an AI's logic train going something like this:
Steven
==============================================================================
: I would agree with your assessment of the situation and your breakdown
: of the forces into a front. Obviously in this case, X either a.) needs
: to rapidly execute a turn of his front or b.) is in the midst of a brilliant
: plan that will prove to be Y's undoing. (The challenge, of course, is to
: get a computer AI to execute 'b' more often than 'a'.)
must extricate itself well. Just one more 'special' situation to test
the AI's ability.
Well this thread is useful. The idea of contact points should radically
prune any decision tree. (And sooner or later the AI will have to make a
choice.) Of course at the strategic/grand strategic levels we may need a
modified definition of contact point but at the level I am interested in
contact points appear to be a good way to look at things. In fact now
that I think about it, contact points are how **I** allocate **MY**
consideration. This approach however leads us to consider how to
recognise potential contact points and how to evaluate the relative
benefits of creating/avoiding specific contact points. e.g. in the
example above X should avoid contact with Y until he has rotated his
front. (It looks like we may still need to consider fronts as well.)
: : This is why I am using a very basic set of game mechanics, and using a
: : different era (see other post). This way the only way troops can reach
: : your rear is to march there. Also there are very few battles in this era
: : with multiple fronts. Although allowance must be made for bent and
: : twisted fronts. The hinge being a very critical point in an extended line.
: Okay; let's go with that simplification for now. It'll certainly make this
: easier to think about, and we can always make the AI smarter in Rev 2.0! ;)
My thoughts exactly.
<<<>>>
: : front may be counterproductive. My gut feeling is that it would be
: : better to consider area controlled by your forces, controlled by their
: : forces, and contested. With an emphasis on your forces maintaining
: : unbroken contact between spheres of influence. So the insertion of
: : forces 'behind the lines' would only alter the balance of control in the
: : local area. A domino effect would be possible where forces strategically
: : inserted would weaken a unit's control of an area, weakening a unit relying
: : on it for its 'cohesive link', weakening its control of another area and so
: : on. However this is what happens in real life so if anything it
: : suggests that it may be a good approach.
: Okay then, fronts are out. Spheres of influence are in. They do seem
: to better reflect the 'domino effect', as you suggest.
: so far, I would envision an AI's logic train going something like this:
One more thing about cavalry is that the other arms (INF/ART) can't
operate this way, as it requires a level of speed.
Andrae.
==============================================================================
>: |
>: | What do you think?
>: --------C|
>: |
>: |
>
>
>
> I would agree with your assessment of the situation and your breakdown
>of the forces into a front. Obviously in this case, X either a.) needs
>to rapidly execute a turn of his front or b.) is in the midst of a brilliant
>plan that will prove to be Y's undoing. (The challenge, of course, is to
>get a computer AI to execute 'b' more often than 'a'.)
> If we define a contact point, then does that give us a natural focus
>towards which to direct our forces and our strategic 'thinking'? They would
>seem to.
> Okay then, fronts are out. Spheres of influence are in. They do seem
>to better reflect the 'domino effect', as you suggest.
The sphere of influence idea seems to work well with the points of
contact idea. Perhaps each unit has a sphere wherein it can contact
within a certain number of moves (say 3), and its job is to contact
the fewest enemies at once but the most over time. IOW, it doesn't
want to be outnumbered but it wants to see action.
And two units have a greater sphere of influence, say 8 moves, than
just one. This would help reflect the greater power of two. Controlled
territory would be defined as that surrounded by my own units and
without enemy units, once again utilizing the SoIs. Contested would,
of course, be that which neither side controls. Perhaps a 'strength'
value would be attached to areas, indicating the number of units
controlling it, time for more to get there &c. This would provide an
incentive for the AI to encircle enemy territory.
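A rough C++ sketch of this classification on a small invented grid: cells
reached by only one side within its sphere-of-influence radius are
controlled, cells reached by both are contested. The grid size, radii and
positions are all made up for the example.

    #include <cstdio>
    #include <cstdlib>

    const int W = 10, H = 10;

    struct Unit { int x, y, radius; };     // radius = moves of reach

    // Mark every cell a unit can reach within its radius; Manhattan
    // distance stands in for real movement cost.
    void touch(char reach[H][W], const Unit *units, int count)
    {
        for (int i = 0; i < count; ++i)
            for (int y = 0; y < H; ++y)
                for (int x = 0; x < W; ++x)
                    if (std::abs(x - units[i].x) + std::abs(y - units[i].y)
                            <= units[i].radius)
                        reach[y][x] = 1;
    }

    int main()
    {
        // two stacked friendly units could instead share one larger
        // radius, per the 'greater power of two' idea above
        Unit mine[]   = { { 2, 2, 3 }, { 3, 3, 3 } };
        Unit theirs[] = { { 7, 7, 3 } };
        char m[H][W] = {}, t[H][W] = {};
        touch(m, mine, 2);
        touch(t, theirs, 1);
        for (int y = 0; y < H; ++y) {   // M = mine, T = theirs, ? = contested
            for (int x = 0; x < W; ++x)
                putchar(m[y][x] ? (t[y][x] ? '?' : 'M') : (t[y][x] ? 'T' : '.'));
            putchar('\n');
        }
        return 0;
    }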
Robert Uhl
==============================================================================
Good point. Yet another example of how a human can spot an opportunity
in the face of the most daunting situations....
: > If we define a contact point, then does that give us a natural focus
: >towards which to direct our forces and our strategic 'thinking'? They would
: >seem to.
I'm not sure I agree that I would want the AI to maximize its number
of contacts with the enemy; I would agree that it should seek to control
them.
: The sphere of influence idea seems to work well with the points of
: contact idea. Perhaps each unit has a sphere wherein it can contact
: within a certain number of moves (say 3), and its job is to contact
: the fewest enemies at once but the most over time. IOW, it doesn't
: want to be outnumbered but it wants to see action.
: And two units have a greater sphere of influence, say 8 moves, than
: just one. This would help reflect the greater power of two. Controlled
: territory would be defined as that surrounded by my own units and
: without enemy units, once again utilizing the SoIs. Contested would,
: of course, be that which neither side controls. Perhaps a 'strength'
: value would be attached to areas, indicating the number of units
: controlling it, time for more to get there &c. This would provide an
: incentive for the AI to encircle enemy territory.
Steven
==============================================================================
Two options I can see here - either X moves its forces into Y's "front"
to create as much damage as possible (in a concentrated strike or
"blitzkreig" style attack) or X moves its "front" back, forcing Y to make
the next move (allowing X the advantage in defence).
Just a thought...
-Alex
==============================================================================
I apologise for the excessive quoting but without the diagram any reply
is awkward.
: Two options I can see here - either X moves its forces into Y's "front"
: to create as much damage as possible (in a concentrated strike or
: "blitzkreig" style attack) or X moves its "front" back, forcing Y to make
: the next move (allowing X the advantage in defence).
Andrae.
==============================================================================
;)
: Well this thread is useful. The idea of contact points should radically
: prune any decision tree. (And sooner or later the AI will have to make a
: choice.) Of course at the strategic/grand strategic levels we may need a
: modified definition of contact point but at the level I am interested in
: clarified my concerns.
The AI will end up being a bit 'bookish', if you will, but certainly
ought to surprise you once in a while.
Steven
==============================================================================
> : One problem I can think of off the top of my head is how to handle
> : multiple front situations; there's at least some possibility of overlapping
> : definitions, meaning that some precedence order must be established.
> : Special exceptions will also have to be made for overflying enemy aircraft
> : and incursions by enemy units of various types. (Example: If the enemy
> : drops some paratroopers into your 'rear' area, does it automatically become
> : a 'front'?)
> This is why I am using a very basic set of game mechanics, and using a
> different era (see other post). This way the only way troops can reach
> your rear is to march there. Also there are very few battles in this era
> with multiple fronts. Although allowance must be made for bent and
> twisted fronts. The hinge being a very critical point in an extended line.
I don't know if it can help, however, when I was thinking about how to
build a program for a computer opponent for Squad Leader (an Avalon Hill
boardgame) I didn't consider directly recognizing the troop patterns. I
would build intermediate maps describing entities like density of fire,
distances in unit movement points (rather than linear distance),
probability of passing the zone without damage, etc. and base the
reasoning on these intermediate maps. It turns out that looking (as an
example) at the fire distribution makes it clearer if some aligned
troops make a real front or not. Many sub-tactical problems could be
solved by looking for the shortest path in some of these maps: from what
side should I attack that hill? Find the longest path from your units to
the hill in the percentage-of-survival map, where the path length is the
product of the percentages of the zones passed!
Sure it is not so simple, because the environment is highly dynamic and
there are a lot of interdependencies, but it is a good starting point.
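This maps neatly onto a standard search: since the path "length" is a
product of percentages, taking -log of each zone's survival chance turns
maximizing that product into an ordinary shortest-path problem. A C++
sketch with an invented 5x5 survival grid (none of this is from the post
itself):

    #include <cmath>
    #include <cstdio>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    const int W = 5, H = 5;

    // Chance of crossing each zone undamaged; the 0.2 column is a fire lane.
    double survive[H][W] = {
        { .9, .9, .2, .9, .9 },
        { .9, .9, .2, .9, .9 },
        { .9, .9, .2, .9, .9 },
        { .9, .9, .9, .9, .9 },
        { .9, .9, .9, .9, .9 },
    };

    int main()
    {
        // Dijkstra over cost(x,y) = -log(survive): minimizing the sum of
        // -log p maximizes the product of the p's along the path.
        std::vector<double> cost(W * H, 1e30);
        typedef std::pair<double, int> Node;   // (cost so far, cell index)
        std::priority_queue<Node, std::vector<Node>, std::greater<Node> > open;
        cost[0] = 0.0;                         // start at the top-left
        open.push(Node(0.0, 0));
        int dx[] = { 1, -1, 0, 0 }, dy[] = { 0, 0, 1, -1 };
        while (!open.empty()) {
            Node n = open.top(); open.pop();
            if (n.first > cost[n.second]) continue;   // stale queue entry
            int x = n.second % W, y = n.second / W;
            for (int d = 0; d < 4; ++d) {
                int nx = x + dx[d], ny = y + dy[d];
                if (nx < 0 || ny < 0 || nx >= W || ny >= H) continue;
                double c = n.first - std::log(survive[ny][nx]);
                if (c < cost[ny * W + nx]) {
                    cost[ny * W + nx] = c;
                    open.push(Node(c, ny * W + nx));
                }
            }
        }
        // best chance of reaching the 'hill' at the bottom-right unharmed
        printf("best survival: %.3f\n", std::exp(-cost[W * H - 1]));
        return 0;
    }

The "longest" path in the survival map (the one whose product of
percentages is greatest) falls out as the cheapest path here, and the
search routes around the fire lane rather than through it.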
Daniele Terdina
sistest@ictp.trieste.it
==============================================================================
Daniele had so many good points that I did in fact respond to her by
e-mail, but thought everybody here (hopefully) would be interested
as well.....
: Daniele Terdina
: sistest@ictp.trieste.it
Hello Daniele:
Steven
==============================================================================
==============================================================================
At least one article from myself is apparently missing here. It may have
been an e-mail from myself to Daniele, or somebody else may have posted it
but I missed snagging it. If somebody should happen to have it I'd
appreciate a copy.
==============================================================================
==============================================================================
These really poor moves should be avoided by the use of a planner. The idea
is very rough because I hadn't time to try things out. In the first
scenario (all buildings) the Russian has to conquer some buildings
initially occupied by Germans. The planner would make reasonings of the
following sort:
> : The sphere of influence idea seems to work well with the points of
> : contact idea. Perhaps each unit has a sphere wherein it can contact
> : within a certain number of moves (say 3), and its job is to contact
> : the fewest enemies at once but the most over time. IOW, it doesn't
> : want to be outnumbered but it wants to see action.
\x/ill :-}
==============================================================================
In article ,
Will Uther wrote:
>In article <3pd6nc$lf7@theopolis.orl.mmc.com>,
>woodcock@escmail.orl.mmc.com (Steve Woodcock) wrote:
>
>> Robert A. Uhl (ruhl@phoebe.cair.du.edu) wrote:
>>
>> Maximization in itself will only lead to toe-to-toe WWI slugfests, and
>> basically leads to the AI playing a war of attrition. That's perhaps one
>> of the defining characteristics of most AIs today--if they don't cheat
>> somehow, then they tend to fight wars of attrition.
>
>Very good point.
First of all, the unit must attempt to maximize contact with weak
units _over time_. It must seek to minimize contact with any units,
esp. strong ones, at once. To do so, it can be given either planning
skills, or merely given a function which will do so. Once again, I
will concentrate on the simpler and faster way. Planning takes CPU
time, something which most 'puters lack. Planning would take the form
of a path finder which would decide the path for a unit which would
bring it in contact with units individually.
The function must first seek to find nearby units. The first
criteria for 'nearness' would simply be physical nearness. This would
choose, say 10 or 20 units. These would be sorted by strength, then by
friendly units in the rough direction of the enemy. The idea is that a
unit doesn't know where its allies are headed, but if they are near an
enemy, the odds are that it is a melee situation. After sorting these
the second time, the AI would choose the 'nearest' unit. In close
quarters, it would most likely choose the physically nearest one. But
from a distance, it would tend towards other criteria, such as unit
strength, friendlies in the area, &c.
Also, it must be made possible for them to retreat. With this, your
units will be akin to men; they don't like to be hit on too much.
Perhaps they would be craven or berserk. That would be neat.
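A C++ sketch of such a scoring function, blending the criteria just
described; the weights and the sample units are invented, since the post
doesn't give any:

    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct Enemy { double x, y, strength; int friendliesNear; };

    // At close quarters raw distance dominates; at long range weaker
    // targets, and targets with friendlies already nearby (a likely
    // melee), are preferred.  The weights are invented - tune to taste.
    double score(const Enemy &e, double ux, double uy)
    {
        double d = std::sqrt((e.x - ux) * (e.x - ux) + (e.y - uy) * (e.y - uy));
        return d + e.strength * (d / 10.0) - e.friendliesNear * (d / 20.0);
    }

    int main()
    {
        double ux = 0, uy = 0;              // our unit sits at the origin
        std::vector<Enemy> candidates;      // the 10-20 physically nearest
        Enemy a = { 2, 0, 5, 0 };           // close but strong
        Enemy b = { 9, 0, 1, 2 };           // far, weak, friendlies engaged
        candidates.push_back(a);
        candidates.push_back(b);
        int best = 0;
        for (int i = 1; i < (int)candidates.size(); ++i)
            if (score(candidates[i], ux, uy) < score(candidates[best], ux, uy))
                best = i;
        printf("attack enemy %d\n", best);
        return 0;
    }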
>> : The sphere of influence idea seems to work well with the points of
>> : contact idea. Perhaps each unit has a sphere wherein it can contact
>> : within a certain number of moves (say 3), and its job is to contact
>> : the fewest enemies at once but the most over time. IOW, it doesn't
>> : want to be outnumbered but it wants to see action.
>
>This is where the determination of neighbours came in before. Assume the
>following:
>
>       1
>
>     2   3
>
>       A
>
> A is attacking and could normally reach 1, 2 or 3. However most humans
>would rule out attacking 1 because 2 and 3 are 'in the way' - 1 is not a
>neighbour of A.
> This is assuming normal distances are used. If you penalise paths for
>travelling close to an enemy then the shortest path A -> 1 may be around
>the outside of 2 or 3 - making it 'out of range' anyway. You have to
>search for shortest paths in this case though.
Hmm. I would suggest that the AI measure the distances. For the sake
of argument A->1 = 3, A->2 = 2, A->3 = 2.
Simplistic, and probably not the best method. In fact, I know that
it is rather bad, but it works, and that is what counts for the
moment. Perhaps it would only be good for individual movement. In
fact, I think that it would mimic a single man rather well. The man
takes on the closest, weakest enemy.
Robert Uhl
==============================================================================
: The function must first seek to find nearby units. The first
: criteria for 'nearness' would simply be physical nearness. This would
: choose, say 10 or 20 units. These would be sorted by strength, then by
: friendly units in the rough direction of the enemy. The idea is that a
: unit doesn't know where its allies are headed, but if they are near an
: enemy, the odds are that it is a melee situation. After sorting these
: the second time, the AI would choose the 'nearest' unit. In close
: quarters, it would most likely choose the physically nearest one. But
: from a distance, it would tend towards other criteria, such as unit
: strength, friendlies in the area, &c.
I wonder if some variant of this isn't what many games already do?
Most of them do tend to exhibit a nasty tendency to trickle units into
a battle.
: >
: >       1
: >
: >     2   3
: >
: >       A
: >
: Hmm. I would suggest that the AI measure the distances. For the sake
: of argument A->1 = 3, A->2 = 2, A->3 = 2.
: Simplistic, and probably not the best method. In fact, I know that
: it is rather bad, but it works, and that is what counts for the
: moment. Perhaps it would only be good for individual movement. In
: fact, I think that it would mimic a single man rather well. The man
: takes on the closest, weakest enemy.
On a related subject, I wonder how one would best determine how many
units to allocate to the attack against a given enemy unit? I can think
of two approaches:
Steven
==============================================================================
I have been paying close attention to this thread and the Influence Mapping
one and there are two observations I have made:
1) First, the influence mapping algorithm proposed (not the priority queue
but the previous one) seemed very effective but is computationally intensive.
However, I expect it would not have to be computed on every game turn and that
instead, modifications could be made to an already computed influence map to
obtain a sufficiently good approximation. (eg. When your troops move, you increase
your influence in one hex and decrease it in the other, but your influence in the
area, ie. adjacent hexes, remains approximately unchanged.) Only when units are
destroyed or after a number of game turns would you have to go through the big
expense of applying the influence mapping algorithm to get an accurate map.
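A C++ sketch of that incremental shortcut: when a unit moves, only its own
contribution is re-spread, which is far cheaper than re-running the full
algorithm over every unit. The falloff function below is an invented
stand-in for whatever spreading rule the real influence algorithm uses:

    #include <cstdlib>

    const int W = 32, H = 32;
    double influence[H][W];

    // Contribution of a single unit of a given strength to one cell
    // (invented falloff; substitute the real algorithm's rule).
    double falloff(int strength, int dist) { return strength / double(1 + dist); }

    // Shift one unit's own contribution from its old hex to its new one,
    // leaving every other unit's already-computed contribution alone.
    void moveUnit(int strength, int ox, int oy, int nx, int ny)
    {
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x) {
                int od = std::abs(x - ox) + std::abs(y - oy);
                int nd = std::abs(x - nx) + std::abs(y - ny);
                influence[y][x] += falloff(strength, nd) - falloff(strength, od);
            }
    }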
2) The search for fronts is extremely important in planning your strategy but
looking for the front only gives you half the picture. Let me expand:
When I play war games, my first priority is to defend at fronts so my troops are
My last objective, once the destruction of all targets of opportunity has been
addressed, is to ATTACK on the fronts where I have superiority.
My beef about all this is that the current discussion does not include detecting
these targets of opportunity. I think what needs to be done is to compute
two influence maps:
two influence maps:
a) One map computed the standard way which defines fronts and zones of control.
b) One map computed the standard way except that the influence of opposing forces
does not affect the value of a hex occupied by a friendly unit. Hence, a hex
occupied by a friendly is always positive, a hex occupied by a foe is always
negative.
Map b) would show targets of opportunity since one or a few hexes would have negative
values while being surrounded by large positive values (Large derivatives would
identify these areas) and the map a) would show that you have a large influence
on that enemy occupied hex and hence, could destroy it easily.
Map b) would also identify which of your units are about to be destroyed because
they are greatly outnumbered (the opposite situation) and some strategic decision
would have to be made about sacrificing, regrouping or reinforcing the unit.
Map b) would also indicate where on the front, identified with map a), you should
strike since it would identify where the enemy is the weakest and you the strongest.
ie. map a) identifies fronts as hexes where the influence value is zero but the
derivative is large. Map b) values at the front hexes will indicate which side
is on the defensive or offensive along each portion of the front. This is crucial
to 1) DEFEND along the front, and 2) start an offensive where the enemy is the
weakest on the front.
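A C++ sketch of the map b) test, assuming both maps have already been
computed as described; the neighbourhood and threshold are invented
details:

    const int W = 8, H = 8;

    // mapB never lets enemy influence push a friendly-occupied hex
    // negative (and vice versa), so an enemy hex that *is* negative in
    // mapB while its neighbours are strongly positive sits alone inside
    // friendly territory.  A positive mapA value there confirms we
    // dominate the ground it stands on.
    bool targetOfOpportunity(const double mapA[H][W], const double mapB[H][W],
                             int x, int y, double threshold)
    {
        if (mapB[y][x] >= 0 || mapA[y][x] <= 0)
            return false;                  // not an enemy hex we dominate
        double sum = 0; int n = 0;
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                int nx = x + dx, ny = y + dy;
                if ((dx || dy) && nx >= 0 && ny >= 0 && nx < W && ny < H) {
                    sum += mapB[ny][nx];
                    ++n;
                }
            }
        return n > 0 && sum / n - mapB[y][x] > threshold;  // steep 'derivative'
    }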
Marc Carrier
==============================================================================
==============================================================================
I apparently missed a few articles here as well. Judging from the next
post I have, there appear to have been several responses (at least 3) to
Marc Carrier's comment above. Again, if anybody has the originals I'd
love to get a copy to incorporate here.
==============================================================================
==============================================================================
writes:
: |> In article <3ptn8e$o9g@bmtlh10.bnr.ca>, Marc Carrier wrote:
: |> >
: (SNIP...)
: |> >My second priority is finding, creating and assaulting "targets of
: |> >opportunity". These are enemy units or objectives which are not well
: |> >defended and usually isolated. Unfortunately, the influence mapping
: |> >algorithm would show the hexes where these targets of opportunity are
: |> >as under my control if they were surrounded by my troops. This enemy
: |> >unit may be far from the front but is a unit that everyone would attack
: |> >immediately (excluding exceptions here and there.)
: |> >
: (SNIP...)
: |>
: |> I think you are overdoing the targets of opportunity (TOP's). Going for
<<<>>>
: I disagree!
:
: First, I agree I might have over-emphasised the "creating" targets of
: opportunity part of my statement. But when one exists, taking advantage of
: it should be considered prior to carrying on your move towards your
: objectives. A single enemy unit behind your lines or out on the flanks
: can cause major damage later on if not addressed.
: Example: In Clash Of Steel, I win the game on the eastern front by
: sending single tanks around and behind enemy line to cut supplies, then
: I destroy the unsupplied units. Whether I play the AXIS or the
: ALLIES, the strategy usually works since the COS AI is not very good at
: recognizing that enemy units are about to cut his supply routes (In
: fact, the COS AI is not very good period).
: In chess, unless you can see that you can check-mate your opponent in a
: number of moves, you rarely pass up the opportunity to capture one of his
: pieces if you will not lose anything. What you have to be careful about
: is that doing so is not a trap by your opponent to open up your defenses
: and that you will not end up in a less desirable state.
I would here remind you of the gambit. And the sacrifice. I have won
many games of chess simply because I had a three or four move lead in my
development, bought with a couple of pawns, which I used wisely. I
personally consider this a good analogy to taking the time, and effort,
to pick off an insignificant force left behind the lines.
: The point about my previous post was that there is a distinction between
: being on the defensive and offensive on a front. And maximizing contact
: points over time is something you usually do only if you are on the
: offensive. So far, many good algorithms have been proposed to identify
: fronts. Now I am turning my attention to identifying the state of the
: troops on that front (Offensive/Defensive). Furthermore, if troops
: are on the defensive, can they stand or must they fall back and regroup?
: If they are on the offensive, can they attack now or should they wait to
: group and concentrate fire power instead of trickling into combat?
Good questions.
               |
               |
 Enemy Control |  My Control
 My time adv   |  My time adv
               |
               |
===============+===============
               |
               |
 My Control    |  Enemy Control
 Enemy time adv|  Enemy time adv
               |
               |
Now of course you could swap them and represent the time variable by the
angle, but I was thinking that if you represent the strength by an angle
then you can have no poles (points going to infinity) in the final
function. This should help with array bounds or the like. Still I have
no idea how you might form the time function (the strength function should
require only a little modification to apply, and I will take a shot at it
after exams).
Andrae.
==============================================================================
This was somewhat the idea I had w.r.t. using two influence maps since there is more
than just the soil you control that determines your posture. However, I had not
considered time as a second dimension, but it makes a lot more sense, and the
simple matrix you presented with its four quadrants can simply summarize the four
behaviors I mentioned earlier:
Now, how do we represent time? One method that came to mind would be to use the
normal influence mapping algorithm proposed and compare the results after n
iterations and 2*n iterations where n < number of iterations required for values
to stabilize and converge. Basically, compute the map with the influence of the
units propagated n times, which shows your more immediate influence, and compare
it to the map with the strength propagated 2*n times (for example) which is your
influence in the more distant future. (I'm not too sure this thinking is correct
but I will do some simulation this weekend to try it out.)
An Example: After n iterations, the influence map could show that the ground
where a friendly unit is located is under enemy control (ie. lots of enemy units
close by) however, after 2*n iterations, the influence value might turn positive,
indicating that a lot more friendly units are close enough to reinforce the
position.
(I want to simulate this to see if this can happen with the influence mapping
algorithm.)
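One way the n vs. 2*n comparison might be set up in C++. The
damped-averaging propagation step below is an assumed stand-in for the
real influence algorithm (which this thread doesn't reproduce), and the
0.8 damping is an invented factor chosen so that repeated passes converge:

    #include <cstring>

    const int W = 16, H = 16;

    // One Jacobi-style pass: each cell's influence is its own base
    // strength plus a damped average of its neighbours' influence from
    // the previous pass.  With damping < 1 repeated passes converge, so
    // the map after n passes is the 'near future' and after 2*n passes
    // the 'more distant future'.
    void propagate(const double base[H][W], double m[H][W])
    {
        double out[H][W];
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x) {
                double sum = 0; int n = 0;
                if (x > 0)     { sum += m[y][x - 1]; ++n; }
                if (x < W - 1) { sum += m[y][x + 1]; ++n; }
                if (y > 0)     { sum += m[y - 1][x]; ++n; }
                if (y < H - 1) { sum += m[y + 1][x]; ++n; }
                out[y][x] = base[y][x] + 0.8 * sum / n;
            }
        std::memcpy(m, out, sizeof out);
    }

    // The test described above: enemy ground now, friendly ground later,
    // i.e. reinforcement is close enough to matter.
    bool reinforcible(double afterN, double after2N)
    {
        return afterN < 0 && after2N > 0;
    }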
One problem I see with this is that the group of friendly units that could reinforce
the position will have its influence spread all around, and many friendly units in
enemy territory may count on that reinforcement when the 2*n map is considered.
Unfortunately this method does not help to make the decision of where the
reinforcement should go, and the units in enemy territory that do not get the
reinforcement will die if they stand.
Marc
==============================================================================
I've read, with some interest, all of the proposals for wargame
AI, and I haven't seen this technique postulated. This is the technique
I intend to use for my Fantasy Wargame Engine.
First, there are two objectives that are possible in any tactical
level combat: 1) Offensive, 2) Defensive.
Concentration of force:
The *MOST* important thing in a battle is to decide where the
enemy is weakest and kick his ass there. This is why flank attacks are
so effective. Flanks are usually weaker than the center and are *not*
mutually supportable. Whereas if you attack the center, it can be
reinforced by the flanks (or rear) if need be.
Breaking the center will destroy an army because it isolates the
flanks (therefore they can't be mutually supporting), but breaking the
center is *hard* because it's constantly being reinforced by the flanks
and rear.
First you have to find the weak points in the enemy line. This
can simply be done by taking each enemy unit, adding up its attack and
defense strengths, then adding the attack & defense strengths of the units
1 move away to the unit's total. The unit with the lowest score
will be the weakest.
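A direct C++ sketch of that scan; the Unit layout and the Manhattan
one-move test are invented details:

    #include <cstdlib>
    #include <vector>

    struct Unit { int x, y, attack, defense; };

    // Score each enemy unit by its own attack+defense plus that of every
    // enemy unit within one move of it; the lowest total marks the
    // weakest point in the enemy line.
    int weakestEnemy(const std::vector<Unit> &enemies)
    {
        int best = -1, bestScore = 0;
        for (int i = 0; i < (int)enemies.size(); ++i) {
            int score = 0;
            for (int j = 0; j < (int)enemies.size(); ++j) {
                int d = std::abs(enemies[i].x - enemies[j].x)
                      + std::abs(enemies[i].y - enemies[j].y);
                if (i == j || d <= 1)      // the unit itself, or one move away
                    score += enemies[j].attack + enemies[j].defense;
            }
            if (best < 0 || score < bestScore) {
                best = i;
                bestScore = score;
            }
        }
        return best;   // index of the weakest enemy, or -1 if there are none
    }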
Now this won't help if that unit happens to have 8 million
archers between the AI troops and itself. So the computer also has to
take into account "spheres of influence".
Basically, each unit has a sphere of influence, that is an area
of the battlefield where it can inflict damage on the enemy. The enemy
units must have a sphere of influence as well; where those spheres
overlap is the contested area. Now, the idea is to try to get into a
position where the AI's sphere of influence is larger than the human
player's. This way the contested area is farther away from the
computer's units than the human. Note that the AI units and the Human
units might well be out of range of each other, but that doesn't matter.
Right now all the AI is trying to do is to control the field.
Then all the computer has to do is to move its units so that if
it's losing the strength battle in a contested area, it either moves the
outnumbered unit towards its other units (combining their contested
areas), or moving a reserve unit to the understrength unit thereby
reinforcing its contested area.
The above model works fairly well with units making independent
decisions about where they want to go and who they want to attack. But
this is just the starting point!
The description up above mentioned a few things like detaching
units to defend a piece of ground, or reinforcement of in-danger units.
This requires a "General" level of AI. The General has to make overall
decisions about the course of the battle.
Luckily the General only has to make two types of decisions.
1) Where is the enemy line weakest so that it can concentrate its forces
on that area.
2) Where is its line weakest so that it can reinforce the area with more
units.
Well, I've typed enough. If anyone has any questions, feel free
to E-Mail me (as I don't read this newsgroup often).
Chris.
==============================================================================
My apologies to all for the lengthy quotes, but I didn't want any
quotes or responses to be out of context.
Oooohhh....sounds interesting! ;)
You do address this somewhat farther on, but I'll go ahead and bring
this up here: I am somewhat concerned over the effects of finding those
units which are 'least engaged' and then detaching them to attack the
identified weaker enemy units. First, there may be some penalty for detaching
from combat, which must be taken into account in any such weighting of
possible options. Second, while it may be desirable to attack the enemy
in question I *may* want to finish off the unit I'm currently engaging
first, even if it is in the middle of a nest of enemy units. The enemy
I'm currently engaged with may only need one more hit/attack/whatever to
finish off, while the enemy unit I would *like* to engage may be a turn
of two away from its (presumed) objective. In other words, sometimes it's
better to take the bird in hand than the one in the bush.
: Concentration of force:
: The *MOST* important thing in a battle is to decide where the
: enemy is weakest and kick his ass there. This is why flank attacks are
: so effective. Flanks are usually weaker than the center and are *not*
: mutually supportable. Whereas if you attack the center, it can be
: reinforced by the flanks (or rear) if need be.
: Breaking the center will destroy an army because it isolates the
: flanks (therefore they can't be mutually supporting), but breaking the
: center is *hard* because it's constantly being reinforced by the flanks
: and rear.
Good summation....
: First you have to find the weak points in the enemy line. This
: can simply be done by taking each enemy unit, adding up its attack and
: defense strengths, then adding the attack & defense strengths of the units
: 1 move away to the unit's total. The unit with the lowest score
: will be the weakest.
: Now this won't help if that unit happens to have 8 million
: archers between the AI troops and itself. So the computer also has to
: take into account "spheres of influence".
: Basically, each unit has a sphere of influence, that is an area
: of the battlefield where it can inflict damage on the enemy. The enemy
: units must have a sphere of influence as well; where those spheres
: overlap is the contested area. Now, the idea is to try to get into a
: position where the AI's sphere of influence is larger than the human
: player's. This way the contested area is farther away from the
: computer's units than the human. Note that the AI units and the Human
: units might well be out of range of each other, but that doesn't matter.
: Right now all the AI is trying to do is to control the field.
: Then all the computer has to do is to move its units so that if
: it's losing the strength battle in a contested area, it either moves the
: outnumbered unit towards its other units (combining their contested
: areas), or moving a reserve unit to the understrength unit thereby
: reinforcing its contested area.
: The above model works fairly well with units making independent
: decisions about where they want to go and who they want to attack. But
: this is just the starting point!
: The description up above mentioned a few things like detaching
: units to defend a piece of ground, or reinforcement of in-danger units.
: This requires a "General" level of AI. The General has to make overall
: decisions about the course of the battle.
: Luckily the General only has to make two types of decisions.
: 1) Where is the enemy line weakest so that it can concentrate its forces
: on that area.
: 2) Where is its line weakest so that it can reinforce the area with more
: units.
Reserves are something we never even really talked about. Good point.
Steve
==============================================================================
Steven Woodcock _
Senior Software Engineer, Gameware _____C .._.
Lockheed Martin Information Systems Group ____/ \___/
Phone: 407-826-6986 <____/\_---\_\ "Ferretman"
E-mail: woodcock@gate.net (Home)
swoodcoc@oldcolo.com (Alternate Home)
woodcock@escmail.orl.mmc.com (Work)
Disclaimer: My opinions in NO way reflect the opinions of the
Lockheed Martin Information Systems Group, although
(like Rush Limbaugh) they should. ;)
Motto:
"...Men will awake presently and be Men again, and colour and laughter and
splendid living will return to a grey civilization. But that will only come
true because a few Men will believe in it, and fight for it, and fight in its
name against everything that sneers and snarls at that ideal..."
-- Leslie Charteris
THE LAST HERO
==============================================================================
>: Then all the computer has to do is to move its units so that if
>: it's losing the strength battle in a contested area, it either moves the
>: outnumbered unit towards its other units (combining their contested
>: areas), or moving a reserve unit to the understrength unit thereby
>: reinforcing its contested area.
>
> Okay, that's an interesting approach. Isn't it somewhat similar,
>however, to the 'mesh analysis' approach suggested in the earlier thread?
>In this case, rather than spreading the influence of a unit across the
>map, it merely spreads to adjacent hexes. This is certainly simpler,
>but don't you lose the ability to readily identify fronts and lines of
>control?
>
> On the other hand, it may be a good 'sub-system' approach for determining
>actual engagement strategy. That is, having first used the mesh analysis
>approach for mapping out the battlefield zones of control, one could then
>use this methodology for picking out individual weak points.
That makes for another step to the process and more "thinking"
time for the AI (not bad, but even *I* get bored at some wargames and
start shouting at the screen: "Come on!!"). Also, the extra step is not
needed as the sphere of control scheme, by reflex (implied by the
concept), will automatically take care of finding the front, and
maintaining good lines of control.
Think of it this way: Two amoebae (sp?) are fighting with their
cilia (the units). They're stuck in the same test tube in a limited area
(the battlefield). Each feels the genetic need to grow to fill the test
tube and can only do that by killing the other amoeba's cilia and pushing
its cell wall back.
Initially, the two amoebae are separated by a certain distance.
However, they feel the need to expand. The growth is instinctive (AI is
programmed to do this, the human player needs to do this to gain control
of the battle).
The AI amoeba looks out with its sensors to the maximum range
that its cilia can attack (with deadly morphogens!!). Suddenly it sees
the contested area for a few of its cilia, and the Human amoeba has more
cilia in that area. Well, the DNA looks for nearby cilia that have a low
area of contention or none at all, and moves them to strengthen the weak
area.
The advance continues.
I hope this analogy brings across the basic theory behind what
I'm saying.
>sense. The General draws up the overall battle plan and determines
>what objectives to take. The Sergeant determines the best way to do it.
>Beyond Squad Leader will reportedly use a similar approach.
>
>: Don't forget to have the general define some units at the
>: beginning of the battle as reserve troops. This can easily be
>: calculated. Just set back the units that have the greatest sphere of
>: influence and/or greatest strength *AFTER* it assigns units to control
>: the good ground.
>
> Reserves are something we never even really talked about. Good point.
Yah. I've beaten more AIs by probing the enemy line with my
front line units while holding back a sizable reserve. When I identify a
weakness in the enemy attack (or defense), I send my reserves in to bust
open the enemy line and kick 'em in the ass!
I want my AI to be able to do that.
>Steve
Chris.
==============================================================================
True, using the reserve is a logical approach, but I'm still a bit
worried that this overall technique will tend to force units to run
back and forth around the battlefield, engaging the weakest enemy unit
they see and/or grabbing the most valuable piece of real estate in the
immediate vicinity. Without some kind of factoring in the value of
an 'attack in progress', so to speak, I'm not sure that units using
this technique will ever finish the job.
Hmmm.
: Think of it this way: Two amoebae (sp?) are fighting with their
: cilia (the units). They're stuck in the same test tube in a limited area
: (the battlefield). Each feels the genetic need to grow to fill the test
: tube and can only do that by killing the other amoeba's cilia and pushing
: its cell wall back.
: Initially, the two amoebae are separated by a certain distance.
: However, they feel the need to expand. The growth is instinctive (AI is
: programmed to do this, the human player needs to do this to gain control
: of the battle).
: The AI amoeba looks out with its sensors to the maximum range
: that its cilia can attack (with deadly morphogens!!). Suddenly it sees
: the contested area for a few of its cilia, and the Human amoeba has more
: cilia in that area. Well, the DNA looks for nearby cilia that have a low
: area of contention or none at all, and moves them to strengthen the weak
: area.
: The advance continues.
: I hope this analogy brings across the basic theory behind what
: I'm saying.
But that's my point, I think. This approach seems aimed purely at the
General's side of things--"these are important objectives to seize, these
are units I'd like to see killed"--without consideration of the 'practical'
aspects of the problem (the Sergeant's job, in other words). Please don't
misunderstand; I think this is valuable IF being presented as an approach
for the General side of things.
Steve
==============================================================================
From: bruck@actcom.co.il (Uri Bruck)
Subject: Re: Influence Mapping: Strategic . . .
Date: Fri, 16 Jun 1995 13:05:10 GMT
I think I can add something to the thread about influence mapping, using
the already mentioned idea of the General/Sergeant algorithm.
This may seem obvious to many of you, but it's a point worth mentioning.
This design (with more or less levels) becomes effective if the General
only sees the map in a lower resolution than the levels below it.
(Perhaps I should also mention that I like to use cartesian coordinates rather
than hexes, since I have no pre-computer war game experience; this doesn't
mean using squares - like Dune II apparently does - but continuous coordinates.)
Influence mapping can still be done this way.
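A C++ sketch of that lower-resolution General's view, assuming a simple
block average of the detailed map (the dimensions and block size are
invented):

    const int W = 64, H = 64, BLOCK = 8;

    // Average the detailed map down in BLOCK x BLOCK squares; the
    // General reasons over the small coarse map, and a Colonel asks for
    // the fine map only in the region of his own mission.
    void generalsView(const double fine[H][W],
                      double coarse[H / BLOCK][W / BLOCK])
    {
        for (int by = 0; by < H / BLOCK; ++by)
            for (int bx = 0; bx < W / BLOCK; ++bx) {
                double sum = 0;
                for (int y = 0; y < BLOCK; ++y)
                    for (int x = 0; x < BLOCK; ++x)
                        sum += fine[by * BLOCK + y][bx * BLOCK + x];
                coarse[by][bx] = sum / (BLOCK * BLOCK);
            }
    }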
The 2nd level commanders - Colonels - receive their instructions, such as
'attack a certain area'; they would need a more detailed map of that area, and
perhaps the way to get there. It is not necessary to make the detailed map
for the entire playing field, only for those areas which the Colonels need
information about; if two Colonels need information about the same area, they
can use the same piece of map. Colonels have different kinds of units they
can use to carry out their mission. The units are grouped under sergeants.
I see two possible kinds of groups: homogeneous groups, or mixed groups,
provided the units in the mixed groups can travel the same types of terrain
at more or less the same speeds.
Colonels need to find the method that will give the best chance of success
in their mission. They can use another method mentioned under several names
in the thread, like 'flowing' through the influence map, to determine
the shortest route, find the weakest spot etc.
They can try to determine which tactics would have the best
possibility of success.
Sergeants are simpler - they receive one of the basic commands from the Colonel
and distribute them among their units, basic commands like move, attack, stand
and hold fire; all the lower level stuff like changing direction, updating
position etc. is handled by the unit itself. The Sergeant may receive a command
to attack a group of units, and assign each of its units to one of the units in
the group. Most of the units in the group can be pretty dumb - the Sergeant can
be used to check out the surrounding area and adjust the movement orders if
necessary.
This may sound complicated to do at every turn - but then I was not thinking
of a turn based game in the usual sense, but something more along the lines
of Dune II, which runs continuously.
What is left is determining how often each level of command should be updated.
Units that actually move should be continuously updated. The higher the level
the less updating one needs; what the higher level units should do is mostly
check on the progress of the current mission and things that may need
immediate attention. This can be done both by using the maps and the reports.
(In my implementation I update the maps about every six program cycles; when
considering a turn based game this sounds too slow, but my design was
a continuous play game, and to prevent jerkiness, I update one sixth of the
general map every turn.)
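In C++ that staggered refresh might look like the sketch below;
recomputeRow() is a placeholder for whatever per-row work the real update
performs, not anything given in the post:

    const int H = 60, SLICES = 6;

    // Placeholder for the real per-row influence update.
    void recomputeRow(int row) { (void)row; }

    // Refresh one sixth of the general map's rows per program cycle, so
    // the full map is current every SLICES cycles without one big stall.
    void staggeredUpdate(int cycle)
    {
        int slice = cycle % SLICES;
        for (int row = slice * (H / SLICES);
             row < (slice + 1) * (H / SLICES); ++row)
            recomputeRow(row);
    }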
Reports - just as commands flow down the command hierarchy, reports should
flow upwards. Information from reports can sometimes greatly enhance the
information received from control maps, letting each command level know
the exact status of the units one level below; thus it can determine whether
its plans are being carried out successfully, whether it is necessary to call
on reserves, change the plan of action, or give commands at the proper time.
(This assumes flawless communications - it would be interesting to watch
what happens if we allow communications to falter.)
AI should also try to guess where the enemy intends to attack, and recognize
concentrations of force before they happen. This is possible by
extrapolating movement vectors of groups of units; this isn't precise, but
it gives the AI a general idea where the enemy might converge, and it could send
some forces to be in the vicinity, so they can at least slow down the
enemy forces.
Uri Bruck
==============================================================================
>:
>:Lots of stuff
>:
Let me cite some historical examples of why I think this would work best:
1. At Crecy bridge, the English had superior missile strength and
inferior mobile troops. Neither force had much infantry. Now, if the
English had made their stand on a hill in open country, the French would
have been able to threaten all sides then concentrate their forces on a
weak point. Instead, the English chose to defend a gap between woods,
which funneled the French mobile troops into the English fire and enabled
the English to set up a short defensive line to repel what French troops made
it through the fire.
2. During the era of phalanx combat, high ground was a disadvantage. Missile
fire at the time was not very effective and was more useful for disrupting
formations. The general's goal was to find an area where his phalanx
would be on level, clear ground while they were fighting so that their
line would be the most cohesive.
3. When Marc Anthony invaded Persia, he had very little cavalry and the
Persians had a lot of effective horsebowmen. They eventually forced
Marc Anthony's retreat not by keeping any ground but by taking advantage
of the fact that they could damage him from a distance and that the Romans were
not mobile enough to catch them.
You also have to consider the bigger picture. Let's say the battle is between
a small, mobile force and your larger force of infantry and artillery. Logic
would say that you should use your artillery to pin the opponent down, move up
infantry in a deliberate fashion to prevent the units from being disorganized
when the mobile units could possibly attack, then overwhelm the mobile units.
But what if the mobile units are holding a bridge and are soon going to be
relieved by an army much larger than yours? Now, you need a whole new attack
method so that you can destroy the bridge as quickly as possible.
Dennis W. Disney
disney@mcnc.org
==============================================================================
In article <3rsl65$ade@stingray.mcnc.org>,
disney@mcnc.org (Dennis W. Disney) wrote:
>Christopher Spencer wrote:
>> I've read, with some interest, all of the proposals for wargame
>>AI, and I haven't seen this technique postulated. This is the technique
>You also have to consider the bigger picture. Let's say the battle is between
>a small, mobile force and your larger force of infantry and artillery. Logic
>would say that you should use your artillery to pin the opponent down, move up
>infantry in a deliberate fashion to prevent the units from being disorganized
>when the mobile units could possibly attack, then overwhelm the mobile units.
>But what if the mobile units are holding a bridge and are soon going to be
>relieved by an army much larger than yours? Now, you need a whole new attack
>method so that you can destroy the bridge as quickly as possible.
>
>Dennis W. Disney
>disney@mcnc.org
I'm not a programmer but I know a little about strategy. You might try for a
"personality" algorithm of some kind -- e.g. an aggressive, defensive, or
conservative algorithm which could be varied according to circumstances.
1 - The "Commanders" are allocated their algorithm at the start of the game and
make all moves accordingly.
Re: 2 --
This may be of no use to anyone, but the idea of a game with "personality
Owen Coughlan
PS, I don't want to argue about Montgomery, I'm just citing it as a tenuous
example.
==============================================================================
: Let me cite some historical examples of why I think this would work best:
: 1. At Crecy bridge, the English had superior missile strength and
: inferior mobile troops. Neither force had much infantry. Now, if the
: English had made their stand on a hill in open country, the French would
: have been able to threaten all sides then concentrate their forces on a
: weak point. Instead, the English chose to defend a gap between woods,
: which funneled the French mobile troops into the English fire and enabled
: the English to set up a short defensive line to repel what French troops made
: it through the fire.
: 2. During the era of phalanx combat, high ground was a disadvantage. Missile
: fire at the time was not very effective and was more useful for disrupting
: formations. The general's goal was to find an area where his phalanx
: would be on level, clear ground while they were fighting so that their
: line would be the most cohesive.
: 3. When Marc Anthony invaded Persia, he had very little cavalry and the
: Persians had a lot of effective horsebowmen. They eventually forced
: Marc Anthony's retreat not by keeping any ground but by taking advantage
: of the fact that they could damage him from a distance and that the Romans were
: not mobile enough to catch them.
: You also have to consider the bigger picture. Let's say the battle is between
: a small, mobile force and your larger force of infantry and artillery. Logic
: would say that you should use your artillery to pin the opponent down, move up
: infantry in a deliberate fashion to prevent the units from being disorganized
: when the mobile units could possibly attack, then overwhelm the mobile units.
: But what if the mobile units are holding a bridge and are soon going to be
: relieved by an army much larger than yours? Now, you need a whole new attack
: method so that you can destroy the bridge as quickly as possible.
This is the point I was trying to make earlier with regards to deciding
which units to use. Most of the algorithms we've discussed fail to 'weight'
their decision making based on what they're doing AT THE MOMENT. If already
engaged in battle, for example, they may very well be ABLE to run over and
nuke some isolated enemy unit, but they might be better off standing where
they are to finish off the unit or units they're presently engaged with.
Steven
==============================================================================
Abstract: This paper describes the evolution of a genetic program to optimize a problem featuring task prioritization in a
dynamic, randomly updated environment. The specific problem approached is the "snake game" in which a snake confined to a
rectangular board attempts to avoid the walls and its own body while eating pieces of food. The problem is particularly interesting
because as the snake eats the food, its body grows, causing the space through which the snake can navigate to become more
confined. Furthermore, with each piece of food eaten, a new piece of food is generated in a random location in the playing field,
adding an element of uncertainty to the program. This paper will focus on the development and analysis of a successful function
set that will allow the evolution of a genetic program that causes the snake to eat the maximum possible pieces of food.
Background
The "snake game" has been in existence for over a decade and seen incarnations on nearly every popular computing platform. The
game begins with a snake having a fixed number of body segments confined to a rectangular board. With each time step that
passes, the snake can either change direction to the right or left, or move forward. Hence the snake is always moving. Within the
game board there is always one piece of food available. If the snake is able to maneuver its head onto the food, its tail will then
grow by a single body segment and another piece of food will randomly appear in an open portion of the game board during the
next time step. The game ends when the snake’s head advances into a game square that is filled with either a snake body segment,
or a section of the wall surrounding the game board. From a task prioritization standpoint, then, the snake’s primary goal is to
avoid running into an occupied square. To the extent that this first priority is being achieved, its second priority is to pursue the
food.
Methods
Table 1 provides the tableau for the initial runs of the snake game. Following over twenty initial runs of the program, the
maximum score that had been achieved was 123 hits. As it was apparent that a maximum solution would not be obtained using
the initial function set, the function set was expanded to enhance the snake’s movement and environment sensing capabilities. For
the remainder of the paper, any GP runs performed with the function and terminal sets given in Table 1 will be referred to as a run
made with the "initial" function set. Any run made with the enhanced function set, which includes the complete initial function
set as a subset, will be referred to as having been made with the "final" function set. A discussion of both the initial and final
function sets follows.
Table 1. Tableau for Snake-Game Problem

Objective:            Find a computer program that eats the maximum possible
                      pieces of food.
Standardized fitness: Maximum possible pieces of food eaten (211) minus the
                      raw fitness.
Hits:                 Total pieces of food eaten during a run of the program,
                      same as raw fitness.
Wrapper:              None.
Terminals: The terminal set chosen for the problem was right, left, and forward. Each terminal was a macro that would cause the
snake to take the corresponding action during a time step as follows:
Right: the snake would change its current direction, making a move to the right
Left: the snake would change its current direction, making a move to the left
Forward: the snake would maintain its current direction, and move forward. This is the same as a no-op, as the snake must make
a move during each time step.
These three terminals represent the minimal terminal set with which the snake can effectively navigate its surroundings. While some problems involving navigation of a two-dimensional grid can be solved with only one direction-changing terminal, that is impractical for the snake game: because the game board is enclosed and the snake's extended body is impassable, the snake must be able to turn in either direction to avoid death. More advanced terminals, such as moving the snake along the shortest path to the food, were not implemented. Rather, the function set was constructed in such a manner that the GP could evolve the necessary capabilities to achieve the maximum score.
Functions: Initially the snake was given very limited functionality. One function gave it information about the location of the
food, three other functions gave it information about any immediately accessible danger, and progn2 was provided as connective
"glue" to allow a function tree to make multiple moves in a single pass. All functions were implemented as macros of arity two,
and therefore would only execute one of their arguments depending on the current state of the game, except for progn2, which
would execute both of its arguments. Even though no expressions evolved from this initial function and terminal set were able to
achieve the optimum score of 211 pieces of food, this set served as a baseline by which to evaluate progress and determine
2. Anytime an "ifDanger*" function was used, it would need the aid of a helper function, such as the new "ifMoving*"
functions in order to make intelligent moves based on an assessment of the danger.
Taking the second complexity into account, the reader may note that the same disadvantage is true of the two new functions, "ifFoodUp" and "ifFoodRight." Indeed this is true, but an important difference between the role of food and the role of danger in the game makes for a worthwhile tradeoff. The difference is that there will only ever be one piece of food on the board at any time. This allows each of the new "ifFood*" functions to serve as two functions. To clarify, consider the ifFoodUp function. When not true, it indicates that the food is either down, or on the same horizontal plane as the snake's head. Now consider a hypothetical "ifDangerUp" function. If this function were not true, it would tell nothing about whether or not danger is down, because danger can be anywhere, and everywhere, simultaneously. Likewise it would not even tell whether existing danger that was "up" posed an immediate threat to the snake, as the snake's current direction of movement would also need to be known, as discussed earlier. For the second special characteristic of the new functions, consider the new "ifMoving*" functions. These functions can be used as helper functions with the two new "ifFood*" functions to create beneficial schemata.
As an example of a beneficial schema, consider "ifFoodUp(ifMovingRight(left, ifMovingUp(fwd, right)))", which will orient the snake to pursue food that is upward; a sketch of this schema written out as nested conditionals is given below. As will be seen in the results section, not only does the GP learn how to use the "ifMoving*" functions in conjunction with the two new "ifFood*" functions, but they also prove useful in helping the snake discover patterns that greatly extend its life. Specific examples of other schemata are given in the "Results" section.
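Written out as the nested conditionals that the arity-2 macros expand to, the schema looks roughly like this sketch (reusing the Snake type and terminals from the earlier sketch; the predicate helpers are assumptions, not names from the paper's code):

bool foodUp(const Snake& s);       // food strictly above the head
bool movingRight(const Snake& s);
bool movingUp(const Snake& s);

void pursueFoodUpward(Snake& s) {
    if (foodUp(s)) {                       // ifFoodUp (its second argument is elided here)
        if (movingRight(s)) left(s);       // a left turn faces the snake upward
        else if (movingUp(s)) forward(s);  // already heading toward the food
        else right(s);                     // moving left or down: begin reorienting
    }
}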
Fitness Cases: For initial runs of the problem, only a single fitness case was used to determine the fitness for each individual. Because the food placement is random both during a single run and from one run to another, occasionally individuals would score a high number of hits because of fortuitous placement of the food rather than on the merit of their function tree.
To better ensure that the most successful individuals achieved high fitness measures primarily on the basis of their function tree,
new GP runs were often made featuring a "primed" population in which the fitness was measured as the average of four runs of
an individual. The procedure for this is as follows: once a run had completed without obtaining a solution, or if a run had stalled
on a single individual for a large number (100 or more) of generations, a new run was begun with this final individual as one of
the initial individuals. For this new run, however, the fitness was taken as the average fitness of an individual over four runs
instead of merely a single run. The averaging of the fitness over four runs helped eliminate the possibility of an individual having
a high fitness due simply to lucky placement of the food. Averaging the fitness was only used for primed populations because it increased the time of a GP run fourfold. Furthermore, it was common for the runs that timed out to feature an individual who had scored a high fitness as a result of a lucky run. Beginning a new run with this individual in the initial population not only assured a more realistic fitness measure, but also introduced an entirely new mix of randomly generated schemata that could potentially benefit the stalled individual. Details of results produced by primed runs are given in the results section.
Fitness Measure: The fitness measure used is the maximum possible pieces of food eaten, 211, minus the actual number of pieces of food eaten. Furthermore, if the snake was unsuccessful at eating any food, its fitness was penalized by the number of board squares between it and the food. This additional criterion was added to favor individuals who moved toward the food in early generations, when snakes were unable to eat any food.
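As a sketch, the standardized fitness just described might be computed as follows (names are illustrative; distanceToFood is an assumed helper that measures board distance from head to food):

// Lower is better; 0 is a perfect run of 211 pieces.
int standardizedFitness(int piecesEaten, int distanceToFood) {
    const int maxFood = 211;
    int fitness = maxFood - piecesEaten;   // raw hits subtracted from the optimum
    if (piecesEaten == 0)
        fitness += distanceToFood;         // penalty favours food-seeking early on
    return fitness;
}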
Parameters: The population size was set to 10,000. The maximum number of generations was set to 500. The size of a function tree was limited to 150 points. These parameters were chosen mainly based on available computer resources, as covered in the computer equipment and run-time explanation below.
Designating a result and criterion for terminating a run: The best of generation individual will be the one that is able to eat
the most pieces of food. A run will end when one of three termination criteria are met:
1. The snake runs into a section of the game board occupied by a wall
2. The snake runs into a section of the game board occupied by a segment of the snake’s body
3. The number of moves made by the snake exceeds a set limit. This limit was set to 300, slightly larger than the size of the
game board. This will prevent a snake from wandering aimlessly around a small portion of the board.
The reader may note that there is no termination criterion for the completely successful snake. That is because upon eating the final piece of food, the snake's tail will grow onto its head, causing it to satisfy termination criterion 2 above. Hence even the optimal individual's run ends by way of criterion 2.
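Putting the criteria together, the evaluation loop might look like the following sketch. All names are illustrative, and one detail is an assumption: the paper does not state whether the 300-move limit counts total moves or moves since the last piece of food, and the sketch assumes the latter, since 211 pieces could never be eaten within 300 total moves.

#include <functional>

// The callbacks stand in for the real board/tree logic, which the paper does not give.
int evaluateRun(std::function<void()> executeTree,     // one pass of the evolved function tree
                std::function<bool()> headOnOccupied,  // wall or body segment hit?
                std::function<bool()> headOnFood,      // food reached this step?
                std::function<void()> growAndPlaceFood) {
    const int moveLimit = 300;             // slightly larger than the board size
    int hits = 0, movesSinceFood = 0;
    while (movesSinceFood < moveLimit) {   // criterion 3: aimless wandering
        executeTree();
        ++movesSinceFood;
        if (headOnOccupied()) break;       // criteria 1 and 2: wall or body
        if (headOnFood()) {
            ++hits;
            movesSinceFood = 0;            // assumed reset on each piece eaten
            growAndPlaceFood();
        }
    }
    return hits;                           // raw fitness (hits)
}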
Results
As mentioned in the methods section, there were three types of GP runs made in an attempt to evolve a solution to the snake game: runs using the initial function set, the final function set, and primed runs, also using the final function set. The highest number of hits generated by a run using the initial function set was 123. Three separate solutions were generated using the final function set, although none of them were found to consistently generate a solution; the number of hits achieved by each solution depended on the placement of the food. It was not until the method of "priming" a run, described in the methods section, was used that a consistent solution was generated. Of ten primed runs, using various initial seeds, exactly five evolved a solution.
Consider initially the rightmost sub-tree of the function tree, which is given on the last line as progn2 (left)(right). This is the
branch executed initially and for the majority of this zig-zagger’s run. When executed repeatedly, this sub-tree will cause the
snake to move left then right, progressing diagonally across the board. For this example, the sub-tree is executed whenever there
is no food ahead of the snake’s line of movement, and there is no danger in front of or to the left of the snake’s head. This
continuous zig-zagging motion allows the snake to examine successive rows or columns of the board in search of the food.
Because both branches of the progn2 are executed before returning to the beginning of the function tree, however, the snake will
only detect the food if the second argument of the progn2, right, leaves the snake’s head in line with the food.
In evaluating this individual, first consider the root, which consists of the "ifFoodAhead" function. For any case in which there is
food ahead, the very simple left sub-tree is executed. This subtree simply checks for danger ahead and attempts to avoid it to the
left if present, otherwise the snake will continue along its current movement path towards the food. While this sub-tree proves
both simple and effective, the fact is clear that the individual spends the majority of its run without the food immediately ahead,
which is handled by the much larger right-hand sub-tree.
While it appears much more complicated than the left-hand sub-tree, the fundamental strategy of the right-hand sub-tree is to
avoid danger. This strategy is executed impressively by the three different "ifDanger*" functions, noted with 1, 2, and 3. These
functions provide the roots for the three sub-trees along the right-hand side of the main function tree. The reader can verify that
each of these three sub-trees contains schemata that are highly effective at avoiding any impending danger to the snake. Having
already taken precautions to pursue food and avoid danger, the final sub-tree provides the snake with its wall-slithering motion, in
which it spends the majority of its time.
The final sub-tree, noted with a "4" above, is rooted with a progn2. This indicates that multiple actions will be carried out every time it is executed.
First note the leftmost branch of the function tree, in which the snake will primarily avoid danger to both the front and the left.
Certainly the left-hand sub-tree, though simple, proves highly effective at achieving the snake’s primary goal of avoiding danger.
Secondly, take note of the right-hand sub-tree, which is parsed whenever danger is not immediately ahead of the snake. If the snake is not moving upwards, it simply continues forward, which is already known to be a safe move. This proves to be the move that the snake most commonly makes. If, however, the snake is moving upwards, and there is danger to the right, then it will turn left as soon as the food is no longer above it. The primary moves of this snake, then, are to continue forward around the outside of the board until either there is danger ahead and it turns left, or the snake is moving upwards and there is food to the left, at which point it turns left.
This individual followed a pattern exactly the same as that shown in figure 2. There were only a few minor deviations from the
pattern that would occur during very infrequent states of the game board. Before considering any such deviations an examination
of the major pattern following steps will be made.
The overall pattern followed by the individual above is as follows, with the movement steps noted by superscripts on the
individual. To simplify the analysis consider that the snake has already eaten enough food to be as long as the board is high, 11
segments, and that the snake is currently moving upward with its head at position (2,10) of the board:
1. While moving upward, if there is not danger two ahead, move forward twice.
2. Once there is danger two ahead, turn right; snake now moving right one row from the top of the board.
3. Turn right again, to begin heading downward.
4. Continue moving downward until there is danger directly ahead.
5. Once there is danger ahead, turn left; snake now moving right at the bottom of the board.
6. Turn left again and return to step one until there is danger to the right of the snake.
7. Danger to the right indicates the final right-hand column, so the snake now moves up until danger is one ahead.
8. Once there is danger ahead, turn left to follow the top row of the board (4,7) while moving left; repeat this same step to
move down the left-hand side of the board, and when the bottom of the board is reached, return to step 5.
While it is clear that by repeatedly following this pattern the snake will continually trace the whole board, causing it to eat at least one piece of food on each pass of the board, there is one notable exception to the pattern, made whenever the food is in the top row of the board and the snake is moving upward toward it. In this rare case, when step 2 of the pattern is reached, rather than turning right, the snake will continue forward to eat the food, as noted with a "9" in the function tree. When this case occurs the snake will resume the pattern to the right following its consumption of the food. If, however, this case occurs too far to the right and the snake's body is long enough, the snake can trap itself on the right side of the board, causing it to die. This is the only way that the individual shown above will fail to eat all 211 pieces of food.
Conclusion
This paper has presented the development and evaluation of a function set capable of evolving an optimal solution to the snake
game. An initial function set was presented and evaluated, but proved unsuccessful at evolving an optimal solution. The initial
function set was then expanded upon to create the successful final function set, and consistently optimal solutions were generated
using primed GP runs. A comparison was made of the results achieved by each function set, as well as by the primed GP runs.
Examples of commonly evolved strategies were presented and evaluated, and a final analysis of a consistently successful optimal
solution was given.
Future Work
The work presented in this paper provides innumerable opportunities for further investigation into the evolution of a task
prioritization scheme within a dynamically changing, randomly updated environment. Specific to the snake problem,
modifications can be made to create completely new and interesting problems, such as a non-rectangular game board, obstacles
within the game board, or multiple pieces of food. Multiple snakes could be co-evolved to competitively pursue the food. The
function set could be modified to feature enhanced detection capabilities and more advanced navigational options. The techniques
used for navigating the snake could be generalized to apply to various other problems of interest. Possibilities include automated
navigation of multiple robots through a crowded workspace, an automaton for tracking fleeing police suspects through harsh
environments, or a control scheme for an exploratory vehicle seeking a particular goal on a harsh alien planet. The possibilities
are only limited by the imagination.
A general GA toolkit implemented in Java, for experimenting with genetic algorithms and
handling optimization problems
Contents
The GAA Applet/Application
● Overview
● Browser Requirements and Loading Times
● General Notes
❍ Alphabet
❍ Problem Definition - Definition Files
❍ Problem Definition - Source Modifications
❍ Special GA Mechanisms
■ Automatic 'Kick'
■ Kin-competition compensating factor
■ Memory
■ Pre & Post Breed Functions
❍ Interactively (on-the-fly) defined functions
❍ Continuous Reporting
❍ Graphic Display
❍ User-Initiated Logging
❍ File Input/Output
■ Application Mode IO
■ Applet Mode IO
❍ Online Help
❍ Documentation
Overview
The GA Playground is a general purpose genetic algorithm toolkit where the user can define and run his own optimization problems. The toolkit is implemented in the Java language, and requires (when used as an application, in its full mode) a Java compiler and very basic programming knowledge (just enough for coding a fitness function). Defining a problem consists of creating an ASCII definition file in a format similar to Windows Ini files, and modifying the fitness function in the GaaFunction source file. In addition, other methods can (optionally) be overridden (e.g. the drawing method), other classes can be extended or replaced, and additional input can be supplied through ASCII files.
The GA Playground is primarily designed to be used as an application and not as an applet, since it
requires re-compiling of at least one class and use of local file I/O. In addition, it is a little heavy as an
applet, taking a relatively long loading time over the net. However, although its use as an applet does not
enable defining new problems with new fitness functions, it enables extensive playing with many
variations of an already existing problem type, by opening every aspect of the problem definition to the
user. For example, any TSP test problem can be loaded through the 'Parameters' module. Used as an
applet, the toolkit takes advantage of the Java cross-platform nature and the cross-world nature of the
Internet, to bring a GA Playground to anyone interested in experimenting with genetic algorithms.
The applet is large and takes a relatively long time to load. It uses four jar files: the first is about 100K and the other three are about 50K each. Please be patient until the loading process is finished. When the GA Playground is used as an application (the program's natural mode), and loaded locally rather than from the Internet, the loading time is obviously very short.
General Notes
Alphabet
The implementation of the genetic algorithm uses a high-cardinality alphabet to encode the chromosome's genes: each locus on the chromosome stands for a complete gene or variable. Since Java uses Unicode internally, the available range for the alleles of each gene is 64K, which provides adequate resolution for most cases.
Problem Definition - Definition Files
The input is defined by a Problem Definition file, an ASCII file formatted like a Windows Ini file. Two
additional input files are optional: An alleles-definition file and a mapping file. These files are optional
since in many cases the input they specify can be created automatically by the program.
Problem Definition - Source Modifications
The design of the program is modular, and each module is packaged separately as an easily replaceable
(or extendable) class. Of the many classes that make up the toolkit, only one needs to be handled by the user when defining a new problem: the GaaFunction class, which contains three methods: getValue, draw and createAllelesMap. The getValue method calculates the individual's fitness, and must obviously be written explicitly for each problem. The draw method is optional, and should be overridden only if graphic output is needed. The createAllelesMap method is required only when a mapping file is not supplied and the mapping table must be generated by the program.
Besides these, any class can be extended or re-written to create a different genetic algorithm (e.g. the GaaMutation class or the GaaCrossover class).
Special GA Mechanisms
The implementation of the genetic algorithm follows the standard GA structure but it incorporates
several less standard mechanisms:
1. An automatic 'Kick': A sensor in the program monitors the evolutionary process, and when it finds that there has been no advance in the last N generations (N is user-definable), it gives the population a 'Kick' and scrambles it a little (in a user-defined manner). This mechanism helps against the GA's tendency to get stuck at a mediocre local optimum (see the first sketch after this list).
2. A kin-competition compensating factor: If a population contains identical individuals, only one of them receives the nominally calculated fitness; the others are assigned decreased fitness values (see the second sketch after this list). This helps to maintain diversity in the population, and reduces the danger of the whole population being taken over by a single, relatively superior, individual. The evolutionary justification for this mechanism (a justification is not really needed, but anyway) is that identical individuals compete over the same niche, so that although each might possess good genes, the very existence of the others makes life more difficult for each of them.
3. Memory: Each individual in a population owns both a chromosome (a solution string) and a memory
string, where history data can be recorded. The use of the memory string is optional. If required, the
memory maintenance code can be included in the fitness function.
4. Pre & Post Breed Functions: These two functions (empty by default) are activated just before and just after breeding takes place (respectively). This makes it possible to add extra processing to the old population (preBreed) or to the new population (postBreed) during the process of creating a new generation.
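To make the first two mechanisms concrete, here are two minimal C++ sketches. They are illustrative only: the toolkit itself is written in Java, its real class names differ, and the exact scramble and discount rules are user-defined, so every name below is an assumption.

// Sketch of the automatic 'Kick' sensor.
struct KickSensor {
    double lastBest = -1e308;   // best fitness seen so far
    int stagnantGens = 0;       // generations without an advance
};

template <typename ScrambleFn>
void checkForKick(KickSensor& sensor, double bestFitness, int N, ScrambleFn scramble) {
    if (bestFitness > sensor.lastBest) {        // progress was made
        sensor.lastBest = bestFitness;
        sensor.stagnantGens = 0;
    } else if (++sensor.stagnantGens >= N) {    // no advance in the last N generations
        scramble();                             // user-defined partial re-randomisation
        sensor.stagnantGens = 0;
    }
}

And the kin-competition compensation, where only the first copy of each distinct chromosome keeps its nominal fitness:

#include <map>
#include <string>
#include <vector>

struct Individual {
    std::string chromosome;
    double fitness;
};

// Each additional copy of a chromosome has its fitness discounted; the
// multiplicative discount (e.g. 0.8) is an assumed scheme, not the
// toolkit's documented formula.
void compensateKin(std::vector<Individual>& population, double discount) {
    std::map<std::string, int> copiesSeen;
    for (Individual& ind : population) {
        int copies = copiesSeen[ind.chromosome]++;   // copies already encountered
        for (int c = 0; c < copies; ++c)
            ind.fitness *= discount;                 // each extra copy fares worse
    }
}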
Interactively (on-the-fly) defined functions
The GA Playground supports optimization of functions and expressions defined on-the-fly. The user can enter an expression of arbitrary length and complexity, with up to 20 variables, define range constraints for each variable, and let the program search for an optimized (minimum or maximum) solution. This option is available only in application mode (it cannot be run in a browser).
The interactive function optimization code is based on the Java Expressions Library (JEL), an amazing
library that enables fast evaluation of dynamic string expressions. The library was written by Konstantin
Metlov (metlov@fzu.cz), and its site is at http://galaxy.fzu.cz/JEL/.
Continuous Reporting
The applet user-interface supports three tools for monitoring the evolution process. Each is switchable,
the trade off being performance (switched off) versus information (switched on). They are: The text
window, where textual information is continuously displayed (when the text window is enabled), the
graphic window that supports the draw method of the GaaFunction class when relevant (e.g. in TSP
problems), and the Log window, that can gather information in the background and be displayed on
request. All these tools can be toggled on or off anytime, including during execution.
Graphic Display
The program's Gui provides a graphic window which can optionally be used for displaying a graphic representation of the evolutionary process. Problems of a geometrical nature, such as TSP or resource allocation problems, are natural candidates for making use of this option. To use the graphic option, the "draw" method must be overridden in the user-modifiable GaaFunction file. The graphic display can be turned on or off at any time, including in the course of the evolution process.
User-Initiated Logging
In addition to the switchable continuous reporting, the user can output information to the log file at any
time. While the evolutionary process is going on, it is possible to log current chunks of data, such as a list
of the current population (list of current chromosome strings), or a description of the current mating
process (in the format: Father + Mother => Kid => Mutated Kid). The population list can be printed in
several formats, some more suited to short chromosomes, others to longer ones. All logging functions
are accessible through the Log menu. Saving the log file to disk is possible only in application mode,
while displaying the log file on screen is available in any mode. The log file can be viewed (or saved)
either during the calculation process or afterwards, and can be used to analyze what happened during the
evolutionary process.
File Input/Output
1. Application Mode IO: Both input and output files can be saved to and loaded from disk. Parameters that were modified interactively can be saved to a file, and loaded later as a new problem definition. Parameter files (as well as all other ASCII files) can be edited in a text editor, saved (optionally under different names) and later used to define different problems.
Populations of strings can be saved, together with their function and fitness values, to be studied and
analyzed. Saved population files can be loaded (completely or partially) into the program, to define a
specific population with known individuals. The population file can be edited in order to modify strings
(chromosomes) or create new ones.
Finally, the log screen that optionally stores information about the evolutionary process, can also be
saved to disk for subsequent examination.
2. Applet Mode IO: When used as an applet, the GA Playground cannot access the user's local disk.
Therefore input can only be modified through the multi-tabbed Parameters module. This module supports
editing of any aspect of the problem definition, as long as it uses the same fitness function. Output is
limited to screen, but it can be copied and pasted to another application through the clipboard (I am
talking Windows here).
Online Help
The GA Playground assumes some familiarity with genetic algorithm programs, and the help is relatively
concise. There are three help utilities
1. Online Help Screens: Two basic help screens (General Help and Input Files Help) are accessible
from within the program (Help menu).
In addition there are two compact context-sensitive help mechanisms, which are particularly relevant to
the parameters (input) panels:
2. Automatic Tool-Tip: A short help string is automatically displayed in the status bar when the cursor
is over a particular component (Textfield, label or button). This automatic mechanism can be toggled on
or off through the Options menu.
3. Right-Mouse-Button click: An additional short help string can be displayed in the status bar by
clicking the right mouse button when the cursor is over a particular component. This second help tip
complements the automatic help tip described above.
Documentation: Currently there is only an empty, skeleton Javadoc documentation. While I have all the
good intentions of filling it with real documentation, it might take some time. Meanwhile it can be used,
with some intuition and possible trial-and-error, as a basis for extending and subclassing GA Playground
classes. The classes can be retrieved by expanding the gaa.jar file.
The GA-Playground can be downloaded to be run from local hard disks. The program can be run either
as an applet (from any browser supporting JDK 1.1.5), or as an application (running on a JDK 1.1.5
VM).
Please download the GaPlayground.zip (about 470K) and unzip it to any directory. The same files are
used for both applet and application modes. Sample code for the GaaFunction class (this class should be
modified when defining new problems) can be downloaded from the author's new web site
Applet Mode: Just load any of the Html problem files into your (JDK 1.1.5 capable) browser. The file
gaa.html (this file) contains an index of all the demo problems, and can be used to load any of them.
However, any specific problem can be loaded directly into the browser (e.g. TspDemo.html). In any case,
once the applet is loaded, any of the demo problems can be loaded through the Ga/Entry Screen menu.
Application Mode: The application is activated by the command:
java GaaApplet [Parameters File].
The Parameter file name is optional: When not given the "All Demos" version (All.par) will be activated
by default.
Examples:
To run the "All Demos" version: java GaaApplet
To run the TSP demo of Bavarian cities: java GaaApplet bayg29.par
Classpath (Application mode): When used as application the three jar files of the GA Playground
should be defined in the CLASSPATH variable. Each one should be entered separately.
Example:
If you unzipped the GaPlayground.zip file into C:\GAPL directory, your Classpath assignment should be
similar to the following:
set CLASSPATH=.;C:\GAPL\gaa.jar;C:\GAPL\ScsGrid.jar;C:\GAPL\tabsplitter.jar;C:\jel.jar
Application Batch File: On Windows you can create an icon that activates a batch file for running GA
Playground as an application.
The batch file should be similar to the following:
set CLASSPATH=.;tabsplitter.jar;ScsGrid.jar;jel.jar;gaa.jar
java GaaApplet %1
Assuming the batch file is named RunGaa.bat, entering "RunGaa" at the command prompt will activate
GA Playground with the default (problem selection) mode, while entering e.g. "RunGaa TspDemo.par"
will run the specific TspDemo problem.
Javadoc files: The "skeleton" Javadoc files set is packaged as a zip file and can be downloaded. It is a little thin, but it might be useful.
The GA Playground is currently at a preliminary stage, still under construction. The options for defining new problems are not fully implemented yet, and there is no Java documentation, so the toolkit is currently limited to playing with the defined problems. When playing with the parameters (or preparing new parameter files in application mode), please be careful with your input parameters, as there is no protection against illegal entries.
I shall be glad to receive any comments or suggestions. Please use my mailbox for any sort of feedback.
If you look at the source of any of the following Demo Html pages (via the View Source function in your browser), you will see that each activates the same Java class file (GaaApplet.class); the only difference is the ASCII definition file that is given as a parameter to the applet. In each case, all the class files used are the same. The only exception is the GaaFunction.class, which should either be specific to each problem definition, or contain several alternative functions that are activated according to the specific problem code.
Once you load a specific example, you can change the problem definition (by modifying problem
attributes, variables ranges and variables mapped values) and also experiment with the genetic algorithm
by playing with any of the GA parameters.
Important Notes:
● The applet requires a browser that supports JDK 1.1.5 or above (Communicator 4.05 (Preview Release), MSIE 4.01, HotJava 1.1).
● If your browser does not support JDK 1.1.5 you can download Sun's free Java Plugin: http://www.javasoft.com/products/plugin/ (good for Netscape 3.0 and above, MSIE 3.02 and above).
● The applet has a relatively long loading time, especially when loaded for the first time.
● Once the applet is loaded (with any of the demo problems), it is possible to load any other problem
from within the applet, through the 'GA/Entry Screen' menu command.
● The applet is best viewed in 800x600 (or higher) screen resolution
Multiple Problems
    All Demos - All the demo problems. In this configuration you can select any of the examples listed below, and switch between them. Selection is done from the applet's 'Entry Screen' (GA menu).
TSP
    TSP on circle - A TSP where all cities are located on a circle. The number of cities is user definable.
Knapsack
    Single 0/1-Knapsack - A single knapsack problem with 50 objects.
    Weing1 (multiple 0/1-Knapsack) - A multiple knapsack problem with 2 knapsacks and 28 objects.
    Weish01 (multiple 0/1-Knapsack) - A multiple knapsack problem with 5 knapsacks and 30 objects.
Bin Packing
    Binpack1 (u120_00) - 120 objects uniformly distributed in (20,100), bins of size 150.
    Binpack5 (t60_19) - 60 objects in 'triplets' of items from (25,50), bins of size 100.
Facility Allocation
    Steiner on circle - A facility allocation problem (where all cities are on a circle).
    Steiner by file - A facility allocation problem (where city coordinates are read from a file).
Multi-modal Functions
Function Optimization
    Single Variable Minimization - Minimize f(x) = x^4 - 12*x^3 + 15*x^2 + 56*x - 60.
Ariel Dolan
aridolan@netvision.net.il
Chromosome (a,b,c,d)    Fitness |result - 30|
(9,13,2,4)              |57-30| = 27
(13,5,7,2)              |52-30| = 22
(14,13,5,2)             |63-30| = 33
(13,5,5,2)              |46-30| = 16
#include <iostream>
#include "diophantine.h"

// Search for a solution to a + 2b + 3c + 4d = 30 using the GA.
int main() {
    CDiophantine dp(1, 2, 3, 4, 30);
    int ans = dp.Solve();
    if (ans == -1) {
        std::cout << "No solution found." << std::endl;
    } else {
        gene gn = dp.GetGene(ans);
        // Print the winning chromosome (assuming gene exposes its four alleles).
        for (int i = 0; i < 4; i++)
            std::cout << gn.alleles[i] << " ";
        std::cout << std::endl;
    }
    return 0;
}
Applications of GAs
The possible applications of genetic algorithms are immense. Any problem that has a large search domain could be suitably tackled by GAs. A popular, growing field is genetic programming (GP).
Genetic Programming
In programming languages such as LISP and Scheme, mathematical expressions are not written in standard (infix) notation, but in prefix notation. Some examples of this:
+ 2 1 : 2 + 1
* + 2 1 2 : 2 * (2+1)
* + - 2 1 4 9 : 9 * ((2 - 1) + 4)
Notice the difference between the left-hand side and the right? Apart from the order being different, there are no parentheses! The prefix method makes life a lot easier for programmers and compilers alike, because order precedence is not an issue. You can build expression trees out of these strings that can then be easily evaluated; for example, here are the trees for the above three expressions.
    +         *             *
   / \       / \           / \
  2   1     +   2         +   9
           / \           / \
          2   1         -   4
                       / \
                      2   1
You can see how expression evaluation is thus a lot easier. What does this have to do with GAs? Suppose, for example, you have numerical data and 'answers', but no expression to conjoin the data with the answers. A genetic algorithm can be used to 'evolve' an expression tree to create a very close fit to the data. By 'splicing' and 'grafting' the trees, evaluating the resulting expression with the data and testing it against the answers, the fitness function can return how close the expression is. The limitation of genetic programming lies in the huge search space the GAs have to search - an infinite number of possible equations. Therefore, normally before running a GA to search for an equation, the user tells the program which operators and numerical ranges to search under. Uses of genetic programming lie in stock market prediction, advanced mathematics and military applications (for more information on military applications, see the interview with Steve Smith). A sketch of building and evaluating such a tree is given below.
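To make this concrete, here is a minimal C++ sketch of building and evaluating the third tree above. The node representation is an assumption; a real GP system would add the 'splicing' and 'grafting' (crossover) operators on top of this.

#include <iostream>
#include <memory>

struct Node {
    char op;                      // '+', '-', '*', or 0 for a numeric leaf
    double value;                 // used only when op == 0
    std::unique_ptr<Node> left, right;
};

// Recursively evaluate a tree: leaves return their value, operators combine children.
double eval(const Node& n) {
    if (n.op == 0) return n.value;
    double a = eval(*n.left), b = eval(*n.right);
    switch (n.op) {
        case '+': return a + b;
        case '-': return a - b;
        default:  return a * b;   // '*'
    }
}

std::unique_ptr<Node> leaf(double v) {
    auto n = std::make_unique<Node>();
    n->op = 0; n->value = v;
    return n;
}

std::unique_ptr<Node> branch(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->op = op; n->left = std::move(l); n->right = std::move(r);
    return n;
}

int main() {
    // prefix "* + - 2 1 4 9"  ==  9 * ((2 - 1) + 4)
    auto tree = branch('*', branch('+', branch('-', leaf(2), leaf(1)), leaf(4)), leaf(9));
    std::cout << eval(*tree) << std::endl;   // prints 45
}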
Other Areas
Genetic algorithms can be applied to virtually any problem that has a large search space. Al Biles uses genetic algorithms to filter out 'good' and 'bad' riffs for jazz improvisation; the military uses GAs to evolve equations to differentiate between different radar returns; and stock companies use GA-powered programs to predict the stock market.
Overview:
Why would anyone want a `new' sort of computer?
What is a neural network?
A biological neuron may have as many as 10,000 different inputs, and may send its output (the presence or absence of a short-duration spike) to many other
neurons. Neurons are wired up in a 3-dimensional pattern.
Real brains, however, are orders of magnitude more complex than any artificial neural network so far considered.
Example: A simple single unit adaptive network:
The network has 2 inputs, and one output. All are binary. The output is
1 if W0*I0 + W1*I1 + Wb > 0
0 if W0*I0 + W1*I1 + Wb <= 0
The network adapts as follows: change the weight by an amount proportional to the difference between the desired
output and the actual output.
As an equation:
ΔWi = η * (D - Y) * Ii
where η is the learning rate, D is the desired output, and Y is the actual output.
This is called the Perceptron Learning Rule, and goes back to the early 1960's.
We expose the net to the patterns:

I0  I1  Desired output
0   0   0
0   1   1
1   0   1
1   1   1

We train the network on these examples. [Table of weights after each epoch (exposure to the complete set of patterns) omitted.]
At this point (8) the network has finished learning. Since (D-Y)=0 for all
patterns, the weights cease adapting. Single perceptrons are limited in what
they can learn:
If we have two inputs, the decision surface is a line, and its equation is
I1 = -(W0/W1) * I0 - (Wb/W1)
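As an illustration, the learning rule above can be applied to the patterns in the table in a few lines of C++. This is a sketch: the learning rate, starting weights, and epoch count are arbitrary choices, not values from the text.

#include <iostream>

int main() {
    double w0 = 0, w1 = 0, wb = 0;       // two input weights plus the bias weight
    const double eta = 0.1;              // learning rate
    const int I0[4] = {0, 0, 1, 1}, I1[4] = {0, 1, 0, 1}, D[4] = {0, 1, 1, 1};

    for (int epoch = 0; epoch < 20; ++epoch) {
        for (int p = 0; p < 4; ++p) {
            int y = (w0 * I0[p] + w1 * I1[p] + wb > 0) ? 1 : 0;
            double err = D[p] - y;        // Delta Wi = eta * (D - Y) * Ii
            w0 += eta * err * I0[p];
            w1 += eta * err * I1[p];
            wb += eta * err;              // the bias input is fixed at 1
        }
    }
    // Once (D - Y) = 0 for all patterns, the weights cease adapting.
    std::cout << w0 << " " << w1 << " " << wb << std::endl;
}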
Back-Propagated Delta Rule Networks (BP) (sometimes known as multi-layer perceptrons (MLPs)) and Radial Basis Function Networks (RBF) are both well-known developments of the Delta rule for single layer networks (itself a development of the Perceptron Learning Rule). Both can learn arbitrary mappings or classifications. Further, the inputs (and outputs) can have real values.
BP is a development of the simple Delta rule in which extra hidden layers (layers additional to the input and output layers, not connected externally) are added. The network topology is constrained to be feedforward, i.e. loop-free: generally, connections are allowed from the input layer to the first (and possibly only) hidden layer, and from each layer to the next.
The hidden layer learns to recode (or to provide a representation for) the
inputs. More than one hidden layer can be used.
The architecture is more powerful than single-layer networks: it can be
shown that any mapping can be learned, given two hidden layers (of units).
The units are a little more complex than those in the original perceptron: their input/output graph is a sigmoid. As a function:

Y = 1 / (1 + exp(-k * Σ(Win * Xin)))

[The graph shows the output for k = 0.5, 1, and 10, as the activation varies from -10 to 10.]
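In code, such a unit might look like the following sketch (the function and parameter names are illustrative):

#include <cmath>
#include <vector>

// Weighted sum of the inputs, squashed into (0, 1) by the sigmoid.
double sigmoidUnit(const std::vector<double>& w, const std::vector<double>& x, double k) {
    double activation = 0;
    for (size_t i = 0; i < w.size(); ++i)
        activation += w[i] * x[i];
    return 1.0 / (1.0 + std::exp(-k * activation));
}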
Training BP Networks
The weight change rule is a development of the perceptron learning rule. Weights are changed by an amount proportional to the error at that unit times the output
of the unit feeding into the weight.
Running the network consists of
Forward pass:
the outputs are calculated and the error at the output units calculated.
Backward pass:
The output unit error is used to alter weights on the output units. Then the error at the hidden nodes is calculated (by back-propagating the error at the
output units through the weights), and the weights on the hidden nodes altered using these values.
For each data pair to be learned, a forward pass and a backward pass are performed. This is repeated over and over again until the error is at a low enough level (or we give up).
Radial basis function networks are also feedforward, but have only one hidden layer.
Like BP, RBF nets can learn arbitrary mappings: the primary difference is in
the hidden layer.
RBF hidden layer units have a receptive field which has a centre: that is, a particular input value at which they have a maximal output. Their output tails off as the input moves away from this point.
Generally, the hidden unit function is a Gaussian of the distance between the input and the unit's centre. Training an RBF network involves deciding on the units' centres and the sharpnesses (standard deviations) of their Gaussians.
Generally, the centres and SDs are decided on first by examining the vectors in the training data. The output layer weights are then trained using the Delta rule. BP
is the most widely applied neural network technique. RBFs are gaining in popularity.
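As a sketch, a Gaussian RBF hidden unit can be written as follows; the vector representation and names are assumptions:

#include <cmath>
#include <vector>

// Output is maximal when the input x equals the centre c, and tails off
// with squared distance, scaled by the standard deviation sigma.
double rbfUnit(const std::vector<double>& x, const std::vector<double>& c, double sigma) {
    double dist2 = 0;
    for (size_t i = 0; i < x.size(); ++i)
        dist2 += (x[i] - c[i]) * (x[i] - c[i]);
    return std::exp(-dist2 / (2 * sigma * sigma));
}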
RBFs have the advantage that one can add extra units with centres near parts of the input which are difficult to classify. Both BP and RBFs can also be used for
processing time-varying data: one can consider a window on the data:
Networks of this form (finite-impulse response) have been used in many applications.
There are also networks whose architectures are specialised for processing time-series.
Unsupervised networks:
Simple Perceptrons, BP, and RBF networks need a teacher to tell the network what the desired output
should be. These are supervised networks.
In an unsupervised net, the network adapts purely in response to its inputs. Such networks can learn to
pick out structure in their input.
clustering data:
exactly one of a small number of output units comes on in response to an input.
reducing the dimensionality of data:
data with high dimension (a large number of input units) is compressed into a lower dimension
(small number of output units).
Although learning in these nets can be slow, running the trained net is very fast - even on a computer simulation of a neural net.
A Kohonen self-organising network takes a high-dimensional input and clusters it, while retaining some topological ordering of the output. After training, an input will cause the output units in some area to become active.
Such clustering (and dimensionality reduction) is very useful as a preprocessing
stage, whether for further neural network data processing, or for more traditional
techniques.
In particular, they can form a model from their training data (or possibly input data) alone.
This is particularly useful with sensory data, or with data from a complex (e.g. chemical, manufacturing, or commercial) process. There may be an algorithm, but
it is not known, or has too many variables. It is easier to let the network learn from examples.
in investment analysis:
to attempt to predict the movement of stocks, currencies, etc., from previous data. There, they are replacing earlier, simpler linear models.
in signature analysis:
as a mechanism for comparing signatures made (e.g. in a bank) with those stored. This is one of the first large-scale applications of neural networks in the
USA, and is also one of the first to use a neural network chip.
in process control:
there are clearly applications to be made here: most processes cannot be determined as computable algorithms. Newcastle University Chemical Engineering
Department is working with industrial partners (such as Zeneca and BP) in this area.
in monitoring:
networks have been used to monitor
❍ the state of aircraft engines. By monitoring vibration levels and sound, early warning of engine problems can be given.
❍ British Rail have also been testing a similar application monitoring diesel engines.
in marketing:
networks have been used to improve marketing mailshots. One technique is to run a test mailshot, and look at the pattern of returns from this. The idea is to
find a predictive mapping from the data known about the clients to how they have responded. This mapping is then used to direct further mailshots.
To probe further:
A rather longer introduction (which is more commercially oriented) is hosted by StatSoft, Inc.
The Natural Computing Applications Forum runs meetings (with attendees from industry, commerce and academe) on applications of Neural Networks. Contact
NCAF through their website, by telephone +44 (0)1332 246989, or by fax +44 (0)1332 247129
Internet addresses: NeuroNet, which was at Kings College, London, was a European Network of Excellence in Neural Networks which finished in March 2001. However, their website remains a very useful source of information.
IEEE Neural Networks Council http://www.ieee.org/nnc/index.html
The NRC-CNRC Institute for Information Technology Artificial Intelligence subject index has a useful entry on Neural Networks.
The newsgroup comp.ai.neural-nets has a very useful set of frequently asked questions (FAQs), available as a WWW document at: ftp://ftp.sas.com/pub/neural/FAQ.html
Courses
Quite a few organisations run courses: we used to run a 1-year Masters course in Neural Computation; unfortunately, this course is in abeyance.
Some further information about applications can be found on the Stimulation Initiative for European Neural Applications (SIENA) pages, and there is an interesting and useful page about applications.
For more information on Neural Networks in the Process Industries, try A. Bulsari's home page .
The company BrainMaker has a nice list of references on applications of its software package that shows the breadth of applications areas.
Journals.
Books.
There are a lot of books on Neural Computing. See the FAQ above for a much longer list.
For a not-too-mathematical introduction, try
Fausett L., Fundamentals of Neural Networks, Prentice-Hall, 1994. ISBN 0 13 042250 9 or
Gurney K., An Introduction to Neural Networks, UCL Press, 1997, ISBN 1 85728 503 4
Haykin S., Neural Networks , 2nd Edition, Prentice Hall, 1999, ISBN 0 13 273350 1 is a more detailed book, with excellent coverage of the whole subject.
Pen PCs
PCs where one can write on a tablet, and the writing will be recognised and translated into (ASCII) text.
Speech and Vision recognition systems
An Introduction to Robotics
Most of Artificial Intelligence will eventually lead to robotics. Most neural networking, natural language processing, image recognition, and speech recognition/synthesis research aims at eventually incorporating its technology into the epitome of robotics - the creation of a fully humanoid robot.
The field of robotics has been around nearly as long as Artificial Intelligence - but the field has made little progress. This is only natural, since the field not only attempts to conquer intelligence, but also the body that embodies it - a formidable task indeed! Robotics, though, is not just about humanoid robots; it is also about their commercial applications in manufacturing, safety and hundreds of other fields. Let us back-track, though, and look at what constitutes a robot.
What is a Robot?
According to the Oxford Dictionary, a robot is an "apparently human automaton, intelligent and
obedient but impersonal machine". Indeed, the word robot comes from robota, Czech for
'forced labour'. Yet, as robotics advances, this definition is rapidly becoming dated. Basically, a robot is a machine designed to do a human job (excluding research robots) that is either tedious, slow or hazardous. Only relatively recently have robots started to employ a degree of Artificial Intelligence in their work - many robots required human operators, or precise guidance throughout their missions. Slowly, robots are becoming more and more autonomous.
The difference between robots and machinery is the presence of autonomy, flexibility and precision. Indeed, many robots today are mere extensions of machinery - but as the field advances, the current 'fine line' will widen. To understand more of what robots can include, let us look at some examples.
Current Projects
There are many interesting robot projects. RoboMonkey is a great example - a robot built to emulate the gibbon. This incredibly agile
robot can swing from bar to bar (fixed distance) by using its body to give it the correct momentum. The robot learns from its
mistakes, and will adapt accordingly. But for me, the most interesting projects are being conducted at MIT's robotics laboratory. Two
major robot projects are in progress (and have been so for many years) - Cog and Kismet.
Cog
When MIT started the Cog project, their primary ideology was that the robot must be built from the
bottom-up. The robot would be taught all the necessary data it needed - with very little explicit
programming. Cog will eventually be a complete humanoid robot; as it currently stands (sits) it has a
head, torso and arms all with proportional degrees of freedom.
As yet, Cog does not perform any higher-level functions of a human; nevertheless the current robot is quite incredible. Cog can track targets smoothly with his head (smooth-pursuit), and he can move his head if a stimulus is provided somewhere in his field of vision (saccade to motion). Cog even has a vestibulo-ocular reflex (the ability to keep the eyes in a fixed position when the head moves). Cog can recognize faces and can detect whether eye contact has been established by using his high-resolution cameras. The robot has also learnt how to reach for targets through trial-and-error and can even imitate simple facial movements.
The philosophy behind Cog is essential to its success, and continued success. The following passage really demonstrates how the
Cog team feels:
"…We believe that classical and neo-classical AI make a fundamental error: both approaches make the mistake of assuming that because a description of reasoning/behavior/learning is
possible at some level, then that description must be made explicit and internal to any system that carries out the reasoning/behavior/learning. This introspective confusion between surface
observations and deep structure has led AI away from its original goals of building complex, versatile, intelligent systems and toward the construction of systems capable of performing
only within limited problem domains and in extremely constrained environmental conditions.…We believe that [our] abilities are a direct result of four intertwining key human attributes:
developmental organization, social interaction, embodiment and physical coupling, and multimodal integration of the system…"
Kismet
Kismet is aimed at helping research robot social interaction with human beings. Kismet is only a head, but one much more detailed than its Cog counterpart. Kismet's head consists of two eyes (with embedded cameras), ears, a mouth, and eyebrows. These can be combined to produce numerous facial expressions denoting Kismet's feelings and emotions. Kismet can be stimulated using toys such as a slinky - over-stimulation will upset him, under-stimulation will bore him, but just the right amount will make him happy.
The Cog team believes that social interaction is key to helping robots learn and grow - just like babies. Therefore, Kismet is a precursor to teaching Cog all about life! Kismet has a dedicated 'trainer' who plays and interacts with Kismet, and watches his output (Kismet is controlled by a Pentium processor) on a monitor to gain a further understanding of the robot's internal states.
Robots at Home
Robotics is slowly making its way into the home - either through leisure, or actual commercial home-based bots. Recently, Probotics released the world's first true personal robot - Cye. Cye allows its human operator to create a map of the environment (using a Windows interface) and download it via an IR link to the robot. The robot will then be able to navigate the area doing various tasks - including vacuuming! Consumer robots, though, have not yet made a big impact; so-called leisure robots are.
Recently, Tiger released a furry little toy called a Furby. These toys were touted to contain some impressive artificial intelligence, plus the ability to communicate, talk and learn from other Furbies. Furbies have an array of sensors - light sensors, tilt sensors, a microphone, various strategically-placed buttons, and an IR communication device. Furbies have their own language, but can apparently learn any other language. Once they have learnt the language, they will start to teach it to other Furbies in their vicinity. Furbies are extremely well-priced now, and a robotic version, the Gigabot, is now available. Sony released their own version - a much-hyped, expensive electronic dog, Aibo. The dog does various tricks, has a proximity sensor to avoid obstacles, and can even pick itself up when it falls over! The unit was released in several short bursts, with Sony releasing about 10,000 units valued at about $2,500 each!
[Three pictures (left to right): the Cye SR, a Furby, and Aibo.]
Conclusion
Robotics is an absolutely fascinating field that interests most people - AI buff or not. As research from more serious robotics projects such as Cog and Kismet filters down into the commercial arena, we should look forward to some very interesting (and cheap) virtual pets like Aibo and the Furbies. Hopefully, commercial home-based robots will also be available for a price not much more than an expensive vacuum cleaner. With computers becoming more and more powerful, interfacing home robots with your computer will become a reality, and housework will (hopefully!) disappear.
● Hardware Reviews - Reviews for robots like the Sony AIBO, LEGO Mindstorms and more!
● Introduction to Robotics - The basics of robots.
● Problems with Machine Vision - An intro to the problems that image recognition faces.
● AISolutions - Many robotics articles.
Learning Machines?
Back in the 50s, AI researchers were never concerned with learning. We were very content with our chess-playing programs and theorem provers. As more research was done on understanding, it was discovered that learning could become a very powerful tool for intelligent systems. Many types of systems present different approaches to learning (see aspects of artificial intelligent systems). Generally speaking, learning is very important and almost essential if we expect a machine to adapt to its ever-changing environment. It is impossible to endow a machine with all the thoughts and concepts that humans possess. If we create a system that can learn these things instead - by itself - then our task is made much simpler. Through simple observations, or experimenting through trial and error, our machines can learn.
A Definition of Understanding
In AI, we can say, in the most general terms, that a system understands if it behaves accordingly. For example, if a robot is told to sweep the floor, it understands if it follows the command; if not, then it does not understand. This poses a problem. Suppose a system "accidentally" behaves in a way that would imply understanding; conceptually, it doesn't really understand, does it? And in some other cases, a system doesn't necessarily have to behave physically in order to show that it understands. For example, we can understand that 2+2=4 without having to write it out on paper. For these cases we must seek a deeper definition: a system understands something if it makes changes to its internal conceptual structure, or appropriately modifies its knowledge base. This definition is very ambiguous, but generality is not always bad. It's true that SAM (Script Applier Mechanism) can only understand the letters in words, but not necessarily what those words would imply in the real world (SAM wouldn't know what JOHN looks like). Nevertheless, it understands something about the nature of words, and their syntax and semantics.
Artificial Life
Artificial Life, or ALife, is the attempt to get computers to accurately model the ways and practices of nature. As you can tell from this definition, not only is ALife a large domain, it overlaps considerably with Artificial Intelligence. So much so, in fact, that ALife deals with various aspects of AI (such as genetic algorithms), and AI deals with various aspects of ALife (such as flocking).
Introduction to ALife
ALife and AI coexist together very well. ALife looks at algorithms that can mimic nature and its ways -- cellular automata, simulation of group behaviour (ants, for example) -- whereas AI tends to look at mimicking (creating?) human intelligence.
Cellular Automata
A prime area of ALife is that of Cellular Automata (CA). The best definition of CA I've found so far is as follows:
"...A regular spatial lattice of "cells", each of which can have any one of a finite number of states. The state of all cells
in the lattice are updated simultaneously and the state of the entire lattice advances in discrete time steps. The state of
each cell in the lattice is updated according to a local rule which may depend on the state of the cell and its neighbors at
the previous time step..." From FOLDOC.
There are two very good examples of CAs. Wolfram's 1D CA, and Conway's Life, a 2D CA.
Wolfram's 1D-CA
Wolfram was a genius for his age, and he showed how incredible complexity could come out of simple rules and simple structures. Wolfram created a program where each of the cells was either on or off (dead or alive). It starts off with an initial configuration (either one cell, or a random stream of them); the cells beneath the initial line are then determined by the previous line. There are eight possible combinations of cells in Line A that will determine a cell in Line B - 000, 001, 010, 011, 100, 101, 110, 111. The results are dependent on the rule set (there are 256 possibilities).
Recently (23/8/99), I created a Windows 95 program that allows you to create your own 1D CAs. The program can generate some
very interesting results, and can generate results both from a point and a line. This program replaced the old Wolfram Pascal program
I'd created.
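A one-generation update for such a CA takes only a few lines of code. The C++ sketch below is a generic illustration (with wrap-around edges, an assumption), not the author's Windows program.

#include <vector>

// Each cell's next state is looked up from the rule number (0-255) using its
// left neighbour, itself, and its right neighbour.
std::vector<int> nextLine(const std::vector<int>& line, int rule) {
    int n = line.size();
    std::vector<int> next(n, 0);
    for (int i = 0; i < n; ++i) {
        int l = line[(i + n - 1) % n];          // wrap the edges for simplicity
        int c = line[i];
        int r = line[(i + 1) % n];
        int pattern = (l << 2) | (c << 1) | r;  // one of the 8 combinations
        next[i] = (rule >> pattern) & 1;        // the rule's bit for this pattern
    }
    return next;
}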
Conway's Life
A good example of ALife is the classic program, Conway's Life. (An example Java applet by Fabio Ciucci originally ran alongside
this text.) The rules to Conway's Life are simple: each pixel represents a cell. A cell cannot live with more
than 3 or fewer than 2 other cells immediately adjacent to it. If the number of cells around an empty space is exactly 3, then a new cell
is born there. These simple rules lead to rather complex behaviour. A Life system will eventually stabilize, either by dying out
completely or by reaching an equilibrium (as would happen in the applet if left running for a few minutes).
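As a sketch of how little code the rules need, here is one possible C++ implementation of a single Life generation (the toroidal wrap-around at the edges is an arbitrary choice):

#include <vector>

// One generation of Conway's Life on a toroidal grid.
// A live cell survives with 2 or 3 live neighbours; a dead cell with
// exactly 3 live neighbours becomes alive -- the rules given above.
std::vector<std::vector<int>> step(const std::vector<std::vector<int>>& grid) {
    const int h = grid.size(), w = grid[0].size();
    std::vector<std::vector<int>> next(h, std::vector<int>(w, 0));
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int neighbours = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    if (dy || dx)
                        neighbours += grid[(y + dy + h) % h][(x + dx + w) % w];
            if (grid[y][x])
                next[y][x] = (neighbours == 2 || neighbours == 3);
            else
                next[y][x] = (neighbours == 3);
        }
    }
    return next;
}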
Behaviour
Another area ALife covers is that of animal behaviour: finding the mathematical rules that lie behind it. The most famous study of
behaviour was Craig Reynolds' Boids, a program that simulated flocking to an incredibly realistic extent. Again, the program used 3
very simple rules, yet yielded incredibly realistic behaviour. The rules were:
● Separation: Steer to avoid crowding local flockmates.
● Alignment: Steer towards the average heading of local flockmates.
● Cohesion: Steer towards the average position of local flockmates.
You can find a more complicated example of flocking based on Boids here.
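For illustration, here is a minimal C++ sketch of the three rules (the weighting constants are invented tuning values, not Reynolds' originals):

#include <vector>

struct Vec2 { float x = 0, y = 0; };
Vec2 operator+(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }
Vec2 operator-(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
Vec2 operator*(Vec2 a, float s) { return {a.x * s, a.y * s}; }

struct Boid { Vec2 pos, vel; };

// One steering update for a single boid, combining the three rules above.
Vec2 steer(const Boid& self, const std::vector<Boid>& neighbours) {
    Vec2 separation, alignment, cohesion;
    for (const Boid& b : neighbours) {
        separation = separation + (self.pos - b.pos); // move away from crowding
        alignment  = alignment + b.vel;               // match neighbours' heading
        cohesion   = cohesion + b.pos;                // head for the local centre
    }
    if (!neighbours.empty()) {
        float n = static_cast<float>(neighbours.size());
        alignment = alignment * (1.0f / n) - self.vel;
        cohesion  = (cohesion * (1.0f / n)) - self.pos;
    }
    return separation * 0.05f + alignment * 0.3f + cohesion * 0.01f;
}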
The Philosophies
Just where does ALife begin and Artificial Intelligence end? Will Artificial Intelligence come naturally if Artificial Life is
successfully created? Does Artificial Intelligence constitute Artificial Life? These are three questions that have been with ALife and
AI since they were first conceived.
Will Artificial Intelligence come naturally if ALife is created? Perhaps, perhaps not. "Success" is so subjective, and the question of
intelligence is a philosophical one that has plagued AI since its foundations.
The question of whether AI is ALife is interesting. Imagine an artificially intelligent program that doesn't simulate any of the natural
processes of life, yet shows the ability to cognitively understand human speech, make completely autonomous decisions, possibly
even simulate emotions - would that constitute life? Some would say yes, some no. These are the areas where ALife and AI meet
morals, ethics and politics.
● ALife Essays
● ALife Programs - Windows-based, with full source code!
● ALife Books
● ALife Software
● AISolutions
Computational Neuroethology
Computational neuroethology (CN) is the study of a system's behavior within an environment. It is concerned with modeling
the behavior of these systems, as well as their neural substrates (which is what neuroethology is concerned with). CN systems
perceive their environments directly; i.e. their perceptions are not stored in some global database created through human input. They
work in a "closed-loop" environment, free from outside interactions. Their actions are based solely on what they conclude from the
state of their environment as well as their prior actions. For example, if we wish to simulate a robot in a closed-loop
environment, then it must act not on whatever semantics or clues could be provided by a human, but simply on the
changes (or the state) of the environment that it is in.
CN systems learn neurally and evolve genetically. They are adaptive, and act based on the circumstances that they face in their
environment. The drawback with CN systems is that they require enormous computational resources. However, many CN
advocates insist that much more can be gained from building complex models of simple animals (systems) than from building
simple models of complex animals (which is the traditional, and more direct, approach). Following this idea, the ultimate goal of AI is
to create a human being. Yet to accomplish this, we must first create a baby, not a full-grown adult.
Artificial Life
Artificial Life (AL) is the study of artificial systems that behave like natural living systems. One example of AL is Reynolds'
Boids (see Flozoids, a Java adaptation of Boids). This is a computer model of flocking behavior in animals (such as birds or fish).
One of its characteristics is that the flock (which is made up of many "boids") will always reassemble if it passes through an obstacle
(which causes it to scatter). How does it accomplish this? Well, each boid follows a few rules, such as: don't fall behind, keep up with
nearby boids, try to keep a minimum distance from your neighbors and obstacles, and move towards what seems to be the center of
mass of nearby boids. While these rules may seem very simple, the result is a bunch of boids behaving like a real flock. If you would
like to learn more about boids, see our exclusive interview with Craig Reynolds or Artificial Life.
Another example of AL is, surprisingly, computer viruses. Computer viruses exhibit reproductive behavior, usually with the
intention of making trouble, emulating their biological counterparts.
Artificial life systems are mainly concerned with the formal basis of life. It's not important how an AL system was created, but how it
acts and behaves within its environment. They attempt to emulate lifelike behavior. Many AL systems evolve and coevolve,
simulating evolutionary processes for adaptation to an environment (luckily computer viruses cannot evolve; hopefully this will
remain true for decades to come). However, there are limitations to these simulations, for the simple reason that
everything in the physical universe can't be fully detailed. The more accurate (and the more numerous) the details
given in the simulation, the more likely it is to be successful.
● ALife Essays
● ALife Programs - Windows-based, with full source code!
● ALife Books
● ALife Software
● AISolutions
Basics of Game AI
By Geoff Howland
Game AI (artificial intelligence) is the subject of a lot of discussion recently, with good cause. As games forge ahead in the
areas of graphics and sound, it becomes increasingly obvious when the actions of the game-controlled players are not
functioning in an "intelligent" manner.
More important than the game-controlled player's "intelligence" is really the game-controlled player's stupidity. Most
game players are not expecting to load up the newest first-person shooter and face off against Professor Moriarty with a
nail gun. They do expect that they will not be playing against an inbred lobotomy patient that is confounded by the
complexity of a corner.
Unit Behavioral AI
Game AI is not necessarily AI in the standard meaning of the word. Game AI for units is really an attempt to program
life-like attributes to provide a challenge or an appearance of reality.
A guard in a game that stands in one place never moving can seem very unrealistic. However, if you create a routine to
make that guard look around in different directions from time to time, or change his posture, he can start to look more
alive. By creating a situation where another guard who walks in a pre-made path occasionally stops in front of the standing
guard and appears to speak to him, the appearance of reality can be increased greatly.
There are two categories of actions in unit AI: reactionary and spontaneous.
Units act in a reactionary manner whenever they are responding to a change in their environment. If an enemy spots you
and starts to run towards you and shoot, then they have acted as a reaction to seeing you.
Units act in a spontaneous manner when they perform an action that is not based on any change in their environment. A
unit that decides to move from his standing guard post to a walking sentry around the base has made a spontaneous action.
By using both types of unit behaviors in your game, you can create the illusion that you have "intelligent" units who are
autonomous, and not necessarily simple machines.
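One plausible way to structure the two categories in code (all names and probabilities here are illustrative, not from the article) is a small per-unit update loop that mixes both behavior types:

#include <cstdlib>

enum class UnitState { Guarding, Patrolling, Attacking };

struct Unit {
    UnitState state = UnitState::Guarding;

    // Reactionary behavior: respond to a change in the environment.
    void onEnemySpotted() { state = UnitState::Attacking; }

    // Spontaneous behavior: occasionally act with no external trigger.
    void update() {
        if (state == UnitState::Guarding && std::rand() % 100 < 2)
            state = UnitState::Patrolling; // e.g. leave the post for a sentry walk
    }
};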
Reactionary AI
Reactionary inputs should always be based on the senses of the units. When modeling AI after human characters, you need
to take into consideration their range and distance of sight, their sense of hearing and, if applicable, smell.
Creating levels of alertness is a good way to handle the input of different senses. If a unit sees an enemy directly in its
sights, then it should switch to an alert mode that corresponds to how it would react to confronting an enemy. If the unit
does not see the enemy but hears footsteps or a gunshot, then the unit should go to a corresponding level of alert that
would apply to non-direct knowledge of the situation.
In the case of a unit that was a guard, hearing a gunshot would make the guard act to investigate in the area the shot was
fired in. Hearing footsteps may make the guard stand in waiting to ambush the walking unit. All of these different kinds of
actions and alerts can be set up in a rule-based or fuzzy logic system so that you can interpret each sound or sighting by
every unit and have them react in an appropriate manner.
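A rule-based version of such an alert system might look like the following sketch (the alert levels and their triggers are invented for illustration):

enum class AlertLevel { Relaxed, Suspicious, Alarmed };

// Map sensory input to an alert level, as described above: direct sight of
// an enemy means full alarm; indirect evidence (sounds) only raises suspicion.
AlertLevel assess(bool seesEnemy, bool heardGunshot, bool heardFootsteps) {
    if (seesEnemy)                      return AlertLevel::Alarmed;
    if (heardGunshot || heardFootsteps) return AlertLevel::Suspicious;
    return AlertLevel::Relaxed;
}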
A general alert is also an important factor in the appearance of reality and intelligence in a game. If you have been running
around shooting up a base full of people and keep encountering new enemies who are completely clueless to the fact there
has been non-stop gun fire for the past ten minutes, it is going to really seem out of place. By creating an alert system for
all the units, and perhaps an alert plan, you can strengthen the appearance of reality in your game world.
An alert plan would consist of rules that units would follow if there were an alert, as opposed to rules for non-alert
situations. For instance, if there was an alert you could have all units that are in non-critical areas run to the entrances of
the base to assist in the defense.
Spontaneous AI
Spontaneous AI is extremely important in creating a sense of life in your game worlds. If everyone you meet is just
standing there waiting for you to talk to them or kill them, or almost worse, wandering around aimlessly, it is not going to
be very convincing for the player.
Some methods of breaking up the standing around problem are to give each of the units a set of non-alert goals. These
could include walking pre-set paths, walking to pre-set areas randomly, stopping next to other units occasionally when
passing them and walking with other units to pre-set destinations. I have said pre-set in all of these situations because
unless you come up with a very good algorithm for setting destinations or paths your units are going to look like they are
wandering aimlessly.
Unit Actions
What really makes a game unit look intelligent is its actions. If it moves in a way that the player might, or performs an
action, such as ducking, in a situation where the player might, then the unit can look intelligent. You do not necessarily need
a lot of actions to create an illusion of intelligence; you just need enough to cover whatever basic situations your
units will be involved in.
The more areas you cover, if you cover them well, the greater the chance that your players will believe your units are
acting "intelligently". Put yourself in the place of your units. What would you do in their situation? How would you
respond to various forms of attack or enemy encounters? What would you be doing if nothing was going on at all?
If you answer these, and correctly implement the answers for each situation your units will encounter, you will maximize your
chances of having intelligent-looking units, and that is the first step to creating a good, solid game AI.
See Also:
Artificial Intelligence:Introduction
COMP.AI.GAMES
Frequently Asked Questions
8th January 2000
Version 2.01
Currently maintained by David J Burbage (dave@blueleg.co.uk)
Originally found at http://www.geocities.com/cagfaq/cf1.htm
Table of Contents
0 Introductory Stuff
0.1 Copyright
0.2 Disclaimer
0.3 Work in Progress
0.4 Netiquette and related topics
0.5 Whom to blame
0.6 What's New in this Release?
1 Overview of comp.ai.games
1.1 What is the purpose of comp.ai.games?
1.2 What are appropriate topics on c.a.g?
1.3 What are _not_ appropriate topics on c.a.g?
1.4 Is there an archive for this group?
7 Further information
7.1 Where is Web information on XXX?
7.2 What are some related news groups?
7.3 Where can I FTP text and binaries?
7.4 What books are available on AI and gaming topics?
8 Source Code
9 Glossary
10 Acknowledgements
Note: You can most easily move from section to section below by searching for lines of "-".
---------------------------------------------------------------------------
Proposed FAQ for comp.ai.games
0 Introductory Stuff
0.1 Copyright
Portions copyright 1999 David J Burbage. All rights reserved.
Portions copyright 1996 Sunir Shah. All rights reserved.
Portions copyright (c) 1995 by Steve Furlong. All rights reserved.
Portions copyright 1995 by Doug Walker and others.
Sunir Shah currently has permission to edit sections copyright Steve Furlong, Doug Walker and Rob Uhl.
This FAQ may be freely redistributed in its entirety without modification provided that this copyright notice is not
removed. It may not be sold for profit or incorporated in commercial documents (e.g., published for sale on CD-ROM,
floppy disks, books, magazines, or other print form) without the prior written permission of the copyright holder.
Permission is expressly granted for this document to be made available for file transfer from installations offering
unrestricted anonymous file transfer on the Internet.
0.2 Disclaimer
This FAQ is provided AS IS without any express or implied warranty. While the copyright holder and others have made
every effort to ensure that the information contained herein is accurate, they cannot be held liable for any damages arising
from its use.
1 Overview of comp.ai.games
1.1 What is the purpose of comp.ai.games?
CHARTER for comp.ai.games
The newsgroup comp.ai.games will provide a forum for the discussion of artificial intelligence in games and
game-playing. The group will explore practical and theoretical aspects of applying artificial intelligence concepts to the
domain of game-playing. In addition to the traditional game areas such as chess and go, the group will also welcome those
seeking to bring artificial intelligence into other computer games. Computer games in this context would consist of games
played by humans against computers, and by computers (including robots) against each other. This newsgroup is not an
appropriate place for the discussion of machine specific coding problems, nor is it the proper place to discuss strategies for
defeating computer opponents in existing games. There are other newsgroups already in existence to answer these
questions.
It should be pointed out that comp.ai.games is *not* just for professional or academic discussion of artificial intelligence
in games. Amateurs and hobbyist game developers will find themselves welcome here as well.
● minimax algorithms
● neural nets
● boardgame logic
It is certainly in order to post links to samples, demos and downloads for purposes of discussion of AI in games. There are
a fair number on the Net already, and many can be listed in this FAQ. Send me your URLs.
1.2.1 Game Designers
If you are/have/will be working on a commercially released game, please feel free to discuss how the AI was approached
for the game(s) in question. Of course, we don't want to know anything which is commercially sensitive, but there's a
distinct lack of knowledge in this area (how is the AI of well-known games approached?).
2.4A1 LISP
2.4A2 PROLOG
Games in these languages will probably not be commercially released, but who knows? There are LISP-like scripting
elements in some commercial games.
2.4B Conventional Languages
2.4B1 C and Pascal
2.4B2 C++ and Object Pascal
2.4B3 Visual Basic or other BASICs
All these can be summarised at once ... AI techniques are algorithms. The language they are implemented in is beside the
point. So, if you can code most algorithms in a language, you're all set. The intelligence is in the algorithm, not the
language.
2.4C 4th Generation Languages
A very high-level language. May use natural English or visual constructs. Algorithms or data structures may be chosen by
the compiler.
The best example is Smalltalk. These languages are not known for their applicability to games.
2.4D Third-party Libraries
"AISEARCH--C++ Search Class Library", Victor Volkman, C/C++ Users Journal, November 1994. Describes a library
with several search algorithms. The library is available in binary from the C Users Group Library as CUG volume 42.
2.4E Development Environments
while. When he comes back, he takes the best algorithm and weights that were generated, sticks that into his program as a
fixed algorithm and lookup table, and goes to market.
This "AI During Development" method has both the worst and the best of both worlds. On the "worst" side of the ledger,
the programmer is going to the effort of learning and using real AI techniques during development, but is distributing a
fixed routine for the real game. On the "best" side, a program written using a procedural language and algorithms will run
much faster and use fewer resources than a program using real AI techniques.
A#d: Flexible Algorithms
Next we come to adaptive algorithms and data tables for the computer opponent. This class of methods uses algorithms
and weights which are adjusted at runtime to get better results. For instance, in a complex wargame, the computer
opponent can start out using a given method of moving units and attacking under different circumstances. Or, better yet, it
initially uses a variety of techniques. The program monitors the effectiveness of each algorithm and set of weights, and
gradually weeds out the least effective.
This method greatly resembles the one described above, except that the modification of algorithms and weights happens
while the game is running, rather than during development. Re-read the list of the drawbacks of run-time artificial
intelligence toolkits.
The use of genetic algorithms and other methods of program self-modification is considered by some to be artificial
intelligence, but it doesn't quite make the grade. For one thing, something as simple as this would never pass the Turing
test.
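As a sketch of the adaptive idea (the proportional selection and scoring scheme is a representative choice, not from the FAQ), the program might track a running effectiveness weight per strategy and pick among them proportionally:

#include <cstdlib>
#include <vector>

struct Strategy {
    double weight = 1.0; // adjusted at runtime, as described above
};

// Pick a strategy with probability proportional to its weight.
int pick(const std::vector<Strategy>& pool) {
    double total = 0;
    for (const Strategy& s : pool) total += s.weight;
    double r = total * std::rand() / RAND_MAX;
    for (size_t i = 0; i < pool.size(); ++i) {
        r -= pool[i].weight;
        if (r <= 0) return static_cast<int>(i);
    }
    return static_cast<int>(pool.size()) - 1;
}

// After each engagement, reward what worked and gradually weed out the rest.
void reinforce(Strategy& used, bool succeeded) {
    used.weight *= succeeded ? 1.1 : 0.9;
}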
A#e: Analyzing the Human's Actions
The next step up is analysis of the human player's actions. In this context, "analysis" means just that: identification, study,
and classification of elements of the human's style. We've now reached an area of serious AI research. At the least, a
program at this level will need strong pattern recognition capabilities.
Computer pattern recognition, though much progress has been made recently, doesn't come anywhere near the ability of a
human being or most house pets. Think about a shooting game. If your opponent always dodges left when you shoot at
him, you, the human, will probably catch on pretty quickly and learn to fire a quick second shot to his left. A computer, on
the other hand, might see only that sometimes the human moves 14 units over, and sometimes 15; no pattern there!
Pattern recognition becomes much more effective as the number of cases grows. In neural-net terms, the training session can
continue forever, even if the net needs to give results before forever is over. And being able to divide the experiences into
different buckets can help even more. For this reason, asking the player to identify himself and permanently storing as much
information as possible will greatly increase the game's apparent intelligence.
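A minimal sketch of the "buckets" idea, assuming we simply count how often a player reacts a given way in a given situation (all names here are illustrative):

#include <map>
#include <string>
#include <utility>

// Count reactions per (player, situation) bucket; the most frequent entry
// becomes the prediction. Persisting this map between sessions is what
// lets the apparent intelligence grow over time.
struct ReactionMemory {
    std::map<std::pair<std::string, std::string>, std::map<std::string, int>> counts;

    void record(const std::string& player, const std::string& situation,
                const std::string& reaction) {
        ++counts[{player, situation}][reaction];
    }

    std::string predict(const std::string& player, const std::string& situation) {
        std::string best = "unknown";
        int bestCount = 0;
        for (const auto& [reaction, n] : counts[{player, situation}])
            if (n > bestCount) { bestCount = n; best = reaction; }
        return best;
    }
};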
A#f: Sub-Goal Selection
Sub-goal selection has a pretty obvious meaning: the choice of goals to accomplish, each of which leads to the
accomplishment of the overall goal. For instance, in the space shoot-em-up we've used several times in this answer, let's
say the computer checks energy levels and basic abilities and determines that the human has a definite advantage because
it has a higher energy level, but is otherwise equal. The computer opponent could decide on the subgoal of depleting the
human's energy without giving up any advantage. To do that, the computer could decide to zip in, attract fire, and dash off
without being hit. The actions the computer would perform in this scenario could have been pre-programmed to crop up in
the proper circumstances, assuming that the programmer was infinitely far-sighted. In a proper AI system, the computer
would somehow recognize a need, sift through a large number of possible goals and actions, and choose the one most
likely to succeed. So far as I know, no game on the market uses anything approaching this technique.
A#g: Be creative
Combine, change, alter, come up with new, mutate, evolve, destroy, reconstruct, burn, spray paint algorithms to come up
with a solution.
So now, to return to the original question, how should you add AI to your game? You first need to decide what level of
computer response will suit your needs, abilities, and available time. If your main intent is to write a game which will keep
the human players entertained and challenged, go with the lowest workable level, as they are defined above. Do research,
and lift whatever you can from public-domain sources. The job of adding convincing responses to your game will
probably dwarf any estimates you make, so anything you can do to minimize the work is research effort well spent. The
best AIs implemented have been shown to use a combination of many approaches.
● Gomoku (5 in a row) on a 15x15 board. First player wins. The game is solved both with and without overlines (six
or more in a row) being counted as a win.
● 3-D Tic-Tac-Toe (3x3x3). First player wins.
● Awari. The general name of "Awari" covers a huge number of variants. Some of these have been solved analytically
and some have not.
● Sprouts, for some numbers of points, anyway. Sorry, I don't have the figures on hand, but they were given in the
original Scientific American article back in the '70s. I haven't heard of any further investigation.
---------------------------------------------------------------------------
7.1.1 Chess
A good, though not recently updated, guide is at http://www.xs4all.nl/~verhelst/chess/programming.html
7.1.2 GO
A good opening for AI discussion of GO is at: http://www.usgo.org/computer/
7.1.3 Backgammon
A fine starting point for Backgammon, including a discussion of the various AI programs available and their capabilities :
http://www.statslab.cam.ac.uk/~sret1/backgammon/
7.1.6 3D shoot-em-up AI
Bots, and programming for both server and client, can be found at http://www.planetquake.com/botshop and many more at
http://www.botepidemic.com/
7.2 What are some related news groups?
The entire comp.ai.* hierarchy. Particular topics people seem interested in include comp.ai.fuzzy, comp.ai.genetic, and
comp.ai.neural-net. The rec.games.* and alt.games.* hierarchies.
The comp.ai FAQ in particular has an enormous fund of information. I plan to incorporate as much of it as I can without
being accused of plagiarism, but for now, just look it over. This FAQ has a long list of reference books and articles
covering topics of interest to AI gamers. http://www.cs.uu.nl/wais/html/na-dir/ai-faq/general/.html
x2ftp.oulu.fi:/pub/books/game/
3d-books.320 3D graphics books reviewed by Brian Hook - 3.20
aaa_set.toc Action Arcade Adventure Set - Diana Gruber 1994 (FastGraph)
cgames_1.txt Computer Games I - Levy 1988
cgames_2.txt Computer Games II - Levy 1988
explorer.toc PC Game Programming Explorer - Dave Roberts 1994
playgod.zip Playing God, Creating Virtual Worlds - Roehl 1994 (TOC/errata)
tricks.rev Tricks of the Game Programming Gurus - LaMothe/Ratcliff 1994
tricks.toc Tricks of the Game Programming Gurus - LaMothe/Ratcliff 1994
x2ftp.oulu.fi:/pub/msdos/programming/ai/
x2ftp.oulu.fi:/pub/msdos/programming/theory/
The latter directory especially has some excellent documents.
BOLO : the 'official' archive seems to be down. I suggest starting at the home page, but the development seems to have
died. http://www.lgm.com/bolo/
9 Glossary
Case-based Reasoning:
Technique whereby "cases" similar to the current problem are retrieved and their "solutions" modified to work on the
current problem.
Fuzzy Logic:
In Fuzzy Logic, truth values are real values in the closed interval [0..1]. The definitions of the boolean operators are
extended to fit this continuous domain. By avoiding discrete truth-values, Fuzzy Logic avoids some of the problems
inherent in either-or judgments and yields natural interpretations of utterances like "very hot". Fuzzy Logic has
applications in control theory.
SPA:
Shortest Path Algorithm.
---------------------------------------------------------------------------
10 Acknowledgements
[Dave Burbage's]
My thanks to:
Sunir Shah - for handing me the FAQ.
Bjorn Reese - for not bullying me too much in getting this version out at all!
[Sunir Shah's]
My thanks to:
David Burbage for taking over the FAQ.
Steve Furlong
Doug Walker for conferring the copyright on me.
Robert Uhl
Lukas Bradley for HTMLizing the FAQ.
Steven Woodcock for putting up http://www.gameai.com
Everyone else for putting up with me.
[Steve Furlong's]
My thanks to:
Dan Thies for the charter, and for creating this group
To create a fun and successful game you need to be able to challenge your players. They need to feel that they are
overcoming something by beating your game. One way to achieve this is to have your game learn from their actions. Have
it analyze what they are doing and try to come up with counter attacks to provide a challenge and to create the illusion of
more intelligent opponents.
Pattern Matching
There are a lot of legitimate artificial intelligence algorithms for finding patterns in data and for finding reactions to
them based on different success requirements. However, for the purposes of game design a lot of these are currently
overkill and, more importantly, not tuned to the scope of the problem.
You are not interested in creating a perfect reactionary machine in a game enemy; you are interested in providing a challenge
for the player. Any game already has a big plus for you as the designer, since it is your creation and you know the limits of
the game. You can therefore build your own pre-made patterns and test for them by checking the player's input or different
aspects of how they are playing.
For instance, in a fighting game such as Street Fighter 2, the player has six buttons they can choose from. By capturing
when the player hits these buttons and the distance of the enemy or if the enemy is in the air, you can find certain patterns
of play. The player may often try to punch and then move in for a throw when they are close enough. The player may
always try to do an uppercut when their enemy has jumped. By recording different input and game information at the time
of input, you can create a map of possible actions that you can use for the game's AI. In doing so you can "learn" the
player's moves and then try to counter them.
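One way to capture this, sketched below with invented names: log each button press together with the game situation at that moment, then count recurring (situation, action) pairs:

#include <map>
#include <tuple>
#include <utility>

enum class Button { Punch, Kick, Throw, Uppercut, Block, Special };

// Key: the game situation at the moment of input.
using Situation = std::tuple<bool /*enemyClose*/, bool /*enemyAirborne*/>;

struct MoveMap {
    std::map<std::pair<Situation, Button>, int> counts;

    // Record every button press together with the current situation.
    void record(Situation s, Button b) { ++counts[{s, b}]; }

    // The AI can later query the most frequent response to a situation
    // and pick a counter, e.g. anti-air if the player always uppercuts.
    Button favourite(Situation s) const {
        Button best = Button::Punch;
        int bestCount = 0;
        for (const auto& [key, n] : counts)
            if (key.first == s && n > bestCount) { bestCount = n; best = key.second; }
        return best;
    }
};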
Real-time strategy (RTS) games have a much more complex system of attack, because the raw input of mouse clicks is
irrelevant. To try to learn what your player is attempting to do in an RTS game, you will have to abstract the data of the
player's actions to find a common pattern. This is a totally game-dependent process, but as an example let's use Command
& Conquer (C&C).
In C&C the objective of an average mission is to build troops and a base to defend yourself, then destroy the enemy and
their base. There are two necessary points of learning: how the player interacts with enemy units and how the player builds
his base. To keep this example in focus we will only explore the first learning objective although the second would be
crucial for counter-attacks.
Contact between C&C's units is very limited: when they are close enough together, they will begin to fight each other.
The first type of data you will need to search on is the player's preferred unit types. The player may prefer doing tank rushes;
in this case, you will need to build defenses that specialize in defeating tanks. If they prefer making mini-gunner units, then
you will adjust your defenses against that type of attack.
The player could have a preference of attacking the harvesters versus attacking the base directly. This can be recorded and
used so that you can send out troops to guard the harvesters or build more protection around the main base. To create a
good learning system you need to find the most common methods of attack and then figure out how you can determine if
they are occurring.
When you search for a pattern, you want to search on one or more criteria. To do this, you should save your
data in a way that lets you access it quickly, which means that at the time you save your data you need to plan for
the way the data will be accessed. If you wish to save each occurrence of data as a separate element, you will need to save
them in an order that is quickly searchable. To use the C&C example, you can save them keyed by unit type. By
creating a table for each unit type you will shortcut the need to search through all the records just to collect the ones that
contain the appropriate types of units.
Another method is to store all the data in one table of ratios: the ratio of attacks on harvesters versus attacks on the base, for
instance, or the ratio of using one type of unit over another. This would make a good in-game search method, as there are no
records to retrieve and analyze. Single-entry records could still be saved and analyzed outside the game. You could also
weight the latest actions as more important than previous actions, to cover for any change
in tactics.
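A sketch of such a ratio table, with invented fields loosely following the C&C example:

// Running ratios of observed player behavior; cheap to keep in-game
// since there are no stored records to retrieve and analyze.
struct PlayerProfile {
    int tankAttacks = 0, infantryAttacks = 0;
    int harvesterAttacks = 0, baseAttacks = 0;

    double tankPreference() const {
        int total = tankAttacks + infantryAttacks;
        return total ? static_cast<double>(tankAttacks) / total : 0.5;
    }
    double harvesterPreference() const {
        int total = harvesterAttacks + baseAttacks;
        return total ? static_cast<double>(harvesterAttacks) / total : 0.5;
    }
};

Periodically decaying the counters (say, halving them every few minutes) would be one way to implement the weighting of recent actions suggested above.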
Overview
Creating learning methods, like any other type of game AI, is going to take a lot of sitting down and thinking of situations.
It will also take a lot of play testing. Players will come up with methods of play that you cannot be expected to think of
beforehand, so you need to build your learning database to be flexible enough to add in more situations.
Learning the player's styles and preferences is not a key to creating an unbeatable opponent or the ultimate AI; it is a
method for creating a challenge for your players. By never letting them develop a tactic that will constantly work against
the computer, you will ultimately extend the life of your game and keep it fresh and challenging.
The History of AI
Mankind has long been curious about how the mind works and fascinated by intelligent machines. From Talos, the copper giant in
the Iliad, to Pinocchio, the fairy-tale wooden puppet acting like a real boy, to the early debates on the nature of the mind and thought by
European philosophers and mathematicians, we can see people's desire to understand and even to create intelligence.
An Overview
AI is the result of a merger of philosophy, mathematics, psychology, neurology, linguistics, computer science, and many other fields.
Furthermore, the applications of AI relate to almost any field. This variety gives AI endless potential. A relatively young science,
AI has made much progress in 50 years. Though fast-growing, AI has never actually caught up with all the expectations imposed on it.
There are two reasons for the public's over-confidence in AI. First, AI theories are often ingenious and subtle, even fictional, suggesting
many futuristic applications. Second, AI, being tied to computer technology, is often expected to progress as fast as
computer technology does. In conclusion, AI is a young, energetic, and attractive science.
will learn only how to manipulate syntax. It will be able to ask a question about what the capital of Saudi Arabia is; however, if it
were given something a bit more complex, such as Martin Luther King's 'I have a dream...' speech, it would not be able to come up
with questions that force people to draw inferences (e.g., In what context was this speech given?); neither does it really
understand what it is asking.
Many researchers realized this limitation, and as a result conceptual dependency (CD) theory was created. CD systems such as SAM
(Script Applier Mechanism) are story understanders. When SAM is given a story, and later asked questions about it, it will answer
many of those questions accurately (thus showing that it "understands"). It can even infer. It accomplishes this through the use of
scripts. Scripts designate a sequence of actions that are to be performed in chronological fashion for a certain situation. A
restaurant script would say that you need to sit down at a table before you are served dinner.
The following is a small example of SAM (Script Applier Mechanism) paraphrasing a story (notice the inferences):
Input: John went to a restaurant. He sat down. He got mad. He left.
Paraphrase: JOHN WAS HUNGRY. HE DECIDED TO GO TO A RESTAURANT. HE WENT TO ONE. HE SAT DOWN IN A
CHAIR. A WAITER DID NOT GO TO THE TABLE. JOHN BECAME UPSET. HE DECIDED HE WAS GOING TO LEAVE
THE RESTAURANT. HE LEFT IT.
Scripts allow CD systems to draw links and inferences between things. They are also able to classify and distinguish primitive
actions. Kicking someone, for example, could be a physical action that institutes 'hurt', while loving could be an emotional
expression that implies 'affection'.
In summary, the outcome of the test is too dependent on human involvement, and so too is the question of whether a certain system
is really intelligent or not. Such a question is actually quite trivial and shallow. As Tantomito puts it, "We should be asking about the
kinds, quality and quantity of knowledge in a system, the kinds of inference that it can make with this knowledge, how well-directed
its search procedure is, and what means of automatic knowledge acquisition are provided. There are many dimensions of
intelligence, and these interact with one another."
Syntactic Understanding
● Acquiring knowledge about the grammar and structure of words and sentences.
● Effective representation and implementation of this allows effective manipulation of language in respect to grammar.
● This is usually implemented through a parser.
The Parser
A parser assigns phrase markers (or grammatical objects), such as verb, adverb, noun, etc., to words. It breaks down sentences into
grammatical objects. For example, the IQATS parser is context-free: recursion is allowed in the parsing of words, so
multiple levels of embedding are allowed when grouping words with their respective phrase markers. This makes much more
intricate parsing possible.
A parsed sentence can be easily represented in tree form; the lines below show two such parses written as nested lists, which is how
the data structures for parsed sentences might look in a list-processing language like LISP.
(subject (Jack) (predicate (verb (ate)) (direct-object (a frog))))
(s (np (jack)) (vp (verb (ate)) (np (a frog))))
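As an illustration, the nested-list parse above maps naturally onto a recursive data structure; a C++ sketch (names invented):

#include <string>
#include <vector>

// A node is either a phrase marker with children (e.g. "vp") or a leaf word.
struct ParseNode {
    std::string label;               // "s", "np", "verb", or the word itself
    std::vector<ParseNode> children; // empty for leaf words
};

// Builds (s (np (jack)) (vp (verb (ate)) (np (a frog))))
ParseNode example() {
    return {"s", {
        {"np", {{"jack", {}}}},
        {"vp", {
            {"verb", {{"ate", {}}}},
            {"np", {{"a", {}}, {"frog", {}}}}
        }}
    }};
}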
Semantic Understanding
● This includes the literal meaning of words in language.
● The inferences they can make.
● The conclusions we can draw from them.
● Generally the most difficult process to develop in NLP.
● Associates and defines objects with other interconnected objects in a tree structure, for example:
          animal
         /  |   \
   mammal bird sealife
● Programs implementing CD can infer or "read between the lines" of sentences (see the example below) by drawing real-world
information from scripts. The primitive actions describing the actions of the real world are chained together in scripts. A restaurant
script, for example, may describe that a person should pay a tip before leaving, or that if he is not satisfied with the service, he should not
be expected to pay a tip. A script on reading books may stipulate that one must lay one's eyes on a book before actually drawing
information from its words. Scripts play an essential role in enabling programs using CD to draw inferences.
Here is a small example of SAM (Script Applier Mechanism) paraphrasing a story (notice the inferences):
Input: John went to a restaurant. He sat down. He got mad. He left.
Paraphrase: JOHN WAS HUNGRY. HE DECIDED TO GO TO A RESTAURANT. HE WENT TO ONE. HE SAT DOWN IN A
CHAIR. A WAITER DID NOT GO TO THE TABLE. JOHN BECAME UPSET. HE DECIDED HE WAS GOING TO LEAVE
THE RESTAURANT. HE LEFT IT.
Applications of CR
What does this have to do with Artificial Intelligence? Well, imagine the potential of a program that could parse information and
store it in a string of concepts. Here are just a few applications that CR can be applied to:
● Translation: Translation programs are notorious for their incredibly sketchy translations, due to the fact that they often take the
absolute, or most common, meaning of a word when translating. For example, a program might take the English phrase, "Mum, please
don't hassle me, I've gotta fly to school now, I woke up late." and translate the word fly as to take a plane. If CR were used,
firstly a parser would parse the sentence, then a conceptual-representor would create the necessary data structures (see CR
Structures), then a translator would translate those concepts into the target language, without the complications that arose
before.
● Paraphrasing: When you paraphrase something, you take the information you were given and then recreate your own shorter
version. The Microsoft Word summarizer uses a mathematical approach to paraphrasing; here is how it does it:
How does AutoSummarize determine what the key points are? AutoSummarize analyzes the document and
assigns a score to each sentence. (For example, it gives a higher score to sentences that contain words used
frequently in the document.) You then choose a percentage of the highest-scoring sentences to display in the
summary.
(Extract from Microsoft Word Online Help)
Whilst this is very effective when you would like to take a large document and only read the most important parts, such a
mathematical method produces output that does not always make sense as a whole.
If a CR approach is used, the overall summary will not only highlight the most important parts, it will also make grammatical
sense as a piece of text. Such a program would be great for radio stations and other networks that receive information from
press networks. Information could be paraphrased by the very computers receiving it, making the job of producing
presentable news reports a lot easier.
● Story creation: Apart from the field of computer arts, other applications of story creation could perhaps be in gaming, where
the story is altered and reconstructed dynamically according to how the player changes the game world - imagine the long-term
playability of that!
CR Structures (CRS)
After seeing three examples of how CR can be used, you may wonder why such programs aren't out yet. Creating a
successful CR program is incredibly difficult. Let's look at one possible CRS. This CRS was devised at Yale for, as they put it,
"possession-changing actions". They named it ATRANS, Abstract TRANSfer of possession. The example I am going to give is a very
simple one; for more complicated diagrams, see Does the Top-Down or Bottom-Up Approach Best Model the Human Brain.
Basically, a CRS (like ATRANS) represents a very simple action, a base action that many other actions can be made up of. ATRANS
can be used to represent give, trade, buy, exchange, sell and many, many more. Structurally, a CR is a series of 'slots', or
expectations, that the computer fills as it parses and interprets the sentence.
Input: "John sold Mary a book"
Through the representation that would be created, paraphrases such as these could be made:
Now, look at some of the inferences that the program could make:
● Mary has a book.
Obviously, for these types of inferences and paraphrases to be made, prior knowledge of various aspects must have been programmed
into the application. Also, the inferences made are the everyday type, the type that would be made if the program also utilized
scripting. For example, John could have sold the book to Mary because it was a cookery book for Tahitian food, which John doesn't
really like. Therefore, he didn't read the book...but the program would infer he had. Perhaps with more advanced CR primitives,
and a more powerful inference engine that checks for such possibilities before inferring them, such cases could be reduced or
overcome completely.
Scripting
How does the program above create such inferences? I said that it uses prior knowledge - this prior knowledge will often come in the
form of a script. Schank describes a script like this:
"...Scripts are prepackaged set of expectations, inferences, and knowledge that are applied in common situations, like a
blueprint for action without the detail filled in..."
Cognitive Computer, pg 114.
Scripts are used by humans, in a sense. Imagine you hear this story: "Bob went to the shops. Ten minutes later, he walked out with
his shopping and went home." You make a few assumptions - that Bob bought the shopping, that Bob was short of a few items, etc.
The reason you know this is because you unconsciously follow a script in your head. You know the basic outline of shopping (due to
experience) and you can fill in the details, and make assumptions about the rest. Let's look at another story: "Bob went to the
gardeners. He asked the waiter for a BMW and left." Now, this story makes no sense whatsoever to the normal person! This is
because it does not follow the "gardeners script". Gardeners don't have waiters, nor do they sell BMWs!
Example of a Script.
Here is an incredibly simple example of a script, based on how to turn on a computer:
Script: COMPUTER-ON.
Track: Computer Room.
Props: Computer.
On-button.
Keyboard.
Roles: User.
CR Programs
Having said that CR programs are incredibly difficult to write, that doesn't mean such programs don't exist. So far, though, all have
been demonstration, proof-of-concept programs. I will look at two - one called SAM and another called IPP.
SAM
Perhaps one of the most famous AI programs, SAM (Script Applier Mechanism) was developed in 1975 by Richard Cullingford,
Wendy Lehnert, Anatole Gershman and Jaime Carbonell. It was designed to read stories that followed basic scripts, and output
summaries in several languages, and create questions and answers based on the text.
SAM had 4 basic modules: a parser and generator based on a previous program, then the main module - the Script Applier (by
Input:
Friday evening a car swerved off Route 69. The vehicle struck a tree.
The passenger, a New Jersey man, was killed. David Hall, 27, was
pronounced dead at the scene by Dr. Dana Blanchard, medical examiner.
Frank Miller, 32, of 592 Foxon Rd., the driver, was taken to Milford
Hospital by Flanagan Ambulance. He was treated and released. No
charges were made. Patrolman Robert Onofrio investigated the accident.
English Summary:
AN AUTOMOBILE HIT A TREE NEAR HIGHWAY 69 FOUR DAYS AGO. DAVID HALL,
AGE 27, RESIDENCE IN NEW JERSEY, THE PASSENGER, DIED. FRANK MILLER,
AGE 32, RESIDENCE AT 593 FOXON ROAD IN NEW HAVEN, CONNECTICUT, THE
DRIVER, WAS SLIGHTLY INJURED. THE POLICE DEPARTMENT DID NOT FILE
CHARGES.
Spanish Summary:
Question-Answering Output:
IPP
IPP was developed in 1980 by Michael Lebowitz. IPP used slightly more advanced techniques than SAM: in addition to CR
primitives and scripts, it used plans and goals too (beyond the scope of this essay). IPP was built to look at newspaper articles in a
specific domain, and to make generalizations about the information it read and remembered. IPP was important because it could
update and expand its own memory structures.
Here is some sample output from the program, reading articles about Basque terrorism:
*(PARSE S1-7)
(5 15 80) SPAIN
See Also:
Artificial Intelligence:Neural Networks
Neural Netware
by Andre' LaMothe - Xtreme Games
Biological Analogs
Neural Nets were inspired by our own brains. Literally, some brain in someone's head said, "I wonder how I work?" and then
proceeded to create a simple model of itself. Weird, huh? The model of the standard neurode is based on a simplified model of a
human neuron invented over 50 years ago. Take a look at Figure 1.0. As you can see, there are 3 main parts to a neuron:
● Dendrite(s)........................Responsible for collecting incoming signals.
● Soma...............................Responsible for the summation and processing of the collected signals.
● Axon...............................Responsible for transmitting the output signal to other neurons.
The average human brain has about 100,000,000,000 or 10^11 neurons, and each neuron has up to 10,000 connections via the
dendrites. The signals are passed via electro-chemical processes based on Na (sodium), K (potassium), and Cl (chloride) ions.
Signals are transferred by accumulation and potential differences caused by these ions; the chemistry is unimportant, but the signals
can be thought of as simple electrical impulses that travel from axon to dendrite. The connections from an axon to a dendrite are called
synapses, and these are the basic signal transfer points.
So how does a neuron work? Well, that doesn't have a simple answer, but for our purposes the following explanation will suffice.
The dendrites collect the signals received from other neurons, then the soma performs a summation of sorts and, based on the result,
causes the axon to fire and transmit the signal. The firing is contingent upon a number of factors, but we can model it as a transfer
function that takes the summed inputs, processes them, and then creates an output if the properties of the transfer function are met. In
addition, the output is non-linear in real neurons; that is, signals aren't digital, they are analog. In fact, neurons are constantly
receiving and sending signals, and the real model of them is frequency dependent and must be analyzed in the S-domain (the
frequency domain). The real transfer function of a simple biological neuron has, in fact, been derived, and it fills up a number of
chalkboards.
Now that we have some idea of what neurons are and what we are trying to model, let's digress for a moment and talk about what we
can use neural nets for in video games.
Applications to Games
Neural nets seem to be the answer that we are all looking for. If we could just give the characters in our games a little brain, imagine
how cool a game would be! Well, this is possible in a sense. Neural nets model the structure of neurons in a crude way, but not the
high-level functionality of reason and deduction, at least in the classical sense of the words. It takes a bit of thought to come up with
ways to apply neural net technology to game AI, but once you get the hang of it, you can use it in conjunction with deterministic
algorithms, fuzzy logic, and genetic algorithms to create very robust thinking models for your games, without a doubt better than
anything you can do with hundreds of if-then statements or scripted logic. Neural nets can be used for such things as:
Environmental Scanning and Classification - A neural net can be fed information that could be interpreted as vision or auditory
information. This information can then be used to select an output response or to teach the net. These responses can be learned in
real-time and updated to optimize the response.
Memory - A neural net can be used by game creatures as a form of memory. The neural net can learn through experience a set of
responses, then when a new experience occurs, the net can respond with something that is the best guess at what should be done.
Behavioral Control - The output of a neural net can be used to control the actions of a game creature. The inputs can be various
variables in the game engine. The net can then control the behavior of the creature.
Response Mapping - Neural nets are really good at "association", which is the mapping of one space to another. Association comes in
two flavors: autoassociation, which is the mapping of an input with itself, and heteroassociation, which is the mapping of an input with
something else. Response mapping uses a neural net at the back end, or output, to create another layer of indirection in the control or
behavior of an object. Basically, we might have a number of control variables, but we only have crisp responses for a certain number
of combinations that we can teach the net with. However, using a neural net on the output, we can obtain other responses that are
in the same ballpark as our well-defined ones.
The above examples may seem a little fuzzy, and they are. The point is that neural nets are tools that we can use in whatever way we
like. The key is to use them in cool ways that make our AI programming simpler and make game creatures respond more
intelligently.
Now that we have seen the wetware version of a neuron, let's take a look at the basic artificial neuron to base our discussions on.
Figure 2.0 is a graphic of a standard "neurode" or "artificial neuron". As you can see, it has a number of inputs labeled X1 - Xn and B.
These inputs each have an associated weight w1 - wn, and b attached to them. In addition, there is a summing junction Y and a single
output y. The output y of the neurode is based on a transfer or "activation" function which is a function of the net input to the
neurode. The inputs come from the Xi's and from B which is a bias node. Think of B as a "past history", "memory", or "inclination".
The basic operation of the neurode is as follows: the inputs Xi are each multiplied by their associated weights and summed. The
output of the summing is referred to as the input activation Ya. The activation is then fed to the activation function fa(x) and the final
output is y. The equations for this are:
Eq. 1.0
Ya = B*b + Σ(i=1..n) Xi*wi
and
y = fa(Ya)
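In code, Eq. 1.0 is just a weighted sum followed by the activation function. A minimal C++ sketch, using a step activation as one possible choice for fa:

#include <vector>

// Compute the output of a single neurode: y = fa(B*b + sum(Xi*wi)).
float neurode(const std::vector<float>& X, const std::vector<float>& w,
              float B, float b, float theta) {
    float Ya = B * b; // bias contribution
    for (size_t i = 0; i < X.size(); ++i)
        Ya += X[i] * w[i]; // weighted inputs
    return (Ya >= theta) ? 1.0f : 0.0f; // step activation with threshold theta
}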
Before we move on, we need to talk about the inputs Xi, the weights wi, and their respective domains. In most cases, inputs consist of
the positive and negative values in the interval (-∞, +∞). However, many neural nets use simpler bivalent values (meaning
that they have only two values). The reason for using such a simple input scheme is that ultimately all data, such as images, can be
encoded as binary or bipolar values.
The equations for each are fairly simple, but each is derived to model or fit various properties.
The step function is used in a number of neural nets and models a neuron firing when a critical input signal is reached. This is the
purpose of the factor θ: it models the critical input level, or threshold, at which the neurode should fire. The linear activation function is
used when we want the output of the neurode to more closely follow the input activation. This kind of activation function would be
used in modeling linear systems such as basic motion with constant velocity. Finally, the exponential activation function is used to
create a non-linear response, which is the only way to build neural nets that have non-linear responses and model non-linear
processes. The exponential activation function is key in advanced neural nets, since the composition of linear and step activation
functions is always linear or step; without it we could never create a net with a non-linear response, so we need the
exponential activation function to address the non-linear problems that we want to solve with neural nets. However, we are not
locked into using the exponential function: hyperbolic, logarithmic, and transcendental functions can be used as well, depending on
the desired properties of the net. Finally, we can scale and shift all of these functions if we need to.
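Here are the three families of activation functions just described, in a brief C++ sketch (the exponential form shown is the common logistic sigmoid; these are representative formulas, since exact ones are not given above):

#include <cmath>

// Step: fire only when the critical input level (threshold theta) is reached.
float stepActivation(float x, float theta) { return x >= theta ? 1.0f : 0.0f; }

// Linear: the output follows the input activation directly (optionally scaled).
float linearActivation(float x, float scale = 1.0f) { return scale * x; }

// Exponential (logistic sigmoid): a smooth non-linear response, needed for
// nets that must model non-linear processes.
float sigmoidActivation(float x) { return 1.0f / (1.0f + std::exp(-x)); }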
Figure 3.0 - A 4 Input, 3 Neurode, Single-Layer Neural Net.
As you can imagine, a single neurode isn't going to do a lot for us, so we need to take a group of them and create a layer of neurodes,
as shown in Figure 3.0. The figure illustrates a single-layer neural network. The neural net in Figure 3.0 has a number of inputs
and a number of output nodes. By convention this is a single-layer net, since the input layer is not counted unless it is the only layer in
the network. In this case, the input layer is also the output layer and hence there is one layer. Figure 4.0 shows a two-layer neural net.
Notice that the input layer is still not counted and the internal layer is referred to as "hidden". The output layer is referred to as the
output or response layer. Theoretically, there is no limit to the number of layers a neural net can have; however, it may be difficult to
derive the relationships between the various layers and come up with tractable training methods. The best way to create multilayer
neural nets is to make each network one or two layers and then connect them as components or functional blocks.
All right, now let's talk about temporal or time related topics. We all know that our brains are fairly slow compared to a digital
v7 = (1,1,1)
For example, if we let b2=1, b1=0, and b0=1 then we get the vector:
v = (1,0,0) * 1 + (0,1,0) * 0 + (0,0,1) * 1 = (1,0,0) + (0,0,0) + (0,0,1) = (1,0,1), which is v5 in our possible input set.
A basis is a special vector summation that describes a set of vectors in a space. So v describes all the vectors in our space. Now, to
make a long story short, the more orthogonal the vectors in the input set are, the better they will distribute in a neural net and the
better they can be recalled. Orthogonality refers to the independence of the vectors; in other words, if two vectors are orthogonal then
their dot product is 0, their projection onto one another is 0, and they can't be written in terms of one another. In the set v there are a
lot of orthogonal vectors, but they come in small groups; for example, v0 is orthogonal to all the vectors, so we can always include it.
But if we include v1 in our set S then the only other vectors that will fit and maintain orthogonality are v2 and v4, giving the set:
S = {v0, v1, v2, v4}
Why? Because vi • vj for all i,j from 0..3 is equal to 0. In other words, the dot product of all the pairs of vectors is 0, so they must all
be orthogonal. Therefore, this set will do very well in a neural net as input vectors. However, the set:
v6 = (1,1,0), v7 = (1,1,1)
will potentially do poorly as inputs since v6 • v7 is non-zero (in a binary system it is 1). The next question is, "can we measure this
orthogonality?" The answer is yes. In the binary vector system there is a measure called hamming distance, which is used to measure the
n-dimensional distance between binary bit vectors. It is simply the number of bits that differ between two vectors. For
example, the vectors:
v0 = (0,0,0), v1 = (0,0,1)
differ in only the last bit, so their hamming distance is 1.
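A quick sketch of the measure for bit vectors packed into unsigned integers (a representative implementation, not from the article):

// Hamming distance: the number of bit positions in which a and b differ.
int hammingDistance(unsigned a, unsigned b) {
    unsigned diff = a ^ b; // set bits mark differing positions
    int count = 0;
    while (diff) {
        count += diff & 1u;
        diff >>= 1;
    }
    return count;
}
// e.g. v0 = 000, v1 = 001: hammingDistance(0, 1) == 1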
As a concrete example, consider the truth table for logical AND:
Table 1.0 - Truth Table for Logical AND.
X1 X2 Output
0 0 0
0 1 0
1 0 0
1 1 1
We can model this with a two-input MP neural net with weights w1=1, w2=1, and θ = 2. This neural net is shown in Figure 6.0a. As
you can see, all input combinations work correctly. For example, if we try inputs X1=1, X2=0, then the activation will be:
X1*w1 + X2*w2 = (1)*(1) + (0)*(1) = 1.0
If we apply 1.0 to the activation function fmp(x) then the result is 0, which is correct. As another example, if we try inputs X1=1, X2=1,
then the activation will be:
X1*w1 + X2*w2 = (1)*(1) + (1)*(1) = 2.0
If we input 2.0 to the activation function fmp(x), then the result is 1.0, which is correct. The other cases will work also. The function of
the OR is similar, but the threshold θ is changed to 1.0 instead of the 2.0 used in the AND. You can try running through the truth table
yourself to see the results.
The XOR network is a little different: it really has 2 layers, in a sense, because the results of the pre-processing are further
processed in the output neuron. This is a good example of why a neural net needs more than one layer to solve certain problems. The
XOR is a common problem in neural nets that is used to test a neural net's performance. In any case, XOR is not linearly separable in
a single layer; it must be broken down into smaller problems whose results are then added together. Let's take a look at XOR as the final
example of MP neural networks. The truth table for XOR is as follows:
Table 2.0 - Truth Table for Logical XOR.
X1 X2 Output
0 0 0
0 1 1
1 0 1
1 1 0
XOR is only true when the inputs are different; this is a problem since inputs that lie diagonally opposite each other in input space map
to the same output. XOR is not linearly separable, as shown in Figure 7.0. As you can see, there is no way to separate the proper
responses with a single straight line. The point is that we can separate the proper responses with 2 lines, and that is just what 2 layers do.
The first layer pre-processes, or solves part of the problem, and the remaining layer finishes up. Referring to Figure 6.0c, we see that
the weights are w1=1, w2=-1, w3=1, w4=-1, w5=1, w6=1. The network works as follows: layer one computes, in parallel, whether X1
and X2 are opposites; the results for either case, (0,1) or (1,0), are fed to layer two, which sums them and fires if either is true. In
essence we have created the logic function:
z = ((X1 AND NOT X2) OR (NOT X1 AND X2))
If you would like to experiment with the basic McCulloch-Pitts neurode, Listing 1.0 is a complete 2-input, single-neurode simulator that you can play with.
Listing 1.0 - A McCulloch-Pitts Logic Neurode Simulator
// INCLUDES
/////////////////////////////////////////////////////
#include <stdio.h>
#include <ctype.h>
// MAIN
/////////////////////////////////////////////////////
int main(void)
{
float threshold, // this is the theta term used to threshold the summation
      w1,w2,     // these hold the weights
      x1,x2,     // inputs to the neurode
      y_in,      // summed input activation
      y_out;     // final output of neurode
printf("\n\nBeginning Simulation:");
while(1)
     {
     // get the simulation parameters from the user
     printf("\n\nEnter threshold: ");
     scanf("%f",&threshold);
     printf("Enter weights w1, w2: ");
     scanf("%f %f",&w1,&w2);
     printf("Enter inputs x1, x2: ");
     scanf("%f %f",&x1,&x2);
     printf("\n\nSimulation Parms: threshold=%f, W=(%f,%f)\n",threshold,w1,w2);
     // compute activation
     y_in = x1*w1 + x2*w2;
     // apply the MP activation function: fire only if the
     // activation reaches the threshold
     y_out = (y_in >= threshold) ? 1.0f : 0.0f;
     printf("Activation=%f, Output=%f\n",y_in,y_out);
     // try again
     printf("\nDo you wish to continue Y or N?");
     char ans[8];
     scanf("%s",ans);
     if (toupper(ans[0])!='Y')
         break;
     } // end while
printf("\n\nSimulation Complete.\n");
return 0;
} // end main
That finishes up our discussion of the basic building block invented by McCulloch and Pitts; now let's move on to more contemporary neural nets, such as those used to classify input vectors.
Figure 8.0 - The Basic Neural Net Model Used for Discussion.
Notice that the activation function is a step with bipolar outputs. Before we continue, let me place a seed in your mind: the bias and threshold end up doing the same thing. They give us another degree of freedom in our neurons that makes the neurons respond in ways that can't be achieved without them. You will see this shortly.
The single neurode net in Figure 8.0 is going to perform a classification for us. It is going to tell us if our input is in one class or
another. For example, is this image a tree or not a tree. Or in our case is this input (which just happens to be the logic for an AND) in
the +1 or -1 class? This is the basis of most neural nets and the reason I was belaboring linear separability. We need to come up with
a linear partitioning of space that maps our inputs and outputs so that there is a solid delineation of space that separates them. Thus,
we need to come up with the correct weights and a bias that will do this for us. But how do we do this? Do we just use trial and error
or is there a methodology? The answer is that there are a number of training methods to teach a neural net. These training methods
work on various mathematical premises and can be proven, but for now, we're just going to pull some values out of the hat that work.
These exercises will lead us into the learning algorithms and more complex nets that follow.
All right, we are trying to find weights wi and a bias b that give us the correct result when the various inputs are fed to our network with the given activation function fc(x). Let's write down the activation summation of our neurode and see if we can infer any
relationship between the weights and the inputs that might help us. Given the inputs X1 and X2 with weights w1 and w2 along with
B=1 and bias b, we have the following formula:
Eq. 7.0
X1*w1 + X2*w2 + B*b = θ
X2*w2 = θ - b - X1*w1 (substituting B = 1 and rearranging)
X2 = -X1*w1/w2 + (θ - b)/w2 (solving in terms of X2)
What is this entity? It's a line! And if the left hand side, (X1*w1 + X2*w2 + b), is greater than or equal to θ, then the neurode will fire and output 1; otherwise the neurode will output -1. So the line is a decision boundary. Figure 9.0a illustrates this. Referring
to the figure, you can see that the slope of the line is -w1/w2 and the X2 intercept is (θ -b)/w2. Now can you see why we can get rid of
θ? It is part of a constant and we can always scale b to take up any loss, so we will assume that θ = 0, and the resulting equation is:
X2 = -X1*w1/w2 - b/w2
What we want to find are weights w1 and w2 and a bias b such that the line separates our outputs, or classifies them into singular partitions without overlap. This is the key to linear separability. Figure 9.0b shows a number of decision boundaries that will suffice, so we can
pick any of them. Let's pick the simplest values which would be:
w1 = w2 = 1
b = -1
With these values our decision boundary becomes:
X2 = -X1*w1/w2 - b/w2 -> X2 = -1*X1 + 1
The slope is -1 and the X2 intercept is 1. If we plug the input vectors for the logical AND into this equation and use the fc(x) activation function then we will get the correct outputs. That is, if X2 + X1 - 1 > 0 then fire the neurode, else output -1. Let's try it with our AND inputs and see what we come up with:
Table 4.0 - Truth Table for Bipolar AND with Decision Boundary.
X1  X2  Activation (X1 + X2 - 1)      Result
-1  -1  (-1) + (-1) - 1 = -3 < 0      don't fire, output -1
-1   1  (-1) + (1) - 1 = -1 < 0       don't fire, output -1
 1  -1  (1) + (-1) - 1 = -1 < 0       don't fire, output -1
 1   1  (1) + (1) - 1 = 1 > 0         fire, output 1
As you can see, the neural network with the proper weights and bias solves the problem perfectly. Moreover, there is a whole family of weights that will do just as well (obtained by sliding the decision boundary in a direction perpendicular to itself). However, there is an important point here. Without the bias or threshold, only lines through the origin would be possible, since the X2 intercept would have to be 0. This is very important and is the basis for using a bias or threshold, so this example has proven to be an important one since it has flushed this fact out. So, are we closer to seeing how to algorithmically find weights? Yes, we now have a geometrical analogy
and this is the beginning of finding an algorithm.
● Input vectors are in bipolar form I = (-1, 1, ..., -1, 1) and contain k elements each.
● There are n input vectors and we will refer to the set as I and the jth element as Ij.
● Outputs will be referred to as yj and there are n of them, one for each input Ij.
● The weights w1-wk are contained in a single vector w = (w1, w2, ... wk).
Step 1. Initialize all your weights to 0, and let them be contained in a vector w that has k entries. Also initialize the bias b to 0.
Step 2. For j = 1 to n do
            w = w + Ij*yj (that is, each weight wi is increased by Iji*yj)
            b = b + yj
        end do
The algorithm is nothing more than an "accumulator" of sorts, shifting the decision boundary based on the changes in the input and output. The only problem is that it sometimes can't move the boundary fast enough (or at all) and "learning" doesn't take place.
So how do we use Hebbian learning? The answer is, the same as the previous network, except that now we have an algorithmic method to teach the net with; thus we refer to the net as a Hebb or Hebbian Net. As an example, let's take our trusty logical AND
function and see if the algorithm can find the proper weights and bias to solve the problem. The following summation is equivalent to
running the algorithm:
w = [I1*y1] + [I2*y2] + [I3*y3] + [I4*y4] = [(-1, -1)*(-1)] + [(-1, 1)*(-1)] + [( 1, -1)*(-1)] + [(1, 1)*(1)] = (2,2)
b = y1 + y2 + y3 + y4 = (-1) + (-1) + (-1) + (1) = -2
Therefore, w1=2, w2=2, and b=-2. These are simply scaled versions of the values w1=1, w2=1, b=-1 that we derived geometrically in
the previous section. Killer huh! With this simple learning algorithm we can train a neural net (consisting of a single neurode) to
respond to a set of inputs and classify the input as true or false, 1 or -1. Now if we were to array these neurodes together to create a network of neurodes, then instead of simply classifying the inputs as on or off, we could associate patterns with the inputs. This is one of the foundations for the next neural net structure: the Hopfield net. One more thing, the activation function used for
a Hebb Net is a step with a threshold of 0.0 and bipolar outputs 1 and -1.
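To make the training rule concrete, here is a minimal C++ sketch (mine, not from the article) that runs the Hebbian rule over the bipolar AND training pairs and prints the weights and bias it arrives at:

#include <stdio.h>

int main(void)
{
    // bipolar AND training set: 4 input pairs and their target outputs
    float I[4][2] = { {-1,-1}, {-1,1}, {1,-1}, {1,1} };
    float y[4]    = { -1, -1, -1, 1 };

    float w[2] = {0, 0}; // Step 1: weights start at 0
    float b    = 0;      // and so does the bias

    // Step 2: accumulate w = w + Ij*yj and b = b + yj over all pairs
    for (int j = 0; j < 4; j++)
    {
        w[0] += I[j][0] * y[j];
        w[1] += I[j][1] * y[j];
        b    += y[j];
    }

    printf("w1=%f, w2=%f, b=%f\n", w[0], w[1], b); // prints 2, 2, -2
    return 0;
}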
To get a feel for Hebbian learning and how to implement an actual Hebb Net, Listing 2.0 contains a complete Hebbian Neural Net
Simulator. You can create networks with up to 16 inputs and 16 neurodes (outputs). The program is self explanatory, but there are a
couple of interesting properties: you can select 1 of 3 activation functions, and you can input any kind of data you wish. Normally,
we would stick to the Step activation function and inputs/outputs would be binary or bipolar. However, in the light of discovery,
maybe you will find something interesting with these added degrees of freedom. However, I suggest that you begin with the step
function and all bipolar inputs and outputs.
Listing 2.0 - A Hebb Net Simulator (in neuralnet.zip).
John Hopfield is a physicist who likes to play with neural nets (which is good for us). He came up with a simple (in structure at least) but effective neural network called the Hopfield Net. It is used for autoassociation: you input a vector x and you get x back
(hopefully). A Hopfield net is shown in Figure 10.0. It is a single layer network with a number of neurodes equal to the number of
inputs Xi. The network is fully connected meaning that every neurode is connected to every other neurode and the inputs are also the
outputs. This should strike you as weird since there is feedback. Feedback is one of the key features of the Hopfield net and this
feedback is the basis for the convergence to the correct result.
The Hopfield network is an iterative autoassociative memory. This means that it may take one or more cycles to return the correct
result (if at all). Let me clarify; the Hopfield network takes an input and then feeds it back, the resulting output may or may not be the
desired input. This feedback cycle may occur a number of times before the input vector is returned. Hence, a Hopfield network
functional sequence is: first we determine the weights based on our input vectors that we want to autoassociate, then we input a
vector and see what comes out of the activations. If the result is the same as our original input then we are done, if not, then we take
the result vector and feed it back through the network. Now let's take a look at the weight matrix and learning algorithm used for
Hopfield nets.
The learning algorithm for Hopfield nets is based on the Hebbian rule and is simply a summation of products. However, since the
Hopfield network has a number of input neurons the weights are no longer a single array or vector, but a collection of vectors which
are most compactly contained in a single matrix. Thus the weight matrix W for a Hopfield net is created based on this equation:
Given:
● Input vectors are in bipolar form I = (-1, 1, ..., -1, 1) and contain k elements.
● There are n input vectors and we will refer to the set as I and the jth element as Ij.
● Outputs will be referred to as yj and there are n of them, one for each input Ij.
● The weight matrix W is square and has dimension k x k since there are k inputs.
Eq. 8.0
         n
W(kxk) = ∑ Iit x Ii
        i=1
note: each outer product Iit x Ii has dimension k x k, since we are multiplying a column vector by a row vector.
and, Wii = 0, for all i.
Notice that there are no bias terms and the main diagonal of W must be all zeros. The weight matrix is simply the sum of matrices
generated by multiplying the transpose Iit x Ii for all i from 1 to n. This is almost identical to the Hebbian algorithm for a single
neurode except that instead of multiplying the input by the output, the input is multiplied by itself, which is equivalent to the output
in the case of autoassociation. Finally, the activation function fh(x) is shown below:
Eq. 9.0
fh(x) = 1, if x ≥ 0
0, if x < 0
fh(x) is a step function with a binary output. This means that the inputs must be binary, but didn't we already say that inputs are bipolar?
Well, they are, and they aren't. When the weight matrix is generated we convert all input vectors to bipolar, but for normal operation
we use the binary version of the inputs and the output of the Hopfield net will also be binary. This convention is not necessary, but
makes the network discussion a little simpler. Anyway, let's move on to an example. Say we want to create a four node Hopfield net
and we want it to recall these vectors:
I1=(0,0,1,0), I2=(1,0,0,0), I3=(0,1,0,1) Note: they are all orthogonal.
Now we need to compute W1, W2, W3, where Wi is the outer product of the transpose of each input (in bipolar form) with itself:

W1 = I1t x I1 =
 1  1 -1  1
 1  1 -1  1
-1 -1  1 -1
 1  1 -1  1

W2 = I2t x I2 =
 1 -1 -1 -1
-1  1  1  1
-1  1  1  1
-1  1  1  1

W3 = I3t x I3 =
 1 -1  1 -1
-1  1 -1  1
 1 -1  1 -1
-1  1 -1  1
W(1+2+3) =
3 -1 -1 -1
-1 3 -1 3
-1 -1 3 -1
-1 3 -1 3
Zeroing out the main diagonal gives us the final weight matrix:
W=
0 -1 -1 -1
-1 0 -1 3
-1 -1 0 -1
-1 3 -1 0
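Before we exercise the net by hand, here is a minimal C++ sketch (mine, not from the article) that builds this exact W from the three bipolar exemplars, zeroes the diagonal, and recalls a binary input through the activation function fh:

#include <stdio.h>

#define K 4 // number of neurodes (and vector length)
#define N 3 // number of exemplars

int main(void)
{
    // bipolar versions of I1=(0,0,1,0), I2=(1,0,0,0), I3=(0,1,0,1)
    int I[N][K] = { {-1,-1,1,-1}, {1,-1,-1,-1}, {-1,1,-1,1} };
    int W[K][K] = {0};

    // W = sum of the outer products Iit x Ii, with the diagonal zeroed
    for (int n = 0; n < N; n++)
        for (int r = 0; r < K; r++)
            for (int c = 0; c < K; c++)
                if (r != c)
                    W[r][c] += I[n][r] * I[n][c];

    // recall the binary input (0,0,1,0): multiply by W, then apply fh
    int x[K] = {0,0,1,0}, y[K];
    for (int c = 0; c < K; c++)
    {
        int sum = 0;
        for (int r = 0; r < K; r++)
            sum += x[r] * W[r][c];
        y[c] = (sum >= 0) ? 1 : 0; // fh(x): step with binary output
    }

    printf("recalled: (%d,%d,%d,%d)\n", y[0], y[1], y[2], y[3]); // (0,0,1,0)
    return 0;
}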
That's it, now we are ready to rock. Let's input our original vectors and see the results. To do this we simply matrix multiply the input by the weight matrix and then process each output value with our activation function fh(x). For example, I1 x W = (0,0,1,0) x W = (-1,-1,0,-1), and applying fh to each component gives back (0,0,1,0) = I1. The inputs were perfectly recalled, and they should be since they were all orthogonal. As a final example, let's assume that our input (vision, auditory etc.) is a little noisy and the input has a single error in it. Let's take I3 = (0,1,0,1) and add some noise to I3, resulting in I3noise = (0,1,1,1). Now let's see what happens if we input this noisy vector to the Hopfield net: I3noise x W = (0,1,1,1) x W = (-3,2,-2,2), and applying fh gives (0,1,0,1) = I3.
Amazingly enough, the original vector is recalled. This is very cool. So we might have a memory that is filled with bit patterns that
look like trees, (oaks, weeping willow, spruce, redwood etc.) then if we input another tree that is similar to say a weeping willow, but
hasn't been entered into the net, our net will (hopefully) output a weeping willow indicating that this is what it "thinks" it looks like.
This is one of the strengths of associative memories, we don't have to teach it every possible input, but just enough to give it a good
idea. Then inputs that are "close" will usually converge to an actual trained input. This is the basis for image, and voice recognition
systems. Don't ask me where the heck the "tree" analogy came from. Anyway, to complete our study of neural nets, I have included a final Hopfield autoassociative simulator that allows you to create nets with up to 16 neurodes. It is similar to the Hebb Net simulator, but you must use a step activation function, and your input exemplars must be bipolar while training and binary while associating (running). Listing 3.0 contains the code for the simulator.
Listing 3.0 - A Hopfield Autoassociative Memory Simulator (in neuralnet.zip).
Brain Dead...
Well that's all we have time for. I was hoping to get to the Perceptron network, but oh well. I hope that you have an idea of what
neural nets are and how to create some working computer programs to model them. We covered basic terminology and concepts,
some mathematical foundations, and finished up with some of the more prevalent neural net structures. However, there is still so
much more to learn about neural nets. We need to cover Perceptrons, Fuzzy Associative Memories or FAMs, Bidirectional
Associative Memories or BAMs, Kohonen Maps, Adalines, Madalines, Backpropagation networks, Adaptive Resonance Theory
networks, "Brain State in a Box", and a lot more. Well that's it, my neural net wants to play N64!
Download: neuralnet.zip (Source & EXE)
Michael A Arbib
Abstract
See the abstract for Chapter B1.
There are many types of artificial neuron, but most of them can be captured as formal objects of the kind
shown in figure B1.1.1. There is a set X of signals which can be carried on the multiple input lines x1 ,
. . . , xn and single output line y. In addition, the neuron has an internal state s belonging to some state
set S.
Figure B1.1.1. A ‘generic’ neuron, with inputs x1 , . . . , xn , output y, and internal state s.
A neuron may be either discrete-time or continuous-time. In other words, the input values, state and
output may be given at discrete times t ∈ Z = {0, 1, 2, 3, . . .}, say, or may be given at all times t in some
interval contained in the real line R. A discrete-time neuron is then specified by two functions which
specify (i) how the new state is determined by the immediately preceding inputs and (in some neuron
models, but by no means all) the previous state, and (ii) how the current output is to be ‘read out’ from
the current state:
The next-state-function f : Xn × S → S, s(t) = f (x1 (t − 1), . . . , xn (t − 1), s(t − 1)); and
The output function g : S → Y, y(t) = g(s(t)).
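As a concrete illustration (a sketch of my own, not from the handbook), such a discrete-time neuron can be written as a small C++ class whose update method plays the role of f and whose read-out plays the role of g; the particular choices below (a weighted sum for f, a threshold for g) are merely placeholders:

#include <cstddef>
#include <utility>
#include <vector>

// A 'generic' discrete-time neuron: the state s is updated from the
// previous inputs by a next-state function f, and the output is read
// out of the state by a function g.
class DiscreteTimeNeuron {
public:
    explicit DiscreteTimeNeuron(std::vector<double> weights)
        : w(std::move(weights)), s(0.0) {}

    // f : X^n x S -> S (here: a weighted sum of the inputs, ignoring
    // the previous state, as many simple neuron models do)
    void update(const std::vector<double>& x) {
        double sum = 0.0;
        for (std::size_t i = 0; i < w.size(); ++i)
            sum += w[i] * x[i];
        s = sum;
    }

    // g : S -> Y (here: a simple threshold read-out)
    double output() const { return s >= 0.0 ? 1.0 : 0.0; }

private:
    std::vector<double> w; // one weight per input line
    double s;              // internal state
};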
As we shall see in later sections, popular choices take the signal-set X to be either a binary set—{0, 1}
is the ‘classical choice’, though physicists, inspired by the ‘spin-glass’ analogy, often use the spin-down,
spin-up set denoted by {−1, +1}—or an interval of the real line, such as [0, 1]; while the state-set is often
taken to be R itself. A continuous-time neuron is also specified by two functions f : Xn × S → S, and
g : S → Y, y(t) = g(s(t)), but now f serves to define the rate of change of the state, that is, it provides
the right-hand side of the differential equation which defines the state dynamics:
ds(t)/dt = f (x1(t), . . . , xn(t), s(t)) .
Clearly, S at least can no longer be a discrete set. A popular choice is to take the signal-set X to be
an interval of the real line, such as [0, 1], and the state-set to be R itself.
The focus of this chapter will be on motivating and defining some of the best known forms for f
and g. But first it is worth noting that the subject of neural computation is not interested in neurons as
Figure B1.1.2. A neural network viewed as a system (continuous-time case) or automaton (discrete-time
case). The input at time t is the pattern on the input lines, the output is the pattern on the output lines; and
the internal state is the vector of states of all neurons of the network.
ends in themselves but rather in neurons as units which can be composed into networks. Thus, both as
background for later chapters and as a framework for the focused discussion of individual neurons in this
chapter, we briefly introduce the idea of a neural network.
We first show how a neural network comprised of continuous-time neurons can also be seen as a
continuous-time system in this sense. As typified in figure B1.1.2, we characterize a neural network by
selecting N neurons and by taking the output line of each neuron, which may be split into several branches
carrying identical output signals, and either connecting each branch to a unique input line of another neuron
or feeding it outside the network to provide one of the NL network output lines. Then every input to a
given neuron must be connected either to an output of another neuron or to one of the (possibly split)
N1 input lines of the network. Then the input set X of the entire network is RN1 , the state set Q = RN ,
and the output set Y = RNL . If the ith output line comes from the j th neuron, then the output function
is determined by the fact that the ith component of the output at time t is the output gj (sj (t)) of the j th
neuron at time t. The state transition function for the neural network follows from the state transition
functions of each of the N neurons
dsj(t)/dt = fj (x1j(t), . . . , xnjj(t), sj(t))
as soon as we specify whether xij (t) is the output of the kth neuron or the value currently being applied
on the lth input line of the overall network.
Turning to the discrete-time case, we first note that, in computer science, an automaton is a discrete-
time system with discrete input, output and state spaces. Formally, we describe an automaton by the sets X,
Y and Q of inputs, outputs and states, respectively, together with the next-state function δ : Q × X → Q
and the output function β : Q → Y . If the automaton is in state q and receives input x at time t,
then its next state will be δ(q, x) and its next output will be β(q). It should be clear that a network
like that shown in figure B1.1.2, but now a discrete-time network made up solely from discrete-time
neurons, functions like a finite automaton, as each neuron changes state synchronously on each tick of
the time-scale t = 0, 1, 2, 3, . . . . Conversely, it can be shown (see e.g. Arbib 1987, Chapter 2; the result was essentially, though inscrutably, due to McCulloch and Pitts 1943) that any finite automaton
can be simulated by a suitable network of discrete-time neurons (even those of the ‘McCulloch–Pitts
type’ defined below). Although we can define a neural network for the very general notion of ‘neuron’
shown in figure B1.1.1, most artificial neurons are of the kind shown in figure B1.1.3 in which the input
lines are parametrized by real numbers. The parameter attached to an input line to neuron i that comes
from the output of neuron j is often denoted by wij , and is referred to by such terms as the strength or
synaptic weight for the connection from neuron j to neuron i. Much of the study of neural computation
is then devoted to finding settings for these weights which will get a given neural network to approximate
some desired behavior. The weights may either be set on the basis of some explicit design principles,
or ‘discovered’ through the use of learning rules whereby the weight settings are automatically adjusted
‘on the basis of experience’. But all this is meat for later chapters, and we now return to our focal aim:
introducing a number of the basic models of single neurons which ‘fill in the details’ in figure B1.1.3. As
described in Section A1.2, there are radically different types of neurons in the human brain, and further
variations in neuron types of other species.
Figure B1.1.3. A neuron in which each input xi passes through a ‘synaptic weight’ or ‘connection strength’
wi .
Figure B1.1.4. The ‘basic’ neuron. The soma and dendrites act as the input surface; the axon carries the
output signals. The tips of the branches of the axon form synapses upon other neurons or upon effectors.
The arrows indicate the direction of information flow from inputs to outputs.
In neural computation, the artificial neurons are designed as variations on the abstractions of brain
theory and implemented in software, VLSI, or other media. Figure B1.1.4 indicates the main features
needed to visualize biological neurons. We divide the neuron into three parts: the dendrites, the soma
(cell body) and a long fiber called the axon whose branches form the axonal arborization. The soma
and dendrites act as input surface for signals from other neurons and/or input devices (sensors). The
axon carries ‘spikes’ from the neuron to other neurons and/or effectors (motors, etc). Towards a first
approximation, we may think of a ‘spike’ as an all-or-none (binary) event; each neuron has a ‘refractory
period’ such that at most one spike can be triggered per refractory period. The locus of interaction between
an axon terminal and the cell upon which it impinges is called a synapse, and we say that the cell with
the terminal synapses upon the cell with which the connection is made.
References
Arbib M A 1987 Brains, Machines and Mathematics 2nd edn (Berlin: Springer)
McCulloch W S and Pitts W H 1943 A logical calculus of the ideas immanent in nervous activity Bull. Math. Biophys.
5 115–33
Justin Heyes-Jones personal web pages - A* Tutorial
A* algorithm tutorial
Introduction
This document contains a description of the AI algorithm known as A*. The downloads
section also has full source code for an easy to use extendable implementation of the
algorithm, and two example problems.
Previously I felt that it would be wrong of me to provide source code, because I wanted to focus on teaching the reader how to implement the algorithm rather than just supplying a ready made package. I have now changed my mind, as I get many emails from people struggling to get something working. The example code is written in Standard C++ and uses STL, and does not do anything machine or operating system specific, so hopefully it will be quite useful to a wide audience.
State space search
A* is a type of search algorithm. Some problems can be solved by representing the world in the initial state, and then, for each action we can perform, generating the state the world would be in if we did so. If you do this until the world is in the state that we specified as a solution, then the route from the start to this goal state is the solution to your problem.
In this tutorial I will look at the use of state space search to find the shortest path between
two points (pathfinding), and also to solve a simple sliding tile puzzle (the 8-puzzle). Let's
look at some of the terms used in Artificial Intelligence when describing this state space
search.
Some terminology
A node is a state that the problem's world can be in. In pathfinding a node would be just a
2d coordinate of where we are at the present time. In the 8-puzzle it is the positions of all
the tiles.
Next all the nodes are arranged in a graph where links between nodes represent valid steps
in solving the problem. These links are known as edges. In the 8-puzzle diagram the edges
are shown as blue lines. See figure 1 below.
State space search, then, is solving a problem by beginning with the start state, and then
for each node we expand all the nodes beneath it in the graph by applying all the possible
moves that can be made at each point.
At this point we introduce an important concept, the heuristic. This is like an algorithm, but with a key difference. An algorithm is a set of steps which you can follow to solve a problem, which always works for valid input. For example you could probably write an algorithm yourself for multiplying two numbers together on paper. A heuristic is not guaranteed to work, but is useful in that it may solve a problem for which there is no known algorithm.
Cost
When looking at each node in the graph, we now have an idea of a heuristic, which can estimate how close the state is to the goal. Another important consideration is the cost of getting to where we are. In the case of pathfinding we often assign a movement cost to each square. If the cost is the same everywhere, then the cost of each square is one. If we wanted to differentiate between terrain types we may give higher costs to grass and mud than to newly made road. When looking at a node we want to add up the cost of what it took to get here, and this is simply the sum of the cost of this node and all those that are above it in the graph.
8 Puzzle
Let's look at the 8 puzzle in more detail. This is a simple sliding tile puzzle on a 3*3 grid
where one tile is missing and you can move the other tiles into the gap until you get the
puzzle into the goal position. See figure 1.
There are 362,880 different states that the puzzle can be in, and to find a solution the
search has to find a route through them. From most positions of the search the number of
edges (that's the blue lines) is two. That means that the number of nodes you have in each
level of the search is 2^d where d is the depth. If the number of steps to solve a particular
state is 18, then that’s 262,144 nodes just at that level.
The 8 puzzle game state is as simple as representing a list of the 9 squares and what's in
them. Here are two states for example; the last one is the GOAL state, at which point
we've found the solution. The first is a jumbled up example that you may start from.
Pathfinding
In a video game, or some other pathfinding scenario, you want to search a state space and
find out how to get from somewhere you are to somewhere you want to be, without
bumping into walls or going too far. For reasons we will see later, the A* algorithm will
not only find a path, if there is one, but it will find the shortest path. A state in pathfinding
is simply a position in the world. In the example of a maze game like Pacman you can
represent where everything is using a simple 2d grid. The start state for a ghost say, would
be the 2d coordinate of where the ghost is at the start of the search. The goal state would
be where pacman is so we can go and eat him. There is also example code to do
pathfinding on the downloads page.
Implementing A*
We are now ready to look at the operation of the A* algorithm. What we need to do is start with the start state and then generate the graph downwards from there. Let's take the 8-puzzle in figure 1. We ask how many moves can we make from the start state? The
answer is 2, there are two directions we can move the blank tile, and so our graph expands.
If we were just to continue blindly generating successors to each node, we could
potentially fill the computer's memory before we found the goal node. Obviously we need
to remember the best nodes and search those first. We also need to remember the nodes
that we have expanded already, so that we don't expand the same state repeatedly.
Let's start with the OPEN list. This is where we will remember which nodes we haven't yet
expanded. When the algorithm begins the start state is placed on the open list, it is the only
state we know about and we have not expanded it. So we will expand the nodes from the
start and put those on the OPEN list too. Now we are done with the start node and we will
put that on the CLOSED list. The CLOSED list is a list of nodes that we have expanded.
f=g+h
Using the OPEN and CLOSED list lets us be more selective about what we look at next in the search. We want to look at the best nodes first. We will give each node a score on how good we think it is. This score should be thought of as the cost of getting to the node from the start (traditionally called g) plus the estimated cost of getting from the node to the goal (the heuristic, called h). The total score is called f, so that f = g + h.
So far we have looked at the components of A*; let's see how they all fit together to make the algorithm:
Pseudocode
Hopefully the ideas we looked at in the preceding paragraphs will now click into place as
we look at the A* algorithm pseudocode. You may find it helpful to print this out or leave
the window open while we discuss it.
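The pseudocode itself did not survive in this copy of the page, so here is a reconstruction of the standard A* loop in plain steps, using the OPEN and CLOSED lists and the f, g and h scores defined above:

    add the start node to OPEN
    while OPEN is not empty
        take from OPEN the node n with the lowest f score
        if n is the goal, trace back through the parent pointers to build the path, and stop
        for each successor n' of n
            g' = g(n) + cost of the move from n to n'
            if n' is already on OPEN or CLOSED with a g score no worse than g', skip it
            set parent(n') = n, g(n') = g', h(n') = heuristic estimate for n', f(n') = g' + h(n')
            remove n' from CLOSED if it is there, and add it to OPEN
        add n to CLOSED
    report failure (no solution exists)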
To help make the operation of the algorithm clear we will look again at the 8-puzzle
problem in figure 1 above. Figure 3 below shows the f,g and h scores for each of the tiles.
First of all look at the g score for each node. This is the cost of what it took to get from the
start to that node. So in the picture the center number is g. As you can see it increases by
one at each level. In some problems the cost may vary for different state changes. For
example in pathfinding there is sometimes a type of terrain that costs more than other
types.
Next look at the last number in each triple. This is h, the heuristic score. As I mentioned above I am using a heuristic known as Nilsson's Sequence, which converges quickly to a correct solution in many cases. Here is how you calculate this score for a given 8-puzzle state: sum the Manhattan distances of every tile from its goal position to get P(n); then compute a sequence score S(n) by checking around the non-central squares in turn, allotting 2 for every tile not followed by its proper successor and 1 if there is a tile in the centre; the heuristic is then h = P(n) + 3*S(n).
Looking at the picture you should satisfy yourself that the h scores are correct according to
this algorithm.
Finally look at the digit on the left, the f score. This is the sum of g and h, and by tracking the node with the lowest f at each step you can follow the route the search takes towards the goal.
Let me now look at the example source code provided with the tutorial, for although the algorithm at this stage may be clear in your mind, the implementation is a little complicated. The language of choice for this kind of algorithm is really Lisp or Prolog, and most universities use these when teaching. This effectively lets students focus on the algorithm rather than the implementation details such as memory and data structures. For our purposes however, I will refer to my example source code. This is in C++ and uses standard library and STL data structures.
If you intend on compiling and running the example code then you can get it on the
downloads page. I have not put any project, workspace or makefiles in the archive, but
compilation and linking should be straight forward; the programs run from a command
line. As we will see the A* algorithm is in a header file, since it is implemented as a
template class, so to compile you need only compile one of the example files 8puzzle.cpp
or findpath.cpp.
There are comments throughout the source, and I hope it is clear and readable. What
follows then is a very brief summary for how it works, and the basic design ideas.
The main class is called AStarSearch, and is a template class. I chose to use templates
because this enables the user to specialise the AStarSearch class to their user state in an
efficient way. Originally I used inheritance from a virtual base class, but that led to the use of type casts in many places to convert from the base Node to the user's node. Also, templates are resolved at compile time rather than runtime, and this makes them more efficient and means they require less memory.
You pass in a type which represents the state part of the problem. That type must contain
the data you need to represent each state, and also several member functions which get
called during the search. These are described below :
The idea is that you should easily be able to implement different problems. All you need
do is create a class to represent a state in your problem, and then fill out the functions
above.
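As a concrete illustration, a user state class looks roughly like the sketch below. The member function names follow the pattern I recall from the accompanying stlastar.h header, but treat them as assumptions and check the header itself for the exact signatures:

// Hypothetical user state for a grid pathfinding problem. The member
// functions are the hooks that AStarSearch calls during the search.
class UserState
{
public:
    int x, y; // the data needed to represent one state

    // estimated cost from this node to the goal (the h score)
    float GoalDistanceEstimate(UserState &nodeGoal);

    // is this node the goal?
    bool IsGoal(UserState &nodeGoal);

    // add each valid successor of this node to the search
    bool GetSuccessors(AStarSearch<UserState> *astarsearch, UserState *parent_node);

    // cost of moving from this node to the given successor (the g increment)
    float GetCost(UserState &successor);

    // do two nodes represent the same state?
    bool IsSameState(UserState &rhs);
};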
Once you have done that you create a search class instance like this :
AStarSearch<UserState> astarsearch;
Then create the start and goal states and pass them to the algorithm via SetStartAndGoalStates to initialize the search. Each step of the search is then advanced with:
SearchState = astarsearch.SearchStep();
which returns a status that lets you know whether the search succeeded, failed, or is still going.
Once your search succeeds you need to be able to display it to the user, or use it in your
program. To facilitate this I have added functions to allow movement through the solution.
UserState *GetSolutionStart();
UserState *GetSolutionNext();
UserState *GetSolutionEnd();
UserState *GetSolutionPrev();
You use these to move an internal iterator through the solution. The most typical use would be to call GetSolutionStart (which returns the start state) and then iterate through each node using GetSolutionNext. For debugging purposes, or for some problems, you may need to iterate through the solution backwards, and the last two functions allow that.
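A typical traversal then looks something like this sketch, where DisplayState is a hypothetical stand-in for whatever output your program does:

// walk the solution from the start state to the goal
UserState *node = astarsearch.GetSolutionStart();
while (node)
{
    DisplayState(node);                   // hypothetical: show this step
    node = astarsearch.GetSolutionNext(); // assumed to return NULL past the goal
}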
Let's say you decide to display the OPEN and CLOSED lists at each step of the solution.
This is a common debug feature whilst getting the algorithm working. Further, for the
student it is often easier to see what is going on this way. Using the following calls you
can display the lists during the search process...
As you can see, these calls take references to float values for f, g and h, so if your debugging or learning needs involve looking at these then you can pass floats in to store the results. If you don't care, these are optional arguments.
Examples of how you use these features are present in both the findpath.cpp and
8puzzle.cpp example files.
I hope that at this point you will understand the key concepts you need, and by reading and
experimenting with the example code (stepping through it with a debugger is very
instructive) you hopefully will fully grasp the A* Algorithm. To complete this
introduction I will briefly cover Admissibility and Optimization issues.
Admissibility
Any graph search algorithm is said to be admissible if it always returns an optimal solution, that is, the one with the lowest cost, if a solution exists at all.
However, A* is only admissible if the heuristic you use, h', never over-estimates the distance to the goal. In other words, if you knew a heuristic h which always gave the exact distance to the goal, then to be admissible h' must be less than or equal to h.
For this reason when choosing a heuristic you should always try to ensure that it does not over-estimate the distance to the goal. In practice this may be impossible. Look at the
8-puzzle for example; in our heuristic above it is possible that we may get an estimated
cost to goal that is higher than is really neccessary. But it does help you to be aware of this
theory. If you set the heuristic to return zero, you will never over-estimate the distance to
goal, but what you will get is a simple search of every node generated at each step
(breadth-first search).
One final note about admissibility; there is a corollary to this theory called the Graceful Decay of Admissibility, which states that if your heuristic rarely over-estimates the real distance to goal by more than a certain value (let's call it E), then the algorithm will rarely return a solution whose cost is more than E greater than that of the optimal solution.
Optimization
A good source of optimizations for A* can be found in Steve Rabin's chapters in Game
Gems, which is on the books page. The forthcoming book AI Wisdom by the same
publisher is going to have several chapters on optimization of A*. These of course focus
on pathfinding, which is the ubiquitous use of A* in games.
Optimizing pathfinding is a whole subject in itself and I only want to target the A*
algorithm for general use, but there are some obvious optimizations you will want to make
for most problems. After testing my example code with VTune I found the two main
bottlenecks were searching the OPEN and CLOSED lists for a new node, and managing
new nodes. A simple but very effective optimization was to write a simpler memory
allocator than the C++ std new uses. I have provided the code for this class and you can
enable it in stlastar.h. I may write a tutorial on it in the future if there is sufficient interest.
Since you always want to get the node with the lowest 'f' score off the OPEN list each search loop, you can use a data structure called a 'priority queue'. This enables you to organise your data in a way in which the best (or worst, depending on how you set it up) item can always be removed efficiently. Steve Rabin's chapter in the book above shows how to use an STL vector along with heap operations to get this behaviour. My source code uses this technique.
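For reference, here is a minimal sketch of that vector-plus-heap technique using the standard library; the Node type and its f member are placeholders for whatever your search node actually holds:

#include <algorithm>
#include <vector>

struct Node { float f; /* g, h, state, parent, ... */ };

// order the heap so that the node with the LOWEST f is on top
struct HeapCompare
{
    bool operator()(const Node &a, const Node &b) const { return a.f > b.f; }
};

void Example(std::vector<Node> &open, Node n)
{
    // push a new node onto the OPEN heap
    open.push_back(n);
    std::push_heap(open.begin(), open.end(), HeapCompare());

    // pop the best (lowest f) node off the OPEN heap
    std::pop_heap(open.begin(), open.end(), HeapCompare());
    Node best = open.back();
    open.pop_back();
    (void)best; // the search would expand this node next
}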
If you are interested in priority queues, follow the link above to my old A* tutorial, as I implemented one from scratch in C, and the source code has been used in public projects such as FreeCell Solver.
Another optimization is that instead of searching the lists you should use a hash table. This
will prevent you having to do a linear search. A third optimization is that you never need
to backtrack in a graph search. If you look at pathfinding for example you will never be
nearer to the goal if you step back to where you came from. So when you write your code
to generate the successors of a node, you can check the generated ones and eliminate any states that are the same as the parent. Although this makes no difference to the result of the algorithm, it does make the search quicker.
The key to optimization is not to do it until you have your code working and you know
that your problem is correctly represented. Only then can you start to optimize the data
structures to work better for your own problem. Using VTune or True Time, or whatever
profiler you have available is the next step. In some problems checking to see if something
is the goal or not may be costly, whilst in others generating the successor nodes at each
step may be a significant bottleneck. Profiling takes the guesswork out of finding where
the bottleneck is, so that you can target the key problems in your application.
● The original tutorial was inspired by Brian Stout's pathfinding tutorial in Game
Developer:
http://www.gamasutra.com/features/19990212/sm_01.htm
● I also recommend Stephen Woodcock's website:
http://www.gameai.com
● Books See the books page; the AI section contains two good text books which include
good sections on the A* algorithm.
If this list is out of date or you would like to add an implementation link, I would be grateful for your email.
● Alpha-Beta search
● Aspiration search
● Transposition table
● Iterative Deepening
● Principal Variation Search
● Memory Enhanced Test
● Enhanced Transposition Cutoff
● Killer heuristic
● History heuristic
● Null move heuristic
● Quiescence search
● Selective extensions
The various search algorithms are illustrated in a compact pseudo-C. The variables and functions used
have the following meaning:
pos A position in a chess game.
depth The number of levels in the tree to be searched.
Evaluate A function that determines a value for a position as seen for the side to move. In practice
such a function will be composed of the difference in material values and a large number of
positional terms. Results lie between -INFINITY and +INFINITY.
best The best value seen while searching the next level in the tree.
Successors A function that determines the set of all positions that can be reached from a position in one
move (move generation).
succ The set of positions reachable from the input position by doing one move.
Alpha-Beta search
Alpha-Beta search is the first major refinement for reducing the number of positions that have to be searched, thus making greater depths possible in the same amount of time. The idea is that in large parts of the tree we are not interested in the exact value of a position, but only in whether it is better or worse than what we have found before. Only the value of the position along the principal variation has to be determined exactly (the principal variation is the alternation of best own moves and best opponent moves from the root to the depth of the tree).
The AlphaBeta search procedure gets two additional arguments which indicate the bounds between
which we are interested in exact values for a position:
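The procedure itself is missing from this copy of the page, so the following negamax-style sketch, in the same compact pseudo-C and using the pos, depth, Evaluate, Successors and succ conventions defined above, shows the usual shape:

int AlphaBeta(pos, depth, alpha, beta)
{
    if (depth == 0) return Evaluate(pos);
    succ = Successors(pos);
    while (!Empty(succ))
    {
        pos1 = RemoveOne(succ);
        value = -AlphaBeta(pos1, depth-1, -beta, -alpha);
        if (value >= beta) return value;  /* cutoff: the opponent will avoid this line */
        if (value > alpha) alpha = value; /* new best value inside the window */
    }
    return alpha;
}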
Aspiration search
Aspiration search is a small improvement on Alpha-Beta search. Normally the top level call would be
AlphaBeta(pos, depth, -INFINITY, +INFINITY). Aspiration search changes this to
AlphaBeta(pos, depth, value-window, value+window), where value is an estimate
for the expected result and window is a measure for the deviations we expect from this value.
Aspiration search will search fewer positions because it uses alpha/beta limits already at the root of the tree. The danger is that the search result will fall outside the aspiration window, in which case a re-search has to be done. A good choice of the window variable will still give an average net gain.
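In the same pseudo-C, the top-level call with its re-search then looks roughly like this (value and window as described above):

best = AlphaBeta(pos, depth, value-window, value+window);
if (best >= value+window)        /* fail high: the true value lies above the window */
    best = AlphaBeta(pos, depth, best, +INFINITY);
else if (best <= value-window)   /* fail low: the true value lies below the window */
    best = AlphaBeta(pos, depth, -INFINITY, best);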
Transposition table
The transposition table is a hashing scheme to detect positions in different branches of the search tree
that are identical. If a search arrives at a position that has been reached before and if the value obtained
can be used, the position does not have to be searched again. If the value cannot be used, it is still
possible to use the best move that was used previously at that position to improve the move ordering.
A transposition table is a safe optimization that can save much time. The only danger is that mistakes can
be made with respect to draw by repetition of moves because two positions will not share the same move
history.
A transposition table can save up to a factor 4 on tree size and thus on search time. Because of the
exponential nature of tree growth, this means that maybe one level deeper can be searched in the same
amount of time.
Iterative Deepening
Iterative deepening means repeatedly calling a fixed depth search routine with increasing depth until a
time limit is exceeded or maximum search depth has been reached. The advantage of doing this is that
you do not have to choose a search depth in advance; you can always use the result of the last completed
search. Also because many position evaluations and best moves are stored in the transposition table, the
deeper search trees can have a much better move ordering than when starting immediately searching at a
deep level. Also the values returned from each search can be used to adjust the aspiration search window
of the next search, if this technique is used.
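A sketch of the driving loop, again in pseudo-C (MAX_DEPTH and TimeLeft are assumed helpers):

depth = 1;
while (depth <= MAX_DEPTH && TimeLeft())
{
    value = AlphaBeta(pos, depth, -INFINITY, +INFINITY);
    /* remember the best move from this completed search; the transposition
       table filled during this pass improves move ordering at depth+1 */
    depth = depth + 1;
}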
Principal Variation Search
Principal Variation Search assumes that the first move examined at each node is the best one. The first successor is searched with a full window; the remaining successors are searched with a minimal window and re-searched only when that assumption fails:

int PrincipalVariation(pos, depth, alpha, beta)
{
    if (depth == 0) return Evaluate(pos);
    succ = Successors(pos);
    pos = RemoveOne(succ);
    best = -PrincipalVariation(pos, depth-1, -beta, -alpha);
    while (!Empty(succ) && best < beta)
    {
        pos = RemoveOne(succ);
        if (best > alpha) alpha = best;
        value = -PrincipalVariation(pos, depth-1, -alpha-1, -alpha);
        if (value > alpha && value < beta)
            best = -PrincipalVariation(pos, depth-1, -beta, -value);
        else if (value > best)
            best = value;
    }
    return best;
}
A further refinement of this is known as NegaScout. See Alexander Reinefeld's on-line description.
Killer heuristic
The killer heuristic is used to improve the move ordering. The idea is that a good move in one branch of
the tree is also good at another branch at the same depth. For this purpose at each ply we maintain one or
two killer moves that are searched before other moves are searched. A successful cutoff by a non-killer
move overwrites one of the killer moves for that ply.
History heuristic
The history heuristic is another improvement method for the move ordering. In a table indexed by from
and to square statistics are maintained of good cutoff moves. This table is used in the move ordering sort
(together with other information such as capture gains/losses).
Quiescence search
Instead of calling Evaluate when depth=0, it is customary to call a quiescence search routine. Its purpose
is to prevent horizon effects, where a bad move hides an even worse threat because the threat is pushed
beyond the search horizon. This is done by making sure that evaluations are done at stable positions, i.e.
positions where there are no direct threats (e.g. hanging pieces, checkmate, promotion). A quiescence
search does not take all possible moves into account, but restricts itself e.g. to captures, checks, check
evasions, and promotion threats. The art is to restrict the quiescence search in such a way that it does not
add too much to the search time. Major debates are possible about whether it is better to have one more
level in the full width search tree at the risk of overlooking deeper threats in the quiescence search.
Selective extensions
In the full width part of the tree, search depth can be increased in forced variations. Different criteria can
be used to decide if a variation is forced; examples are check evasions, capturing a piece that has just
captured another piece, promotion threats. The danger if used carelessly is an explosion in tree size.
A special case of selective extensions is the singular extension heuristic introduced in the Deep Thought
chess program. The idea here is to detect forced variations by one successor position being significantly
better than the others. Implementation is tricky because in alpha-beta search exact evaluations are often
not available.
It is said that the commercial chess programs use fractional depth increments to distinguish the quality of different moves; moves with a high probability of being good get a lower depth increment than moves that seem bad. I have no direct references to this; the commercial chess programmers do not publish their techniques.
Coordinated Unit Movement
By Dave C. Pottinger
Gamasutra, January 22, 1999. Originally published in Game Developer Magazine, January 1999.

Introduction

How many times have you been sitting in rush-hour traffic thinking, "Hey, I know where I want to go. And I'm sure everyone around me knows where they want to go, too. If we could just work together, I'll bet we would all get where we wanted to go a lot easier, faster, and without rear-ending each other"? As your frustration rises, you realize that impatient commuters aren't the most cooperative people. However, if you're a game player, uncooperative resource gatherers and infantry are probably even more frustrating than a real-life traffic jam. Figuring out how to get hundreds of units moving around a complex game map in real time - commonly referred to as pathfinding - is a tough task. While pathfinding is a hot industry buzzword, it's only half of the solution. Movement, the execution of a given path, is the other half of the solution. For real-time strategy games, this movement goes hand in hand with pathfinding. An axeman certainly needs a plan (as in, a path) for how he's going to get from one side of his town to the other to help stave off the enemy invasion. If he doesn't execute that plan using a good movement system, however, all may be lost.

Game Developer has already visited the topic of pathfinding in such past articles as "Smart Move: Path-Finding" by Brian Stout (October/November 1996) and "Real-Time Pathfinding for Multiple Objects" by Swen Vincke (June 1997). Rather than go over the same material, I'll approach the problem from the other side by examining the ways to execute a path that's already been found. In this article, I'll cover the basic components of an effective movement system. In a companion article in next month's Game Developer, I'll extend these basic concepts to cover higher-order movement and implementation. Though the examples in these articles focus mainly on a real-time strategy game, the methods I'll describe can easily be applied to other genres.
Movement Issues Facing Game Developers

Before we dive into coordinated unit movement, let's take a look at some of the movement issues facing game developers today. Most of these have to do with minimizing CPU load versus maximizing the accuracy and intelligence of the movement.

Moving one unit versus moving multiple units. Moving one unit is generally pretty simple, but methods that work well for one unit rarely scale up effortlessly for application to hundreds of units. If you're designing a system for hundreds of units, it will need to be very conservative in its CPU use.

Some movement features are CPU intensive. Very few games that move hundreds of units support advanced behavior such as modeling the acceleration and deceleration of these units. The movement of large ships and heavily armored units has a lot more realism with acceleration and deceleration, but that realism comes at a high cost in terms of extra CPU usage. The actual movement calculation becomes more complicated because you have to apply the time differential to the acceleration to create the new velocity. As we extend our movement system to handle prediction, we'll see that acceleration and deceleration complicate these calculations as well. Modeling a turn radius is also difficult because many pathfinding algorithms are not able to take turn radii into account at all. Thus, even though a unit can find a path, it may not be able to follow that path because of turn radius restrictions. Most systems overcome this deficiency by slowing the unit down to make a sharp turn, but this involves an extra set of calculations.
Different lengths for the main game update loop. Most games use the length of the last pass through the update loop as some indication of how much time to simulate during the next update pass. But such a
solution creates a problem for unit movement systems because these lengths vary from one update to
the next (see Figure 1 below). Unit movement algorithms work much better with nice, consistent
simulation intervals. A good update smoothing system can alleviate this problem quite a bit.
Sorting out unit collisions. Once units come into contact with one another, how do you get them apart
again? The naïve solution is just never to allow units to collide in the first place. In practice, though,
this requirement enforces exacting code that is difficult to write. No matter how much code you write,
your units will always find a way to overlap. More importantly, this solution simply isn't practical for
good game play; in many cases, units should be allowed to overlap a little. Hand-to-hand combat in
Ensemble Studios' recent title Age of Empires should have been just such a case. The restriction for
zero collision overlap often makes units walk well out of their way to fight other units, exposing them
to needless (not to mention frustrating) additional damage. You'll have to decide how much collision
overlap is acceptable for your game and resolve accordingly.
Map complexity. The more complex the map is, the more complicated and difficult good movement will
be to create. As game worlds and maps are only getting more intricate and realistic, the requirement
for movement that can handle those worlds goes up, too.
Random maps or controlled scenarios? Because you can't hard-code feasible paths, random maps are
obviously more difficult to deal with in many cases, including pathfinding. When pathfinding becomes
too CPU intensive, the only choice (aside from reducing map complexity or removing random maps) is
to decrease the quality of the pathfinding. As the quality of the pathfinding decreases, the quality of
the movement system needs to increase to pick up the slack.
Maximum object density. This issue, more than anything, dictates how accurate the movement system
must be. If your game has only a handful of moving objects that never really come into contact with
one another (as is the case with most any first-person shooter), then you can get away with a
relatively simple movement system. However, if you have hundreds of moving objects that need to
have collision and movement resolution on the scale of the smallest object (for example, a unit can
walk through a small gap between two other units), then the quality and accuracy requirements of
your movement system are dramatically raised.
Simple Movement Algorithm

Let's start with some pseudo code for a simple, state-based movement algorithm (Listing 1). While this algorithm doesn't do much more than follow a path and decide to find a new path when a collision is found, it does work equally well for both 2D and 3D games. We'll start in a given state and iterate until we can find a waypoint to move towards. Once we find that point, we break out of the loop and do the movement. There are three states: WaitingForPath, ReachedGoal, and IncrementWaypoint. The movement state for a unit is preserved across game updates in order to allow us to set future events, such as the "automatic" waypoint increment on a future game update. By preserving a unit's movement state, we lessen the chance that a unit will make a decision on the next game update that counters a decision made during the current update. This is the first of several planning steps that we'll introduce; a rough sketch of the loop follows below.

We assume that we'll be given a path to follow and that the path is accurate and viable (meaning, no collisions) at the time it was given to us. Because most strategy games have relatively large maps, a unit may take several minutes to get all the way across the map. During this time, the map can change in ways that can invalidate the path. So, we do a simple collision check during the state loop. At this point, if we find a collision, we'll just repath. Later on, we'll cover several ways to avoid repathing.
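Since Listing 1 itself is not reproduced here, the following C++ sketch shows one plausible shape for such a state-based update. The state names match the article (plus an extra MovingToWaypoint state purely to make the flow explicit); the Unit type and its members are assumptions:

enum MoveState { WaitingForPath, ReachedGoal, IncrementWaypoint, MovingToWaypoint };

// Minimal stand-in for the game's unit class; every member is assumed.
struct Unit
{
    MoveState moveState = WaitingForPath;
    bool RequestPath()             { return true;  } // stub: ask the pathfinder
    bool NextWaypoint()            { return true;  } // stub: advance along the path
    bool CollidesAlongMove(float)  { return false; } // stub: collision check
    void MoveTowardWaypoint(float) { }               // stub: do the actual move
};

void UpdateMovement(Unit &unit, float dt)
{
    // iterate until we have a waypoint to move towards
    bool haveWaypoint = false;
    while (!haveWaypoint)
    {
        switch (unit.moveState)
        {
        case WaitingForPath:
            if (!unit.RequestPath()) return;  // no path yet; try again next update
            unit.moveState = IncrementWaypoint;
            break;
        case IncrementWaypoint:
            unit.moveState = unit.NextWaypoint() ? MovingToWaypoint : ReachedGoal;
            break;
        case MovingToWaypoint:
            haveWaypoint = true;              // break out of the loop and move
            break;
        case ReachedGoal:
            return;                           // nothing left to do this update
        }
    }

    // simple collision check; if the path has been invalidated, just repath
    if (unit.CollidesAlongMove(dt))
        unit.moveState = WaitingForPath;
    else
        unit.MoveTowardWaypoint(dt);
}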
Collision Determination

The basic goal of any collision determination system is to find out if two units have collided. For the time being, we'll represent all collisions as two-entity collisions. We'll cover compound collisions (collisions involving three or more entities) next month. Once a collision is found, each entity needs to know about the collision in order to make appropriate movement decisions.

Basic collision determination for most strategy games consists of treating all units as spheres (circles in 2D) and doing a simple spherical collision check. Whether or not such a system is sufficient depends on the specific requirements of a game. Even if a game implements more complex collision - such as oriented bounding boxes or even low-level polygon to polygon intersection tests - maintaining a total bounding sphere for quick potential collision elimination will usually improve performance.

There are three distinct entity types to take into account when designing a collision system: the single unit, a group of units, and a formation (see Figure 2 below). Each of these types can work well using a single sphere for quick collision culling (elimination of further collision checks). In fact, the single unit simply uses a sphere for all of its collision checking. The group and the formation require a bit more work, though.
For a group of units, the acceptable minimum is to check each unit in the group for a collision. By
itself, this method will allow a non-grouped unit to sit happily in the middle of your group. For our
purposes, we can overlook this discrepancy, because formations will provide the additional, more rigid
collision checking. Groups also have the ability to be reshaped at any time to accommodate tight
quarters, so it's actually a good idea to keep group collision checking as simple as possible.
A formation requires the same checks as a group, but these checks must further ensure that there are
no internal collisions within the formation. If a formation has space between some of its units, it is
unacceptable for a non-formed unit to occupy that space. Additionally, formations generally don't have
the option to reshape or break. However, it's probably a good idea to implement some game rules that
allow formations to break and reform on the other side of an obstacle if no path around the obstacle
can be found.
For our system, we'll also keep track of the timing of the collision. Immediate collisions represent
collisions currently existing between two objects. Future collisions will happen at a specified point in
the future (assuming neither of the objects changes its predicted movement behavior). In all cases,
immediate collisions have a higher resolution priority than future collisions. We'll also track the state of
each collision as unresolved, resolving, or resolved.
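For illustration, that bookkeeping might be captured in a record like this (the names are
stand-ins, not the article's types):

enum BCollisionTiming { cImmediate, cFuture };
enum BCollisionState { cUnresolved, cResolving, cResolved };

struct BCollision
{
   int unitID1;               //The two entities in the collision.
   int unitID2;
   BCollisionTiming timing;   //Exists now, or predicted for later.
   float time;                //When a future collision will occur.
   BCollisionState state;     //Progress of the resolution.
};

//Immediate collisions always have a higher resolution priority than
//future ones; among future collisions, sooner beats later.
bool resolveFirst(const BCollision& a, const BCollision& b)
{
   if (a.timing != b.timing)
      return a.timing == cImmediate;
   return a.time < b.time;
}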
Discrete vs. Continuous Simulation
Most movement algorithms are discrete in nature. That is, they move the unit from point A to point B
without considering what might be between those two points, whereas a continuous simulation would
consider the volume between the two points as well. In a lag-ridden Internet game, fast-moving units
can move quite a distance in a single game update. When discrete simulations are coupled with these
long updates, units can actually hop over other objects with which they should have collided. In the
case of a resource gathering unit, no one really minds too much. But players rarely want enemy units
to be able to walk through a wall. While most games work around this problem by limiting the length
of a unit's move, this discrete simulation problem is relatively easy to solve (see Figure 3 below).
One way to solve the problem is to sub-sample each move into a series of several smaller moves.
Taking the size of the moving unit into account, we make the sampling interval small enough to
guarantee that no other unit can fit between two of the sample points. We then run each of those
points through the collision determination system. Calculating all of those points and collisions may
seem overly expensive, but later on we'll see a potential way to offset most of that cost.
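A sub-sampled check might look like the following sketch (sphereCollides is a stand-in for the
game's spherical collision check, and the sample spacing is a conservative choice):

#include <cmath>

struct BVector { float x, y, z; };

bool sphereCollides(const BVector& p, float radius) { return false; } //Stub.

//Sub-sample the move from a to b into steps small enough that no unit of
//radius minRadius (the smallest object in the game) can slip between two
//consecutive sample points, then run each sample through the collision
//determination system.
bool moveCollides(const BVector& a, const BVector& b,
                  float unitRadius, float minRadius)
{
   float dx = b.x - a.x, dy = b.y - a.y, dz = b.z - a.z;
   float dist = std::sqrt(dx*dx + dy*dy + dz*dz);
   float step = unitRadius + minRadius;     //Conservative sample spacing.
   int samples = (int)(dist/step) + 1;
   for (int i = 0; i <= samples; i++)
   {
      float t = (float)i / (float)samples;
      BVector p = { a.x + t*dx, a.y + t*dy, a.z + t*dz };
      if (sphereCollides(p, unitRadius))
         return true;
   }
   return false;
}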
Another method is to create what we'll call a move line. A move line represents the unit's move as a
line segment starting at point A and ending at point B. This system creates no extra data, but the
collision check does have an increase in complexity; we must convert from a simple spherical collision
check to a more expensive calculation that involves finding the distance from a point to a line
segment. Most 3D games have already implemented a fast hierarchical system for visible object
culling, so we can reuse that for collision culling. By quickly narrowing down the number of potential
collisions, we can afford to spend more time checking collisions against a small set of objects.
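The point-to-segment distance at the heart of the move line check is straightforward; a sketch:

#include <cmath>

struct BVector { float x, y, z; };

//Distance from point p to the move line (segment) a->b. If this is less
//than the sum of the two units' radii, the unit at p collides with the
//moving unit somewhere along the move.
float distToMoveLine(const BVector& p, const BVector& a, const BVector& b)
{
   BVector ab = { b.x-a.x, b.y-a.y, b.z-a.z };
   BVector ap = { p.x-a.x, p.y-a.y, p.z-a.z };
   float lenSq = ab.x*ab.x + ab.y*ab.y + ab.z*ab.z;
   float t = (lenSq > 0.0f)
      ? (ap.x*ab.x + ap.y*ab.y + ap.z*ab.z)/lenSq : 0.0f;
   if (t < 0.0f) t = 0.0f;       //Clamp to the segment's endpoints.
   if (t > 1.0f) t = 1.0f;
   BVector c = { a.x + t*ab.x - p.x, a.y + t*ab.y - p.y, a.z + t*ab.z - p.z };
   return std::sqrt(c.x*c.x + c.y*c.y + c.z*c.z);
}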
Predicted Positions
Now that we have a simple movement algorithm and a list of unit collisions, what else do we need to
get decent unit cooperation? Position prediction.

Predicted positions are simply a set of positions (with associated orientations and time stamps)
that indicate where an object will be in the future (see Figure 4 below). A movement system can
calculate these positions using the same movement algorithm that's used to move the object. The more
accurate these positions are, the more useful they are. Position prediction isn't immediately free,
though, so let's look at how to offset the additional CPU usage.
The most obvious optimization is to avoid recalculating all of your predicted positions at every frame. A
simple rolling list works well (see Figure 5 below); you can roll off the positions that are now in the
past and add a few new positions each frame to keep the prediction envelope at the same scale. While
this optimization doesn't get rid of the start-up cost of creating a complete set of prediction positions
the first time you move, it does have constant time for the remainder of the movement.
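One way to sketch that rolling list (the sampling step and the predictAt stub are illustrative
assumptions):

#include <deque>

struct BVector { float x, y, z; };

struct BPredictedPosition
{
   BVector position;
   BVector orientation;
   float timeStamp;
};

//Each frame, roll off predictions that are now in the past and append a
//few new ones, keeping the prediction envelope at the same scale for a
//constant per-frame cost.
class BPredictionList
{
public:
   void update(float gameTime, float horizon)
   {
      while (!mList.empty() && mList.front().timeStamp < gameTime)
         mList.pop_front();                 //Roll off stale predictions.
      float last = mList.empty() ? gameTime : mList.back().timeStamp;
      while (last < gameTime + horizon)
      {
         last += cStep;
         mList.push_back(predictAt(last));  //Extend the envelope.
      }
   }
private:
   static constexpr float cStep = 0.25f;    //Assumed sampling interval.
   std::deque<BPredictedPosition> mList;
   //Stub: run the movement algorithm forward to time t.
   BPredictedPosition predictAt(float t) { return { {}, {}, t }; }
};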
The next optimization is to create a prediction system that handles both points and lines. Because our
collision determination system already supports points and lines, it should be easy to add this support
to our prediction system. If a unit is traveling in a straight line, we can designate an enclosed volume
by using the current position, a future position, and the unit's soft movement radius. However, if the
object has a turn radius, things get a little more complicated. You can try to store the curve as a
function, but that's too costly. Instead, you're better off doing point sampling to create the right
predicted points (see Figure 6 below). In the end, you really want a system that seamlessly supports
both point and line predictions, using the lines wherever possible to cut down on the CPU cost.
The last optimization we'll cover is important and perhaps a little nonintuitive. If we're going to get this
predicted system with as little overhead as possible, we don't want to duplicate our calculations for
every unit by predicting its position and then doing another calculation to move it. Thus, the solution is
to predict positions accurately, and then use those positions to move the object. This way, we're only
calculating each move once, so there's no extra cost aside from the aforementioned extra start-up
time.
In the actual implementation, you'll probably just pick a single update length to do the prediction. Of
course, it's fairly unlikely that all of the future updates will be consistent. If you blindly move the unit
from one predicted position to the next without any regard to what the actual update length currently
is, you're bound to run into some problems. Some games (or some subset of objects in a game) can
accept this inaccuracy. Those of us developing all the other games will end up adding some
interpolation so that we can quickly adjust a series of predicted points that isn't completely accurate. You
also need to recognize when you're continually adjusting a series of predicted positions so that you cut
your losses and just recalculate the entire series.
Most of the rest of the implementation difficulties arise from the fact that we use these predicted
positions in collision detection just as we do for the object's actual current position. You should easily
see the combinatorial explosion that's created by comparing predicted positions for all units in a given
area. However, in order to have good coordinated unit movement, we have to know where units are
going to be in the near future and what other units they're likely to hit. This takes a good, fast collision
determination system. As with most aspects of a 3D engine, the big optimizations come from quickly
eliminating potential interactions, thus allowing you to spend more CPU cycles on the most probable
interactions.
Unit to Unit Cooperation
We've created a complex system for determining where an object is going to be in the future. It
supports 3D movement, it doesn't take up much more CPU time than a simple system, and it provides an
accurate list of everything we expect a unit to run into in the near future. Now we get to the fun
part.

If we do our job well, most of the collisions that we must deal with are future collisions (because
we avoid most of the immediate collisions before they even happen). While the baseline approach for
any future collision is to stop and repath, it's important to avoid firing up the pathfinder as much
as possible.

This set of collision resolution rules is a complete breakdown of how to approach the problem of
unit-to-unit collision resolution (from a unit's frame of reference).

Unresolved Collisions
3. Else, if we're the higher-priority unit and we can push the lower-priority unit along our path,
push the lower-priority unit. Change the collision state to resolving.
2. If collision with hard radius overlap is inevitable and we're the higher-priority unit, tell the
lower-priority unit to pause, and go to Case 3.
3. Else, if we're the higher-priority unit, calculate our get-to point and tell the lower-priority unit to
slow down enough to avoid the collision.
Resolving Collisions
● If we're the unit that's moving in order to resolve a Case 1 collision and we've reached our
desired point, resolve the collision.
● If we're the Case 3.1 lower-priority unit and the higher-priority unit has passed its get-to point,
start returning to the previous position and resolve the collision.
● If we're the Case 3.1 higher-priority unit, wait (slow down or stop) until the lower-priority unit
has gotten out of the way, then continue.
● If we're the Case 3.3 higher-priority unit and the lower-priority unit can now get out of the way,
go to Case 3.1.
● If we're the Case 4.3 lower-priority unit and the higher-priority unit has passed its get-to point,
resume normal speed and resolve the collision.
One of the key components of coordinated unit movement is to prioritize and resolve disputes. Without
a solid, well-defined priority system, you're likely to see units doing a merry-go-round dance as each
demands that the other move out of its way; no one unit has the ability to say no to a demand. The
priority system also has to take the collision severity into account. A simple heuristic is to take the
highest-priority hard collision and resolve down through all of the other hard collisions before
considering any soft collisions. If the hard collisions are far enough in the future, though, you might
want to spend some time resolving more immediate soft collisions. Depending on the game, the
resolution mechanism might also need to scale based on unit density. If a huge melee battle is
creating several compound hard collisions between some swordsmen, you're better served spending
your CPU time resolving all of those combat collisions than resolving a soft collision between two of
your resource gatherers on a distant area of the map. An added bonus to tracking these areas of high
collision density is that you can influence the pathfinding of other units away from those areas.
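As a sketch, that heuristic might boil down to a comparator like this one (the struct and the
cutoff parameter are illustrative, not from the original article):

#include <algorithm>
#include <vector>

struct BPendingCollision
{
   bool hard;     //Hard-radius overlap vs. soft-radius overlap.
   float time;    //Seconds until the collision occurs (0 = immediate).
};

//Hard collisions resolve before soft ones, earliest first - unless a
//soft collision is imminent enough to jump the queue.
void sortForResolution(std::vector<BPendingCollision>& collisions,
                       float soonCutoff)
{
   struct Cmp
   {
      float cutoff;
      bool operator()(const BPendingCollision& a,
                      const BPendingCollision& b) const
      {
         bool aUrgent = a.hard || a.time < cutoff;
         bool bUrgent = b.hard || b.time < cutoff;
         if (aUrgent != bUrgent)
            return aUrgent;      //Urgent collisions first.
         if (a.hard != b.hard)
            return a.hard;       //Then hard before soft...
         return a.time < b.time; //...then by imminence.
      }
   };
   Cmp cmp = { soonCutoff };
   std::sort(collisions.begin(), collisions.end(), cmp);
}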
Basic Planning
Planning is a key element of unit cooperation. All of these predictions and calculations should be
as accurate as possible. Inevitably, though, things will go wrong. One of the biggest mistakes we
made with the Age of Empires movement was to make every decision within a single frame of reference.
Every decision was always made correctly, but we didn't track that information into future updates.
As a result, we ended up with units that would make a decision, encounter a problem during the
execution of that decision, and then make a decision that sent them right back on their original
path, only to start the whole cycle over again the next update. Planning fixes this cycle. We keep
around the old, resolved collisions long enough (defined by some game-specific heuristic) so that we
can reference them should we get into a predicament in the future. When we execute an avoidance, for
example, we remember what object it is that we're avoiding. Because we'll have created a viable
resolution plan, there's no reason to do collision checking with the other unit in the collision
unless one of the units gets a new order or some other drastic change takes place. Once we're done
with the avoidance maneuver, we can resume normal collision checking with the other unit. As you'll
see next month, we'll reuse this planning concept over and over again to accomplish our goals.

Simple games are a thing of the past; so is simple movement. We've covered the basic components
necessary for creating a solid, extensible movement system: a state-based movement algorithm, a
scalable collision determination system, and a fast position prediction system. All of these
components work together to create a deterministic plan for collision resolution.
Next month, we'll extend these concepts to cover higher-order movement topics, such as group
movement, full-blown formation movement, and compound collision resolution. I'll also go into more
detail about some implementation specifics that help solve some of the classic movement problems.
Basic Definitions
Movement. The execution of a path. Simple movement algorithms move a unit along a path, while more
complex systems check collisions and coordinate unit movement to avoid collisions and allow
otherwise stuck units to move.

Pathfinding. The act of finding a path (a planned route for a unit to get from point A to point B).
The algorithm used can be anything from a simple exhaustive search to an optimized A* implementation.

Waypoint. A point on a path that a unit must go through to execute the path. Each path, by
definition, has one waypoint at the start and one waypoint at the end.

Unit. A game entity that has the ability to move around the game map.

Group. A general collection of units that have been grouped together by the user for convenience
(usually to issue the same order to all of the units in the group). Most games try to keep all of
the units in a group together during movement.

Formation. A more complex group. A formation has facing (a front, a back, and two flanks). Each unit
in the formation tries to maintain a unique relative position inside the formation. More complex
models provide an individualized unit facing inside of the overall formation and support for
wheeling during movement.
Hard Movement Radius. A measure of the volume of a unit with which we absolutely do not allow
other units to collide.
Soft Movement Radius. A measure of the volume of a unit with which we would prefer not to collide.
Movement Prediction. Using the movement algorithms to predict where a unit will be at some point
in the future. A good prediction system will take acceleration and deceleration into account.
Turn Radius. The radius of the tightest circle a unit can turn on at a given speed.
Implementing Coordinated Movement
By Dave C. Pottinger
Gamasutra, January 29, 1999
Originally published in the February 1999 issue of Game Developer magazine.

Part of the fun of working in the game industry is the constant demand for technical innovations that
will allow designers to create better games. In the real-time strategy (RTS) genre, most developers
are focusing on improving group movement for their next round of games. I'm not talking about the
relatively low-tech methods currently in use. Instead, I'm referring to coordinated group movement,
where units cooperate with each other to move around the map with intelligence and cohesion. Any
RTS game developer that wants to be competitive needs to look beyond simple unit movement; only
the games that weigh in with solid coordinated movement systems will go the distance.
In this article, the second and final part of my coordinated unit movement series, we'll take a look at
how to use the systems that we considered in the first article to satisfy our coordinated group
movement goal. We'll also examine how we can use our coordinated movement fundamentals to solve
some classic, complex movement problems. While we will spend most of our time talking about these
features through the RTS microscope, they can easily be applied to other types of games.
Catching Up
Last week (Coordinated Unit Movement), we discussed a lot of the low-level issues of coordinated unit
movement. While pathfinding (the act of finding a route from point A to point B) gets all of the press,
the movement code (the execution of a unit along a path) is just as important in creating a positive
game experience. A game can have terrific pathfinding that never fails to find the optimum path. But,
if the movement system isn't up to par, the overall appearance to the players is going to be that the
units are stupid and can't figure out where to go.
One of the key components to any good movement system is the collision determination system. The
collision system really just needs to provide accurate information about when and where units will
collide. A more advanced collision system will be continuous rather than discrete. Most games scale
the length of the unit movement based on the length of the game update loop. As the length of that
update loop increases, the gap between point A and point B can get pretty large. Discrete collision
systems ignore that gap, whereas continuous systems check the gap to make sure there isn't anything
in between the two points that would have created a collision with the unit being moved. Continuous
collision determination systems are more accurate and more realistic. They're more difficult to write,
though.
Another important element for coordinated unit movement is position prediction. We need to know
where our units are trying to go so that we can make intelligent decisions about how to avoid
collisions. Although building a fast position-prediction system presents us with a number of issues, for
this article, we can assume that our collision determination system has been augmented to tell us
about future collisions in addition to current collisions. Thus, each unit in the game will know with
which units it's currently in collision with and which units it will collide with in the near future. We
presented several rules for getting two units out of collision in last month's article.
All of these elements work together to create the basis for a solid, first-order (single unit to single
unit) coordinated movement system. The core thing to keep in mind for this article is that we have an
accurate, continuous collision determination system that tells us when and where units will collide.
We'll use that collision system in conjunction with the collision resolution rules to create second order
(three or more units/groups in collision) coordination.
Group Movement
Looking at the definition of a group (see sidebar at right), we can immediately see that we need to
store several pieces of data. We need a list of the units that make up our group, and we need the
maximum speed at which the group can move while still keeping together. Additionally, we probably
want to store the centroid of the group, which will give us a handy reference point for the group. We
also want to store a commander for the group. For most games, it doesn't matter how the commander
is selected; it's just important to have one.
Units in a group just move at the same speed. Usually, this sort of organization moves the group at
the maximum speed of its slowest unit, but sometimes it's better to let a slow unit move a little
faster when it's in a group (see Figure 1 below). Designers generally give units a slow movement
speed for a reason, though; altering that speed can often create unbalanced game play by allowing a
powerful unit to move around the map too quickly.
Figure 1
Units in a group move at the same speed and take the same path. This sort of organization
prevents half of the group's units from walking one way around the forest while the other half takes a
completely different route (see Figure 2 below). Later, we'll look at an easy way to implement this sort
of organization.
Figure 2
Units in a group move at the same speed, take the same path, and arrive at the same time.
This organization exhibits the most complex behavior that we'll apply to our group definition. In
addition to combining the previous two options, it also requires that units farther ahead wait for other
units to catch up and possibly allows slower units to get a temporary speed boost in order to catch up.
So, how can we achieve the last option? By implementing a hierarchical movement system, we can
manage individual unit movement in a way that allows us to consider a group of units together. If we
group units together to create a group object, we can store all of the necessary data, calculate the
group's centroid and maximum speed, and manage the group's units as a single entity (see Listing 1).
Listing 1. BUnitGroup.
//*****************************************************************************
// BUnitGroup
//*****************************************************************************
class BUnitGroup
{
public:
BUnitGroup( void );
~BUnitGroup( void );
//Returns the ID for this group instance.
int getID( void ) const { return(mID); }
//Various get and set functions. Type designates the type of the group
//(and is thus game specific). Centroid, maxSpeed, and commander are
//obvious. FormationID is the id lookup for any formation attached to
//the group (will be some sentinel value if not set).
int getType( void ) const { return(mType); }
void setType( int v ) { mType=v; }
const BVector& getCentroid( void ) const { return(mCentroid); }
float getMaxSpeed( void ) const { return(mMaxSpeed); }
int getCommanderID( void ) const { return(mCommanderID); }
int getFormationID( void ) const { return(mFormationID); }
BOOL setFormationID( int fID );
//Standard update and render functions. Update generates all of the
//decision making within the group. Render is here for graphical
//debugging.
BOOL update( void );
BOOL render( BMatrix& viewMatrix );
//Basic unit addition and removal functions.
BOOL addUnit( int unitID );
BOOL removeUnit( int unitID );
int getNumberUnits( void ) const { return(mNumberUnits); }
int getUnit( int index );
protected:
int mID;
int mType;
BVector mCentroid;
float mMaxSpeed;
int mCommanderID;
int mFormationID;
int mNumberUnits;
BVector* mUnitPositions;
BVector* mDesiredPositions;
};
The BUnitGroup class manages the unit interactions within itself. At any point in time, it should be able to
develop a schedule for resolving collisions between its units. It needs to be able to control or modify
the unit movement through the use of parameter settings and priority manipulation. If your unit
system only has support for one movement priority, you'll want to track a secondary movement
priority within the group for each unit in the group. Thus, to the outside world, the group can behave
as a single entity with a single movement priority, but still have an internal prioritization. Essentially,
the BUnitGroup class is another complete, closed movement system.
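For illustration, the per-frame bookkeeping behind a group's update might look something like this
sketch (BGroupUnit is a stand-in for the real unit accessors, not part of the original listing):

#include <vector>

struct BVector { float x, y, z; };

struct BGroupUnit { BVector position; float maxSpeed; };

//Recompute the data the group object stores: the centroid (a handy
//reference point for the group) and the speed the group can sustain while
//keeping together, i.e. its slowest member's speed.
void updateGroupData(const std::vector<BGroupUnit>& units,
                     BVector& centroid, float& maxSpeed)
{
   centroid = BVector();
   maxSpeed = 1.0e30f;     //Sentinel; any real unit speed is lower.
   for (size_t i = 0; i < units.size(); i++)
   {
      centroid.x += units[i].position.x;
      centroid.y += units[i].position.y;
      centroid.z += units[i].position.z;
      if (units[i].maxSpeed < maxSpeed)
         maxSpeed = units[i].maxSpeed;
   }
   if (!units.empty())
   {
      centroid.x /= units.size();
      centroid.y /= units.size();
      centroid.z /= units.size();
   }
}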
The commander of the group is the unit that will be doing the pathfinding for the group. The
commander will decide which route the group as a whole will take. For basic group movement, this
may not mean much more than the commander being the object that generates the pathfinding call.
As we'll see in the next section, though, there's a lot more that the commander can do.
The BFormation class (below) manages the definition of the desired positions (the positions and
orientations that each unit in the formation will try to maintain).
class BFormation
{
public:
enum
{
cStateBroken=0,
cStateForming,
cStateFormed
};
//The unit management functions. These all return information for the
//canonical definition of the formation. It would probably be a good
//idea to package the unit information into a class itself.
BOOL setUnits( int num, BVector* pos, BVector* ori, int* types );
int getNumberUnits( void ) const { return(mNumberUnits); }
BVector& getUnitPosition( int index );
BVector& getUnitOrientation( int index );
int getUnitType( int index );
protected:
BVector mOrientation;
int mState;
int mNumberUnits;
BVector* mPositions;
BVector* mOrientations;
int* mTypes;
};
Under this model, we must also track the state of a formation. The state cStateBroken means that
the formation isn't formed and isn't trying to form. cStateForming signifies that our formation is
trying to form up, but hasn't yet reached cStateFormed. Once all of our units are in their desired
positions, we change the formation state to cStateFormed. To make the movement considerably
easier, we can say that a formation can't move until it's formed.
When we're ready to use a formation, our first task is to form the formation. When given a formation,
BUnitGroup enforces the formation's desired positions. These positions are calculated relative to the
current orientation of the formation. When the formation's orientation is rotated, then the formation's
desired positions will automatically wheel in the proper direction.
To form the units into a formation, we use scheduled positioning. Each position in the formation has a
scheduling value (either by simple definition or algorithmic calculation) that will prioritize the order in
which units need to form. For starters, it works well to form from the inside and work outward in order
to minimize collisions and formation time (see Figure 3 below). The group code manages the forming
with the algorithm shown in Listing 3.
Listing 3.
Set all units' internal group movement priorities to the same low priority value.
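As an illustration of the scheduling idea (BSlot and the inside-out distance metric are assumptions,
not the article's code):

#include <algorithm>
#include <cmath>
#include <vector>

//Each desired position gets a scheduling value that prioritizes the order
//in which units form up. Forming from the inside and working outward
//minimizes collisions and formation time, so the schedule here is just
//the offset's distance from the formation's center.
struct BSlot { float x, y; int schedule; };

void assignFormingOrder(std::vector<BSlot>& slots)
{
   std::sort(slots.begin(), slots.end(),
      [](const BSlot& a, const BSlot& b)
      {
         return std::fabs(a.x) + std::fabs(a.y) <
                std::fabs(b.x) + std::fabs(b.y);
      });
   for (size_t i = 0; i < slots.size(); i++)
      slots[i].schedule = (int)i;   //0 forms first (innermost slot).
}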
Figure 3
So, now that we have all of our swordsmen in place, what do we do with them? We can start moving
them around the board. We can assume that our pathfinding has found a viable path (a path that can
be followed) for our formation's current size and shape (see Figure 4 below). If we don't have a viable
path, we'll have to manipulate our formation (we'll talk about how to do this shortly). As we move
around the map, we designate one unit as the commander of our formation. As the commander
changes direction to follow the path, the rest of our units will also change direction to match the
commander's; this is commonly called flocking.
Figure 4
We have a couple of ways to deal with direction changes for a formation: we can ignore the direction
change or we can wheel the formation to face in the new direction. Ignoring the direction change is
simple and is actually appropriate for something such as a box formation (Figure 5).
Figure 5
Wheeling isn't much more complicated and is very appropriate for something such as a line. When we
want to wheel, our first step is to stop the formation from moving. After rotating the orientation of the
formation, we recalculate the desired positions (see Figure 6 below). When that's done, we just go
back to the cStateForming state, which causes the group code to move our units to their new
positions and sets us back to cStateFormed when it's done (at which point, we can continue to move).
Figure 6
Scaling unit positions. Because the desired positions are really just vector offsets within a
formation, we can apply a scaling factor to the entire formation to make it smaller. And a smaller
formation can fit through small gaps in walls or treelines (see Figure 7 below). This method works well
for formations in which the units are spread out, but it's pretty useless for formations where the units
are already shoulder to shoulder (as in a line). Scaling the offsets down in that case would just make
our swordsmen stand on top of each other, which isn't at all what we want.
Figure 7
Figure 8
Halving and rejoining. While simple avoidance is a good solution, it does dilute the visual impact of
seeing a formation move efficiently around the map. Halving can preserve the visual impact of
well-formed troops. When we encounter an obstacle that's within the front projection of the formation
(see Figure 9 below), we can pick a split point and create two formations out of our single formation.
These formations then move to the rejoin position and then merge back into one formation. Halving is
a simple calculation that dramatically increases the visual appeal of formations.
Figure 9
Path Stacks
A path stack is simply a stack-based (last in, first out) method for storing the movement route
information for a unit (see Figure 10 below). The path stack tracks information such as the path the
unit is following, the current waypoint the unit is moving toward, and whether or not the unit is on a
patrol. A path stack suits our needs in two significant ways.
Figure 10
First, it facilitates a hierarchical pathfinding setup (see Figure 11 below). Game developers are
beginning to realize that there are two distinctly different types of pathfinding: high-level and
low-level. A high-level path routes a unit around major terrain obstacles and chokepoints on a map,
similarly to how a human player might manually set waypoints for a unit. A low-level path deals with
avoidance of smaller obstacles and is much more accurate on details. A path stack is the ideal method
for storing this high- and low-level information. We can find a high-level path and stuff that into the
stack. When we need to find a low-level path (to avoid a future collision with that single tree in the big
field), we can stuff more paths onto the stack and execute those. When we're done executing a path
on the stack, we pop it off the stack and continue moving along the path that's now at the top of the
stack.
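A path stack can be sketched in a few lines; the reference-counted path type is an assumption that
also ties into the path reuse discussed below:

#include <memory>
#include <stack>
#include <vector>

struct BVector { float x, y, z; };

//Paths are reference counted so one high-level path found by a commander
//can be shared by every unit in the group or formation.
typedef std::vector<BVector> BPath;
typedef std::shared_ptr<const BPath> BPathPtr;

struct BPathEntry
{
   BPathPtr path;
   size_t waypoint;     //Current waypoint within this path.
};

class BPathStack
{
public:
   //Push a new path (a high-level route, a low-level segment, or a
   //temporary avoidance path) on top of whatever we're executing.
   void push(BPathPtr p) { mStack.push(BPathEntry{ p, 0 }); }
   bool empty() const { return mStack.empty(); }
   //The waypoint the unit is currently moving toward.
   const BVector& current() const
      { return (*mStack.top().path)[mStack.top().waypoint]; }
   //Advance along the top path; when it's done, pop it and resume the
   //path that's now at the top of the stack.
   void advance()
   {
      if (++mStack.top().waypoint >= mStack.top().path->size())
         mStack.pop();
   }
private:
   std::stack<BPathEntry> mStack;
};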
Figure 11
Second, a path stack enables high-level path reuse. If you recall, one of the key components to good
group and formation movement is that all of the units take the same route around the map. If we
write our path stack system so that multiple units can reference the same path, then we can easily
allow units to reuse the same high-level path. A formation commander would find a high-level path
and pass that path on to the rest of the units in his formation without any of them having to do any
extra work.
Structuring the path storage in this manner offers us several other benefits. By breaking up a
high-level path into several low-level paths, we can refine future low-level segments before we
execute them. We can also delay finding a future low-level path segment if we can reasonably trust
that the high-level path is viable. If we're doing highly coordinated unit movement, a path stack allows
us to push temporary avoidance paths onto the stack and have them easily and immediately
integrated into the unit's movement (see Figure 12).
Figure 12
Compound Collisions
If we have a compound collision between three units (see Figure 13 below), our first task is to find the
highest-priority unit involved in the collision. Once we've identified that unit, we need to look at the
other units in the collision and find the most important collision for the highest priority unit to resolve
(this may or may not be a collision with the next highest-priority unit in the collision). Once we have
two units, we pass those two units into the collision resolution system.
Figure 13
As soon as the collision between the first two units is resolved (see Figure 14 below), we need to
reevaluate the collision and update the unit involvement. A more complex system could handle the
addition of new units to the collision at this point, but you can get good results by simply removing
units as they resolve their collisions with the original units.
Figure 14
Once we've updated the units in the collision, we go back to find two more units to resolve; we repeat
this process until no more units are involved in the collision.
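In sketch form (the stubbed helpers mark where the real collision and unit systems plug in):

#include <vector>

//Illustrative stand-ins for the real systems.
struct BUnitRef { int id; int priority; };

int mostImportantCollisionWith(int unitID, const std::vector<BUnitRef>& others)
   { return others.back().id; }             //Stub.
void resolvePair(int unitA, int unitB) {}   //Stub: the two-unit rules.
bool isResolved(int unitID) { return true; }//Stub: collision state query.

void resolveCompound(std::vector<BUnitRef> units)
{
   while (units.size() >= 2)
   {
      //Find the highest-priority unit still involved in the collision.
      size_t best = 0;
      for (size_t i = 1; i < units.size(); i++)
         if (units[i].priority > units[best].priority)
            best = i;
      //Resolve its most important collision as a plain two-unit collision.
      resolvePair(units[best].id,
                  mostImportantCollisionWith(units[best].id, units));
      //Reevaluate: drop units whose collisions have now resolved.
      std::vector<BUnitRef> remaining;
      for (size_t i = 0; i < units.size(); i++)
         if (!isResolved(units[i].id))
            remaining.push_back(units[i]);
      units.swap(remaining);
   }
}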
You can implement this system in two different areas: the collision resolution rules or the collision
determination system. The collision resolution rules would need to be changed in the way in which
they rank units as higher and lower priority; these rules aren't particularly difficult to change, but this
modification does increase the complexity of that code. Alternatively, you can change the collision
determination system so that it only generates collisions that involve two units at a time; you still have
to find all of the units in a collision, though, before you can make this decision.
The Stacked Canyon Problem
The first step is to identify that you have a stacked canyon problem. This step is important because it's
needed to propagate the movement priority of the driving unit (the unit trying to move through the
stacked canyon) through to the rest of the units. We could just let each unit ask other units to move
out of the way based on its own priority, but a better solution is to use the priority of the driving unit -
after all, that's the unit that we really want to get through the canyon. Identifying a stacked canyon
problem can be done in a couple of ways: noticing that the driving unit will push the first unit into a
second unit or looking at the driving unit's future collision list to find multiple collisions. Whichever
method is used, the pushed unit should move with the same priority as the driving unit.
Once we've identified the problem, we have a fairly simple recursive execution using the coordinated
movement. We treat the first pushed unit as the driving unit for the second pushed unit, and so on.
Each unit is pushed away from its driving unit until it can move to the side. When the last unit moves
to the side, the original driving unit has a clear path by which to move through the canyon.
A nice touch is to restore the canyon units to their original states. To do this, we simply need to track
the pushing and execute the moves in the reverse order from which the units moved to the side. It's
also useful to have the movement code recognize when the driving unit is part of a group so that the
rest of the group's units can move through the canyon before the canyon units resume their original
positions.
Tips
Optimize this general system to your game. A lot of extra computation can be eliminated or
simplified if you're only doing a 2D game. Regardless of whether you're doing a 2D or 3D game, your
collision detection system will need a good, highly optimized object culling system; they're not just for
graphics anymore.
Use different methods for high and low level pathing. To date, most games have used the same
system for both solutions. Using a low-level solution for high-level pathfinding generally results in
high-level pathfinding that's slow and not able to find long paths. Conversely, a high-level pathfinder
used for low-level pathfinding creates paths that don't take all of the obstacles into account or are
forced to allow units to move completely through each other. Bite the bullet and do two separate
systems.
No matter what you do, units will overlap. Unit overlap is unavoidable or, at best, incredibly
difficult to prevent in all cases. You're better off simply writing code that can deal with the problem
early. Your game will be a lot more playable throughout its development.
Game maps are getting more and more complex. Random maps are going to be one of the key
discriminating features in RTS games for some time to come. The better movement systems will
handle random maps and also take changing map circumstances into account.
Understand how the update affects unit movement. Variable update lengths are a necessary evil
that your movement code will have to be able to handle. Use a simple update smoothing algorithm to
make most of the problems go away.
Single update frames of reference are a thing of the past. It's impossible to do coordinated unit
movement without planning. It's impossible to do planning if you don't track past decisions and look at
what's likely to happen in the future. Any generalized coordinated unit movement system needs to be
able to recall past collision information and have future collision information available at all times.
Remember that minor variations during the execution of a collision resolution plan can be ignored.
No Stupid Swordsmen
Simple unit movement is, well, simple. Good, coordinated unit movement is something that we should
be working on in order to raise our games to the next level and make our players a lot happier in the
process. With these articles, we've laid the foundation for a coordinated unit movement system by
talking about topics such as planning across multiple updates and using a set of collision resolution
rules that can handle any two-unit collision. Don't settle for stupid swordsman movement again!
After several close calls, Dave managed to avoid getting a "real job" and joined Ensemble
Studios straight out of college a few years ago (just in time to do the computer player
AI for a little game called AGE OF EMPIRES). These days, Dave spends his time either
leading the development of Ensemble Studios' engines or with his lovely wife Kristen. Dave
can be reached at dpottinger@ensemblestudios.com.
See Also:
Artificial Intelligence: Pathfinding and Searching
The player is presented with a new environment, having just battled his way past a horde of enemies. The new environment is a
network of corridors and rooms - and throughout the environment are enemies, bonuses, and other interactive items. The player
explores the environment, taking out the enemies, collecting the bonuses, and discovering the interactive items.
Now, take two.
The NPC is presented with a new environment, having just been generated by the engine. The new environment is a network of
corridors and rooms - and throughout the environment are enemies, bonuses, and other interactive items. The NPC heads straight for
the nearest room, because it can see from the nodegraph that there is a bonus there, and then takes a route through the environment that
avoids and evades all enemies. The framerate drops a little as the route is recalculated each step, to account for enemy movements. The
player watches in disbelief.
Sound familiar? This article is going to discuss a method for avoiding this: something I call an 'expandable path table.' It allows NPCs
to both navigate through an environment, and to explore and learn it, rather than being presented with the entire thing. It's also faster
than constantly recalculating a route.
Before we start
You'll need to know a pathfinding algorithm. I used Dijkstra's algorithm when researching - but the A* algorithm, or any other
algorithm for finding a path through a network of nodes, will do fine.
The Problem
The above network represents my level. The nodes are significant points - such as junctions, rooms, or corners - and the lines represent
the routes between them. There are no weights on this network, but they could easily be added in a real situation (i.e. the length of the
routes between the nodes).
The NPC is my enemy and needs to track me through the level. I'm moving around, and so is it, so it needs to recalculate the route to
my position each step. (For this article, we will assume that the NPC and I can only be at a node at any given time - in a real situation
though, it could be assumed that a node will have a line-of-sight to all nodes that it connects to, so the NPC could move to the node
nearest to me and be able to shoot at me).
Now, while it will depend on your pathfinding algorithm, recalculating a path each step isn't fast. Especially with an algorithm such as
Dijkstra's, which sometimes involves labeling the entire network. Ideally, we need a method to store all the routes between nodes
beforehand.
A concept
If the shortest route from node A to node D is ABCD, then the shortest route from A to C must be ABC.
I haven't yet found a case where this isn't true. For Dijkstra's, the shortest distance (and therefore, route) to each node in the network is
found, and the algorithm depends on it being true. I'm not going to try and disprove it. :-)
So, suppose we build a matrix indexed by where we are and where we're going. That'll be nice, won't it? Look up where we are, where
we're going, and get... what? The answer: the next node we need to move to. I'll fill out the matrix for the network above, quickly.
        To
        A  B  C  D  E  F  G  H
From A  -  B  B  E  E  F  F  F
     B  A  -  C  C  A  A  C  A
     C  B  B  -  D  D  G  G  G
     D  E  C  C  -  E  G  G  G
     E  A  A  D  D  -  F  F  F
     F  A  A  G  G  E  -  G  H
     G  F  C  C  D  F  F  -  H
     H  F  F  G  G  F  F  G  -
Does that make sense? Trace the route from any one node to another, quickly. For example, B to H.
B->H: Move to A
A->H: Move to F
F->H: Move to H
GOAL
That demonstrates that we've got a very simple algorithm on our hands. The matrix is easy to build, too - you just calculate the
route from the 'from' node to the 'to' node, and store the first node in the route (excluding the 'from' node itself).
This network is also a little special because there are no dead ends - that is, nodes with only one connection. If there were - and in real
situations, there would be - that row in the matrix is easy to do, as there's only one option for each cell.
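Programmatically, each row of the matrix is one shortest-path search that records only the first
hop. A sketch for an unweighted network - breadth-first search finds the same shortest routes here
that Dijkstra's would - where adj[n] lists the nodes connected to node n:

#include <queue>
#include <vector>

//Build the path table: table[from][to] holds the next node to move to,
//or -1 on the diagonal and for unreachable nodes.
std::vector<std::vector<int> >
buildPathTable(const std::vector<std::vector<int> >& adj)
{
   int n = (int)adj.size();
   std::vector<std::vector<int> > table(n, std::vector<int>(n, -1));
   for (int from = 0; from < n; from++)
   {
      std::vector<int> firstHop(n, -1);
      std::vector<bool> seen(n, false);
      std::queue<int> open;
      open.push(from);
      seen[from] = true;
      while (!open.empty())
      {
         int node = open.front();
         open.pop();
         for (size_t i = 0; i < adj[node].size(); i++)
         {
            int next = adj[node][i];
            if (seen[next])
               continue;
            seen[next] = true;
            //A neighbour of 'from' is its own first hop; everything else
            //inherits the first hop of the node it was reached through.
            firstHop[next] = (node == from) ? next : firstHop[node];
            open.push(next);
         }
      }
      for (int to = 0; to < n; to++)
         if (to != from)
            table[from][to] = firstHop[to];
   }
   return table;
}

Tracking a moving target is then a single lookup per step: next = table[npcNode][targetNode].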
Now, what about the original problem of trying to track me? I'll show a sample 'game,' with the NPC's moves on the left, and mine on
the right. Let's say that the NPC starts at node A, and I at node C. I don't know the level very well, so I may make some bad moves
from time to time, but it demonstrates the AI.
A->C: Move to B C->D
B->D: Move to C D->E
In each case, all the NPC needs to do is to know my position, which it can use to look up where to move in the table. It's fast, no?
Sure, if I'd played a little better, I could have had it running around in circles for as long as I felt like. That's partly because of the level,
and partly because the bot doesn't have the option to 'rest.' If the bot stopped to rest (or, for that matter, if I did) then the outcome would
have been different. Also, if there were other goals, the bot may have chosen to take a different route to pick up a bonus on the way.
Ultimately, I guess, it's just a version of the Terminator algorithm (or 'tracker,' but I like Terminator better :-) ) applied to a network.
Still, it's useful.
Cheating
Let's return to the scenario I originally presented you with, of the NPC entering a new environment. Rather than exploring, it uses a
pre-calculated network to find the optimal route and reach its goal immediately, rather than having to explore.
The matrix method that I've just demonstrated can be expanded to let the bot learn the network as it explores it. The method is a little
more complicated, but it's still understandable.
This one's a little more complex. The blue lines represent walls.
We can build two extra tables from this that we'll need later. The first is what I'll call our VIS table - it contains information
about which nodes can 'see' which other nodes. Our bot will only 'learn' about the existence of a node when it 'sees' it.
Sight Table - T: TRUE, BLANK: FALSE
To
ABCDEFGH I JKLMNOP
A - T T TT
B T - T TT
C - TT
D T - T T
E TTT -
F - T TT
G T T - TT
From H TT - T T
I TT T -
J - TT T T
K T T - T T
L T TT - T
M TTT - T
N T - T
O TT T - T
P TT T T -
(Yes, it is symmetrical along the To=From line - because we're working on the premise that if A can see B, B can see A. I can't think of
a practical application where this isn't true, but if you can, kudos to you. :-) )
The second table we can build - a much smaller one - is a magnitude table. It lists the number of connections each node has. I won't
fill it in yet; we don't pre-build it, we build it as we go along.
OK. Let's say that our bot starts at node A. He knows only node A, and nothing more. His route table looks like this:

        To
        A
From A  -
Nice! B-) But, somehow, kinda useless. So, as our bot cannot move and has no higher priority goal (that is achievable - if he is hungry,
for example, his limited network will not have any food in it), he decides to look around.
Now, we are going to cheat a little here, but if we didn't, it'd look strange. Node A can see nodes B, L, O, and P. So, we pick the first of
those, and turn the bot to face it.
In actual fact, B, O, and P are all in a line. So, when the bot's field of view picks up node B, it will pick up nodes O and P as well.
The first move is to add the newly discovered nodes to the route table, and to update the magnitude table:

        To
        A  B  O  P
From A  -
     B     -
     O        -
     P           -
Node Magnitude
A 1
B 2
O 1
P 2
Now, recalculate all new rows and columns in the route table:

        To
        A  B  O  P
From A  -  B  B  B
     B  A  -  P  O
     O  P  P  -  P
     P  B  B  O  -
Conclusion
Right, that's all. This method has plenty of room for expansion - off the top of my head, assigning 'goodness' values to each node - for
strategic importance, safety, proximity to health/weapons, etc - might be useful to break out of the loop situations I mentioned earlier.
Oof. I'm going to go and eat some toast, I think. Questions, comments, etc, you can email me at rfine@lycos.com, post about this in the
forums (I usually check) or catch me in #gamedev. Happy brainstorming! =]
See Also:
Artificial Intelligence: Pathfinding and Searching
Introduction
First of all, I'm no authority on pathfinding or motion planning. There just seems to be a lot of interest in pathfinding algorithms on the
artificial intelligence message board. Last year I did a big project on motion planning for a university course, but since almost nobody
on the net understands Dutch, I thought it might be best to translate it and let a much broader public get acquainted with the general
concepts of motion planning. In this article I'll shed some light on several techniques using potential fields. We'll focus on the
theoretical part of the algorithms and now and then I'll throw in some screenshots from a demo-application I made for the project.
Should you have more questions or ideas on the subject, my name is Stefan Baert, I live in Belgium and on the net you'll mostly find
me as 'StrategicAlliance'.
If you do a graphical representation of building the tree, you can see that local minima are 'filled' until the 'bucket runs over' and we can
continue a straight line to our goal. In the picture, the members of the tree are blue, except for the leaves which are yellow. Obstacles
are represented in white.
If you look at the picture you'll notice that local minima now get a higher value (indicated by a brighter yellow) because it's a longer
way around an object to get there than in a straight line. The path we follow is indicated by the blue line moving east and then south.
The picture shows wavefronts coming from the edges of the objects and a purple roadmap exactly in the middle between several
objects.
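One common way to build such a field so that it has no local minima at all is to grow it outward
from the goal as a wavefront, numbering each cell with its step count; a sketch, assuming grid
cells are pre-marked cObstacle or cUnvisited:

#include <queue>
#include <utility>
#include <vector>

const int cObstacle = -2;    //Impassable cell.
const int cUnvisited = -1;   //Not yet reached by the wavefront.

//Number every reachable cell with its step count from the goal. Following
//strictly decreasing values from any cell then leads to the goal without
//ever getting stuck in a local minimum.
void wavefront(std::vector<std::vector<int> >& grid, int goalX, int goalY)
{
   std::queue<std::pair<int, int> > open;
   grid[goalY][goalX] = 0;
   open.push(std::make_pair(goalX, goalY));
   const int dx[4] = { 1, -1, 0, 0 };
   const int dy[4] = { 0, 0, 1, -1 };
   while (!open.empty())
   {
      int x = open.front().first, y = open.front().second;
      open.pop();
      for (int d = 0; d < 4; d++)
      {
         int nx = x + dx[d], ny = y + dy[d];
         if (ny < 0 || ny >= (int)grid.size() ||
             nx < 0 || nx >= (int)grid[ny].size())
            continue;
         if (grid[ny][nx] != cUnvisited)
            continue;                   //An obstacle or already numbered.
         grid[ny][nx] = grid[y][x] + 1; //One step further from the goal.
         open.push(std::make_pair(nx, ny));
      }
   }
}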
Conclusion
This concludes our voyage using potential fields. The techniques described above are an interesting alternative to the A* algorithm
that seems to be very popular these days. Though potential fields require some serious calculations to be applied, their characteristic
of defining a whole grid in numbers can be an asset if you need quickly changing destination goals, because all you need to do is
increase or decrease the numbers in a certain region to make it more attractive (or not) for the unit to follow that path.
Pawn Captures Wyvern: How Computer Chess Can Improve Your Pathfinding
by Mark Brockington
Gamasutra, June 26, 2000
Editor's note: This paper was originally published in the 2000 Game Developer's Conference proceedings.
1. Introduction
Most of you with Computer Science training have probably been through the typical Artificial
Intelligence lecture on search and planning. You are shown A*, with some trivial example (so your
professor doesn't get lost while doing it) which shows all of the various parts of A*. You've also
sat through the proof of why A* generates an optimal solution when it has an admissible heuristic.
If you're really lucky, you get to implement A* in Lisp or Prolog in an assignment, and solve a
puzzle involving sliding tiles.

Jump ahead a few years, and you've been given the task of implementing a pathfinding algorithm for
your game. You sift through your notes, trying to remember why you need a CLOSED list, and how to
translate all the car() and cdr() instructions from Lisp into something that your lead programmer
won't bring up during your next performance review. You study web sites on AI and pathfinding, try
a few enhancements, and eventually come up with a solution that behaves in a very similar manner to
the A* algorithm from your notes.
In an alternate universe, there are academics and hobbyists that concentrate on computer games of
thought, such as chess, checkers and Othello. There are regular tournaments between programs, and
the two main ways to outplay your opponent and win the game involve outsearching your opponent
and having a smarter (but still computationally fast) evaluation of positions. I have heard each of
these statements while chatting with other Othello programmers during tournaments. Do these
statements sound like anything you've heard a programmer in your company mention?
● "I don't trust C++ to generate fast code, so I'm still using ANSI C."
● "I coded the inner loop in assembly. It took me two months of work, but it speeds up the
program by 10%, so it was worth it."
Letters to the Editor:
● "I've had about eight hours of sleep in 72 hours, but I've improved the performance."
Write a letter
View all letters
Computer chess programmers have been dealing with a search algorithm (cryptically called ab, short
for alpha-beta) for the last 25 years. They have a library of standard enhancements that they can
use to enhance ab and improve the performance of their program without having to resort to learning
MIPS processor machine language, or trying to acquire knowledge about what sort of positions their
program handles poorly.

Academics involved in the field often quoted the desire to beat the World Chess Champion in a game
of chess to get their research funding. However, IBM and Deep Blue brought the funding train to a
screeching halt. Most have moved onto games that are significantly harder for the computer to do
well at, such as Poker, Bridge and Go. However, others realized that A* search really is not all
that different from ab.
When we cast aside the superficial differences between the two algorithms, we quickly discover A* and
ab are actually remarkably similar, and we can use the standard search enhancements from the typical
computer chess program in your pathfinding algorithm. We will be describing the subset of the
computer chess based search enhancements that we use in our pathfinding code at BioWare.
Section 2 will quickly review the standard A* algorithm (so you do not have to dig out your AI lecture
notes again). Section 3 will discuss the anatomy of a computer chess search algorithm, and Section 4
shows you how to put the search enhancements into A*.
2. A Really Brief Review of A*
Some of the performance information referenced in this paper refers to the sliding-tile puzzle instead
of pathfinding, since this has been the most popular test in academic circles for studying A*. An
example of the sliding-tile puzzle can be found in Figure 1. In the sliding-tile puzzle, the Manhattan
distance (the sum of the vertical and horizontal displacements of each tile from its current square to
its goal square) is an admissible and effective heuristic for use in A* search.
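As a sketch, the heuristic is just a couple of absolute differences per tile:

#include <cstdlib>

//Sum of each tile's horizontal and vertical displacement from its goal
//square; tiles[i] is the tile on square i of a width*width board and 0 is
//the blank, which isn't counted.
int manhattanDistance(const int* tiles, int width)
{
   int h = 0;
   for (int sq = 0; sq < width*width; sq++)
   {
      int tile = tiles[sq];
      if (tile == 0)
         continue;                              //Skip the blank.
      int goal = tile - 1;                      //Tile t belongs on square t-1.
      h += std::abs(sq % width - goal % width); //Horizontal displacement.
      h += std::abs(sq / width - goal / width); //Vertical displacement.
   }
   return h;
}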
3. The Anatomy Of A Chess Program
Now that we have quickly reviewed A*, let us deal with a computer chess search algorithm.
Games such as chess, checkers and Othello belong to a broad group of games called two-player
zero-sum games with perfect information. Zero-sum implies that if one player wins, the other player
loses. Perfect information implies the entire state of the game is known at any time. Scrabble has
hidden tiles, and is defined as a game of imperfect information.
Two-player zero-sum games with perfect information are well known to game theoreticians [von
Neumann 1944]. In any position for a game in this category, an optimal move can be determined. An
optimal move can be determined via the minimax algorithm which, for a game like chess, contains a
matrix that has been estimated to contain more entries than there are molecules on our entire planet! However, all hope is
not lost, since there are alternative formulations of the minimax algorithm that involve searching a
game tree.
3.1 Game Trees and Minimax Search
The root of a game tree represents the current state of the game. Each node in the tree can have any
number of child nodes. Each child of a node represents a new state after a legal move from the
node's state. This continues until we reach a leaf, a node with no child nodes, in the game tree. We
assign a payoff vector to each leaf in the game tree. In a generalized game tree, this payoff vector
represents the utility of the final position to both players. In general, winning a game represents
a positive utility for a player, while losing a game represents a negative utility. Since the game
is a two-player zero-sum game, the utility for the first player must equal the negative of the
utility for the second player. The utility for the side to move at the root of the tree is usually
the only one given, to save space.
In the game tree of Figure 2, the player with the crosses moves at the root, while the opponent, O, moves at the first level below the root. A position is normally
categorized by how many levels down in the game tree it is located. The common term for this is ply.
The root is said to be at ply 0, while the immediate successors of the root are said to be at ply 1, et
cetera.
Naughts and Crosses, like chess and checkers, has only three possible outcomes for a player: win, loss
or draw. Normally, we assign the payoff of +1, 0 and -1 to a win, draw or loss for the player to move,
respectively. These payoffs are given in Figure 2 at the bottom of each leaf position, with respect to
the player with the crosses.
Once the root of the game tree has been assigned a minimax value, a best move for Max is defined as
a move which leads to the same minimax value as the root of the tree. We can trace down the tree,
always choosing moves that lead to the same minimax value. This path of moves gives us an optimal
line of play for either player, and is known as a principal
variation. Note that in our game of Naughts and Crosses, the side playing the Crosses will draw the
game, but only if an X is played in the lower central square. Playing to either square in the top row can
lead to a loss for the Crosses, if the opponent plays the best move.
To compute the minimax value of a position, we can use any algorithm that searches the whole game
tree. A depth-first search uses less memory than a best-first or breadth-first tree search algorithm, so
it is preferred in current game-tree search programs. In Listing 1, we show two C functions which form the basis of a recursive depth-first search of a game tree. By calling Maximize() with a position p, we will get the minimax value of position p as the output of the function after the entire game tree has been searched.
In Listing 1, we have left out some of the details. For example, we have not defined what a position is,
since this is game-dependent. There are three additional functions that would be required to
implement the minimax search: (1) EndOfGame, which determines whether the game is over at the
input position, returning TRUE if the game is over; (2) GameValue, which accepts a position as a
parameter, determines who has won the game, and returns the payoff with respect to the player Max;
and (3) GenerateSuccessors, which generates an array of successor positions (p.succ[]) from the input position and returns the number of successors to the calling procedure.
Note that Maximize() and Minimize() recursively call one another until a position is reached where the
EndOfGame() function returns TRUE. As each successor of a node is explored, gamma maintains the
current assessment of the position, based on all of the moves that have been searched so far. Once all
successors have been examined, the minimax value for that position has been computed and stored in
gamma, which can be returned to a higher level within the tree (please refer to Listing 1).
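Listing 1 itself does not appear in this reprint, so here is a minimal C++ sketch of what the two mutually recursive functions might look like, assuming the three game-dependent helpers above and a hypothetical Position type with a fixed-size successor array:

#include <algorithm>
#include <climits>

const int MAX_SUCCESSORS = 64;          // hypothetical upper bound

struct Position {
    Position *succ[MAX_SUCCESSORS];     // filled in by GenerateSuccessors()
    // ... game-dependent state goes here ...
};

bool EndOfGame(const Position &p);      // TRUE if the game is over at p
int  GameValue(const Position &p);      // payoff with respect to Max
int  GenerateSuccessors(Position &p);   // fills p.succ[], returns the count

int Minimize(Position &p);              // forward declaration (mutual recursion)

int Maximize(Position &p)
{
    if (EndOfGame(p))
        return GameValue(p);
    int n = GenerateSuccessors(p);
    int gamma = INT_MIN;                // best score for Max seen so far
    for (int i = 0; i < n; i++)
        gamma = std::max(gamma, Minimize(*p.succ[i]));
    return gamma;
}

int Minimize(Position &p)
{
    if (EndOfGame(p))
        return GameValue(p);
    int n = GenerateSuccessors(p);
    int gamma = INT_MAX;                // worst score for Max seen so far
    for (int i = 0; i < n; i++)
        gamma = std::min(gamma, Maximize(*p.succ[i]));
    return gamma;
}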
The minimax algorithm can also determine which move yields the score gamma, and return that up
the tree as well. However, there is only one place we are interested in the move choice: the root of the
game tree. We could write a special version of Maximize that returns a best move and the minimax
value.
This formulation requires exactly the same amount of work as the matrix formulation did, but further pruning can be done on this tree. The alpha-beta algorithm [Knuth 1975] improves on the typical minimax search by pruning subtrees that cannot affect the minimax value of the root. However, it would still take a modern computer millions of years to evaluate the full game tree for the game of chess if one had to go all the way to the terminal nodes. How can we control the size of the search?
________________________________________________________
3.2 Iterative Deepening
The idea is that the algorithm should be limited to exploring a small search depth k by forcing evaluations of nodes once they reach that depth. Once that search is done, the limit k can be moved forward by a step s, and the search can be repeated to a depth of k+s. In chess programs, k and s usually equal 1. Thus, the program does a 1-ply search before doing a 2-ply search, which occurs before the 3-ply search, et cetera.
Scott noted that there is no way of predicting how long an alpha-beta search will take, since it depends heavily on the move ordering. However, by using iterative deepening, one can estimate how long a (k+1)-ply search will take, based on the length of the preceding k-ply search. Unfortunately, the prediction may be far off the accurate value. In some cases, a real-time constraint (such as a time control in a chess game) may necessitate aborting the current search. Without iterative deepening, if a program has not finished a search when the time constraint interrupts it, the program may play a catastrophic move. With iterative deepening, we can use the best move from the deepest search that was completed.
Other benefits were explored by Slate and Atkin in their Chess 4.5 program [Slate 1977]. They discovered that there were many statistics that could be gathered from a search iteration, including the principal variation. The principal variation of a k-ply search is a good starting place to look for a principal variation of a (k+1)-ply search, so the principal variation from the k-ply search is searched first at depth (k+1). This improves the ordering of the moves in the (k+1)-ply search. Usually, the number of bottom positions explored for all of the searches up to depth d with iterative deepening is significantly smaller than attempting a d-ply search without iterative deepening.
3.3 Transposition Tables
Specific information about a search can be saved in a transposition table [Greenblatt 1967].
In the minimax algorithm given in Listing 1, all of the information about a node can be accumulated, including the best score, the best move from that position, and the depth to which it was searched. All of this information is commonly stored in one transposition table entry. Transposition tables are normally constructed as closed hash tables, with hashing functions that are easy to update (such as a number of XOR operations) as one traverses the tree. The transposition table information can be used in two main ways: duplicate detection and move ordering.
Why would we need to detect duplicates in a game tree? In reality, the game tree is a graph; some of
the positions appear in multiple places within the tree. Thus, it makes sense that each position should
only be explored once if the information obtained is sufficient to terminate the search. The
transposition table assists in finding and eliminating these duplicated positions.
The same position in the game will always hash to the same location in the transposition table. What if the information stored in the table is for the same position as the current node, and the stored result comes from a search at least as deep as the one we are attempting to execute? If we have an exact minimax value in the hash table from a search at least as deep as the one to be executed, we can simply return the stored value instead of searching the subtree again.
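As a concrete illustration (not the author's implementation), a closed-hash transposition table entry and probe might look like the following C++ sketch, assuming 64-bit XOR-updated keys; all names are assumptions:

#include <cstdint>

struct TTEntry {
    uint64_t key;       // full hash key, kept to detect collisions
    int      score;     // minimax value found for this position
    int      bestMove;  // best move found from this position
    int      depth;     // depth to which the position was searched
};

const int TT_SIZE = 1 << 20;            // power of two for cheap masking
TTEntry table[TT_SIZE];

// Returns the entry if this exact position was already searched at least
// as deeply as requested; the caller can then reuse the stored score and
// skip re-searching the subtree.
TTEntry *Probe(uint64_t key, int depth)
{
    TTEntry *e = &table[key & (TT_SIZE - 1)];
    if (e->key == key && e->depth >= depth)
        return e;
    return 0;   // not usable; search normally and store the new result
}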
3.4 The History Heuristic
The transposition table only offers move ordering information about a single move in the move list. The history heuristic [Schaeffer 1989] is a useful technique for sorting all other moves. In the game of
chess, a 64 by 64 matrix is used to store statistics. Each time a move from a square startsq to a
square endsq is chosen as a best move during the search, a bonus is stored in the matrix at the
location [startsq,endsq]. The size of this bonus depends on the depth at which the move was successful. A bonus that varies exponentially with the depth of the subtree under that position has been found to work well in practice. Moves with higher history values are more likely to be best moves at other points in the tree; thus, moves are sorted based on their current history values. This provides a dynamic ordering for all possible legal moves in cases where no other ordering information exists.
In the programs that the author is aware of, both move ordering techniques are used. The
transposition table move is always used first, since it yields specific information about that node from a
previous search. Once the transposition table move has been searched, the remaining moves are
sorted by the history heuristic.
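For illustration, the statistics matrix might be maintained as in this short sketch; the exact bonus formula is an assumption, since the text says only that an exponential function of depth works well:

long historyTable[64][64];   // file-scope, so zero-initialized

// Called whenever a move (startsq -> endsq) is chosen as best at a node
// whose subtree was searched to 'depth'.
void RecordBestMove(int startsq, int endsq, int depth)
{
    historyTable[startsq][endsq] += 1L << depth;   // exponential in depth
}

// Higher history scores sort earlier when ordering the remaining moves.
long HistoryScore(int startsq, int endsq)
{
    return historyTable[startsq][endsq];
}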
3.5 Another Method of Searching a Game Tree - SSS*
Stockman [1979] introduced the SSS* algorithm, a best-first alternative to the depth-first algorithms for determining the minimax value. Initially, it was believed that the algorithm dominated alpha-beta, in the sense that SSS* will not search a node if alpha-beta did not search it. A problem with SSS* is that a list structure (the OPEN list) must be maintained, which could grow to b^(d/2) elements, where b is the branching factor and d is the depth of the tree to be searched. At the time, this space requirement was considered to be too large for a practical chess-playing program. Even if the space requirement was not a problem, maintaining the OPEN list slowed the algorithm down enough to make it slower than alpha-beta in practice.
Although versions of SSS* eventually managed to become faster than alpha-beta for game trees [Reinefeld 1994a], it has been more recently discovered that SSS* can be implemented as a series of null-window alpha-beta calls, using a transposition table instead of an OPEN list [Plaat 1996]. This research showed that the classic drawbacks of SSS* no longer hold. However, it is also important to note that the benefits disappear as well: SSS* is not necessarily better than alpha-beta when dynamic move reordering is considered. When all of the typical enhancements are used, SSS* can be outperformed by NegaScout and MTD(f).
In game-tree search, a depth-first search algorithm generates results faster than a best-first search algorithm, and A* is itself a best-first search algorithm. Is there a single-agent search algorithm better than A* that uses a depth-first iterative-deepening formulation?
________________________________________________________
4.0 Reimplementing A*
4.1 IDA*
IDA* is a little slower, but what do you gain? Well, have you seen any mention of sorting OPEN positions on a list, or inserting entries into the CLOSED list? When you use a depth-first iterative-deepening approach, you don't have to store either list. IDA* uses O(d) memory, whereas A* uses O(b^d) memory. This makes IDA* a good choice where memory is at a premium. Also note that because you have very little state information during a search, IDA* is very easy to save and restore if the AI time slice is up.
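To make the contrast concrete, here is a minimal IDA* sketch in C++. The Node type and the helpers IsGoal(), Heuristic(), and Successors() are hypothetical stand-ins for game-specific code; note that the only storage consumed is the recursion stack:

#include <limits>
#include <utility>
#include <vector>

struct Node { /* game-dependent state, e.g. a map position */ };

bool IsGoal(const Node &n);                    // assumed game-specific
int  Heuristic(const Node &n);                 // admissible estimate to goal
std::vector<std::pair<Node, int> > Successors(const Node &n);  // (node, edge cost)

const int FOUND = -1;   // sentinel: legal f-values are never negative

// Depth-first search bounded by an f-cost threshold. Returns FOUND on
// success, otherwise the smallest f-value that exceeded the bound.
int Search(const Node &n, int g, int bound)
{
    int f = g + Heuristic(n);
    if (f > bound) return f;
    if (IsGoal(n)) return FOUND;
    int next = std::numeric_limits<int>::max();
    std::vector<std::pair<Node, int> > succ = Successors(n);
    for (size_t i = 0; i < succ.size(); i++) {
        int t = Search(succ[i].first, g + succ[i].second, bound);
        if (t == FOUND) return FOUND;
        if (t < next) next = t;
    }
    return next;
}

bool IDAStar(const Node &root)
{
    int bound = Heuristic(root);
    for (;;) {                                 // iterative deepening on f
        int t = Search(root, 0, bound);
        if (t == FOUND) return true;
        if (t == std::numeric_limits<int>::max()) return false;  // no path
        bound = t;                             // raise threshold and retry
    }
}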
4.2 Transposition Tables
If you are using IDA*, you have lost the CLOSED list. Unlike the OPEN list, the CLOSED list serves functions that we still need. The primary function of the CLOSED list in A* is the ability to detect duplicate positions within the tree. If the same node is reached by two separate paths, IDA* will blindly search through the node both times. When the first path to the node is shorter than the second path, we have wasted search effort. We would like a technique that allows us to detect duplicates and store information about previously searched nodes, and a transposition table serves exactly this purpose.
________________________________________________________
We have implemented the techniques described above, and we are currently using them to plot paths for our creatures in our soon-to-be-released role-playing game, Neverwinter Nights.
There are many caveats to using these techniques, and it is important to understand their drawbacks. The speed improvements that these techniques yield will vary depending on your application (they vary dramatically when implementing them in chess, Othello and checkers programs!), but you now have some new enhancements that can help you search more efficiently.
To summarize the utility of adding standard enhancements to search algorithms, let us examine
another problem: finding push-optimal solutions for Sokoban problems. If you have never seen the
game Sokoban, a picture of one of the 90 positions is given in Figure 4. The goal is for the little worker
to push all of the round stones into the goal squares (the goal squares are shaded with diagonal lines).
On the surface, this may seem as easy as pathfinding, and an easy application for A*. However, all
pathfinding "mistakes" are undoable by retracing the path. One wrong push of a stone could leave you
in a state where you are unable to complete the task. Thus, the need to plan the path of all stones to
the goal squares is paramount.
IDA* alone is incapable of solving any of the puzzles within a limit of 20 million searched nodes. If we enhance IDA* with the transposition table and the move ordering techniques, 4 of the puzzles can be solved [Junghanns 1997]. If we search one billion nodes, only 6 of the 90 puzzles can be solved using IDA*,
transposition tables and move ordering. If we use all of the domain-dependent techniques the
researchers developed (including deadlock tables, tunnel macros, goal macros, goal cuts, pattern
search, relevance cuts and overestimation), the program Rolling Stone can solve 52 of the 90
problems within the billion node limit for each puzzle [Junghanns 1999]. Pathfinding is a relatively
trivial problem in comparison to finding push-optimal solutions for Sokoban puzzles, and I am happy to
say my bosses at BioWare haven't asked me to solve Sokoban in real time.
There's a lot of very good academic information on single-agent search (including a special issue of the
journal Artificial Intelligence later this year which will be devoted to the topic), and I would encourage
everyone to look up some of these references. If you have any further questions on any of the
reference material, please feel free to e-mail me.
Mark Brockington is the lead research scientist at BioWare Corp. His email address is markb@bioware.com.
References
[Breuker 1996] D. M. Breuker, J. W. H. M. Uiterwijk, and H. J. van den Herik. Replacement Schemes
and Two-Level Tables. ICCA Journal, 19(3):175-179, 1996.
[Greenblatt 1967] R. D. Greenblatt, D. E. Eastlake, and S.D. Crocker. The Greenblatt Chess Program.
In Proceedings of the Fall Joint Computer Conference, volume 31, pages 801-810, 1967.
[Hart 1968] P. E. Hart, N. J. Nilsson, and B. Raphael. A Formal Basis for the Heuristic Determination of
Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics, SSC-4(2):100-107,
1968.
[Junghanns 1997] A. Junghanns and J. Schaeffer. Sokoban: A Challenging Single-Agent Search
Problem, Workshop Using Games as an Experimental Testbed for AI Research, IJCAI-97, Nagoya,
Japan, August 1997.
[Junghanns 1999] A. Junghanns. Pushing the Limits: New Developments in Single-Agent Search, PhD
Thesis, Department of Computing Science, University of Alberta, 1999. URL:
http://www.cs.ualberta.ca/~games/Sokoban/papers.html
[Knuth 1975] D. E. Knuth and R. W. Moore. An Analysis of Alpha-Beta Pruning. Artificial Intelligence,
6(3):293-326, 1975.
[Korf 1985] R. E. Korf. Depth-First Iterative Deepening: An Optimal Admissible Tree Search. Artificial
Intelligence, 27:97-109, 1985.
[Nilsson 1971] N. J. Nilsson. Problem-Solving Methods in Artificial Intelligence. McGraw-Hill Book
Company, New York, NY, 1971.
[Plaat 1996] A. Plaat, J. Schaeffer, W. Pijls, and A. de Bruin. Exploiting Graph Properties of Game
Trees. In AAAI-1996, volume 1, pages 234-239, Portland, Oregon, August 1996.
[Reinefeld 1983] A. Reinefeld. An Improvement to the Scout Tree-Search Algorithm. ICCA Journal,
6(4):4-14, 1983.
[Reinefeld 1994a] A. Reinefeld. A Minimax Algorithm Faster than Alpha-Beta. In H.J. van den Herik,
I.S. Herschberg and J.W.H.M. Uiterwijk, editors, Advances In Computer Chess 7, pages 237-250.
University of Limburg, 1994.
[Schaeffer 1989] J. Schaeffer. The History Heuristic and Alpha-Beta Search Enhancements in Practice. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(11):1203-1212, 1989.
[Slate 1977] D. J. Slate and L. R. Atkin. CHESS 4.5 - The Northwestern University Chess Program. In P. W. Frey, editor, Chess Skill in Man and Machine, pages 82-118. Springer-Verlag, 1977.
[Stockman 1979] G. C. Stockman. A Minimax Algorithm Better than Alpha-Beta? Artificial Intelligence,
12:179-196, 1979.
[Stout 1996] W. B. Stout. Smart Moves: Intelligent Path-Finding. Game Developer, pp. 28-35,
Oct./Nov. 1996.
[von Neumann 1944] J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior.
Princeton University Press, Princeton, NJ, 1944.
________________________________________________________
Smart Moves: Intelligent Path-Finding
By W. Bryan Stout
Published in Game Developer Magazine, October 1996; reprinted on Gamasutra, February 12, 1999.
Of all the decisions involved in computer-game AI, the most common is probably path-finding: looking for a good route for moving an entity from here to there. The entity can be a single person, a vehicle, or a combat unit; the genre can be an action game, a simulator, a role-playing game, or a strategy game. But any game in which the computer is responsible for moving things around has to solve the path-finding problem.
And this is not a trivial problem. Questions about path-finding are regularly seen in online game programming forums, and the entities in several games move in less than intelligent paths. However, although path-finding is not trivial, there are some well-established, solid algorithms that deserve to be known better in the game community.
Several path-finding algorithms are not very efficient, but studying them serves us by introducing concepts incrementally. We can then understand how different shortcomings are overcome.
To demonstrate the workings of the algorithms visually, I have developed a program in Delphi 2.0 called "PathDemo." It is available for readers to download.
The article and demo assume that the playing space is represented with square tiles. You can adapt the concepts in the algorithms to other tilings, such as hexagons; ideas for adapting them to continuous spaces are discussed at the end of the article.
Path-Finding on the Move
The typical problem in path-finding is obstacle avoidance. The simplest approach to the problem is to ignore the obstacles until one bumps into them. The algorithm would look something like this:

    while not at the goal
        pick a direction to move toward the goal
        if that direction is clear for movement
            move there
        else
            pick another direction according to an avoidance strategy
This approach is simple because it makes few demands: all that needs to be known are the relative
positions of the entity and its goal, and whether the immediate vicinity is blocked. For many game
situations, this is good enough.
Fortunately, there are other ways to get around:
● Tracing around the obstacle. If the obstacle is large, one can do the equivalent of placing a hand against the wall and following the outline of the obstacle until it is skirted. Figure 2A shows how well this can deal with large obstacles. The problem with this technique comes in deciding when to stop tracing. A typical heuristic may be: "Stop tracing when you are heading in the direction you wanted to go when you started tracing." This would work in many situations, but Figure 2B shows how one may end up constantly circling around without finding the way out.
● Robust tracing. A more robust heuristic comes from work on mobile robots: "When blocked, calculate the equation of the line from your current position to the goal. Trace until that line is again crossed. Abort if you end up at the starting position again." This method is guaranteed to find a way around the obstacle if there is one, as is seen in Figure 3A. (If the original point of blockage is between you and the goal when you cross the line, be sure not to stop tracing, or more circling will result.) Figure 3B shows the downside of this approach: it will often take more time tracing the obstacle than is needed, making it look pretty simple-minded, though not as simple as endless circling. A happy compromise would be to combine both approaches: always use the simpler heuristic for stopping the tracing first, but if circling is detected, switch to the robust heuristic.
Looking Before You Leap
Although the obstacle-skirting techniques discussed above can often do a passable or even adequate job, there are situations where the only intelligent approach is to plan the entire route before the first step is taken. In addition, these methods do little to handle the problem of weighted regions, where the difficulty is not so much avoiding obstacles as finding the cheapest path among several choices where the terrain can vary in its cost.
Fortunately, the fields of graph theory and conventional AI have several algorithms that can be used to handle both difficult obstacles and weighted regions. In the literature, many of these algorithms are presented in terms of changing between states, or traversing the nodes of a graph. They are often used in solving a variety of problems, including puzzles like the 15-puzzle or Rubik's Cube, where a state is an arrangement of the tiles or cubes, and neighboring states (or adjacent nodes) are visited by sliding one tile or rotating one cube face. Applying these algorithms to path-finding in geometric space requires a simple adaptation: a state or a graph node stands for the entity being in a particular tile, and moving to adjacent tiles corresponds to moving to the neighboring states, or adjacent nodes.
Working from the simplest algorithms to the more robust, we have:
● Breadth-first search. Beginning at the start node, this algorithm first examines all immediate neighboring nodes, then all nodes two steps away, then three, and so on, until a goal node is found. Typically, each node's unexamined neighboring nodes are pushed onto an Open list, which is usually a FIFO (first-in, first-out) queue. The algorithm would go something like what is shown in Listing 1. Figure 4 shows how the search proceeds. We can see that it does find its way around obstacles, and in fact it is guaranteed to find a shortest path (that is, one of several paths that tie for the shortest in length) if all steps have the same cost. There are a couple of obvious problems. One is that it fans out in all directions equally, instead of directing its search towards the goal; the other is that all steps are not equal: at least the diagonal steps should be longer than the orthogonal ones.
● Bidirectional breadth-first search. This enhances the simple breadth-first search by starting two simultaneous breadth-first searches from the start and the goal nodes and stopping when a node from one end's search finds a neighboring node marked from the other end's search. As seen in Figure 5, this can save substantial work compared to simple breadth-first search (typically by a factor of 2), but it is still quite inefficient. Tricks like this are good to remember, though, since they may come in handy elsewhere.
● Best-first search. This is the first heuristic search considered, meaning that it takes into account domain knowledge to guide its efforts. It is similar to Dijkstra's algorithm, except that instead of the nodes in Open being scored by their distance from the start, they are scored by an estimate of the distance remaining to the goal. Unlike the costs in Dijkstra's algorithm, this estimate never needs updating once computed. Figure 8 shows its performance. It is easily the fastest of the forward-planning searches we have examined so far, heading in the most direct manner to the goal. We also see its weaknesses. In 8A, we see that it does not take into account the accumulated cost of the terrain, plowing straight through a costly area rather than going around it. And in 8B, we see that the path it finds around the obstacle is not direct, but weaves around it in a manner reminiscent of the hand-tracing techniques seen above.
The Star of the Search Algorithms (A* Search)
The best-established algorithm for the general searching of optimal paths is A* (pronounced "A-star"). This heuristic search ranks each node by an estimate of the best route that goes through that node. The typical formula is expressed as:

    f(n) = g(n) + h(n)

where f(n) is the score assigned to node n, g(n) is the actual cheapest cost of arriving at n from the start, and h(n) is the heuristic estimate of the cost of getting to the goal from n.
So it combines the tracking of the previous path length of Dijkstra's algorithm with the heuristic estimate of the remaining path from best-first search. The algorithm proper is seen in Listing 3. Since some nodes may be processed more than once (from finding better paths to them later), we use a new list called Closed to keep track of them.
A* has a couple of interesting properties. It is guaranteed to find the shortest path, as long as the heuristic estimate h(n) is admissible; that is, it is never greater than the true remaining distance to the goal. It also makes the most efficient use of the heuristic function: no search that uses the same heuristic function h(n) and finds optimal paths will expand fewer nodes than A*, not counting tie-breaking among nodes of equal cost. In Figures 9A through 9C, we see how A* deals with situations that gave problems to other search algorithms.
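Since Listing 3 is not reproduced here, the following compact C++ sketch shows the essence of A* on a 4-connected grid of uniform cost; the grid representation, the Manhattan heuristic, and all names are illustrative choices rather than the article's code:

#include <cstdlib>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

struct Cell { int x, y; };

// Admissible heuristic for a 4-connected grid: Manhattan distance.
int Heuristic(Cell a, Cell b)
{
    return std::abs(a.x - b.x) + std::abs(a.y - b.y);
}

// Returns the cost of a cheapest path from start to goal, or -1 if none.
int AStar(const std::vector<std::vector<bool> > &blocked, Cell start, Cell goal)
{
    const int h = (int)blocked.size(), w = (int)blocked[0].size();
    std::vector<std::vector<int> > g(h, std::vector<int>(w, -1));
    // Open list: entries are (f = g + h, x, y); lowest f is popped first.
    typedef std::tuple<int, int, int> Entry;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > open;
    g[start.y][start.x] = 0;
    open.push(Entry(Heuristic(start, goal), start.x, start.y));
    const int dx[4] = { 1, -1, 0, 0 }, dy[4] = { 0, 0, 1, -1 };
    while (!open.empty()) {
        Entry e = open.top(); open.pop();
        int x = std::get<1>(e), y = std::get<2>(e);
        if (x == goal.x && y == goal.y)
            return g[y][x];                     // goal reached
        for (int i = 0; i < 4; i++) {
            int nx = x + dx[i], ny = y + dy[i];
            if (nx < 0 || ny < 0 || nx >= w || ny >= h || blocked[ny][nx])
                continue;
            int ng = g[y][x] + 1;               // uniform step cost
            if (g[ny][nx] == -1 || ng < g[ny][nx]) {  // new or cheaper path
                g[ny][nx] = ng;
                open.push(Entry(ng + Heuristic(Cell{ nx, ny }, goal), nx, ny));
            }
        }
    }
    return -1;                                  // Open exhausted: unreachable
}

Here the g matrix doubles as the Closed bookkeeping: a node whose recorded cost cannot be improved is never re-expanded, which plays the role the article assigns to the Closed list.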
How Do I Use A*?
A* turns out to be very flexible in practice. Consider the different parts of the algorithm.
The state would often be the tile or position the entity occupies. But if needed, it can represent orientation and velocity as well (for example, for finding a path for a tank or most any vehicle; their turn radius gets worse the faster they go).
Neighboring states would vary depending on the game and the local situation. Adjacent positions may be excluded because they are impassable or are between the neighbors. Some terrain can be passable for certain units but not for others; units that cannot turn quickly cannot go to all neighboring tiles.
The cost of going from one position to another can represent many things: the simple distance between the positions; the cost in time or movement points or fuel between them; penalties for traveling through undesirable places (such as points within range of enemy artillery); bonuses for traveling through desirable places (such as exploring new terrain or imposing control over uncontrolled locations); and aesthetic considerations (for example, if diagonal moves are just as cheap as orthogonal moves, you may still want to make them cost more, so that the routes chosen look more direct and natural).
The estimate is usually the minimum distance between the current node and the goal multiplied by the minimum cost between nodes. This guarantees that h(n) is admissible. (In a map of square tiles where units may only occupy points in the grid, the minimum distance would not be the Euclidean distance, but the minimum number of orthogonal and diagonal moves between the two points.)
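As a small concrete version of the estimate just described, assuming diagonal and orthogonal moves cost the same and a hypothetical MIN_STEP_COST constant:

#include <algorithm>
#include <cstdlib>

const int MIN_STEP_COST = 1;   // cheapest possible per-move cost

// Minimum number of orthogonal and diagonal moves between two grid
// points: a diagonal step advances both axes at once, so max(dx, dy)
// moves suffice. Scaling by the minimum step cost keeps h(n) admissible.
int EstimateToGoal(int x, int y, int goalX, int goalY)
{
    int dx = std::abs(goalX - x), dy = std::abs(goalY - y);
    return MIN_STEP_COST * std::max(dx, dy);
}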
The goal does not have to be a single location but can consist of multiple locations. The estimate for a
node would then be the minimum of the estimate for all possible goals.
Search cutoffs can be included easily, to cover limits in path cost, path distance, or both.
From my own direct experience, I have seen the A* search work very well for finding a variety of types of paths in wargames and strategy games.
The Limitations of A*
There are situations where A* may not perform very well, for a variety of reasons. The more or less
real-time requirements of games, plus the limitations of the available memory and processor time in
some of them, may make it hard even for A* to work well. A large map may require thousands of
entries in the Open and Closed list, and there may not be room enough for that. Even if there is
enough memory for them, the algorithms used for manipulating them may be inefficient.
The quality of A*'s search depends on the quality of the heuristic estimate h(n). If h is very close to the true cost of the remaining path, its efficiency will be high; on the other hand, if it is too low, its efficiency drops sharply. In fact, breadth-first search is an A* search with h trivially zero for all nodes; this certainly underestimates the remaining path cost, and while it will find the optimum path, it will do so slowly. In Figure 10A, we see that while searching in expensive terrain (shaded area), the frontier of nodes searched looks similar to Dijkstra's algorithm; in 10B, with the heuristic increased, the search is more focused.
Let's look at ways to make the A* search more efficient in problem areas.
Transforming the Search Space
Perhaps the most important improvement one can make is to restructure the problem to be solved, making it an easier problem. Macro-operators are sequences of steps that belong together and can be combined into a single step, making the search take bigger steps at a time. For example, airplanes take a series of steps in order to change their orientation and altitude. A common sequence may be used as a single change-of-state operator, rather than using the smaller steps individually. In addition, search and general problem-solving methods can be greatly simplified if they are reduced to sub-problems whose individual solutions are fairly simple. In the case of path-finding, a map can be broken down into large contiguous areas whose connectivity is known. One or two border tiles between each pair of adjacent areas are chosen; then the route is first laid out by a search among adjacent areas, in each of which a route is found from one border point to another.
For example, in a strategic map of Europe, a path-finder searching for a land route from Madrid to Athens would probably waste a fair amount of time looking down the boot of Italy. Using countries as areas, a hierarchical search would first determine that the route would go from Spain to France to Italy to Yugoslavia (looking at an old map) to Greece; then the route through Italy would only need to connect Italy's border with France to Italy's border with Yugoslavia. As another example, routes from one part of a building to another can be broken down into a path of rooms and hallways to take, and then the paths between doors in each room.
It is much easier to choose areas in predefined maps than to have the computer figure them out for randomly generated maps. Note also that the examples discussed deal mainly with obstacle avoidance; for weighted regions, it is trickier to assign useful regions, especially for the computer (and it may not be very useful, either).
Storing It Better
Even if the A* search is relatively efficient by itself, it can be slowed down by inefficient algorithms handling the data structures. Regarding the search, two major data structures are involved.
The first is the representation of the playing area. Many questions have to be addressed. How will the playing field be represented? Will the areas accessible from each spot, and the costs of moving there, be represented directly in the map, in a separate structure, or calculated when needed? How will features in the area be represented: directly in the map, or in separate structures? How can the search algorithm access necessary information quickly? There are too many variables concerning the type of game and the hardware and software environment to give much detail about these questions here.
The second major structure involved is the node or state of the search, and this can be dealt with more explicitly. At the lower level is the search state structure. Fields a developer might wish to include in it are:
❍ The location (coordinates) of the map position being considered at this state of the search.
❍ Other relevant attributes of the entity, such as orientation and velocity.
❍ The cost of the best path from the source to this location.
❍ The estimate of the cost to the goal (or closest goal) from this location.
❍ The score of this state, used to pick the next state to pop off Open.
❍ A limit for the length of the search path, or its cost, or both, if applicable.
❍ A reference (pointer or index) to the parent of this node, that is, the node that led to this one.
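Collected into a C++ struct, those fields might look like this (the names are illustrative, not from the original article):

struct SearchNode {
    int x, y;                // map position considered at this state
    float orientation;       // other entity attributes, if relevant
    float speed;
    int costFromStart;       // cost of the best path from the source here
    int estimateToGoal;      // estimate of the cost to the (closest) goal
    int score;               // used to pick the next state to pop off Open
    int costLimit;           // optional cutoff on path cost, if applicable
    SearchNode *parent;      // the node that led to this one
};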
On the higher level are the aggregate data structures: the Open and Closed lists. Although keeping them as separate structures is typical, it is possible to keep them in the same structure, with a flag in the node to show whether it is open or not. The sorts of operations that need to be done on the Closed list are:
❍ Insert a new node.
❍ Remove an arbitrary node.
❍ Search for a node having certain attributes (location, speed, direction).
❍ Clear the list at the end of the search.
The Open list does all these, and in addition will:
❍ Pop the node with the best score.
The Open list can be kept in structures such as heaps or balanced binary search trees. There are several types of binary search trees: 2-3-4 trees, red-black trees, height-balanced trees (AVL trees), and weight-balanced trees. Heaps and balanced search trees have the advantage of logarithmic times for insertion, deletion, and search; however, if the number of nodes is rarely large, they may not be worth the overhead they require.
What If I'm in a Smooth World?
All these search methods have assumed a playing area composed of square or hexagonal tiles. What if the game play area is continuous? What if the positions of both entities and obstacles are stored as floats, and can be as finely determined as the resolution of the screen? Figure 11A shows a sample layout. For answers to these search conditions, we can look at the field of robotics and see what sort of approaches are used for the path-planning of mobile robots. Not surprisingly, many approaches find some way to reduce the continuous space into a few important discrete choices for consideration. After this, they typically use A* to search among them for a desirable path. Ways of quantizing the space include:
❍ Tiles. A simple approach is to slap a tile grid on top of the space. Tiles that contain all or part of an obstacle are labeled as blocked; a fringe of tiles touching the blocked tiles is also labeled as blocked to allow a buffer of movement without collision. This representation is also useful for weighted-regions problems. See Figure 11B.
❍ Points of visibility. For obstacle avoidance problems, you can focus on the critical points, namely those near the vertices of the obstacles (with enough space away from them to avoid collisions), with points being considered connected if they are visible from each other (that is, with no obstacle between them). For any path, the search considers only the critical points as intermediate steps between start and goal. See Figure 11C.
❍ Convex polygons. For obstacle avoidance, the space not occupied by polygonal obstacles can be broken up into convex polygons; the intermediate spots in the search can be the centers of the polygons, or spots on the borders of the polygons. Schemes for decomposing the space include C-Cells (each vertex is connected to the nearest visible vertex; these lines partition the space) and Maximum-Area decomposition (each convex vertex of an obstacle projects the edges forming the vertex to the nearest obstacles or walls; between these two segments and the segment joining to the nearest visible vertex, the shortest is chosen). See Figure 11D. For weighted-regions problems, the space is divided into polygons of homogeneous traversal cost. The points to aim for when crossing boundaries are computed using Snell's Law of Refraction. This approach avoids the irregular paths found by other means.
❍ Potential fields. An approach that does not quantize the space, nor require complete calculation beforehand, is to consider that each obstacle has a repulsive potential field around it, whose strength is inversely proportional to the distance from it; there is also a uniform attractive force to the goal. At close regular time intervals, the sum of the attractive and repulsive vectors is computed, and the entity moves in that direction. A problem with this approach is that it may fall into a local minimum; various ways of moving out of such spots have been devised. (A minimal sketch of the basic step follows this list.)
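As a concrete illustration of that last bullet, here is a minimal C++ sketch of one potential-field step; obstacles are treated as points, and REPULSION is a hypothetical tuning constant, so this is a sketch of the idea rather than a production implementation:

#include <cmath>
#include <vector>

struct Vec2 { float x, y; };

Vec2 PotentialFieldStep(Vec2 pos, Vec2 goal, const std::vector<Vec2> &obstacles)
{
    Vec2 force = { 0.0f, 0.0f };
    // Uniform attractive force toward the goal.
    float gx = goal.x - pos.x, gy = goal.y - pos.y;
    float glen = std::sqrt(gx * gx + gy * gy);
    if (glen > 0.0f) { force.x = gx / glen; force.y = gy / glen; }
    // Repulsion from each obstacle, inversely proportional to distance:
    // magnitude 1/d along the unit vector (dx/d, dy/d) gives dx / d^2.
    const float REPULSION = 4.0f;
    for (size_t i = 0; i < obstacles.size(); i++) {
        float dx = pos.x - obstacles[i].x, dy = pos.y - obstacles[i].y;
        float d2 = dx * dx + dy * dy;
        if (d2 > 0.0f) {
            force.x += REPULSION * dx / d2;
            force.y += REPULSION * dy / d2;
        }
    }
    return force;   // the entity moves a small step along this vector
}

Summing the two vector fields each tick is what makes the method cheap: no precomputation is needed, at the price of the local-minimum problem noted above.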
Bryan Stout has done work in "real" AI for Martin Marietta and in computer games for
MicroProse. He is preparing a book on computer game AI to be published by Morgan
Kaufmann Publishers in 2001. He can be contacted at bstout@mindspring.com.
________________________________________________________
Toward More Realistic Pathfinding
by Marco Pinter
Gamasutra, March 14, 2001
Pathfinding is a core component of most games today. Characters, animals, and vehicles all move in some goal-directed manner, and the program must be able to identify a good path from an origin to a goal, which both avoids obstacles and is the most efficient way of getting to the destination. The best-known algorithm for achieving this is the A* search (pronounced "A star"), and it is typical for a lead programmer on a project simply to say, "We'll use A* for pathfinding." However, AI programmers have found again and again that the basic A* algorithm can be woefully inadequate for achieving the kind of realistic movement they require in their games.
This article focuses on several techniques for achieving more realistic-looking results from pathfinding. Many of the techniques discussed here were used in the development of Activision's upcoming Big Game Hunter 5, which made for startlingly more realistic and visually interesting movement for the various animals in the game. The focal topics presented here include:
● Achieving smooth straight-line movement. Figure 1a shows the result of a standard A* search, which produces an unfortunate "zigzag" effect. This article presents postprocessing solutions for smoothing the path, as shown in Figure 1b.
● Adding smooth turns. Turning in a curved manner, rather than making abrupt changes of direction, is critical to creating realistic movement. Using some basic trigonometry, we can make turns occur smoothly over a turning radius, as shown in Figure 1c. Programmers typically use the standard A* algorithm and then use one of several hacks or cheats to create a smooth turn. Several of these techniques will be described.
● Achieving legal turns. Finally, I will discuss a new formal technique which modifies the A* algorithm so that the turning radius is part of the actual search. This results in guaranteed "legal" turns for the whole path, as shown in Figure 1d.
Dealing with realistic turns is an important and timely AI topic. In the August 2000 issue of Game
Developer ("The Future of Game AI"), author Dave Pottinger states, "So far, no one has proffered a
simple solution for pathing in true 3D while taking into account such things as turn radius and other
movement restrictions," and goes on to describe some of the "fakes" that are commonly done. Also, in
a recent interview on Feedmag.com with Will Wright, creator of The Sims, Wright describes movement
of The Sims' characters: "They might have to turn around and they kind of get cornered -- they
actually have to calculate how quickly they can turn that angle. Then they actually calculate the angle
of displacement from step to step. Most people don't realize how complex this stuff is..."
In addition to the above points, I will also cover some important optimization techniques, as well as
some other path-related topics such as speed restrictions, realistic people movement, and movement
along roads. After presenting the various techniques below, we'll see by the end that there is no true
"best approach," and that the method you choose will depend on the specific nature of your game, its
characters, available CPU cycles and other factors.
Note that in the world of pathfinding, the term "unit" is used to represent any on-screen mobile
element, whether it's a player character, animal, monster, ship, vehicle, infantry unit, and so on. Note
also that while the body of this article presents examples based on tile-based searching, most of the
techniques presented here are equally applicable to other types of world division, such as convex
polygons and 3D navigation meshes.
A Brief Introduction to A*
The A* algorithm is a venerable technique which was originally applied to various mathematical
problems and was adapted to pathfinding during the early years of artificial intelligence research. The
basic algorithm, when applied to a grid-based pathfinding problem, is as follows: Start at the initial
position (node) and place it on the Open list, along with its estimated cost to the destination, which is
determined by a heuristic. The heuristic is often just the geometric distance between two nodes. Then
perform the following loop while the Open list is nonempty:
● Pop the node off the Open list that has the lowest estimated cost to the destination.
● If the node is the destination, we've successfully finished (quit).
● Examine the node's eight neighboring nodes.
● For each neighbor that is not blocked, compute its cost estimate and add it to the Open list if it is new, or update it if a cheaper path to it has been found; then place the examined node on the Closed list.
Hierarchical Pathfinding
Critical to any discussion of efficient pathfinding within a game is the notion of hierarchical maps. To
perform an efficient A* search, it is important that the origin and destination nodes of any particular
search are not too far apart, or the search time will become enormous. I recommend that the distance
between origin and destination be constrained to 40 tiles, and that the total search space be no more
than 60x60 tiles (creating a 10-tile-wide buffer behind both origin and destination, allowing the path to
wrap around large obstacles). If units need to search for more distant destinations, some method of
hierarchical pathfinding should be used.
In the real world, people do not formulate precise path plans which stretch on for miles. Rather, if a
person has some domain knowledge of the intermediate terrain, they will subdivide the path, i.e. "first
get to the highway on-ramp, then travel to the exit for the mall, then drive to the parking lot."
Alternatively, if a person has no domain knowledge, they will create intermediate points as they see
them. For example, if you wanted to eventually reach some point you knew was far to the North, you
would first look North and pick a point you could see, plan a path toward it, and only when you got
there, you would pick your next point.
Within a game program, the techniques for creating a map hierarchy include:
1. Subdivide the line to the destination into midpoints, each of which is then used as a
subdestination. Unfortunately, this always leaves the possibility that a chosen midpoint will be at
an impossible location, which can eliminate the ability to find a valid path (see the "Path Failure"
section later in this article).
2. Preprocess the map into a large number of regions, for example castles, clearings, hills, and so
on. This can be done by an artist/designer, or even automated if maps are random. Then start
by finding a path on the "region map" to get from the current position to the destination region,
and then find a tile-based path on the detailed map to get to the next region. Alternatively, if a
unit has no region knowledge and you want to be completely realistic with its behavior, it can
just choose the next region which lies in the compass direction of its ultimate destination.
(Though again, this can result in path failure.)
Note additionally that the "list" can be implemented as a binary tree (by having two Next node pointers at each element), but we've actually found it to be substantially faster to have a simple (non-priority) list. While this does result in time O(n) for the search for the lowest-cost node at the top of the A* loop (rather than O(log n) for a priority queue), it excels in that all insertions and deletions, of which there are many, are only O(1). Best of all, it eliminates the inner-loop search that checks whether neighboring nodes already exist on the Open or Closed lists, which would otherwise take O(n) time per node.
One simple method of reducing the number of turns is to make the following modification to the A*
algorithm: Add a cost penalty each time a turn is taken. This will favor paths which are the same
distance, but take fewer turns, as shown in Figure 2b. Unfortunately, this simplistic solution is not very
effective, because all turns are still at 45-degree angles, which causes the movement to continue to
look rather unrealistic. In addition, the 45-degree-angle turns often cause paths to be much longer
than they have to be. Finally, this solution may add significantly to the time required to perform the A*
algorithm.
The actual desired path is that shown in Figure 2c, which takes the most direct route, regardless of the angle. In order to achieve this effect, we introduce a simple smoothing algorithm which takes place after the standard A* algorithm has completed its path. The algorithm makes use of a function Walkable(pointA, pointB), which samples points along a line from point A to point B at a certain granularity (typically we use one-fifth of a tile width), checking at each point whether the unit overlaps any neighboring blocked tile. (Using the width of the unit, it checks the four points in a diamond pattern around the unit's center.)
The smoothing algorithm simply checks from waypoint to waypoint along the path, trying to eliminate intermediate waypoints when possible. To achieve the path shown in Figure 2c, the four waypoints shown in red in Figure 2a were eliminated.
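A short C++ sketch of this pass, assuming a game-specific Blocked() collision test (the names and the exact sampling loop are illustrative, not the article's code):

#include <cmath>
#include <vector>

struct Point { float x, y; };

// Game-specific test: does a unit of the given width overlap any blocked
// tile when centered at (x, y)? (Assumed here, not shown.)
bool Blocked(float x, float y, float unitWidth);

// Samples the straight line from a to b at a fixed granularity.
bool Walkable(Point a, Point b, float unitWidth)
{
    const float STEP = 0.2f;                  // one-fifth of a tile width
    float dx = b.x - a.x, dy = b.y - a.y;
    float len = std::sqrt(dx * dx + dy * dy);
    if (len == 0.0f) return true;
    for (float t = 0.0f; t <= len; t += STEP)
        if (Blocked(a.x + dx * t / len, a.y + dy * t / len, unitWidth))
            return false;
    return true;
}

// Drop waypoint i+1 whenever the unit can walk directly from waypoint i
// to waypoint i+2, exactly as described in the text.
void SmoothPath(std::vector<Point> &path, float unitWidth)
{
    size_t i = 0;
    while (i + 2 < path.size()) {
        if (Walkable(path[i], path[i + 2], unitWidth))
            path.erase(path.begin() + i + 1);
        else
            ++i;
    }
}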
Since the standard A* algorithm searches the surrounding eight tiles at every node, there are times
when it returns a path which is impossible, as shown with the green path in Figure 4. In these cases,
the smoothing algorithm presented above will smooth the portions it can (shown in purple), and leave
the "impossible" sections as is.
This simple smoothing algorithm is similar to "line of sight" smoothing, in which all waypoints are
progressively skipped until the last one that can be "seen" from the current position. However, the
algorithm presented here is more accurate, because it adds collision detection based on the width of
the character and also can be used easily in conjunction with the realistic turning methods described in
the next section.
Note that the simple smoothing algorithm presented above, like other simple smoothing methods, is
less effective with large units and with certain configurations of blocking objects. A more sophisticated
smoothing pass will be presented later.
________________________________________________________
Adding Realistic Turns
The next step is to add realistic curved turns for our units, so that they don't appear to change direction abruptly every time they need to turn. A simple solution involves using a spline to smooth the abrupt corners into turns. While this solves some of the aesthetic concerns, it still results in physically very unrealistic movement for most units. For example, it might change an abrupt cornering of a tank into a tight curve, but the curved turn would still be much tighter than the tank could actually perform.
For a better solution, the first thing we need to know is the turning radius for our unit. Turning radius
is a fairly simple concept: if you're in a big parking lot in your car, and turn the wheel to the left as far
as it will go and proceed to drive in a circle, the radius of that circle is your turning radius. The turning
radius of a Volkswagen Beetle will be substantially smaller than that of a big SUV, and the turning
radius of a person will be substantially less than that of a large, lumbering bear.
Let's say you're at some point (origin) and pointed in a certain direction, and you need to get to some
other point (destination), as illustrated in Figure 5. The shortest path is found either by turning left as
far as you can, going in a circle until you are directly pointed at the destination, and then proceeding
forward, or by turning right and doing the same thing.
In Figure 5 the shortest route is clearly the green line at the bottom. This path turns out to be fairly
straightforward to calculate due to some geometric relationships, illustrated in Figure 6.
First we calculate the location of point P, which is the center of our turning circle, and is always radius
r away from the starting point. If we are turning right from our initial direction, that means P is at an
angle of (initial_direction - 90) from the origin, so:
angleToP = initial_direction - 90
P.x = Origin.x + r * cos(angleToP)
P.y = Origin.y + r * sin(angleToP)
Now that we know the location of the center point P, we can calculate the distance from P to the
destination, shown as h on the diagram:
dx = Destination.x - P.x
dy = Destination.y - P.y
h = sqrt(dx*dx + dy*dy)
At this point we also want to check that the destination is not within the circle, because if it were, we
could never reach it:
if (h < r)
return false
Now we can calculate the length of segment d, since we already know the lengths of the other two
sides of the right triangle, namely h and r. We can also determine the angle theta from the right-triangle relationship:
d = sqrt(h*h - r*r)
theta = arccos(r / h)
Finally, to figure out the point Q at which to leave the circle and start on the straight line, we need to know the total angle (phi + theta), and phi is easily determined as the angle from P to the destination:
phi = arctan(dy / dx) [offset to the correct quadrant]
Q.x = P.x + r * cos(phi + theta)
Q.y = P.y + r * sin(phi + theta)
The above calculations represent the right-turning path. The left-hand path can be calculated in exactly the same way, except that we add 90 to initial_direction for calculating angleToP, and later we use (phi - theta) instead of (phi + theta). After calculating both, we simply see which path is shorter and use that one.
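The formulas above can be gathered into a single routine. The following sketch works in radians and computes only the right-turn exit point Q; the function and parameter names are illustrative:

#include <cmath>

struct Pt { double x, y; };

const double PI = 3.14159265358979323846;

// Finds the point Q where a right-turning unit leaves its turning circle
// and heads straight for the destination. Returns false if the
// destination lies inside the turning circle and cannot be reached.
bool RightTurnExit(Pt origin, double initialDir, double r, Pt dest, Pt &q)
{
    double angleToP = initialDir - PI / 2.0;     // circle center is 90 degrees right
    Pt p = { origin.x + r * std::cos(angleToP),
             origin.y + r * std::sin(angleToP) };
    double dx = dest.x - p.x, dy = dest.y - p.y;
    double h = std::sqrt(dx * dx + dy * dy);
    if (h < r)
        return false;                            // destination inside the circle
    double theta = std::acos(r / h);             // right-triangle relation
    double phi = std::atan2(dy, dx);             // already quadrant-correct
    q.x = p.x + r * std::cos(phi + theta);
    q.y = p.y + r * std::sin(phi + theta);
    return true;
}

The left-turn case follows by adding 90 degrees when computing angleToP and using (phi - theta), as described above.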
In our implementation of this algorithm and the ones that follow, we utilize a data structure which
stores up to four distinct "line segments," each one being either straight or curved. For the curved
paths described here, there are only two segments used: an arc followed by a straight line. The data
structure contains members which specify whether the segment is an arc or a straight line, the length
of the segment, and its starting position. If the segment is a straight line, the data structure also
specifies the angle; for arcs, it specifies the center of the circle, the starting angle on the circle, and
the total radians covered by the arc.
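An illustrative C++ version of that segment structure (the field names are assumptions, not the article's own code):

enum SegmentType { SEGMENT_STRAIGHT, SEGMENT_ARC };

struct PathSegment {
    SegmentType type;        // straight line or arc
    float length;            // length of this segment
    float startX, startY;    // starting position
    float angle;             // straight lines: heading of the line
    float centerX, centerY;  // arcs: center of the turning circle
    float startAngle;        // arcs: starting angle on the circle
    float arcRadians;        // arcs: total radians covered by the arc
};

PathSegment path[4];         // up to four distinct segments per curved path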
Once we have calculated the curved path necessary to get between two points, we can easily calculate
our position and direction at any given instant in time, as shown in Listing 2.
LISTING 2. Calculating the position and orientation at a particular time.
distance = unit_speed * elapsed_time
loop i = 0 to 3:
    if distance < segment[i].length: compute position within segment[i]; stop
    distance = distance - segment[i].length
This solution is actually quite acceptable for many games. However, we often don't want to allow any
obviously illegal turns where the unit overlaps obstacles. The next three methods address this
problem.
2. Path recalculations. With this method, after the A* has completed, we step through the path,
making sure every move from one waypoint to the next is valid. (This can be done as part of a
smoothing pass.) If we find a collision, we mark the move as invalid and try the A* path search
again. In order to do this, we need to store one byte for every tile (or add an additional byte to
the matrix elements described in the optimization section above). Each bit will correspond to one
of the eight tiles accessible from that tile. Then we modify the A* algorithm slightly so that it
checks whether a particular move is valid before allowing it. The main problem with this method
is that by invalidating certain moves, a valid path approaching the tile from a different direction
can be left unfound. Also, in a worst-case scenario, this method could need to recalculate the
path many times over.
3. Making tighter turns. Another solution is that whenever we need to make a turn that would normally cause a collision, we allow our turning radius to decrease until the turn becomes legal. This is illustrated with the first turn in Figure 7a. One proviso is that when we conduct the A* search, we need to search only the surrounding four tiles at every node (as opposed to eight), so we don't end up with impossible situations like the one illustrated in Figure 4.
During the algorithm, when we're at a parent node p and checking a child node q, we don't just check
if the child itself is a blocked tile. We check if a curved path from p to q is possible (taking into account
the orientation at p, the orientation at q, and the turning radius); and if so, we check if traveling on
that path would hit any blocked tiles. Only then do we consider a child node to be valid. In this
fashion, every path we look at will be legal, and we will end up with a valid path given the size and
turning radius of the unit. Figure 8 illustrates this.
FIGURE 8. A legal turn which will only be found with the Directional A* technique.
The shortest path, and the one that would be chosen by the standard A* algorithm, goes from a to c.
However, the turning radius of the unit prevents the unit from performing the right turn at c given the
surrounding blockers, and thus the standard A* would return an invalid path in this case. The
Directional A*, on the other hand, sees this and instead looks at the alternate path through b. Yet even at b, a 90-degree turn to the left is not possible due to nearby blockers, so the algorithm finds that it can make a right-hand loop and then continue.
________________________________________________________
Directional Curved Paths
In order to implement the Directional A* algorithm, it is necessary to figure out how to compute the shortest path from a point p to a point q, taking into account not only starting position, orientation, and turning radius, but also the ending direction. This algorithm will allow us to compute the shortest legal method of getting from a current position and orientation on the map to the next waypoint, and also to be facing a certain direction upon arriving there.
Earlier we saw how to compute the shortest path given just a starting orientation and turning radius. Adding a fixed final orientation makes the process a bit more challenging.
There are four possible shortest paths for getting from origin to destination with fixed starting and
ending directions. This is illustrated in Figure 9. The main difference between this and Figure 5 is that
we approach the destination point by going around an arc of a circle, so that we will end up pointing in
the correct direction. Similar to before, we will use trigonometric relationships to figure out the angles
and lengths for each segment, except that there are now three segments in total: the first arc, the line
in the middle, and the second arc.
We can easily position the turning circles for both origin and destination in the same way that we did
earlier for Figure 6. The challenge is finding the point (and angle) where the path leaves the first
circle, and later where it hits the second circle. There are two main cases that we need to consider.
First, there is the case where we are traveling around both circles in the same direction, for example
clockwise and clockwise (see Figure 10).
The second case, where the path travels around the circles in opposite directions (for example,
clockwise around the first and counterclockwise around the second), is somewhat more complicated
(see Figure 11). To solve this problem, we imagine a third circle centered at P3 which is tangent to the
destination circle, and whose angle relative to the destination circle is at right angles with the (green)
path line. Now we follow these steps:
1. Observe that we can draw a right triangle between P1, P2, and P3.
2. We know that the length from P2 to P3 is (2 * radius), and we already know the length from P1
to P2, so we can calculate the angle as θ = arccos(2 * radius / Length(P1, P2)).
3. Since we also already know the angle of the line from P1 to P2, we just add or subtract θ
(depending on clockwise or counterclockwise turning) to get the exact angle of the (green) path
line. From that we can calculate the arc angle where it leaves the first circle and the arc angle
where it touches the second circle.
We now know how to determine all four paths from origin to destination, so given two nodes (and their
associated positions and directions), we can calculate the four possible paths and use the one which is
the shortest.
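As a minimal sketch of steps 1 through 3, assuming P1 and P2 are the centers of the two turning circles (the names and the boolean for turning sense are illustrative):

#include <cmath>

struct Vec2 { double x, y; };

// Sketch of steps 1-3 above. p1 and p2 are the centers of the origin and
// destination turning circles, r the turning radius. Returns false when
// the circles are too close together for a cross tangent to exist.
bool CrossTangentAngle(Vec2 p1, Vec2 p2, double r, bool clockwise,
                       double& pathAngle)
{
    double dx = p2.x - p1.x, dy = p2.y - p1.y;
    double d  = std::sqrt(dx * dx + dy * dy);
    if (d < 2.0 * r) return false;              // no cross tangent possible
    double centerLine = std::atan2(dy, dx);     // angle of the line P1 -> P2
    double theta = std::acos(2.0 * r / d);      // step 2
    pathAngle = clockwise ? centerLine + theta  // step 3: add or subtract,
                          : centerLine - theta; // depending on turning sense
    return true;    // the two arc angles then follow from pathAngle
}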
Note that we can now use the simple smoothing algorithm presented earlier with curved paths, with
just a slight modification to the Walkable(pointA, pointB) function. Instead of point-sampling in a
straight line between pointA and pointB, the new Walkable(pointA, directionA, pointB,
directionB) function samples intermediate points along a valid curve between A and B, given the
turning radius.
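A sketch of what the curved test might look like, assuming a helper that evaluates the chosen curve at a parameter t between 0 and 1 (both helpers are illustrative, not from the article):

#include <functional>

struct Vec2 { double x, y; };

// curvePoint is assumed to be built from (pointA, directionA, pointB,
// directionB) and the turning radius, using the shortest-curve computation
// above; it maps t in [0,1] to a position along that curve. isBlocked
// reports whether the tile under a given point is blocked.
bool WalkableCurved(const std::function<Vec2(double)>& curvePoint,
                    const std::function<bool(const Vec2&)>& isBlocked)
{
    const int kSamples = 32;                 // sampling density; tune to tile size
    for (int i = 0; i <= kSamples; ++i) {
        if (isBlocked(curvePoint(i / double(kSamples))))
            return false;                    // the curve clips a blocked tile
    }
    return true;                             // every sample fell on open tiles
}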
Nonetheless, the algorithm still requires that the waypoints are at the center of tiles and at exact
compass directions. These restrictions can seemingly cause problems where a valid path may not be
found. The case of tile-centering is discussed in more detail below. The problem of rounded compass
directions, however, is in fact very minimal and will almost never restrict a valid path. It may cause
visible turns to be a bit more exaggerated, but this effect is very slight.
Expanded searching to surrounding tiles. So far in this discussion, we have assumed that at every
node, you check the surrounding eight locations as neighbors. We call this a Directional-8 search. As
mentioned in the preceding paragraph, there are times when this is restrictive. For example, the
search shown in Figure 12 will fail for a Directional-8 search, because given a wide turning radius for
the ship, it would be impossible to traverse a -> b -> c -> d without hitting blocking tiles. Instead, it is
necessary to find a curve directly from a -> d.
Accomplishing this requires searching not just the surrounding eight tiles, which are one tile away, but
the surrounding 24 tiles, which are two tiles away. We call this a Directional-24 search, and it was
such a search that produced the valid path shown in Figure 12. We can even search three tiles away
for a Directional-48 search. The main problem with these extended searches is computation time. A
node in a Directional-8 search has 8 x 8 = 64 child nodes, but a node in a Directional-24 search has 24
x 8 = 192 child nodes.
A small optimization we can do is to set up a directional table to tell us the relative position of a child
given a simple index. For example, in a Directional-48 search, we loop through directions 0 -> 47, and
a sample table entry would be:
DirTable[47] = <-3,+3>.
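One way such a table might be generated is sketched below; the enumeration order is arbitrary, so the exact index of any given offset depends on the implementation:

#include <algorithm>
#include <cstdlib>
#include <vector>

struct Offset { int dx, dy; };

// Builds the relative-offset table: maxRing = 1 gives the 8 tiles one step
// away (Directional-8), maxRing = 2 adds the 16 tiles two steps away
// (Directional-24), and maxRing = 3 adds the 24 tiles three steps away
// (Directional-48).
std::vector<Offset> BuildDirTable(int maxRing)
{
    std::vector<Offset> table;
    for (int ring = 1; ring <= maxRing; ++ring)
        for (int dy = -ring; dy <= ring; ++dy)
            for (int dx = -ring; dx <= ring; ++dx)
                if (std::max(std::abs(dx), std::abs(dy)) == ring)
                    table.push_back({ dx, dy });  // perimeter of this ring only
    return table;   // BuildDirTable(3) yields 8 + 16 + 24 = 48 entries,
}                   // one of which is the <-3,+3> offset cited above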
The modified heuristic. Our modified A* algorithm will also need a modified heuristic (to estimate
the cost from an intermediate node to the goal). The original A* heuristic typically just measures a
straight-line distance from the current position to the goal position. If we used this, we would end up
equally weighing every compass direction at a given location, which would make the search take
substantially longer in most cases. Instead, we want to favor angles that point toward the goal, while
also taking turning radius into account. To do this, we change the heuristic to be the distance of the
shortest curved path from the current position and orientation to the goal, ignoring any blockers.
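A sketch of one way to compute such an estimate, reusing the single-arc-plus-straight-line construction from earlier; obstacle checks are deliberately omitted, since this is only an estimate, and all names are illustrative:

#include <algorithm>
#include <cmath>

struct Vec2 { double x, y; };

const double kPi = 3.14159265358979323846;

// Length of the shortest obstacle-free arc-plus-line path from
// (pos, heading) to goal, given turning radius r. Headings that point
// toward the goal yield shorter estimates, so the search favors them.
double CurvedHeuristic(Vec2 pos, double heading, Vec2 goal, double r)
{
    double best = 1e30;
    for (int side = -1; side <= 1; side += 2) {         // right (-1) / left (+1)
        Vec2 c = { pos.x - side * r * std::sin(heading),   // turning-circle center,
                   pos.y + side * r * std::cos(heading) }; // perpendicular to heading
        double gx = goal.x - c.x, gy = goal.y - c.y;
        double d = std::sqrt(gx * gx + gy * gy);
        if (d < r) continue;                  // goal inside this circle: skip side
        double tangentLen = std::sqrt(d * d - r * r);   // straight segment
        double phi   = std::atan2(gy, gx);              // center -> goal angle
        double alpha = std::acos(r / d);                // offset to tangent point
        double psi   = phi - side * alpha;              // tangent-point angle
        double start = std::atan2(pos.y - c.y, pos.x - c.x);
        double sweep = std::fmod(side * (psi - start), 2.0 * kPi);
        if (sweep < 0.0) sweep += 2.0 * kPi;            // arc length is r * sweep
        best = std::min(best, r * sweep + tangentLen);
    }
    return best;
}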
Precomputing the blocked-tile checks. Rather than computing each connecting curve during the
search itself, we can precompute a table, indexed by the starting direction and the relative child node,
that records which tiles the connecting curve would touch. Any tiles that would be touched by a unit
traveling between those nodes (other than the origin and
destination tiles themselves) will be marked by a "1" in the appropriate bit-field, while all others will be
"0." Then during the search algorithm itself, the table will simply be accessed, and any marked nodes
will result in a check to see if the associated node in the real map is blocked or not. (A blocked node
would then result in report of failure to travel between those nodes.) Note that the table is dependent
on both the size and turn radius of the unit, so if those values change, the table will need to be
recomputed.
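The article does not specify the table layout, but one plausible arrangement packs each entry into a 64-bit mask covering an 8 x 8 neighborhood of the parent node:

#include <cstdint>
#include <functional>

// Hypothetical layout: touched[startDir][child] is a bitmask over the 8x8
// block of tiles centered on the parent node; bit (ty * 8 + tx) corresponds
// to local offset (tx - 4, ty - 4). Sized here for a Directional-24 search
// (24 offsets x 8 arrival headings = 192 children per starting direction).
struct TouchTable { uint64_t touched[8][192]; };

// During the search, one table fetch plus a bit scan replaces the full
// curve computation. Returns true if any touched tile is blocked.
bool CurveBlocked(const TouchTable& table, int startDir, int child,
                  int px, int py,
                  const std::function<bool(int, int)>& isBlocked)
{
    uint64_t mask = table.touched[startDir][child];
    for (int bit = 0; bit < 64; ++bit)
        if (mask & (1ULL << bit)) {
            int tx = bit % 8 - 4, ty = bit / 8 - 4;   // local -> map offset
            if (isBlocked(px + tx, py + ty))
                return true;                // blocked tile lies on this curve
        }
    return false;
}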
Earlier, I mentioned how the very first position in a path may not be at the precise center location of a
tile or at a precise compass direction. As a result, if we happen to be specifically checking neighbors of
that first tile, the algorithm needs to do a full computation to determine the path, since the table
would not be accurate.
Other options: backing up. Finally, if your units are able to move in reverse, this can easily be
incorporated into the Directional A* algorithm to allow further flexibility in finding a suitable path. In
addition to the eight forward directions, simply add an additional eight reverse directions. The
algorithm will automatically utilize reverse in its path search. Typically units shouldn't be traveling in
reverse half the time though, so you can also add a penalty to the distance and heuristic computations
for traveling in reverse, which will "encourage" units to go in reverse only when necessary.
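In code, the penalty might be as simple as scaling the cost of any reverse move; the factor below is a tuning knob, not a value from the article:

// Directions 0-7 are the forward compass headings; 8-15 are the same
// headings traversed in reverse.
const double kReversePenalty = 2.0;

double MoveCost(double pathLength, int dirIndex)
{
    bool reversing = (dirIndex >= 8);                 // back-up move?
    return reversing ? pathLength * kReversePenalty   // discourage, don't forbid
                     : pathLength;
}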
Correctness of Directional A*
The standard A* algorithm, if used in a strict tile-based world with no turning restrictions, and in a
world where a unit must always be at the center of a tile, is guaranteed to find a solution if one exists.
The Directional A* algorithm on the other hand, when used in a more realistic world with turning
restrictions and nondiscrete positions, is not absolutely guaranteed to find a solution. There are a
couple reasons for this.
Earlier we saw how the Directional-8 algorithm could occasionally miss a valid path, and this was
illustrated in Figure 12. The conclusion was to use a Directional-24 search or even a Directional-48
search. However, in very rare circumstances, the same problem could occur with a Directional-48
search. We could extend even further to a Directional-80 search, but at that point the computation
time becomes prohibitive.
There is also the separate issue of games that use fixed character art, where units can face only eight
or 16 discrete directions and therefore cannot follow a smooth curve exactly. What we really want is a
solution which can modify a continuous path, like the one shown at the
bottom of Figure 14, and create a very similar path using just discrete lines, as shown in the top of the
figure.
The solution involves two steps. First, for all arcs in the path, follow the original circle as closely as
possible, but staying just outside of it. Second, for straight lines in the path, create the closest
approximation using two line segments of legal angles. These are both illustrated in Figure 15. For the
purposes of studying the figure, we have allowed only eight legal directions, though this can easily be
extended to 16.
On the left, we see that we can divide the circle into 16 equivalent right triangles. Going one direction
on the outside of the circle (for example, northeast) involves traversing the bases of two of these
triangles. Each triangle has one leg which has the length of the radius, and the inside angle is
simply π/8, or 22.5 degrees. Thus the base of each triangle is simply:
base = r * tan(22.5°)
We can then extrapolate any point on the original arc (for example, 51.2 degrees) onto the point
where it hits one of the triangles we have just identified. Knowing these relationships, we can then
calculate (with some additional work) the starting and ending point on the modified arc, and the total
distance in between.
For a straight line, we simply find the two lines of the closest legal angles, for example 0 degrees and
45 degrees, and determine where they touch. As shown in the figure, there are actually two such
routes (one above the original line and another below), but in our sample program we always just pick
one. Using basic slope and intercept relationships, we simply calculate the intersection of the two
lines to determine where to change direction.
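Both computations can be sketched briefly, assuming eight legal headings (multiples of 45 degrees); the names are illustrative:

#include <cmath>

const double kPi = 3.14159265358979323846;

// Arc step: length of one outside "base" segment when the circle of
// radius r is enclosed by 16 right triangles (half-angle pi/8).
double OutsideBase(double r) { return r * std::tan(kPi / 8.0); }

// Line step: replace the segment from a to b with two segments at the
// nearest legal angles below and above the true heading, bending where
// those two lines intersect (this always picks the route on one side).
struct Vec2 { double x, y; };

Vec2 LegalBend(Vec2 a, Vec2 b)
{
    double ang  = std::atan2(b.y - a.y, b.x - a.x);
    double step = kPi / 4.0;                       // 45-degree increments
    double lo   = std::floor(ang / step) * step;   // legal angle below
    double hi   = lo + step;                       // legal angle above
    // Line 1 passes through a at angle lo; line 2 through b at angle hi.
    double d1x = std::cos(lo), d1y = std::sin(lo);
    double d2x = std::cos(hi), d2y = std::sin(hi);
    double denom = d1x * d2y - d1y * d2x;          // sin(45 deg), never zero
    double t = ((b.x - a.x) * d2y - (b.y - a.y) * d2x) / denom;
    return { a.x + t * d1x, a.y + t * d1y };       // the single bend point
}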
Note that we still use the "line segment" storage method introduced earlier to store the modified path
for fixed character art. In fact, this is why I said earlier we would need up to four line segments. The
starting and ending arc remain one line segment each (and we determine the precise position of the
unit on the modified "arc" while it is actually moving), but the initial straight line segment between the
two arcs now becomes two distinct straight line segments.
FIGURE 16. (a) Standard cornering, and (b) modified tight cornering
for roads.
At certain times, even though it is longer, the path in Figure 16b may be what is desired. This most
often occurs when units are supposed to be traveling on roads. It simply is not realistic for a vehicle to
drive diagonally across a road just to save a few feet in total distance.
There are a few ways of achieving this.
1. When on roads, make sure to do only a regular A* search or a Directional-8 search, and do not
apply any smoothing algorithm afterwards. This will force the unit to go to the adjacent tile.
However, this will only work if the turning radius is small enough to allow such a tight turn.
A Better Smoothing Pass
The smoothing algorithm given earlier is less than ideal when used by itself. There are two reasons for
this. Figure 17 demonstrates the first problem. The algorithm stops at point q and looks ahead to see
how many nodes it can skip while still conducting a legal move. It makes it to point r, but fails to allow
a move from q to s because of the blocker near q. Therefore it simply starts again at r and skips to the
destination. What we'd really like to see is a change of direction at p, which cuts diagonally to the final
destination, as shown with the dashed line.
The second problem exhibits itself only when we have created a path using the simple
(non-directional) method, and is demonstrated by the green line in Figure 18. The algorithm moves
forward linearly, keeping the direction of the ship pointing straight up, and stops at point p. Looking
ahead to the next point (q), it sees that the turning radius makes the turn impossible. The smoothing
algorithm then proceeds to "cheat" and simply allow the turn. However, had it approached p from a
diagonal, it could have made the turn legally as evidenced by the blue line.
To fix these problems, we introduce a new pre-smoothing pass that will be executed after the A*
search process, but prior to the simple smoothing algorithm described earlier. This pass is actually a
very fast version of our Directional-48 algorithm, with the difference that we only allow nodes to move
along the path we previously found in the A* search, but we consider the neighbors of any node to be
those waypoints which were one, two, or three tiles ahead in the original path. We also modify the
cost heuristic to favor the direction of the original path (as opposed to the direction toward the goal).
The algorithm will automatically search through various orientations at each waypoint, and various
combinations of hopping in two- or three-tile steps, to find the best way to reach the goal.
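The neighbor generation for this pass might be sketched as follows (hypothetical types; the curve-legality and cost checks are the same ones used by the Directional search):

#include <vector>

struct DirNode { int waypoint; int dir; };  // index into the A* path + heading

// Instead of scanning surrounding tiles, a node's neighbors are the
// waypoints one, two, or three steps further along the original path,
// at each of the eight headings.
std::vector<DirNode> PreSmoothNeighbors(const DirNode& n, int pathLength)
{
    std::vector<DirNode> out;
    for (int ahead = 1; ahead <= 3; ++ahead) {
        int wp = n.waypoint + ahead;
        if (wp >= pathLength) break;          // don't run off the end of the path
        for (int d = 0; d < 8; ++d)
            out.push_back({ wp, d });         // at most 24 candidates per node
    }
    return out;
}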
Because this algorithm sticks to tiles along the previous path, it runs fairly quickly, while also allowing
us to gain many of the benefits of a Directional-48 search. For example, it will find the legal blue line
path shown in Figure 18. Of course it is not perfect, as it still will not find paths that are only visible to
a full Directional-48 search, as seen in Figure 19.
The original, nondirectional search finds the green path, which executes illegal turns. There are no
legal ways to perform those turns while still staying on the path. The only way to arrive at the
destination legally is via a completely different path, as shown with the blue line. This pre-smoothing
algorithm cannot find that path: it can only be found using a true Directional search, or by one of the
hybrid methods described later. So the pre-smoothing algorithm fails under this condition. Under such
a failure condition, and especially when the illegal move occurs near the destination, the
pre-smoothing algorithm may require far more computation time than we desire, because it will search
back through every combination of directional nodes along the entire path. To help alleviate this and
improve performance, we add an additional feature such that once the pre-smoothing algorithm has
reached any point p along the path, if it ever searches back to a point that is six or more points prior
to p in the path, it will fail automatically.
FIGURE 20 (a, b, and c from top to bottom). The (fast) hybrid Directional A* pathfinding technique.
FIGURE 21. Faster hybrid paths.
The standard A* algorithm allows for tiles to have a variety of costs. For example, movement on sand
is much more "expensive" than movement over pavement, so the algorithm will favor paths on
pavement even if the total distance may be longer. This type of terrain costing is fully supported by
the Directional A* algorithm and other techniques listed here.
Speed restrictions are a trickier issue. For the map in Figure 22, the algorithms presented here will
choose the green path, because it has the shortest distance. However, for a vehicle with slow
acceleration/deceleration times, the blue path would be faster, because it only requires two turns
instead of eight, and has long stretches for going at high speeds.
The formal way to attack this problem would be to add yet another dimension to the search space. For
the Directional algorithm, we added current direction as a third dimension. We could theoretically add
current speed as a fourth dimension, though the computation time would grow accordingly.
Friendly-collision avoidance. Units which are friendly to one another typically need some method of
avoiding collisions and continuing toward a goal destination. One effective method is as follows: Every
half-second or so, make a quick map of which tiles each unit would hit over the next two seconds if
they continued on their current course. Each unit then "looks" to see whether it will collide with any
other unit. If so, it immediately begins decelerating, and plans a new route that avoids the problem
tile. (It can start accelerating again once the paths no longer cross.) Ideally, all units will favor
movement to the right side, so that units facing each other won't keep hopping back to the left and
right (as we often do in life). Still, units may come close to colliding and need to be smart enough to
stop, yield to the right, back up a step if there's not enough room to pass, and so on.
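One possible shape for the tile-claiming check is sketched below; the data layout and names are assumptions, not from the article:

#include <map>
#include <vector>

struct Tile { int x, y; };
bool operator<(const Tile& a, const Tile& b)
{ return a.x != b.x ? a.x < b.x : a.y < b.y; }

// projectedTiles[u] is assumed to hold the tiles unit u would occupy over
// the next two seconds on its current course. Any unit whose projection
// overlaps a tile already claimed by another unit should decelerate and
// replan around the contested tile.
std::vector<int> FindCollidingUnits(
    const std::vector<std::vector<Tile>>& projectedTiles)
{
    std::map<Tile, int> owner;           // first unit to claim each tile
    std::vector<int> colliding;
    for (int u = 0; u < (int)projectedTiles.size(); ++u)
        for (const Tile& t : projectedTiles[u]) {
            auto it = owner.find(t);
            if (it == owner.end()) owner[t] = u;
            else if (it->second != u) {  // someone else claimed it first
                colliding.push_back(u);  // this unit slows down and replans
                break;
            }
        }
    return colliding;
}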
Final Notes
This article has made some simplifying assumptions to help describe the search methods presented.
First, all searches shown have been in 2D space. Most games still use 2D searches, since the third
dimension is often inaccessible to characters, or may be a slight variation (such as jumping) that can
be handled outside the pathfinding search itself.
The algorithms presented in this article are only partially optimized. They can potentially be sped up
further through various techniques. There is the possibility of more and better use of tables, perhaps
even eliminating trigonometric functions and replacing them with lookups. Also, the majority of time
spent in the Directional algorithm is in the inner loop that checks for blocking tiles that may have
been hit. An optimization of that section of code could potentially double the performance. Finally, the
heuristics used in the Directional algorithm and the Smoothing-48 pass could potentially be revised to
find solutions substantially faster, or at least tweaked for specific games.
Pathfinding is a complex problem which requires further study and refinement. Clearly not all
questions are adequately resolved. One critical issue at the moment is performance. I am confident
that some readers will find faster implementations of the techniques presented here, and probably
faster techniques as well. I look forward to this growth in the field.