Topic: Cognitive Architectures in HCI
ABSTRACT :
This paper serves as the overview and introduction to a symposium of the same name.
The symposium is made up of this introduction and six other papers on cognitive architectures
in HCI. As many readers may not be familiar with cognitive architectures, a description of what
cognitive architectures are is presented first. In an effort to be accessible to a wide audience,
this description is fairly abstract. Once it is clear what is meant by a cognitive architecture, and
what a model derived from such an architecture is, then the potential uses of such models in
HCI efforts, both research and practical, can be laid out. While there is a great deal of promise,
there are still challenges involved with using cognitive models in HCI, which are detailed in
the third section. Finally, an overview of the other symposium papers is presented along with
some orienting context.
But a cognitive architecture is more than just a theory of cognition. It is, as defined by
Young (Gray, Young, & Kirschenbaum, 1997; Ritter & Young, 2001), an embodiment of “a
scientific hypothesis about those aspects of human cognition that are relatively constant over
time and relatively independent of task.” That is, it is an attempt to describe those aspects of
the human cognitive system that are more or less universal, both across and within individuals.
Thus, a cognitive architecture alone is generally not able to describe human performance on
any particular task; it must be given knowledge about how to do the task. Generally speaking,
a model of a task in a cognitive architecture (generally termed a “cognitive model”) consists of
the architecture and the requisite knowledge to perform the specified task. This knowledge is
typically based on a thorough task analysis of the target activity being modeled. Finally, a
cognitive architecture is a piece of executable software. It is code, written by a programmer
(or, more generally, by multiple programmers).
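Because the split between the task-independent architecture and the supplied task knowledge is central to everything that follows, a toy sketch may help. In the code below, the match-and-fire loop plays the role of the fixed “architecture” and the rule list plays the role of supplied task knowledge. All names, and the 50 ms cycle time (borrowed loosely from ACT-R’s default production cycle), are illustrative assumptions, not any real architecture’s API.

```python
CYCLE_TIME = 0.050  # seconds per recognize-act cycle (assumed constant)

def run(rules, goal, max_cycles=100):
    """The fixed 'architecture': repeatedly match rules against the
    current state, fire the first match, and timestamp each firing."""
    state, trace, t = {"goal": goal}, [], 0.0
    for _ in range(max_cycles):
        fired = next((r for r in rules if r["condition"](state)), None)
        if fired is None:
            break  # no knowledge applies; the model halts
        t += CYCLE_TIME
        trace.append((round(t, 3), fired["name"]))
        fired["action"](state)
    return trace

# Supplied "task knowledge": two rules for a trivial button-pressing task.
rules = [
    {"name": "find-button",
     "condition": lambda s: s["goal"] == "press" and "located" not in s,
     "action": lambda s: s.update(located=True)},
    {"name": "click-button",
     "condition": lambda s: s.get("located") and s["goal"] == "press",
     "action": lambda s: s.update(goal="done")},
]

print(run(rules, "press"))  # [(0.05, 'find-button'), (0.1, 'click-button')]
```

Note that the same `run` loop could be reused unchanged with a different rule set, which is precisely the sense in which the architecture is constant across tasks while the knowledge varies.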
This is another critical way in which cognitive architectures differ from most theories
in cognitive psychology; most cognitive architectures produce not only a prediction about
performance, but in fact output actual performance. That is, they produce a timestamped
sequence of actions (e.g., mouse clicks, eye movements) that can be compared to actual human
performance on a task. The time stamps on the actions mean that architectures produce models
that are quantitative. A model could thus predict that not only is task A faster than task B, but
that it is 2.2 seconds or 20% faster. This has numerous engineering implications, which are
discussed further in section 2 as well as in Byrne and Gray (2003). Another important
implication of this is that the knowledge which has to be supplied to an architecture generally
has to be supplied in the language of the architecture. Structuring knowledge in this form is
distinctly like programming, so architecture-based modelers typically have strong
programming skills.
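The quantitative claim above (task A faster by 2.2 seconds or 20%) is ordinary arithmetic over predicted completion times; the input numbers below are invented purely to reproduce that example.

```python
def compare(t_a, t_b):
    """Absolute and relative time advantage of design A over design B,
    given predicted completion times in seconds (invented values here)."""
    return t_b - t_a, (t_b - t_a) / t_b

diff, frac = compare(8.8, 11.0)
print(f"A is {diff:.1f} s, or {frac:.0%}, faster than B")
```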
As cognitive architectures progress and models of more and more complex and
interactive tasks are built, it is increasingly common that the architecture is connected to a
complex simulation of the environment in which the task is performed. In some cases, the
cognitive architecture interacts directly with the actual software that humans use to perform the
task. In other cases, some form of connecting software must be constructed. Overall, a model
of a task generally has three components: the architecture, task knowledge, and a dynamic task
environment with which the model interacts. The output of this system is, as mentioned, a
timestamped behavior stream, as depicted in Figure 1.
Figure 1: Structure of a model based on a cognitive architecture. Task knowledge is supplied to the architecture, which interacts with the task environment and produces a timestamped behavior stream.
Presently, cognitive architectures are primarily research tools housed in academic laboratories;
however, this is changing. There are now numerous consulting/technical companies who,
among other things, employ cognitive architectures in their work and even provide models
developed with cognitive architectures to their clients. The Lebiere et al. paper in this
symposium is an example of a non-academic use of a cognitive architecture. Furthermore, work
is being done with several architectures to reduce the usually substantial learning curve.
This is not to suggest that cognitive models will entirely supplant usability tests; real
empirical tests both in the laboratory and in the field are still conducted in other engineering
disciplines as well. However, cognitive models can help the usability engineer focus usability
tests on features or tasks likely to be crucial and can help rule out early design alternatives,
thereby reducing the number of cycles of usability testing and possibly even the number of
users needed. This is particularly attractive when the user population is very small or very
difficult to access due to specialization or expense, or when the tasks or situations required in
the tests are dangerous or expensive. For example, testing commercial airline pilots in flight is
quite challenging because pilots are difficult to recruit and their time is expensive, outfitting
commercial jetliners with new equipment requires considerable time and engineering effort,
and poor results can have fatal consequences. Some of these problems can be overcome by the
use of simulators rather than real cockpits, but high-fidelity simulators are themselves very
expensive. Similar issues come up when designing or evaluating systems for use by medical
professionals (particularly advanced specialists); one can imagine any number of special
populations that would raise some or all (or even more) such issues.
Now, it is certainly the goal of related techniques such as GOMS analysis (see John &
Kieras, 1996) or Cognitive Walkthrough (Polson, Lewis, Rieman, & Wharton, 1992) to make
predictions, often quantitative, of many of the same things. As it turns out, these techniques
were originally grounded in the same ideas as those that underlie prominent cognitive
architectures and are essentially abstractions of the relevant architectures for HCI purposes.
Furthermore, cognitive models provide things that such analyses do not. Cognitive models are
executable and generative, which means they produce not just global execution times, but they
actually generate behavior. A GOMS analysis, on the other hand, is a description of the
procedural knowledge the user has to have and the sequence of actions that must be performed
to accomplish a specific task instance, while the equivalent computational model actually
generates the behaviors, often in real time or faster. Equally importantly, computational models
have the capacity to be reactive in real time. So, while it may be possible to construct a GOMS
model that describes the knowledge necessary and the time it will take an operator to classify
a new object on an air traffic controller’s screen, a paper-and-pencil GOMS model cannot
actually execute the procedure in response to the appearance of such an object. However, a
running computational model can. (It should be noted that David Kieras has done considerable
work on a tool called GLEAN which in essence does allow for the execution of GOMS models;
see his paper in this symposium for more information.)
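To make concrete how a GOMS-family analysis yields a time prediction, here is a keystroke-level sketch that simply sums per-operator times over one task instance. The operator values are the commonly cited Card, Moran, and Newell estimates and should be treated as approximate; the example task encoding is invented.

```python
# Approximate keystroke-level operator times (Card, Moran, & Newell),
# in seconds; these are rough published estimates, not exact constants.
OPERATOR_TIME = {
    "K": 0.28,  # press a key (average typist)
    "P": 1.10,  # point with a mouse
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def klm_time(ops):
    """Predicted execution time for a sequence of KLM operators."""
    return sum(OPERATOR_TIME[o] for o in ops)

# Invented task instance: think, point at a field, home to the
# keyboard, then type four characters.
print(round(klm_time("MPH" + "K" * 4), 2), "seconds")
```

This static summation is exactly the kind of global prediction the text describes; what it cannot do, and what a running computational model adds, is react to events in the task environment as they occur.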
2.2 Cognitive Models in Lieu of Human Users :
The fact that these models are executable in real time (or faster) means they have a
number of other HCI-relevant applications that may not be immediately apparent. One such use
is in intelligent tutoring systems (ITSs). Consider the Lisp tutor (Anderson, Conrad, & Corbett,
1989). This tutoring system contained an architecture-based running computational model of
the knowledge necessary to implement the relevant Lisp functions, and a module for assessing
which pieces of this knowledge were mastered by the student. Because the model was
executable, it could predict what action the student would take if the student had correct
knowledge of how to solve the problem. When the student took a different action, this told the
ITS that the student was missing one or more relevant pieces of knowledge. The student could
then be given feedback about what knowledge was missing or incomplete, and problems which
exercised this knowledge could be selected by the ITS for further practice by the student. By
identifying students’ knowledge, and the gaps in that knowledge, it was possible to generate
more effective educational experiences. Problems that contained knowledge the student had
already mastered could be avoided, to not bore the student with things they already knew. This
freed up the student to concentrate on the material they had not yet mastered, resulting in
improved learning (Anderson, et al., 1989). While the Lisp tutor is an old research system, ITSs
based on the same underlying cognitive architecture with the same essential methodology have
been developed for more pressing educational needs such as high school algebra and geometry
and are now sold commercially (see http://www.carnegielearning.com for more information).
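The mechanism described above, running a model of correct knowledge in step with the student, is often called model tracing. The sketch below is a minimal illustration of that idea only; the rule names and actions are invented and make no claim to match the Lisp tutor’s internals.

```python
def trace_step(model_rules, state, student_action):
    """Compare the student's action to what the model of correct
    knowledge would do in this state; a mismatch signals a gap."""
    predicted = [r["action"] for r in model_rules if r["matches"](state)]
    if student_action in predicted:
        return ("ok", student_action)
    return ("gap", predicted)  # basis for feedback and problem selection

# Invented fragment of "correct knowledge" for a recursion exercise.
rules = [
    {"matches": lambda s: s == "need-base-case", "action": "write-base-case"},
]

print(trace_step(rules, "need-base-case", "write-recursive-call"))
# → ('gap', ['write-base-case']): the tutor can now target that knowledge
```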
This is by no means a complete list; there are many other potential uses for cognitive
architectures in HCI. However, this should provide enough so that the value (or at least the
potential value) of such endeavors to the HCI community is apparent. This is not an entirely
promissory note, either, as there have been many successful applications of cognitive
architectures in HCI beyond those mentioned here. Most issues of journals such as Human-
Computer Interaction and each of the proceedings of the ACM SIGCHI conference contain
papers which use architecture-based cognitive models. There was a special section of the
journal Human-Computer Interaction devoted to such models in 1997 (see Gray, Young, &
Kirschenbaum, 1997), a similar special issue of the International Journal of Human-Computer
Studies in 2001 (see Ritter & Young, 2001), and a special section of the journal Human Factors
in 2003 (see Byrne & Gray, 2003) that contained several papers based on such models.
Another respect in which cognitive architectures fall short of the full range of
capabilities necessary to model HCI tasks is that such architectures are almost entirely
cognitive; they model primarily aspects of thinking, and just enough perception and motor
control to support that thinking. They generally do not take into account factors like affect and
social influence. However, as affect and emotion have been topics which have received
considerable attention lately in the HCI community, work on integrating affect with cognitive
architectures is emerging rapidly (Gratch & Marsella, 2004; Norman, Ortony, & Revelle, in
press). Along similar lines, models developed in cognitive architectures do not express
aesthetic or subjective preferences, something real users obviously do (often vociferously!).
Another set of factors that has not yet received adequate attention includes fatigue,
stress, sleep deprivation, and the like. While these certainly may carry with them
affective or emotional components, such factors also have direct effects on human cognition
and performance. Unfortunately, cognitive architectures have generally not been used to model
the effects of such moderators. Again, though, research into how such effects can be modeled
with cognitive architectures is underway (for an example see Ritter, Reifers, Klein, Quigley, &
Schoelles, 2004).
Lastly, most cognitive modeling has been aimed at modeling the behavior of a single
user at a time. Clearly, many tasks of great interest to HCI researchers and practitioners involve
groups or teams of users. While there is nothing in principle which prevents the use of cognitive
architectures to construct multiple models and have those models interact with one another,
such usage has not been the norm and it is not clear to what extent such models would really
capture the richness of human social interaction. However, there has been some promising
work on this front as well presented by Kieras and Santoro (2004), though that work did not
delve deeply into social factors.
3.2 Technical and Practical Hurdles :
There are other barriers to the widespread adoption of cognitive architectures in HCI beyond
theoretical coverage. Just as with those issues, however, progress is being made on a number
of fronts. Probably the largest problem in principle and in practice is the knowledge engineering
problem. Consider Figure 1 again. One of the things that must be supplied to a cognitive
architecture as one constructs a model is the knowledge about the task possessed by the person
or people being modeled. In some cases, this is not a huge hurdle. In a wide array of laboratory
psychology experiments, the tasks are both simple and novel and people have little knowledge
to bring to bear. This is almost certainly true of some user interfaces as well; the expectation is
that users will come to the interface with little or no direct experience. Automated teller
machines (ATMs) and many information kiosks are of this general type. However, even in
those cases, users generally do come to the task with a surprisingly substantial amount of
knowledge, even if that knowledge is not specific to that particular interface. For example,
novice users of an ATM are assumed to be able to read and to understand the mapping between
instructions printed on the display and the actions they are to take. While this may seem like
fairly trivial knowledge, it is so implicit for most people that the subtleties involved are easy to
under-appreciate. What cognitive architectures need for situations like this are large
“background knowledge” bases along with accurate models of human learning processes.
Work to address the issue of learning from simple instructions was initiated with Soar
architecture some years ago (see Lewis, Newell, & Polk, 1989) and more recently, this has
become a major focus of some of the researchers in the ACT-R community (Anderson, Bothell,
Byrne, Douglass, Lebiere, & Qin, 2004).
Another issue that arises with some regularity when dealing with cognitive architectures
is the other part of the diagram in Figure 1, which is the connection between the architecture
and the task environment. An excellent summary of this problem can be found in Ritter, Baxter,
Jones, and Young (2000). In brief, for very simple environments, this can be a very simple
representation of the environment written in the same simulation language as the architecture
itself. In other cases, where that environment is both complex and dynamic, this can be a very
difficult problem. In general, most user interfaces are “closed” pieces of software with no
built-in support for supplying a cognitive model with the information it needs for perception (i.e.,
what is on the screen where) or accepting input from a model. Somehow, the interface and the
model must be connected. Work on making this easier is ongoing (see Section 4) and there is
reason to be optimistic that this hurdle can be straightforwardly overcome, but for the time
being this is still often a problem for cognitive modelers.
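The shape of the connection problem can be sketched in code: a model needs to know what is on the screen and where, and needs a channel for sending input back. The class and method names below are assumptions for illustration, not the API of any real toolkit or of the systems discussed by Ritter et al. (2000).

```python
from dataclasses import dataclass

@dataclass
class ScreenObject:
    """One perceivable interface element: what it is, and where."""
    kind: str   # e.g., "button", "text"
    label: str
    x: int
    y: int

class TaskEnvironment:
    """Minimal simulated interface a model can perceive and act on."""
    def __init__(self, objects):
        self.objects = objects
        self.clicks = []            # record of the model's actions

    def visible_objects(self):      # the model's "perception" channel
        return list(self.objects)

    def click(self, x, y):          # the model's "motor" channel
        self.clicks.append((x, y))

# A model-side fragment: find the OK button and click it.
env = TaskEnvironment([ScreenObject("button", "OK", 120, 300)])
target = next(o for o in env.visible_objects() if o.label == "OK")
env.click(target.x, target.y)
print(env.clicks)  # [(120, 300)]
```

For a closed commercial application, the hard part is implementing these two channels against software that exposes neither, which is exactly the integration problem the text describes.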
Finally, there is the issue of the usability of cognitive architectures themselves. These
systems tend to be large, complex, and unwieldy pieces of software, generally with limited
documentation. This does not make them especially approachable by anyone other than those
researchers who have extensive experience with a particular architecture. There is indeed a
certain irony to this situation. However, work is underway to help with this problem as well.
One very exciting line of work is presented in John, Prevas, Salvucci, & Koedinger (2004).
They have combined an HTML graphical interface builder with a programming-by-
demonstration interpreter and a system for automatically synthesizing ACT-R models from
simple execution traces. This is done to make the generation of cognitive models
straightforward for certain classes of interfaces and tasks. While space considerations prevent
a more detailed presentation of this work, the promise held here is substantial. Despite these
and other challenges beyond the scope of this paper, there are many questions, both research
questions and practical questions, for which cognitive architectures are well suited.
For instance, one of the issues raised is the representation of background knowledge,
not specific to any interface, held by most users. This is particularly pertinent to understanding
and predicting the behavior of users in environments where the basic mechanics (i.e., pointing
and clicking) of the interface are straightforward, but the information environment faced by
users most definitely is not. The paradigm example for such an interface right now is the World-
Wide Web. With even moderately well-constructed Web pages, the mechanics of pointing and
clicking links to navigate the Web are straightforward.
However, the problem of choosing which link to follow in order to find particular pieces
of information can be a difficult one for users, and an even more challenging problem to predict
with any kind of model. This is because this kind of behavior is largely a function of the huge
amount of knowledge users bring with them to the task. Not only is there a lot of knowledge
to represent, but that knowledge is highly structured and semantically rich. Even a few years
ago, representation of such knowledge in a cognitive model was a challenge that was hard to
fathom. However, an enormous amount of progress has been made on this problem in recent
years, which is represented in this symposium by two papers, one by Kitajima, Blackmon, and
Polson and the other by Pirolli, Fu, Chi, and Farahat. These papers differ in many important
and interesting ways, but both showcase how cognitive architectures have been scaled up to
address the semantic demands of the Web.
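As a drastically simplified stand-in for the link-choice problem, one can rank links by word overlap with the user’s goal. The systems in these symposium papers use far richer semantic measures (e.g., similarity over large text corpora and spreading activation); this toy, with invented link text, only shows the shape of the computation.

```python
def scent(goal, link_text):
    """Crude proxy for information scent: fraction of goal words that
    appear in the link text (a deliberate oversimplification)."""
    g = set(goal.lower().split())
    l = set(link_text.lower().split())
    return len(g & l) / len(g)

goal = "cheap flights to boston"
links = ["Hotel deals", "Cheap flights and airfare", "Boston city guide"]
best = max(links, key=lambda t: scent(goal, t))
print(best)  # "Cheap flights and airfare"
```

The gap between this toy and real user behavior is exactly the point: predicting link choice well requires representing the large, structured semantic knowledge users bring with them.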
Another issue raised in the previous section is the issue of knowledge engineering. One
of the things which makes knowledge engineering for cognitive architectures so labor-intensive
is that the knowledge typically has to be specified and debugged at a very detailed level, down
to the level of individual eye movements. While the problem of understanding what people,
particularly experts, know about a task is not particularly easy to solve, there are multiple
researchers who are actively looking at ways to make the representation of this knowledge
easier. The paper in the symposium by Lebiere, Archer, Warwick and Schunk is an example
of such an approach. This paper describes their work on unifying a detailed cognitive
architecture, ACT-R, with a higher-level task network simulation tool, IMPRINT. Task
network tools use a task representation that is at a much higher level of abstraction than those
typically used in cognitive architectures. Such models have been a part of the human factors
toolbox for some time, so this work represents the unification of two important human
performance modeling traditions.
Another approach to this problem is to simply forego the level of detail typically found
in cognitive architectures. For many HCI-level tasks, such detail is not necessary and forces
the model-builder to confront issues on or outside the boundaries of the science base. While
resolution of those issues is ultimately important for both scientific and practical reasons, it is
not in the best interests of the HCI practitioner to have to confront those issues on a routine
basis. Such is the argument presented in the paper by Kieras. He outlines a two-pronged
approach to cognitive modeling in HCI, a “low fidelity” approach based on a tool called
GLEAN aimed at keeping such details away from modelers and a “high fidelity” approach
based on an architecture called EPIC aimed at addressing these detailed issues in the longer
term. An interesting alternative middle ground can be found in a paper by St. Amant, Freed,
and Ritter (2005) in which they present a tool, G2A, which takes a GLEAN-level specification
and translates it directly into a model based in the ACT-R architecture, which is at approximately
the same level of analysis as EPIC.
However, this is not the problem at which the symposium paper by St. Amant, Riedl,
and Ritter is aimed. Their symposium paper concerns the problem of how to connect a cognitive
architecture to closed software systems that were not designed with communication with a
cognitive model in mind. In particular, they describe a system called SegMan that applies
image processing techniques to raw bitmaps to allow cognitive models to “see” and manipulate
arbitrary Windows applications, without having to modify the Windows application in any
way. The promise of this tool is tremendous and could potentially go a long way toward solving
the software integration problems faced by cognitive modelers in a wide variety of domains.
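As a rough intuition for the bitmap-level approach, and emphatically not SegMan’s actual algorithms, which are far more sophisticated, consider locating a widget by scanning raw pixels for a target color:

```python
def bounding_box(bitmap, target):
    """Find the bounding box (min_x, min_y, max_x, max_y) of all pixels
    matching a target value in a row-major bitmap; None if absent.
    A toy stand-in for real image-processing-based screen perception."""
    pts = [(x, y) for y, row in enumerate(bitmap)
                  for x, px in enumerate(row) if px == target]
    if not pts:
        return None
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))

# Invented 4x4 "screen" with a 2x2 block of target-colored pixels.
B = [[0, 0, 0, 0],
     [0, 1, 1, 0],
     [0, 1, 1, 0],
     [0, 0, 0, 0]]
print(bounding_box(B, 1))  # (1, 1, 2, 2): a clickable region for a model
```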
Finally, there is the paper by Vera, Howes, Lewis, Tollinger, Eng, and Richardson. This
represents a different approach to cognitive modeling, one based on enumerating the
constraints on human performance and then reasoning over those constraints to predict expert
human behavior. This is not exactly a cognitive architecture as defined in section 1 of this
paper, but it shares a great deal of the same intellectual tradition and aims to produce
quantitative models of human performance with HCI implications. Their approach is both
novel and interesting, and may ultimately provide insights which are later incorporated into the
kinds of cognitive architectures described in the other papers.
ACKNOWLEDGMENTS :
I would like to thank all of the symposium authors for their excellent contributions to
the session. I would also like to acknowledge the financial support of the Office of Naval
Research under grant number N00014-03-1-0094 and the National Aeronautics and Space
Administration under grant number NCC2-1219. The views and conclusions contained herein
are those of the authors and should not be interpreted as necessarily representing the official
policies or endorsements, either expressed or implied, of ONR, NASA, the U.S. Government,
or any other organization.
References :
• Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., & Qin, Y. (2004). An integrated theory of the mind. Psychological Review, 111(4), 1036-1060.
• Anderson, J. R., Conrad, F. G., & Corbett, A. T. (1989). Skill acquisition and the LISP tutor. Cognitive Science, 13(4), 467-505.
• Byrne, M. D., & Gray, W. D. (2003). Returning human factors to an engineering discipline: Expanding the science base through a new generation of quantitative methods—preface to the special section. Human Factors, 45, 1-4.
• Card, S. K., Moran, T. P., & Newell, A. (1983). The psychology of human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum Associates.
• Daily, L. Z., Lovett, M. C., & Reder, L. M. (2001). Modeling individual differences in working memory performance: A source activation account. Cognitive Science, 25, 315-353.
• Gratch, J., & Marsella, S. (2004). A domain-independent framework for modeling emotion. Journal of Cognitive Systems Research, 5, 269-306.
• Gray, W. D., Young, R. M., & Kirschenbaum, S. S. (1997). Introduction to this special issue on cognitive architectures and human-computer interaction. Human-Computer Interaction, 12, 301-309.
• John, B. E., & Kieras, D. E. (1996). The GOMS family of user interface analysis techniques: Comparison and contrast. ACM Transactions on Computer-Human Interaction, 3, 320-351.
• John, B. E., Prevas, K., Salvucci, D. D., & Koedinger, K. (2004). Predictive human performance modeling made easy. In Human Factors in Computing Systems: Proceedings of CHI 2004 (pp. 455-462). New York: ACM.
• Jones, R. M., Laird, J. E., Nielsen, P. E., Coulter, K. J., Kenny, P., & Koss, F. V. (1999). Automated intelligent pilots for combat flight simulation. AI Magazine, 20(1), 27-41.
• Kieras, D. E., Meyer, D. E., & Ballas, J. A. (2001). Towards demystification of direct manipulation: Cognitive modeling charts the gulf of execution. In Proceedings of ACM CHI 01 Conference on Human Factors in Computing Systems (pp. 128-135). New York: ACM.
• Kieras, D. E., & Santoro, T. P. (2004). Computational GOMS modeling of a complex team task: Lessons learned. In Human Factors in Computing Systems: Proceedings of CHI 2004 (pp. 97-104). New York: ACM.
• Lewis, R. L., Newell, A., & Polk, T. A. (1989). Toward a Soar theory of taking instructions for immediate reasoning tasks. In Proceedings of the Eleventh Annual Conference of the Cognitive Science Society (pp. 514-521). Hillsdale, NJ: Lawrence Erlbaum Associates.
• Norman, D. A., Ortony, A., & Revelle, W. (in press). Effective functioning: A three-level model of affect, behavior, and cognition. In J. M. Fellous & M. A. Arbib (Eds.), Who needs emotions? The brain meets the machine. New York: Oxford University Press.
• Polson, P. G., Lewis, C., Rieman, J., & Wharton, C. (1992). Cognitive walkthroughs: A method for theory-based evaluation of user interfaces. International Journal of Man-Machine Studies, 36, 741-773.
• Ritter, F. E., Baxter, G. D., Jones, G., & Young, R. M. (2000). Supporting cognitive models as users. ACM Transactions on Computer-Human Interaction, 7, 141-173.
• Ritter, F. E., Reifers, A., Klein, L. C., Quigley, K., & Schoelles, M. (2004). Using cognitive modeling to study behavior moderators: Pre-task appraisal and anxiety. In Proceedings of the Human Factors and Ergonomics Society 48th Annual Meeting (pp. 2121-2125). Santa Monica, CA: Human Factors and Ergonomics Society.
• Ritter, F. E., & Young, R. M. (2001). Embodied models as simulated users: Introduction to this special issue on using cognitive models to improve interface design. International Journal of Human-Computer Studies, 55, 1-14.
• St. Amant, R., Freed, A. R., & Ritter, F. E. (2005). Specifying ACT-R models of user interaction with a GOMS language. Cognitive Systems Research, 6, 71-88.