Wickens Ch2 Research Methods


An Introduction
to Human Factors
Engineering
Second Edition

Christopher D. Wickens
University of Illinois at Urbana-Champaign

John Lee
University of Iowa

Yili Liu
University of Michigan

Sallie Gordon Becker
Becker & Associates, Palo Alto, California

Pearson Education International


Chapter 2

Research Methods

A state legislator suffered an automobile acci-
dent when another driver ran a stop sign while talking on a cellular phone. The re-
sulting concern about cell phones and driving safety led the legislator to introduce
legislation banning the use of cellular phones while a vehicle is in motion. But oth-
ers challenged whether the one individual's experience could justify a ban on all
others' cellular phone use. After all, a single personal experience does not necessarily
generalize to all, or even most others. To resolve this debate, a human factors com-
pany was contracted to provide the evidence regarding whether or not use of cellular
phones compromises driver safety.
Where and how should that evidence be obtained? The company might con-
sult accident statistics and police reports, which could reveal that cell phone use
was no more prevalent in accidents than it was in a survey of drivers
asked to report how frequently they talked on their cellular phone. But
how reliable and accurate is this evidence? Not every accident report may have a
place for the officer to note whether a cellular phone was or was not in use; and
those drivers filling out the survey may not have been entirely truthful about how
often they use their phone while driving. The company might also perform its
own research in an expensive driving simulator, comparing driving performance
of people while the cellular phone was and was not in use. But how much do the
conditions of the simulator replicate those on the highway? On the highway, peo-
ple choose when they want to talk on the phone. In the simulator, people are
asked to talk at specific times. The company might also rely on more basic labo-
ratory research that characterizes the degree of dual task interference between
conversing and carrying out a "tracking task" like vehicle control, while detecting
events that may represent pedestrians (Strayer & Johnston, 2001). But isn't a
computer-driven tracking task unlike the conditions of real automobile driving?


The approaches to evidence-gathering described above represent a sample
of a number of research methods that human factors researchers can employ to
discover "the truth" (or something close to it) about the behavior of humans in-
teracting with systems in the "real world." Research involves the scientific gather-
ing of observations or data and the interpretation of the meaning of these data
regarding the research questions involved. In human factors, such meaning is
often expressed in terms like what works? what is unsafe? which is better? (in
terms of criteria of speed, accuracy, and workload). It may also be expressed in
terms of general principles or models of how humans function in the context of
a variety of different systems. Because human factors involves the application of
science to system design, it is considered by many to be an applied science. While
the ultimate goal is to establish principles that reflect performance of people in
real-world contexts, the underlying scientific principles are gained through re-
search conducted in both laboratory and real-world environments.
Human factors researchers use standard methods for developing and testing
scientific principles that have been developed over the years in traditional physi-
cal and social sciences. These methods range from the "true scientific experi-
ment" conducted in highly controlled laboratory environments to less
controlled but more realistic observational studies in the real world. Given this
diversity of methods, a human factors researcher must be familiar with the
range of research methods that are available and know which methods are best
for specific types of research questions. It is equally important for researchers to
understand how practitioners ultimately use their findings. Ideally, this enables
a researcher to direct his or her work in ways that are more likely to be useful to
design, thus making the science applicable (Chapanis, 1991).
Knowledge of basic research methods is also necessary for human factors
design work. That is, standard design methods are used during the first phases of
product or system design. As alternative design solutions emerge, it is sometimes
necessary to perform formal or informal studies to determine which design so-
lutions are best for the current problem. At this point, designers must select and
use appropriate research methods. Chapter 3 provides an overview of the more
common design methods used in human factors and will refer you back to vari-
ous research methods within the design context.

INTRODUCTION TO RESEARCH METHODS


As we noted in Chapter 1, comprehensive human factors research spans a variety
of disciplines, from a good understanding of the mind and how the brain
processes information to a good understanding of the physical and physiological
limits of the body. But the human factors researcher must also understand how
the brain and the body work in conjunction with other systems, whether these
systems are physical and mechanical, like the handheld cellular phone, the
shovel, or the airplane; or are informational, like the dialogue with a copilot,
with a 911 emergency dispatch operator, or with an instructional display. Because
of this, much of the scientific research in human factors cannot be as simple or
"context-free" as more basic research in psychology, physics, and physiology, al-


though many of the tenets of basic research remain relevant.

Basic and Applied Research


It should be apparent that scientific study relevant to human factors can range
from basic to very applied research. Basic research can be defined as "the devel-
opment of theory, principles, and findings that generalize over a wide range of
people, tasks, and settings." An example would be a series of studies that tests the
theory that as people practice a particular activity hundreds of times, it becomes
automatic and no longer takes conscious, effortful cognitive processing. Applied
research can be defined loosely as "the development of theory, principles, and
findings that are relatively specific with respect to particular populations, tasks,
products, systems, and/or environments." An example of applied research would
be measuring the extent to which the use of a particular cellular phone while
driving on an interstate highway takes driver attention away from primary driv-
ing tasks.
While some specialists emphasize the dichotomy between basic and applied
research, it is more accurate to say that there is a continuum, with all studies
falling somewhere along the continuum depending on the degree to which the
theory or findings generalize to other tasks, products, or settings. Both basic and
applied research have complementary advantages and disadvantages. Basic re-
search tends to develop basic principles that have greater generality across a vari-
ety of systems and environments than does applied research. It is conducted in
rigorously controlled laboratory environments, an advantage because it prevents
intrusions from other confounding variables, and allows us to be more confident
in the cause-and-effect relationships we are studying. Conversely, research in a
highly controlled laboratory environment is often simplistic and artificial and
may bear little resemblance to performance in real-world environments. Caution
is required in assuming that theory and findings developed through basic re-
search will be applicable for a particular design problem (Kantowitz, 1990). For
this reason, people doing controlled research should strive to conduct controlled
studies with a variety of tasks and within a variety of settings, some of which are
conducted in the field rather than in the lab. This increases the likelihood that
their findings are generalizable to new or different tasks and situations.
We might conclude from this discussion that only applied research is valuable
to the human factors designer. After all, applied research yields principles and
findings specific to particular tasks and settings. A designer need only locate re-
search findings corresponding to the particular combination of factors in the cur-
rent design problem and apply the findings. The problem with this view is that
many, if not most, design problems are somehow different from those studied in
the past. The advantage of applied research is also its downfall. It is more descrip-
tive of real-world behavior, but it also tends to be much more narrow in scope.
In addition, applied research such as field studies is often very expensive. It
often uses expensive equipment (for example, driving simulators or real cars in
answering the cellular phone question), and may place the human participant at
risk for accidents, an issue we address later in this chapter.

Often there are so few funds available for answering human factors research
questions, or the time available for such answers is so short, that it is impossible
to address the many questions that need asking in applied research designs. As a
consequence, there is a need to conduct more basic, less expensive and risky lab-
oratory research, or to draw conclusions from other researchers who have pub-
lished their findings in journals and books. These research studies may not have
exactly duplicated the conditions of interest to the human factors designer. But
if the findings are strong and reliable, they may provide useful guidance in ad-
dressing that design problem, informing the designer or applied researcher, for
example, of the driving conditions that might make cellular phone use more or
less distracting; or the extent of benefits that could be gained by a voice-dialed
over a hand-dialed phone.

Overview of Research Methods


The goal of scientific research is to describe, understand, and predict relation-
ships between variables. In our example, we are interested in the relationship be-
tween the variable of "using a cellular phone while driving" and "driving"
performance." More specifically, we might hypothesize that use of a cellular
phone will result in poorer driving performance than not using the phone.
As noted earlier, we might collect data from a variety of sources. The data
source of more basic research is generally the experiment, although applied re-
search and field studies also often involve experiments. The experimental method
consists of deliberately producing a change in one or more causal or indepen-
dent variables and measuring the effect of that change on one or more dependent
variables. The key to good experiments is control. That is, only the independent
variable should be changed, and all other variables should be held constant or
controlled. However, control becomes progressively more difficult in more ap-
plied research, where participants perform their tasks in the context of the envi-
ronment to which the research results are to generalize.
As control is loosened, out of necessity, the researcher depends progressively
more on descriptive methods: describing relations that exist, even though they
could not be actually manipulated or controlled by the researcher. For example,
the researcher might describe the greater frequency of cell phone accidents in
city than in freeway driving to help draw a conclusion that cell phones are more
likely to disrupt the busier driver. A researcher might also simply observe drivers
while driving in the real world, objectively recording and later analyzing their
behavior.
In human factors, as in any kind of research, collecting data, whether exper-
imental or descriptive, is only half of the process. The other half is inferring the
meaning or message conveyed by the data, and this usually involves generalizing
or predicting from the particular data sampled to the broader population. Do
cell phones compromise (or not) driving safety in the broad section of automo-
bile drivers, and not just in the sample of drivers used in the simulator experi-
ment or the sample involved in accident statistics? The ability to generalize
involves care in both the design of experiments and in the statistical analysis.

EXPERIMENTAL RESEARCH METHODS


An experiment involves looking at the relationship between causal independent
variables and resulting changes in one or more dependent variables, which are
typically measures of performance, workload, preference, or other subjective
evaluations. The goal is to show that the independent variable, and no other
variable, is responsible for causing any quantitative differences that we measure
in the dependent variable. When we conduct an experiment, we proceed
through a process of five steps or stages.

Steps in Conducting an Experiment


Step 1. Define problem and hypotheses. A researcher first hypothesizes the rela-
tionships between a number of variables and then sets up experimental designs
to determine whether a cause-and-effect relationship does in fact exist. For ex-
ample, we might hypothesize that changing peoples' work shifts back and forth
between day and night produces more performance errors than having people
on a constant shift. Once the independent and dependent variables are defined
in an abstract sense (e.g., fatigue or attention) and hypotheses are stated, the re-
searchers must develop more detailed experimental specifications.
Step 2. Specify the experimental plan. Specifying the experimental plan con-
sists of identifying all the details of the experiment to be conducted. Here we
must specify exactly what is meant by the dependent variable. What do we mean
by performance? What task will our participants be asked to perform, and what
aspects of those tasks do we measure? For example, we could define perfor-
mance as the number of keystroke errors in data entry. We must also define each
independent variable in terms of how it will be manipulated. For example, we
would specify exactly what we mean by alternating between day and night shifts.
Is this a daily change or a weekly change? Defining the independent variables is
an important part of creating the experimental design. Which independent vari-
ables do we manipulate? How many levels of each? For example, we might de-
cide to examine the performance of three groups of workers: those on a day
shift, those on a night shift, and those alternating between shifts.
Step 3. Conduct the study. The researcher obtains participants for the experi-
ment, develops materials, and prepares to conduct the study. If he or she is un-
sure of any aspects of the study, it is efficient to perform a very small
experiment, a pilot study, before conducting the entire "real" study. After every-
thing is checked through a pilot study, the experiment is carried out and data
collected.
Step 4. Analyze the data. In an experiment, the dependent variable is measured
and quantified for each subject (there may be more than one dependent vari-
able). For our example, you would have a set of numbers representing the key-
stroke errors for the people on changing work shifts, a set for the people on day
shift, and a set for the people on night shift. Data are analyzed using both de-
scriptive and inferential statistics to see whether there are significant differences
among the three groups.
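The inferential part of Step 4 can be sketched concretely. The example below computes a one-way ANOVA by hand for the three shift groups; the keystroke-error counts are invented for illustration, and in practice a statistics package would also report the p-value:

```python
# One-way ANOVA by hand for three independent groups.
# The keystroke-error counts below are invented for illustration.

def one_way_anova(groups):
    """Return the F statistic for k independent groups of scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-groups sum of squares: distance of each group mean
    # from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-groups sum of squares: spread of scores around their
    # own group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

day = [3, 2, 4, 3, 2]          # errors, constant day shift
night = [4, 3, 5, 4, 4]        # errors, constant night shift
alternating = [7, 6, 8, 7, 6]  # errors, alternating shifts

f_stat = one_way_anova([day, night, alternating])
print(round(f_stat, 2))  # a large F suggests the groups differ
```

An F statistic large relative to its critical value (equivalently, a small p-value) would support the hypothesis that shift schedule affects errors; follow-up comparisons would then identify which groups differ.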

Step 5. Draw conclusions. Based on the results of the statistical analysis, the re-
searchers draw conclusions about the cause-and-effect relationships in the ex-
periment. At the simplest level, this means determining whether hypotheses
were supported. In applied research, it is often important to go beyond the obvi-
ous. For example, our study might conclude that shiftwork schedules affect
older workers more than younger workers or that it influences the performance
of certain tasks, and not others. Clearly, the conclusions that we draw depend a
lot on the experimental design. It is also important for the researcher to go be-
yond concluding what was found, to ask "why". For example, are older people
more disrupted by shiftwork changes because they need more sleep? Or because
their natural circadian rhythms (day-night cycles) are more rigid? Identifying underly-
ing reasons, whether psychological or physiological, allows for the development
of useful and generalizable principles and guidelines.

Experimental Designs
For any experiment, there are different designs that can be used to collect the data.
Which design is best depends on the particular situation. Major features that differ
between designs include whether each independent variable has two levels or
more, whether one or more independent variable is manipulated, and whether the
same or different subjects participate in the different conditions defined by the in-
dependent variables (Keppel, 1992; Elmes et al., 1995; Williges, 1995).
The Two-Group Design. In a two-group design, one independent variable or fac-
tor is tested with only two conditions or levels of the independent variable. In
the classic two-group design, a control group gets no treatment (e.g., driving
with no cellular phone), and the experimental group gets some "amount" of the
independent variable (e.g., driving while using a cellular phone). The dependent
variable (driving performance) is compared for the two groups. However, in
human factors we often compare two different experimental treatment condi-
tions, such as performance using a trackball versus using a mouse. In these cases,
a control group is unnecessary: A control group to compare with mouse and
trackball users would have no cursor control at all, which does not make sense.
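For a two-group design like the trackball-versus-mouse comparison, the analysis typically reduces to an independent-samples t test. A minimal sketch, with invented cursor-positioning times:

```python
from statistics import mean, variance

def t_independent(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    # Pooled variance weights each group's sample variance
    # by its degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

trackball = [1.9, 2.1, 2.0, 2.3, 1.8]  # target-acquisition time, seconds
mouse = [1.5, 1.6, 1.4, 1.7, 1.5]

t = t_independent(trackball, mouse)
print(round(t, 2))  # compared against a t distribution with na+nb-2 df
```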
Multiple Group Designs. Sometimes the two-group design does not adequately
test our hypothesis of interest. For example, if we want to assess the effects of
VDT brightness on display perception, we might want to evaluate several differ-
ent levels of brightness. We would be studying one independent variable
(brightness) but would want to evaluate many levels of the variable. If we used
five different brightness levels and therefore five groups, we would still be study-
ing one independent variable but would gain more information than if we used
only two levels/groups. With this design, we could develop a quantitative model
or equation that predicts performance as a function of brightness. In a different
multilevel design, we might want to test four different input devices for cursor
control, such as trackball, thumbwheel, traditional mouse, and key-mouse. We
would have four different experimental conditions but still only one indepen-
dent variable (type of input device).

Factorial Designs. In addition to increasing the number of levels used for ma-
nipulating a single independent variable, we can expand the two-group design
by evaluating more than one independent variable or factor in a single experi-
ment. In human factors, we are often interested in complex systems and there-
fore in simultaneous relationships between many variables rather than just two.
As noted above, we may wish to determine if shiftwork schedules (Factor A)
have the same or different effects on older versus younger workers (Factor B).
A multifactor design that evaluates two or more independent variables by
combining the different levels of each independent variable is called a factorial
design. The term factorial indicates that all possible combinations of the inde-
pendent variable levels are combined and evaluated. Factorial designs allow the
researcher to assess the effect of each independent variable by itself and also to
assess how the independent variables interact with one another. Because much
of human performance is complex and human-machine interaction is often
complex, factorial designs are the most common research designs used in both
basic and applied human factors research.
Factorial designs can be more complex than a 2 X 2 design in a number of
ways. First, there can be more than two levels of each independent variable.
For example, we could compare driving performance with two different cellu-
lar phone designs (e.g., hand-dialed and voice-dialed), and also with a "no
phone" control condition. Then we might combine that first three-level vari-
able with a second variable consisting of two different driving conditions: city
and freeway driving. This would result in a 3 X 2 factorial design. Another way
that factorial designs can become more complex is by increasing the number
of factors or independent variables. Suppose we repeated the above 3 X 2 de-
sign with both older and younger drivers. This would create a 3 X 2 X 2 de-
sign. A design with three independent variables is called a three-way factorial
design.
Adding independent variables has three advantages: (1) It allows designers
to vary more system features in a single experiment: It is efficient. (2) It cap-
tures a greater part of the complexity found in the real world, making experi-
mental results more likely to generalize. (3) It allows the experimenter to see if
there is an interaction between independent variables, in which the effect of one
independent variable on performance depends on the level of the other inde-
pendent variable, as we describe in the box.
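The cells of a factorial design are simply all combinations of factor levels, which makes them easy to enumerate. A sketch for the 3 X 2 X 2 example above (the level names are shorthand, not from an actual study):

```python
from itertools import product

phone = ["no phone", "hand-dialed", "voice-dialed"]  # 3 levels
traffic = ["city", "freeway"]                        # 2 levels
age = ["younger", "older"]                           # 2 levels

# Each tuple is one cell of the 3 X 2 X 2 design.
conditions = list(product(phone, traffic, age))
print(len(conditions))  # 3 * 2 * 2 = 12 cells
```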
Between-Subjects Design. In most of the previous examples, the different lev-
els of the independent variable were assessed using separate groups of subjects.
For example, we might have one group of subjects use a cellular car phone in
heavy traffic, another group use a cellular phone in light traffic, and so on. We
compare the driving performance between groups of subjects and hence use the
term between-subjects. A between-subjects variable is an independent variable
whereby different groups of subjects are used for each level or experimental con-
dition.
A between-subjects design is a design in which all of the independent vari-
ables are between-subjects, and therefore each combination of independent
variables is administered to a different group of subjects. Between-subjects

A SIMPLE FACTORIAL DESIGN


To illustrate the logic behind factorial designs, we consider an example
of the simplest factorial design. This is where two levels of one in-
dependent variable are combined with two levels of a second indepen-
dent variable. Such a design is called a 2 X 2 factorial design. Imagine
that a researcher wants to evaluate the effects of using a cellular
phone on driving performance (and hence on safety). The researcher
manipulates the first independent variable by comparing driving with
and without use of a cellular phone. However, the researcher suspects
that the driving impairment may only occur if the driving is taking place
in heavy traffic. Thus, he or she may add a second independent vari-
able consisting of light versus heavy traffic driving conditions. The ex-
perimental design would look like that illustrated in Figure 2.1: four
groups of subjects derived from combining the two independent vari-
ables.
Imagine that we conducted the study, and for each of the subjects
in the four groups shown in Figure 2.1, we counted the number of
times the driver strayed outside of the driving lane as the dependent
variable. We can look at the general pattern of data by evaluating the
cell means; that is, we combine the scores of all subjects within each
of the four groups. Thus, we might obtain data such as that shown in
Table 2.1.
If we look only at the effect of cellular phone use (combining the
light and heavy traffic conditions), we might be led to believe that use
of cell phones impairs driving performance. But looking at the entire
picture, as shown in Figure 2.2, we see that the use of a cell phone

DRIVING CONDITIONS

                 Light traffic                  Heavy traffic
No cell phone    No cell phone while driving    No cell phone while driving
                 in light traffic               in heavy traffic
Use cell phone   Use cell phone while driving   Use cell phone while driving
                 in light traffic               in heavy traffic

FIGURE 2.1
The four experimental conditions for a 2 X 2 factorial design.

impairs driving only in heavy traffic conditions [as defined in this partic-
ular study]. When the lines connecting the cell means in a factorial
study are not parallel, as in Figure 2.2, we know that there is some
type of interaction between the independent variables: The effect of
phone use depends on driving conditions. Factorial designs are popular
for both basic research and applied questions because they allow re-
searchers to evaluate interactions between variables.

TABLE 2.1 Hypothetical Data for Driving Study: Average Number
of Lane Deviations
Cell Phone Use Light Traffic Heavy Traffic
No cell phone 2.1 2.1
Cell phone 2.2 5.8
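The interaction visible in Table 2.1 can be expressed numerically as a "difference of differences": the effect of the phone in heavy traffic minus its effect in light traffic. If that quantity is zero, the lines in an interaction plot are parallel; here it is not:

```python
# Cell means from Table 2.1.
means = {
    ("no phone", "light"): 2.1,
    ("no phone", "heavy"): 2.1,
    ("phone", "light"): 2.2,
    ("phone", "heavy"): 5.8,
}

# Effect of using the phone within each traffic condition.
effect_light = means[("phone", "light")] - means[("no phone", "light")]
effect_heavy = means[("phone", "heavy")] - means[("no phone", "heavy")]

# Non-zero difference of differences = non-parallel lines = interaction.
interaction = effect_heavy - effect_light
print(round(interaction, 1))
```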

FIGURE 2.2
Interaction between cellular phone use and driving conditions. [Line graph of
the cell means in Table 2.1: lane deviations plotted against light versus heavy
traffic, with separate lines for the cell phone and no-phone conditions; the
lines are not parallel.]

designs are most commonly used when having subjects perform in more than
one of the conditions would be problematic. For example, if you have subjects
receive one type of training (e.g., on a simulator), they could not begin over
again for another type of training because they would already know the mater-
ial. Between-subjects designs also eliminate certain confounds related to order
effects, which we discuss shortly.

Within-Subject Designs. In many experiments, it is feasible to have the same
subjects participate in all of the experimental conditions. For example, in the
driving study, we could have the same subjects drive for periods of time in each
of the four conditions shown in Table 2.1. In this way, we could compare the
performance of each person with him- or herself across the different conditions.
This within-subject performance comparison illustrates where the method gets
its name. When the same subject experiences all levels of an independent vari-
able, it is termed a within-subjects variable. An experiment where all indepen-
dent variables are within-subject variables is termed a within-subjects design.
Using a within-subjects design is advantageous in a number of respects, includ-
ing that it is more sensitive and easier to find statistically significant differences
between experimental conditions. It is also advantageous when the number of
people available to participate in the experiment is limited.
Mixed Designs. In factorial designs, each independent variable can be either
between-subjects or within-subjects. If both types are used, the design is termed
a mixed design. If one group of subjects drove in heavy traffic with and without a
cellular phone, and a second group did so in light traffic, this is a mixed design.
Multiple Dependent Variables. In the previous sections, we described several
different types of experimental design that were variations of the same thing:
multiple independent variables combined with a single dependent variable or
"effect." However, the systems that we study, including the human, are very com-
plex. We often want to measure how causal variables affect several dependent
variables at once. For example, we might want to measure how use of a cellular
phone affects a number of driving variables, including deviations from the lane,
reaction time to brake for cars or other objects in front of the vehicle, time to
recognize objects in the driver's peripheral vision, speed, acceleration, and so
forth.
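One way to handle several dependent variables is simply to record them all on every trial and analyze each in turn. A minimal sketch, with invented field names and values:

```python
from dataclasses import dataclass

@dataclass
class Trial:
    subject: str
    condition: str
    lane_deviations: int   # count per drive
    brake_rt_ms: float     # reaction time to brake, milliseconds
    mean_speed_kmh: float

# Two invented trials from the phone-in-heavy-traffic cell.
log = [
    Trial("P01", "phone/heavy", 6, 820.0, 62.5),
    Trial("P02", "phone/heavy", 5, 790.0, 60.1),
]

# Each dependent variable is then summarized separately.
mean_rt = sum(t.brake_rt_ms for t in log) / len(log)
print(mean_rt)  # 805.0
```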

Selecting the Apparatus and Context


Once the experimental design has been specified with respect to dependent vari-
ables, the researcher must decide what tasks the person will be performing and
under what context. For applied research, we try to identify tasks and environ-
ments that will give us the most generalizable results. This often means conduct-
ing the experiments under real-world or high-fidelity conditions.

Selecting Experimental Participants


Participants should represent the population or group in which the researcher is
interested. For example, if we are studying pilot behavior, we would pick a sam-
ple of pilots who represent the pilot population in general. If we are studying the
elderly, we define the population of interest (e.g., all people aged 65 and older who
are literate); then we obtain a sample that is representative of that population.
Notice that it would be difficult to find a sample that has all of the qualities of all
elderly people. If lucky, we might get a sample that is representative of all elderly
people living in the United States who are healthy, speak English, and so on.

Experimental Control and Confounding Variables


In deciding how the study will be conducted, it is important to consider all vari-
ables that might impact the dependent variable. Extraneous variables have
the potential to interfere in the causal relationship and must be controlled so
that they do not interfere. If these extraneous variables do influence the depen-
dent variable, we say that they are confounding variables. One group of extrane-
ous variables is the wide range of ways participants differ from one another.
These variables must be controlled, so it is important that the different groups of
people in a between-subjects experiment differ only with respect to the treat-
ment condition and not on any other variable or category. For example, in the
cellular phone study, you would not want elderly drivers using the car phone
and young drivers using no phone. Then age would be a confounding variable.
One way to make sure all groups are equivalent is to take the entire set of sub-
jects and randomly put them in one of the experimental conditions. That way,
on the average, if the sample is large enough, characteristics of the subjects will
even out across the groups. This procedure is termed random assignment. An-
other way to avoid having different characteristics of subjects in each group is to
use a within-subjects design. However, this design creates a different set of chal-
lenges for experimental control.
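Random assignment is straightforward to implement: shuffle the participant pool, then deal participants into conditions in turn. A minimal sketch with hypothetical participant IDs and the three shift groups from the earlier example:

```python
import random

# Thirty hypothetical participant IDs.
participants = [f"P{i:02d}" for i in range(1, 31)]

random.seed(42)  # fixed seed only so the sketch is reproducible
random.shuffle(participants)

# Deal the shuffled pool into the three shift conditions in turn.
groups = {"day": [], "night": [], "alternating": []}
names = list(groups)
for i, p in enumerate(participants):
    groups[names[i % 3]].append(p)

print([len(g) for g in groups.values()])  # [10, 10, 10]
```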
Other variables in addition to subject variables must be controlled. For ex-
ample, it would be a poor experimental design to have one condition where cel-
lular phones are used in a Jaguar and another condition where no phone is used
in an Oldsmobile. There may be driving characteristics or automobile size dif-
ferences that cause variations in driving behavior. The phone versus no-phone
comparison should be carried out in the same vehicle (or same type of vehicle).
We need to remember, however, that in more applied research, it is sometimes
impossible to exert perfect control.
For within-subjects designs, there is another variable that must be con-
trolled: the order in which the subject receives his or her experimental condi-
tions, which creates what are called order effects. When people participate in
several treatment conditions, the dependent measure may show differences from
one condition to the next simply because the treatments, or levels of the inde-
pendent variable, are experienced in a particular order. For example, if partici-
pants use five different cursor-control devices in an experiment, they might be
fatigued by the time they are tested on the fifth device and therefore exhibit
more errors or slower times. This would be due to the order of devices used
rather than the device per se. In contrast, if the cursor-control task is new to the
participant, he or she might show learning and actually do best on the fifth de-
vice tested, not because it was better, but because the cursor-control skill was
more practiced. These order effects of fatigue and practice in within-subjects
designs are both potential confounding variables; while they work in opposite
directions, to penalize or reward the late-tested conditions, they do not necessar-
ily balance each other out.
As a safeguard to keep order from confounding the independent variables,
we use a variety of methods. For example, extensive practice can reduce learning
effects. Time between conditions can reduce fatigue. Finally, researchers often
use a technique termed counterbalancing. This simply means that different sub-
jects receive the treatment conditions in different orders. For example, half of
the participants in a study would use a trackball and then a mouse. The other
half would use a mouse and then a trackball. There are specific techniques for
counterbalancing order effects; the most common is a Latin-square design. Re-
search methods books (e.g., Keppel, 1992) provide instruction on using these
designs.
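The simplest form of counterbalancing can be sketched in Python as a cyclic Latin square, in which each subject's order is the condition list rotated by one position. The function and condition names below are illustrative, not from the text.

```python
def latin_square_orders(conditions):
    """Cyclic Latin square: row i is the condition list rotated by i,
    so every condition appears exactly once in every serial position."""
    n = len(conditions)
    return [[conditions[(row + pos) % n] for pos in range(n)]
            for row in range(n)]

for order in latin_square_orders(["trackball", "mouse", "joystick"]):
    print(order)
```

Note that a simple cyclic square balances serial position but not immediate carryover (each condition is always preceded by the same neighbor); fully balanced Latin squares, covered in texts such as Keppel (1992), control that as well.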
In summary, the researcher must control extraneous variables by making
sure they do not covary with the independent variable. If they do covary, they
become confounds and make interpretation of the data impossible. This is be-
cause the researcher does not know which variable caused the differences in the
dependent variable.

Conducting the Study


After designing the study and identifying a sample of participants, the researcher
is ready to conduct the experiment and collect data (sometimes referred to as
"running subjects"). Depending on the nature of the study, the experimenter
may want to conduct a small pretest, or pilot study, to check that manipulation
levels are set right, that participants (subjects) do not experience unexpected
problems, and that the experiment will generally go smoothly. When the experi-
ment is being conducted, the experimenter should make sure that data collec-
tion methods remain constant. For example, an observer should not become
more lenient over time; measuring instruments should remain calibrated. Fi-
nally, all participants should be treated ethically, as described later.

Data Analysis
Once the experimental data have been collected, the researcher must determine
whether the dependent variable(s) actually did change as a function of experi-
mental condition. For example, was driving performance really "worse" while
using a cellular phone? To evaluate the research questions and hypotheses, the
experimenter calculates two types of statistics: descriptive and inferential statis-
tics. Descriptive statistics are a way to summarize the dependent variable for the
different treatment conditions, while inferential statistics tell us the likelihood
that any differences between our experimental groups are "real" and not just
random fluctuations due to chance.
Descriptive Statistics. Differences between experimental groups are usually de-
scribed in terms of averages. Thus, the most common descriptive statistic is the
mean. Research reports typically describe the mean scores on the dependent
variable for each group of subjects (e.g., see the data shown in Table 2.1 and
Figure 2.2). This is a simple way of conveying the effects of the independent
variable(s) on the dependent variable. Standard deviations are also sometimes
given to convey the spread of scores.
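As a sketch of this descriptive step, the following Python fragment reports the mean and standard deviation for each condition. The lane-deviation scores are invented for illustration; they are not data from the cellular phone study.

```python
from statistics import mean, stdev

# Hypothetical lane-deviation scores (cm) for each condition
data = {
    "phone":    [34, 41, 38, 45, 39, 42],
    "no_phone": [30, 33, 29, 35, 31, 34],
}
for condition, scores in data.items():
    print(f"{condition}: M = {mean(scores):.1f}, SD = {stdev(scores):.1f}")
```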
Inferential Statistics. While experimental groups may show different means
for the various conditions, it is possible that such differences occurred solely on
the basis of chance. Humans almost always show random variation in perfor-
mance, even without manipulating any variables. It is not uncommon to get two
groups of subjects who have different means on a variable, without the differ-
ence being due to any experimental manipulation, in the same way that you are
likely to get a different number of "heads" if you do two series of 10 coin tosses.
In fact, it is unusual to obtain means that are exactly the same. So, the question
becomes, Is the difference big enough that we can rule out chance and assume
the independent variable had an effect? Inferential statistics give us, effectively,
the probability that the difference between the groups is due to chance. If we can
rule out the "chance" explanation, then we infer that the difference was due to
the experimental manipulation.
For a two-group design, the inferential statistical test usually used is a t-test.
For more than two groups, we use an analysis of variance (ANOVA). Both tests
yield a score; for a t-test, we get a value for a statistical term called t, and for
ANOVA, we get a value for F. Most important, we also identify the probability, p,
that the t or F value would be found by chance for that particular set of data if
there was no effect or difference. The smaller the p value is, the more signifi-
cant our result becomes and the more confident we are that our independent
variable really did cause the difference. This p value will be smaller as the differ-
ence between means is greater, as the variability between our observations
within a condition (standard deviation) is less, and, importantly, as the sample
size of our experiment increases (more subjects, or more measurements per
subject). A greater sample size gives our experiment greater statistical power to
find significant differences.
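The logic of a p value can be made concrete with a randomization (permutation) test, which estimates the chance probability directly by reshuffling the condition labels. This standard-library sketch uses invented lane-deviation data; it is not the t-test or ANOVA a researcher would typically report, but it computes the same kind of quantity.

```python
import random
from statistics import mean

def permutation_p(group_a, group_b, n_perm=5000, seed=1):
    """Estimate the probability of a mean difference at least as large
    as the observed one if condition labels were assigned by chance."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(group_a)]) - mean(pooled[len(group_a):]))
        if diff >= observed:
            hits += 1
    return hits / n_perm

phone    = [34, 41, 38, 45, 39, 42, 37, 44]  # hypothetical scores (cm)
no_phone = [30, 33, 29, 35, 31, 34, 28, 32]
print(permutation_p(phone, no_phone))  # a very small p: chance is unlikely
```

Here p is simply the proportion of random relabelings that produce a difference at least as large as the one observed; a value below .05 would let us rule out the chance explanation.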

Drawing Conclusions
Researchers usually assume that if p is less than .05, they can conclude that the
results are not due to chance and therefore that there was an effect of the inde-
pendent variable. Accidentally concluding that independent or causal variables
had an effect when it was really just chance is referred to as making a Type I
error. If scientists use a .05 cutoff, they will make a Type I error only one time in
20. In traditional sciences, a Type I error is considered a "bad thing" (Wickens,
1998). This makes sense if a researcher is trying to develop a cause-and-effect
model of the physical or social world. The Type I error would lead to the devel-
opment of false theories.
Researchers in human factors have also accepted this implicit assumption
that making a Type I error is bad. Research where the data result in inferential
statistics with p > .05 is not generally accepted for publication in most journals.
Experimenters studying the effects of system design alternatives often conclude
that the alternatives made no difference. Program evaluations where introduc-
tion of a new program resulted in statistics of p > .05 often conclude that the
new program did not work, all because there is greater than a 1-in-20 chance
that spurious factors could have caused the results.
The cost of setting this arbitrary cutoff of p = .05 is that researchers are
more likely to make Type II errors, concluding that the experimental manipula-
tion did not have an effect when in fact it did (Keppel, 1992). This means, for
example, that a safety officer might conclude that a new piece of equipment is
no easier to use under adverse environmental conditions, when in fact it is eas-
ier. The likelihood of making Type I and Type II errors are inversely related.
Thus, if the experimenter found that the new equipment was not statistically
significantly better than the old (i.e., p > .05), the new equipment might be rejected
even though it might actually be better; had the p level been set at .10 in-
stead of .05, the equipment might have been concluded to be better.
The total dependence of researchers on the p < .05 criterion is especially
problematic in human factors because we frequently must conduct experiments
and evaluations with relatively low numbers of subjects because of expense or
the limited availability of certain highly trained professionals (Wickens, 1998).
As we saw, using a small number of subjects makes the statistical test less power-
ful and more likely to show no significance, or p > .05, even when there is a dif-
ference. In addition, the variability in performance between different subjects or
for the same subject but over time and conditions is also likely to be great when
we try to do our research in more applied environments, where all confounding
extraneous variables are harder to control. Again, these factors make it more
likely that the results will show no significance, or p > .05. The result is that
human factors researchers frequently conclude that there is no difference in ex-
perimental conditions simply because there is more than a 1-in-20 chance that it
could be caused by random variation in the data.
In human factors, researchers should consider the probability of a Type II
error when their difference is not significant at the conventional .05 level and
consider the consequences if others use their research to conclude that there is no
difference (Wickens, 1998). For example, will a safety-enhancing device fail to be
adopted? In the cellular phone study, suppose that performance really was worse
with cell phones than without, but the difference was not quite big enough to
reach .05 significance. Might the legislature conclude, in error, that cell phone use
was "safe"? There is no easy answer to the question of how to balance Type I and
Type II statistical errors (Keppel, 1992; Nickerson, 2001). The best advice is to re-
alize that the higher the sample size, the less either type of error will occur, and to
consider the consequences of both types of errors when, out of necessity, the sam-
ple size and power of the design of a human factors experiment must be low.
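The link between sample size and power can be illustrated by simulation. The sketch below rests on several stated assumptions: normally distributed performance scores, an invented effect size of one standard deviation, and a rough |t| > 2.0 criterion standing in for the p < .05 cutoff. It estimates how often a real difference would actually be detected.

```python
import random
from math import sqrt
from statistics import mean, stdev

def t_stat(a, b):
    """Two-sample t statistic (unequal-variance form)."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return abs(mean(a) - mean(b)) / se

def power_estimate(n, true_diff, sims=400, seed=7):
    """Fraction of simulated experiments in which a true difference of
    true_diff (in SD units) is detected at roughly the .05 level."""
    rng = random.Random(seed)
    hits = sum(
        t_stat([rng.gauss(0, 1) for _ in range(n)],
               [rng.gauss(true_diff, 1) for _ in range(n)]) > 2.0
        for _ in range(sims)
    )
    return hits / sims

print(power_estimate(5, 1.0), power_estimate(30, 1.0))  # power grows with n
```

The undetected fraction corresponds to the Type II error rate: with only five subjects per group, even this large effect is missed most of the time, which is precisely the hazard for small-sample human factors studies.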

Statistical Significance Versus Practical Significance


Once chance is ruled out, meaning p < .05, researchers discuss the differences
between groups as though they are a fact. However, it is important to remember
that two groups of numbers can be statistically different from one another with-
out the differences being very large. Suppose we compare two groups of Army
trainees. One group is trained in tank gunnery with a low-fidelity personal com-
puter. Another group is trained with an expensive, high-fidelity simulator. We
might find that when we measure performance, the mean percent correct for the
personal computer group is 80, while the mean percent correct for the simulator
group is 83. If we used a large number of subjects in a very powerful design,
there may be a statistically significant difference between the two groups, and
we would therefore conclude that the simulator is a better training system.
However, especially for applied research, we must look at the difference between
the two groups in terms of practical significance. Is it worth spending millions to
place simulators on every military base to get an increase from 80 percent to 83
percent? This illustrates the tendency for some researchers to place too much
emphasis on statistical significance and not enough emphasis on practical sig-
nificance.

DESCRIPTIVE METHODS
While experimentation in a well-controlled environment is valuable for uncov-
ering basic laws and principles, there are often cases where research is better
conducted in the real world. In many respects, the use of complex tasks in a real-
world environment results in more generalizable data that capture more of the
characteristics of a complex, real-world environment. Unfortunately, conducting
research in real-world settings often means that we must give up the "true" ex-
perimental design because we cannot directly manipulate and control variables.
One example is descriptive research, where researchers simply measure a number
of variables and evaluate how they are related to one another. Examples of this
type of research include evaluating the driving behavior of local residents at var-
ious intersections, measuring how people use a particular design of ATM (auto-
matic teller machine), and observing workers in a manufacturing plant to
identify the types and frequencies of unsafe behavior.

Observation
In many instances, human factors research consists of recording behavior during
tasks performed under a variety of circumstances. For example, we might install
video recorders in cars (with the drivers' permission) to film the circumstances in
which they place or receive calls on a cellular phone during their daily driving.
In planning observational studies, a researcher identifies the variables to be
measured, the methods to be employed for observing and recording each vari-
able, conditions under which observation will occur, the observational time-
frame, and so forth. For our cellular phone study, we would develop a series of
"vehicle status categories" in which to assign each phone use (e.g., vehicle
stopped, during turn, city street, freeway, etc.). These categories define a
taxonomy. Without one, observation will result in a large number of specific pieces
of information that cannot be reduced into any meaningful descriptions or con-
clusions. It is usually most convenient to develop a taxonomy based on pilot
data. This way, an observer can use a checklist to record and classify each in-
stance of new information, condensing the information as it is collected.
In situations where a great deal of data is available, it may be more sensible
to sample only a part of the behavioral data available or to sample behavior dur-
ing different sessions rather than all at once. For example, a safety officer is bet-
ter off sampling the prevalence of improper procedures or risk-taking behavior
on the shop floor during several different sessions over a period of time than all
at once during one day. The goal is to get representative samples of behavior,
and this is more easily accomplished by sampling over different days and during
different conditions.

Surveys and Questionnaires


Both basic and applied research frequently rely on surveys or questionnaires to
measure variables. The design of questionnaires and surveys is a challenging task
if it is to be done in a way that yields reliable and valid data, and the reader is re-
ferred to Salvendy and Carayon (1997) for proper procedures. Question-
naires and surveys sometimes gather qualitative data from open-ended
questions (e.g., "What features on the device would you like to see?" or "What
were the main problems in operating the device?"). However, more rigorous
treatment of the survey results can typically be obtained from quantitative data,
often collected on a numerical rating scale with endpoints ranging be-
tween, say, 1-7 or 1-10. Such quantitative data have the advantage of being
amenable to statistical analysis.
A major concern with questionnaires is their validity. Aside from assuring that
questions are designed to appropriately assess the desired content area, under most
circumstances, respondents should be told that their answers will be both confiden-
tial and anonymous. It is common practice for researchers to place identifying
numbers rather than names on the questionnaires. Employees are more likely to be
honest if their names will never be directly associated with their answers.
A problem is that many people do not fill out questionnaires when comple-
tion is voluntary. If the sample of those who do and who do not return questionnaires is differ-
ent along some important dimension related to the topic surveyed, the survey
results will obviously be biased. For example, in interpreting the results of an
anonymous survey of unsafe acts in a factory, those people who are time-stressed in
their job are more likely to commit unsafe acts, but also do not have time to com-
plete the survey. Hence, their acts will be underrepresented in the survey results.
Questionnaires and surveys are, by definition, subjective. Their outputs can
often be contrasted with objective performance data, such as error rates or re-
sponse times. The difference between these two classes of measures is important,
given that subjective measures are often easier and less expensive to obtain, with
a high sample size.
Several good papers have been published on the objective versus subjective
measurement issue (e.g., Hennessy, 1990; Muckler, 1992). If we evaluate the lit-
erature, it is clear that both objective and subjective measures have their uses.
For example, in a study of factors that lead to stress disorders in soldiers,
Solomon, Mikulincer, and Hobfoll (1987) found that objective and subjective
indicators of event stressfulness and social support were predictive of combat
stress reaction and later posttraumatic stress disorder and that "subjective para-
meters were the stronger predictors of the two" (p. 581). In considering subjec-
tive measures, however, it is important to realize that what people subjectively
rate as "preferred" is not always the system feature that supports best perfor-
mance (Andre & Wickens, 1995). For example, people almost always prefer a
colored display to a monochrome one, even when the color is used in such a way
that it can be detrimental to performance.

Incident and Accident Analysis


Sometimes a human factors analyst must determine the overall functioning of a
system, especially with respect to safety. There are a number of methods for
evaluating safety, including the use of surveys and questionnaires. Another
method is to evaluate the occurrence of incidents, accidents, or both. An inci-
dent is one in which a noticeable problem occurs during system operation, but an ac-
tual accident does not result from it. Some fields, such as the aerospace
community, have formalized databases for recording reported incidents and ac-
cidents (Rosenthal & Reynard, 1991). The Aviation Safety Reporting System
(ASRS) database is run by NASA and catalogs approximately 30,000 incidents
reported by pilots or air traffic controllers each year.
While this volume of information is potentially invaluable, there are certain
difficulties associated with the database (Wickens, 1995). First, the sheer size of
the qualitative database makes it difficult to search to develop or verify causal
analyses. Second, even though people who submit reports are guaranteed
anonymity, not all incidents are reported. A third problem is that the reporting
person may not give information that is necessary for identifying the root causes
of the incident or accident. The more recent use of follow-up interviews has
helped reduce but not completely eliminate the problem.
Accident prevention is a major goal of the human factors profession, espe-
cially as humans are increasingly called upon to operate large and complex sys-
tems. Accidents can be systematically analyzed to determine the underlying root
causes, whether they arose in the human, machine, or some interaction. Acci-
dent analysis has pointed to a multitude of cases where poor system design has
resulted in human error, including problems such as memory failures in the
1989 Northwest Airlines Detroit crash, training and decision errors in the 1987
Air Florida crash at Washington National Airport, and high mental workload
and poor decision making at Three-Mile Island. Accidents are usually the result
of several coinciding breakdowns within a system. This means that most of the
time, there are multiple unsafe elements such as training, procedures, controls
and displays, system components, and so on that would ideally be detected be-
fore rather than after an accident. This requires a proactive approach to system
safety analysis rather than a reactive one such as accident analysis. This topic is
addressed in greater length in Chapter 14.

Data Analysis for Descriptive Measures


Most descriptive research is conducted in order to evaluate the relationships be-
tween a number of variables. Whether the research data has been collected
through observation or questionnaires, the goal is to see whether relationships
exist and to measure their strength. Relationships between variables can be mea-
sured in a number of ways.
Relationships Between Continuous Variables. If we were interested in determin-
ing if there is a relationship between job experience and safety attitudes within
an organization, this could be done by performing a correlational analysis. The
correlational analysis measures the extent to which two variables covary such
that the value of one can be somewhat predicted by knowing the value of the
other. For example, in a positive correlation, one variable increases as the value
of another variable increases; for example, the amount of illumination needed
to read text will be positively correlated with age. In a negative correlation, the
value of one variable decreases as the other variable increases; for example, the
intensity of a soft tone that can be just heard is negatively correlated with age. By
calculating the correlation coefficient, r, we get a measure of the strength of the
relationship. Statistical tests can be performed that determine the probability
that the relationship is due to chance fluctuation in the variables. Thus, we get
information concerning whether a relationship exists (p) and a measure of the
strength of the relationship (r). As with other statistical measures, the likelihood
of finding a significant correlation increases as the sample size-the number of
items measured on both variables-increases.
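The correlation coefficient itself is straightforward to compute. In this standard-library Python sketch, the years-on-the-job and unsafe-acts figures are invented solely to illustrate a negative correlation.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson r: covariance scaled by both spreads, so -1 <= r <= 1."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))

years  = [1, 2, 4, 7, 10, 15]   # hypothetical job experience (years)
unsafe = [9, 8, 6, 5, 3, 2]     # hypothetical unsafe acts observed
print(round(pearson_r(years, unsafe), 2))  # strongly negative r
```

(From Python 3.10 on, `statistics.correlation` computes the same quantity.) Remember, as the text cautions below, that even a strong r licenses no causal conclusion on its own.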
One caution should be noted. When we find a statistically significant corre-
lation, it is tempting to assume that one of the variables caused the changes seen
in the other variable. This causal inference is unfounded for two reasons. First,
the direction of causation could actually be in the opposite direction. For exam-
ple, we might find that years on the job is negatively correlated with risk-taking.
While it is possible that staying on the job makes an employee more cautious, it
is also possible that being more cautious results in a lower likelihood of injury or
death. This may therefore cause people to stay on the job. Second, a third vari-
able might cause changes in both variables. For example, people who try hard to
do a good job may be encouraged to stay on and may also behave more cau-
tiously as part of trying hard.

Complex Modeling and Simulation


Researchers sometimes collect a large number of data points for multiple vari-
ables and then test the relationships through models or simulations (Pew &
Mavor, 1998). According to Bailey (1989), a model is "a mathematical/physical
system, obeying specific rules and conditions, whose behavior is used to under-
stand a real (physical, biological, human-technical, etc.) system to which it is
analogous in certain respects." Models range from simple mathematical equa-
tions, such as the equation that might be used to predict display perception as a
function of brightness level, to highly complex computer simulations (runnable
models); but in all cases, models are more restricted and less "real" than the sys-
tem they reflect.
Models are often used to describe relationships in a physical system or the
physiological relationships in the human body. Mathematical models of the
human body have been used to create simulations that support workstation de-
sign. As an example, COMBIMAN is a simulation model that provides graphical
displays of the human body in various workstation configurations (McDaniel &
Hofmann, 1990). It is used to evaluate the physical accommodation of a pilot to
existing or proposed crew station designs.
Mathematical models can be used to develop complex simulations (see
Elkind et al., 1990; Pew & Mavor, 1998; Laughery & Corker, 1997). That is, key
variables in some particular system and their interrelationships are mathemati-
cally modeled and coded into a runnable simulation program. Various scenarios
are run, and the model shows what would happen to the system. The predictions
of a simulation can be validated against actual human performance (time, er-
rors, workload). This gives future researchers a powerful tool for predicting the
effects of design changes without having to do experiments. One important ad-
vantage of using models for research is that they can replace evaluation using
human subjects to assess the impact of harmful environmental conditions (Kan-
towitz, 1992; Moroney, 1994).

Literature Surveys
A final research method that should be considered is the careful literature search
and survey. While this often precedes an experimental write-up, a good litera-
ture search can often substitute for the experiment itself if other researchers
have already answered the experimental question. One particular form of litera-
ture survey, known as a meta-analysis, can integrate the statistical findings of
many other experiments that have examined a common independent variable in
order to draw a collective and very reliable conclusion regarding the effect of
that variable (Rosenthal & Reynard, 1991).

ETHICAL ISSUES
It is evident that the majority of human factors research involves the use of peo-
ple as participants in research. Many professional affiliations and government
agencies have written specific guidelines for the proper way to involve partici-
pants in research. Federal agencies rely strongly on the guidelines found in the
Code of Federal Regulations HHS, Title 45, Part 46; Protections of Human Sub-
jects (Department of Health and Human Services, 1991). The National Institutes
of Health has a Web site where students can be certified in human subjects test-
ing (http://ohsr.od.nih.gov/cbtl). Anyone who conducts research using human
participants should become familiar with the federal guidelines as well as APA
published guidelines for ethical treatment of human subjects (American Psy-
chological Association, 1992). These guidelines fundamentally advocate the fol-
lowing principles:

• Protection of participants from mental or physical harm
• The right of participants to privacy with respect to their behavior
• The assurance that participation in research is completely voluntary
• The right of participants to be informed beforehand about the nature of
the experimental procedures

When people participate in an experiment, or provide data for research
by other methods, they are told the general nature of the study. Often, they can-
not be told the exact nature of the hypotheses because this will bias their behav-
ior. Participants should be informed that all results will be kept anonymous and
confidential. This is especially important in human factors because often partici-
pants are employees who fear that their performance will be evaluated by man-
agement. Finally, participants are generally asked to sign a document, an
informed consent form, stating that they understand the nature and risks of the
experiment, or data gathering project, that their participation is voluntary, and
that they understand they may withdraw at any time. In human factors field re-
search, the experiment is considered to be reasonable in risk if the risks are no
greater than those faced in the actual job environment. Research boards in the
university or organization where the research is to be conducted certify the ade-
quacy of the consent form and that the potential for any risks to the participant
is outweighed by the overall benefits of the research to society.
As one last note, experimenters should always treat participants with re-
spect. Participants are usually self-conscious because they feel their performance
is being evaluated (which it is, in some sense) and they fear that they are not
doing well enough. It is the responsibility of the investigator to put participants
at ease, assuring them that the system components are being evaluated and not
the people themselves. This is one reason that the term user testing has been
changed to usability testing (see next chapter) to refer to situations where people
are asked to use various system configurations in order to evaluate overall ease
of use and other factors.
