Chapter 2
Designers are more efficient and effective if they can raise their level of
abstraction.
This chapter introduces one of the major innovations in the second edition
of Introduction to Software Testing. Software testing is inherently
complicated and our ultimate goal, completely correct software, is
unreachable. The reasons are formal (as discussed below in section 2.1)
and philosophical. As discussed in Chapter 1, it’s not even clear that the
term “correctness” means anything when applied to a piece of engineering
as complicated as a large computer program. Do we expect correctness out
of a building? A car? A transportation system? Intuitively, we know that
all large physical engineering systems have problems, and moreover, there
is no way to say what correct means. This is even more true for software,
which can quickly get orders of magnitude more complicated than physical
structures such as office buildings or airplanes.
Instead of looking for “correctness,” wise software engineers try to
evaluate software’s “behavior” to decide if the behavior is acceptable
within consideration of a large number of factors including (but not limited
to) reliability, safety, maintainability, security, and efficiency. Obviously
this is more complex than the naive desire to show the software is correct.
So what do software engineers do in the face of such overwhelming
complexity? The same thing that physical engineers do: we use
mathematics to “raise our level of abstraction.” The Model-Driven Test
Design (MDTD) process breaks testing into a series of small tasks that
simplify test generation. Then test designers isolate their task, and work at
a higher level of abstraction by using mathematical engineering structures
to design test values independently of the details of software or design
artifacts, test automation, and test execution.
A key intellectual step in MDTD is test case design. Test case design
can be the primary determining factor in whether tests successfully find
failures in software. Tests can be designed with a “human-based”
approach, where a test engineer uses domain knowledge of the software’s
purpose and his or her experience to design tests that will be effective at
finding faults. Alternatively, tests can be designed to satisfy well-defined
engineering goals such as coverage criteria. This chapter describes the task
activities and then introduces criteria-based test design. Criteria-based test
design will be discussed in more detail in Chapter 5, then specific criteria
on four mathematical structures are described in Part II. After these
preliminaries, the model-driven test design process is defined in detail. The
book website has simple web applications that support the MDTD in the
context of the mathematical structures in Part II.
One of the most important facts that all software testers need to know is
that testing can show only the presence of failures, not their absence. This
is a fundamental, theoretical limitation; to be precise, the problem of
finding all failures in a program is undecidable. Testers often call a test
successful (or effective) if it finds an error. While this is an example of
level 2 thinking, it is also a characterization that is often useful and that we
will use throughout the book. This section explores some of the theoretical
underpinnings of testing as a way to emphasize how important the MDTD
is.
The definitions of fault and failure in Chapter 1 allow us to develop the
reachability, infection, propagation, and revealability model (“RIPR”).
First, we distinguish testing from debugging.
Figure 2.3, often called the “V model,” illustrates a typical scenario for
testing levels and how they relate to software development activities by
isolating each step. Information for each test level is typically derived from
the associated development activity. Indeed, standard advice is to design
the tests concurrently with each development activity, even though the
software will not be in an executable form until the implementation phase.
The reason for this advice is that the mere process of designing tests can
identify defects in design decisions that otherwise appear reasonable. Early
identification of defects is by far the best way to reduce their ultimate cost.
Note that this diagram is not intended to imply a waterfall process. The
synthesis and analysis activities generically apply to any development
process.
Figure 2.3. Software development activities and testing levels – the “V Model”.
2.4 COVERAGE CRITERIA
The essential problem with testing is the numbers. Even a small program
has a huge number of possible inputs. Consider a tiny method that
computes the average of three integers. We have only three input
variables, but each can have any value between -MAXINT and
+MAXINT. On a 32-bit machine, each variable can take any of over 4
billion values. With three inputs, this means the method has 2^96, or
nearly 80 octillion, possible inputs!
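That count is easy to verify. The sketch below (class and method names are ours, purely illustrative) computes the exact size of the input space:

```java
import java.math.BigInteger;

// Illustrative sketch: the size of the input space for a method that takes
// three 32-bit integer inputs. Each variable has 2^32 possible values, so
// the whole space has (2^32)^3 = 2^96 points.
public class InputSpace {

    // Number of distinct input tuples for `variables` inputs of `bits` bits each
    static BigInteger inputSpaceSize(int variables, int bits) {
        return BigInteger.TWO.pow(bits).pow(variables);
    }

    public static void main(String[] args) {
        // 2^96 = 79,228,162,514,264,337,593,543,950,336 -- nearly 80 octillion
        System.out.println(inputSpaceSize(3, 32));
    }
}
```

Even at a billion test executions per second, exhausting this space would take more than 10^12 years, which is why exhaustive testing is off the table.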
So no matter whether we are doing unit testing, integration testing, or
system testing, it is impossible to test with all inputs. The input space is, to
all practical purposes, infinite. Thus a test designer’s goal could be
summarized in a very high-level way as searching a huge input space,
hoping to find the fewest tests that will reveal the most problems. This is
the source of two key problems in testing: (1) how do we search? and (2)
when do we stop? Coverage criteria give us structured, practical ways to
search the input space. Satisfying a coverage criterion gives a tester some
amount of confidence in two crucial goals: (A) we have looked in many
corners of the input space, and (B) our tests have a fairly low amount of
overlap.
Coverage criteria have many advantages for improving the quality and
reducing the cost of test data generation. Coverage criteria can maximize
the “bang for the buck,” with fewer tests that are effective at finding more
faults. Well-designed criteria-based tests will be comprehensive, yet factor
out unwanted redundancy. Coverage criteria also provide traceability from
software artifacts such as source, design models, requirements, and input
space descriptions. This supports regression testing by making it easier to
decide which tests need to be reused, modified, or deleted. From an
engineering perspective, one of the strongest benefits of coverage criteria
is they provide a “stopping rule” for testing; that is, we know in advance
approximately how many tests are needed and we know when we have
“enough” tests. This is a powerful tool for engineers and managers.
Coverage criteria also lend themselves well to automation. As we will
formalize in Chapter 5, a test requirement is a specific element of a
software artifact that a test case must satisfy or cover, and a coverage
criterion is a rule or collection of rules that yield test requirements. For
example, the coverage criterion “cover every statement” yields one test
requirement for each statement. The coverage criterion “cover every
functional requirement” yields one test requirement for each functional
requirement. Test requirements can be stated in semi-formal, mathematical
terms, and then manipulated algorithmically. This allows much of the test
data design and generation process to be automated.
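As a minimal illustration of this “rule yields requirements” idea (the class and method names here are ours, not a real tool's), statement coverage can be sketched as a function from a list of statements to a list of test requirements:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: a coverage criterion is a rule that yields test
// requirements from a software artifact. "Cover every statement" yields
// exactly one test requirement per statement.
public class StatementCoverage {

    static List<String> testRequirements(List<String> statements) {
        List<String> requirements = new ArrayList<>();
        for (String statement : statements) {
            requirements.add("reach and execute: " + statement);
        }
        return requirements;
    }

    public static void main(String[] args) {
        List<String> statements = List.of("sum = a + b;", "if (sum > 0)", "return sum;");
        // One requirement per statement: three statements, three requirements
        testRequirements(statements).forEach(System.out::println);
    }
}
```

Because the requirements are plain data, tools can count them, match them against executions, and report coverage automatically.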
The research literature presents a lot of overlapping and identical
coverage criteria. Researchers have invented hundreds of criteria on
dozens of software artifacts. However, if we abstract these artifacts into
mathematical models, many criteria turn out to be exactly the same. For
example, the idea of covering pairs of edges in finite state machines was
first published in 1976, using the term switch cover. Later, the same idea
was applied to control flow graphs and called two-trip; yet again, the same
idea was “invented” for state transition diagrams and called transition-pair
(we define this formally using the generic term edge-pair in Chapter 7).
Although they looked very different in the research literature, if we
generalize these structures to graphs, all three ideas are the same.
Similarly, node coverage and edge coverage have each been defined
dozens of times.
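Once the artifact is abstracted to a directed graph, switch cover, two-trip, and transition-pair all collapse into one computation: enumerate every pair of adjacent edges. A minimal sketch (all names ours) of that enumeration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Illustrative sketch: edge-pair requirements on a directed graph, the common
// abstraction behind "switch cover", "two-trip", and "transition-pair".
public class EdgePairs {

    // adjacency: node -> list of successor nodes
    static List<int[]> edgePairs(Map<Integer, List<Integer>> adjacency) {
        List<int[]> pairs = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> entry : adjacency.entrySet()) {
            int a = entry.getKey();
            for (int b : entry.getValue()) {
                for (int c : adjacency.getOrDefault(b, List.of())) {
                    // subpath [a, b, c] covers the adjacent edges (a,b) and (b,c)
                    pairs.add(new int[]{a, b, c});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // small graph: 1 -> 2, 2 -> 3, 2 -> 4, 3 -> 2
        Map<Integer, List<Integer>> adjacency =
                Map.of(1, List.of(2), 2, List.of(3, 4), 3, List.of(2));
        for (int[] p : edgePairs(adjacency)) System.out.println(Arrays.toString(p));
    }
}
```

Whether the graph started life as a finite state machine, a control flow graph, or a state transition diagram makes no difference to this code, which is exactly the point.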
Sidebar
Black-Box and White-Box Testing
Black-box testing and the complementary white-box testing are old and
widely used terms in software testing. In black-box testing, we derive
tests from external descriptions of the software, including
specifications, requirements, and design. In white-box testing, on the
other hand, we derive tests from the source code internals of the
software, specifically including branches, individual conditions, and
statements. This somewhat arbitrary distinction started to lose
coherence when the term gray-box testing was applied to developing
tests from design elements, and the approach taken in this book
eliminates the need for the distinction altogether.
Some older sources say that white-box testing is used for system testing
and black-box testing for unit testing. This distinction is certainly false,
since all testing techniques considered to be white-box can be used at
the system level, and all testing techniques considered to be black-box
can be used on individual units. In reality, unit testers are currently
more likely to use white-box testing than system testers are, simply
because white-box testing requires knowledge of the program and is
more expensive to apply, costs that can balloon on a large system.
This book relies on developing tests from mathematical abstractions
such as graphs and logical expressions. As will become clear in Part II,
these structures can be extracted from any software artifact, including
source, design, specifications, or requirements. Thus asking whether a
coverage criterion is black-box or white-box is the wrong question. One
more properly should ask from what level of abstraction is the structure
drawn.
In fact, all test coverage criteria can be boiled down to a few dozen
criteria on just four mathematical structures: input domains, graphs, logic
expressions, and syntax descriptions (grammars). Just like mechanical,
civil, and electrical engineers use calculus and algebra to create abstract
representations of physical structures, then solve various problems at this
abstract level, software engineers can use discrete math to create abstract
representations of software, then solve problems such as test design.
The core of this book is organized around these four structures, as
reflected in the four chapters in Part II. This structure greatly simplifies
teaching test design, and our classroom experience with the first edition of
this book helped us realize this structure also leads to a simplified testing
process. This process allows test design to be abstracted and carried out
efficiently, and also separates test activities that need different knowledge
and skill sets. Because the approach is based on these four abstract models,
we call it the Model-Driven Test Design process (MDTD).
Sidebar
MDTD and Model-Based Testing
Model-based testing (MBT) is the design of software tests from an
abstract model that represents one or more aspects of the software. The
model usually, but not always, represents some aspects of the behavior
of the software, and sometimes, but not always, is able to generate
expected outputs. The models are often described with UML diagrams,
although more formal models as well as other informal modeling
languages are also used. MBT typically assumes that the model has
been built to specify the behavior of the software and was created
during a design stage of development.
The ideas presented in this book are not, strictly speaking, exclusive to
model-based testing. However, there is much overlap with MDTD and
most of the concepts in this book can be directly used as part of MBT.
Specifically, we derive our tests from abstract structures that are very
similar to models. An important difference is that these structures can
be created after the software is implemented, by the tester as part of
test design. Thus, the structures do not specify behavior; they represent
behavior. If a model was created to specify the software behavior, a
tester can certainly use it, but if not, a tester can create one. Second, we
create idealized structures that are more abstract than most modeling
languages. For example, instead of UML statecharts or Petri nets, we
design our tests from graphs. If model-based testing is being used, the
graphs can be derived from a graphical model. Third, model-based
testing explicitly does not use the source code implementation to design
tests. In this book, abstract structures can be created from the
implementation via things like control flow graphs, call graphs, and
conditionals in decision statements.
2.5.1 Test Design
As said above, test design is the process of designing input values that will
effectively test software. In practice, engineers use two general approaches
to designing tests. In criteria-based test design, we design test values that
satisfy engineering goals such as coverage criteria. In human-based test
design, we design test values based on domain knowledge of the program
and human knowledge of testing. These are quite different activities.
Criteria-based test design is the most technical and mathematical job in
software testing. To apply criteria effectively, the tester needs knowledge
of discrete math, programming, and testing. That is, this requires much of
a traditional degree in computer science. For somebody with a degree in
computer science or software engineering, this is intellectually stimulating,
rewarding, and challenging. Much of the work involves creating abstract
models and manipulating them to design high-quality tests. In software
development, this is analogous to the job of software architect; in building
construction, this is analogous to the job of construction engineer. If an
organization uses people who are not qualified (that is, do not have the
required knowledge), they will spend time creating ineffective tests and be
dissatisfied at work.
Human-based test design is quite different. The testers must have
knowledge of the software’s application domain, of testing, and of user
interfaces. Human-based test designers explicitly attempt to find stress
tests, tests that stress the software by including very large or very small
values, boundary values, invalid values, or other values that the software
may not expect during typical behavior. Human-based testers also
explicitly consider actions the users might do, including unusual actions.
This is much harder than developers may think and more necessary than
many test researchers and educators realize. Although criteria-based
approaches often implicitly include techniques such as stress testing, they
can be blind to special situations, and may miss problems that human-
based tests would not. Although almost no traditional CS is needed, an
empirical background (biology or psychology) or a background in logic
(law, philosophy, math) is helpful. If the software is embedded on an
airplane, a human-based test designer should understand piloting; if the
software runs an online store, the test designers should understand
marketing and the products being sold. For people with these abilities,
human-based test design is intellectually stimulating, rewarding, and
challenging, but often not to typical CS majors, who usually want to build
software!
Many people think of criteria-based test design as being used for unit
testing and human-based test design as being used for system testing.
However, this is an artificial distinction. When using criteria, a graph is
just a graph and it does not matter if it started as a control flow graph, a
call graph, or an activity diagram. Likewise, human-based tests can and
should be used to test individual methods and classes. The main point is
that the approaches are complementary and we need both to fully test
software.
2.5.2 Test Automation
The final result of test design is input values for the software. Test
automation is the process of embedding test values into executable scripts;
it is necessary for efficient and frequent execution of tests. (Note that
automated tool support for test design is not considered to be test
automation.)
The programming difficulty varies greatly by the software under test
(SUT). Some tests can be automated with basic programming skills,
whereas if the software has low controllability or observability (for
example, with embedded, real-time, or web software), test automation will
require more knowledge and problem-solving skills. The test automator
will need to add additional software to access the hardware, simulate
conditions, or otherwise control the environment. However, many domain
experts using human-based testing do not have programming skills. And
many criteria-based test design experts find test automation boring. If a
test manager asks a domain expert to automate tests, the expert is likely to
resist and do poorly; if a test manager asks a criteria-based test designer to
automate tests, the designer is quite likely to go looking for a development
job.
2.5.3 Test Execution
Test execution is the process of running tests on the software and recording
the results. This requires basic computer skills and can often be assigned to
interns or employees with little technical background. If all tests are
automated, this is trivial. However, few organizations have managed to
achieve 100% test automation. If tests must be run by hand, this becomes
the most time-consuming testing task. Hand-executed tests require the
tester to be meticulous with bookkeeping. Asking a good test designer to
hand execute tests not only wastes a valuable (and possibly highly paid)
resource, the test designer will view it as a very tedious job and will soon
look for other work.
2.5.4 Test Evaluation
Test evaluation is the process of evaluating the results of testing and
reporting them to the developers. These four tasks focus on designing,
implementing, and running the tests.
Of course, they do not cover all aspects of testing. This categorization
omits important tasks like test management, maintenance, and
documentation, among others. We focus on these because they are
essential to developing test values.
A challenge to using criteria-based test design is the amount and type of
knowledge needed. Many organizations have a shortage of highly
technical test engineers. Few universities teach test criteria to
undergraduates and many graduate classes focus on theory, supporting
research rather than practical application. However, the good news is that
with a well-planned division of labor, a single criteria-based test designer
can support a fairly large number of test automators, executors and
evaluators.
The model-driven test design process explicitly supports this division of
labor. This process is illustrated in Figure 2.4, which shows test design
activities above the line and other test activities below.
The MDTD lets test designers “raise their level of abstraction” so that a
small subset of testers can do the mathematical aspects of designing and
developing tests. This is analogous to construction design, where one
engineer creates a design that is followed by many carpenters, plumbers,
and electricians. The traditional testers and programmers can then do their
parts: finding values, automating the tests, running tests, and evaluating
them. This supports the truism that “testers ain’t mathematicians.”
The starting point in Figure 2.4 is a software artifact. This could be
program source, a UML diagram, natural language requirements, or even a
user manual. A criteria-based test designer uses that artifact to create an
abstract model of the software in the form of an input domain, a graph,
logic expressions, or a syntax description. Then a coverage criterion is
applied to create test requirements. A human-based test designer uses the
artifact to consider likely problems in the software, then creates
requirements to test for those problems. These requirements are sometimes
refined into a more specific form, called the test specification. For
example, if edge coverage is being used, a test requirement specifies which
edge in a graph must be covered. A refined test specification would be a
complete path through the graph.
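This refinement step can itself be automated. The sketch below (the names and the example graph are ours, not from the book) turns the edge requirement (u, v) into a test specification: a shortest prefix from the initial node to u, the edge itself, then a shortest suffix from v to a final node:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: refine the edge-coverage requirement "cover edge (u, v)"
// into a complete test path through the graph (a test specification).
public class RefineEdge {

    // Breadth-first shortest path from start to any node in goals
    static List<Integer> shortestPath(Map<Integer, List<Integer>> adj, int start, Set<Integer> goals) {
        Map<Integer, Integer> parent = new HashMap<>();
        parent.put(start, start);
        Deque<Integer> queue = new ArrayDeque<>(List.of(start));
        while (!queue.isEmpty()) {
            int node = queue.poll();
            if (goals.contains(node)) {
                LinkedList<Integer> path = new LinkedList<>();
                for (int cur = node; ; cur = parent.get(cur)) {
                    path.addFirst(cur);
                    if (cur == start) break;
                }
                return path;
            }
            for (int succ : adj.getOrDefault(node, List.of()))
                if (!parent.containsKey(succ)) { parent.put(succ, node); queue.add(succ); }
        }
        return null; // goal unreachable: the requirement is infeasible
    }

    // Complete test path: initial ->* u -> v ->* some final node
    static List<Integer> refine(Map<Integer, List<Integer>> adj, int initial, Set<Integer> finals, int u, int v) {
        List<Integer> prefix = shortestPath(adj, initial, Set.of(u));
        List<Integer> suffix = shortestPath(adj, v, finals);
        if (prefix == null || suffix == null) return null;
        List<Integer> path = new ArrayList<>(prefix);
        path.addAll(suffix); // prefix ends at u, suffix starts at v: edge (u, v) is taken here
        return path;
    }

    public static void main(String[] args) {
        // graph: 1 -> 2, 2 -> 3, 2 -> 4, 3 -> 2; initial node 1, final node 4
        Map<Integer, List<Integer>> adj = Map.of(1, List.of(2), 2, List.of(3, 4), 3, List.of(2));
        System.out.println(refine(adj, 1, Set.of(4), 2, 3)); // [1, 2, 3, 2, 4]
    }
}
```

Note the infeasibility case: if no path reaches the requirement, the automation reports it rather than silently failing, which is one of the feedback loops from execution back into test design.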
Once the test requirements are refined, input values that satisfy the
requirements must be defined. This brings the process down from the
design abstraction level to the implementation abstraction level. These are
analogous to the abstract and concrete tests in the model-based testing
literature. The input values are augmented with other values needed to run
the tests (including values to reach the point in the software being tested,
to display output, and to terminate the program). The test cases are then
automated into test scripts (when feasible and practical), run on the
software to produce results, and results are evaluated. It is important that
results from automation and execution be used to feed back into test
design, resulting in additional or modified tests.
This process has two major benefits. First, it provides a clean separation
of tasks between test design, automation, execution and evaluation.
Second, raising our abstraction level makes test design much easier.
Instead of designing tests for a messy implementation or complicated
design model, we design at an elegant mathematical level of abstraction.
This is exactly how algebra and calculus have been used in traditional
engineering for decades.
Figure 2.5 illustrates this process for unit testing of a small Java method.
The Java source is shown on the left, and its control flow graph is in the
middle. This is a standard control flow graph with the initial node marked
as a dotted circle and the final nodes marked as double circles (this
notation will be defined rigorously in Chapter 7). The nodes are annotated
with the source statements from the method for convenience.
Figure 2.5. Example method, CFG, test requirements and test paths.
The first step in the MDTD process is to take this software artifact, the
indexOf() method, and model it as an abstract structure. The control flow
graph from Figure 2.5 is turned into an abstract version. This graph can be
represented textually as a list of edges, initial nodes, and final nodes, as
shown in Figure 2.5 under Edges. If the tester uses edge-pair coverage
(fully defined in Chapter 7), six requirements are derived. For example,
test requirement #3, [2, 3, 2], means the subpath from node 2 to 3 and back
to 2 must be executed. The Test Paths box shows three complete test
paths through the graph that will cover all six test requirements.
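Checking that a set of test paths satisfies edge-pair requirements is mechanical: a requirement such as [2, 3, 2] is met exactly when it occurs as a consecutive subpath of some test path. A sketch (the path below is ours, not one of the paths in Figure 2.5):

```java
import java.util.Collections;
import java.util.List;

// Illustrative sketch: an edge-pair test requirement is satisfied when it
// appears as a consecutive subpath of some complete test path.
public class CoverageCheck {

    static boolean covers(List<Integer> testPath, List<Integer> requirement) {
        return Collections.indexOfSubList(testPath, requirement) >= 0;
    }

    public static void main(String[] args) {
        List<Integer> testPath = List.of(1, 2, 3, 2, 4);
        System.out.println(covers(testPath, List.of(2, 3, 2))); // true: subpath at index 1
        System.out.println(covers(testPath, List.of(3, 2, 3))); // false: never toured
    }
}
```

Chapter 7 refines this notion of “touring” a requirement, but direct subpath matching conveys the core idea.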
EXERCISES
Chapter 2.
1. How are faults and failures related to testing and debugging?
2. Answer question (a) or (b), but not both, depending on your
background.
(a) If you do, or have, worked for a software development
company, how much effort did your testing / QA team put into
each of the four test activities? (test design, automation,
execution, evaluation)
(b) If you have never worked for a software development company,
which of the four test activities do you think you are best
qualified for? (test design, automation, execution, evaluation)
2.7 BIBLIOGRAPHIC NOTES