Chapter 2

Model-Driven Test Design

Designers are more efficient and effective if they can raise their level of
abstraction.

This chapter introduces one of the major innovations in the second edition
of Introduction to Software Testing. Software testing is inherently
complicated and our ultimate goal, completely correct software, is
unreachable. The reasons are formal (as discussed below in section 2.1)
and philosophical. As discussed in Chapter 1, it’s not even clear that the
term “correctness” means anything when applied to a piece of engineering
as complicated as a large computer program. Do we expect correctness out
of a building? A car? A transportation system? Intuitively, we know that
all large physical engineering systems have problems, and moreover, there
is no way to say what correct means. This is even more true for software,
which can quickly get orders of magnitude more complicated than physical
structures such as office buildings or airplanes.
Instead of looking for “correctness,” wise software engineers try to
evaluate software’s “behavior” to decide if the behavior is acceptable
within consideration of a large number of factors including (but not limited
to) reliability, safety, maintainability, security, and efficiency. Obviously
this is more complex than the naive desire to show the software is correct.
So what do software engineers do in the face of such overwhelming
complexity? The same thing that physical engineers do: we use
mathematics to “raise our level of abstraction.” The Model-Driven Test
Design (MDTD) process breaks testing into a series of small tasks that
simplify test generation. Test designers then isolate their task and work at
a higher level of abstraction by using mathematical engineering structures
to design test values independently of the details of software or design
artifacts, test automation, and test execution.
A key intellectual step in MDTD is test case design. Test case design
can be the primary determining factor in whether tests successfully find
failures in software. Tests can be designed with a “human-based”
approach, where a test engineer uses domain knowledge of the software’s
purpose and his or her experience to design tests that will be effective at
finding faults. Alternatively, tests can be designed to satisfy well-defined
engineering goals such as coverage criteria. This chapter describes the task
activities and then introduces criteria-based test design. Criteria-based test
design will be discussed in more detail in Chapter 5, and specific criteria
on four mathematical structures are described in Part II. After these
preliminaries, the model-driven test design process is defined in detail. The
book website has simple web applications that support the MDTD in the
context of the mathematical structures in Part II.

2.1 SOFTWARE TESTING FOUNDATIONS

One of the most important facts that all software testers need to know is
that testing can show only the presence of failures, not their absence. This
is a fundamental, theoretical limitation; to be precise, the problem of
finding all failures in a program is undecidable. Testers often call a test
successful (or effective) if it finds an error. While this is an example of
level 2 thinking, it is also a characterization that is often useful and that we
will use throughout the book. This section explores some of the theoretical
underpinnings of testing as a way to emphasize how important the MDTD
is.
The definitions of fault and failure in Chapter 1 allow us to develop the
reachability, infection, propagation, and revealability model (“RIPR”).
First, we distinguish testing from debugging.

Definition 2.6 Testing: Evaluating software by observing its execution.

Definition 2.7 Test Failure: Execution of a test that results in a software failure.

Definition 2.8 Debugging: The process of finding a fault given a failure.
Of course the central issue is that for a given fault, not all inputs will
“trigger” the fault into creating incorrect output (a failure). Also, it is often
very difficult to relate a failure to the associated fault. Analyzing these
ideas leads to the fault/failure model, which states that four conditions are
needed for a failure to be observed.
Figure 2.1 illustrates the conditions. First, a test must reach the location
or locations in the program that contain the fault (Reachability). After the
location is executed, the state of the program must be incorrect (Infection).
Third, the infected state must propagate through the rest of the execution
and cause some output or final state of the program to be incorrect
(Propagation). Finally, the tester must observe part of the incorrect portion
of the final program state (Revealability). If the tester only observes parts
of the correct portion of the final program state, the failure is not revealed.
This is shown in the cross-hatched intersection in Figure 2.1. Issues with
revealing failures will be discussed in Chapter 4 when we present test
automation strategies.

Figure 2.1. Reachability, Infection, Propagation, Revealability (RIPR) model.

Collectively, these four conditions are known as the fault/failure model, or the RIPR model.
It is important to note that the RIPR model applies even when the fault
is missing code (so-called faults of omission). In particular, when
execution passes through the location where the missing code should be,
the program counter, which is part of the program state, necessarily has the
wrong value.
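
To make the RIPR conditions concrete, consider a small, hypothetical Java method (an illustrative sketch, not one of this chapter's figures) that is intended to count the zeros in an array but whose loop mistakenly starts at index 1. Every execution reaches the fault and infects the state, because the loop index is wrong, but the infection propagates to the output only when the skipped element is itself zero, and even then the failure is revealed only if the tester inspects the returned count.

    public class NumZero {
        public static int numZero(int[] arr) {
            int count = 0;
            for (int i = 1; i < arr.length; i++) {   // FAULT: the loop should start at i = 0
                if (arr[i] == 0) {
                    count++;
                }
            }
            return count;
        }

        public static void main(String[] args) {
            // Reaches the fault and infects the state (index 0 is skipped), but the
            // infection does not propagate: arr[0] is nonzero, so the returned count
            // is still correct and no failure can be revealed.
            System.out.println(numZero(new int[] {2, 7, 0}));   // prints 1 (correct)

            // Reaches, infects, and propagates: the zero at index 0 is skipped, so the
            // returned count is wrong. The failure is revealed only if the tester
            // actually checks the return value.
            System.out.println(numZero(new int[] {0, 7, 2}));   // prints 0, should be 1
        }
    }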
From a practitioner’s view, these limitations mean that software testing
is complex and difficult. The common way to deal with complexity in
engineering is abstraction: we model the problem with mathematical
structures and abstract away the complicating details that can safely be
ignored. That is a central theme of this book, which we
begin by analyzing the separate technical activities involved in creating
good tests.

2.2 SOFTWARE TESTING ACTIVITIES

In this book, a test engineer is an Information Technology (IT)
professional who is in charge of one or more technical test activities,
including designing test inputs, producing test case values, running test
scripts, analyzing results, and reporting results to developers and
managers. Although we cast the description in terms of test engineers,
every engineer involved in software development should realize that he or
she sometimes wears the hat of a test engineer. The reason is that each
software artifact produced over the course of a product’s development has,
or should have, an associated set of test cases, and the person best
positioned to define these test cases is often the designer of the artifact. A
test manager is in charge of one or more test engineers. Test managers set
test policies and processes, interact with other managers on the project,
and otherwise help the engineers test software effectively and efficiently.
Figure 2.2 shows some of the major activities of test engineers. A test
engineer must design tests by creating test requirements. These
requirements are then transformed into actual values and scripts that are
ready for execution. These executable tests are run against the software,
denoted P in the figure, and the results are evaluated to determine if the
tests reveal a fault in the software. These activities may be carried out by
one person or by several, and the process is monitored by a test manager.
Figure 2.2. Activities of test engineers.

One of a test engineer’s most powerful tools is a formal coverage
criterion. Formal coverage criteria give test engineers ways to decide what
test inputs to use during testing, making it more likely that the tester will
find problems in the program and providing greater assurance that the
software is of high quality and reliability. Coverage criteria also provide
stopping rules for the test engineers. The technical core of this book
presents the coverage criteria that are available, describes how they are
supported by tools (commercial and otherwise), explains how they can
best be applied, and suggests how they can be integrated into the overall
development process.
Software testing activities have long been categorized into levels, and
the most often used level categorization is based on traditional software
process steps. Although most types of tests can only be run after some part
of the software is implemented, tests can be designed and constructed
during all software development steps. The most time-consuming parts of
testing are actually the test design and construction, so test activities can
and should be carried out throughout development.

2.3 TESTING LEVELS BASED ON SOFTWARE ACTIVITY

Tests can be derived from requirements and specifications, design
artifacts, or the source code. In traditional texts, a different level of testing
accompanies each distinct software development activity:
Acceptance Testing: assess software with respect to requirements or
users’ needs.
System Testing: assess software with respect to architectural design
and overall behavior.
Integration Testing: assess software with respect to subsystem design.
Module Testing: assess software with respect to detailed design.
Unit Testing: assess software with respect to implementation.

Figure 2.3, often called the “V model,” illustrates a typical scenario for
testing levels and how they relate to software development activities by
isolating each step. Information for each test level is typically derived from
the associated development activity. Indeed, standard advice is to design
the tests concurrently with each development activity, even though the
software will not be in an executable form until the implementation phase.
The reason for this advice is that the mere process of designing tests can
identify defects in design decisions that otherwise appear reasonable. Early
identification of defects is by far the best way to reduce their ultimate cost.
Note that this diagram is not intended to imply a waterfall process. The
synthesis and analysis activities generically apply to any development
process.

Figure 2.3. Software development activities and testing levels – the “V Model”.

The requirements analysis phase of software development captures the
customer’s needs. Acceptance testing is designed to determine whether the
completed software in fact meets these needs. In other words, acceptance
testing probes whether the software does what the users want. Acceptance
testing must involve users or other individuals who have strong domain
knowledge.
The architectural design phase of software development chooses
components and connectors that together realize a system whose
specification is intended to meet the previously identified requirements.
System testing is designed to determine whether the assembled system
meets its specifications. It assumes that the pieces work individually, and
asks if the system works as a whole. This level of testing usually looks for
design and specification problems. It is a very expensive place to find
lower-level faults and is usually not done by the programmers, but by a
separate testing team.
The subsystem design phase of software development specifies the
structure and behavior of subsystems, each of which is intended to satisfy
some function in the overall architecture. Often, the subsystems are
adaptations of previously developed software. Integration testing is
designed to assess whether the interfaces between modules (defined
below) in a subsystem have consistent assumptions and communicate
correctly. Integration testing must assume that modules work correctly.
Some testing literature uses the terms integration testing and system testing
interchangeably; in this book, integration testing does not refer to testing
the integrated system or subsystem. Integration testing is usually the
responsibility of members of the development team.
The detailed design phase of software development determines the
structure and behavior of individual modules. A module is a collection of
related units that are assembled in a file, package, or class. This
corresponds to a file in C, a package in Ada, and a class in C++ and Java.
Module testing is designed to assess individual modules in isolation,
including how the component units interact with each other and their
associated data structures. Most software development organizations make
module testing the responsibility of the programmer; hence the common
term developer testing.
Implementation is the phase of software development that actually
produces code. A program unit, or procedure, is one or more contiguous
program statements, with a name that other parts of the software use to call
it. Units are called functions in C and C++, procedures or functions in
Ada, methods in Java, and subroutines in Fortran. Unit testing is designed
to assess the units produced by the implementation phase and is the
“lowest” level of testing. In some cases, such as when building general-
purpose library modules, unit testing is done without knowledge of the
encapsulating software application. As with module testing, most software
development organizations make unit testing the responsibility of the
programmer, again, often called developer testing. It is straightforward to
package unit tests together with the corresponding code through the use of
tools such as JUnit for Java classes.
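
As a minimal sketch of such packaging (the class under test, Calc, and its tests are hypothetical, shown only to illustrate the form of a JUnit 4 test):

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    // Hypothetical unit under test: a trivial utility class.
    class Calc {
        static int add(int a, int b) {
            return a + b;
        }
    }

    // The unit tests live alongside the code and run automatically under JUnit 4.
    public class CalcTest {

        @Test
        public void addsTwoPositiveNumbers() {
            assertEquals(5, Calc.add(2, 3));
        }

        @Test
        public void addsANegativeNumber() {
            assertEquals(-1, Calc.add(2, -3));
        }
    }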
Because of the many dependencies among methods in classes, it is
common among developers using object-oriented (OO) software to
combine unit and module testing and use the term unit testing or
developer testing.
Not shown in Figure 2.3 is regression testing, a standard part of the
maintenance phase of software development. Regression testing is done
after changes are made to the software, to help ensure that the updated
software still possesses the functionality it had before the updates.
Mistakes in requirements and high-level design end up being
implemented as faults in the program; thus testing can reveal them.
Unfortunately, the software faults that come from requirements and design
mistakes are visible only through testing months or years after the original
mistake. The effects of the mistake tend to be dispersed throughout
multiple software components; hence such faults are usually difficult to
pin down and expensive to correct. On the positive side, even if tests
cannot be executed, the very process of defining tests can identify a
significant fraction of the mistakes in requirements and design. Hence, it is
important for test planning to proceed concurrently with requirements
analysis and design and not be put off until late in a project. Fortunately,
through techniques such as use case analysis, test planning is becoming
better integrated with requirements analysis in standard software practice.
Although most of the literature emphasizes these levels in terms of
when they are applied, a more important distinction is the types of
faults that we are looking for. The faults are based on the software artifact
that we are testing, and the software artifact that we derive the tests from.
For example, unit and module tests are derived to test units and modules,
and we usually try to find faults that can be found when executing the units
and modules individually.
One final note is that OO software changes the testing levels. OO
software blurs the distinction between units and modules, so the OO
software testing literature has developed a slight variation of these levels.
Intra-method testing evaluates individual methods. Inter-method testing
evaluates pairs of methods within the same class. Intra-class testing
evaluates a single entire class, usually as sequences of calls to methods
within the class. Finally, inter-class testing evaluates more than one class
at the same time. The first three are variations of unit and module testing,
whereas inter-class testing is a type of integration testing.

2.4 COVERAGE CRITERIA

The essential problem with testing is the numbers. Even a small program
has a huge number of possible inputs. Consider a tiny method that
computes the average of three integers. We have only three input
variables, but each can have any value between -MAXINT and
+MAXINT. On a 32-bit machine, each variable can take on over 4
billion values. With three inputs, the method has over 80
octillion possible inputs!
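
A back-of-the-envelope computation makes the size of this input space concrete. The sketch below is hypothetical; the avg() method simply stands in for the tiny three-integer method under discussion.

    import java.math.BigInteger;

    public class InputSpace {

        // A stand-in for the tiny method under discussion.
        static int avg(int a, int b, int c) {
            return (a + b + c) / 3;   // note: the sum itself can overflow for extreme inputs
        }

        public static void main(String[] args) {
            BigInteger perVariable = BigInteger.valueOf(2).pow(32);  // about 4.3 billion values per variable
            BigInteger total = perVariable.pow(3);                   // (2^32)^3 = 2^96 combinations
            System.out.println(total);  // 79228162514264337593543950336, roughly 79 octillion
        }
    }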
So no matter whether we are doing unit testing, integration testing, or
system testing, it is impossible to test with all inputs. The input space is, to
all practical purposes, infinite. Thus a test designer’s goal could be
summarized in a very high-level way as searching a huge input space,
hoping to find the fewest tests that will reveal the most problems. This is
the source of two key problems in testing: (1) how do we search? and (2)
when do we stop? Coverage criteria give us structured, practical ways to
search the input space. Satisfying a coverage criterion gives a tester some
amount of confidence in two crucial goals: (A) we have looked in many
corners of the input space, and (B) our tests have a fairly low amount of
overlap.
Coverage criteria have many advantages for improving the quality and
reducing the cost of test data generation. Coverage criteria can maximize
the “bang for the buck,” with fewer tests that are effective at finding more
faults. Well-designed criteria-based tests will be comprehensive, yet factor
out unwanted redundancy. Coverage criteria also provide traceability from
software artifacts such as source, design models, requirements, and input
space descriptions. This supports regression testing by making it easier to
decide which tests need to be reused, modified, or deleted. From an
engineering perspective, one of the strongest benefits of coverage criteria
is that they provide a “stopping rule” for testing; that is, we know in advance
approximately how many tests are needed and we know when we have
“enough” tests. This is a powerful tool for engineers and managers.
Coverage criteria also lend themselves well to automation. As we will
formalize in Chapter 5, a test requirement is a specific element of a
software artifact that a test case must satisfy or cover, and a coverage
criterion is a rule or collection of rules that yield test requirements. For
example, the coverage criterion “cover every statement” yields one test
requirement for each statement. The coverage criterion “cover every
functional requirement” yields one test requirement for each functional
requirement. Test requirements can be stated in semi-formal, mathematical
terms, and then manipulated algorithmically. This allows much of the test
data design and generation process to be automated.
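
As a small hypothetical illustration of how a criterion yields test requirements, consider the statement coverage criterion applied to a trivial method:

    // Hypothetical method used only to show how "cover every statement"
    // yields one test requirement per statement.
    public class MaxExample {
        static int maxOfTwo(int x, int y) {
            int max = x;          // test requirement 1: execute this statement
            if (y > x) {
                max = y;          // test requirement 2: execute this statement
            }
            return max;           // test requirement 3: execute this statement
        }
        // Two tests, such as maxOfTwo(1, 2) and maxOfTwo(2, 1), satisfy all three requirements.
    }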
The research literature presents a lot of overlapping and identical
coverage criteria. Researchers have invented hundreds of criteria on
dozens of software artifacts. However, if we abstract these artifacts into
mathematical models, many criteria turn out to be exactly the same. For
example, the idea of covering pairs of edges in finite state machines was
first published in 1976, using the term switch cover. Later, the same idea
was applied to control flow graphs and called two-trip; still later, the same
idea was “invented” for state transition diagrams and called transition-pair
(we define this formally using the generic term edge-pair in Chapter 7).
Although they looked very different in the research literature, if we
generalize these structures to graphs, all three ideas are the same.
Similarly, node coverage and edge coverage have each been defined
dozens of times.

Sidebar
Black-Box and White-Box Testing
Black-box testing and the complementary white-box testing are old and
widely used terms in software testing. In black-box testing, we derive
tests from external descriptions of the software, including
specifications, requirements, and design. In white-box testing, on the
other hand, we derive tests from the source code internals of the
software, specifically including branches, individual conditions, and
statements. This somewhat arbitrary distinction started to lose
coherence when the term gray-box testing was applied to developing
tests from design elements, and the approach taken in this book
eliminates the need for the distinction altogether.
Some older sources say that white-box testing is used for system testing
and black-box testing for unit testing. This distinction is certainly false,
since all testing techniques considered to be white-box can be used at
the system level, and all testing techniques considered to be black-box
can be used on individual units. In reality, unit testers are currently
more likely to use white-box testing than system testers are, simply
because white-box testing requires knowledge of the program and is
more expensive to apply, costs that can balloon on a large system.
This book relies on developing tests from mathematical abstractions
such as graphs and logical expressions. As will become clear in Part II,
these structures can be extracted from any software artifact, including
source, design, specifications, or requirements. Thus asking whether a
coverage criterion is black-box or white-box is the wrong question. One
should instead ask from what level of abstraction the structure is
drawn.

In fact, all test coverage criteria can be boiled down to a few dozen
criteria on just four mathematical structures: input domains, graphs, logic
expressions, and syntax descriptions (grammars). Just like mechanical,
civil, and electrical engineers use calculus and algebra to create abstract
representations of physical structures, then solve various problems at this
abstract level, software engineers can use discrete math to create abstract
representations of software, then solve problems such as test design.
The core of this book is organized around these four structures, as
reflected in the four chapters in Part II. This structure greatly simplifies
teaching test design, and our classroom experience with the first edition of
this book helped us realize this structure also leads to a simplified testing
process. This process allows test design to be abstracted and carried out
efficiently, and also separates test activities that need different knowledge
and skill sets. Because the approach is based on these four abstract models,
we call it the Model-Driven Test Design process (MDTD).

Sidebar
MDTD and Model-Based Testing
Model-based testing (MBT) is the design of software tests from an
abstract model that represents one or more aspects of the software. The
model usually, but not always, represents some aspects of the behavior
of the software, and sometimes, but not always, is able to generate
expected outputs. The models are often described with UML diagrams,
although more formal models as well as other informal modeling
languages are also used. MBT typically assumes that the model has
been built to specify the behavior of the software and was created
during a design stage of development.
The ideas presented in this book are not, strictly speaking, exclusive to
model-based testing. However, there is much overlap with MDTD and
most of the concepts in this book can be directly used as part of MBT.
Specifically, we derive our tests from abstract structures that are very
similar to models. A first important difference is that these structures can
be created after the software is implemented, by the tester as part of
test design. Thus, the structures do not specify behavior; they represent
behavior. If a model was created to specify the software behavior, a
tester can certainly use it, but if not, a tester can create one. Second, we
create idealized structures that are more abstract than most modeling
languages. For example, instead of UML statecharts or Petri nets, we
design our tests from graphs. If model-based testing is being used, the
graphs can be derived from a graphical model. Third, model-based
testing explicitly does not use the source code implementation to design
tests. In this book, abstract structures can be created from the
implementation via things like control flow graphs, call graphs, and
conditionals in decision statements.

2.5 MODEL-DRIVEN TEST DESIGN

Academic teachers and researchers have long focused on the design of
tests. We define test design to be the process of creating input values that
will effectively test software. This is the most mathematical and
technically challenging part of testing; however, academics can easily
forget that it is only a small part of testing.
The job of developing tests can be divided into four discrete tasks: test
design, test automation, test execution, and test evaluation. Many
organizations assign the same person to all tasks. However, each task
requires different skills, background knowledge, education, and training.
Assigning the same person to all these tasks is akin to assigning the same
software developer to requirements, design, implementation, integration,
and configuration control. Although this was common in previous decades,
few companies today assign the same engineers to all development tasks.
Engineers specialize, sometimes temporarily, sometimes for a project, and
sometimes for their entire career. But should test organizations still assign
the same people to all test tasks? The tasks require different skills, and it is
unreasonable to expect all testers to be good at every task, so doing so clearly
wastes resources. The following subsections analyze each of these tasks in
detail.

2.5.1 Test Design

As said above, test design is the process of designing input values that will
effectively test software. In practice, engineers use two general approaches
to designing tests. In criteria-based test design, we design test values that
satisfy engineering goals such as coverage criteria. In human-based test
design, we design test values based on domain knowledge of the program
and human knowledge of testing. These are quite different activities.
Criteria-based test design is the most technical and mathematical job in
software testing. To apply criteria effectively, the tester needs knowledge
of discrete math, programming, and testing. That is, this requires much of
a traditional degree in computer science. For somebody with a degree in
computer science or software engineering, this is intellectually stimulating,
rewarding, and challenging. Much of the work involves creating abstract
models and manipulating them to design high-quality tests. In software
development, this is analogous to the job of software architect; in building
construction, this is analogous to the job of construction engineer. If an
organization uses people who are not qualified (that is, do not have the
required knowledge), they will spend time creating ineffective tests and be
dissatisfied at work.
Human-based test design is quite different. The testers must have
knowledge of the software’s application domain, of testing, and of user
interfaces. Human-based test designers explicitly attempt to find stress
tests, tests that stress the software by including very large or very small
values, boundary values, invalid values, or other values that the software
may not expect during typical behavior. Human-based testers also
explicitly consider actions the users might do, including unusual actions.
This is much harder than developers may think and more necessary than
many test researchers and educators realize. Although criteria-based
approaches often implicitly include techniques such as stress testing, they
can be blind to special situations, and may miss problems that human-
based tests would not. Although almost no traditional CS is needed, an
empirical background (biology or psychology) or a background in logic
(law, philosophy, math) is helpful. If the software is embedded on an
airplane, a human-based test designer should understand piloting; if the
software runs an online store, the test designers should understand
marketing and the products being sold. For people with these abilities,
human-based test design is intellectually stimulating, rewarding, and
challenging, though often not to typical CS majors, who usually want to build
software!
Many people think of criteria-based test design as being used for unit
testing and human-based test design as being used for system testing.
However, this is an artificial distinction. When using criteria, a graph is
just a graph and it does not matter if it started as a control flow graph, a
call graph, or an activity diagram. Likewise, human-based tests can and
should be used to test individual methods and classes. The main point is
that the approaches are complementary and we need both to fully test
software.

2.5.2 Test Automation

The final result of test design is input values for the software. Test
automation is the process of embedding test values into executable scripts,
and it is necessary for efficient and frequent execution of tests. (Note that
automated tool support for test design is not considered to be test automation.)
The programming difficulty varies greatly by the software under test
(SUT). Some tests can be automated with basic programming skills,
whereas if the software has low controllability or observability (for
example, with embedded, real-time, or web software), test automation will
require more knowledge and problem-solving skills. The test automator
will need to add additional software to access the hardware, simulate
conditions, or otherwise control the environment. However, many domain
experts using human-based testing do not have programming skills. And
many criteria-based test design experts find test automation boring. If a
test manager asks a domain expert to automate tests, the expert is likely to
resist and do poorly; if a test manager asks a criteria-based test designer to
automate tests, the designer is quite likely to go looking for a development
job.
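
As a minimal, hypothetical sketch of what embedding designed test values into an executable script can look like with JUnit 4 (the method under test, avg(), and the chosen values are assumptions made only for illustration):

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    // Minimal sketch of test automation: test values chosen during test design
    // are embedded in an executable script that can be run repeatedly.
    public class AvgTest {

        static int avg(int a, int b, int c) {
            return (a + b + c) / 3;
        }

        // Input triples and expected outputs produced during test design.
        private static final int[][] INPUTS   = { {0, 0, 0}, {1, 2, 3}, {-3, 0, 3} };
        private static final int[]   EXPECTED = { 0,          2,         0 };

        @Test
        public void runsTheDesignedTestValues() {
            for (int i = 0; i < INPUTS.length; i++) {
                int[] in = INPUTS[i];
                assertEquals("input #" + i, EXPECTED[i], avg(in[0], in[1], in[2]));
            }
        }
    }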

2.5.3 Test Execution

Test execution is the process of running tests on the software and recording
the results. This requires basic computer skills and can often be assigned to
interns or employees with little technical background. If all tests are
automated, this is trivial. However, few organizations have managed to
achieve 100% test automation. If tests must be run by hand, this becomes
the most time-consuming testing task. Hand-executed tests require the
tester to be meticulous with bookkeeping. Asking a good test designer to
hand execute tests not only wastes a valuable (and possibly highly paid)
resource; the test designer will also view it as a very tedious job and will soon
look for other work.

2.5.4 Test Evaluation

Test evaluation is the process of evaluating the results of testing and
reporting to developers. This is much harder than it may seem, especially
reporting the results to developers. Evaluating the results of tests requires
knowledge of the domain, testing, user interfaces, and psychology. The
knowledge required is very much the same as for human-based test
designers. If tests are well-automated, then most test evaluation can (and
should) be embedded in the test scripts. However, when automation is
incomplete or when correct output cannot neatly be encoded in assertions,
this task gets more complicated. Typical CS or software engineering
majors will not enjoy this job, but to the right person, this is intellectually
stimulating, rewarding, and challenging.

2.5.5 Test Personnel and Abstraction

These four tasks focus on designing, implementing and running the tests.
Of course, they do not cover all aspects of testing. This categorization
omits important tasks like test management, maintenance, and
documentation, among others. We focus on these because they are
essential to developing test values.
A challenge to using criteria-based test design is the amount and type of
knowledge needed. Many organizations have a shortage of highly
technical test engineers. Few universities teach test criteria to
undergraduates and many graduate classes focus on theory, supporting
research rather than practical application. However, the good news is that
with a well-planned division of labor, a single criteria-based test designer
can support a fairly large number of test automators, executors and
evaluators.
The model-driven test design process explicitly supports this division of
labor. This process is illustrated in Figure 2.4, which shows test design
activities above the line and other test activities below.

Figure 2.4. Model-driven test design.

The MDTD lets test designers “raise their level of abstraction” so that a
small subset of testers can do the mathematical aspects of designing and
developing tests. This is analogous to construction design, where one
engineer creates a design that is followed by many carpenters, plumbers,
and electricians. The traditional testers and programmers can then do their
parts: finding values, automating the tests, running tests, and evaluating
them. This supports the truism that “testers ain’t mathematicians.”
The starting point in Figure 2.4 is a software artifact. This could be
program source, a UML diagram, natural language requirements, or even a
user manual. A criteria-based test designer uses that artifact to create an
abstract model of the software in the form of an input domain, a graph,
logic expressions, or a syntax description. Then a coverage criterion is
applied to create test requirements. A human-based test designer uses the
artifact to consider likely problems in the software, then creates
requirements to test for those problems. These requirements are sometimes
refined into a more specific form, called the test specification. For
example, if edge coverage is being used, a test requirement specifies which
edge in a graph must be covered. A refined test specification would be a
complete path through the graph.
Once the test requirements are refined, input values that satisfy the
requirements must be defined. This brings the process down from the
design abstraction level to the implementation abstraction level. These are
analogous to the abstract and concrete tests in the model-based testing
literature. The input values are augmented with other values needed to run
the tests (including values to reach the point in the software being tested,
to display output, and to terminate the program). The test cases are then
automated into test scripts (when feasible and practical), run on the
software to produce results, and results are evaluated. It is important that
results from automation and execution be used to feed back into test
design, resulting in additional or modified tests.
This process has two major benefits. First, it provides a clean separation
of tasks between test design, automation, execution and evaluation.
Second, raising our abstraction level makes test design much easier.
Instead of designing tests for a messy implementation or complicated
design model, we design at an elegant mathematical level of abstraction.
This is exactly how algebra and calculus have been used in traditional
engineering for decades.
Figure 2.5 illustrates this process for unit testing of a small Java method.
The Java source is shown on the left, and its control flow graph is in the
middle. This is a standard control flow graph with the initial node marked
as a dotted circle and the final nodes marked as double circles (this
notation will be defined rigorously in Chapter 7). The nodes are annotated
with the source statements from the method for convenience.
Figure 2.5. Example method, CFG, test requirements and test paths.

The first step in the MDTD process is to take this software artifact, the
indexOf() method, and model it as an abstract structure. The control flow
graph from Figure 2.5 is turned into an abstract version. This graph can be
represented textually as a list of edges, initial nodes, and final nodes, as
shown in Figure 2.5 under Edges. If the tester uses edge-pair coverage
(fully defined in Chapter 7), six requirements are derived. For example,
test requirement #3, [2, 3, 2], means the subpath from node 2 to 3 and back
to 2 must be executed. The Test Paths box shows three complete test
paths through the graph that will cover all six test requirements.
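
As a rough, hypothetical sketch of the kind of textual graph representation just described, the snippet below lists edges, an initial node, and final nodes for a small loop-shaped graph; the node numbering and edge set are illustrative assumptions and are not taken from Figure 2.5.

    import java.util.Arrays;
    import java.util.List;

    // Rough, hypothetical sketch of a control flow graph represented textually
    // as a list of edges plus initial and final nodes.
    public class CfgSketch {
        public static void main(String[] args) {
            int[][] edges = { {1, 2},    // initialization -> loop test
                              {2, 3},    // loop test true  -> loop body
                              {3, 2},    // loop body       -> back to the loop test
                              {2, 4} };  // loop test false -> exit node
            int initialNode = 1;
            List<Integer> finalNodes = List.of(4);

            // An edge-pair test requirement is a subpath of two consecutive edges,
            // for example [2, 3, 2]: take the loop body and return to the loop test.
            for (int[] e : edges) {
                System.out.println(Arrays.toString(e));
            }
            System.out.println("initial = " + initialNode + ", final = " + finalNodes);
        }
    }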

2.6 WHY MDTD MATTERS

The MDTD represents several years of introspection and deep thinking
about the meaning and role of software testing. The first key insight was
that the definitions and applications of test criteria are independent of the
level of testing (unit, integration, system, etc.). This led to a powerful
abstraction process that greatly simplifies testing, and was a major
innovation of the first edition of this book. The analogy to the role of
algebra and calculus in traditional engineering gives very strong support to
the long-term viability of this idea.
This insight led us to a broader understanding of software testing
activities and tasks. The separation of human-based and criteria-based test
design is an important distinction, and the recognition that they are
complementary, not competitive, activities is key to this book. All too
often, academic researchers focus on criteria-based test design without
respecting human-based test design, and practitioners and consultants
focus on human-based test design without regard to criteria-based test
design. Unfortunately this artificial split has reduced communication to the
detriment of the field.
Figure 2.4 illustrates how viewing test design as separate from test
construction and execution can help distinguish test activities in
meaningful ways, and combine them in an efficient process. Just as with
software development and most traditional engineering activities, different
people can be assigned to different activities. This allows test engineers to
be more efficient, more effective, and have greater job satisfaction.
The four structures mentioned in Section 2.4 form the heart of this book.
Each is used in a separate chapter in Part II to develop methods to design
tests and to define criteria on the structures. The ordering in Part II follows
the RIPR model of Section 2.1. The first structure, the input domain, is
based on simple sets. The criteria in Chapter 6 help testers explore the
input domain and do not explicitly satisfy any of the RIPR conditions.
Chapter 7 uses graphs to design tests. The criteria require tests to “get to”
certain places in the graph, thus satisfying reachability. Chapter 8 uses
logic expressions to design tests. The criteria require tests to explore
various truth assignments to the logic expressions, thus requiring that the
tests not only reach the logic expressions, but also that they infect the state
of the program. Chapter 9 uses grammars to design tests. These tests are
not only required to reach locations and infect the program state, but the
infection must also propagate to external behavior. Thus each chapter in
Part II goes deeper into the RIPR model.

EXERCISES
Chapter 2.
1. How are faults and failures related to testing and debugging?
2. Answer question (a) or (b), but not both, depending on your
background.
(a) If you do, or have, worked for a software development
company, how much effort did your testing / QA team put into
each of the four test activities? (test design, automation,
execution, evaluation)
(b) If you have never worked for a software development company,
which of the four test activities do you think you are best
qualified for? (test design, automation, execution, evaluation)
2.7 BIBLIOGRAPHIC NOTES

The elementary result that finding all failures in a program is undecidable
is due to Howden [Howden, 1976].
The fault/failure model was developed independently by Offutt and
Morell in their dissertations [DeMillo and Offutt, 1993, Morell, 1990,
Morell, 1984, Offutt, 1988]. Morell used the terms execution, infection,
and propagation [Morell, 1984, Morell, 1990], and Offutt used
reachability, sufficiency, and necessity [DeMillo and Offutt, 1993, Offutt,
1988]. This book merges the two sets of terms by using what we consider
to be the most descriptive terms: reachability, infection, and propagation
(RIP). The first edition of this book stopped there, but in 2014 Li and
Offutt [Li and Offutt, 2016] extended the model by noting that automated
test oracles necessarily only look at part of the output state. Even when the
outputs are checked by hand, most humans will not be able to look at
everything. Thus, the failure is only revealed to the tester if the tester looks
at the “right” part of the output. Thus, this edition extends the old RIP
model to the RIPR model.
Although this book does not focus heavily on the theoretical
underpinnings of software testing, students interested in research should
study such topics more in depth. A number of the papers are quite old,
often do not appear in the current literature, and their ideas are beginning to
disappear. The authors strongly encourage the study of the older papers.
Among those are truly seminal papers in the 1970s by Goodenough and
Gerhart [Goodenough and Gerhart, 1975] and Howden [Howden, 1976],
and DeMillo, Lipton, Sayward, and Perlis [DeMillo et al., 1979, DeMillo
et al., 1978]. These papers were followed up and refined by Weyuker and
Ostrand [Weyuker and Ostrand, 1980], Hamlet [Hamlet, 1981], Budd and
Angluin [Budd and Angluin, 1982], Gourlay [Gourlay, 1983], Prather
[Prather, 1983], Howden [Howden, 1985], and Cherniavsky and Smith
[Cherniavsky and Smith, 1986]. Later theoretical papers were contributed
by Morell [Morell, 1984], Zhu [Zhu, 1996], and Wah [Wah, 1995, Wah,
2000]. Every PhD student’s adviser will certainly have his or her own
favorite theoretical papers.
The definition of unit is from Stevens, Myers and Constantine [Stevens
et al., 1974], and the definition of module is from Sommerville
[Sommerville, 1992]. The definition of integration testing is from Beizer
[Beizer, 1990]. The clarification for OO testing levels with the terms intra-
method, inter-method, and intra-class testing is from Harrold and
Rothermel [Harrold and Rothermel, 1994] and inter-class testing is from
Gallagher, Offutt and Cincotta [Gallagher et al., 2007].
Pimont and Rault’s switch cover paper was published in 1976 [Pimont
and Rault, 1976]. The British Computer Society standard that used the
term two-trip appeared in 1997 [British Computer Society, 2001]. Offutt et
al.’s transition-pair paper was published in 2003 [Offutt et al., 2003].
The research literature on model-based testing is immense and growing,
including a three-part special issue in Software Testing, Verification, and
Reliability, edited by Ammann, Fraser, and Wotawa [Ammann et al.,
2012a, Ammann et al., 2012b, Ammann et al., 2012c]. Rather than try to
discuss all aspects of MBT, we suggest starting with Utting and Legeard’s
2006 book, Practical Model-Based Testing [Utting and Legeard, 2006].
Good sources for issues about controllability and observability are
Freedman [Freedman, 1991] and Binder [Binder, 2000].
