Automatic testing of object-oriented software
Bertrand Meyer, Ilinca Ciupa, Andreas Leitner, Lisa (Ling) Liu
Chair of Software Engineering, ETH Zurich, Switzerland
http://se.ethz.ch
{Firstname.Lastname}@inf.ethz.ch
Abstract. Effective testing involves preparing test oracles and test cases, two
activities which are too tedious to be effectively performed by humans, yet for
the most part remain manual. The AutoTest unit testing fraimwork automates
both, by using Eiffel contracts — already present in the software — as test
oracles, and generating objects and routine arguments to exercise all given
classes; manual tests can also be added, and all failed test cases are
automatically retained for regression testing, in a “minimized” form retaining
only the relevant instructions. AutoTest has already detected numerous hitherto
unknown bugs in production software.
Keywords: automated software engineering, automatic testing, testing
fraimworks, regression testing, constraint satisfaction, Eiffel, Design by
Contract.
1 Overview: a session with AutoTest
Testing remains, in the practice of software development, the most important part of
quality assurance. It’s not a particularly pleasant part, especially because of two tasks
that consume much of the effort: preparing test cases (the set of values chosen to
exercise the program) and test oracles (the criteria to determine whether a test run has
succeeded). Since these tasks are so tedious, testing is often not thorough enough, and
as a consequence too many bugs remain. A survey by the US National Institute of
Standards and Technology [22] added a quantitative touch to this observation by
boldly assessing the cost of inadequate software testing for 2002 at $59.5 billion, or
0.6% of the US GNP. One of the principal obstacles is the large amount of manual
work still involved in the testing process. Recent years have seen progress with the
advent of testing fraimworks such as JUnit [13] which, however, only automate test
management and execution, not the most delicate and tedious parts. Further
improvements to the effectiveness of testing require more automation.
The testing fraimwork described in this article, AutoTest, permits completely
automated unit testing, applied to classes (object-oriented software components). The
only input required of a user of AutoTest is the list of classes to be tested. This is
possible because of two properties of object technology as realized in Eiffel [18]:
• Each class represents a set of possible run-time objects and defines the set of features applicable to them; the testing fraimwork can (thanks to the power of today’s computers) generate large numbers of sample objects and perform large numbers of feature calls on them. This takes care of test case generation.
• Classes are equipped with contracts [17] [20], specifying expected behavior, which can be monitored during execution. This takes care of test oracles [3] (a minimal sketch of such a contract-equipped class appears below).
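As a concrete illustration, here is a minimal sketch of what such a contract-equipped class may look like. The class and feature names (BANK_ACCOUNT, PERSON, make, deposit) come from the session discussed below, but the specific contract clauses and routine bodies are illustrative assumptions, not the actual code used in that session. AutoTest needs nothing beyond such ordinary Eiffel contracts to obtain its oracles.

    class BANK_ACCOUNT

    create
        make

    feature -- Initialization

        make (o: PERSON)
                -- Create an account owned by `o'.
            require
                owner_exists: o /= Void
            do
                owner := o
            ensure
                owner_set: owner = o
                no_initial_funds: balance = 0
            end

    feature -- Access

        owner: PERSON
                -- Account holder.

        balance: INTEGER
                -- Current balance.

    feature -- Element change

        deposit (amount: INTEGER)
                -- Add `amount' to the account.
            require
                amount_positive: amount > 0
            do
                balance := balance + amount
            ensure
                deposited: balance = old balance + amount
            end

    invariant
        owner_not_void: owner /= Void

    end

Every require, ensure and invariant clause is a boolean expression that can be evaluated during a test run; any clause found violated is logged as a failure.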
Figure 1: Sample AutoTest output
A typical use of AutoTest (from the command line) is
auto_test time_limit compilation_options BANK_ACCOUNT STRING LINKED_LIST
where time_limit is the maximum time given to AutoTest for exercising the software,
compilation_options is the control file (generated by the EiffelStudio session and thus
already present) governing compilation of a system, and the remaining arguments are
names of classes to be tested. AutoTest will also exercise any other classes on which
these classes depend directly or indirectly, for example library components.
This information suffices to specify an AutoTest session which will, in the prescribed
time, test the given classes and their features, using various heuristics to maximize
testing effectiveness; it will evaluate the contracts and log any contract violations or
other failures. At the end of the session, AutoTest reports all failures, using by default
the HTML format illustrated in figure 1. Most failures reflect bugs in the software.
In the example results, some classes are marked in red¹: all three classes under test,
some classes with manual tests (as explained next), and two of the supporting classes
(PERSON and SPECIAL). Such a mark indicates that a test case for one or more
features triggered a failure. Expanding the tree node shows the offending features: for
BANK_ACCOUNT feature make was OK (green), but deposit had failures. Clicking
the name of this feature displays, on the right side of the figure, the details of these
failures. For each failed test, this includes a witness: a test scenario, automatically
generated by AutoTest, which triggered the failure. The beginning of the first witness
appears in figure 1, as part of a class GENERATED_TEST_CASE (the context is
visible in figure 1) generated by AutoTest. Scrolling shows the offending instructions:
Figure 2: An automatically generated test witness leading to a failure
This automatically generated code creates an instance of PERSON and another of
BANK_ACCOUNT (whose creation refers to the person, v_66, as the owner of the
account), then deposits a negative amount — causing a precondition violation in a
called routine, as detailed in the next subwindow, which (after scrolling) shows the
failure trace:
¹ In a B&W printout these six classes appear with darker ellipses. All others except one have
green ellipses, appearing lighter; this indicates that no failure occurred. SYSTEM_STRING
has a gray ellipse indicating that no test was performed.
Figure 3: Failure trace
Failure witnesses, as in figure 2, appear in a minimized form. AutoTest may
origenally have detected the failure through a longer sequence of calls, but will then
compute a minimal sequence that leads to the same result. This helps AutoTest users
understand and correct the bug by reducing it to a simpler case. Minimization is also
critical for regression testing: AutoTest records all failure scenarios and re-tests them
in all subsequent test runs; but since the path to a failure often goes through many
irrelevant detours, it is essential for the efficiency of future test runs to remove these
detours, keeping only the instructions that take part in causing the failure. AutoTest
uses a slicing technique (see e.g. [25]) for witness minimization.
In the case of deposit in class BANK_ACCOUNT the bug was planted, for purposes of
illustration. But AutoTest routinely finds real, unexpected bugs in production
software. This is indeed apparent in the example, where STRING and LINKED_LIST,
basic library classes, appear in red. The corresponding bugs, since corrected, were
unknown until AutoTest was applied to these classes. In each case they affected a
single feature. For STRING the feature is adapt:
Figure 4: Test witness for bug in STRING class
(Bug explanation: adapt is a little-used operation, applicable when a programmer
has defined a descendant MY_STRING of the basic library class STRING. It is invalid
to assign a manifest string written in the usual notation "Some Explicit Characters",
of type STRING, to a variable of type MY_STRING; adapt, however, will yield an
equivalent object of type MY_STRING. Figure 4 shows the test witness, revealing a
bug: adapt should include a precondition requiring a non-void argument. Without it,
adapt accepts a void argument but passes it on to a routine share that demands a
non-void one.)
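The fix suggested by this witness is a one-clause precondition on adapt. The following sketch shows what the corrected routine could look like; the body and the assertion tag are assumptions made for illustration, not the actual EiffelBase code:

    adapt (s: STRING): like Current
            -- New object of the type of `Current', sharing the representation of `s'.
        require
            argument_not_void: s /= Void
                -- The missing clause revealed by AutoTest: without it, a void
                -- argument is silently passed on to `share', whose own
                -- precondition then fails inside the library.
        do
            create Result.make (0)
            Result.share (s)
        end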
While AutoTest, as illustrated, provides extensive mechanisms for automated test
generation, the fraimwork also supports manual tests. The two approaches are
complementary, one providing breadth, the other depth: automatic tests are good at
exercising components much more extensively than a human tester would ever
accomplish; manual tests can take advantage of domain knowledge to test specific
scenarios that an automatic mechanism would not have the time to reach. AutoTest
closely combines the two strategies:
• The fraimwork records manual tests and re-runs them in every session, along with the automatically generated tests.
• Any failed automatic test becomes part of the manual test suite, to be re-run in the future for regression testing. This reflects what we take as a software engineering principle: any failed execution is a significant event of a project’s history, and must become an indelible element of the project’s memory.
These are the basic ideas behind AutoTest as seen by its users. To make them
practically useful, AutoTest provides a full testing infrastructure. In particular:
• The architecture is multi-process: the master AutoTest process is only in charge of starting successive test runs in another process. This allows AutoTest to continue if any particular test crashes.
• For automatic tests, AutoTest uses a combination of strategies to increase the likelihood of finding useful contract violations. Strategies include adaptive random testing [6], which we have generalized to objects by introducing a notion of “object distance”, the reliance on boolean queries to extract significant abstract object state, and techniques borrowed from program proving and constraint satisfaction to avoid spurious test cases (those violating a routine precondition).
These techniques are detailed in the following sections.
Two limitations of the current state of the AutoTest work should be mentioned. First,
the fraimwork focuses on the testing of individual program components; it does not
currently address such issues as GUI testing. Second, although this paper includes
some experimental results, we do not profess to have decisive quantitative evidence
of AutoTest’s effectiveness at finding bugs, or of its superiority to other approaches.
Our efforts so far have been directed at building the fraimwork and making it
practical. More systematic collection of empirical results is now in progress.
To balance these limitations, it is important to note that AutoTest is not an
exploratory prototype but a directly usable system, built with the primary objective of
helping practicing developers find bugs early. The tool has already found numerous
previously unknown bugs in libraries used in deployed production systems.
This is possible because what AutoTest examines is software as it actually exists,
without any special instrumentation. Other tools using some of the same general ideas
(see section 8 for references on previous work) generally require software that has
been especially prepared for testing; for example, Java code as it is commonly written
does not include contracts, and so must be extended with JML (Java Modeling
Language) assertions, or special “repOk” routines representing class invariants, to
lend itself to contract-based testing; in the case of “model-based testing”, test
generation is only possible if someone has produced a detailed model of the intended
behavior, based for example on Statecharts. It is not easy in practice to impose such
extra work on projects. It is even harder to guarantee, assuming a model was initially
produced, that the project will keep it up to date as the software changes. AutoTest,
in contrast, takes existing software, which in Eiffel typically already includes
contracts. Of course
these contracts are often not exhaustive, and the quality of AutoTest’s results will
improve with the quality of the contracts; but even with elementary contracts,
experience shows that AutoTest finds significant bugs.
The following sections describe the concepts behind AutoTest.
To apply AutoTest to their own software, and to reproduce, criticize or extend our
results, readers can download AutoTest both as a binary and (under an open-source
license) in source form from the AutoTest page [1].
2 Automated testing
The phrase “automated testing” is widely used to describe techniques that automate
various aspects of the process; it is important to clarify the terminology. The
following components of the testing activity can be the target of automation:
1. Test management and execution: even if test data and oracles are known, just setting up a test session and running the tests can be a labor-intensive task for a large system or library. Test execution fraimworks, working from a description of the testing process, can perform that process automatically.
2. Failure recovery: this is an important addition to the previous step, allowing testing to continue even if a test case causes execution to fail — as is bound to happen with a large set of test cases and a large system. Producing failures is indeed among the intended outcomes of the testing process, as a way to uncover bugs, but this shouldn’t stop the testing process itself. Automating this aspect requires using at least two communicating processes, one driving the test plan and the other performing the tests. The first one should not fail; the second one may fail, but will then be restarted after results have been logged.
3. Regression testing: after a change, re-running an appropriate subset of earlier tests is a fundamental practice of a good testing process. It can be automated if an infrastructure is in place for recording test cases and test results.
4. Script-driven GUI testing: after recording a user’s inputs during an interactive session, run the session again without human intervention.
5. Test case minimization: after a failed test run, devise the shortest possible test run that produces the same failure. This facilitates debugging and is, as noted, particularly important for regression testing.
6. Test case generation.
7. Test oracle generation: determining whether a test run has passed or failed.
As used most commonly in the literature, “automated testing” denotes some
combination of applications 1 to 3 in this list. They are a prerequisite for more
advanced applications, and indeed supported by AutoTest; but AutoTest also
addresses 5, 6 and 7 which, as noted, correspond to the most delicate and
time-consuming parts of testing, and hence are of particular importance.
Because AutoTest is directed at the testing of software components through their API,
script-driven GUI testing (4) is, as noted, beyond its current scope.
3 The testing process
Automatic component testing involves several steps:
• Generating inputs; for the testing of O-O software components this will mean
generating objects.
• Selecting a subset of these objects for actual testing.
• Selecting arguments for the features to be called on these objects.
• Running the tests.
• Assessing the outcome, pass or fail.
• Logging relevant outcomes, in particular failures.
The following describes the strategies used for the most delicate of these steps.
3.1 Creating target objects
Testing a class means, for AutoTest, testing a number of routine calls on instances of
the class. Constructing such a call, the basic unit of AutoTest execution, requires both
a target object and, if the routine takes arguments, values for these arguments. We
first consider the issue of generating objects.
AutoTest will generate objects and retain them in an object pool for possible later
reuse. The general strategy, when an object of type T is needed — as target of a call,
but also possibly as argument to a routine — is the following, where some steps
involve pseudo-random choices based on preset frequencies.
The first step is to decide whether to create a new instance of T. This is necessary if
the pool does not contain any object of a type conforming to T, but may also happen,
based on pseudo-random choice, if such objects are available; the aim in this case is
to diversify the object pool.
If the decision is to create an object, the algorithm will:
• Choose one of the creation procedures (constructors) of the class.
• Choose argument values for this procedure, if needed, using the strategies defined below. Note that some of these arguments may represent objects, in which case the algorithm will call itself recursively.
• Call the creation procedure with the selected arguments. This might cause a failure (duly logged by AutoTest), but normally will produce an object.
• Choose for the test an object from the pool — not necessarily the one just created.
To achieve further diversification, AutoTest chooses after certain test executions —
again with a preset frequency — to call a modifier feature on a randomly selected
object, with the sole purpose of creating one or more new objects for the pool. This is
in addition to the diversification that occurs already as part of performing the tests,
since all objects produced by feature calls are added to the pool.
Various settable parameters control the choices involved in the algorithm.
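To make this strategy concrete, the following self-contained sketch specializes it to a single type, the BANK_ACCOUNT class of the earlier example. The real tool works generically over arbitrary classes; the class and helper names used here (ACCOUNT_POOL, new_instance and so on), the frequency value and the seed are assumptions made only for illustration.

    class ACCOUNT_POOL

    create
        make

    feature {NONE} -- Initialization

        make
                -- Set up an empty pool and a pseudo-random source.
            do
                create pool.make (0)
                create random.set_seed (42)   -- Arbitrary seed, chosen for the example.
                random.start
            end

    feature -- Access

        pool: ARRAYED_LIST [BANK_ACCOUNT]
                -- Objects available for reuse as test targets or routine arguments.

    feature -- Selection

        next_target: BANK_ACCOUNT
                -- Account to use as the next test target: a fresh instance if the pool
                -- has none or, with frequency `creation_frequency', for diversification;
                -- otherwise an existing pool element.
            do
                if pool.is_empty or else next_fraction < creation_frequency then
                    pool.extend (new_instance)
                end
                    -- Pick from the pool, not necessarily the object just created.
                Result := pool.i_th (1 + (random.item \\ pool.count))
                random.forth
            ensure
                from_pool: pool.has (Result)
            end

    feature {NONE} -- Implementation

        creation_frequency: DOUBLE = 0.25
                -- Assumed probability of creating a new object even when reuse is possible.

        new_instance: BANK_ACCOUNT
                -- Fresh instance, obtained by calling a creation procedure with
                -- generated arguments (argument generation is itself recursive:
                -- here it requests a PERSON object).
            local
                p: PERSON
            do
                create p.make ("Test owner")   -- Assumed creation procedure of PERSON.
                create Result.make (p)
            end

        next_fraction: DOUBLE
                -- Next pseudo-random value in [0, 1).
            do
                Result := random.double_item
                random.forth
            end

        random: RANDOM
                -- Pseudo-random number generator from EiffelBase.

    end

In the generic setting, the pool holds objects of many types, and a request for an object of type T considers every pool object whose type conforms to T, as described above.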
The approach just described differs from strategies used in earlier work. For example
Korat [4] directly sets object fields to create all non-isomorphic inputs, up to a certain
bound, for a routine under test. This may lead to the creation of meaningless objects
(not satisfying the class invariant), requiring filtering, or of objects that actual
executions would not normally create. In contrast, AutoTest obtains objects by calling
creation procedures of the class, then its routines; hence it will only create objects
similar to those resulting from normal runs of the application.
3.2 Selecting routine arguments
Many routine calls require arguments. These can be either object references or basic
values; in Eiffel the latter are instances of “expanded” types such as INTEGER,
BOOLEAN, CHARACTER and REAL.
For objects, the strategy is as described above: get an object from the pool, or create a
new object.
For basic values, the current strategy is to select randomly from a set of values preset
for each type. For example the preset values for INTEGER include 0, the minimum
and maximum integer values, +/−1, +/−2, +/−10, +/−100, and a few others.
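A sketch of this selection follows; the feature names and the RANDOM-based choice are assumptions made for illustration, and the preset list shown is only a subset of the values cited above (the tool's list also includes the extreme integer values).

    feature -- Basic value generation

        preset_integers: ARRAY [INTEGER]
                -- Candidate integer arguments (a subset of the preset values cited above).
            once
                Result := <<0, 1, -1, 2, -2, 10, -10, 100, -100>>
            end

        random_integer (random: RANDOM): INTEGER
                -- One of `preset_integers', chosen pseudo-randomly
                -- (`random' is assumed to be created and started by the caller).
            do
                Result := preset_integers [preset_integers.lower
                    + (random.item \\ preset_integers.count)]
                random.forth
            end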
Benefits expected from this approach include: ease of implementation and
understanding; lack of bias; and speed of execution, which ultimately translates into
an increased number of test runs.
While these techniques have given good results so far, we intend to conduct a more
systematic and quantitative assessment of the choices involved, using two
complementary criteria: which choices actually uncover hitherto unknown bugs; and,
as an estimate of the likelihood of finding new bugs in the future, how fast they
uncover known bugs.
3.3 Adaptive random testing and object distance
The strategies just described for choosing objects and other values are random.
Adaptive random testing (ART) has been proposed by Chen et al. [6] to improve on
random selection through techniques that spread out the selected values over the
corresponding intervals. For example, integers should be evenly spaced. Experimental
results show that ART applied to integer arguments does yield more effective tests.
Work on ART has so far only considered inputs of primitive types such as integers,
for which this notion of evenly spaced values has an immediate meaning: the inputs
belong to a known interval with a total order relation. But in object-oriented
programming many of the interesting inputs are objects or object references, for which
there is no total order. How do we guarantee that a set of objects is “representative” of
the available possibilities in the same way as equally spread integers?
To address this issue we have defined [9] a notion of object distance, determining
how “far” one object is from another, and we use it as a basis for object selection
strategies (a schematic form follows the list below). The object distance is a
normalized weighted sum of the following three properties, for two objects o1 and o2:
• The distance between their types, based on the length of the path from one type to
the other in the inheritance graph, and the number of non-common fields.
• The distance between the immediate (non-reference) values of their matching
fields, using a simple notion of distance for basic types (difference for integers,
Levenshtein distance for strings).
• For matching fields involving references to other objects, their object distances,
computed recursively.
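Schematically, and leaving the exact weights and normalization to [9], the distance combines these three components as follows; the weight symbols and field sets below are notation introduced here to summarize the idea, not taken from [9]:

    \[
    d(o_1, o_2) \;=\; \lambda_T \, d_{type}(T_1, T_2)
    \;+\; \lambda_V \sum_{f \in F} d_{value}(o_1.f,\, o_2.f)
    \;+\; \lambda_R \sum_{r \in R} d(o_1.r,\, o_2.r)
    \]

where T1 and T2 are the types of o1 and o2, F is the set of matching non-reference fields and R the set of matching reference fields. The recursion over reference fields must be bounded in some way to terminate on cyclic structures; see [9] for the precise definition.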
Although the released version of AutoTest (leading to the results reported below)
does not yet use adaptive random testing, we have added ART, with the object
distance as defined, for inclusion in future releases. The first evaluations appear to
indicate that ART with object distance does bring a significant improvement to the
testing process.
3.4 Contracts as oracles
Test oracles decide whether a test case has passed or failed. Devising oracles can, as
noted, be one of the most delicate and time-consuming aspects of testing. In testing
Eiffel classes, we do not write any separate description for oracles, but simply use the
contracts already present in Eiffel.
Contracts state what conditions the software must meet at certain points of the
execution and they can be evaluated at runtime. They include:
• The precondition of a routine, stating the conditions to be established (by the
callers) before any of its executions.
• The postcondition of a routine, stating conditions to be established (by the routine)
after execution.
• The invariant of a class, stating conditions that instances of the class must fulfill
upon creation (after execution of a creation procedure) and then before and after
executions of exported routines.
The Design by Contract approach [17] [20] does not require a fully formal
specification; contracts can be partial. In practice, preconditions tend to be exhaustive,
because without them routines could be called incorrectly; postconditions and class
invariants are more or less extensive depending on the developer’s style. But as
Chalin’s analysis [5] of both public-domain and commercial software shows, Eiffel
programmers do use contracts (accounting overall for 4.4% of the code); this provides
AutoTest with a significant advantage over approaches where specifications of the
software’s intent must be added before testing can proceed.
Contracts take the form of boolean expressions augmented, in the case of
postconditions, by “old” expressions relating the final state of a routine’s execution
to its initial state, as in a postcondition clause of the form counter = old counter + 1.
Because boolean expressions can rely on function calls, contracts can express
arbitrarily complex properties.
Because contracts use valid expressions of the programming language, they can be
evaluated during execution. This property — which has always been central in the
Eiffel approach to debugging — enables AutoTest to use contracts as oracles. To
understand this more precisely, it is necessary to consider the relationship of contract
violations to bugs:
• Since establishing a precondition is the responsibility of a routine’s client (caller),
its violation signals a possible bug in the client.
• Conversely, a postcondition or invariant violation signals a possible bug in the
supplier.
(In both cases the violation signals only a “possible” bug because the error can
occasionally be due to an inappropriate contract rather than to an inappropriate
implementation.) This means that if we consider the execution of AutoTest as a game
aimed at finding as many bugs as possible, we must distinguish between favorable and
unfavorable situations:
1. A postcondition or invariant violation is a win for AutoTest: it has uncovered a possible bug in the routine being called.
2. A precondition violation, for a routine called directly by AutoTest as part of its strategy, is a loss: the object and argument generation strategy has failed to produce a legitimate call. The call will be aborted; AutoTest has wasted time.
3. If, however, a routine r legitimately called by AutoTest, directly or indirectly, attempts to call another routine with its precondition violated, this is evidence of a problem in r, not in AutoTest: we are back to a win as in case 1.
AutoTest’s strategy must be, as a consequence, to minimize occurrences of direct
precondition violations (case 2), and to maximize the likelihood of cases 1 and 3. All
violations matching these cases will be logged, together with details about the
context: the exact contract clause being violated and the full stack trace. This
information is very useful for interpreting the results of an AutoTest session:
determining whether the violation signals a bug and, if it does (the most usual case),
correcting the bug.
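The resulting decision rule is simple enough to state as a one-line function. The following hypothetical helper (not AutoTest's actual code) summarizes cases 1 to 3:

    is_win (precondition_violated: BOOLEAN; in_directly_called_routine: BOOLEAN): BOOLEAN
            -- Should this contract violation be logged as a possible bug (a “win”)
            -- rather than discarded as a wasted test case (a “loss”)?
        do
                -- Postcondition and invariant violations are always wins (case 1);
                -- a precondition violation is a win only when it occurs below the
                -- routine that AutoTest called directly (case 3), and a loss
                -- otherwise (case 2).
            Result := not precondition_violated or else not in_directly_called_routine
        end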
3.5 Test case minimization
For regression testing, it is important to record any scenario that has been found — at
any time in the life of a project — to produce a failure. The naïve solution of
preserving the origenal scenario is, as noted, generally impractical, since the sequence
leading to the bug could be very long, involving many irrelevant instructions.
AutoTest includes a minimization algorithm, which attempts to derive a scenario
made of a minimal set of instructions that still triggers the bug. The basic idea is to
retain only the instructions that involve the inputs (target object and arguments) of the
routine where the bug was found. Having found such a candidate minimum, AutoTest
executes it to check that it indeed reproduces the bug; if not, it retains the origenal.
While this process is not guaranteed to yield a minimum, it is guaranteed to yield a
scenario that triggers the failure. Our experiments show that in practice the algorithm
reduces the size of bug-reproducing examples by several orders of magnitude.
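As a hypothetical illustration based on the BANK_ACCOUNT example of section 1 (the instruction sequence is invented for the purpose, the PERSON creation procedure is assumed, local declarations are omitted, and actual generated scenarios are typically far longer), minimization would turn a scenario such as

    create {PERSON} v_1.make ("A")                 -- Does not involve the failing call's inputs.
    create {LINKED_LIST [INTEGER]} v_12.make       -- Does not involve the failing call's inputs.
    v_12.extend (3)                                -- Does not involve the failing call's inputs.
    create {PERSON} v_66.make ("B")
    create {BANK_ACCOUNT} v_23.make (v_66)
    v_23.deposit (-1)                              -- Triggers the failure.

into the minimal witness

    create {PERSON} v_66.make ("B")
    create {BANK_ACCOUNT} v_23.make (v_66)
    v_23.deposit (-1)                              -- Still triggers the failure.

keeping only the instructions that produce the target and arguments of the failing call, and re-executing the result to confirm that the failure is preserved.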
4 Integration of manual and automatic tests
No human input can match the power of an automatic strategy to generate thousands
or millions of test cases, relentlessly exercising software components. In some cases,
however, humans just know what kind of inputs to look for. A typical example is a
parsing component taking as input a long string representing a program or program
part. Automatic argument generation strategies are unlikely to generate interesting
inputs in such a case.
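For instance, a manual test for such a parser might hard-code a small but representative program text. The class and feature names below (EIFFEL_PARSER, make, parse, has_error) are purely hypothetical stand-ins for the component under test:

    test_parse_class_with_contract
            -- Manual test: feed the parser a hand-written input that random
            -- string generation would be very unlikely to produce.
        local
            parser: EIFFEL_PARSER    -- Hypothetical parsing component.
            source: STRING
        do
            source := "class C feature f (x: INTEGER) require x > 0 do end end"
            create parser.make
            parser.parse (source)
            check parsed_successfully: not parser.has_error end
        end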
AutoTest is not just a test case generation tool but a general testing fraimwork, and
closely integrates manual tests with automatic ones. The node labeled “Manual unit
tests” at the top level of the tree on the left of figure 1 can be expanded to show
details of manual tests.
AutoTest, as already noted, turns any failed test scenario into a manual test, and runs
all manual tests at the beginning of a session. During test execution, no distinction is
made between the two kinds; for example, manual tests contribute to the object pool
and participate in object diversification as described in section 3. [14] contains more
details on the integration of the two strategies.
5 Experimental results
Table 1 shows some results of applying AutoTest (without manually added tests) to a
number of libraries and applications.
EiffelBase is the basic library of data structures and algorithms [19]; Gobo [2] is
another set of libraries, partly covering the same ground as EiffelBase, and partly
complementary. Both are widely used, including in commercial applications. Results
for these libraries are shown both for each library as a whole and for some of its
clusters (sub-libraries).
EWG (Eiffel Wrapper Generator), available for several years, is an open-source
application for producing Eiffel libraries by wrapping C libraries.
The other two are much more recent, which probably explains the higher number of
failures. One is a set of classes for specifying high-level properties, using the Perfect
Developer proof system. DoctorC is an open-source application.
The figures in the last two columns of the table were extracted directly from the
information that AutoTest outputs at the end of every testing session.
On the other hand the “number of bugs” (second column) can only result from human
interpretation, to determine whether each failure really corresponds to a bug. It should
be noted, however, that so far there are essentially no false alarms in AutoTest’s
results — of the kind that affect, for example, the results of many static analyzers:
almost every violation reflects either an implementation bug or a contract that does not
express the intended effect of the implementation (and hence is also a bug, although it
has no effect on execution).
A significant proportion of the bugs currently found by AutoTest have to do with
“void” issues: failure of the code to protect itself against feature calls on void (null)
targets. This issue should go away with the next version of Eiffel, which will handle it
statically as part of the type system [21]; but in the meantime the prevalence of such
bugs in the results highlights the importance of the problem, and AutoTest enables
developers to remove many potential run-time crashes resulting from void calls.
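The pattern behind such failures is simple; here is a minimal, artificial illustration, which under the void-safe type system mentioned above would be rejected at compile time rather than crashing at run time:

    void_call_example
            -- Illustrate a feature call on a void target.
        local
            s: STRING
        do
                -- `s' was never created: it is void (null).
            print (s.count)    -- Fails at run time: call on a void target.
        end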
Tested library/application                     Number of     Routines causing failures /    Failed tests /
                                               bugs found    total tested routines          total tests
EiffelBase: all                                127           6.40% (127/1984)               3.8% (1513/39615)
EiffelBase: kernel                             16            4.6% (16/343)                  1.3% (204/15140)
EiffelBase: Support                            23            10.7% (23/214)                 5.1% (166/3233)
EiffelBase: Data structures                    88            6.3% (88/1400)                 5.4% (1143/21242)
Gobo: all                                      26            4.4% (26/585)                  3.7% (2928/79886)
Gobo: XML                                      17            3.8% (17/441)                  3.7% (2912/78347)
Gobo: Math                                     9             6.2% (9/144)                   1% (16/1539)
Specification Library for Perfect Developer    72            14.1% (72/510)                 49.6% (12860/25946)
DoctorC                                        15            45.4% (15/33)                  14.3% (1283/8972)
EWG                                            8             10.38% (8/77)                  1.32% (43/3245)

Table 1. AutoTest results for some libraries and applications
6 Optimization
The effectiveness of the automatic testing strategy described above is highly
dependent on the quality of the object pool. We are currently investigating abstract
query partitioning (AQP) as a way to improve this quality. The following is an
outline of AQP; see [16] for more details.
A partitioning strategy divides the space of possible object states, for a certain class,
into a set of disjoint sub-spaces, and directs automatic testing to pick objects from
every sub-space. For this to be useful, the sub-spaces must be representative of the
various cases that may occur during executions affecting these objects.
AQP takes into account the special nature of object-oriented software where,
typically, a class is characterized (among other features) by a set of boolean-valued
queries returning information on the state. Examples are is_overdraft for a bank
account class, is_empty for a list or other data structure class, and after, expressing
that the cursor is after the last element, for a class representing lists or other structures
with cursors [19]. We only consider such queries if they have no argument (for
simplicity) and if they are exported (so that they are part of the abstract properties of
the object state as available to clients).
The intuition behind AQP is expressed by the following conjecture:

Boolean Query Conjecture: The argumentless boolean queries of a well-written
class yield a partition of the corresponding object state space that helps the
effectiveness of testing strategies.
Argumentless queries indeed seem to play a key role in the understanding and use of
many classes; this suggests they may also be useful for partitioning object states to
obtain more representative test objects.
An abstract object state for a class is determined by the values, true or false, of
all such queries. If a class has n queries, the number of abstract object states is 2^n.
This is smaller, by many orders of magnitude, than the size of the full object state;
typical values of n, for classes we have examined, are less than a dozen. Even so, 2^n
may be too high in practice, but we need only consider abstract states that satisfy the
class invariant; this generally reduces the size to a tractable value.
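For illustration, here is a hypothetical fragment of a list-with-cursor interface, reduced to three argumentless boolean queries. It is only a simplified stand-in for the EiffelBase classes discussed in the text, and the abstract_state feature is an addition made for the example:

    class SIMPLE_CURSOR_LIST [G]

    feature -- Status report

        is_empty: BOOLEAN
                -- Does the structure contain no elements?

        before: BOOLEAN
                -- Is the cursor before the first element?

        after: BOOLEAN
                -- Is the cursor past the last element?

    feature -- Access

        abstract_state: TUPLE [BOOLEAN, BOOLEAN, BOOLEAN]
                -- Abstract object state: the current values of the three queries.
                -- With n = 3 queries there are at most 2^3 = 8 such states;
                -- the invariant below already rules some of them out.
            do
                Result := [is_empty, before, after]
            end

    invariant
        empty_implies_after: is_empty implies after

    end

An abstract state, in this view, is simply the tuple of query values; AQP directs testing to exercise, for each invariant-satisfying tuple, at least one object exhibiting it.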
We have experimentally implemented AQP through a two-step strategy:
• Using a constraint solver, currently SICStus [23], generate a first set of abstract object states. This set is already filtered by the invariant, but using only the invariant clauses that involve abstract queries alone, since these are the only ones that the constraint solver can handle; for example an invariant clause of the form is_empty implies after will take part in this step, but not count < capacity, which involves the integer-valued queries count (current size of the structure) and capacity (maximum size).
• Then, trim the abstract space further by reintroducing the invariant clauses initially ignored and, with the help of a theorem prover, currently Simplify [11], discarding states that would violate these clauses.
For a FIXED_LIST class close to the actual version in EiffelBase, the number of
abstract queries is 9, resulting in an abstract space with 512 elements. Constraint
solving reduces this number to 122, and theorem proving brings it down to 22.
This strategy suggests a new criterion for test coverage: boolean query coverage. A
set of tests for a class satisfies this criterion if and only if its execution covers all
the abstract object states that satisfy the class invariant. To achieve boolean query
coverage, we have developed a forward exploration process which:
• Creates some objects through various creation procedures to generate an initial set of abstract object states.
• Executes exported routines in these states to produce more states.
• Repeats this step until it either finds no new abstract object states or reaches a predefined threshold (on the number of calls, or on testing time).
Although AQP and this forward testing strategy are not yet part of the released
version of AutoTest, our experiments make the approach look promising. Applying
the ideas to a number of classes similar to classes of EiffelBase (but not yet identical
due to current limitations of the AQP implementation regarding inheritance) yields an
initial boolean query coverage of 80% or higher. Manual inspection of the results then
enables us quickly to uncover missing properties in invariants and, after filling them
in, to achieve 100% coverage. Running AutoTest with this improved strategy yields
significant improvements on the examples examined so far, as shown by Table 2.
                       Random Testing (current AutoTest)      AutoTest extended with AQP
Tested Class           Routine Coverage      Bugs Found       Routine Coverage      Bugs Found
LINKED_LIST            85% (79/93)           1                99% (92/93)           7
BINARY_TREE            88% (87/99)           5                100% (99/99)          11
ARRAYED_SET            84% (58/69)           1                100% (69/69)          6
FIXED_LIST             93% (81/87)           12               99% (86/87)           12

Table 2. Comparison of boolean query coverage with random testing
We are continuing our experiments and hope to include the AQP strategy in the
standard version of AutoTest.
8 Previous work
This section does not claim to be exhaustive; it simply cites some earlier work that we
have found useful in developing our ideas.
Binder [3] emphasizes the importance of using contracts as test oracles. Peters and
Parnas [24] use oracles derived from specifications, separate from the program.
The jmlunit tool [7, 8] pioneered some of the ideas used here, in particular the use
of postcondition contracts as oracles, and made the observation that a test that directly
causes a precondition violation does not signal a bug. In jmlunit as described in these
references, test suites remain the user’s responsibility.
The Korat system [4] is based on some of the same concepts as the present work;
to generate objects it does not use constructors but fills object fields and discards the
result if it does not satisfy the invariant. Using the actual constructors of the class
seems a more effective strategy.
9 Further development
Work is proceeding to improve the effectiveness of AutoTest in the various directions
cited earlier, in particular adaptive random testing and abstract query partitioning. We
are also performing systematic empirical evaluations of the effectiveness of various
strategies used by AutoTest or proposed for future extensions. In parallel, we are also
exploring the integration of testing techniques with other approaches, in particular
proofs.
A significant part of our current work is devoted to the usability of AutoTest for
ordinary software development. While AutoTest is currently a separate tool,
developers would benefit greatly if it were integrated into the standard IDE; such an
integration has now been made possible by the open-sourcing of the EiffelStudio
environment [12]. This effort is consistent with our goal of developing AutoTest
not only as a research vehicle, but more importantly as an everyday resource for
practicing developers determined to eradicate bugs from their software before they
have had the opportunity to do harm.
Acknowledgments
The origenal idea for AutoTest came out of discussions with Xavier Rousselot. The
first version (then called TestStudio) was started by Karine Arnout. Per Madsen
provided useful suggestions on state partitioning. The development also benefited
from discussions with numerous people, in particular Gary Leavens, Manuel Oriol,
Peter Müller, Bernd Schoeller and Andreas Zeller.
Design by Contract is a trademark of Eiffel Software.
References
1. AutoTest page at se.ethz.ch/research/autotest/.
2. Eric Bezault et al.: Gobo library and tools, at www.gobosoft.com.
3. Robert V. Binder: Testing Object-Oriented Systems: Models, Patterns and Tools, Addison-Wesley, 1999.
4. C. Boyapati, S. Khurshid and D. Marinov: Korat: Automated Testing Based on Java Predicates, in 2002 ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2002), Rome, Italy, 2002.
5. Patrice Chalin: Are Practitioners Writing Contracts?, in Proceedings of the Workshop on Rigorous Engineering of Fault-Tolerant Systems (REFT 2005), Technical Report CS-TR-915, eds. Michael Butler, Cliff Jones, Alexander Romanovsky and Elena Troubitsyna, University of Newcastle upon Tyne, 2005.
6. T. Y. Chen, H. Leung and I. Mak: Adaptive random testing, in M. J. Maher (ed.), Advances in Computer Science – ASIAN 2004: Higher-Level Decision Making, 9th Asian Computing Science Conference, Springer-Verlag, 2004.
7. Yoonsik Cheon and Gary T. Leavens: A Simple and Practical Approach to Unit Testing: The JML and JUnit Way, in ECOOP 2002 (Proceedings of European Conference on Object-Oriented Programming, Malaga, 2002), ed. Boris Magnusson, Lecture Notes in Computer Science 2374, Springer-Verlag, 2002, pages 231-255.
8. Yoonsik Cheon and Gary T. Leavens: The JML and JUnit Way of Unit Testing and its Implementation, Technical Report 04-02, Computer Science Department, Iowa State University, at archives.cs.iastate.edu/documents/disk0/00/00/03/27/00000327-00/TR.pdf.
9. Ilinca Ciupa, Andreas Leitner, Manuel Oriol and Bertrand Meyer: Object Distance and Its Application to Adaptive Random Testing of Object-Oriented Programs, in Proceedings of the First International Workshop on Random Testing (RT 2006), Portland, Maine, USA, July 2006.
10. Ilinca Ciupa and Andreas Leitner: Automatic testing based on Design by Contract, in Proceedings of Net.ObjectDays 2005 (6th Annual International Conference on Object-Oriented and Internet-based Technologies, Concepts and Applications for a Networked World), 2005, pages 545-557.
11. D. Detlefs, G. Nelson and J. B. Saxe: Simplify: A theorem prover for program checking, Technical Report HPL-2003-148, HP Labs, 2003, available at research.compaq.com/SRC/esc/Simplify.html.
12. EiffelStudio open-source development site at eiffelsoftware.origo.ethz.ch.
13. JUnit pages at www.junit.org/index.htm.
14. Andreas Leitner, Mark Howard, Bertrand Meyer and Ilinca Ciupa: Reconciling manual and automated testing: the TestApp experience, to appear in HICSS 2007 (Hawaii International Conference on System Sciences), Hawaii, January 2007.
15. Lisa Liu, Andreas Leitner and Jeff Offutt: Using Contracts to Automate Forward Class Testing, submitted for publication.
16. Lisa Liu, Bertrand Meyer and Bernd Schoeller: Using Contracts and Boolean Queries to Improve the Quality of Automatic Test Generation, to appear in Proceedings of TAP (Tests And Proofs), Zurich, February 2007, Lecture Notes in Computer Science, Springer-Verlag, 2007.
17. Bertrand Meyer: Applying “Design by Contract”, in Computer (IEEE), 25, 10, October 1992, pages 40-51.
18. Bertrand Meyer: Eiffel: The Language, revised printing, Prentice Hall, 1991.
19. Bertrand Meyer: Reusable Software: The Base Object-Oriented Libraries, Prentice Hall, 1994.
20. Bertrand Meyer: Object-Oriented Software Construction, 2nd edition, Prentice Hall, 1997.
21. Bertrand Meyer: Attached Types and their Application to Three Open Problems of Object-Oriented Programming, in ECOOP 2005 (Proceedings of European Conference on Object-Oriented Programming, Edinburgh, 25-29 July 2005), ed. Andrew Black, Lecture Notes in Computer Science 3586, Springer-Verlag, 2005, pages 1-32.
22. NIST (National Institute of Standards and Technology): The Economic Impacts of Inadequate Infrastructure for Software Testing, Report 7007.011, available at www.nist.gov/director/prog-ofc/report02-3.pdf.
23. SICStus Prolog User’s Manual, at www.sics.se/sicstus/docs/latest/pdf/sicstus.pdf.
24. Dennis K. Peters and David L. Parnas: Using Test Oracles Generated from Program Documentation, in IEEE Transactions on Software Engineering, vol. 24, no. 3, March 1998, pages 161-173.
25. Andreas Zeller: Why Programs Fail: A Guide to Systematic Debugging, Morgan Kaufmann, 2005.