2.5 ADEQUACY OF TESTING
Testing gives designers and programmers much confidence in a software component or a complete product if it passes their test cases. Assume that a set of test cases
T has been designed to test a program P. We execute P with the test set T. If T reveals faults in P, then we modify the program in an attempt to fix those faults. At this stage, there may be a need to design some new test cases, because, for example, we may include a new procedure in the code. After modifying the code, we execute the program with the new test set. Thus, we execute the test-and-fix loop until no more faults are revealed by the updated test set. Now we face a dilemma: Is P really fault free, or is T not good enough to reveal the remaining faults in P? From testing alone we cannot conclude that P is fault free, since, as Dijkstra observed, testing can reveal the presence of faults, but not their absence. Therefore, if P passes T, we need to know that T is “good enough” or, in other words, that T is an adequate set of tests. It is important to evaluate the adequacy of T, because if T is found to be inadequate, then more test cases need to be designed, as illustrated in Figure 2.4. Adequacy of T refers to whether or not T thoroughly tests P.
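To make the dilemma concrete, the following is a small, self-contained Python sketch; it is illustrative only, and the function names and the toy fault are invented for this example rather than taken from the text. A faulty program passes its initial test set T, and the fault surfaces only after T is augmented with a new test case.

```python
# Illustrative sketch only: a toy program P with a fault, an initial test set T
# that P passes, and an augmented T that reveals the fault. All names here are
# invented for this example.

def run_tests(program, tests):
    """Return the (input, expected) pairs on which the program fails."""
    return [(x, want) for x, want in tests if program(x) != want]

def buggy_abs(x):                       # program P with a fault at x == 0
    return x if x > 0 else (-x if x < 0 else -1)

T = [(5, 5), (-3, 3)]                   # initial test set T
print(run_tests(buggy_abs, T))          # [] -- P passes T, yet P is faulty

T.append((0, 0))                        # augment T with a new test case
print(run_tests(buggy_abs, T))          # [(0, 0)] -- the fault is now revealed
```

Passing the initial T says nothing about the correctness of P; it only says that T was not adequate enough to expose the remaining fault.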
Ideally, testing should be performed with an adequate test set T. Intuitively, the idea behind specifying a criterion for evaluating test adequacy is to know whether or not sufficient testing has been done. We will soon return to the idea of test adequacy. In the absence of an adequacy criterion, developers are forced to use ad hoc measures to decide when to stop testing. Some examples of such ad hoc measures are as follows [13]:
• Stop when the allocated time for testing expires.
• Stop when it is time to release the product.
• Stop when all the test cases execute without revealing faults.
Figure 2.4 depicts the context in which test adequacy is evaluated. Two important notions concerning test design and the evaluation of test adequacy are as follows:
(Flowchart: design a set of test cases T to test a program P; execute P with T; if T reveals faults in P, fix the faults and, if there is a need, augment T with new test cases, then execute again; if T reveals no faults, ask whether T is an adequate test set; if not, augment T with new test cases and continue; otherwise stop.)
Figure 2.4 Context of applying test adequacy.
• Adequacy of a test set T is evaluated after it is found that T reveals no more faults. One may argue: why not design test cases to meet an adequacy criterion in the first place? However, it is important to design test cases independently of an adequacy criterion, because the primary goal of testing is to locate errors, and thus test design should not be constrained by an adequacy criterion. An example of a test design criterion is as follows: select test cases to execute all statements in a program at least once. The difficulty with such a criterion is that we may not be able to know whether every program statement can be executed, and hence it is difficult to judge the adequacy of a test set selected in this way. Finally, since the goal of testing is to reveal faults, there is no point in evaluating the adequacy of the test set as long as faults are still being revealed.
• An adequate test set T does not say anything about the correctness of a program. A common understanding of correctness is that we have found and fixed all faults in a program to make it “correct.” In practice, however, it is not realistic, though very much desirable, to find and fix all faults in a program. Thus, on the one hand, an adequacy criterion need not aim for program correctness. On the other hand, a fault-free program should not turn an arbitrary test set T into an adequate test set.
The above two points convey an important idea: the adequacy of a test set should be evaluated independently of the test design processes for the programs under test. Intuitively, a test set T is said to be adequate if it covers all aspects of the actual computation performed by a program and all computations intended by its specification. Two practical methods for evaluating test adequacy are as follows:
• Fault Seeding: This method refers to implanting a certain number of faults in a program P and executing P with a test set T. If T reveals k percent of the implanted faults, we assume that T has also revealed only k percent of the original faults. If 100% of the implanted faults are revealed by T, we feel more confident about the adequacy of T; a small illustrative calculation follows this list. A thorough discussion of fault seeding can be found in Chapter 13.
• Program Mutation: Given a program P, a mutation is a program obtained by making a small change to P. In the program mutation method, a series of mutations are obtained from P. Some of the mutations may contain faults and the rest are equivalent to P. A test set T is said to be adequate if it causes every faulty mutation to produce an unexpected outcome; a small illustrative sketch follows this list. A more thorough discussion of program mutation can be found in Chapter 3.
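As a hedged illustration of the fault seeding arithmetic described above, the helper function and the numbers below are invented for this sketch; only the underlying assumption comes from the text.

```python
# Fault-seeding arithmetic (illustrative numbers; not from the text).
# Assumption: T reveals the same fraction of original faults as of seeded ones.

def estimate_original_faults(seeded, seeded_revealed, original_revealed):
    """Estimate the total number of original faults from seeding results."""
    fraction = seeded_revealed / seeded          # the "k percent" as a fraction
    return original_revealed / fraction

# Example: 20 faults seeded; T reveals 16 of them (80%) and 12 original faults.
total = estimate_original_faults(seeded=20, seeded_revealed=16, original_revealed=12)
print(total)        # 15.0 -> roughly 15 - 12 = 3 original faults likely remain
# Only when T reveals 100% of the seeded faults do we gain confidence in T's adequacy.
```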
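Similarly, here is a minimal hand-rolled sketch of the program mutation idea. The mutants are written out by hand for clarity, whereas real mutation tools generate them automatically, and every name below is invented for this illustration.

```python
# Hand-rolled mutation sketch (illustrative only; not a real mutation tool).

def max2(a, b):                  # the original program P
    return a if a > b else b

def mutant_lt(a, b):             # mutant: '>' changed to '<'  (faulty)
    return a if a < b else b

def mutant_ge(a, b):             # mutant: '>' changed to '>=' (equivalent to P)
    return a if a >= b else b

def killed(mutant, original, tests):
    """A mutant is killed if some test case makes its outcome differ from P's."""
    return any(mutant(*t) != original(*t) for t in tests)

T = [(1, 2), (5, 3)]
print(killed(mutant_lt, max2, T))   # True  -- the faulty mutant shows an unexpected outcome
print(killed(mutant_ge, max2, T))   # False -- an equivalent mutant can never be killed
# T is judged adequate if every faulty (non-equivalent) mutant is killed.
```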