14.2 FAULT DETECTABILITY
Whereas complexity is a reliable indicator of fault proneness/fault density, the metrics we explore in this section reflect the ease of detecting the presence of faults (in a testing environment) or, conversely, the likelihood of fault sensitization (in an operating environment).
We consider a program g on space S and a specification R on S, and we let T be a test data set, which is a subset of dom(R). Assuming that program g is faulty, we define the following metrics, which reflect the level of effectiveness of T in exposing faults in g:
• The P-Measure, which is the probability that at least one failure is detected through the execution of the program on test data T.
• The E-Measure, which is the expected number of failures detected through the execution of the program on test data T.
• The F-Measure, which is the number of elements of test data T that we expect to execute, on average, before we experience the first failure of program g.
These metrics can be seen as indicators of the effectiveness of test set T in exposing program faults, but if we let T be a random test data set, then these metrics can be seen as characterizing the ease of exposing faults in program g.
These three metrics can be estimated in terms of the failure rate of the program (say, θ), which is the probability that the execution of the program on a random element s of the domain of R produces an image G(s) such that $(s, G(s)) \notin R$. Under the assumption of random test generation (where the same initial state may be generated more than once), we find the following expressions:
• The P-Measure. The probability that n tests generated at random do not cause the program to fail is $(1-\theta)^n$. Hence the P-Measure of the program is:

$$P = 1 - (1-\theta)^n$$
• The E-Measure. The expected number of failures experienced through the execution of the program on n randomly generated test data is:

$$E = n \cdot \theta$$
• The F-Measure. Under the assumption of random test generation, the probability that the first test failure occurs at the ith test is $(1-\theta)^{i-1}\,\theta$. Statistical analysis shows that for this probability distribution, the mean ($F$) and the median ($F_{med}$) of the number of tests before the first failure are, respectively:

$$F = \frac{1}{\theta}, \qquad F_{med} = \left\lceil \frac{-\log 2}{\log(1-\theta)} \right\rceil$$

where $\lceil \cdot \rceil$ is the ceiling operator in the set of natural numbers; the sketch below evaluates all three measures directly from θ and n.
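The three expressions above are straightforward to evaluate once θ and n are known. The following Python sketch, with helper names of our own choosing, computes the P-Measure, the E-Measure, and the mean and median of the F-Measure under the same assumption of random test generation with replacement.

    import math

    def p_measure(theta, n):
        """Probability that n random tests expose at least one failure."""
        return 1.0 - (1.0 - theta) ** n

    def e_measure(theta, n):
        """Expected number of failures over n random tests."""
        return n * theta

    def f_measure_mean(theta):
        """Mean number of random tests up to and including the first failure."""
        return 1.0 / theta

    def f_measure_median(theta):
        """Median number of random tests up to and including the first failure."""
        return math.ceil(-math.log(2) / math.log(1.0 - theta))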
All these calculations depend on an estimation of θ, the probability of failure of the execution of the program on a randomly chosen initial state. This probability depends on the following two parameters:
• The set of initial states on which the candidate program g fails to satisfy specification R; as we have seen in Chapter 6, this set is defined (in relational form) as $\overline{dom(R \cap G)}$.
• The probability distribution of initial states.

If the space of the program is finite and if the probability distribution is uniform, then the probability of failure can be written as:

$$\theta = \frac{|dom(R) \cap \overline{dom(R \cap G)}|}{|dom(R)|}$$
where $|\cdot|$ represents set cardinality.
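Under these assumptions, the formula can be evaluated by brute-force enumeration whenever the space is small enough. The sketch below is illustrative only: it assumes an iterable states that enumerates the space S, a predicate in_R(s, s_prime) for membership in R, and a function G(s) that returns the output of the (deterministic) program on s; none of these names comes from the text.

    def failure_rate(states, in_R, G):
        """theta = |dom(R) ∩ complement(dom(R ∩ G))| / |dom(R)|, assuming a
        uniform distribution over dom(R).  Exhaustive, so only suitable for
        very small state spaces."""
        states = list(states)
        # dom(R): states for which the specification prescribes some output.
        dom_R = [s for s in states if any(in_R(s, sp) for sp in states)]
        # A state of dom(R) is failing when the program's output violates R.
        failing = [s for s in dom_R if not in_R(s, G(s))]
        return len(failing) / len(dom_R)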
As an illustrative example, we consider the space S defined by natural variables x, y, and z, and the following specification R on S:

$$R = \{(s, s') \mid y' = x + y \wedge z' < 99\}$$
We let g be the following program on space S:
g: {y = x+y; z = y%100;}
whose function is:
$$G = \{(s, s') \mid x' = x \wedge y' = x + y \wedge z' = (x + y) \bmod 100\}$$
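For later experimentation, the program g and membership in R can be transcribed into Python; the names g and in_R are ours, and states are modeled as (x, y, z) triples of natural numbers.

    def g(x, y, z):
        """The program g: {y = x + y; z = y % 100;}."""
        y = x + y
        z = y % 100
        return (x, y, z)

    def in_R(s, s_prime):
        """(s, s') is in R  iff  y' = x + y  and  z' < 99."""
        (x, y, z), (xp, yp, zp) = s, s_prime
        return yp == x + y and zp < 99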
To estimate the probability of failure (θ) of this program, we compute $\overline{dom(R \cap G)}$.

$dom(R \cap G)$
= {substitution}
$dom(\{(s, s') \mid y' = x + y \wedge z' < 99 \wedge x' = x \wedge y' = x + y \wedge z' = (x + y) \bmod 100\})$
= {simplification}
$dom(\{(s, s') \mid (x + y) \bmod 100 < 99 \wedge x' = x \wedge y' = x + y \wedge z' = (x + y) \bmod 100\})$
= {taking the domain}
$\{s \mid \exists s': (x + y) \bmod 100 < 99 \wedge x' = x \wedge y' = x + y \wedge z' = (x + y) \bmod 100\}$
= {logical simplification}
$\{s \mid (x + y) \bmod 100 < 99 \wedge (\exists s': x' = x \wedge y' = x + y \wedge z' = (x + y) \bmod 100)\}$
= {logical simplification}
$\{s \mid (x + y) \bmod 100 < 99\}$

Taking the complement of this relation, we find:

$\overline{dom(R \cap G)}$
= {logic}
$\{s \mid (x + y) \bmod 100 \geq 99\}$
= {since a mod 100 function returns values between 0 and 99}
$\{s \mid (x + y) \bmod 100 = 99\}$

Hence θ = 0.01, since 99 is one value out of 100 that the mod function may take.
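As a quick sanity check on this value, we can reuse g and in_R from the sketch above and enumerate one full period of the mod-100 pattern; since every state belongs to dom(R) and z plays no role in the outcome, restricting x and y to 0..99 with z fixed at 0 is enough (this restriction is our choice, not the text's).

    # Count the states on which executing g yields an output that violates R.
    failures = sum(not in_R((x, y, 0), g(x, y, 0))
                   for x in range(100) for y in range(100))
    print(failures / (100 * 100))   # 0.01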
With this value of the failure probability, we can now compute the various measures of interest for a random test sample of size, say, 400:
• P-Measure.

$$P = 1 - (1-\theta)^n = 1 - 0.99^{400} \approx 0.98205$$
• E-Measure.
$$E = n \cdot \theta = 400 \cdot 0.01 = 4$$
• F-Measure. ○ Mean:
$$F = \frac{1}{\theta} = 100$$
○ Median:
$$F_{med} = \left\lceil \frac{-\log 2}{\log(1-\theta)} \right\rceil = \left\lceil \frac{-\log 2}{\log 0.99} \right\rceil = \lceil 68.9676 \rceil = 69$$
In other words, there is a 0.98205 probability that a random test of size 400 will expose at least one failure, the expected number of failures exposed by a random test of size 400 is four, the mean number of random tests before we observe the first failure is 100, and the median number of tests before we observe the first failure is 69.
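These four figures can be reproduced with the helper functions sketched earlier, using θ = 0.01 and n = 400:

    theta, n = 0.01, 400
    print(p_measure(theta, n))        # ~0.98205
    print(e_measure(theta, n))        # 4.0
    print(f_measure_mean(theta))      # 100.0
    print(f_measure_median(theta))    # 69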
Of course, in practice, these metrics are not computed analytically in the way we have just shown; rather, they are estimated or derived empirically by extrapolating from field observations. Also, regardless of how accurately we can (or cannot) estimate them, these metrics are useful in the sense that they enable us to reason about how easy it is to expose faults in a program and what test generation strategies enable us to optimize fault removal effectiveness.
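To illustrate the empirical route on the same example, the sketch below (the sampling ranges and trial counts are arbitrary choices of ours) generates random initial states, reuses g and in_R from above, and averages the number of tests executed up to the first failure; the result should hover around the analytical mean of 100.

    import random

    def tests_until_first_failure(max_tests=100_000):
        """Number of random tests executed up to and including the first failure."""
        for i in range(1, max_tests + 1):
            s = (random.randrange(1000), random.randrange(1000), random.randrange(1000))
            if not in_R(s, g(*s)):
                return i
        return max_tests   # no failure observed within the budget (very unlikely here)

    trials = [tests_until_first_failure() for _ in range(10_000)]
    print(sum(trials) / len(trials))   # empirical mean F-Measure, close to 100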