Reliability Estimation and Reliability Improvement
13.3.5 Reliability Estimation and Reliability Improvement
The purpose of this section is to discuss how we can use testing to estimate the reliability of a software product and how to use testing and fault removal to improve the reliability of a product up to a predefined standard.
If we are given
• a software product,
• a specification that describes its requirements,
• an oracle that is derived from the specification, and
• a usage profile of the product in the form of a probability distribution over the domain of the specification,
then the simplest way to estimate its reliability is to run the product on randomly generated test data according to the given usage profile, to record all the failures of the product, and to compute the average time (number of tests/executions) that elapses between successive failures. One may argue that what we are measuring here is actually the MTBF rather than the MTTF; while we agree that, strictly speaking, this experiment measures the MTBF, we argue that it provides an adequate indication of the mean time to failure.
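The estimation procedure just described can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function name estimateMtbf, the representation of the usage profile as a vector of weights over input classes, and the passes callback (which stands for running the product and consulting the oracle) are all assumptions introduced for the example.

```cpp
#include <cassert>
#include <functional>
#include <random>
#include <vector>

// Hypothetical helper: draws `tests` input classes from a discrete usage
// profile (weights[i] is the relative frequency of input class i), submits
// each one to `passes` (product + oracle; true means no failure), and returns
// the average number of executions between successive failures.
// Returns 0.0 when no failure is observed, since no estimate is possible.
double estimateMtbf(const std::vector<double>& weights,
                    const std::function<bool(int)>& passes,
                    int tests, unsigned seed) {
    assert(!weights.empty());
    std::mt19937 rng(seed);
    std::discrete_distribution<int> profile(weights.begin(), weights.end());
    int failures = 0;
    int lastFailure = 0;  // 1-based position of the most recent failure
    for (int t = 1; t <= tests; ++t) {
        if (!passes(profile(rng))) {
            ++failures;
            lastFailure = t;
        }
    }
    if (failures == 0) return 0.0;
    // The gaps between successive failures add up to lastFailure,
    // so their mean is lastFailure / failures.
    return static_cast<double>(lastFailure) / failures;
}
```

Executions that follow the last observed failure are ignored here, because the gap they belong to has not yet been closed by a failure.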
The simple procedure outlined herein applies when the software product is unchanged throughout the testing process and our purpose is to estimate its mean time to failure as is. We now consider the case of a software product which is due to undergo a system-level test for the purpose of removing faults therein until the system’s reliability reaches or exceeds a target reliability requirement. This process applies to the aggregate made up of the following artifacts:
• the software product under test,
• the specification against which the product is being tested, and the oracle that is derived from this specification,
• a usage profile of the product in the form of a probability distribution over the domain of the specification,
• a target reliability requirement that the product must reach or exceed upon delivery, in the form of an MTTF,
and iterates through the following cycle until the estimated product reliability reaches or exceeds the target MTTF requirement:
• Run the software product on randomly generated test data according to the product’s anticipated usage pattern and deploy the oracle derived from the selected specification, until a failure is disclosed by the oracle.
• Analyze (off-line) the failure, identify the fault that caused it, and remove it.
• Compute a new estimate of the product reliability, in light of the latest removed fault.
The first step of this iterative process can be automated by means of the following test driver:
   void testRun (int& runLength)       // by reference: the caller reads the count
   {stateType initS, s; bool moreTests;
    runLength=0; moreTests=true;
    while (moreTests)
      {generateRandom(s); initS=s; runLength++;
       g();                            // modifies s, preserves initS
       moreTests = oracle(initS,s);}   // false once a failure is disclosed
    runLength--;}                      // do not count the failing execution
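The driver leaves stateType, generateRandom, g, and oracle unspecified. To experiment with it, the sketch below fills them in with hypothetical stand-ins: an integer state, a uniform usage profile over 0..99, a product that is meant to double its state but carries a seeded fault for the value 7, and an oracle that checks the doubling. None of these choices come from the text; they only make the loop compilable. The state s is placed at file scope so that the parameterless g() can modify it, and runLength is passed by reference so that the caller receives the count.

```cpp
#include <random>

using stateType = int;        // assumption: a state is a single integer

static std::mt19937 rng(42);  // fixed seed so that a run can be repeated
static stateType s;           // current state, modified by g()

// Stand-in usage profile: uniform over 0..99.
void generateRandom(stateType& st) {
    st = std::uniform_int_distribution<int>(0, 99)(rng);
}

// Stand-in product under test: should double s, but fails when s == 7.
void g() { s = (s == 7) ? 0 : 2 * s; }

// Stand-in oracle: true when the final state satisfies the specification.
bool oracle(stateType initS, stateType finalS) { return finalS == 2 * initS; }

// Runs random tests until the oracle discloses a failure; on return,
// runLength holds the number of successful executions before that failure.
void testRun(int& runLength) {
    stateType initS;
    bool moreTests = true;
    runLength = 0;
    while (moreTests) {
        generateRandom(s);
        initS = s;
        runLength++;
        g();                            // modifies s, preserves initS
        moreTests = oracle(initS, s);
    }
    runLength--;                        // exclude the failing execution
}
```

A caller would declare an int, pass it to testRun, and append the resulting value to the record of inter-failure run lengths before removing the fault and starting the next run.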
The second step is carried out off-line; each execution of the test driver, together with the removal of the corresponding fault, is referred to as a test run. The third step raises the question of how to estimate and update the reliability of the software product after each test run. The obvious (and useless) answer is that it depends on which fault we have removed: the impact of faults on reliability varies a great deal from one fault to another; some faults cause more frequent failures than others, hence their removal produces a greater increase in reliability.
Cleanroom reliability testing assumes that each fault removal multiplies the mean time to failure by a constant factor, starting from a base value, and uses the testing phase to estimate both the initial mean time to failure and the ratio by which the mean time to failure increases after each fault removal. This model is based on the following assumptions:
• Unit testing is replaced by static analysis of the source code, using verification techniques similar to those we discuss in Chapter 5.
• Reliability testing replaces integration testing and applies to the whole system, in which no part has been previously tested.
• Reliability testing records all executions of the system starting from its first execution.
If we let $\mathit{MTTF}_0$ be the mean time to failure of the system upon its integration and we denote by $\mathit{MTTF}_N$ the mean time to failure of the system after the removal of the first $N$ faults (resulting from the first $N$ failures), then we can write (according to our modeling assumption):

$$\mathit{MTTF}_N = \mathit{MTTF}_0 \times R^{N},$$
where $R$ is the reliability growth factor, which reflects by what multiplicative factor the MTTF grows, on average, after each fault is removed. At the end of $N$ runs, we know what $N$ is, of course; we need to determine the remaining constants, namely $\mathit{MTTF}_0$ and $R$. To determine these two constants, we use the historical data we have collected on the first $N$ runs, and we perform a linear regression on the logarithmic (base 10) version of the equation above:

$$\log \mathit{MTTF}_N = \log \mathit{MTTF}_0 + N \times \log R$$

In this regression, $\log(\mathit{MTTF}_N)$ is the dependent variable and $N$ is the independent variable. For the sake of argument, we show in the following table a sample record of a reliability test, where the first column shows the ordinal of the runs and the second column shows the length of each run (measured in terms of the number of executions before failure).
[Table: N (run ordinal) | Inter-failure run | log of inter-failure run. The table entries are not reproduced here.]
In the third column, we record the logarithm of the length of the test runs. When we perform a regression using the third column as dependent variable and the first column as independent variable, we find the following result:
$$\log \mathit{MTTF}_N = 0.95 + 0.52\,N$$
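The regression step can be sketched in code as an ordinary least-squares fit of the base-10 logarithm of each run length against the run ordinal. The names GrowthModel and fitGrowthModel are hypothetical, and the sketch assumes that the run at index i follows the removal of i faults, so that the index plays the role of N; the data of the sample table are not used here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct GrowthModel {
    double mttf0;  // estimated MTTF before any fault is removed
    double r;      // estimated reliability growth factor
};

// Least-squares fit of y = log10(runLengths[i]) against x = i.
// Assumption: runLengths[i] is the number of executions before failure in
// the run that follows the removal of i faults.
GrowthModel fitGrowthModel(const std::vector<double>& runLengths) {
    const std::size_t n = runLengths.size();
    assert(n >= 2);  // a line needs at least two points
    double sumX = 0.0, sumY = 0.0, sumXX = 0.0, sumXY = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        assert(runLengths[i] > 0.0);  // the logarithm needs a positive length
        const double x = static_cast<double>(i);
        const double y = std::log10(runLengths[i]);
        sumX += x;
        sumY += y;
        sumXX += x * x;
        sumXY += x * y;
    }
    const double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
    const double intercept = (sumY - slope * sumX) / n;
    // Undo the logarithm: intercept = log MTTF_0 and slope = log R.
    return {std::pow(10.0, intercept), std::pow(10.0, slope)};
}
```

For instance, run lengths of 10, 100, and 1000 lie exactly on a line of intercept 1 and slope 1 in logarithmic scale, so the fit returns an initial MTTF of 10 and a growth factor of 10. A run of length zero (failure on the very first execution) has no logarithm and would need separate handling.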
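Figure 13.5 gives a graphic representation of the regression, on a logarithmic scale.

[Figure 13.5 Regression log(MTTF) by N, showing the active observations, the fitted model, and the 95% confidence intervals for the mean and for individual observations.]

The least squares linear regression gives the Y intercept and the slope of the regression, from which we obtain:

$$\mathit{MTTF}_0 = 10^{0.95} \approx 8.83, \qquad R = 10^{0.52} \approx 3.35$$

(The exponents are displayed to two decimals, so evaluating $10^{0.95}$ and $10^{0.52}$ directly gives slightly different values.)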
From this, we infer the mean time to failure at the conclusion of the testing phase as follows:
$$\mathit{MTTF}_5 = 8.83 \times 3.35^{5} \approx 3725$$
In other words, if this software product is delivered in its current form, it is expected to execute 3725 times before its next failure. If this reliability attains or exceeds the required standard, then the testing phase ends; else, we proceed with the next test run.
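The stopping decision can be illustrated with a short sketch. The function names predictMttf and removalsNeeded are hypothetical; the first evaluates the growth model for a given number of removed faults, and the second returns the smallest number of fault removals for which the model predicts an MTTF at or above a target. The target value used in the usage note is an arbitrary illustration, not a figure from the text.

```cpp
#include <cassert>
#include <cmath>

// Growth model: predicted MTTF after n fault removals.
double predictMttf(double mttf0, double r, int n) {
    return mttf0 * std::pow(r, n);
}

// Smallest n such that the predicted MTTF reaches or exceeds targetMttf.
// Assumes r > 1, i.e. that each fault removal actually improves reliability;
// otherwise the target might never be reached.
int removalsNeeded(double mttf0, double r, double targetMttf) {
    assert(mttf0 > 0.0 && r > 1.0);
    int n = 0;
    for (double mttf = mttf0; mttf < targetMttf; mttf *= r) {
        ++n;
    }
    return n;
}
```

With the constants estimated above, predictMttf(8.83, 3.35, 5) evaluates to about 3725.5, in line with the figure quoted in the text, and removalsNeeded(8.83, 3.35, 3000.0) returns 5: under an assumed target of 3000 executions, testing could stop after the fifth fault removal.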