
8.5 EMPIRICAL ADEQUACY ASSESSMENT

Whereas in the foregoing discussions we have attempted to characterize the adequacy of a test set T with respect to test selection requirements by means of analytical arguments, in this section we consider empirical arguments. Specifically, we ponder the question: How can we assess the ability of a test set T to expose faults in candidate programs? A simple-minded way to do this is to run candidate programs on a test set T and see what proportion of faults we are able to expose; the trouble with this approach is that we do not usually know what faults a program has. Hence, if execution of program p on test set T yields no failures, or few failures, we have no way to tell whether this is because the program has no (or few) faults or because test set T is inadequate.


To obviate this difficulty, we generate mutants of program p, that is, programs obtained by making small changes to p, and we run all these mutants on test set T; we can then assess the adequacy of test set T by its ability to distinguish all the mutants from the original p, and to distinguish them from each other. A note of caution is in order, though: it is quite possible for mutants to be indistinguishable, in the sense that the original program p and its mutant compute the same function; in such cases, the inability of set T to distinguish the two programs does not reflect negatively on T. This means that in theory, we should run this experiment only on mutants which we know to be distinct (i.e., to compute a different function) from the original; but because it is very difficult in practice to tell whether a mutant does or does not compute the same function as the original, we may sometimes (for complex programs) run the experiment on the assumption that all mutants are distinct from the original, and from each other.

As an illustrative example, we consider the following sorting program, which we had studied in Chapter 6; we call it p.

void somesort (itemtype a[MaxSize], indextype N)    // line 1
 {indextype i; i=0;                                 // 2
  while (i<=N-2)                                    // 3
  {indextype j; indextype mindx; itemtype minval;   // 4
   j=i; mindx=j; minval=a[j];                       // 5
   while (j<=N-1)                                   // 6
   {if (a[j]<minval) {mindx=j; minval=a[j];}        // 7
    j++;}                                           // 8
   itemtype temp;                                   // 9
   temp=a[i]; a[i]=a[mindx]; a[mindx]=temp;         // 10
   i++;}                                            // 11
 }                                                  // 12
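The routine assumes that itemtype, indextype, and MaxSize are declared elsewhere; a minimal set of declarations that would make it compile could look as follows (the concrete choices are ours, for illustration):

typedef int itemtype;    // element type of the array being sorted
typedef int indextype;   // type used for array indices and sizes
const int MaxSize = 100; // capacity bound on the array; any bound >= N works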

Imagine that we have derived the following test data to test this program:

T    Index N   Array a[..]        Comment/rationale
t1   1         [5]                Trivial size
t2   2         [5,5]              Borderline size, identical elements
t3   2         [5,9]              Borderline size, sorted
t4   2         [9,5]              Borderline size, inverted
t5   6         [5,5,5,5,5,5]      Random size, identical elements
t6   6         [5,7,9,11,13,15]   Random size, sorted
t7   6         [15,13,11,9,7,5]   Random size, inverted
t8   6         [9,11,5,15,13,7]   Random size, random order
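For reference in the sketches below, this test set can be encoded directly as data; a minimal sketch, using the declarations suggested above (the array names are ours):

const int NumTests = 8;
indextype testN[NumTests] = {1, 2, 2, 2, 6, 6, 6, 6};
itemtype  testA[NumTests][MaxSize] =
  {{5}, {5,5}, {5,9}, {9,5},
   {5,5,5,5,5,5}, {5,7,9,11,13,15},
   {15,13,11,9,7,5}, {9,11,5,15,13,7}};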


The question we ask is: How adequate is this test set? If we run our sorting routine on this data and all executions are successful, how confident can we be that our program is correct? The approach advocated by mutation testing is to generate mutants of program p by making small alterations to its source code and to check to what extent the test data is sensitive to these alterations. Let us, for the sake of argument, consider the following mutants of program p:

void m1 (itemtype a[MaxSize], indextype N)          // line 1
 {indextype i; i=0;                                 // 2
  while (i<=N-1)                                    // 3: changed N-2 into N-1
  {indextype j; indextype mindx; itemtype minval;   // 4
   j=i; mindx=j; minval=a[j];                       // 5
   while (j<=N-1)                                   // 6
   {if (a[j]<minval) {mindx=j; minval=a[j];}        // 7
    j++;}                                           // 8
   itemtype temp;                                   // 9
   temp=a[i]; a[i]=a[mindx]; a[mindx]=temp;         // 10
   i++;}                                            // 11
 }                                                  // 12

void m2 (itemtype a[MaxSize], indextype N)          // line 1
 {indextype i; i=0;                                 // 2
  while (i<=N-2)                                    // 3
  {indextype j; indextype mindx; itemtype minval;   // 4
   j=i; mindx=j; minval=a[j];                       // 5
   while (j<N-1)                                    // 6: changed <= into <
   {if (a[j]<minval) {mindx=j; minval=a[j];}        // 7
    j++;}                                           // 8
   itemtype temp;                                   // 9
   temp=a[i]; a[i]=a[mindx]; a[mindx]=temp;         // 10
   i++;}                                            // 11
 }                                                  // 12

void m3 (itemtype a[MaxSize], indextype N)          // line 1
 {indextype i; i=0;                                 // 2
  while (i<=N-2)                                    // 3
  {indextype j; indextype mindx; itemtype minval;   // 4
   j=i; mindx=j; minval=a[j];                       // 5
   while (j<=N-1)                                   // 6
   {if (a[j]<=minval) {mindx=j; minval=a[j];}       // 7: changed < into <=
    j++;}                                           // 8
   itemtype temp;                                   // 9
   temp=a[i]; a[i]=a[mindx]; a[mindx]=temp;         // 10
   i++;}                                            // 11
 }                                                  // 12

void m4 (itemtype a[MaxSize], indextype N)          // line 1
 {indextype i; i=1;                                 // 2: changed 0 into 1
  while (i<=N-2)                                    // 3
  {indextype j; indextype mindx; itemtype minval;   // 4
   j=i; mindx=j; minval=a[j];                       // 5
   while (j<=N-1)                                   // 6
   {if (a[j]<minval) {mindx=j; minval=a[j];}        // 7
    j++;}                                           // 8
   itemtype temp;                                   // 9
   temp=a[i]; a[i]=a[mindx]; a[mindx]=temp;         // 10
   i++;}                                            // 11
 }                                                  // 12

void m5 (itemtype a[MaxSize], indextype N)          // line 1
 {indextype i; i=0;                                 // 2
  while (i<=N-2)                                    // 3
  {indextype j; indextype mindx; itemtype minval;   // 4
   j=i; mindx=j; minval=a[j];                       // 5
   while (j<=N-1)                                   // 6
   {if (a[j]<minval) {mindx=j; minval=a[j];}        // 7
    j++;}                                           // 8
   itemtype temp;                                   // 9
   a[i]=a[mindx]; temp=a[i]; a[mindx]=temp;         // 10: inverted the first two statements
   i++;}                                            // 11
 }                                                  // 12

Given these mutants, we now run the following test driver, which considers the mutants in turn and checks whether test set T distinguishes them from the original program p.

void main ()
 {for (int i=1; i<=5; i++)            // does T distinguish mutant mi from p?
   {for (int j=1; j<=8; j++)          // is p(tj) different from mi(tj)?
     {load tj onto N, a;
      run p, store result in a';
      load tj onto N, a;
      run mutant mi, compare outcome to a';}
    if one of the tj returned a different outcome from p,
      announce: "mutant i distinguished"
    else announce: "mutant i not distinguished";}
 }   // assess T according to how many mutants were distinguished
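This pseudocode can be fleshed out into a concrete driver. The following sketch, which is distinct from the appendix version, reuses the typedefs and test arrays suggested above; it represents each program as a function pointer, runs p and each mutant on copies of the same datum, and compares the resulting arrays (the helper names are ours):

#include <string.h>
#include <stdio.h>

typedef void (*sorter)(itemtype a[MaxSize], indextype N);

// true if mutant m produces a different outcome from p on some test datum
bool distinguished (sorter p, sorter m)
 {for (int j=0; j<NumTests; j++)
    {itemtype aP[MaxSize], aM[MaxSize];
     memcpy(aP, testA[j], sizeof(aP));   // load tj onto N, a (copy for p)
     memcpy(aM, testA[j], sizeof(aM));   // load tj onto N, a (copy for the mutant)
     p(aP, testN[j]);                    // run p
     m(aM, testN[j]);                    // run the mutant
     if (memcmp(aP, aM, testN[j]*sizeof(itemtype)) != 0) return true;}
  return false;}

int main ()
 {sorter mutants[5] = {m1, m2, m3, m4, m5};
  for (int i=0; i<5; i++)               // assess T by how many mutants it distinguishes
    if (distinguished(somesort, mutants[i]))
         printf("mutant %d distinguished\n", i+1);
    else printf("mutant %d not distinguished\n", i+1);
  return 0;}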

The actual source code for this driver is shown in the appendix. Execution of this program yields the following output, in which we show, for each test datum tj and each mutant mi, whether execution of the mutant on the datum yields the same outcome as execution of the original program p on that datum (True) or a different outcome (False).

        Mutants
T       m1      m2      m3      m4      m5
t1      True    True    True    True    True
t2      True    True    True    True    True
t3      True    True    True    True    True
t4      True    True    True    False   False
t5      True    True    True    True    True
t6      True    True    True    True    True
t7      True    True    True    False   False
t8      True    True    True    False   False

Before we make a judgment on the adequacy of our test set, we must first check whether the mutants that have not been distinguished from the original program are identical to it or not (i.e., compute the same function). For example, it is clear from inspection of the source code that mutant m1 is identical to program p: since program p sorts the array by selection sort, once it has placed the smallest N-1 elements of the array, the remaining element is necessarily the largest; hence the array is already sorted. What mutant m1 does is select the Nth element of the array and permute it with itself, a futile operation, which program p skips. Mutant m3 also appears to compute the same function as the original program p, though it selects a different value for variable mindx when the array contains duplicates; this difference has no impact on the overall function of the program.

The question of whether mutant m2 computes the same function as the original program is left as an exercise.


In general, once we have ruled out the mutants that are deemed equivalent to the original program, we must consider the mutants that the test data did not distinguish from the original program even though they are distinct, and ask: What additional test data should we generate to distinguish all these mutants? Conversely, we can view the proportion of distinct mutants that the test data has not distinguished as a measure of inadequacy of the test data, a measure that we should minimize by adding extra test data or refining existing data.
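This inadequacy measure is the complement of what the mutation testing literature calls the mutation score; a minimal sketch of the computation (the function name is ours):

// mutation score of a test set: distinguished (killed) non-equivalent mutants
// divided by all non-equivalent mutants; 1.0 means fully adequate by this measure
double mutationScore (int killed, int nonEquivalent)
 {return nonEquivalent == 0 ? 1.0 : (double) killed / nonEquivalent;}

For the run above, T kills m4 and m5; if m1, m2, and m3 are all deemed equivalent to p, the score is 2/2 = 1.0, whereas if the exercise shows m2 to be distinct from p, the score drops to 2/3.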

Note that test data t1, t2, t3, t5, and t6 do not appear to help much in testing the sorting program, as they are unable to distinguish any mutant from the original program. In addition to its use in assessing test sets, mutation is also used to automatically correct minor faults in programs, when their specification is available and readily testable: one generates many mutants and tests them against the specification using an adequate test set, until one encounters a mutant that satisfies the specification. There is no assurance that such a mutant can be found, nor that only one mutant satisfies the specification, nor that a mutant that satisfies the specification is more correct than the original program; nevertheless, this technique may find some uses in practice.
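A minimal sketch of this generate-and-validate repair loop, reusing the sorter type from the driver sketch above; the generator and oracle are hypothetical parameters supplied by the caller, not part of the original program:

typedef sorter (*generator)(int k);   // hypothetical: produces the k-th mutant, or 0

// generate-and-validate repair: return the first mutant that satisfies the
// specification on an adequate test set, or 0 if none is found within maxTries
sorter repair (generator gen, bool (*meetsSpec)(sorter), int maxTries)
 {for (int k=0; k<maxTries; k++)
    {sorter m = gen(k);
     if (m != 0 && meetsSpec(m)) return m;}
  return 0;}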
