
13.11 MEASURING TEST EFFECTIVENESS

It is useful to evaluate the effectiveness of the testing effort in the development of a product. After a product is deployed in the customer environment, a common measure of test effectiveness is the number of defects found by the customers that were not found by the test engineers prior to the release of the product. These defects had escaped our testing effort. A metric commonly used in the industry to measure test effectiveness is the defect removal efficiency (DRE) [22], defined as

DRE = (number of defects found in testing) / (number of defects found in testing + number of defects not found)

We obtain the number of defects found in testing from the defect tracking system. However, calculating the number of defects not found during testing is a difficult task. One way to approximate this number is to count the defects found by the customers within the first six months of the product's operation. There are several issues that must be understood to interpret the DRE measure in a useful way:

• Due to the inherent limitations of real test environments in a laboratory, certain defects are very difficult to find during system testing no matter how thorough we are in our approach. One should include these defects in the calculation if the goal is to measure the effectiveness of the testing effort including the limitations of the test environments. Otherwise, these kinds of defects may be eliminated from the calculation.

• The defects submitted by customers that need corrective maintenance are taken into consideration in this measurement. The defects submitted that need either adaptive or perfective maintenance are not real issues in the software; rather they are requests for new feature enhancements. Therefore, defects that need adaptive and perfective maintenance may be removed from the calculation.

• There must be a clear understanding of the duration for which the defects are counted, such as starting from unit testing or system integration testing to the end of system testing. One must be consistent for all test projects.

• This measurement should not be interpreted for a single project in isolation; rather, it should be tracked as part of the long-term trend in the test effectiveness of the organization.
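With these caveats in mind, the DRE computation itself is mechanical. The following sketch (the defect counts are hypothetical) applies the formula, approximating the number of defects not found in testing by the corrective-maintenance defects reported by customers in the first six months of operation:

```python
def defect_removal_efficiency(found_in_testing: int, found_by_customers: int) -> float:
    """DRE = defects found in testing /
             (defects found in testing + defects not found in testing).

    The second term is approximated here by the corrective-maintenance
    defects reported by customers within the first six months of operation.
    """
    return found_in_testing / (found_in_testing + found_by_customers)

# Hypothetical project: 950 defects found in testing, 50 escaped to customers.
print(defect_removal_efficiency(950, 50))  # 0.95
```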

Much work has been done on the fault seeding approach [23] to estimate the number of escaped defects. In this approach the effectiveness of testing is determined by estimating the number of actual defects using an extrapolation technique. The approach is to inject a small number of representative defects into the system and measure the percentage of defects that are uncovered by the sustaining test engineers. Since the sustaining test team remains unaware of the seeding, the extent to which the product reveals the known, seeded defects allows us to extrapolate the extent to which the unknown defects were found. Suppose that the product contains N defects and K defects are seeded. At the end of the test experiments, the sustaining test team has found n unseeded and k seeded defects. The fault seeding theory


asserts that seeded and actual defects are equally likely to be detected, that is,

k/K = n/N

so the total number of actual defects is estimated as N = n × K/k.

For example, consider a situation where 25 known defects have been deliberately seeded in a system. Let us assume that the sustaining testers detect 20 (80%) of these seeded defects and uncover 400 additional defects. Then the estimated total number of defects in the system is 400 × 25/20 = 500. Therefore, the product still has 500 − 400 = 100 defects waiting to be discovered, plus the 5 seeded defects that remain in the code. Using the results from the seeding experiment, it is thus possible to estimate the total number of defects that escaped the system testing phase. This estimation rests on the assumption that the ratio of the number of defects found to the total number of defects is 0.80, the same ratio as for the seeded defects. In other words, we assume that the 400 defects found by the sustaining test engineers constitute 80% of all the defects in the system, so an estimated 100 defects remain hidden. However, the accuracy of the measure depends on the way the defects are injected. Artificially injected defects are usually planted manually, and defect seeding has proved difficult to implement in practice: it is not easy to introduce artificial defects that have the same impact as actual defects in terms of difficulty of detection. Generally, artificial defects are much easier to find than actual defects.
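The extrapolation above can be sketched in a few lines; the function name is ours, and the numbers are those of the seeding example:

```python
def estimate_total_defects(seeded: int, seeded_found: int, unseeded_found: int) -> float:
    """Fault seeding estimate: assuming seeded and actual defects are
    equally likely to be detected, k/K = n/N, so N is estimated as n * K / k."""
    if seeded_found == 0:
        raise ValueError("no seeded defects found; cannot extrapolate")
    return unseeded_found * seeded / seeded_found

# Numbers from the example: K = 25 seeded, k = 20 seeded defects found,
# n = 400 unseeded defects found.
estimated_total = estimate_total_defects(25, 20, 400)
still_hidden = estimated_total - 400   # actual defects not yet discovered
seeds_in_code = 25 - 20                # seeded defects still to be removed
print(estimated_total, still_hidden, seeds_in_code)  # 500.0 100.0 5
```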

Spoilage Metric Defects are injected and removed at different phases of a software development cycle [24]. Defects get introduced during the requirements analysis, high-level design, detailed design, and coding phases, whereas these defects are removed during the unit testing, integration testing, system testing, and acceptance testing phases. The cost of a defect injected in phase X and removed in phase Y is not uniformly distributed; instead, the cost increases with the distance between X and Y. A delay in finding dormant defects causes greater harm and costs more to fix, because dormant defects may trigger the injection of other related defects, which need to be fixed in addition to the original dormant defects. Therefore, an effective testing method finds defects earlier than a less effective testing method does. Hence a useful measure of test effectiveness is defect age, known as PhAge. As an example, let us consider Table 13.15, which shows a scale for measuring defect age. In this example, a requirement defect discovered during high-level design review would be assigned a PhAge of 1, whereas a requirement defect discovered during the acceptance testing phase would be assigned a PhAge of 7. One can modify the table to accommodate the different phases of the software development life cycle followed within an organization, including the PhAge numbers.

If the phase in which a defect was introduced can be determined, this information can be used to create a matrix with rows corresponding to the defects injected in each phase and columns corresponding to the defects discovered in each phase. This is often called a defect dynamic model. Table 13.16 shows a defect injected versus defect discovered matrix from an imaginary test project called Boomerang. In this example, there were


TABLE 13.15 Scale for Defect Age

Phase                                          Phase Discovered
Injected            Requirements  High-Level  Detailed  Coding  Unit     Integration  System   Acceptance
                                  Design      Design            Testing  Testing      Testing  Testing
Requirements             0            1           2        3       4          5           6         7
High-level design                     0           1        2       3          4           5         6
Detailed design                                   0        1       2          3           4         5
Coding                                                     0       1          2           3         4

TABLE 13.16 Defect Injection versus Discovery on Project Boomerang

Phase                                          Phase Discovered                                             Total
Injected            Requirements  High-Level  Detailed  Coding  Unit     Integration  System   Acceptance  Defects
                                  Design      Design            Testing  Testing      Testing  Testing
Requirements             0            7           3        1       0          0           2         4         17
High-level design                     0           8        4       1          2           6         1         22
Detailed design                                   0       13       3          4           5         0         25
Coding                                                                                             12        136

seven requirement defects found in high-level design, three in detailed design, one in coding, two in system testing, and four in acceptance testing.

Now a new metric called spoilage [25, 26] is defined to measure the defect removal activities by using the defect age and the defect dynamic model. The spoilage metric is calculated as

Spoilage = (sum of number of defects × PhAge at discovery) / (total number of defects)

Table 13.17 shows the spoilage values for the Boomerang test project, based on the number of defects found (Table 13.16) weighted by defect age (Table 13.15). During acceptance testing, for example, 17 defects were discovered, out of which 4 were attributed to defects injected during the requirements phase of the Boomerang project. Since the defects that were found during acceptance testing could have been found in any of the seven previous phases, the requirement defects that lay dormant until acceptance testing were given a PhAge of 7. The weighted number of requirement defects revealed during acceptance testing is therefore 7 × 4 = 28. The spoilage values for the requirements, high-level design, detailed design, and coding phases are 3.2, 2.8, 2.0, and 1.98, respectively. The spoilage value for the Boomerang test project as a whole is 2.2. A spoilage value close to 1 is an indication of

TABLE 13.17 Number of Defects Weighted by Defect Age on Project Boomerang

Phase                              Phase Discovered (defects × PhAge)                               Total    Total    Spoilage as
Injected            Requirements  High-Level  Detailed  Coding  Unit     Integration  System   Acceptance  Weight   Defects  Weight/Defects
                                  Design      Design            Testing  Testing      Testing  Testing
Requirements             0            7           6        3       0          0          12        28         56       17    3.294117647
High-level design                     0           8        8       3          8          30         6         63       22    2.863636364
Detailed design                                   0       13       6         12          20         0         51       25    2.04
Coding                                                                                             48        270      136    1.985294118
Summary                                                                                                      440      200    2.2


a more effective defect discovery process. As an absolute value, the spoilage metric has little meaning. This metric is useful in measuring the long-term trend of test effectiveness in an organization.
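The spoilage computation can be sketched as follows; the phase list and the encoding of the data are our own, with the requirements row of the Boomerang matrix (Table 13.16) used as input:

```python
# PhAge of a defect injected in phase i and discovered in phase j is j - i
# (Table 13.15), with the phases ordered as below.
PHASES = ["requirements", "high-level design", "detailed design", "coding",
          "unit testing", "integration testing", "system testing",
          "acceptance testing"]

def spoilage(defects):
    """Spoilage = sum(count * PhAge) / total count, where `defects` is a
    list of (injected_phase, discovered_phase, count) triples."""
    total = weighted = 0
    for injected, discovered, count in defects:
        phage = PHASES.index(discovered) - PHASES.index(injected)
        total += count
        weighted += count * phage
    return weighted / total

# Requirements-injected defects on project Boomerang (Table 13.16):
boomerang_requirements = [
    ("requirements", "high-level design", 7),
    ("requirements", "detailed design", 3),
    ("requirements", "coding", 1),
    ("requirements", "system testing", 2),
    ("requirements", "acceptance testing", 4),
]
print(round(spoilage(boomerang_requirements), 2))  # 3.29
```

The weighted sum is 7 + 6 + 3 + 12 + 28 = 56 over 17 defects, reproducing the 3.29 spoilage value in Table 13.17.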
