RELIABILITY METRICS 4.8 RELIABILITY METRICS

4 8 RELIABILITY METRICS 4.8 RELIABILITY METRICS

Reliability metrics are used to quantitatively express the reliability of a

software product. ft d t

 Some reliability metrics, which can be used to quantify the reliability of a

software product are: ft d t

1. MTTF (Mean Time to Failure).

Components have an average lifetime and this is reflected in the most widely used hardware reliability metric , Mean Time to Failure (MTTF).

• The MTTF is the mean time for which a component is expected to be operational. •

The MTTF is the average time between observed system failures.

• An MTTF of 500 means that one failure can be expected every 500 time units. The time units are totally dependent on the system and it can even be specified in the number of transactions, as is the case of database query systems.

• MTTF i MTTF is relevant for systems with long transactions, i.e., where system l tf t ith l t ti i h t

processing takes a long time.

• MTTF should be longer than transaction length. For example, it is suitable for computer-aided design systems where a designer will work on a design for computer aided design systems where a designer will work on a design for several hours as well as for word-processor systems.

2. MTTR (Mean Time to Repair).

• • Once failure occurs some time is required Once failure occurs, some time is required to fix the error to fix the error. •

MTTR measures the average time it takes to track the errors causing the failure and then to fix them.

OR OR •

MTTR is the average time to replace a defective component. •

Once a hardware component fails then the failure is usually permanent so the Mean Time to Repair (MTTR) Mean Time to Repair (MTTR) reflects the time taken to repair or replace the reflects the time taken to repair or replace the component .

3. MTBF (Mean Time Between Failures).

• • We can combine the MTTF and MTTR metrics to get the MTBF metric We can combine the MTTF and MTTR metrics to get the MTBF metric : : •

MTBF = MTTF + MTTR

• Thus, a MTBF of 300 hours indicates that once a failure occurs, the next failure is expected to occur only after 300 hours. In this case, the time measurements are expected to occur only after 300 hours. In this case, the time measurements are real time and not the execution time as in MTTF.

4. POFOD (Probability of Failure on Demand).

POFOD is the likelihood POFOD is the likelihood that the system will fail when a service request is that the system will fail when a service request is made.

• A POFOD of 0.001 means that one out of a thousand service requests may result in failure. result in failure.

• POFOD is an important measure for safety critical systems and should be kept as low as possible. It is relevant for many safety-critical systems with the exception of management components, such as an emergency shutdown system in a chemical plant.

5. ROCOF (Rate of Occurrences of Failure).

• ROCOF is the frequency of occurrence with which unexpected behavior is likely to occur.

A ROCOF of 2/100 means that two failures are likely to occur in each 100 operational time units.

• This metric is sometimes called the failure intensity . It is relevant for operating systems and transaction-processing systems where the system has to process a large number of similar requests that are relatively frequent; for example credit-card processing systems airlinebooking systems etc example, credit card processing systems, airlinebooking systems, etc.

6. AVAIL (Availability). ( y)

• Availability is the probability that the system is available for use at a given time. An availability of 0.998 means that in every 1000 time units, the system is likely to be available for 998 of these.

4.8.1 Measurements of Reliability and Availability Measurement of Reliability

 Most hardware-related reliability models are predicted on failure due to Most hardware related reliability models are predicted on failure due to wear rather than failure due to design defects.

– In hardware, failures due to physical wear (e.g., effect of temperature corrosion) are more likely than a design related failure There has been corrosion) are more likely than a design-related failure. There has been debate over the relationship between key concepts in hardware reliability and their applicability to software. Although an irrefutable link has yet to be established established.

 If we consider a computer-based system, a simple measure of

reliability is mean-time-between-failure (MTBF ) and it can be expressed as: d

 MTBF = MTTF + MTTR, where where

– MTTF = Mean Time to Failure – MTTR = Mean Time to Repair .

Measurement of Availability Measurement of Availability

 Many researchers argue that MTBF is a far more useful measure than Many researchers argue that MTBF is a far more useful measure than defects/KLOC or defects/FP because an end-user is concerned with

failures, not with the total error count. Because each error contained within a program does not have the same failure rate the total error within a program does not have the same failure rate, the total error count provides little indication of the reliability of a system. In addition to the reliability measure, we must develop a measure of availability. Software availability is the probability that a program is operating Software availability is the probability that a program is operating according to requirements at a given point in time and is defined as:

 Availability = (MTTF/(MTTF +MTTR))  Availability = (MTTF/(MTTF +MTTR)) 100% 100%

 The MTBF reliability measure is equally sensitive to MTTF and MTTR. The availability measure is somewhat more sensitive to MTTR The availability measure is somewhat more sensitive to MTTR.