
11.7 KNOWLEDGE VERIFICATION AND VALIDATION

Knowledge acquired from experts needs to be evaluated for quality; this process involves evaluation, validation, and verification. These terms are often used interchangeably. We use the definitions provided by O'Keefe et al. (1987):

• Evaluation is a broad concept. Its objective is to assess an expert system's overall value. In addition to assessing acceptable performance levels, it analyzes whether the system would be usable, efficient, and cost-effective.

• Validation is the part of evaluation that deals with the performance of the system (e.g., as it compares to the expert's). Simply stated, validation is building the right system; that is, substantiating that a system performs with an acceptable level of accuracy.

• Verification is building the system right; that is, substantiating that the system is correctly implemented to its specifications.

In the realm of expert systems, these activities are dynamic because they must be repeated each time the prototype is changed. In terms of the knowledge base, it is necessary to ensure that the right knowledge is used (validation) and that the knowledge base was developed correctly (verification). For each IF statement, more than 30 criteria can be used in verification (see PC AI, March/April 2002, p. 59).
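To make this concrete, the sketch below checks a small rule base against three common verification criteria (redundancy, contradiction, and subsumption). The rule format and the specific checks are illustrative assumptions, not the criteria from the PC AI article.

```python
# Illustrative rule-base verification (assumed rule format: a rule is a pair of
# a frozenset of IF conditions and a conclusion). These three checks are common
# examples, not the 30+ criteria cited above.

def verify_rules(rules):
    """Return a list of detected defects among the given IF-THEN rules."""
    issues = []
    for i, (cond_i, concl_i) in enumerate(rules):
        for j in range(i + 1, len(rules)):
            cond_j, concl_j = rules[j]
            if cond_i == cond_j and concl_i == concl_j:
                issues.append(f"Rules {i} and {j} are redundant (identical)")
            elif cond_i == cond_j:
                issues.append(f"Rules {i} and {j} are contradictory")
            elif cond_i < cond_j and concl_i == concl_j:
                issues.append(f"Rule {j} is subsumed by the more general rule {i}")
    return issues

rules = [
    (frozenset({"fever", "cough"}), "flu"),
    (frozenset({"fever", "cough"}), "cold"),            # contradicts rule 0
    (frozenset({"fever", "cough", "fatigue"}), "flu"),  # subsumed by rule 0
]
for issue in verify_rules(rules):
    print(issue)
```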

In performing these quality-control tasks, we deal with several activities and concepts, as listed in Table 11.8. The process can be very difficult if one considers the many sociotechnical issues involved (Sharma and Conrath, 1992).

A method for validating ES, based on validation approaches from psychology, was developed by Sturman and Milkovich (1995). The approach tests the extent to which the system and the expert decisions agree, the inputs and processes used by an expert compared to the machine, and the difference between expert and novice decisions. Validation and verification techniques on specific ES are described by Ram and Ram (1996) for innovative management. Avritzer et al. (1996) provide an algorithm for reliability testing of expert systems designed to operate in industrial settings, particularly to monitor and control large real-time systems.

Automated verification of knowledge is offered in the ACQUIRE product described earlier. Verification is conducted by measuring the system's performance and is limited to classification cases with probabilities. It works as follows: When an ES is presented with a new case to classify, it assigns a confidence factor to each selection. By comparing these confidence factors with those provided by an expert, one can measure the accuracy of the ES for each case. By performing comparisons on many cases, one can derive an overall measure of ES performance (O'Keefe and O'Leary, 1993).
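A minimal sketch of this comparison idea follows; the data, the per-case agreement measure, and the averaging are illustrative assumptions, not ACQUIRE's actual metric.

```python
# Sketch of verification by comparing ES confidence factors with an expert's
# (hypothetical data and agreement measure; not ACQUIRE's published method).

def case_accuracy(es_cf, expert_cf):
    """Agreement between ES and expert confidence factors (0-1) for one case."""
    classes = es_cf.keys() | expert_cf.keys()
    total_gap = sum(abs(es_cf.get(c, 0.0) - expert_cf.get(c, 0.0)) for c in classes)
    return 1.0 - total_gap / len(classes)

def overall_performance(cases):
    """Average per-case agreement over many cases: an overall ES performance measure."""
    return sum(case_accuracy(es, expert) for es, expert in cases) / len(cases)

cases = [  # (ES confidence factors, expert confidence factors) per case
    ({"flu": 0.8, "cold": 0.2}, {"flu": 0.7, "cold": 0.3}),
    ({"flu": 0.4, "cold": 0.6}, {"flu": 0.1, "cold": 0.9}),
]
print(f"Overall ES performance: {overall_performance(cases):.2f}")  # 0.80
```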

TABLE 11.8 Measures of Validation

Accuracy: How well the system reflects reality; how correct the knowledge in the knowledge base is
Adaptability: Possibilities for future development and changes
Adequacy (or completeness): Portion of the necessary knowledge included in the knowledge base
Appeal: How well the knowledge base matches intuition and stimulates thought and practicability
Breadth: How well the domain is covered
Depth: Degree of detailed knowledge
Face validity: Credibility of knowledge
Generality: Capability of a knowledge base to be used with a broad range of similar problems
Precision: Capability of the system to replicate particular system parameters; consistency of advice; coverage of variables in the knowledge base
Realism: Accounting for relevant variables and relations; similarity to reality
Reliability: Fraction of the ES predictions that are empirically correct
Robustness: Sensitivity of conclusions to model structure
Sensitivity: Impact of changes in the knowledge base on the quality of outputs
Technical and operational validity: Quality of the assumptions, context, constraints, and conditions, and their impact on other measures
Turing test: Ability of a human evaluator to identify whether a given conclusion is made by an ES or by a human expert
Usefulness: How adequate the knowledge is (in terms of parameters and relationships) for solving problems correctly
Validity: Knowledge base's capability of producing empirically correct results
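Some of these measures can be computed directly once ES conclusions are compared with observed outcomes or with an inventory of required knowledge. Below is a minimal sketch for two of them, reliability and adequacy, using hypothetical data.

```python
# Computing two of the measures in Table 11.8 (hypothetical example data).

def reliability(predictions, outcomes):
    """Reliability: fraction of ES predictions that are empirically correct."""
    return sum(p == o for p, o in zip(predictions, outcomes)) / len(predictions)

def adequacy(kb_topics, required_topics):
    """Adequacy (completeness): portion of the necessary knowledge in the knowledge base."""
    return len(kb_topics & required_topics) / len(required_topics)

print(reliability(["flu", "cold", "flu", "flu"], ["flu", "cold", "cold", "flu"]))  # 0.75
print(round(adequacy({"diagnosis", "dosage"}, {"diagnosis", "dosage", "interactions"}), 2))  # 0.67
```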


Rosenwald and Liu (1997) have developed a validation procedure that uses the rule base's knowledge and structure to generate test cases that efficiently cover the entire input space of the rule base. Thus, the entire set of cases need not be examined. A symbolic execution of a model of the ES is used to determine all conditions under which the fundamental knowledge can be used. For an extensive bibliography on validation and verification, see Grogono et al. (1991) and Juan et al. (1999). An easy approach to automating verification of a large rule base can be found in Goldstein (2002).
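The flavor of structure-driven test generation can be sketched as follows: instead of enumerating the full input space, derive one test case per rule from that rule's own conditions. This is a deliberate simplification for illustration, not Rosenwald and Liu's actual symbolic-execution procedure.

```python
# Simplified structure-driven test-case generation (an illustration, not
# Rosenwald and Liu's procedure): derive test cases from the rules' own
# conditions so every rule fires at least once, rather than enumerating
# every combination of input values.

def generate_test_cases(rules, variables):
    """Yield (input assignment, expected conclusion), one per rule."""
    for conditions, conclusion in rules:
        case = {v: False for v in variables}        # default all inputs to False
        case.update({v: True for v in conditions})  # enable the rule's conditions
        yield case, conclusion

rules = [
    (frozenset({"fever", "cough"}), "flu"),
    (frozenset({"sneezing"}), "cold"),
]
variables = {"fever", "cough", "sneezing"}
for case, expected in generate_test_cases(rules, variables):
    print(case, "->", expected)
```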