11.7 KNOWLEDGE VERIFICATION AND VALIDATION
Knowledge acquired from experts needs to be evaluated for quality, including evaluation, validation, and verification. These terms are often used interchangeably. We use the definitions provided by O'Keefe et al. (1987).
• Evaluation is a broad concept. Its objective is to assess an expert system's overall value. In addition to assessing acceptable performance levels, it analyzes whether the system would be usable, efficient, and cost-effective.
• Validation is the part of evaluation that deals with the performance of the system (e.g., as it compares to the expert's). Simply stated, validation is building the right system, that is, substantiating that a system performs with an acceptable level of accuracy.
• Verification is building the system right, that is, substantiating that the system is correctly implemented to its specifications.
In the realm of expert systems, these activities are dynamic because they must be
repeated each time the prototype is changed. In terms of the knowledge base, it is necessary to ensure that the right knowledge is included (validation) and that the knowledge base is correctly constructed (verification). For each IF statement, more than 30 criteria can be used in verification (see PC AI, March/April 2002, p. 59).
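To make the idea of rule-base verification concrete, the sketch below applies three common checks to a toy rule base: redundant rules (same conditions, same conclusion), conflicting rules (same conditions, different conclusions), and subsumed rules (extra conditions but the same conclusion). The rule format and the checks are illustrative only; they are not the 30-plus criteria referenced above.

```python
# Minimal rule-base verification sketch (hypothetical rule format):
# each rule is a (conditions, conclusion) pair, with conditions a frozenset.

def verify_rules(rules):
    """Return a list of (problem_type, rule_i, rule_j) findings."""
    problems = []
    for i in range(len(rules)):
        for j in range(i + 1, len(rules)):
            (if_i, then_i), (if_j, then_j) = rules[i], rules[j]
            if if_i == if_j and then_i == then_j:
                problems.append(("redundant", i, j))
            elif if_i == if_j and then_i != then_j:
                problems.append(("conflict", i, j))
            elif if_i < if_j and then_i == then_j:
                # rule j adds conditions but reaches the same conclusion
                problems.append(("subsumed", j, i))
    return problems

rules = [
    (frozenset({"fever", "cough"}), "flu"),
    (frozenset({"fever", "cough"}), "flu"),    # redundant with rule 0
    (frozenset({"fever", "cough"}), "cold"),   # conflicts with rules 0 and 1
]
print(verify_rules(rules))
# [('redundant', 0, 1), ('conflict', 0, 2), ('conflict', 1, 2)]
```

Real verifiers also check for circular rule chains, unreachable conclusions, and unused inputs, but the pairwise comparisons above capture the basic mechanism.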
In performing these quality-control tasks, we deal with several activities and concepts, as listed in Table 11.8. The process can be very difficult if one considers the many sociotechnical issues involved (Sharma and Conrath, 1992).
A method for validating ES, based on validation approaches from psychology, was developed by Sturman and Milkovich (1995). The approach tests the extent to which the system's and the expert's decisions agree, the inputs and processes used by an expert compared to the machine, and the difference between expert and novice decisions. Validation and verification techniques on specific ES are described by Ram and Ram (1996) for innovative management. Avritzer et al. (1996) provide an algorithm for reliability testing of expert systems designed to operate in industrial settings, particularly to monitor and control large real-time systems.
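The first of these tests, system-expert agreement, can be quantified very simply. The sketch below computes the fraction of test cases on which the ES and the expert reach the same decision; the case data and decision labels are invented for illustration and are not from Sturman and Milkovich (1995).

```python
# Illustrative system-expert agreement measure over a set of test cases.

def agreement_rate(es_decisions, expert_decisions):
    """Fraction of cases where the ES and the expert reach the same decision."""
    assert len(es_decisions) == len(expert_decisions)
    matches = sum(e == x for e, x in zip(es_decisions, expert_decisions))
    return matches / len(es_decisions)

es     = ["approve", "reject", "approve", "approve", "reject"]
expert = ["approve", "reject", "reject",  "approve", "reject"]
print(agreement_rate(es, expert))  # 4 of 5 cases agree -> 0.8
```

In practice one would also correct for chance agreement (e.g., with Cohen's kappa) and compare the ES against several experts, since experts themselves often disagree.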
Automated verification of knowledge is offered in the ACQUIRE product described earlier. Verification is conducted by measuring the system's performance and is limited to classification cases with probabilities. It works as follows: When an ES is presented with a new case to classify, it assigns a confidence factor to each selection. By comparing these confidence factors with those provided by an expert, one can measure the accuracy of the ES for each case. By performing comparisons on many cases, one can derive an overall measure of ES performance (O'Keefe and O'Leary, 1993).
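The confidence-factor comparison described above can be sketched as follows. The text does not specify a scoring formula, so the mean-absolute-difference score below is an assumption chosen for simplicity, and the case data are invented; it is not ACQUIRE's actual measure.

```python
# Sketch of per-case and overall performance scoring by comparing the ES's
# confidence factors with an expert's (scoring formula is an assumption).

def case_error(es_cf, expert_cf):
    """Mean absolute difference between ES and expert confidence factors
    (both dicts mapping classification -> confidence in [0, 1])."""
    classes = set(es_cf) | set(expert_cf)
    return sum(abs(es_cf.get(c, 0.0) - expert_cf.get(c, 0.0)) for c in classes) / len(classes)

def overall_performance(cases):
    """1.0 = perfect agreement; cases is a list of (es_cf, expert_cf) pairs."""
    return 1.0 - sum(case_error(es, ex) for es, ex in cases) / len(cases)

cases = [
    ({"flu": 0.8, "cold": 0.2}, {"flu": 0.9, "cold": 0.1}),  # small disagreement
    ({"flu": 0.5, "cold": 0.5}, {"flu": 0.5, "cold": 0.5}),  # exact agreement
]
print(round(overall_performance(cases), 3))  # -> 0.95
```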
TABLE 11.8 Measures of Validation

Measure or Criterion                Description
Accuracy                            How well the system reflects reality; how correct the knowledge in the knowledge base is
Adaptability                        Possibilities for future development and changes
Adequacy (or completeness)          Portion of the necessary knowledge included in the knowledge base
Appeal                              How well the knowledge base matches intuition and stimulates thought and practicability
Breadth                             How well the domain is covered
Depth                               Degree of detailed knowledge
Face validity                       Credibility of knowledge
Generality                          Capability of a knowledge base to be used with a broad range of similar problems
Precision                           Capability of the system to replicate particular system parameters; consistency of advice; coverage of variables in the knowledge base
Realism                             Accounting for relevant variables and relations; similarity to reality
Reliability                         Fraction of the ES predictions that are empirically correct
Robustness                          Sensitivity of conclusions to model structure
Sensitivity                         Impact of changes in the knowledge base on quality of outputs
Technical and operational validity  Quality of the assumptions, context, constraints, and conditions, and their impact on other measures
Turing test                         Ability of a human evaluator to identify whether a given conclusion is made by an ES or by a human expert
Usefulness                          How adequate the knowledge is (in terms of parameters and relationships) for solving problems correctly
Validity                            Knowledge base's capability of producing empirically correct results
Rosenwald and Liu (1997) have developed a validation procedure that uses the rule base's knowledge and structure to generate test cases that efficiently cover the entire input space of the rule base. Thus, the entire set of cases need not be examined. A symbolic execution of a model of the ES is used to determine all conditions under which the fundamental knowledge can be used. For an extensive bibliography on validation and verification, see Grogono et al. (1991) and Juan et al. (1999). An easy approach to automating verification of a large rule base can be found in Goldstein (2002).
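The idea of deriving test cases from the rule base's own structure, rather than enumerating the full input space, can be illustrated with a toy example. The sketch below is not Rosenwald and Liu's algorithm: it simply builds one test case per rule (that rule's conditions true, everything else false), which already exercises every rule while the full input space grows exponentially.

```python
# Toy illustration of structure-driven test-case generation for a rule base
# (hypothetical rules; not the Rosenwald and Liu 1997 procedure).

from itertools import product

rules = [
    (frozenset({"high_load", "high_temp"}), "shed_load"),
    (frozenset({"high_temp"}), "raise_alarm"),
    (frozenset({"low_voltage"}), "switch_source"),
]

# All boolean attributes mentioned anywhere in the rule base.
attributes = sorted(set().union(*(conds for conds, _ in rules)))

# The full input space grows as 2 ** len(attributes) ...
full_space = list(product([False, True], repeat=len(attributes)))

# ... but one targeted test case per rule already fires every rule once.
test_cases = [{a: (a in conds) for a in attributes} for conds, _ in rules]

print(len(full_space), len(test_cases))  # 8 vs. 3
```

With three attributes the saving is modest, but a realistic rule base with dozens of attributes makes exhaustive testing infeasible, which is precisely the problem structure-driven generation addresses.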