6.2.3. Test of clinical software
Purpose of test

The purpose of this test is to verify that the results from the implementation of the clinical software are as expected and are reproducible. The guidelines in Section 6.2.4 provide more details regarding specific situations that require a test of the clinical software.
Materials

The materials required are a set of clinical patient studies suitable for use with the particular clinical software: for example, a set of 10 studies (or more) covering the range of normal and abnormal results, which could be supplied by the software provider or originate from an international database or from one's own department.
Procedure

(1) Ensure that the 10 test studies are available for data processing in one session and that the persons who will perform the test are familiar with the processing procedure and the documentation accompanying the software.
(2) Have each person who will use the software process the studies, preferably in one session.
(3) Document the results, including a hard copy of the final display of results, the date and the name of the person who processed the data.
(4) Repeat steps (2) and (3) twice more, on separate occasions, so that each person has processed the test studies three times.
(5) Produce Bland–Altman [22] plots of the quantitative results for each pair of results, in order to evaluate the intra- and inter-observer reproducibility (a sketch follows this procedure). A Bland–Altman plot is made by plotting the difference between each pair of results on the Y axis against the mean of the pair on the X axis. It is a sensitive method for determining how well one test agrees with a second, or a new, test.
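Step (5) can be carried out with any general purpose statistics or plotting tool. The following is a minimal Python sketch, assuming the quantitative result of interest (for example, one value per test study) from two processing passes is held in two paired arrays; the function name and the example values are illustrative only and not part of the test specification.

```python
# Minimal sketch of a Bland-Altman plot for one pair of processing passes.
# Assumes the quantitative results (one value per test study) are available
# as two equal-length arrays; names and values are illustrative.
import numpy as np
import matplotlib.pyplot as plt

def bland_altman_plot(results_a, results_b, label_a="pass 1", label_b="pass 2"):
    a = np.asarray(results_a, dtype=float)
    b = np.asarray(results_b, dtype=float)
    mean = (a + b) / 2.0           # X axis: mean of each pair of results
    diff = a - b                   # Y axis: difference of each pair of results
    bias = diff.mean()             # systematic offset between the two passes
    loa = 1.96 * diff.std(ddof=1)  # 95% limits of agreement

    plt.scatter(mean, diff)
    plt.axhline(bias, linestyle="-", label=f"bias = {bias:.2f}")
    plt.axhline(bias + loa, linestyle="--", label="bias +/- 1.96 SD")
    plt.axhline(bias - loa, linestyle="--")
    plt.xlabel(f"Mean of {label_a} and {label_b}")
    plt.ylabel(f"Difference ({label_a} - {label_b})")
    plt.legend()
    plt.show()

# Example: the same 10 test studies processed twice by the same person
# (intra-observer) or once each by two persons (inter-observer).
first_pass  = [62.1, 58.4, 71.0, 45.2, 66.8, 53.9, 60.5, 48.7, 69.3, 55.0]
second_pass = [61.8, 59.0, 70.4, 45.9, 66.1, 54.3, 60.9, 48.2, 69.8, 54.6]
bland_altman_plot(first_pass, second_pass)
```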
Interpretation of results

(1) Examine the documentation, display and hard copy of the study and check that they conform to what is expected (e.g. study identification, numerical data, labelling of images, absolute or relative colour scaling, colour scale used, ROIs).
(2) Examine the Bland–Altman plots of the quantitative results and check for acceptable variability over the whole range of abnormal and normal values. Ensure that there is neither a difference in variability at the extreme ends of the range of results nor a systematic offset (a numerical check is sketched below).
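The visual inspection in item (2) can be backed up numerically. The sketch below assumes the paired results are available as two arrays: the systematic offset appears as the mean difference (bias), and a non-zero trend of the differences against the means suggests that the variability or offset changes across the range of values. The threshold value is an illustrative assumption, not a prescribed limit.

```python
# Minimal sketch of numerical checks for systematic offset and for a trend
# of the differences across the value range; thresholds are illustrative.
import numpy as np

def check_agreement(results_a, results_b, max_abs_bias=1.0):
    a = np.asarray(results_a, dtype=float)
    b = np.asarray(results_b, dtype=float)
    diff = a - b
    mean = (a + b) / 2.0

    bias = diff.mean()                    # systematic offset
    slope, _ = np.polyfit(mean, diff, 1)  # trend of differences vs. means

    print(f"systematic offset (bias): {bias:.2f}")
    print(f"slope of differences vs. means: {slope:.3f}")
    return abs(bias) <= max_abs_bias
```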
Limits of acceptability

(1) For fully automatic clinical software, the results from the same clinical studies should be identical, regardless of who processed the data and regardless of the computer system on which the processing took place.
(2) Intra-observer variability, i.e. the same person processing the same set of clinical studies on different occasions, should produce <3% variability in results.
(3) Inter-observer variability, i.e. different persons processing the same set of clinical studies, should produce <5% variability in results (a sketch of these checks follows this list).
(4) If the limits are exceeded, the reasons should be investigated, follow-up action taken and the test repeated.
(5) If the results and display do not conform to what is indicated by the software supplier, this must be reported to the supplier for follow-up action. The software should not be used clinically before the problem has been resolved.
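The variability limits in items (2) and (3) can be checked, for example, by expressing the spread of repeated results for each study as a coefficient of variation. The following Python sketch assumes one row per test study and one column per processing occasion (intra-observer) or per observer (inter-observer); the data layout, function name and example values are illustrative assumptions.

```python
# Minimal sketch of the intra- and inter-observer variability checks.
# Each study's spread of repeated results is expressed as a coefficient of
# variation (SD / mean, in per cent) and compared against the stated limits.
import numpy as np

def percent_variability(repeats):
    """Coefficient of variation (%) per study; rows = studies, columns = repeats."""
    r = np.asarray(repeats, dtype=float)
    return 100.0 * r.std(axis=1, ddof=1) / r.mean(axis=1)

# Intra-observer: the same person, three processing occasions per study.
intra = [[62.1, 61.8, 62.4],
         [45.2, 45.9, 45.5]]
# Inter-observer: three different persons, one result each per study.
inter = [[62.1, 60.9, 63.0],
         [45.2, 46.8, 44.9]]

print("intra-observer within 3%:", bool(np.all(percent_variability(intra) < 3.0)))
print("inter-observer within 5%:", bool(np.all(percent_variability(inter) < 5.0)))
```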
Conclusion

(1) This test should be performed by each person who will use the software for the first time, before that person is permitted to use the software routinely.
(2) This test should also be performed, before use, by each person who uses the software infrequently.
(3) Record the date and document any follow-up action taken.