8.4 MODEL VALIDATION

Validation activities are critical to the construction of credible models. The standard approach to model validation is to collect data (parameter values, performance metrics, etc.) from the system under study, and compare them to their model counterparts (see Law and Kelton [2000] and Banks et al. [2004]).

As previously discussed in Section 7.2, the data collection effort of input analysis can provide the requisite data from the system under study. Data collected are classified into input values and corresponding output values. For instance, consider a machine that processes jobs arriving at a given rate, and suppose that we are interested in delay times experienced by jobs in the buffer. Raw input data might then consist of job interarrival times, processing times, and buffer delays, which may be obtained by observing the system over, say, a number of days. In a similar vein, a model is constructed and run to produce simulation data (that serve as input data sets) from which the corresponding output data (performance metrics) are then estimated.
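The aggregation step described above, from raw per-job observations to per-day output metrics, can be sketched in Python. The day labels and delay values below are hypothetical numbers chosen purely for illustration:

```python
import statistics

# Hypothetical raw data: buffer delays (in minutes) observed for each job
# processed on a given day of the data collection period.
raw_delays = {
    "day 1": [2.1, 0.0, 5.3, 1.8, 3.2],
    "day 2": [0.4, 4.9, 2.7, 1.1],
    "day 3": [3.6, 2.2, 0.0, 6.1, 1.5, 2.9],
}

# D[i] is the observed average delay for day i+1 -- one output metric per day.
D = [statistics.mean(delays) for delays in raw_delays.values()]
print(D)
```

The same aggregation would be applied to the simulation output to obtain the estimated mean delays, one per simulated day.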

Table 8.1 displays schematically the structure of the correspondence between a generic input data set and the corresponding output metrics over a period of N time units (e.g., days). Each set of output metric values constitutes a sample of random values. In our case, the output metrics consist of estimated mean delay times; that is, we collect two sets (samples) of output data:

1. A sample $\{D_1, \ldots, D_N\}$ of observed average delays collected from the real-life system under study over days $i = 1, \ldots, N$

2. A sample $\{\hat{D}_1, \ldots, \hat{D}_N\}$ of estimated mean delays collected from runs of the simulation model over days $i = 1, \ldots, N$

Thus, for the real-life system under study, $O_i = D_i$, $i = 1, \ldots, N$, while for the simulation runs, $O_i = \hat{D}_i$, $i = 1, \ldots, N$. More precisely, model validation seeks to determine the goodness-of-fit of the model to the real-life system under study (in our case, we statistically compare the observed average delays with the estimated mean delays). Recasting this check in terms of hypothesis testing (see Section 3.11), we wish to determine whether the two delay metrics are statistically similar (null hypothesis) or statistically different (alternative hypothesis). To this end, define a sample consisting of the differences

$$G_i = \hat{D}_i - D_i, \qquad i = 1, \ldots, N,$$

which are approximately normally distributed with mean $\mu_G$ and variance $\sigma_G^2$. The hypothesis testing assumes the form

Table 8.1 Generic input and output data

Day    Input Data Set    Output Metric Set
1      $I_1$             $O_1$
...    ...               ...
N      $I_N$             $O_N$

$$H_0: \mu_G = 0$$
$$H_1: \mu_G \neq 0$$

Then, under the null hypothesis, the statistic

$$t_{N-1} = \frac{\bar{G}}{S_G / \sqrt{N}}$$

is distributed according to a t distribution with $N - 1$ degrees of freedom, where $\bar{G}$ and $S_G$ are the sample mean and sample standard deviation of the sample $\{G_1, \ldots, G_N\}$, respectively. For a prescribed significance level, $\alpha$, the corresponding confidence interval satisfying

$$\Pr(t_1 \le t_{N-1} \le t_2) = 1 - \alpha$$

is equivalent to

$$\Pr\left(\bar{G} - t_{1-\alpha/2,\,N-1}\, S_G/\sqrt{N} \;\le\; \mu_G \;\le\; \bar{G} + t_{1-\alpha/2,\,N-1}\, S_G/\sqrt{N}\right) = 1 - \alpha,$$

where $t_1 = t_{\alpha/2,\,N-1}$ and $t_2 = t_{1-\alpha/2,\,N-1}$. If the above confidence interval contains 0, then $H_0$ cannot be rejected at significance level $\alpha$, indicating that the test supports model validity. If, however, the confidence interval does not contain 0, then the test suggests that the model is not valid.
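The entire validation test can be sketched in Python. All numbers below are hypothetical: the two delay samples, the sample size $N = 10$, and the significance level $\alpha = 0.05$ are illustrative choices, and the critical value $t_{0.975,\,9} = 2.262$ is hard-coded to keep the sketch free of external dependencies:

```python
import math
import statistics

# Hypothetical daily average delays (minutes) -- illustrative numbers only.
D_obs = [4.2, 5.1, 3.8, 4.9, 5.4, 4.1, 4.7, 5.0, 4.4, 4.6]  # real-life system
D_sim = [4.0, 5.3, 4.1, 4.6, 5.6, 4.3, 4.5, 5.2, 4.2, 4.8]  # simulation runs

N = len(D_obs)
G = [d_hat - d for d_hat, d in zip(D_sim, D_obs)]  # differences G_i

G_bar = statistics.mean(G)             # sample mean of the differences
S_G = statistics.stdev(G)              # sample standard deviation
t_stat = G_bar / (S_G / math.sqrt(N))  # t statistic with N - 1 df

# Critical value t_{1-alpha/2, N-1} for alpha = 0.05 and N - 1 = 9 df.
t_crit = 2.262
half_width = t_crit * S_G / math.sqrt(N)
ci = (G_bar - half_width, G_bar + half_width)

contains_zero = ci[0] <= 0.0 <= ci[1]
print(f"t = {t_stat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print("H0 not rejected: test supports model validity" if contains_zero
      else "H0 rejected: model appears invalid")
```

With these sample values the interval contains 0, so the (hypothetical) model would not be rejected. In practice, a library routine for the paired t-test could replace the hand computation, but the manual version mirrors the formulas above term by term.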