POINT ESTIMATION

9.3 POINT ESTIMATION

Consider a generic steady-state replication r, during which the simulation program collects some statistics in order to estimate their steady-state values. Since the replication is fixed in this section, we will simplify the notation by suppressing the replication index.

9.3.1 POINT ESTIMATION FROM REPLICATIONS

Suppose the replication collects a sequence of n variates, $\{X_1, \ldots, X_n\}$, yielding a corresponding sample of observations, $\{x_1, \ldots, x_n\}$. The estimator for the mean value parameter is the sample mean

$$\bar{X} = \frac{1}{n} \sum_{j=1}^{n} X_j \tag{9.2}$$

The sample mean is classified as a point estimator, because it estimates a scalar. In a similar vein, when the realizations $\{x_1, \ldots, x_n\}$ are substituted into Eq. 9.2, the resulting sample mean is similarly referred to as a point estimate. For example, the sample values might represent the buffer delay experienced by successive job arrivals. Recall that in this context, the average is classified as a customer average statistic, because each index j in Eq. 9.2 corresponds to customer j, and the averaging is carried out over a sequence of customer-oriented variates.
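As a minimal sketch of Eq. 9.2, the following Python fragment computes a customer average from a list of observed delays. The function name `customer_average` and the delay values are hypothetical and serve only as an illustration.

```python
from statistics import fmean


def customer_average(observations):
    """Sample mean of per-customer observations (the form of Eq. 9.2)."""
    if not observations:
        raise ValueError("need at least one observation")
    return fmean(observations)


# Hypothetical buffer delays (in hours) of five successive jobs.
delays = [0.0, 0.5, 1.25, 0.75, 0.0]
estimate = customer_average(delays)
print(estimate)
```

Each entry of `delays` plays the role of one realization x_j, so the returned value is a point estimate of the mean buffer delay.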

Suppose we are interested in a continuous-time stochastic process $\{X_t : 0 \le t \le T\}$ over some time interval [0, T], yielding a corresponding sample of observations, $\{x_t : 0 \le t \le T\}$. The estimator

$$\bar{X} = \frac{1}{T} \int_0^T X_t \, dt \tag{9.3}$$

is referred to as a time average statistic, because the variates involved are indexed by time. In fact, time averages of the form in Eq. 9.3 constitute the continuous analog of Eq. 9.2. Again, when the realizations $\{x_t : 0 \le t \le T\}$ are substituted into Eq. 9.3, the resulting time average is similarly referred to as a point estimate.
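For a simulated sample path that is piecewise constant (it changes only at event times), the integral in Eq. 9.3 reduces to a sum of rectangle areas. The sketch below assumes such a path; the function name `time_average` and the sample path are hypothetical.

```python
def time_average(change_times, values, horizon):
    """Time average over [0, horizon] of a piecewise-constant path.

    values[i] is the level held from change_times[i] until the next
    change time (or until the horizon for the last entry).
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    area = 0.0
    for i, (start, value) in enumerate(zip(change_times, values)):
        end = change_times[i + 1] if i + 1 < len(change_times) else horizon
        end = min(end, horizon)  # ignore anything beyond the horizon
        if end > start:
            area += value * (end - start)
    return area / horizon


# Hypothetical path: the level is 0 on [0,2), 1 on [2,5), 2 on [5,6), 1 on [6,10].
avg_level = time_average([0.0, 2.0, 5.0, 6.0], [0, 1, 2, 1], horizon=10.0)
print(avg_level)
```

Here the accumulated area is 0·2 + 1·3 + 2·1 + 1·4 = 9, so the time average over T = 10 is 0.9.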

172 Output Analysis

A common example of time-continuous variates in the queueing context is the total number of jobs, $N_t$, in a queueing system (buffer and server) at time t. Another important continuous-time stochastic process is the server utilization process $\{U_t : 0 \le t \le T\}$. The utilization statistic is the time average

$$\bar{U} = \frac{1}{T} \int_0^T U_t \, dt,$$

which is an estimator of the probability that the server is busy. Clearly, $U_t = 1$ when the server is busy, while $U_t = 0$ when the server is idle. It follows that the integral is just the length of time (in [0, T]) during which the server is busy, and the utilization is the fraction of time the server is busy. More generally, the probability of any event expressed in terms of a time-continuous stochastic process is estimated by the time average of the corresponding event indicator variates. For more examples of customer averages and time averages, see Section 2.3.1.
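The indicator idea can be sketched in a few lines of Python. The example assumes a single-server system, so that the server is busy exactly when at least one job is in the system; the function name `indicator_time_average` and the sample path are hypothetical.

```python
def indicator_time_average(change_times, values, horizon, event):
    """Fraction of [0, horizon] during which event(value) holds.

    The path is piecewise constant: values[i] is held from
    change_times[i] until the next change time (or the horizon).
    """
    time_in_event = 0.0
    for i, (start, value) in enumerate(zip(change_times, values)):
        end = change_times[i + 1] if i + 1 < len(change_times) else horizon
        end = min(end, horizon)
        if end > start and event(value):
            time_in_event += end - start
    return time_in_event / horizon


# Hypothetical path of the number of jobs in the system over [0, 10].
change_times = [0.0, 2.0, 5.0, 6.0]
jobs_in_system = [0, 1, 2, 1]
T = 10.0

# Assumption: single server, so "busy" is the event {N_t > 0}.
utilization = indicator_time_average(change_times, jobs_in_system, T, lambda n: n > 0)
# Probability of another event, here {N_t >= 2}.
p_two_or_more = indicator_time_average(change_times, jobs_in_system, T, lambda n: n >= 2)
print(utilization, p_two_or_more)
```

In this path the server is busy during [2, 10], giving a utilization estimate of 0.8, while two or more jobs are present only during [5, 6), giving 0.1.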

9.3.2 POINT ESTIMATION IN ARENA

The Statistic module allows the user to obtain estimates for Tally statistics (system-defined and user-defined customer averages), and Time Persistent statistics (system-defined and user-defined time averages), although Arena automatically collects statistics for queue lengths, delays, and utilizations. Both Tally and Time Persistent statistics permit user access to a number of Arena variables, such as TAVG, DAVG, and so on. For example, let Some_Tally_Stat and Some_Time_Persistent_Stat be the names of statistics declared in the Statistic module. The user may request computation of the dynamic values of the running averages TAVG(Some_Tally_Stat) and DAVG(Some_Time_Persistent_Stat) of the corresponding variables at any time during a simulation run. Observe that Time Persistent statistics allow the user to compute probabilities as expectations of indicator functions of events. For example, the probability that the queue Some_Queue has more than four waiting jobs can be estimated by setting the Type field of the Statistic module to Time-Persistent, and entering the predicate (logical condition) NQ(Some_Queue) > 4 in the corresponding Expression field. For a full listing of Arena variables, refer to the Arena Variables Guide.

As another example of point estimation in Arena, consider a workstation subject to failure, where jobs arrive with exponential interarrival times of mean 1 hour, and have a fixed processing time of 0.75 hours. Specifically, the workstation goes through up/down cycles as follows: it fails randomly (while busy) with an exponentially distributed time-to-failure of mean 20 hours, and is then repaired with uniformly distributed repair times between 1 and 5 hours. Clearly, the system can be expected to exhibit higher waiting times than the corresponding system without failures, since the introduction of failures increases the probability that the machine is not available to arriving jobs. Based on analytical calculations, we expect the throughput to be one job per hour (same as the arrival rate), the downtime probability to be 0.1125, and the average job delay in the buffer to be 4.11 hours per job (see Altiok [1997], Chapter 3). The results of five replications of the workstation are displayed in Table 9.1.

Table 9.1 Results of five replications of a workstation subject to failures (columns: Replication Number, Throughput, Average Job Delay, Probability of Down State)
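The analytical values quoted above can be reproduced with a short calculation. The text attributes them to Altiok [1997] without showing the derivation, so the route below is an assumption: the workstation is treated as an M/G/1 queue whose service time is the job completion time (processing time plus any repairs that interrupt it), failures occur only during processing, interrupted work resumes where it stopped, and the mean delay follows from the Pollaczek-Khinchine formula. All variable names are illustrative.

```python
# Model parameters taken from the text.
arrival_rate = 1.0            # jobs per hour
process_time = 0.75           # hours, fixed
failure_rate = 1.0 / 20.0     # failures per busy hour
repair_lo, repair_hi = 1.0, 5.0

# Moments of the uniform repair time.
mean_repair = (repair_lo + repair_hi) / 2.0
second_moment_repair = (repair_hi - repair_lo) ** 2 / 12.0 + mean_repair ** 2

# Assumption: the number of failures during one job is Poisson with mean
# failure_rate * process_time, so the completion time is the processing
# time plus a compound-Poisson sum of repair times.
mean_completion = process_time * (1.0 + failure_rate * mean_repair)
var_completion = failure_rate * process_time * second_moment_repair
second_moment_completion = var_completion + mean_completion ** 2

# The station keeps up with arrivals as long as rho < 1.
rho = arrival_rate * mean_completion
throughput = arrival_rate

# Fraction of time in repair: processing fraction * failure rate * mean repair.
p_down = arrival_rate * process_time * failure_rate * mean_repair

# Pollaczek-Khinchine mean delay in the buffer.
avg_delay = arrival_rate * second_moment_completion / (2.0 * (1.0 - rho))

print(throughput, p_down, round(avg_delay, 2))
```

Under these assumptions the calculation gives a downtime probability of 0.1125 and a mean buffer delay of about 4.11 hours, which agrees with the values quoted in the text.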

Note that the estimates vary across replications (the underlying estimator is a random variable!). Any of the values can be used to estimate the true (unknown) parameter, but how confident can the modeler be about their accuracy? Intuitively, the accuracy should improve by forming a pooled estimate of the sample mean, that is, by averaging over all five replication estimates. However, we still lack quantitative information on the confidence to be ascribed to any estimate. The next section addresses the confidence issue via interval estimation.
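The variability across replications and the pooled estimate can be illustrated with a small stand-alone simulation of the workstation. This is a Python sketch rather than an Arena model, and it rests on stated assumptions: failures occur only during processing, interrupted work resumes after repair, each replication starts empty with no warm-up deletion, and the run length of 100,000 jobs and the seeds 1 to 5 are arbitrary choices.

```python
import random
from statistics import fmean

ARRIVAL_MEAN = 1.0          # hours between arrivals (exponential)
PROCESS_TIME = 0.75         # hours of processing per job (fixed)
TTF_MEAN = 20.0             # mean busy time to failure (exponential)
REPAIR_RANGE = (1.0, 5.0)   # uniform repair time bounds, in hours


def run_replication(num_jobs, seed):
    """Simulate one replication and return its three point estimates."""
    rng = random.Random(seed)
    arrival = 0.0       # arrival time of the current job
    free_at = 0.0       # time at which the machine finishes the previous job
    total_delay = 0.0   # accumulated buffer delay
    total_down = 0.0    # accumulated repair time
    for _ in range(num_jobs):
        arrival += rng.expovariate(1.0 / ARRIVAL_MEAN)
        start = max(arrival, free_at)
        total_delay += start - arrival
        # Failures that interrupt this job's processing.
        down = 0.0
        worked = rng.expovariate(1.0 / TTF_MEAN)
        while worked < PROCESS_TIME:
            down += rng.uniform(*REPAIR_RANGE)
            worked += rng.expovariate(1.0 / TTF_MEAN)
        total_down += down
        free_at = start + PROCESS_TIME + down
    horizon = free_at   # the replication ends when the last job departs
    return {
        "throughput": num_jobs / horizon,
        "avg_delay": total_delay / num_jobs,
        "p_down": total_down / horizon,
    }


replications = [run_replication(100_000, seed) for seed in range(1, 6)]
pooled = {key: fmean(rep[key] for rep in replications) for key in replications[0]}

for number, rep in enumerate(replications, start=1):
    print(number, rep)
print("pooled", pooled)
```

Each replication yields a different estimate because each uses a different random number stream, and the pooled value is simply the average of the five replication estimates. The spread among the five values gives an informal sense of the estimator's variability.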