7.3 MODELING TIME SERIES DATA

The focal point of input analysis is the data modeling stage. In this stage, a probabilistic model (stochastic process; see Section 3.9) is fitted to empirical time series data (pairs of time and corresponding observations) collected in Stage 1 (Section 7.1). Examples of empirical observations follow:

- An observed sequence of arrival times in a queue. Such arrival processes are often modeled as consisting of iid exponential interarrival times (i.e., a Poisson process; see Section 3.9.2).
- An observed sequence of times to failure and the corresponding repair times. The associated uptimes may be modeled as a Poisson process, and the downtimes as a renewal process (see Section 3.9.3) or as a dependent process (e.g., a Markovian process; see Section 3.9.4).

Depending on the type of time series data to be modeled, this stage can be broadly classified into two categories:

1. Independent observations are modeled as a sequence of iid random variables (see Section 3.9.1). In this case, the analyst's task is merely to identify (fit) a “good” distribution and its parameters to the empirical data. Arena provides built-in facilities for fitting distributions to empirical data, a topic discussed in Section 7.4.

2. Dependent observations are modeled as random processes with temporal dependence (see Section 3.9). In this case, the analyst's task is to identify (fit) a “good” probability law to empirical data. This is a far more difficult task than the previous one, and often requires advanced mathematics. Although Arena does not provide facilities for fitting dependent random processes, we will cover this advanced topic in Chapter 10.

We now turn to the subject of fitting a distribution to empirical data, and will focus on two main approaches to this problem:

1. The simplest approach is to construct a histogram from the empirical data (sample), and then normalize it to a step pdf (see Section 3.8.2) or a pmf (see Section 3.7.1), depending on the underlying state space. The obtained pdf or pmf is then declared to be the fitted distribution. The main advantage of this approach is that no assumptions are required on the functional form (shape) of the fitted distribution (see the sketch following this list).

2. The previous approach may reveal (by inspection) that the histogram pdf has a particular functional form (e.g., decreasing, bell shaped, etc.). In that case, the analyst may try to obtain a better fit by postulating a particular class of distributions having that shape, and then proceeding to estimate (fit) its parameters from the sample. Two common methods that implement this approach are the method of moments and the maximum likelihood estimation (MLE) method, to be described in the sequel. This approach can be further generalized to multiple functional forms by searching for the best fit among a number of postulated classes of distributions. The Arena Input Analyzer provides facilities for this generalized fitting approach.
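As a concrete illustration of the histogram approach in item 1 above, the following Python sketch normalizes a histogram into a step pdf. This is not an Arena facility; the function name, bin count, and synthetic sample are illustrative assumptions.

```python
import numpy as np

def histogram_step_pdf(sample, num_bins=10):
    """Normalize a histogram of `sample` into a step pdf.

    Returns the bin edges and the constant density on each bin,
    chosen so that the step areas (density * bin width) sum to 1.
    """
    counts, edges = np.histogram(sample, bins=num_bins)
    widths = np.diff(edges)
    densities = counts / (counts.sum() * widths)  # relative frequency / bin width
    return edges, densities

# Illustrative use on a synthetic sample of interarrival times.
rng = np.random.default_rng(1234)
sample = rng.exponential(scale=8.5, size=500)
edges, densities = histogram_step_pdf(sample, num_bins=12)
print("total area under step pdf:", np.sum(densities * np.diff(edges)))  # ~1.0
```

The key point is the normalization: dividing each bin count by the total count times the bin width makes the areas of the steps sum to 1, so the result is a bona fide pdf.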

We now proceed to describe in some detail the fitting methods of the approach outlined in 2 above, while the Arena facilities that support the generalized approach will be presented in Section 7.4.

7.3.1 METHOD OF MOMENTS

The method of moments fits the moments (see Section 3.5) of a candidate model to sample moments, using appropriate empirical statistics as constraints on the candidate model parameters. More specifically, the analyst first decides on the class of distributions to be used in the fitting, and then deduces the requisite parameters from one or more moment equations in which these parameters are the unknowns.

As an example, consider a random variable X and a data sample whose first two moments are estimated as $\hat{m}_1 = 8.5$ and $\hat{m}_2 = 125.3$. Suppose the analyst decides on the class of gamma distributions (see Section 3.8.7). Since the gamma distribution has two parameters ($\alpha$ and $\beta$), two equations are needed to determine their values, given $\hat{m}_1$ and $\hat{m}_2$ above. Using the formulas for the mean and variance of a gamma distribution in Section 3.8.7, we note the following relations connecting the first two moments of a gamma distribution, $m_1$ and $m_2$, and its parameters, $\alpha$ and $\beta$, namely,

$$m_1 = \alpha \beta$$
$$m_2 = \alpha \beta^2 (1 + \alpha)$$

Substituting the estimated values of the two moments, $\hat{m}_1$ and $\hat{m}_2$, for $m_1$ and $m_2$ above yields the system of equations

$$\hat{\alpha} \hat{\beta} = 8.5$$
$$\hat{\alpha} \hat{\beta}^2 (1 + \hat{\alpha}) = 125.3$$

whose unique non-negative solution is

$$\hat{\alpha} = 1.3619, \qquad \hat{\beta} = 6.2412,$$

and this solution completes the specification of the fitted distribution. Note that the same solution can be obtained from an equivalent system of equations, formulated in terms of the gamma distribution mean and variance rather than the gamma moments.
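The gamma example above can be reproduced with a short computation. The following Python sketch (the function name is ours) solves the two moment equations in closed form, using the fact that the variance is $m_2 - m_1^2 = \alpha \beta^2$:

```python
def fit_gamma_moments(m1, m2):
    """Method-of-moments fit of a gamma distribution with shape alpha
    and scale beta, given the first two sample moments m1 and m2.

    The moment equations m1 = alpha*beta and m2 = alpha*beta**2*(1 + alpha)
    imply variance = m2 - m1**2 = alpha*beta**2, hence beta = variance/m1
    and alpha = m1/beta.
    """
    variance = m2 - m1 ** 2          # alpha * beta**2
    beta = variance / m1             # (alpha * beta**2) / (alpha * beta)
    alpha = m1 / beta                # (alpha * beta) / beta
    return alpha, beta

alpha_hat, beta_hat = fit_gamma_moments(8.5, 125.3)
print(alpha_hat, beta_hat)  # approximately 1.3619 and 6.2412
```

Running this on the sample moments $\hat{m}_1 = 8.5$ and $\hat{m}_2 = 125.3$ recovers $\hat{\alpha} \approx 1.3619$ and $\hat{\beta} \approx 6.2412$, in agreement with the solution above.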

7.3.2 MAXIMAL LIKELIHOOD ESTIMATION METHOD

This method postulates a particular class of distributions (e.g., normal, uniform, exponential, etc.), and then estimates its parameters from the sample, such that the resulting parameters give rise to the maximal likelihood (highest probability or density) of obtaining the sample. More precisely, let $f(x; \theta)$ be the postulated pdf as a function of its ordinary argument, $x$, as well as the unknown parameter, $\theta$. We mention that $\theta$ may actually be a set (vector) of parameters, but for simplicity we assume here that it is scalar. Finally, let $(x_1, \ldots, x_N)$ be a sample of independent observations. The maximal likelihood estimation (MLE) method estimates $\theta$ via the likelihood function $L(x_1, \ldots, x_N; \theta)$, given by

$$L(x_1, \ldots, x_N; \theta) = f(x_1; \theta) \, f(x_2; \theta) \cdots f(x_N; \theta).$$

Thus, $L(x_1, \ldots, x_N; \theta)$ is the postulated joint pdf of the sample, and is viewed as a function of both the (known) sample, $(x_1, \ldots, x_N)$, as well as the (unknown) parameter, $\theta$. A maximal likelihood estimator, $\hat{\theta}$, maximizes the function $L(x_1, \ldots, x_N; \theta)$ (or equivalently, the log-likelihood function, $\ln L(x_1, \ldots, x_N; \theta)$) over $\theta$, for a given sample, $(x_1, \ldots, x_N)$.

As an example, consider the exponential distribution with parameter $\theta = \lambda$, and derive its maximum likelihood estimate, $\hat{\theta} = \hat{\lambda}$. The corresponding maximal likelihood function is

$$L(x_1, \ldots, x_N; \lambda) = \prod_{i=1}^{N} \lambda e^{-\lambda x_i} = \lambda^N e^{-\lambda \sum_{i=1}^{N} x_i},$$

and the log-likelihood function is

$$\ln L(x_1, \ldots, x_N; \lambda) = N \ln \lambda - \lambda \sum_{i=1}^{N} x_i.$$

The value of $\lambda$ that maximizes the function $\ln L(x_1, \ldots, x_N; \lambda)$ over $\lambda$ is obtained by differentiating it with respect to $\lambda$ and setting the derivative to zero, that is,

$$\frac{d}{d\lambda} \ln L(x_1, \ldots, x_N; \lambda) = \frac{N}{\lambda} - \sum_{i=1}^{N} x_i = 0.$$

Solving the above for $\lambda$ yields the maximal likelihood estimate

$$\hat{\lambda} = \frac{N}{\sum_{i=1}^{N} x_i} = \frac{1}{\bar{x}},$$

which is simply the sample rate (the reciprocal of the sample mean). As another example, a similar computation for the uniform distribution Unif(a, b) yields the MLE estimates

$$\hat{a} = \min\{x_i : 1 \le i \le N\}, \qquad \hat{b} = \max\{x_i : 1 \le i \le N\}.$$
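Both MLE examples above reduce to one-line formulas, as the following Python sketch shows. The synthetic samples and parameter values are illustrative assumptions, not taken from the text.

```python
import numpy as np

def mle_exponential(sample):
    """MLE of the exponential rate: lambda_hat = N / sum(x_i) = 1 / sample mean."""
    return 1.0 / np.mean(sample)

def mle_uniform(sample):
    """MLE of the Unif(a, b) endpoints: the sample minimum and maximum."""
    return np.min(sample), np.max(sample)

# Illustrative check on synthetic data with known (hypothetical) parameters.
rng = np.random.default_rng(2024)
x = rng.exponential(scale=4.0, size=1000)   # true rate lambda = 1/4
print("lambda_hat:", mle_exponential(x))    # should be close to 0.25

u = rng.uniform(low=2.0, high=7.0, size=1000)
print("a_hat, b_hat:", mle_uniform(u))      # should be close to (2.0, 7.0)
```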