
10.1 CORRELATION IN INPUT ANALYSIS

Correlation analysis as part of input analysis is simply an approach to modeling and data fitting. This approach insists on high-quality models incorporating temporal dependence, and strives to fit correlation-related statistics in a systematic way. To set the scene, consider a stationary time series {X_n}_{n=0}^∞, that is, a time series in which all statistics remain unchanged under the passage of time. In particular, all X_n share a common mean, μ_X, and a common variance, σ_X². To fix the ideas, suppose that {X_n} is to be used to model interarrival times at a queue (in which case the time series is nonnegative). What characteristics of {X_n} should be carefully modeled? We refer to a collection of such characteristics (both in a stochastic model as well as their counterparts in the empirical data that gave rise to the model) as a signature. As signatures are often just a set of statistical parameters (means, variances, and so on), the terms signature and statistical signature will be used interchangeably.

We can often (but not always) judge one signature to be stronger than another. Clearly, signatures become stronger under inclusion; for example, the statistical signature consisting of the mean and variance is obviously stronger than the signature consisting of the mean alone. Of course, in practice, the true statistical parameters in a signature are typically unknown; in this case, the modeler estimates the signature from empirical data (if available), or, absent such data, the modeler may elect to go out on a limb and make an educated guess. In any event, it should be clear that a model fitted to a “strong” statistical signature should have a higher predictive power than one fitted to a “weaker” one.

The foregoing discussion suggests that the modeler should generally strive to fit a model to a strong statistical signature; more accurately, the modeler should strive to fit a model to as strong a signature as is practically feasible. On the other hand, the modeler should be careful not to “over-fit,” in the sense that the resulting model contains too many parameters as compared to the data available. Note, however, that model “over-fitting” is relatively rare as compared to model “under-fitting,” that is, producing a model by fitting an unduly weak signature. Ultimately, the model produced is a trade-off, incorporating multiple factors, such as the availability of sufficient empirical data, the goal of the modeling project, and perforce, the modeler's knowledge and modeling skill.

The purpose of this section is to encourage the reader to generally strive to fit a strong signature, which includes both the marginal distribution and the autocorrelation function (a statistical proxy for temporal dependence). To this end, we will consider a particular class of models, called TES models, that has favorable modeling properties. TES modeling will be described in the next section.

To clarify the strong signature recommended, we return to the time series of interarrival times, {X_n}, and assume that we have at our disposal a considerable number of empirical observations, Ŷ = {ŷ_n}_{n=0}^N (we use the caret symbol to indicate empirical observations, or statistics formed from them as estimates, in contradistinction to theoretical constructs). Consider the following set of statistical signatures, in ascending strength:

1. The mean, μ_X, of the interarrival distribution. This is a “minimal” signature, since its reciprocal, 1/μ_X, is the arrival rate, a key statistic in queueing models.

2. The mean, μ_X, and the variance, σ_X², of the interarrival distribution. The scatter information embedded in the variance can affect queueing statistics when supplementing mean information.

3. Additional moments of the interarrival distribution, such as skewness and kurtosis (see Section 3.5).

4. The (marginal) distribution, F_X, of the interarrival times. Since a distribution determines all its moments, this signature subsumes all the above. In practice, we estimate F_X via an empirical histogram, Ĥ, constructed from the empirical data, Ŷ.

5. The distribution, F_X, and the autocorrelation function, ρ_X(τ), of the interarrival time series. In practice, we estimate ρ_X(τ) by some estimate ρ̂_X(τ), τ = 1, ..., T, where T < N.

Clearly the last signature is stronger than any of the preceding ones. This signature is the focus of this section, and as such merits further elucidation.
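To make these signature levels concrete, here is a minimal sketch in Python (an illustrative choice; the text prescribes no particular tool). The array y stands in for the observations Ŷ, and the names estimate_signature, max_lag, and bins are hypothetical, introduced here only for illustration. The function estimates each of the five signature levels from an empirical series:

```python
import numpy as np

def estimate_signature(y, max_lag=10, bins=20):
    """Estimate signature levels 1-5 from an empirical series y (one sketch of many)."""
    n = len(y)
    mean = y.mean()                              # 1: mu_X (1/mean estimates the arrival rate)
    var = y.var()                                # 2: sigma_X^2
    z = (y - mean) / np.sqrt(var)
    skew, kurt = (z**3).mean(), (z**4).mean()    # 3: higher moments
    hist, edges = np.histogram(y, bins=bins, density=True)  # 4: F_X via a histogram H-hat
    # 5: sample autocorrelation rho-hat_X(tau), tau = 1, ..., max_lag (max_lag plays T < N)
    acf = np.array([np.corrcoef(y[:n - t], y[t:])[0, 1]
                    for t in range(1, max_lag + 1)])
    return {"mean": mean, "var": var, "skew": skew, "kurt": kurt,
            "hist": (hist, edges), "acf": acf}

# Example: 1,000 synthetic interarrival times (i.i.d. exponential, so rho_X(tau) is near 0)
y = np.random.default_rng(0).exponential(scale=2.0, size=1000)
sig = estimate_signature(y)
```

Note that np.corrcoef standardizes each lagged subseries separately, which differs slightly from estimators that divide by a single overall sample variance; for large N and modest T the two agree closely.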

The marginal distribution, F_X, is a first-order statistic of {X_n}. This means that it involves only a single random variable. It is important to realize that fitting a distribution is by itself a strong signature, since this would automatically fit all its moments. Recall that the autocorrelation function of a stationary time series {X_n} with a finite variance is given by

ρ_X(τ) = Cov(X_n, X_{n+τ}) / σ_X² = (E[X_n X_{n+τ}] − μ_X²) / σ_X²,   τ = 1, 2, ...

The autocorrelation function, ρ_X(τ), is a second-order statistic, because its definition requires pairs of lagged random variables in {X_n} (in addition to the first-order statistics of mean and variance). More importantly, the autocorrelation function is a convenient (partial) descriptor of temporal dependence within {X_n}; that is, it provides information on the “behavior” of random variables in {X_n} that are separated by τ lags (time-index units). More specifically, ρ_X(τ) is just the correlation coefficient of the lagged random variables X_n and X_{n+τ} (recall that by stationarity, this value is the same for all n), and as such it provides a measure of the linear dependence prevailing among such lagged random variables. Simply put, it tells us to what extent the two are likely to behave as follows (see Section 3.6, and the sketch following the list below):

• Vary in the same direction: that is, if one random variable is large (small), to what extent is the other likely to be large (small) too, on a scale of 0 to 1?

• Vary in opposite directions: that is, if one random variable is large (small), to what extent is the other likely to be small (large) too, on a scale of −1 to 0?
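To make this sign interpretation concrete, the following minimal sketch (again Python; the AR(1) recursion is our own illustrative device, not a model from the text) generates one series whose neighboring values tend to vary in the same direction and one whose neighbors tend to vary in opposite directions, and estimates the lag-1 correlation of each:

```python
import numpy as np

rng = np.random.default_rng(1)

def ar1(phi, n=10_000):
    """Generate an AR(1) series x[t] = phi * x[t-1] + noise (lag-1 autocorrelation ~ phi)."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

def lag_corr(x, tau=1):
    """Sample correlation coefficient of the lagged pairs (X_n, X_{n+tau})."""
    return np.corrcoef(x[:-tau], x[tau:])[0, 1]

print(lag_corr(ar1(0.8)))    # near +0.8: large values tend to follow large values
print(lag_corr(ar1(-0.8)))   # near -0.8: large values tend to follow small values
```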

This covariation information is an important aspect of temporal dependence, but by no means characterizes it. In fact, temporal dependence is fully characterized by all joint probabilities of random variables. Unfortunately, each such probability introduces a separate parameter into a prospective model, and fitting all of them is not a practical proposition. The autocorrelation function, in contrast, is a much cruder measure of temporal dependence (the statistic E[X_n X_{n+τ}] can be deduced from the joint probabilities, but not vice versa), which focuses on linear dependence. However, it captures an important aspect of temporal dependence and introduces comparatively few parameters (its values for each lag considered), so that it qualifies as an effective statistical proxy for temporal dependence.
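To illustrate the last point numerically, the following sketch (our own construction, not from the text) builds two joint distributions for a lagged pair (X_n, X_{n+τ}) on {0, 1, 2} × {0, 1, 2} with identical marginals and identical E[X_n X_{n+τ}], hence identical ρ_X(τ), yet different joint probabilities; the autocorrelation therefore cannot distinguish them:

```python
import numpy as np

# Joint pmf of an independent pair: uniform on {0, 1, 2} in each coordinate.
J1 = np.full((3, 3), 1.0 / 9.0)

# A perturbation with zero row sums, zero column sums, and zero net effect
# on E[X_n * X_{n+tau}]; adding it changes the joint but not the signature.
D = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]]) / 18.0
J2 = J1 + D      # a genuinely dependent pair (all entries remain positive)

vals = np.arange(3.0)
for J in (J1, J2):
    print(J.sum(axis=1),        # marginal of X_n:       [1/3, 1/3, 1/3] in both cases
          J.sum(axis=0),        # marginal of X_{n+tau}: likewise identical
          vals @ J @ vals)      # E[X_n * X_{n+tau}] = 1.0 in both cases

# J1 and J2 share marginals and the lagged product moment (hence the same
# rho_X(tau)), yet J1 != J2: joint probabilities determine the autocorrelation,
# but not vice versa.
```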