DATA ANALYSIS

7.2 DATA ANALYSIS

Once adequate data are collected, the analyst often performs a preliminary analysis of the data to assist in the next stage of model fitting to data (see Section 7.4). The analysis stage often involves the computation of various empirical statistics from the collected data, including

Statistics related to moments (mean, standard deviation, coefficient of variation, etc.)

Statistics related to distributions (histograms)

Statistics related to temporal dependence (autocorrelations within an empirical time series, or cross-correlations among two or more distinct time series)

These statistics provide the analyst with information on the collected sample, and constitute the empirical statistics against which a proposed model will be evaluated for goodness-of-fit.
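As a rough illustration of the moment and dependence statistics listed above, the following Python sketch computes them with the standard library only. The function names and the short list of observations are hypothetical, chosen for illustration; the autocorrelation uses the common biased estimator (normalizing by the total sum of squared deviations), which is one of several conventions.

```python
import statistics


def moment_stats(sample):
    """Return (sample mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(sample)
    std = statistics.stdev(sample)  # uses the n - 1 denominator
    return mean, std, std / mean


def autocorrelation(series, lag):
    """Lag-k sample autocorrelation of a time series (biased estimator)."""
    n = len(series)
    mean = statistics.mean(series)
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numer / denom


# Hypothetical observations, not taken from Table 7.1.
observations = [12.0, 15.5, 14.2, 18.9, 17.3, 21.0, 19.4, 16.8]
mean, std, cv = moment_stats(observations)
rho1 = autocorrelation(observations, lag=1)
print(f"mean={mean:.2f} std={std:.2f} cv={cv:.3f} lag-1 autocorr={rho1:.3f}")
```

A cross-correlation between two series could be sketched the same way by taking the deviations of the second series in the numerator and normalizing by the product of the two standard deviations.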

To illustrate the nature of data analysis, consider the sample of 100 repair time observations in Table 7.1, collected in a manufacturing process (consecutive observations are arranged by rows).

Table 7.1 Sample data of repair time observations

Data analysis reveals that the sample minimum and maximum values are 10.3 minutes and 29.9 minutes, respectively. Recall, however, that these values are computed from one particular sample, and would usually vary from sample to sample. These statistics (minimum and maximum) play an important role in the choice of a particular (repair time) distribution. Data analysis further discloses that the sample mean is 19.8 minutes, the sample standard deviation is 5.76 minutes, and the squared sample coefficient of variation is 0.0846. These statistics suggest that repair times have low variability, a fact supported by inspection of Table 7.1.
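The squared coefficient of variation quoted above follows directly from the reported mean and standard deviation. The short Python check below reproduces that arithmetic; the variable names are illustrative, and the comparison against 1 (the coefficient of variation of an exponential distribution) is offered only as a common rule of thumb for judging variability.

```python
# Values reported in the text for the repair time sample (in minutes).
sample_mean = 19.8
sample_std = 5.76

# Coefficient of variation: standard deviation relative to the mean.
cv = sample_std / sample_mean
cv_squared = cv ** 2

print(f"cv = {cv:.4f}, cv^2 = {cv_squared:.4f}")
# prints "cv = 0.2909, cv^2 = 0.0846"
```

A value well below 1 is consistent with the low variability noted in the text.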

Arena provides data analysis facilities via its Input Analyzer tool, whose main objective is to fit distributions to a given sample. The Input Analyzer is accessible from the Tools menu in the Arena home screen. After opening a new input dialog box (by selecting the New option in the File menu in the Input Analyzer window), raw input data can be selected from two suboptions in the Data File option of the File menu:

1. Existing data files can be opened via the Use Existing option.

2. New (synthetic) data files can be created using the Generate New option as iid samples from a user-prescribed distribution. This option is helpful in studying distributions and in comparing alternative distributions.
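Outside Arena, the idea behind the Generate New option (drawing an iid sample from a user-prescribed distribution) can be sketched in a few lines of Python. This is not the Input Analyzer's own procedure; the helper name `generate_iid_sample`, the exponential distribution, its mean of 20, and the seed are all assumptions made for illustration.

```python
import random


def generate_iid_sample(draw, size, seed=None):
    """Return `size` independent draws; `draw` takes a random.Random instance."""
    rng = random.Random(seed)  # a private generator keeps runs reproducible
    return [draw(rng) for _ in range(size)]


# Hypothetical choice: 100 iid exponential observations with mean 20.
# expovariate takes the rate (1 / mean) as its argument.
sample = generate_iid_sample(
    lambda rng: rng.expovariate(1 / 20.0), size=100, seed=42
)
print(len(sample), min(sample), max(sample))
```

Swapping the lambda for another draw, for example `lambda rng: rng.uniform(10, 30)`, gives a sample from a different distribution, which makes it easy to compare alternatives side by side.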

Once Input Analyzer files have been created, they can subsequently be accessed in the usual way via the Open option in the File menu.

Returning to the repair time data of Table 7.1, once the corresponding data file (called Repair_Times.dft) has been opened, the Input Analyzer automatically creates a histogram from the sample data and provides a summary of sample statistics, as shown in Figure 7.1.

Figure 7.1 Histogram and summary statistics for the repair time data of Table 7.1.

The Options menu in the Input Analyzer menu bar allows the analyst to customize a histogram by specifying its number of intervals through the Parameters option and its Histogram option’s dialog box. Once a distribution is fitted to the data (see next section), the same menu also allows the analyst to change the parameter values of the fitted distribution. The Window menu grants the analyst access to input data (as a list of numbers) via the Input Data option.
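To see how the number of intervals shapes a histogram, the following Python sketch bins a sample into equal-width intervals. It is a minimal stand-in, not the Input Analyzer's algorithm; the function name `histogram`, its `low` and `high` parameters, and the sample values are hypothetical.

```python
def histogram(sample, num_intervals, low=None, high=None):
    """Count observations in `num_intervals` equal-width intervals on [low, high]."""
    low = min(sample) if low is None else low
    high = max(sample) if high is None else high
    if high <= low:
        raise ValueError("high must exceed low")
    width = (high - low) / num_intervals
    counts = [0] * num_intervals
    for x in sample:
        if low <= x <= high:
            # The right endpoint is folded into the last interval.
            index = min(int((x - low) / width), num_intervals - 1)
            counts[index] += 1
    return counts


# Hypothetical values within the range reported for the repair time sample.
values = [10.3, 12.0, 15.0, 20.0, 29.9]
print(histogram(values, num_intervals=4, low=10, high=30))
# prints [2, 1, 1, 1]
```

Rerunning with a different `num_intervals` shows how coarse or fine binning changes the apparent shape of the data.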