CHAPTER TWO
Describing Patterns in Data
2.1 INTRODUCTION
In Chapter 1, we stated that numerical information (data) is often required for gaining new knowledge, effectively
improving business processes, and, in general, making better decisions. We also suggested that some amount of variability in
the data is unavoidable even though measurements are made under identical or very nearly identical conditions. This chapter
is concerned with methods for describing and summarizing data to highlight any important features or patterns they may
contain. The goal is to make the information in data obvious. Because there are several types of data, a particular procedure
for displaying and summarizing one kind of data may not be
appropriate for another. However, there are general categories of methods that work, to a greater or lesser extent, for all kinds of data. These categories include tables, plots, and
numerical summaries. The practice of examining data with a collection of relatively simple tables, plots, and numbers is called exploratory data analysis.
2.2 ENUMERATIVE VERSUS ANALYTIC STUDIES
We said in Chapter 1 that the population often represents the target of the numerical inquiry, although the population may be difficult to define or simply unavailable for
study. In general, however, we learn about the population by sampling from it. To illustrate situations where defining the appropriate population is difficult and
generalizations beyond the sample observations must be carefully interpreted, we must
distinguish between enumerative and analytic studies (a distinction emphasized by Dr. Deming and others). In an enumerative study,
interest centers on the identifiable, unchanging, generally finite collection of units from which the sample was selected. For example, we
may be interested in the 1996 per capita cost of health care for all U.S. companies with at least 100 employees. To get some idea of what the average 1996 per capita cost
might be, a sample of these firms is selected and health care costs determined. The per capita costs for the units (firms) in the sample are the sample observations. The per
capita costs for the entire collection of units (firms) are the population observations. The population numbers include the sample numbers and, if time and resources allow,
a complete enumeration of the population is possible.
A list, or similar mechanism, for identifying the entire set of relevant sampling units is called a frame.
In the example on health care cost, a frame is a list of all U.S. firms with at
least 100 employees as of December 31, 1996. Enumerative investigations are typically concerned with making generalizations (inferences) from the sample data to the
complete collection of units in the frame. Along these lines, enumerative studies have two distinguishing characteristics. First, the frame (entire collection of units) does not
change. This ordinarily means that enumerative studies pertain to an environment existing at a particular point in time. Second, a 100% sample of the frame provides the
complete answer to the question posed.
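The two characteristics just described can be illustrated with a minimal sketch. The frame below is entirely hypothetical: the firm identifiers and per capita costs are invented, randomly generated values, not actual 1996 figures, and the helper name sample_mean is introduced only for this illustration. The sketch draws a simple random sample from a fixed frame to estimate the average cost, then shows that a 100% sample reproduces the population average.

```python
import random
import statistics

# Hypothetical frame: one entry per firm. IDs and costs are invented
# for illustration and are not real health care cost data.
rng = random.Random(1996)
frame = {f"firm_{i:04d}": round(rng.uniform(2000, 6000), 2) for i in range(500)}


def sample_mean(frame, n, rng):
    """Mean per capita cost over a simple random sample of n firms."""
    chosen = rng.sample(sorted(frame), n)  # sample without replacement
    return statistics.mean(frame[firm] for firm in chosen)


# Population value: available only because the frame can be fully enumerated.
population_mean = statistics.mean(frame.values())

# A sample of 50 firms gives an estimate that varies from sample to sample.
estimate = sample_mean(frame, 50, rng)

# A 100% sample is a complete enumeration and gives the complete answer.
full_enumeration = sample_mean(frame, len(frame), rng)

print(f"population mean:  {population_mean:.2f}")
print(f"sample estimate:  {estimate:.2f}")
print(f"100% sample mean: {full_enumeration:.2f}")
```

Because the frame does not change, the only uncertainty in the estimate comes from which firms happen to be selected, and that uncertainty disappears when every unit in the frame is examined.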
An internal audit conducted to determine the extent to which long-distance telephone calls are business related is an enumerative study. The frame is a list of
all long-distance calls made by the several hundred employees of a particular firm for the previous month. A sample of employees is selected, and their long-distance
calls are audited. The results will be used to determine the amount the employer paid for nonbusiness-related calls. Perhaps the audit will suggest an investigation of all the
items in the frame.
Product acceptance sampling is another good example of an enumerative study. A shipment of parts from a supplier is accepted or rejected depending on the number
of defective parts in a sample of parts from the shipment. The frame is the aggregate collection of parts in the shipment, say, a truckload, and interest centers on the number
of defective parts in the truckload.
An analytic study is a study that is not enumerative. Analytic studies generally
take place over time and are concerned with processes or cause-and-effect systems. The most effective analytic studies involve a plan for collecting the data. The objective
is to improve future practices or products. Analytic studies often involve comparisons. Will this material or that material lead to more durable products? Will this method of
training or that method of training lead to more productive employees? Will this type of service or that type of service lead to a higher retention of customers?
In analytic studies, we are interested in drawing conclusions about a process or product that often does not exist at the beginning of the study. We are no longer
dealing with a collection of identifiable units and, consequently, there is no relevant frame (population) from which to sample. Instead, we are typically dealing with
observations derived from a current process or product, and we must predict what will
happen at some future time if, for example, certain actions are taken.
Consider a public opinion poll of registered voters held before an election. If interest centers on the proportion of people voting for, say, the Republican candidate on election
day, the pre-election-day poll is an analytic study. Even a 100% sample of the registered voters will not allow us to predict the outcome of the election with certainty. Between the time
of the poll and election day, some voters will change their minds, additional eligible people may register, some voters will not vote for one reason or another, and these “stay-at-homes”
may well differ in their voting preferences from those who do vote. We want to draw conclusions about a future process (election-day voting) from information on a current process
(pre-election voting indications) that might be quite different. However, an exit poll
to determine the proportion of voters who have voted for a particular candidate is an enumerative
study, since a 100% sample provides perfect information (provided all voters tell the truth).
New products are frequently test-marketed before full-scale production occurs.
Consumer responses to a prototype product are used to fine-tune the product before full production or, perhaps, to abandon it altogether. Full evaluation of all test-marketed
products still may not tell us about the process of interest—the process associated with producing the final product. Studies involving prototypes or trial products are analytic.
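The point that complete information about a current process need not settle questions about a future one can be sketched with a small simulation in the spirit of the pre-election poll example. Everything here is an assumption made for illustration: the number of voters, the 52% poll-day support, the 5% chance of switching, the 60% turnout, and the function name election_day are all invented, and turnout is assumed to be unrelated to preference, which the discussion above suggests may not hold in practice.

```python
import random

rng = random.Random(7)

# Hypothetical registered voters: True means the voter favors candidate A
# on the day of the poll. All proportions below are invented.
poll_day = [rng.random() < 0.52 for _ in range(10_000)]


def election_day(poll_day, p_switch, p_turnout, rng):
    """Simulate votes cast: some voters stay home, some change their minds."""
    votes = []
    for preference in poll_day:
        if rng.random() >= p_turnout:
            continue  # a "stay-at-home" casts no vote
        if rng.random() < p_switch:
            preference = not preference  # voter changed their mind
        votes.append(preference)
    return votes


# A 100% sample of poll-day preferences: exact for the current process.
poll_share = sum(poll_day) / len(poll_day)

# The future process: what actually happens on election day.
votes = election_day(poll_day, p_switch=0.05, p_turnout=0.6, rng=rng)
election_share = sum(votes) / len(votes)

print(f"poll-day share for A (complete census): {poll_share:.3f}")
print(f"election-day share for A:               {election_share:.3f}")
```

Even though poll_share is computed from every registered voter, election_share generally differs from it, because the election-day process is not the process that was measured.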
The vast majority of numerical studies in business are analytic, and we have to be careful about making statements or taking actions based on observations from current
processes. If a process is stable and unchanging (in a state of “statistical control”) and remains so, current data may be used to reach conclusions about future performance
of the process. However, the validity of extrapolating from current conditions should always be thoroughly examined. Enumerative studies, too, must be conducted with
care. The validity of inferences from enumerative studies depends, in part, on how well the frame represents the target population.
The issues raised either explicitly or implicitly in this section (collecting appropriate data, summarizing numerical information, monitoring processes, generalizing
beyond the data or time period, reaching valid conclusions, and so forth) will be considered as we progress through this book. After-the-fact analysis cannot compensate
for a poorly planned investigation and, as we shall see, the planning process for analytic studies differs from that for enumerative studies.
2.3 VARIABLES AND DATA