
Napoleon’s Grand Army to capture Russia. A simplified version of the original graphic appears in Figure 2.18. The 422,000 troops that entered Russia near Kaunas are shown as a wide shaded river flowing toward Moscow, and the retreating army as a small black stream. The width of the band indicates the size of the army at each location on the map. Napoleon had to provide 422,000 soldiers in Poland to field 100,000 troops in Moscow. Even the simplified version of the original graphic dramatically conveys the losses that left the army with 10,000 returning members. The temperature scale at the bottom of the graph, pertaining to the retreat from Moscow, helps to explain the loss of life, including the incident in which thousands died trying to cross the Berezina River in subzero temperatures.

[Figure 2.18 The Demise of Napoleon’s Army in Russia, 1812–1813, based on a display by Charles Minard.† Map labels: Kaunas (422,000), Niemen R., Smolensk, Moscow (100,000), Berezina R., 10,000 returning. Temperature scale (°C) along the retreat, Oct. 9 to Dec. 6, with readings from –9° down to –30°.]

† A copy of Minard’s original map and additional discussion are contained in Tufte, E. R., The Visual Display of Quantitative Information. Cheshire, Conn.: Graphics Press, 1983.

Another informative graph is presented in Figure 2.19. In this graph, countries of the world are scaled to approximate size according to 1992 stock market capitalization. We see that most of the world’s money resides in the United States, Japan, and, to a lesser extent, the United Kingdom.

[Figure 2.19 The World According to Stock Market Capitalization. “Where the world’s money is”: countries are scaled to approximate size according to 1992 stock market capitalization. Sources: IFC and FT-Actuaries World Indices.]

Although we do not intend to pursue graphics, the impact of these examples should motivate you to think creatively when displaying data.

2.8 STATISTICS IN CONTEXT

The quality control department of a midwest manufacturer of microwave ovens is required by the U.S. government to monitor the amount of radiation emitted when the doors of the ovens are closed. Observations of the radiation passing through the closed doors of 42 ovens were obtained and are given in Table 2.8.

TABLE 2.8 Radiation Through Closed Doors of Microwave Ovens (mW/cm²)

   .15  .09  .18  .10  .05  .12  .08
   .05  .08  .10  .07  .02  .01  .10
   .10  .10  .02  .10  .01  .40  .10
   .05  .03  .05  .15  .10  .15  .09
   .08  .18  .10  .20  .11  .30  .02
   .20  .20  .30  .30  .40  .30  .05

Data courtesy of J. D. Cryer.

To determine the chance of exceeding a prespecified tolerance level, a description of the pattern of variation in the amounts of radiation was required. Can we regard the observations here as being normally distributed? Panel 2.1 shows the stem-and-leaf diagram and the boxplot of the radiation data in Table 2.8. In the Minitab plots, the radiation variable is labeled RADTNDC (RADiaTioN Door Closed). It is clear from these plots that the radiation data are skewed to the right, with several ovens having relatively large values of .30 and .40.

Panel 2.1

   Stem-and-leaf of RADTNDC   N = 42
   Leaf Unit = 0.010
       6   0  112223
      17   0  55555788899
     (11)  1  00000000012
      14   1  55588
       9   2  000
       6   2
       6   3  0000
       2   3
       2   4  00

   [Boxplot of RADTNDC on an axis from 0.00 to 0.40]

        N     MEAN   MEDIAN    STDEV
       42   0.1283   0.1000   0.1003
      MIN      MAX       Q1       Q3
   0.0100   0.4000   0.0500   0.1800

To bring the large radiation measurements more in line with the remaining observations, we consider a reexpression, or transformation, of the data. The objective here is to create a set of data that can reasonably be described by a normal distribution.

One transformation that brings large positive values relatively closer to the remaining values is the square root transformation. For example, the numbers √9 = 3 and √100 = 10 are closer together than the original numbers 9 and 100. If we apply the square root transformation twice, that is, take the fourth root, √(√x) = x^(1/4), of each observation, we get the results shown in Panel 2.2. In the plots, the reexpressed data are labeled FROOTDC (Fourth ROOT Door Closed). It is evident from the stem-and-leaf diagram and boxplot in Panel 2.2 that the transformed observations are reasonably symmetric and, we would argue, nearly normal.

Panel 2.2 Plots of Microwave Radiation Data, Door Closed

   Stem-and-leaf of FROOTDC   N = 42
   Leaf Unit = 0.010
       2   3  11
       5   3  777
       6   4  1
      11   4  77777
      17   5  133344
     (11)  5  66666666678
      14   6  222
      11   6  55666
       6   7  4444
       2   7  99

   [Boxplot of FROOTDC on an axis from 0.36 to 0.72]

        N     MEAN   MEDIAN    STDEV
       42   0.5643   0.5623   0.1198
      MIN      MAX       Q1       Q3
   0.3162   0.7953   0.4729   0.6514

The radiation data are measurements of a particular characteristic associated with the manufacture of microwave ovens. The extent to which these measurements depict the behavior of radiation readings for yet-to-be-manufactured ovens depends on the stability of the manufacturing process. As we explain in Chapter 13, if the process is “in control,” that is, if the causes of variation in the radiation measurements remain the same, or constant, then we would expect the current observations to tell us something about future values. If something unusual occurs, or the manufacturing process changes in some fundamental way (for example, there is a new supplier of raw materials or parts, or a new method of assembly), then the current radiation measurements may have little to say about future emissions through closed doors.
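The summary lines and stem-and-leaf displays in Panels 2.1 and 2.2 can be recomputed from Table 2.8. The sketch below is a minimal illustration in Python 3.8+, not the Minitab routines used for the panels. `summarize`, `fourth_root`, and `stem_and_leaf` are hypothetical helper names. The sketch assumes that `statistics.quantiles(..., method="exclusive")` follows the same quartile rule as the panels. That function places quartiles at position (n + 1)p of the sorted data, and it reproduces Q1 and Q3 here. The sketch also assumes that leaves are truncated rather than rounded. The seventh reading is entered as .08, the value with which the mean, standard deviation, and leaf counts agree with both panels.

```python
import math
import statistics

# Table 2.8 read across the rows, oldest oven first (seventh value entered as .08).
RADTNDC = [
    0.15, 0.09, 0.18, 0.10, 0.05, 0.12, 0.08,
    0.05, 0.08, 0.10, 0.07, 0.02, 0.01, 0.10,
    0.10, 0.10, 0.02, 0.10, 0.01, 0.40, 0.10,
    0.05, 0.03, 0.05, 0.15, 0.10, 0.15, 0.09,
    0.08, 0.18, 0.10, 0.20, 0.11, 0.30, 0.02,
    0.20, 0.20, 0.30, 0.30, 0.40, 0.30, 0.05,
]


def summarize(xs):
    """Summary statistics laid out like the panels (hypothetical helper)."""
    # method="exclusive" locates quartiles at positions (n + 1)p of the sorted data.
    q1, _, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    return {
        "N": len(xs),
        "MEAN": statistics.mean(xs),
        "MEDIAN": statistics.median(xs),
        "STDEV": statistics.stdev(xs),  # sample standard deviation, divisor n - 1
        "MIN": min(xs),
        "MAX": max(xs),
        "Q1": q1,
        "Q3": q3,
    }


def fourth_root(xs):
    """Apply the square root transformation twice (hypothetical helper)."""
    return [math.sqrt(math.sqrt(x)) for x in xs]


def stem_and_leaf(xs, leaf_unit=0.01):
    """Split stems: one line for leaves 0-4 and one for leaves 5-9 (hypothetical helper).

    Returns (stem, leaves) pairs; values are truncated, not rounded, to the leaf unit.
    """
    # The tiny offset stops float error (0.15 / 0.01 == 14.999999999999998)
    # from pushing a value into the wrong leaf.
    scaled = sorted(int(x / leaf_unit + 1e-9) for x in xs)
    rows = []
    for half in range(scaled[0] // 5, scaled[-1] // 5 + 1):
        leaves = "".join(str(v % 10) for v in scaled if v // 5 == half)
        rows.append((half // 2, leaves))
    return rows


for label, data in (("RADTNDC", RADTNDC), ("FROOTDC", fourth_root(RADTNDC))):
    print(label, {k: round(v, 4) for k, v in summarize(data).items()})
    for stem, leaves in stem_and_leaf(data):
        print(f"  {stem} | {leaves}")
```

Running it prints both summaries and both displays. The leaf rows match Panels 2.1 and 2.2 line for line, including the empty rows between .25 and .29 and between .35 and .39 in the untransformed data.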
The validity of any generalizations beyond the data in hand depends very much on the crucial assumption that the future is much like the past. We are generally concerned with the radiation emitted through the closed doors of all ovens that have been or will be produced. Consequently, a study of the radiation measurements to see whether the microwave ovens meet government standards is an analytic study. Even a 100% sample of all the ovens currently available will not provide perfect information about the performance of future ovens.

Suppose the data in Table 2.8 are recorded in the order in which the ovens were manufactured. That is, the first observation in the first row is the radiation measurement for the oldest oven, the first oven of the group produced. The second observation in the first row is the radiation measurement for the second oldest oven, and so forth. Thus, as we read across the rows in the table, we encounter more recent observations. A time-ordered plot of the transformed radiation measurements is given in Figure 2.20. There are three horizontal lines in the figure. The middle line is at the value of the mean of the observations. The upper line is at 3 standard deviations above the mean. This line is called the upper control limit (UCL). The lower line is located at 3 standard deviations below the mean. This line is called the lower control limit (LCL). Because of their location with respect to the mean, the UCL and LCL are sometimes called the three-sigma (3σ) limits.†

[Figure 2.20 Control chart titled “Plots of the Transformed Microwave Radiation Data, Door Closed”: individual values plotted against observation number 1 to 42; centerline x̄ = .5643, UCL = .8718, LCL = .2567.]

A chart like the one in Figure 2.20 is called a control chart. A control chart is simply a time-ordered, or time series, plot with upper and lower control limits drawn on each side of the mean of the observations. Control charts are used to display variability and to discover how much of the variability in the observations is due to random, or common cause, variation and how much is due to unique events, or special causes. We discuss control charts in Chapter 13. At this point, we use the control chart simply to display the time-ordered radiation measurements relative to their mean and 3-standard-deviation limits. Let’s interpret what we see.

We see that all the transformed radiation measurements are within 3 standard deviations of the mean. However, there is some tendency for the radiation values for the older ovens to be below the mean and the values for the more recently manufactured ovens to be above the mean. This could indicate some change in the manufacturing process leading to higher levels of emitted radiation. The evidence is inconclusive and additional monitoring may be required, but the slight upward drift in the data illustrates the importance of looking at observations in the time order in which they were produced.

If the process is stable (in control), we would expect the observations in the control chart to vary about the centerline (the mean), within the 3σ limits, with no specific pattern of variation. A few observations outside the 3σ limits, or a long sequence of observations above or below the mean, suggest that the process is not stable. That is, the causes of variability in the numbers are not constant or common over time. A change has occurred.
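Figure 2.20’s limits are not the mean ± 3s with s = .1198 from Panel 2.2, which would give roughly .205 and .924. The sketch below therefore assumes the usual individuals-chart estimate. Under that estimate, σ̂ is the average absolute difference between consecutive observations (the moving range) divided by d2 = 1.128. By our calculation this reproduces LCL = .2567 and UCL = .8718. The sketch also checks the two warning signs named above: points outside the limits, and the longest run of consecutive points on one side of the mean. The function names are hypothetical. The data list repeats Table 2.8 (seventh value .08) so that the block runs on its own.

```python
import math
import statistics

# Table 2.8 in time order (seventh value entered as .08) and its fourth roots.
RADTNDC = [
    0.15, 0.09, 0.18, 0.10, 0.05, 0.12, 0.08,
    0.05, 0.08, 0.10, 0.07, 0.02, 0.01, 0.10,
    0.10, 0.10, 0.02, 0.10, 0.01, 0.40, 0.10,
    0.05, 0.03, 0.05, 0.15, 0.10, 0.15, 0.09,
    0.08, 0.18, 0.10, 0.20, 0.11, 0.30, 0.02,
    0.20, 0.20, 0.30, 0.30, 0.40, 0.30, 0.05,
]
FROOTDC = [math.sqrt(math.sqrt(x)) for x in RADTNDC]

D2 = 1.128  # bias-correction constant d2 for ranges of 2 consecutive points


def individuals_limits(xs):
    """(LCL, centerline, UCL), sigma estimated from moving ranges (hypothetical helper)."""
    center = statistics.mean(xs)
    moving_ranges = [abs(b - a) for a, b in zip(xs, xs[1:])]
    sigma_hat = statistics.mean(moving_ranges) / D2
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat


def outside_limits(xs, lcl, ucl):
    """Observation numbers (1-based) that fall outside the control limits."""
    return [i for i, x in enumerate(xs, start=1) if x < lcl or x > ucl]


def longest_one_sided_run(xs, center):
    """Longest stretch of consecutive points strictly above, or strictly below, the centerline."""
    best = run = 0
    previous = 0
    for x in xs:
        side = (x > center) - (x < center)  # +1 above, -1 below, 0 on the line
        run = run + 1 if side != 0 and side == previous else int(side != 0)
        previous = side
        best = max(best, run)
    return best


lcl, center, ucl = individuals_limits(FROOTDC)
print(f"LCL = {lcl:.4f}  center = {center:.4f}  UCL = {ucl:.4f}")
print("outside limits:", outside_limits(FROOTDC, lcl, ucl))
print("longest run on one side of the mean:", longest_one_sided_run(FROOTDC, center))
```

On these data the sketch reports no points outside the limits. It also reports a longest run of 13 consecutive points below the mean (observations 7 through 19), which is consistent with the tendency for the older ovens to fall below the mean. The text does not fix how long a run must be before it counts as a signal, so the function only measures the run.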
As we discuss in Chapter 13, once a change is detected, we search for the reason for the change.

Sometimes subgroup means rather than the individual observations are plotted in control charts. This often produces a clearer picture of the measured characteristic. Let’s look at the measurements in Figure 2.20. Suppose we collect the observations into subgroups of size 3. The rationale might be that the first three observations are a

[Control chart of subgroup means (x̄ chart) titled “A Time-Ordered Plot of the Transformed Radiation Measurements”: sample mean plotted against sample number 1 to 14; centerline x̄ = .5643, UCL = .7650, LCL = .3635.]

† The standard deviation used in the construction of the control limits in Figure 2.20 is an estimate of σ based on the range and produces a slightly different number than the sample standard deviation s.
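The chart of subgroup means can be sketched the same way. Split the 42 time-ordered values into 14 consecutive subgroups of 3 and plot each subgroup mean. Then draw limits at the grand mean ± 3σ̂/√3, since a mean of 3 independent observations has standard deviation σ/√3. The text does not say how σ̂ was obtained for this chart. This sketch assumes the pooled within-subgroup standard deviation divided by the unbiasing constant c4. By our calculation that choice reproduces LCL = .3635 and UCL = .7650. A range-based estimate (average subgroup range over d2 = 1.693) gives somewhat narrower limits for these data. `c4`, `subgroups`, and `xbar_limits` are hypothetical names.

```python
import math
import statistics

# Table 2.8 in time order (seventh value entered as .08) and its fourth roots.
RADTNDC = [
    0.15, 0.09, 0.18, 0.10, 0.05, 0.12, 0.08,
    0.05, 0.08, 0.10, 0.07, 0.02, 0.01, 0.10,
    0.10, 0.10, 0.02, 0.10, 0.01, 0.40, 0.10,
    0.05, 0.03, 0.05, 0.15, 0.10, 0.15, 0.09,
    0.08, 0.18, 0.10, 0.20, 0.11, 0.30, 0.02,
    0.20, 0.20, 0.30, 0.30, 0.40, 0.30, 0.05,
]
FROOTDC = [math.sqrt(math.sqrt(x)) for x in RADTNDC]


def c4(n):
    """Unbiasing constant: E[s] = c4(n) * sigma for normal samples of size n."""
    return math.sqrt(2 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def subgroups(xs, size):
    """Consecutive, non-overlapping subgroups in time order (hypothetical helper)."""
    return [xs[i:i + size] for i in range(0, len(xs) - size + 1, size)]


def xbar_limits(xs, size):
    """Subgroup means and (LCL, centerline, UCL) for a chart of means (hypothetical helper).

    Assumption: sigma is the pooled within-subgroup standard deviation divided by c4(df + 1).
    """
    groups = subgroups(xs, size)
    means = [statistics.mean(g) for g in groups]
    center = statistics.mean(means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df = sum(len(g) - 1 for g in groups)
    sigma_hat = math.sqrt(ss_within / df) / c4(df + 1)
    half_width = 3 * sigma_hat / math.sqrt(size)  # 3 standard errors of a subgroup mean
    return means, (center - half_width, center, center + half_width)


means, (lcl, center, ucl) = xbar_limits(FROOTDC, 3)
print(f"{len(means)} subgroups; LCL = {lcl:.4f}  center = {center:.4f}  UCL = {ucl:.4f}")
print("subgroup means:", [round(m, 3) for m in means])
```

Every subgroup mean falls inside these limits. Averaging three observations shrinks the spread by a factor of √3, and that is the main reason these limits sit closer to the centerline than those in Figure 2.20 (the two charts also estimate σ differently).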

2.9 CHAPTER SUMMARY