
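Figure 2.21 An X-bar Chart for the Transformed Radiation Data. [Control chart of sample mean (vertical axis) against sample number (horizontal axis), with LCL = .3635, center line $\bar{x}$ = .5643, and UCL = .7650.]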
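
The control limits reported for Figure 2.21 are built from subgroup means. As a rough sketch of that kind of calculation, the Python below groups observations into subgroups of three, averages each subgroup, and places limits three standard deviations either side of the center line. The function name `xbar_chart_limits` and the data are hypothetical, and the sketch assumes the standard deviation of the subgroup means is estimated directly as the sample standard deviation of those means; the text does not say which estimate it uses, so the numbers in the figure may come from a different one.

```python
from statistics import mean, stdev


def xbar_chart_limits(observations, subgroup_size=3):
    """Return (subgroup_means, lcl, center, ucl) for an X-bar chart.

    Assumption: the standard deviation of the subgroup means is estimated
    as the sample standard deviation of those means. Other estimates
    (for example, ones based on subgroup ranges) are also in common use.
    """
    if len(observations) % subgroup_size != 0:
        raise ValueError("number of observations must be a multiple of subgroup_size")
    # Average each block of consecutive observations.
    means = [
        mean(observations[i:i + subgroup_size])
        for i in range(0, len(observations), subgroup_size)
    ]
    center = mean(means)
    sd_means = stdev(means)
    return means, center - 3 * sd_means, center, center + 3 * sd_means


# Hypothetical measurements, NOT the radiation data from the text.
data = [0.30, 0.42, 0.36, 0.41, 0.39, 0.47, 0.50, 0.44, 0.53, 0.58, 0.49, 0.61]
means, lcl, center, ucl = xbar_chart_limits(data)
print(len(means), round(lcl, 4), round(center, 4), round(ucl, 4))
```

With 42 observations and subgroups of three, the same function would return 14 subgroup means, matching the count described in the text.
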

sample from a batch of microwave ovens produced at about the same time. The second three observations are a sample from a later batch of ovens, and so forth. The subgroup means are plotted in the control chart in Figure 2.21. Since the points plotted are means, the chart is called an X-bar chart. The $3\sigma$ limits UCL and LCL are determined using the standard deviation of the subgroup means. This standard deviation is less than the standard deviation of the original observations. Since we have $n = 42$ observations, there are $42/3 = 14$ sample subgroup means plotted in the chart.

Notice the upward drift in the mean radiation numbers over time. This pattern is consistent with the pattern for the individual observations in Figure 2.20. The drift in the means is clearer than the drift in the original observations, suggesting that there may be a problem with the more recently manufactured ovens.

2.9 CHAPTER SUMMARY

In this chapter, we have learned:

The general practice of summarizing data with a collection of relatively simple tables, plots, and numbers is called exploratory data analysis.

An enumerative study is a study about an identifiable, unchanging, generally finite population done at a particular point in time. The list of sampling units for an enumerative study is called a frame. An enumerative study is like a snapshot.

An analytic study is a study that is not enumerative. Analytic studies generally take place over time, like a moving picture. The objective is to use current information to predict what will happen in the future. Analytic studies often involve comparisons.

The characteristic of interest in either an enumerative or analytic study is called a variable.

Variables are of two types: quantitative and qualitative. Qualitative variables are nonnumerical. The "values" for qualitative variables are categories. Numbers are often assigned to the categories to distinguish them.
If the categories have no particular order, such as male and female, the numbers assigned to the categories are called nominal data. If the categories are ordered, such as the bond ratings Aaa, Aa, . . . , the ordered numbers assigned to the categories are called ordinal data.

Quantitative variables are naturally numerical and may be discrete or continuous. Discrete variables are those variables whose values are numbers with gaps between them, such as the number of votes received by candidates in an election. Continuous variables are those variables that can, in principle, take any value in an interval. Sometimes discrete variables are treated as continuous variables.

The pattern of variability in a set of data is called its frequency distribution. A frequency distribution indicates the values, or categories, for the variable and the number of times each value, or category, occurs.

Frequency distributions are best characterized by graphs or plots. Possible plots for displaying a frequency distribution include a dot diagram, a stem-and-leaf diagram, a histogram or density histogram, and a boxplot.

Summary numbers indicating location include the mean, $\bar{x}$, the median, $M$, and percentiles. The location of the mean is such that the sum of the deviations of the observations from the mean is zero. The median divides the ordered data in half.

Quartiles, $Q_1$, $Q_2$, $Q_3$, are percentiles that divide the ordered data into quarters. Consequently, the second quartile, $Q_2$, is also the median, $M$.

Summary measures that are not appreciably affected by a few extreme observations are said to be robust or resistant. The median, for example, is a robust measure of location.

Summary numbers measuring variation include the following: the variance, $s^2$; the standard deviation, $s$; the range, Range $=$ Max $-$ Min; and the interquartile range, IR $= Q_3 - Q_1$. The interquartile range is a robust measure of variation.

Boxplots are pictorial representations of the five-number summary: Min, $Q_1$, $M$, $Q_3$, and Max.

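
The location and variation measures listed above can be computed in a few lines. The sketch below is illustrative only: the data are made up, and it assumes Python's `statistics.quantiles` with its default "exclusive" method as the quartile convention, which may differ slightly from the convention used in the text.

```python
from statistics import mean, median, quantiles, stdev, variance

# Hypothetical data with one extreme observation (40); not from the text.
data = [2, 4, 4, 5, 7, 8, 9, 11, 40]

x_bar = mean(data)
M = median(data)
# Assumed quartile convention: the default "exclusive" method.
q1, q2, q3 = quantiles(data, n=4)

# The deviations from the mean sum to zero (up to rounding error).
deviation_sum = sum(x - x_bar for x in data)

s2 = variance(data)                 # sample variance
s = stdev(data)                     # sample standard deviation
data_range = max(data) - min(data)  # Range = Max - Min
ir = q3 - q1                        # interquartile range

five_number_summary = (min(data), q1, M, q3, max(data))

# Robustness check: pull the extreme value in to 12 and compare how far
# the mean and the median move.
tamed = data[:-1] + [12]
mean_shift = abs(mean(tamed) - x_bar)
median_shift = abs(median(tamed) - M)

print("mean:", x_bar, "median:", M, "Q2:", q2)
print("five-number summary:", five_number_summary)
print("variance:", s2, "sd:", s, "range:", data_range, "IR:", ir)
print("shift in mean:", mean_shift, "shift in median:", median_shift)
```

The last two quantities show the sense in which the median is robust: changing a single extreme observation moves the mean but leaves the median where it was.
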
Variables $x$ and $y$ that are connected by the expression $y = a + bx$ are said to be connected by a linear transformation. We say that $y$ is a linear transformation of $x$.

When density histograms are symmetric about a single peak and look like the outline of a bell, they can often be approximated by a smooth curve known as the normal density function. The normal density function with mean $\mu$ and standard deviation $\sigma$ is denoted by $N(\mu, \sigma)$. The mean locates the middle of the normal density function along the $x$-axis, and the standard deviation controls the spread or concentration of the normal curve about the mean. As the standard deviation decreases, the normal curve becomes more tightly concentrated about its mean. The normal density function is also called the normal distribution.

The empirical rule allows us to summarize the locations of increasing proportions of a set of numbers using only the sample mean and sample standard deviation. The empirical rule tells us that about 68% of the data lie in the interval $\bar{x} \pm s$; about 95% of the data lie in the interval $\bar{x} \pm 2s$; and about 99.7% of the data lie in the interval $\bar{x} \pm 3s$. The empirical rule works best for large, mound-shaped data sets.

The 68–95–99.7 Rule summarizes the area under the normal curve in terms of standard deviation intervals centered at the mean. For any normal density function,
1. 68% of the area under the curve is contained within 1 standard deviation of the mean.
2. 95% of the area under the curve is contained within 2 standard deviations of the mean.
3. 99.7% of the area under the curve is contained within 3 standard deviations of the mean.

The 68–95–99.7 rule and the empirical rule are related. In fact, the empirical rule comes from assuming that a data frequency distribution can be approximately represented by a normal density function with a mean equal to the sample mean $\bar{x}$ and a standard deviation equal to the sample standard deviation $s$.

The standard normal distribution is a normal distribution with mean 0 and standard deviation 1. If $X$ is distributed as $N(\mu, \sigma)$, then $Y = a + bX$ is distributed as $N(a + b\mu, |b|\sigma)$.
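
Both the 68–95–99.7 rule and the linear transformation result can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the particular values of the mean, standard deviation, $a$, and $b$ are arbitrary choices for illustration, and `area_within` is a helper name introduced here, not something from the text.

```python
from statistics import NormalDist


def area_within(dist, k):
    """Area under the normal curve within k standard deviations of the mean."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower


# Hypothetical N(mu, sigma) with mu = 10 and sigma = 2 (second argument is
# the standard deviation, matching the N(mu, sigma) notation above).
X = NormalDist(mu=10.0, sigma=2.0)

# Areas within 1, 2, and 3 standard deviations; these do not depend on
# the particular mu and sigma chosen.
areas = {k: area_within(X, k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} sd: {area:.4f}")

# Linear transformation Y = a + bX; a negative b is used to show that the
# standard deviation scales by |b|.
a, b = 5.0, -3.0
Y = a + b * X
print("Y mean:", Y.mean, "Y sd:", Y.stdev)

# Special case: standardizing X gives the standard normal distribution.
Z = (X - X.mean) / X.stdev
print("Z mean:", Z.mean, "Z sd:", Z.stdev)
```

The printed areas are the exact normal-curve values that the rule rounds to 68%, 95%, and 99.7%. Standardizing is the linear transformation with $a = -\mu/\sigma$ and $b = 1/\sigma$.
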

2.10 IMPORTANT CONCEPTS AND TOOLS