The mode of a numerical data set is the value that occurs

Type 3 350 361 362 364 364 365 366 371 377 377 377 379 380 380 392 a. Construct a comparative boxplot, and comment on simi- larities and differences. b. Construct a comparative dotplot a dotplot for each sam- ple with a common scale. Comment on similarities and differences. c. Does the comparative boxplot of part a give an inform- ative assessment of similarities and differences? Explain your reasoning. 76. The three measures of center introduced in this chapter are the mean, median, and trimmed mean. Two additional measures of center that are occasionally used are the midrange, which is the average of the smallest and largest observations, and the midfourth, which is the average of the two fourths.Which of these five measures of center are resistant to the effects of out- liers and which are not? Explain your reasoning. 77. The authors of the article “Predictive Model for Pitting Corrosion in Buried Oil and Gas Pipelines” Corrosion, 2009: 332–342 provided the data on which their investiga- tion was based. a. Consider the following sample of 61 observations on maximum pitting depth mm of pipeline specimens buried in clay loam soil. 0.41 0.41 0.41 0.41 0.43 0.43 0.43 0.48 0.48 0.58 0.79 0.79 0.81 0.81 0.81 0.91 0.94 0.94 1.02 1.04 1.04 1.17 1.17 1.17 1.17 1.17 1.17 1.17 1.19 1.19 1.27 1.40 1.40 1.59 1.59 1.60 1.68 1.91 1.96 1.96 1.96 2.10 2.21 2.31 2.46 2.49 2.57 2.74 3.10 3.18 3.30 3.58 3.58 4.15 4.75 5.33 7.65 7.70 8.13 10.41 13.44 Construct a stem-and-leaf display in which the two largest values are shown in a last row labeled HI. b. Refer back to a, and create a histogram based on eight classes with 0 as the lower limit of the first class and class widths of .5, .5, .5, .5, 1, 2, 5, and 5, respectively. c. The accompanying comparative boxplot from Minitab shows plots of pitting depth for four different types of soils. Describe its important features. 78. Consider a sample and suppose that the values of , s 2 , and s have been calculated. a. Let for . How do the values of s 2 and s for the y i ’ s compare to the corresponding values for the x i ’ s? Explain. b. Let for . What are the values of the sample variance and sample standard deviation for the z i ’ s? 79. Let and denote the sample mean and variance for the sample and let and denote these quanti- ties when an additional observation is added to the sample. a. Show how can be computed from and . b. Show that so that can be computed from , , and . c. Suppose that a sample of 15 strands of drapery yarn has resulted in a sample mean thread elongation of 12.58 mm and a sample standard deviation of .512 mm. A 16 th strand results in an elongation value of 11.8. What are the values of the sample mean and sample standard devi- ation for all 16 elongation observations? 80. Lengths of bus routes for any particular transit system will typically vary from one route to another. The article “Planning of City Bus Routes” J. of the Institution of Engineers, 1995: 211–215 gives the following information on lengths km for one particular system: Length Frequency 6 23 30 35 32 Length Frequency 48 42 40 28 27 Length Frequency 26 14 27 11 2 402,45 352,40 302,35 282,30 262,28 242,26 222,24 202,22 182,20 162,18 142,16 122,14 102,12 82,10 62,8 s n 2 x n x n1 1 s n1 1 2 ns n1 1 2 5 n 2 1s n 2 1 n n 1 1 x n1 1 2 x n 2 x n1 1 x n x n1 1 x n1 1 s n1 1 2 x n1 1 x 1 , c , x n s n 2 x n i 5 1, c , n z i 5 x i 2 x s i 5 1, c , n y i 5 x i 2 x x x 1 , x 2 , c , x n Maximum pit depth 14 12 10 8 6 4 2 C CL SCL SYCL Soil type Comparative boxplot for Exercise 77 Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook andor eChapters. Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. a. Draw a histogram corresponding to these frequencies. b. What proportion of these route lengths are less than 20? What proportion of these routes have lengths of at least 30? c. Roughly what is the value of the 90 th percentile of the route length distribution? d. Roughly what is the median route length? 81. A study carried out to investigate the distribution of total braking time reaction time plus accelerator-to-brake move- ment time, in ms during real driving conditions at 60 kmhr gave the following summary information on the distribution of times “A Field Study on Braking Responses During Driving,” Ergonomics, 1995: 1903–1910: What can you conclude about the shape of a histogram of this data? Explain your reasoning. 82. The sample data sometimes represents a time series, where value of a response variable x at time t. Often the observed series shows a great deal of random variation, which makes it difficult to study longer- term behavior. In such situations, it is desirable to produce a smoothed version of the series. One technique for doing so involves exponential smoothing. The value of a smoothing constant is chosen . Then with at time t, we set , and for , . a. Consider the following time series in which of effluent at a sewage treatment plant on day t: 47, 54, 53, 50, 46, 46, 47, 50, 51, 50, 46, 52, 50, 50. Plot each x t against t on a two-dimensional coordinate system a time-series plot. Does there appear to be any pattern? b. Calculate the using . Repeat using . Which value of gives a smoother series? x t a a 5 .5 a 5 .1 x t ’s x t 5 temperature 8F x t

5 ax

t 1 1 2 ax t2 1 t 5 2, 3, c , n x 1 5 x 1 x t 5 smoothed value 0 , a , 1 a x t 5 the observed x 1 , x 2 , c , x n 95th percentile 5 720 90th percentile 5 640 10th percentile 5 430 5th percentile 5 400 maximum 5 925 minimum 5 220 sd 5 96 mode 5 500 median 5 500 mean 5 535 c. Substitute on the right-hand side of the expression for , then substitute in terms of and , and so on. On how many of the values does depend? What happens to the coefficient on as k increases? d. Refer to part c. If t is large, how sensitive is to the ini- tialization ? Explain. [Note: A relevant reference is the article “Simple Statistics for Interpreting Environmental Data,” Water Pollution Control Fed. J., 1981: 167–175.] 83. Consider numerical observations . It is frequently of interest to know whether the x i s are at least approxi- mately symmetrically distributed about some value. If n is at least moderately large, the extent of symmetry can be assessed from a stem-and-leaf display or histogram. However, if n is not very large, such pictures are not partic- ularly informative. Consider the following alternative. Let y 1 denote the smallest x i , y 2 the second smallest x i , and so on. Then plot the following pairs as points on a two-dimensional coordinate system: There are n2 points when n is even and when n is odd. a. What does this plot look like when there is perfect sym- metry in the data? What does it look like when observa- tions stretch out more above the median than below it a long upper tail? b. The accompanying data on rainfall acre-feet from 26 seeded clouds is taken from the article “A Bayesian Analysis of a Multiplicative Treatment Effect in Weather Modification” Technometrics, 1975: 161–166. Construct the plot and comment on the extent of sym- metry or nature of departure from symmetry. 4.1 7.7 17.5 31.4 32.7 40.6 92.4 115.3 118.3 119.0 129.6 198.6 200.7 242.5 255.0 274.7 274.7 302.8 334.1 430.0 489.1 703.4 978.0 1656.0 1697.8 2745.6 n 2 12 y n2 2 2 x |, x| 2 y 3 , c y n2 1 2 x |, x| 2 y 2 , y n 2 x |, x| 2 y 1 , x 1 , c , x n x 1 5 x 1 x t x t2k x t x t , x t2 1 , c , x 1 x t2 3 x t2 2 x t2 2 x t x t2 1

5 ax

t2 1 1 1 2 ax t2 2 Bibliography Chambers, John, William Cleveland, Beat Kleiner, and Paul Tukey, Graphical Methods for Data Analysis, BrooksCole, Pacific Grove, CA, 1983. A highly recommended presentation of various graphical and pictorial methodology in statistics. Cleveland, William, Visualizing Data, Hobart Press, Summit, NJ, 1993. An entertaining tour of pictorial techniques. Peck, Roxy, and Jay Devore, Statistics: The Exploration and Analysis of Data 6th ed., Thomson BrooksCole, Belmont, CA, 2008. The first few chapters give a very nonmathemati- cal survey of methods for describing and summarizing data. Freedman, David, Robert Pisani, and Roger Purves, Statistics 4th ed., Norton, New York, 2007. An excellent, very nonmathe- matical survey of basic statistical reasoning and methodology. Hoaglin, David, Frederick Mosteller, and John Tukey, Understanding Robust and Exploratory Data Analysis, Wiley, New York, 1983. Discusses why, as well as how, exploratory methods should be employed; it is good on details of stem-and-leaf displays and boxplots. Moore, David, and William Notz, Statistics: Concepts and Controversies 7th ed., Freeman, San Francisco, 2009. An extremely readable and entertaining paperback that contains an intuitive discussion of problems connected with sampling and designed experiments. Peck, Roxy, et al. eds., Statistics: A Guide to the Unknown 4th ed., Thomson BrooksCole, Belmont, CA, 2006. Contains many short nontechnical articles describing various applica- tions of statistics. Verzani, John, Using R for Introductory Statistics, Chapman and HallCRC, Boca Raton, FL, 2005. A very nice introduction to the R software package. Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook andor eChapters. Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 50 Probability 2 INTRODUCTION The term probability refers to the study of randomness and uncertainty. In any situation in which one of a number of possible outcomes may occur, the disci- pline of probability provides methods for quantifying the chances, or likeli- hoods, associated with the various outcomes. The language of probability is constantly used in an informal manner in both written and spoken contexts. Examples include such statements as “It is likely that the Dow Jones average will increase by the end of the year,” “There is a 50–50 chance that the incum- bent will seek reelection,” “There will probably be at least one section of that course offered next year,” “The odds favor a quick settlement of the strike,” and “It is expected that at least 20,000 concert tickets will be sold.” In this chapter, we introduce some elementary probability concepts, indicate how probabilities can be interpreted, and show how the rules of probability can be applied to compute the probabilities of many interesting events. The method- ology of probability will then permit us to express in precise language such informal statements as those given above. The study of probability as a branch of mathematics goes back over 300 years, where it had its genesis in connection with questions involving games of chance. Many books are devoted exclusively to probability, but our objective here is to cover only that part of the subject that has the most direct bearing on problems of statistical inference. Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook andor eChapters. Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Example 2.3 Example 2.2 Example 2.1 DEFINITION 2.1 Sample Spaces and Events An experiment is any activity or process whose outcome is subject to uncertainty. Although the word experiment generally suggests a planned or carefully controlled laboratory testing situation, we use it here in a much wider sense. Thus experiments that may be of interest include tossing a coin once or several times, selecting a card or cards from a deck, weighing a loaf of bread, ascertaining the commuting time from home to work on a particular morning, obtaining blood types from a group of individuals, or measuring the compressive strengths of different steel beams. The Sample Space of an Experiment The sample space of an experiment, denoted by , is the set of all possible outcomes of that experiment. S The simplest experiment to which probability applies is one with two possible out- comes. One such experiment consists of examining a single fuse to see whether it is defective. The sample space for this experiment can be abbreviated as , where N represents not defective, D represents defective, and the braces are used to enclose the elements of a set. Another such experiment would involve tossing a thumbtack and noting whether it landed point up or point down, with sample space , and yet another would consist of observing the gender of the next child born at the local hospital, with . ■ If we examine three fuses in sequence and note the result of each examination, then an outcome for the entire experiment is any sequence of N’s and D’s of length 3, so If we had tossed a thumbtack three times, the sample space would be obtained by replacing N by U in above, with a similar notational change yielding the sample space for the experiment in which the genders of three newborn children are observed. ■ Two gas stations are located at a certain intersection. Each one has six gas pumps. Consider the experiment in which the number of pumps in use at a particular time of day is determined for each of the stations. An experimental outcome specifies how many pumps are in use at the first station and how many are in use at the second one. One possible outcome is 2, 2, another is 4, 1, and yet another is 1, 4. The 49 outcomes in are displayed in the accompanying table. The sample space for the experiment in which a six-sided die is thrown twice results from deleting the 0 row and 0 column from the table, giving 36 outcomes. S S S 5 5NNN, NND, NDN, NDD, DNN, DND, DDN, DDD6 S 5 5M, F6 S 5 5U, D6 S 5 5N, D6 Second Station 1 2 3 4 5 6 0, 0 0, 1 0, 2 0, 3 0, 4 0, 5 0, 6 1 1, 0 1, 1 1, 2 1, 3 1, 4 1, 5 1, 6 2 2, 0 2, 1 2, 2 2, 3 2, 4 2, 5 2, 6 First Station 3 3, 0 3, 1 3, 2 3, 3 3, 4 3, 5 3, 6 4 4, 0 4, 1 4, 2 4, 3 4, 4 4, 5 4, 6 5 5, 0 5, 1 5, 2 5, 3 5, 4 5, 5 5, 6 6 6, 0 6, 1 6, 2 6, 3 6, 4 6, 5 6, 6 ■ Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook andor eChapters. Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Example 2.5 DEFINITION A reasonably large percentage of C⫹⫹ programs written at a particular company compile on the first run, but some do not a compiler is a program that translates source code, in this case C⫹⫹ programs, into machine language so programs can be executed. Suppose an experiment consists of selecting and compiling C⫹⫹ pro- grams at this location one by one until encountering a program that compiles on the first run. Denote a program that compiles on the first run by S for success and one that doesn’t do so by F for failure. Although it may not be very likely, a possible outcome of this experiment is that the first 5 or 10 or 20 or . . . are F’s and the next one is an S. That is, for any positive integer n, we may have to examine n programs before seeing the first S. The sample space is , which contains an infinite number of possible outcomes. The same abbreviated form of the sample space is appropriate for an experiment in which, starting at a specified time, the gender of each newborn infant is recorded until the birth of a male is observed. ■ Events In our study of probability, we will be interested not only in the individual outcomes of but also in various collections of outcomes from . S S S 5 5S, FS, FFS, FFFS, c 6 An event is any collection subset of outcomes contained in the sample space . An event is simple if it consists of exactly one outcome and compound if it consists of more than one outcome. S When an experiment is performed, a particular event A is said to occur if the result- ing experimental outcome is contained in A. In general, exactly one simple event will occur, but many compound events will occur simultaneously. Consider an experiment in which each of three vehicles taking a particular freeway exit turns left L or right R at the end of the exit ramp. The eight possible outcomes that comprise the sample space are LLL, RLL, LRL, LLR, LRR, RLR, RRL, and RRR. Thus there are eight simple events, among which are . Some compound events include the event that exactly one of the three vehicles turns right the event that at most one of the vehicles turns right the event that all three vehicles turn in the same direction Suppose that when the experiment is performed, the outcome is LLL. Then the sim- ple event E 1 has occurred and so also have the events B and C but not A. ■ When the number of pumps in use at each of two six-pump gas stations is observed, there are 49 possible outcomes, so there are 49 simple events: . Examples of compound events are E 1 5 50, 06, E 2 5 50, 16, c , E 49 5 56, 66 C 5 5LLL, RRR6 5 B 5 5LLL, RLL, LRL, LLR6 5 A 5 5RLL, LRL, LLR6 5 E 1 5 5LLL6 and E 5 5 5LRR6 Example 2.6 Example 2.3 continued Example 2.4 Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook andor eChapters. Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Example 2.9 Example 2.4 continued Example 2.8 Example 2.3 continued DEFINITION the event that the number of pumps in use is the same for both stations the event that the total number of pumps in use is four the event that at most one pump is in use at each station ■ The sample space for the program compilation experiment contains an infinite num- ber of outcomes, so there are an infinite number of simple events. Compound events include the event that an even number of programs are examined ■ Some Relations from Set Theory An event is just a set, so relationships and results from elementary set theory can be used to study events. The following operations will be used to create new events from given events. E 5 5FS, FFFS, FFFFFS, c 6 5 A 5 5S, FS, FFS6 5 the event that at most three programs are examined C 5 50, 0, 0, 1, 1, 0, 1, 16 5 B 5 50, 4, 1, 3, 2, 2, 3, 1, 4, 06 5 A 5 50, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 66 5

1. The complement of an event A, denoted by , is the set of all outcomes in

that are not contained in A.

2. The union of two events A and B, denoted by

and read “A or B,” is the event consisting of all outcomes that are either in A or in B or in both events so that the union includes outcomes for which both A and B occur as well as outcomes for which exactly one occurs—that is, all outcomes in at least one of the events.

3. The intersection of two events A and B, denoted by

and read “A and B, ” is the event consisting of all outcomes that are in both A and B. A ¨ B A ´ B S A r For the experiment in which the number of pumps in use at a single six-pump gas station is observed, let , , and . Then ■ In the program compilation experiment, define A, B, and C by Then ■ A ´ B 5 5S, FS, FFS, FFFFS6, A ¨ B 5 5S, FFS6 A r 5 5FFFS, FFFFS, FFFFFS, c 6, Cr 5 5S, FFS, FFFFS, c 6 A 5 5S, FS, FFS6, B 5 5S, FFS, FFFFS6, C 5 5FS, FFFS, FFFFFS, c 6 A ¨ B 5 53, 46, A ¨ C 5 51, 36, A ¨ Cr 5 50, 2, 4, 5, 66 A r 5 55, 66, A ´ B 5 50, 1, 2, 3, 4, 5, 66 5 S, A ´ C 5 50, 1, 2, 3, 4, 56, C 5 51, 3, 56 B 5 53, 4, 5, 66 A 5 50, 1, 2, 3, 46 Example 2.7 Example 2.4 continued Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook andor eChapters. Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Example 2.10 DEFINITION Sometimes A and B have no outcomes in common, so that the intersection of A and B contains no outcomes. Let denote the null event the event consisting of no outcomes whatsoever. When , A and B are said to be mutually exclusive or disjoint events. A ¨ B 5 [ [ A small city has three automobile dealerships: a GM dealer selling Chevrolets and Buicks; a Ford dealer selling Fords and Lincolns; and a Toyota dealer. If an experi- ment consists of observing the brand of the next car sold, then the events and are mutually exclusive because the next car sold cannot be both a GM product and a Ford product at least until the two companies merge. ■ The operations of union and intersection can be extended to more than two events. For any three events A, B, and C, the event is the set of outcomes contained in at least one of the three events, whereas is the set of out- comes contained in all three events. Given events , these events are said to be mutually exclusive or pairwise disjoint if no two events have any out- comes in common. A pictorial representation of events and manipulations with events is obtained by using Venn diagrams. To construct a Venn diagram, draw a rectangle whose interior will represent the sample space . Then any event A is represented as the interior of a closed curve often a circle contained in . Figure 2.1 shows examples of Venn diagrams. S S A 1 , A 2 , A 3 , c A ¨ B ¨ C A ´ B ´ C B 5 5Ford, Lincoln6 A 5 5Chevrolet, Buick6 EXERCISES Section 2.1 1–10 1. Four universities—1, 2, 3, and 4—are participating in a holi- day basketball tournament. In the first round, 1 will play 2 and 3 will play 4. Then the two winners will play for the championship, and the two losers will also play. One possi- ble outcome can be denoted by 1324 1 beats 2 and 3 beats 4 in first-round games, and then 1 beats 3 and 2 beats 4. a. List all outcomes in . b. Let A denote the event that 1 wins the tournament. List outcomes in A. c. Let B denote the event that 2 gets into the championship game. List outcomes in B. d. What are the outcomes in and in ? What are the outcomes in A⬘? 2. Suppose that vehicles taking a particular freeway exit can turn right R, turn left L, or go straight S. Consider A ¨ B A ´ B S observing the direction for each of three successive vehicles. a. List all outcomes in the event A that all three vehicles go in the same direction. b. List all outcomes in the event B that all three vehicles take different directions. c. List all outcomes in the event C that exactly two of the three vehicles turn right. d. List all outcomes in the event D that exactly two vehicles go in the same direction. e. List outcomes in D⬘, , and . 3. Three components are connected to form a system as shown in the accompanying diagram. Because the components in the 2–3 subsystem are connected in parallel, that subsystem will function if at least one of the two individual components C ¨ D C ´ D A B a Venn diagram of events A and B A B e Mutually exclusive events A B c Shaded region is A 傼 B A d Shaded region is A A B b Shaded region is A 傽 B Figure 2.1 Venn diagrams Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook andor eChapters. Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.