Some Important Statistics

8.2 Some Important Statistics

  Our main purpose in selecting random samples is to elicit information about the unknown population parameters. Suppose, for example, that we wish to arrive at

  a conclusion concerning the proportion of coffee-drinkers in the United States who prefer a certain brand of coffee. It would be impossible to question every coffee- drinking American in order to compute the value of the parameter p representing the population proportion. Instead, a large random sample is selected and the proportion ˆ p of people in this sample favoring the brand of coffee in question is calculated. The value ˆ p is now used to make an inference concerning the true proportion p.

  Now, ˆ p is a function of the observed values in the random sample; since many

  Chapter 8 Fundamental Sampling Distributions and Data Descriptions

  random samples are possible from the same population, we would expect ˆ p to vary somewhat from sample to sample. That is, ˆ p is a value of a random variable that we represent by P . Such a random variable is called a statistic.

  Definition 8.4: Any function of the random variables constituting a random sample is called a

  statistic .

  Location Measures of a Sample: The Sample Mean, Median, and Mode

  In Chapter 4 we introduced the two parameters μ and σ 2 , which measure the center

  of location and the variability of a probability distribution. These are constant population parameters and are in no way affected or influenced by the observations of a random sample. We shall, however, define some important statistics that describe corresponding measures of a random sample. The most commonly used statistics for measuring the center of a set of data, arranged in order of magnitude, are the mean, median, and mode. Although the first two of these statistics were

  defined in Chapter 1, we repeat the definitions here. Let X 1 ,X 2 ,...,X n represent

  n random variables.

  (a) Sample mean:

  Note that the statistic ¯ X assumes the value ¯ x= 1 n x i when X 1 assumes the

  i=1

  value x 1 ,X 2 assumes the value x 2 , and so forth. The term sample mean is applied

  to both the statistic ¯ X and its computed value ¯ x.

  (b) Sample median:

  x (n+1)2 ,

  if n is odd,

  x= ˜ 1

  2 (x n2 +x n2+1 ), if n is even.

  The sample median is also a location measure that shows the middle value of the sample. Examples for both the sample mean and the sample median can be found in Section 1.3. The sample mode is defined as follows.

  (c) The sample mode is the value of the sample that occurs most often.

  Example 8.1: Suppose a data set consists of the following observations:

  The sample mode is 0.43, since this value occurs more than any other value.

  As we suggested in Chapter 1, a measure of location or central tendency in a sample does not by itself give a clear indication of the nature of the sample. Thus,

  a measure of variability in the sample must also be considered.

  8.2 Some Important Statistics

  Variability Measures of a Sample: The Sample Variance, Standard Deviation, and Range

  The variability in a sample displays how the observations spread out from the average. The reader is referred to Chapter 1 for more discussion. It is possible to have two sets of observations with the same mean or median that differ considerably in the variability of their measurements about the average.

  Consider the following measurements, in liters, for two samples of orange juice bottled by companies A and B:

  Sample A 0.97 1.00 0.94 1.03 1.06 Sample B 1.06 1.01 0.88 0.91 1.14

  Both samples have the same mean, 1.00 liter. It is obvious that company A bottles orange juice with a more uniform content than company B. We say that the variability, or the dispersion, of the observations from the average is less for sample A than for sample B. Therefore, in buying orange juice, we would feel more confident that the bottle we select will be close to the advertised average if we buy from company A.

  In Chapter 1 we introduced several measures of sample variability, including the sample variance, sample standard deviation, and sample range. In

  this chapter, we will focus mainly on the sample variance. Again, let X 1 ,...,X n

  represent n random variables.

  (a) Sample variance:

  The computed value of S 2 for a given sample is denoted by s 2 . Note that

  S 2 is essentially defined to be the average of the squares of the deviations of the

  observations from their mean. The reason for using n − 1 as a divisor rather than the more obvious choice n will become apparent in Chapter 9.

  Example 8.2:

  A comparison of coffee prices at 4 randomly selected grocery stores in San Diego showed increases from the previous month of 12, 15, 17, and 20 cents for a 1-pound bag. Find the variance of this random sample of price increases.

  Solution : Calculating the sample mean, we get

  Whereas the expression for the sample variance best illustrates that S 2 is a

  measure of variability, an alternative expression does have some merit and thus the reader should be aware of it. The following theorem contains this expression.

  Chapter 8 Fundamental Sampling Distributions and Data Descriptions

  Theorem 8.1: If S 2 is the variance of a random sample of size n, we may write

  Proof : By definition,

  As in Chapter 1, the sample standard deviation and the sample range are defined below.

  (b) Sample standard deviation:

  √ S= S 2 ,

  where S 2 is the sample variance.

  Let X max denote the largest of the X i values and X min the smallest. (c) Sample range:

  R=X max −X min .

  Example 8.3: Find the variance of the data 3, 4, 5, 6, 6, and 7, representing the number of trout

  caught by a random sample of 6 fishermen on June 19, 1996, at Lake Muskoka.

  Solution : We find that

  x 2 i = 171,

  x i = 31, and n = 6. Hence,

  Thus, the sample standard deviation s = 136 = 1.47 and the sample range is

  7 − 3 = 4.

  Exercises

  8.1 Define suitable populations from which the fol- (c) Two hundred pairs of a new type of tennis shoe lowing samples are selected:

  were tested on the professional tour and, on aver-

  (a) Persons in 200 homes in the city of Richmond are

  age, lasted 4 months.

  called on the phone and asked to name the candi- (d) On five different occasions it took a lawyer 21, 26, date they favor for election to the school board.

  24, 22, and 21 minutes to drive from her suburban

  (b) A coin is tossed 100 times and 34 tails are recorded.

  home to her midtown office.

  Exercises

  8.2 The lengths of time, in minutes, that 10 patients of laundry, in grams, for a random sample of various waited in a doctor’s office before receiving treatment types of detergents used according to the prescribed were recorded as follows: 5, 11, 9, 5, 10, 15, 6, 10, 5, directions: and 10. Treating the data as a random sample, find

  Laundry

  Phosphates per Load

  (a) the mean;

  (b) the median;

  A P Blue Sail

  (c) the mode.

  Concentrated All

  Cold Water All

  8.3 The reaction times for a random sample of 9 sub-

  Breeze

  jects to a stimulant were recorded as 2.5, 3.6, 3.1, 4.3,

  Oxydol

  2.9. 2.3, 2.6, 4.1, and 3.4 seconds. Calculate

  Ajax

  (a) the mean;

  (b) the median.

  Fab

  Cold Power

  8.4 The number of tickets issued for traffic violations

  Rinso

  by 8 state troopers during the Memorial Day weekend For the given phosphate data, find are 5, 4, 7, 7, 6, 3, 8, and 6.

  (a) the mean;

  (a) If these values represent the number of tickets is-

  sued by a random sample of 8 state troopers from (b) the median; Montgomery County in Virginia, define a suitable (c) the mode. population.

  (b) If the values represent the number of tickets issued 8.9 Consider the data in Exercise 8.2, find

  by a random sample of 8 state troopers from South (a) the range; Carolina, define a suitable population.

  (b) the standard deviation.

  8.5 The numbers of incorrect answers on a true-false

  competency test for a random sample of 15 students 8.10 calculate For the sample of reaction times in Exercise 8.3,

  were recorded as follows: 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5,

  2, 1, 4, and 2. Find

  (a) the range;

  (a) the mean;

  (b) the variance, using the formula of form (8.2.1).

  (b) the median;

  (c) the mode.

  8.11 For the data of Exercise 8.5, calculate the vari- ance using the formula

  (a) of form (8.2.1);

  8.6 Find the mean, median, and mode for the sample whose observations, 15, 7, 8, 95, 19, 12, 8, 22, and 14, (b) in Theorem 8.1. represent the number of sick days claimed on 9 fed-

  eral income tax returns. Which value appears to be 8.12 The tar contents of 8 brands of cigarettes se-

  the best measure of the center of these data? State lected at random from the latest list released by the reasons for your preference.

  Federal Trade Commission are as follows: 7.3, 8.6, 10.4, 16.1, 12.2, 15.1, 14.5, and 9.3 milligrams. Calculate

  8.7 A random sample of employees from a local man- (a) the mean; ufacturing plant pledged the following donations, in (b) the variance. dollars, to the United Fund: 100, 40, 75, 15, 20, 100,

  75, 50, 30, 10, 55, 75, 25, 50, 90, 80, 15, 25, 45, and 8.13 The grade-point averages of 20 college seniors

  100. Calculate

  selected at random from a graduating class are as fol-

  (a) the mean;

  lows:

  (b) the mode.

  8.8 According to ecology writer Jacqueline Killeen,

  phosphates contained in household detergents pass Calculate the standard deviation. right through our sewer systems, causing lakes to turn

  into swamps that eventually dry up into deserts. The 8.14 (a) Show that the sample variance is unchanged

  following data show the amount of phosphates per load

  if a constant c is added to or subtracted from each

  Chapter 8 Fundamental Sampling Distributions and Data Descriptions

  value in the sample.

  8.16 In the 2004-05 football season, University of

  (b) Show that the sample variance becomes c 2 times Southern California had the following score differences

  its original value if each observation in the sample for the 13 games it played. is multiplied by c.

  8.15 Verify that the variance of the sample 4, 9, 3, 6, 4, and 7 is 5.1, and using this fact, along with the results of Exercise 8.14, find

  Find

  (a) the variance of the sample 12, 27, 9, 18, 12, and 21; (a) the mean score difference; (b) the variance of the sample 9, 14, 8, 11, 9, and 12.

  (b) the median score difference.