Some Important Statistics
8.2 Some Important Statistics
Our main purpose in selecting random samples is to elicit information about the unknown population parameters. Suppose, for example, that we wish to arrive at
a conclusion concerning the proportion of coffee-drinkers in the United States who prefer a certain brand of coffee. It would be impossible to question every coffee- drinking American in order to compute the value of the parameter p representing the population proportion. Instead, a large random sample is selected and the proportion ˆ p of people in this sample favoring the brand of coffee in question is calculated. The value ˆ p is now used to make an inference concerning the true proportion p.
Now, ˆ p is a function of the observed values in the random sample; since many
228 Chapter 8 Fundamental Sampling Distributions and Data Descriptions random samples are possible from the same population, we would expect ˆ p to vary
somewhat from sample to sample. That is, ˆ p is a value of a random variable that we represent by P . Such a random variable is called a statistic.
Definition 8.4: Any function of the random variables constituting a random sample is called a statistic.
Location Measures of a Sample: The Sample Mean, Median, and Mode
In Chapter 4 we introduced the two parameters μ and σ 2 , which measure the center of location and the variability of a probability distribution. These are constant population parameters and are in no way affected or influenced by the observations of a random sample. We shall, however, define some important statistics that describe corresponding measures of a random sample. The most commonly used statistics for measuring the center of a set of data, arranged in order of magnitude, are the mean, median, and mode. Although the first two of these statistics were
defined in Chapter 1, we repeat the definitions here. Let X 1 ,X 2 ,...,X n represent n random variables.
(a) Sample mean:
X= ¯ 1 X i .
n i=1
1 Note that the statistic ¯ n X assumes the value ¯ x=
x i when X 1 assumes the
i=1
value x 1 ,X 2 assumes the value x 2 , and so forth. The term sample mean is applied to both the statistic ¯ X and its computed value ¯ x.
(b) Sample median:
(n+1)/2 ,
if n is odd,
x= ˜
2 (x n/2 +x n/2+1 ), if n is even. The sample median is also a location measure that shows the middle value of the
sample. Examples for both the sample mean and the sample median can be found in Section 1.3. The sample mode is defined as follows.
(c) The sample mode is the value of the sample that occurs most often. Example 8.1: Suppose a data set consists of the following observations:
0.32 0.53 0.28 0.37 0.47 0.43 0.36 0.42 0.38 0.43. The sample mode is 0.43, since this value occurs more than any other value.
As we suggested in Chapter 1, a measure of location or central tendency in a sample does not by itself give a clear indication of the nature of the sample. Thus,
a measure of variability in the sample must also be considered.
8.2 Some Important Statistics 229
Variability Measures of a Sample: The Sample Variance, Standard Deviation, and Range
The variability in a sample displays how the observations spread out from the average. The reader is referred to Chapter 1 for more discussion. It is possible to have two sets of observations with the same mean or median that differ considerably in the variability of their measurements about the average.
Consider the following measurements, in liters, for two samples of orange juice bottled by companies A and B:
Sample A 0.97 1.00 0.94 1.03 1.06 Sample B 1.06 1.01 0.88 0.91 1.14
Both samples have the same mean, 1.00 liter. It is obvious that company A bottles orange juice with a more uniform content than company B. We say that the variability, or the dispersion, of the observations from the average is less for sample A than for sample B. Therefore, in buying orange juice, we would feel more confident that the bottle we select will be close to the advertised average if we buy from company A.
In Chapter 1 we introduced several measures of sample variability, including the sample variance, sample standard deviation, and sample range. In this chapter, we will focus mainly on the sample variance. Again, let X 1 ,...,X n represent n random variables.
(a) Sample variance:
The computed value of S 2 for a given sample is denoted by s 2 . Note that S 2 is essentially defined to be the average of the squares of the deviations of the observations from their mean. The reason for using n − 1 as a divisor rather than the more obvious choice n will become apparent in Chapter 9.
Example 8.2:
A comparison of coffee prices at 4 randomly selected grocery stores in San Diego showed increases from the previous month of 12, 15, 17, and 20 cents for a 1-pound bag. Find the variance of this random sample of price increases.
Solution : Calculating the sample mean, we get
Whereas the expression for the sample variance best illustrates that S 2 is a measure of variability, an alternative expression does have some merit and thus the reader should be aware of it. The following theorem contains this expression.
230 Chapter 8 Fundamental Sampling Distributions and Data Descriptions
Theorem 8.1: If S 2 is the variance of a random sample of size n, we may write
Proof : By definition,
As in Chapter 1, the sample standard deviation and the sample range are defined below.
(b) Sample standard deviation:
√ S= S 2 ,
where S 2 is the sample variance.
Let X max denote the largest of the X i values and X min the smallest. (c) Sample range:
R=X max −X min .
Example 8.3: Find the variance of the data 3, 4, 5, 6, 6, and 7, representing the number of trout caught by a random sample of 6 fishermen on June 19, 1996, at Lake Muskoka.
Solution : We find that
x 2 i = 171,
x i = 31, and n = 6. Hence,
Thus, the sample standard deviation s = 13/6 = 1.47 and the sample range is
Exercises
8.1 Define suitable populations from which the fol- (c) Two hundred pairs of a new type of tennis shoe lowing samples are selected:
were tested on the professional tour and, on aver- (a) Persons in 200 homes in the city of Richmond are
age, lasted 4 months.
called on the phone and asked to name the candi- (d) On five different occasions it took a lawyer 21, 26, date they favor for election to the school board.
24, 22, and 21 minutes to drive from her suburban (b) A coin is tossed 100 times and 34 tails are recorded.
home to her midtown office.
Exercises 231 8.2 The lengths of time, in minutes, that 10 patients of laundry, in grams, for a random sample of various
waited in a doctor’s office before receiving treatment types of detergents used according to the prescribed were recorded as follows: 5, 11, 9, 5, 10, 15, 6, 10, 5, directions: and 10. Treating the data as a random sample, find
Phosphates per Load (a) the mean;
Laundry
(grams) (b) the median;
Detergent
A & P Blue Sail
47 (c) the mode.
Dash
Concentrated All
42 8.3 The reaction times for a random sample of 9 sub-
Cold Water All
41 jects to a stimulant were recorded as 2.5, 3.6, 3.1, 4.3,
Breeze
34 2.9. 2.3, 2.6, 4.1, and 3.4 seconds. Calculate
Oxydol
31 (a) the mean;
Ajax
30 (b) the median.
Cold Power
29 8.4 The number of tickets issued for traffic violations
Bold
26 by 8 state troopers during the Memorial Day weekend For the given phosphate data, find
Rinso
are 5, 4, 7, 7, 6, 3, 8, and 6.
(a) the mean;
(a) If these values represent the number of tickets is- sued by a random sample of 8 state troopers from (b) the median; Montgomery County in Virginia, define a suitable (c) the mode. population.
(b) If the values represent the number of tickets issued 8.9 Consider the data in Exercise 8.2, find by a random sample of 8 state troopers from South (a) the range; Carolina, define a suitable population.
(b) the standard deviation.
8.5 The numbers of incorrect answers on a true-false 8.10 For the sample of reaction times in Exercise 8.3, competency test for a random sample of 15 students calculate
were recorded as follows: 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, and 2. Find
(a) the range;
(a) the mean; (b) the variance, using the formula of form (8.2.1). (b) the median;
(c) the mode. 8.11 For the data of Exercise 8.5, calculate the vari- ance using the formula
(a) of form (8.2.1);
8.6 Find the mean, median, and mode for the sample whose observations, 15, 7, 8, 95, 19, 12, 8, 22, and 14, (b) in Theorem 8.1. represent the number of sick days claimed on 9 fed- eral income tax returns. Which value appears to be
8.12 The tar contents of 8 brands of cigarettes se- the best measure of the center of these data? State lected at random from the latest list released by the reasons for your preference.
Federal Trade Commission are as follows: 7.3, 8.6, 10.4, 16.1, 12.2, 15.1, 14.5, and 9.3 milligrams. Calculate
8.7 A random sample of employees from a local man- (a) the mean; ufacturing plant pledged the following donations, in (b) the variance. dollars, to the United Fund: 100, 40, 75, 15, 20, 100,
75, 50, 30, 10, 55, 75, 25, 50, 90, 80, 15, 25, 45, and 8.13 The grade-point averages of 20 college seniors 100. Calculate
selected at random from a graduating class are as fol- (a) the mean;
lows:
(b) the mode. 3.2 1.9 2.7 2.4 2.8 2.9 3.8 3.0 2.5 3.3 1.8 2.5 3.7 2.8 2.0
8.8 According to ecology writer Jacqueline Killeen, 3.2 2.3 2.1 2.5 1.9 phosphates contained in household detergents pass Calculate the standard deviation.
right through our sewer systems, causing lakes to turn into swamps that eventually dry up into deserts. The
8.14 (a) Show that the sample variance is unchanged following data show the amount of phosphates per load
if a constant c is added to or subtracted from each
232 Chapter 8 Fundamental Sampling Distributions and Data Descriptions value in the sample.
8.16 In the 2004-05 football season, University of (b) Show that the sample variance becomes c 2 times Southern California had the following score differences
its original value if each observation in the sample for the 13 games it played. is multiplied by c.
11 49 32 3 6 38 38 30 8 40 31 5 36 8.15 Verify that the variance of the sample 4, 9, 3,
6, 4, and 7 is 5.1, and using this fact, along with the results of Exercise 8.14, find
Find
(a) the variance of the sample 12, 27, 9, 18, 12, and 21; (a) the mean score difference; (b) the variance of the sample 9, 14, 8, 11, 9, and 12.
(b) the median score difference.