Mean and median salaries of machinists at Companies A and B (Exercise 2.24):

                   Company A    Company B
Mean salary           27,000       23,500
Median salary         22,000       25,000
CHAPTER 2 DESCRIBING PATTERNS IN DATA
The data displays and summary numbers we have considered to this point are helpful in organizing large sets of numbers, and they are particularly appropriate for
measurements of quantitative variables.
2.18 Given the four observations
3 1 4
calculate the sample variance, sample standard deviation, range, and interquartile range.

2.19 Calculate the sample mean, variance, and standard deviation for each of the following data sets:
a. 6, 9, 7, 9, 14
b. 23, 29, 22, 26
c. 1 1, .8, 2, 1.6, 2.9

2.20 Annual salaries in thousands of dollars for ten of the top-ranking officers in a large corporation are given here:
175  150  210  650  425  230  190  260  300  250
Calculate the sample mean and sample median. Comment on the appropriateness of these numbers as summary measures of top executive salaries.

2.21 Refer to Exercise 2.20. Obtain the sample quartiles of top executive salaries.
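
Hand calculations for Exercises 2.19 and 2.20 can be checked with a short Python sketch. It uses only the standard-library statistics module and takes data set (a) and the ten salaries as they are printed in the exercises. The functions statistics.variance and statistics.stdev divide by n − 1, which is assumed here to match the sample variance used in this chapter.

```python
import statistics

# Data set (a) from Exercise 2.19.
data_a = [6, 9, 7, 9, 14]

mean_a = statistics.mean(data_a)        # sample mean
var_a = statistics.variance(data_a)     # sample variance, divisor n - 1
sd_a = statistics.stdev(data_a)         # sample standard deviation

# Salaries (thousands of dollars) from Exercise 2.20.
salaries = [175, 150, 210, 650, 425, 230, 190, 260, 300, 250]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

print(f"(a) mean={mean_a}, variance={var_a}, sd={sd_a:.3f}")
print(f"salaries: mean={mean_salary}, median={median_salary}")
```

The gap between the two salary summaries comes from the few very large salaries, which pull the mean upward but leave the median almost unchanged.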

2.22 Sketch three density histograms: a symmetric histogram, one with a long right-hand tail, and one with a long left-hand tail. Keeping in mind that the mean is the center of gravity, or balancing point, of the data distribution and that the median divides the data in half, indicate the relative positions of the sample mean and the sample median on each of your density histograms.
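
The relation between the mean and the median under the three shapes in Exercise 2.22 can be illustrated numerically. In the sketch below the three small data sets are hypothetical, chosen only to mimic a symmetric shape, a long right-hand tail, and a long left-hand tail. The helper name mean_vs_median is likewise an illustration, not something defined in the text.

```python
import statistics

# Hypothetical data sets chosen only to illustrate the three shapes.
shapes = {
    "symmetric": [1, 2, 3, 4, 5, 6, 7],
    "long right-hand tail": [1, 2, 2, 3, 3, 4, 20],
    "long left-hand tail": [-20, -4, -3, -3, -2, -2, -1],
}

def mean_vs_median(values):
    """Return (mean, median) for a list of numbers."""
    return statistics.mean(values), statistics.median(values)

for name, values in shapes.items():
    mean, median = mean_vs_median(values)
    if mean > median:
        position = "mean to the right of the median"
    elif mean < median:
        position = "mean to the left of the median"
    else:
        position = "mean and median coincide"
    print(f"{name}: mean={mean:.2f}, median={median}, {position}")
```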

2.23 Vendors doing business with a particular state were sampled to determine the economic impact of state business on their gross sales. A sample of 15 firms that provide services to the state had the following percentages of total annual sales as a result of sales to the state:
27 0 12 0
14 9 1 2
1 1 0
1 5 3
7 6 5 0
1 0 1 0
3 2 3 0
7 0
a. Find the sample median, first quartile, and third quartile.
b. Find the range and interquartile range.
c. Find the sample 90th percentile.

2.24 The mean and median salaries of machinists employed by two competing companies, A and B, are as follows:
EXERCISES
2.5 NUMERICAL SUMMARIES OF DATA DISTRIBUTIONS
Data file: DeathClm.dat (Minitab or similar program recommended)
Assume that the salaries are set in accordance with job competence and that the overall quality of workers is about the same in the two companies.
a. Which company offers a better prospect to a machinist having superior ability? Explain.
b. Where can a medium-quality machinist expect to earn more? Explain.

2.25 Consider the data on workers per vehicle for the 10 most productive vehicle assembly plants listed in Table 2.4 (see Example 2.8).
a. Plot these data as a dot diagram. Calculate the sample mean, $\bar{x}$, and indicate the mean on the dot diagram.
b. Calculate the 5% trimmed mean. [Hint: Round 10 × .05 up to the next integer when determining the number of observations to delete from the ordered data.]
c. Calculate the sample median, $M$, and indicate the median on the dot diagram in part a. Compare the mean and median. What does the discrepancy (if any) tell you about the symmetry of this data set?

2.26 Consider the R&D expenditures as a percentage of sales given in Example 2.2.
a. Calculate the 5% trimmed mean. [Hint: Round 12 × .05 up to the next integer when determining the number of observations to delete from the ordered data.]
b. Compare the 5% trimmed mean with the sample mean (see Exercise 1.37 of Chapter 1).
c. Calculate the sample median.
d. Discuss the relative effects of outliers on the 5% trimmed mean, the sample mean, and the median for this particular example.
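
The trimmed-mean calculation in Exercises 2.25 and 2.26 can be sketched in Python. The function name trimmed_mean and the sample values are hypothetical, since Table 2.4 and Example 2.2 are not reproduced here. The sketch also assumes that the rounded-up number of observations is deleted from each end of the ordered data.

```python
import math
import statistics

def trimmed_mean(values, percent):
    """Mean after deleting k observations from EACH end of the ordered data.

    k is n * percent / 100 rounded up to the next integer, following the
    rounding hint in the exercises. Trimming both ends is an assumption.
    """
    ordered = sorted(values)
    n = len(ordered)
    k = math.ceil(n * percent / 100)
    if 2 * k >= n:
        raise ValueError("trimming would delete every observation")
    return statistics.mean(ordered[k:n - k])

# Hypothetical sample of 10 values with one large outlier.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

print("5% trimmed mean:", trimmed_mean(sample, 5))   # drops 1 and 100
print("sample mean:    ", statistics.mean(sample))
print("sample median:  ", statistics.median(sample))
```

With n = 10, 12, or 31 the rounding rule gives 1, 1, and 2 deleted observations per end, matching the hints in Exercises 2.25 to 2.27.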

2.27 Consider the death claim amounts given in Example 2.10.
a. Calculate the sample mean, $\bar{x}$, and the 5% trimmed mean. [Hint: See Example 2.10 for the ordered claims. Round 31 × .05 up to the next integer when determining the number of observations to delete from the ordered set.]
b. Calculate the first and third quartiles, $Q_1$ and $Q_3$, and determine the interquartile range, IR.
c. Using the data and the median from Example 2.10 and the results in part b, display the boxplot or modified boxplot for the death claim amounts.

2.28 The sample variance may be written

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]$$

This formula is known as the computing formula for the sample variance. It leads to faster calculations because it uses the basic quantities, $\sum x_i$ and $\sum x_i^2$, directly and does not require the intermediate quantities $(x_i - \bar{x})$. You are given the four observations

1,000,000   1,000,001   1,000,000   1,000,000

Calculate the sample variance using the definitional formula,

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Calculate the sample variance using the computing formula, assuming your hand-held calculator can keep only 8 digits in any number. Compare the results. Consider the statement, “Although the computing formula leads, in general, to faster calculations, it may lead to inaccurate results because of round-off error for a set of uniformly large numbers.” Do you agree?

2.29 Side-by-side boxplots of GRE Quantitative scores for students admitted to graduate study in departments classified as Natural Sciences, Engineering, and so forth are shown here. The boxplots are based on GRE scores accumulated over a five-year period.

[Figure: side-by-side boxplots of GRE quantitative scores (vertical scale 200 to 800) for All departments, Natural sciences, Engineering, Social sciences, Humanities and arts, and Education]

a. Using the vertical scale in the diagram, interpret the boxplot for Education.
b. Which departments tend to have the highest Quantitative scores? Which departments have the most highly concentrated Quantitative scores about the median score as measured by the interquartile range? Which departments have the largest range of Quantitative scores?
c. Looking at the boxplot for all departments, would you say the distribution of Quantitative scores is symmetric or skewed? Justify your choice. The scores of which departments are the most heavily skewed? Are these scores skewed to the right or left?
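
The round-off question raised in Exercise 2.28 can be explored with Python's decimal module. The sketch below models an 8-digit calculator by rounding every intermediate result to 8 significant digits. That rounding model and the function names are assumptions, because a real calculator may truncate instead. The four observations are taken as they appear in the exercise, and an exact calculation with fractions is included for reference.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

# The four observations as they appear in Exercise 2.28.
observations = [1_000_000, 1_000_001, 1_000_000, 1_000_000]

def exact_variance(xs):
    """Sample variance in exact rational arithmetic (no round-off)."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def definitional_variance(xs, digits):
    """Definitional formula; every result is rounded to `digits` digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        values = [Decimal(x) for x in xs]
        n = len(values)
        mean = sum(values) / n
        return sum((x - mean) * (x - mean) for x in values) / (n - 1)

def computing_variance(xs, digits):
    """Computing formula; every result is rounded to `digits` digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        values = [Decimal(x) for x in xs]
        n = len(values)
        sum_x = sum(values)
        sum_x2 = sum(x * x for x in values)
        return (sum_x2 - sum_x * sum_x / n) / (n - 1)

print("exact:       ", exact_variance(observations))
print("definitional:", definitional_variance(observations, 8))
print("computing:   ", computing_variance(observations, 8))
```

Under this model the squares of numbers near one million need 13 digits, so the digits that distinguish the observations are lost before the subtraction in the computing formula. The deviations $(x_i - \bar{x})$ in the definitional formula are small numbers and survive the rounding much better.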

2.30 Select the appropriate phrase to make the sentence correct.
a. The mean of a data set with the outliers eliminated will be (smaller than; larger than; smaller or larger than; equal to) the average of the data set with the outliers included.
b. The standard deviation of a data set with the outliers eliminated will be (smaller than; larger than; smaller or larger than; equal to) the standard deviation of the data set with the outliers included.
c. The median of a data set with the outliers eliminated will be (smaller than; larger than; smaller or larger than; equal to) the median of the data set with the outliers included.
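
A small numerical experiment can help in thinking about Exercise 2.30. The sketch below uses hypothetical samples: one core set of values, then the same set with a single high outlier and with a single low outlier. It illustrates the effect on these particular samples only, and the function name summarize is an assumption.

```python
import statistics

def summarize(values):
    """Return (mean, median, standard deviation) of a sample."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.stdev(values))

# Hypothetical samples: the same core values plus one outlier.
core = [2, 4, 4, 5, 6, 7]
with_high_outlier = core + [40]
with_low_outlier = [-30] + core

for label, data in [("outliers removed", core),
                    ("high outlier included", with_high_outlier),
                    ("low outlier included", with_low_outlier)]:
    mean, median, sd = summarize(data)
    print(f"{label}: mean={mean:.2f}, median={median}, sd={sd:.2f}")
```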

2.31 The following table shows the age at inauguration of each U.S. president. (Data file: AgePres.dat. Minitab or similar program recommended.)

Name                  Age at        Name                  Age at        Name                  Age at
                      Inauguration                        Inauguration                        Inauguration
 1. Washington        57            15. Buchanan          65            29. Harding           55
 2. J. Adams          61            16. Lincoln           52            30. Coolidge          51
 3. Jefferson         57            17. A. Johnson        56            31. Hoover            54
 4. Madison           57            18. Grant             46            32. F. D. Roosevelt   51
 5. Monroe            58            19. Hayes             54            33. Truman            60
 6. J. Q. Adams       57            20. Garfield          49            34. Eisenhower        62
 7. Jackson           61            21. Arthur            50            35. Kennedy           43
 8. Van Buren         54            22. Cleveland         47            36. L. Johnson        55
 9. W. H. Harrison    68            23. B. Harrison       55            37. Nixon             56
10. Tyler             51            24. Cleveland         55            38. Ford              61
11. Polk              49            25. McKinley          54            39. Carter            52
12. Taylor            64            26. T. Roosevelt      42            40. Reagan            69
13. Fillmore          50            27. Taft              51            41. Bush              64
14. Pierce            48            28. Wilson            56            42. Clinton           46

a. Make a stem-and-leaf diagram of the age at inauguration. Let the leaf unit = 1.
b. Find the median, $M$, and the first and third quartiles, $Q_1$ and $Q_3$.

2.32 The article “The Best Cities for Knowledge Workers” (Fortune, Nov. 15, 1993) states that one measure of the brainpower that employers need is the number of workers 25 years old and older who hold a postbaccalaureate graduate degree. Consider the following table. (Data file: CitGrad.dat. Minitab or similar program recommended.)

City              Graduate      City              Graduate      City              Graduate
                  Degree                          Degree                          Degree
Raleigh Durham    11.6          Dayton             6.9          Norfolk            6.5
New York          10.5          Denver             9.4          Oakland           10.8
Boston            11.2          Detroit            6.4          Oklahoma City      7.3
Seattle            8.8          Ft. Lauderdale     6.4          Orlando            6.1
Austin            10.3          Fort Worth         6.2          Phoenix            6.9
Chicago            8.7          Grand Rapids       5.6          Pittsburgh         6.8
Houston            7.9          Greensboro         5.1          Portland           7.6
San Jose          12.0          Hartford          10.3          Richmond           7.7
Philadelphia       8.3          Honolulu           7.9          Rochester          8.7
Minneapolis        7.7          Indianapolis       7.4          Sacramento         6.9
Albany            10.2          Jacksonville       5.6          St. Louis          3.8
Atlanta            8.1          Kansas City        7.5          Salt Lake City     7.0
Baltimore          9.2          Las Vegas          4.7          San Antonio        6.8
Birmingham         6.6          Los Angeles        7.8          San Diego          8.8
Buffalo            7.5          Louisville         6.7          San Francisco     12.8
Charlotte          5.2          Memphis            6.4          Scranton           5.1
Cincinnati         7.1          Miami              7.6          Tampa              5.6
Cleveland          6.5          Milwaukee          6.5          Tulsa              6.0
Columbus           8.0          Nashville          7.1          Washington, D.C.  15.8
Dallas             8.2          New Orleans        6.9          West Palm Beach    7.6
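
For the stem-and-leaf diagram in Exercise 2.31, the following Python sketch shows one way to build the display with a leaf unit of 1. The ages are transcribed from the table of presidents in table order. The function name stem_and_leaf is an assumption, and quartiles are left out because the textbook's quartile convention is not reproduced here.

```python
from collections import defaultdict
import statistics

# Ages at inauguration, in the order of the table for Exercise 2.31.
ages = [57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48,
        65, 52, 56, 46, 54, 49, 50, 47, 55, 55, 54, 42, 51, 56,
        55, 51, 54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 46]

def stem_and_leaf(values, leaf_unit=1):
    """Map each stem to its sorted leaves (one leaf digit per value)."""
    stems = defaultdict(list)
    for value in sorted(values):
        scaled = value // leaf_unit
        stems[scaled // 10].append(scaled % 10)
    return dict(stems)

diagram = stem_and_leaf(ages)
for stem in sorted(diagram):
    leaves = " ".join(str(leaf) for leaf in diagram[stem])
    print(f"{stem} | {leaves}")

print("median:", statistics.median(ages))
```

Because the leaves are stored in sorted order, the median and other order statistics can be read off the display by counting leaves from either end.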

[Figure: axes labeled Density and Variable, with annotations Required proportion and Fixed interval]
2.6 THE NORMAL DENSITY FUNCTION