Does the value of the correlation coefficient reflect the degree of association shown in the scatterplot? Why do you think there may be a correlation between these two diseases?
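Exercises 3.70–3.75 use the state-level counts from Exercise 3.69, which are not reproduced in this section. The sketch below shows one way to compute a Pearson correlation coefficient and to read a percentile off a quantile plot. It assumes the i-th smallest of n values is plotted at (i − 0.5)/n and that percentiles are read by linear interpolation between plotted points. Other texts use slightly different plotting positions, so a hand reading may differ a little. The function names and the state-to-count dictionary format are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def quantile_plot_points(data):
    """(fraction, value) pairs: the i-th smallest value is plotted at (i - 0.5)/n."""
    ordered = sorted(data)
    n = len(ordered)
    return [((i - 0.5) / n, v) for i, v in enumerate(ordered, start=1)]


def quantile_from_plot(data, p):
    """Read the p-th quantile (0 < p < 1) off the quantile plot by linear interpolation."""
    points = quantile_plot_points(data)
    if p <= points[0][0]:
        return points[0][1]
    if p >= points[-1][0]:
        return points[-1][1]
    for (f0, v0), (f1, v1) in zip(points, points[1:]):
        if f0 <= p <= f1:
            return v0 + (p - f0) / (f1 - f0) * (v1 - v0)
    raise ValueError("p could not be located on the plot")


def states_above(counts_by_state, p=0.90):
    """Hypothetical helper: sorted names of states whose count exceeds the p-th quantile."""
    cutoff = quantile_from_plot(list(counts_by_state.values()), p)
    return sorted(s for s, c in counts_by_state.items() if c > cutoff)
```

For Exercise 3.75, the states above all three 90th percentiles are the intersection of three such lists, for example `set(states_above(aids)) & set(states_above(tb)) & set(states_above(syphilis))`, where `aids`, `tb`, and `syphilis` are placeholder dictionaries mapping each state to its count.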

Med. 3.70 Refer to the data in Exercise 3.69.

a. Construct a scatterplot of the number of AIDS cases versus the number of tuberculosis cases.

b. Compute the correlation between the number of AIDS cases and the number of tuberculosis cases.

c. Why do you think there may be a correlation between these two diseases?

Med. 3.71 Refer to the data in Exercise 3.69.

a. Construct a scatterplot of the number of syphilis cases versus the number of tuberculosis cases.

b. Compute the correlation between the number of syphilis cases and the number of tuberculosis cases.

c. Why do you think there may be a correlation between these two diseases?

Med. 3.72 Refer to the data in Exercise 3.69.

a. Construct a quantile plot of the number of syphilis cases.

b. From the quantile plot, determine the 90th percentile for the number of syphilis cases.

c. Identify the states whose number of syphilis cases is above the 90th percentile.

Med. 3.73 Refer to the data in Exercise 3.69.

a. Construct a quantile plot of the number of tuberculosis cases.

b. From the quantile plot, determine the 90th percentile for the number of tuberculosis cases.

c. Identify the states whose number of tuberculosis cases is above the 90th percentile.

Med. 3.74 Refer to the data in Exercise 3.69.

a. Construct a quantile plot of the number of AIDS cases.

b. From the quantile plot, determine the 90th percentile for the number of AIDS cases.

c. Identify the states whose number of AIDS cases is above the 90th percentile.

Med. 3.75 Refer to the results from Exercises 3.72–3.74.

a. How many states had numbers of AIDS, tuberculosis, and syphilis cases that were all above their respective 90th percentiles?

b. Identify these states and comment on any common elements among them.

c. How could the U.S. government apply the results from Exercises 3.69–3.75 in making public health policy?

Med. 3.76 In the article “Viral load and heterosexual transmission of human immunodeficiency virus type 1” [New England Journal of Medicine, 2000, 342:921–929], the authors studied whether people with high levels of HIV-1 are significantly more likely to transmit HIV to their uninfected partners. The following measurements are HIV-1 RNA levels, in RNA copies/mL, for the group whose initially uninfected partners became HIV positive during the course of the study:

79725, 12862, 18022, 76712, 256440, 14013, 46083, 6808, 85781, 1251, 6081, 50397, 11020, 13633, 1064, 496433, 25308, 6616, 11210, 13900

a. Determine the mean, median, and standard deviation.

b. Find the 25th, 50th, and 75th percentiles.

c. Plot the data in a boxplot and a histogram.

d. Describe the shape of the distribution.
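A minimal sketch of parts a and b using Python's standard `statistics` module. Percentile conventions vary between textbooks and software: `statistics.quantiles` with its default `'exclusive'` method interpolates at position (n + 1)p in the sorted data, so its quartiles may differ slightly from a hand computation. The 1.5 × IQR outlier fences are the usual boxplot rule, added here as an aid for part c. Drawing the boxplot and histogram themselves is left to any plotting tool.

```python
import statistics

# HIV-1 RNA levels (RNA copies/mL) listed in Exercise 3.76
rna = [79725, 12862, 18022, 76712, 256440, 14013, 46083, 6808, 85781, 1251,
       6081, 50397, 11020, 13633, 1064, 496433, 25308, 6616, 11210, 13900]


def summarize(data):
    """Mean, median, sample SD, quartiles, and values beyond the 1.5 x IQR fences."""
    q1, q2, q3 = statistics.quantiles(data, n=4)  # default method='exclusive'
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": statistics.stdev(data),  # sample SD, divisor n - 1
        "quartiles": (q1, q2, q3),
        "outliers": sorted(x for x in data if x < low or x > high),
    }


rna_summary = summarize(rna)
for name, value in rna_summary.items():
    print(f"{name}: {value}")
```

Comparing the mean with the median, and checking which values fall beyond the upper fence, is one way to support the description asked for in part d.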

Med. 3.77 In many statistical procedures, it is advantageous to have a symmetric distribution. When the histogram of the data is highly right-skewed, it is often possible to obtain a symmetric distribution by transforming the data. For the data in Exercise 3.76, take the natural logarithm of each value and answer the following questions.
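One way to carry out the transformation is sketched below: apply `math.log` (the natural logarithm) to each Exercise 3.76 value and recompute the same summaries. The same caveats apply as for raw data: quartiles come from Python's default `'exclusive'` method, and the 1.5 × IQR fences are the usual boxplot rule rather than anything specified in the exercise.

```python
import math
import statistics

# HIV-1 RNA levels (RNA copies/mL) from Exercise 3.76
rna = [79725, 12862, 18022, 76712, 256440, 14013, 46083, 6808, 85781, 1251,
       6081, 50397, 11020, 13633, 1064, 496433, 25308, 6616, 11210, 13900]

log_rna = [math.log(x) for x in rna]  # natural logarithm of each value

q1, q2, q3 = statistics.quantiles(log_rna, n=4)  # default method='exclusive'
iqr = q3 - q1
log_summary = {
    "mean": statistics.mean(log_rna),
    "median": statistics.median(log_rna),
    "sd": statistics.stdev(log_rna),  # sample SD, divisor n - 1
    "quartiles": (q1, q2, q3),
    # values beyond the usual 1.5 x IQR boxplot fences on the log scale
    "outliers": [x for x in log_rna if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr],
}
for name, value in log_summary.items():
    print(f"{name}: {value}")
```

Because the logarithm preserves order, each observation keeps its rank after the transformation; only the spacing between values changes, which is what can pull a long right tail back toward the center.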

a. Determine the mean, median, and standard deviation.

b. Find the 25th, 50th, and 75th percentiles.

c. Plot the data in a boxplot and a histogram.

d. Did the logarithm transformation result in a somewhat symmetric distribution?

Env. 3.78 PCBs are a class of chemicals often found near disposal sites for electrical devices. PCBs tend to concentrate in human fat and have been associated with numerous health problems. In the article “Some other persistent organochlorines in Japanese human adipose tissue” [Environmental Health Perspectives, Vol. 108, pp. 599–603], researchers examined the concentrations of PCBs (ng/g) in the fat of a group of adults. They detected the following concentrations:

1800, 1800, 2600, 1300, 520, 3200, 1700, 2500, 560, 930, 2300, 2300, 1700, 720
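The sketch below computes the summaries for parts a and b and, as one input to part d, compares the observed proportion of PCB concentrations within 1, 2, and 3 sample standard deviations of the mean with the roughly 68%, 95%, and 99.7% that the Empirical Rule predicts for a mound-shaped distribution. With only 14 observations such a comparison is rough, and whether the rule is appropriate still depends on judging the shape of the data. Quartiles use Python's default `'exclusive'` method, which may differ slightly from a hand computation.

```python
import statistics

# PCB concentrations (ng/g) in adipose tissue, Exercise 3.78
pcb = [1800, 1800, 2600, 1300, 520, 3200, 1700, 2500, 560, 930, 2300, 2300, 1700, 720]

mean = statistics.mean(pcb)
sd = statistics.stdev(pcb)  # sample standard deviation, divisor n - 1
print(f"mean = {mean:.1f}, median = {statistics.median(pcb)}, sd = {sd:.1f}")
print("quartiles:", statistics.quantiles(pcb, n=4))

# Count observations within k standard deviations of the mean
empirical = {1: 0.68, 2: 0.95, 3: 0.997}
within = {}
for k, expected in empirical.items():
    within[k] = sum(1 for x in pcb if abs(x - mean) <= k * sd)
    print(f"within {k} sd: {within[k]}/{len(pcb)} = {within[k] / len(pcb):.0%} "
          f"(Empirical Rule: about {expected:.1%})")
```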

a. Determine the mean, median, and standard deviation.

b. Find the 25th, 50th, and 75th percentiles.

c. Plot the data in a boxplot.

d. Would it be appropriate to apply the Empirical Rule to these data? Why or why not?

Agr. 3.79 The focal point of an agricultural research study was the relationship between when a crop is planted and the amount of crop harvested. If a crop is planted too early or too late, farmers may fail to obtain optimal yield and hence not make a profit. The researchers set an ideal planting date, and the farmers then record the number of days before or after that date on which the crop was planted. In the following data set, D is the number of days from the ideal planting date (negative values indicate days before it) and Y is the yield of a wheat crop in bushels per acre:

D    -19   -18   -15   -12    -9    -6    -4    -3    -1     0
Y   30.7  29.7  44.8  41.4  48.1  42.8  49.9  46.9  46.4  53.5

D      1     3     6     8    12    15    17    19    21    24
Y   55.0  46.9  44.1  50.2  41.0  42.8  36.5  35.8  32.2  23.3

a. Plot the data in a scatterplot.

b. Describe the relationship between the number of days from the optimal planting date and the wheat yield.

c. Calculate the correlation coefficient between days from optimal planting and yield.

d. Explain why the correlation coefficient is relatively small for this data set.

Con. 3.80 Although exhaust fans are present in nearly every bathroom, they often are not used because of their high noise level. This is unfortunate, because regular use of the fan reduces indoor moisture, and excessive indoor moisture often leads to mold growth, which may have adverse health consequences. Consumer Reports, in its January 2004 issue, reported on a wide variety of bathroom fans. The following table displays the price P (in dollars) of each fan and its quality, measured as airflow AF in cubic feet per minute (cfm):

P     95  115  110   15   20   20   75  150   60   60
AF    60   60   60   55   55   55   85   80   80   75

P    160  125  125  110  130  125   30   60  110   85
AF    90   90  100  110   90   90   90  110  110   60

a. Plot the data in a scatterplot and comment on the relationship between price and airflow.

b. Compute the correlation coefficient for this data set. Is there a strong or weak relationship between price and airflow of the fans?

c. Is your conclusion in part b consistent with your answer in part a?

d. Based on your answers in parts a and b, would it be reasonable to conclude that higher-priced fans generate greater airflow?
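Part c of Exercise 3.79 and part b of Exercise 3.80 both call for a Pearson correlation coefficient. The sketch below computes r for each data set, pairing values by their position in the tables above. For Exercise 3.79 it assumes the tenth D value is 0 (the only integer between -1 and 1), and it also correlates yield with D squared. That extra diagnostic is not part of the exercise, but it illustrates how a rise-then-fall relationship can give a weak linear correlation even when the association itself is strong.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Exercise 3.79: days from the ideal planting date (D) and wheat yield (Y, bushels/acre).
# Assumption: the tenth D value is 0.
days = [-19, -18, -15, -12, -9, -6, -4, -3, -1, 0,
        1, 3, 6, 8, 12, 15, 17, 19, 21, 24]
wheat_yield = [30.7, 29.7, 44.8, 41.4, 48.1, 42.8, 49.9, 46.9, 46.4, 53.5,
               55.0, 46.9, 44.1, 50.2, 41.0, 42.8, 36.5, 35.8, 32.2, 23.3]

# Exercise 3.80: fan price P (dollars) and airflow AF (cfm)
price = [95, 115, 110, 15, 20, 20, 75, 150, 60, 60,
         160, 125, 125, 110, 130, 125, 30, 60, 110, 85]
airflow = [60, 60, 60, 55, 55, 55, 85, 80, 80, 75,
           90, 90, 100, 110, 90, 90, 90, 110, 110, 60]

r_days_yield = pearson_r(days, wheat_yield)
# Extra diagnostic, not asked for in the exercise: yield versus D squared
r_days_sq_yield = pearson_r([d * d for d in days], wheat_yield)
r_price_airflow = pearson_r(price, airflow)

print(f"3.79  r(D, Y)   = {r_days_yield:.3f}")
print(f"3.79  r(D^2, Y) = {r_days_sq_yield:.3f}")
print(f"3.80  r(P, AF)  = {r_price_airflow:.3f}")
```

Because r measures only linear association, a value near zero for Exercise 3.79 does not by itself show that planting date and yield are unrelated; the scatterplot in part a is the check.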