Research Study: Inferences about Performance-

has a sensitivity of 80% and specificity of 99%. Thus, 16 of the 20 users would test positive, 20(.8) = 16, and about 10 of the nonusers would test positive, 980(1 − .99) = 9.8. If an athlete tests positive, what is the probability she is a user? We now have to make use of Bayes’ Formula to compute PPV:

$$\text{PPV} = \frac{\text{sens} \times \text{prior}}{\text{sens} \times \text{prior} + (1 - \text{spec})(1 - \text{prior})},$$

where “sens” is the sensitivity of the test, “spec” is the specificity of the test, and “prior” is the prior probability that an athlete is a banned-drug user. For our example with a population of 1,000 athletes,

$$\text{PPV} = \frac{.8\,(20/1{,}000)}{.8\,(20/1{,}000) + (1 - .99)(1 - 20/1{,}000)} = .62$$

Therefore, if an athlete tests positive, there is only a 62% chance that she has used the drug. Even if the sensitivity of the test is increased to 100%, the PPV is still relatively small:

$$\text{PPV} = \frac{1\,(20/1{,}000)}{1\,(20/1{,}000) + (1 - .99)(1 - 20/1{,}000)} = .67$$

There is a 33% chance that the athlete is a nonuser even though the test result was positive. Thus, if the prior probability is small, there will always be a high degree of uncertainty with the test result even when the test has values of sensitivity and specificity near 1. However, if the prior probability is fairly large, then the PPV will be much closer to 1. For example, if the population consists of 900 users and only 100 nonusers, and the testing procedure has sensitivity = .9 and specificity = .99, then the PPV would be .9988:

$$\text{PPV} = \frac{.9\,(900/1{,}000)}{.9\,(900/1{,}000) + (1 - .99)(1 - 900/1{,}000)} = .9988$$

That is, the chance that the tested athlete is a user given she produced a positive test would be 99.88%, leaving a very small chance of a false positive. From this we conclude that an essential factor in Bayes’ Formula is the prior probability of an athlete being a banned-drug user. Making matters even worse in this situation is the fact that the prevalence (prior probability) of substance abuse is very difficult to determine. Hence, there will inevitably be a subjective aspect to assigning a prior probability.

The authors of the article comment on the selection of the prior probability, suggesting that in their particular sport, a hearing board consisting of athletes participating in the same sport as the athlete being tested would be especially appropriate for making decisions about prior probabilities. For example, assuming the board knows nothing about the athlete beyond what is presented at the hearing, they might regard drug abuse to be rare, and hence the PPV would be at most moderately large. On the other hand, if the board knew that drug abuse is widespread, then the probability of abuse would be larger, based on a positive test result.

To investigate further the relationship between PPV, prior probability, and sensitivity, for a fixed specificity of 99%, consider Figure 4.29. The calculations of PPV are obtained by using Bayes’ Formula for a selection of values of prior and sensitivity, with specificity = .99. We can thus observe that if the sensitivity of the test is relatively low, say, less than 50%, then unless the prior is above 20%, we will not be able to achieve a PPV greater than 90%.

The article describes how Figure 4.29 allows for using Bayes’ Formula in reverse. For example, a hearing board may make the decision that they would not rule against an athlete unless his or her probability of being a user was at least 95%. Suppose we have a test having both sensitivity and specificity of 99%. Then, by Bayes’ Formula, the prior probability must be at least about 16% in order to achieve a PPV of 95%. This would allow the board to use their knowledge about the prevalence of drug abuse in the population of athletes to determine if a prevalence of 16% or larger is realistic.
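The PPV calculations above, and the idea of using Bayes’ Formula in reverse, can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the article or the text: the function names `ppv` and `min_prior_for_ppv` are hypothetical, and the reverse formula is obtained by solving the PPV equation algebraically for the prior.

```python
def ppv(sens, spec, prior):
    """Positive predictive value from Bayes' Formula."""
    true_pos = sens * prior                # P(test positive and user)
    false_pos = (1 - spec) * (1 - prior)   # P(test positive and nonuser)
    return true_pos / (true_pos + false_pos)


def min_prior_for_ppv(target, sens, spec):
    """Smallest prior giving PPV >= target (hypothetical helper).

    Solving target = sens*p / (sens*p + (1 - spec)*(1 - p)) for p gives
    p = target*(1 - spec) / (sens*(1 - target) + target*(1 - spec)).
    """
    fp_rate = 1 - spec
    return target * fp_rate / (sens * (1 - target) + target * fp_rate)


# The three worked examples: 20 users in 1,000 athletes, then 900 users.
print(round(ppv(0.8, 0.99, 20 / 1000), 2))    # sensitivity 80%
print(round(ppv(1.0, 0.99, 20 / 1000), 2))    # sensitivity 100%
print(round(ppv(0.9, 0.99, 900 / 1000), 4))   # 900 users, sensitivity 90%

# Bayes' Formula in reverse: prior needed for a PPV of at least 95%.
print(round(min_prior_for_ppv(0.95, 0.99, 0.99), 3))
print(round(min_prior_for_ppv(0.95, 0.95, 0.95), 3))
```

Under this formula, the required prior is near 0.16 when sensitivity and specificity are both .99, and exactly 0.5 when both are .95.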
The authors conclude with the following comments:

Conclusions about the likelihood of testosterone doping require consideration of three components: specificity and sensitivity of the testing procedure, and the prior probability of use. As regards the T/E ratio, anti-doping officials consider only specificity. The result is a flawed process of inference. Bayes’ rule shows that it is impossible to draw conclusions about guilt on the basis of specificity alone. Policy-makers in the athletic federations should follow the lead of medical scientists who use sensitivity, specificity, and Bayes’ rule in interpreting diagnostic evidence.

4.16 Minitab Instructions

Generating Random Numbers

To generate 1,000 random numbers from the set {0, 1, . . . , 9}:

1. Click on Calc, then Random Data, then Integer.
2. Type the number of rows of data: Generate 20 rows of data.
3. Type the columns in which the data are to be stored: Store in columns: c1–c50.
4. Type the first number in the list: Minimum value: 0.
5. Type the last number in the list: Maximum value: 9.
6. Click on OK.

Note that we have generated 20 × 50 = 1,000 random numbers.

[FIGURE 4.29 Relationship between PPV (vertical axis) and prior probability (horizontal axis) for four different values of sensitivity: Sens = 1, Sens = .5, Sens = .1, and Sens = .01. All curves assume specificity is 99%.]

Calculating Binomial Probabilities

To calculate binomial probabilities when n = 10 and p = 0.6:

1. Enter the values of x in column c1: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
2. Click on Calc, then Probability Distributions, then Binomial.
3. Select either Probability [to compute P(X = x)] or Cumulative probability [to compute P(X ≤ x)].
4. Type the value of n: Number of trials: 10.
5. Type the value of p: Probability of success: 0.6.
6. Click on Input column.
7. Type the column number where values of x are located: C1.
8. Click on Optional storage.
9. Type the column number to store probability: C2.
10. Click on OK.

Calculating Normal Probabilities

To calculate P(X ≤ 18) when X is normally distributed with μ = 23 and σ = 5:

1. Click on Calc, then Probability Distributions, then Normal.
2. Click on Cumulative probability.
3. Type the value of μ: Mean: 23.
4. Type the value of σ: Standard deviation: 5.
5. Click on Input constant.
6. Type the value of x: 18.
7. Click on OK.

Generating Sampling Distribution of ȳ

To create the sampling distribution of ȳ based on 500 samples of size n = 16 from a normal distribution with μ = 60 and σ = 5:

1. Click on Calc, then Random Data, then Normal.
2. Type the number of samples: Generate 500 rows.
3. Type the sample size n in terms of number of columns: Store in columns c1–c16.
4. Type in the value of μ: Mean: 60.
5. Type in the value of σ: Standard deviation: 5.
6. Click on OK. There are now 500 rows in columns c1–c16, that is, 500 samples of 16 values each, from which to generate 500 values of ȳ.
7. Click on Calc, then Row Statistics, then mean.
8. Type in the location of data: Input Variables c1–c16.
9. Type in the column in which the 500 means will be stored: Store Results in c17.
10. To obtain the mean of the 500 values of ȳ, click on Calc, then Column Statistics, then mean.
11. Type in the location of the 500 means: Input Variables c17.
12. Click on OK.
13. To obtain the standard deviation of the 500 values of ȳ, click on Calc, then Column Statistics, then standard deviation.
14. Type in the location of the 500 means: Input Variables c17.
15. Click on OK.
16. To obtain the sampling distribution of ȳ, click Graph, then Histogram.
17. Type c17 in the Graph box.
18. Click on OK.
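For readers without Minitab, the four procedures in this section can be mirrored with Python's standard library. The following is a rough sketch rather than an equivalent of Minitab's output: the variable names, the helper `binomial_pmf`, and the seed are illustrative assumptions, and the histogram step is omitted.

```python
import math
import random
import statistics

random.seed(1)  # arbitrary seed, chosen only so the run is repeatable

# Generating random numbers: 20 rows by 50 columns of integers from 0 to 9.
digits = [[random.randint(0, 9) for _ in range(50)] for _ in range(20)]


# Calculating binomial probabilities for n = 10 and p = 0.6.
def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable (hypothetical helper)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


pmf = [binomial_pmf(x, 10, 0.6) for x in range(11)]   # P(X = x)
cdf = [sum(pmf[: x + 1]) for x in range(11)]          # P(X <= x)

# Calculating normal probabilities: P(X <= 18) with mean 23 and sd 5.
normal_cdf_18 = statistics.NormalDist(mu=23, sigma=5).cdf(18)

# Sampling distribution of the sample mean: 500 samples of size 16
# from a normal distribution with mean 60 and standard deviation 5.
samples = [[random.gauss(60, 5) for _ in range(16)] for _ in range(500)]
sample_means = [statistics.mean(row) for row in samples]   # one mean per row
mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)               # sample std. dev.

print(f"P(X <= 18) = {normal_cdf_18:.4f}")
print(f"mean of the 500 sample means = {mean_of_means:.3f}")
print(f"std. dev. of the 500 sample means = {sd_of_means:.3f}")
```

The simulated mean and standard deviation of the 500 sample means should fall close to the theoretical values μ = 60 and σ/√n = 5/√16 = 1.25, though the exact figures depend on the random draws.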