Conditional Probability and Independence

This definition for conditional probabilities gives rise to what is referred to as the multiplication law. Suppose that you have the additional information that the claim was associ- ated with a fire policy. Checking Table 4.2, we see that 20 or .20 of all claims are associated with a fire policy and that 6 or .06 of all claims are fraudulent fire policy claims. Therefore, it follows that the probability that the claim is fraudulent, given that you know the policy is a fire policy, is This probability, PF 兩fire policy, is called a conditional probability of the event F—that is, the probability of event F given the fact that the event “fire policy” has already occurred. This tells you that 30 of all fire policy claims are fraudulent. The vertical bar in the expression PF 兩fire policy represents the phrase “given that,” or simply “given.” Thus, the expression is read, “the probability of the event F given the event fire policy.” The probability PF ⫽ .10, called the unconditional or marginal probability of the event F, gives the proportion of times a claim is fraudulent—that is, the pro- portion of times event F occurs in a very large infinitely large number of repeti- tions of the experiment receiving an insurance claim and determining whether the claim is fraudulent. In contrast, the conditional probability of F, given that the claim is for a fire policy, PF 兩fire policy, gives the proportion of fire policy claims that are fraudulent. Clearly, the conditional probabilities of F, given the types of policies, will be of much greater assistance in measuring the risk of fraud than the unconditional probability of F. ⫽ .06 .20 ⫽ .30 PF 冷 fire policy ⫽ proportion of claims that are fraudulent fire policy claims proportion of claims that are against fire policies TABLE 4.2 Categorization of insurance claims Type of Policy Category Fire Auto Other Total Fraudulent 6 1 3 10 Nonfraudulent

14 29

47 90 Total 20 30 50 100 conditional probability unconditional probability DEFINITION 4.7 Consider two events A and B with nonzero probabilities, PA and PB. The conditional probability of event A given event B is The conditional probability of event B given event A is PB 冷 A ⫽ PA 傽 B PA PA 冷 B ⫽ PA 傽 B PB DEFINITION 4.8 The probability of the intersection of two events A and B is ⫽ PBPA 冷 B PA 傽 B ⫽ PAPB 冷 A The only difference between Definitions 4.7 and 4.8, both of which involve condi- tional probabilities, relates to what probabilities are known and what needs to be calculated. When the intersection probability and the individual proba- bility PA are known, we can compute . When we know PA and , we can compute . EXAMPLE 4.2 A corporation is proposing to select two of its current regional managers as vice pres- idents. In the history of the company, there has never been a female vice president. The corporation has six male regional managers and four female regional managers. Make the assumption that the 10 regional managers are equally qualified and hence all possible groups of two managers should have the same chance of being selected as the vice presidents. Now find the probability that both vice presidents are male. Solution Let A be the event that the first vice president selected is male and let B be the event that the second vice president selected is also male. The event that represents both selected vice presidents are male is the event A and B—that is, the event . Therefore, we want to calculate , using Definition 4.8. For this example, and Thus, Thus, the probability that both vice presidents are male is 1 兾3, under the condition that all candidates are equally qualified and that each group of two managers has the same chance of being selected. Thus, there is a relatively large probability of selecting two males as the vice presidents under the condition that all candidates are equally likely to be selected. Suppose that the probability of event A is the same whether event B has or has not occurred; that is, suppose Then we say that the occurrence of event A is not dependent on the occurrence of event B, or, simply, that A and B are independent events. When , the occurrence of A depends on the occurrence of B, and events A and B are said to be dependent events. PA 冷 B⫽PA PA 冷 B ⫽ PA 冷 B ⫽ PA PA 傽 B ⫽ PAPB 冷 A ⫽ 6 10 冢 5 9 冣 ⫽ 30 90 ⫽ 1 3 ⫽ of male managers after one male manager was selected of managers after one male manager was selected ⫽ 5 9 PB 冷 A ⫽ Psecond selection is male given first selection was male PA ⫽ Pfirst selection is male ⫽ of male managers of managers ⫽ 6 10 PA 傽 B ⫽ PB 冷 APA A傽 B PA 傽 B PB 冷 A PB 冷 A PA 傽 B independent events dependent events DEFINITION 4.9 Two events A and B are independent events if Note: You can show that if , then , and vice versa. PB 冷 A ⫽ PB PA 冷 B ⫽ PA PA 冷 B ⫽ PA or PB 冷 A ⫽ PB Definition 4.9 leads to a special case of . When events A and B are independent, it follows that The concept of independence is of particular importance in sampling. Later in the text, we will discuss drawing samples from two or more populations to compare the population means, variances, or some other population parameters. For most of these applications, we will select samples in such a way that the ob- served values in one sample are independent of the values that appear in another sample. We call these independent samples.

4.5 Bayes’ Formula

In this section, we will show how Bayes’ Formula can be used to update conditional probabilities by using sample data when available. These “updated” conditional probabilities are useful in decision making. A particular application of these tech- niques involves the evaluation of diagnostic tests. Suppose a meat inspector must decide whether a randomly selected meat sample contains E. coli bacteria. The inspector conducts a diagnostic test. Ideally, a positive result Pos would mean that the meat sample actually has E. coli, and a negative result Neg would imply that the meat sample is free of E. coli. However, the diagnostic test is occasionally in error. The results of the test may be a false positive, for which the test’s indication of E. coli presence is incorrect, or a false negative, for which the test’s conclusion of E. coli absence is incorrect. Large-scale screening tests are conducted to evalu- ate the accuracy of a given diagnostic test. For example, E. coli E is placed in 10,000 meat samples, and the diagnostic test yields a positive result for 9,500 sam- ples and a negative result for 500 samples; that is, there are 500 false negatives out of the 10,000 tests. Another 10,000 samples have all traces of E. coli NE removed, and the diagnostic test yields a positive result for 100 samples and a negative result for 9,900 samples. There are 100 false positives out of the 10,000 tests. We can sum- marize the results in Table 4.3. Evaluation of test results is as follows: False negative rate ⫽ PNeg 冷 NE ⫽ 500 10,000 ⫽ .05 True negative rate ⫽ PNeg 冷 NE ⫽ 9,900 10,000 ⫽ .99 False positive rate ⫽ PPos 冷 NE ⫽ 100 10,000 ⫽ .01 True positive rate ⫽ PPos 冷 E ⫽ 9,500 10,000 ⫽ .95 PA 傽 B ⫽ PAPB| A ⫽ PAPB PA 傽 B independent samples false positive false negative TABLE 4.3 E. coli test data Meat Sample Status Diagnostic Test Result E NE Positive 9,500 100 Negative 500 9,900 Total 10,000 10,000 The sensitivity of the diagnostic test is the true positive rate—that is, Ptest is positive 兩disease is present. The specificity of the diagnostic test is the true negative rate—that is, Ptest is negative 兩disease is not present. The primary question facing the inspector is to evaluate the probability of E. coli being present in the meat sample when the test yields a positive result—that is, the inspector needs to know PE 兩Pos. Bayes’ Formula provides us with a