Hypergeometric and Negative Binomial Distributions

3.5 Hypergeometric and Negative Binomial Distributions

  The hypergeometric and negative binomial distributions are both related to the binomial distribution. The binomial distribution is the approximate probability model for sampling without replacement from a finite dichotomous (S–F) population provided the sample size n is small relative to the population size N; the hypergeometric distribution is the exact probability model for the number of S’s in the sample. The binomial rv X is the number of S’s when the number n of trials is fixed, whereas the negative binomial distribution arises from fixing the number of S’s desired and letting the number of trials be random.

  the Hypergeometric distribution

  The assumptions leading to the hypergeometric distribution are as follows:

  1. The population or set to be sampled consists of N individuals, objects, or elements (a finite population).

  2. Each individual can be characterized as a success (S) or a failure (F), and there are M successes in the population.

  3. A sample of n individuals is selected without replacement in such a way that each subset of size n is equally likely to be chosen.

  The random variable of interest is X 5 the number of S’s in the sample. The

  probability distribution of X depends on the parameters n, M, and N, so we wish to obtain P(X 5 x) 5 h(x; n, M, N).

  ExamplE 3.34 During a particular period a university’s information technology office received 20 service orders for problems with printers, of which 8 were laser printers and 12 were inkjet models. A sample of 5 of these service orders is to be selected for inclusion in a customer satisfaction survey. Suppose that the 5 are selected in a completely

  random fashion, so that any particular subset of size 5 has the same chance of being selected as does any other subset. What then is the probability that exactly x (x 5 0, 1, 2, 3, 4, or 5) of the selected service orders were for inkjet printers?

  Here, the population size is N 5 20, the sample size is n 5 5, and the num- ber of S’s (inkjet 5 S) and F’s in the population are M 5 12 and N 2 M 5 8,

  respectively. Consider the value x 5 2. Because all outcomes (each consisting of 5 particular orders) are equally likely,

  number of outcomes having X 5 2

  P (X 5 2) 5 h(2; 5, 12, 20) 5

  number of possible outcomes

  The number of possible outcomes in the experiment is the number of ways of

  selecting 5 from the 20 service orders without regard to order—that is, 20 ( 5 ) . To

  count the number of outcomes having X 5 2, note that there are 12

  8 ( 2 ) ing 2 of the inkjet orders, and for each such way there are ways of select-

  ( 3 ) ways of selecting

  the 3 laser orders to fill out the sample. The product rule from Chapter 2 then gives

  2 )( 3 ) as the number of outcomes with X 5 2, so

  3.5 hypergeometric and Negative Binomial Distributions

  In general, if the sample size n is smaller than the number of successes in the population (M), then the largest possible X value is n. However, if M , n (e.g., a sample size of 25 and only 15 successes in the population), then X can be at most M. Similarly, whenever the number of population failures (N 2 M) exceeds the sample size, the smallest possible X value is 0 (since all sampled individuals might then be failures). However, if N 2 M , n, the smallest possible X value is n 2 (N 2 M). Thus, the pos- sible values of X satisfy the restriction max (0, n 2 (N 2 M)) x min (n, M). An argument parallel to that of the previous example gives the pmf of X.

  pROpOSITION

  If X is the number of S’s in a completely random sample of size n drawn from

  a population consisting of M S’s and (N 2 M) F’s, then the probability distri- bution of X, called the hypergeometric distribution, is given by

  M N2M

  1 x 2 1 n2x 2

  P (X 5 x) 5 h(x; n, M, N) 5

  1 N

  n 2

  for x an integer satisfying max (0, n 2 N 1 M) x min (n, M).

  In Example 3.34, n 5 5, M 5 12, and N 5 20, so h(x; 5, 12, 20) for x 5 0, 1, 2,

  3, 4, 5 can be obtained by substituting these numbers into Equation (3.15). ExamplE 3.35 Five individuals from an animal population thought to be near extinction in a certain

  region have been caught, tagged, and released to mix into the population. After they have had an opportunity to mix, a random sample of 10 of these animals is selected. Let X 5 the number of tagged animals in the second sample. Suppose there are actu- ally 25 animals of this type in the region.

  The parameter values are n 5 10, M 5 5 (5 tagged animals in the population), and N 5 25, so the pmf of X is

  1 x 2 1 10 2 x 2

  h (x; 10, 5, 25) 5

  x5 0, 1, 2, 3, 4, 5

  The probability that exactly two of the animals in the second sample are tagged is

  P (X 5 2) 5 h(2; 10, 5, 25) 5

  The probability that at most two of the animals in the recapture sample are tagged is

  P(X 2) 5 P(X 5 0, 1, or 2) 5

  x5 o 0

  h (x; 10, 5, 25)

  5 .057 1 .257 1 .385 5 .699 n

  128 Chapter 3 Discrete random Variables and probability Distributions

  Various statistical software packages will easily generate hypergeometric probabilities (tabulation is cumbersome because of the three parameters).

  As in the binomial case, there are simple expressions for E(X) and V(X) for hypergeometric rv’s.

  pROpOSITION

  The mean and variance of the hypergeometric rv X having pmf h(x; n, M, N) are

  N 1 N2 1 2 N 1 N 2

  The ratio M yN is the proportion of S’s in the population. Replacing MyN by p in E(X) and V(X) gives

  E(X) 5 np

  N2n

  1 N2 1 2

  V(X) 5

  ? np (1 2 p)

  Expression (3.16) shows that the means of the binomial and hypergeometric rv’s are equal, whereas the variances of the two rv’s differ by the factor (N 2 n) y(N 2 1), often called the finite population correction factor . This factor is less than 1, so the hypergeometric variable has smaller variance than does the binomial rv. The

  correction factor can be written as (1 2 n yN)y(1 2 1yN), which is approximately 1 when n is small relative to N.

  ExamplE 3.36

  In the animal-tagging example, n 5 10, M 5 5, and N 5 25, so p 5 5 y25 5 .2 and

  (Example 3.35

  E(X) 5 10(.2) 5 2

  continued)

  V(X) 5

  If the sampling had been carried out with replacement, V(X) 5 1.6.

  Suppose the population size N is not actually known, so the value x is observed and we wish to estimate N. It is reasonable to equate the observed sample proportion of S’s, x yn, with the population proportion, MyN, giving the estimate

  M?n Nˆ 5 x

  If M 5 100, n 5 40, and x 5 16, then Nˆ 5 250.

  n

  Our general rule of thumb in Section 3.4 stated that if sampling was without replacement but nN was at most .05, then the binomial distribution could be used to compute approximate probabilities involving the number of S’s in the sample.

  A more precise statement is as follows: Let the population size, N, and number of population S’s, M, get large with the ratio MN approaching p. Then h(x; n, M, N) approaches b(x; n, p); so for nN small, the two are approximately equal provided that p is not too near either 0 or 1. This is the rationale for the rule.

  the negative Binomial distribution

  The negative binomial rv and distribution are based on an experiment satisfying the following conditions:

  1. The experiment consists of a sequence of independent trials.

  2. Each trial can result in either a success (S) or a failure (F).

  3.5 hypergeometric and Negative Binomial Distributions

  3. The probability of success is constant from trial to trial, so P(S on trial i) 5 p for i 5 1, 2, 3,….

  4. The experiment continues (trials are performed) until a total of r successes have been observed, where r is a specified positive integer.

  The random variable of interest is X 5 the number of failures that precede the rth success; X is called a negative binomial random variable because, in contrast to the binomial rv, the number of successes is fixed and the number of trials is random.

  Possible values of X are 0, 1, 2, … . Let nb(x; r, p) denote the pmf of X. Consider nb(7; 3, p) 5 P(X 5 7), the probability that exactly 7 F’s occur before the

  3 rd S. In order for this to happen, the 10 th trial must be an S and there must be exactly

  2 S’s among the first 9 trials. Thus

  Generalizing this line of reasoning gives the following formula for the negative binomial pmf.

  pROpOSITION

  The pmf of the negative binomial rv X with parameters r 5 number of S’s and

  p5P (S) is

  1 r2 1 2

  x1r2 1

  nb (x; r, p) 5

  p r (1 2 p) x x5 0, 1, 2,…

  ExamplE 3.37 A pediatrician wishes to recruit 5 couples, each of whom is expecting their first child, to participate in a new natural childbirth regimen. Let p 5 P(a randomly selected couple agrees to participate). If p 5 .2, what is the probability that 15 cou- ples must be asked before 5 are found who agree to participate? That is, with S5 {agrees to participate}, what is the probability that 10 F’s occur before the fifth S ? Substituting r 5 5, p 5 .2, and x 5 10 into nb(x; r, p) gives

  nb (10; 5, .2) 5

  The probability that at most 10 F’s are observed (at most 15 couples are asked) is

  x1 4

  x5 o 0 x5 o 0 1 2

  P (X 10) 5 nb (x; 5, .2) 5 (.2) 5

  (.8) x 5 .164 n

  In some sources, the negative binomial rv is taken to be the number of trials X1r rather than the number of failures.

  In the special case r 5 1, the pmf is

  nb (x; 1, p) 5 (1 2 p) x p x5

  0, 1, 2,… (3.17)

  In Example 3.12, we derived the pmf for the number of trials necessary to obtain the first S, and the pmf there is similar to Expression (3.17). Both X 5 number of F’s

  and Y 5 number of trials ( 5 1 1 X) are referred to in the literature as geometric

  random variables , and the pmf in Expression (3.17) is called the geometric

  distribution .

  130 Chapter 3 Discrete random Variables and probability Distributions

  The expected number of trials until the first S was shown in Example 3.19 to

  be 1p, so that the expected number of F’s until the first S is (1 yp) 2 1 5 (1 2 p)yp. Intuitively, we would expect to see r ? (1 2 p) ypF’s before the rth S, and this is indeed E(X). There is also a simple formula for V(X).

  pROpOSITION

  If X is a negative binomial rv with pmf nb(x; r, p), then

  p 2

  Finally, by expanding the binomial coefficient in front of p r (1 2 p) x and doing some cancellation, it can be seen that nb(x; r, p) is well defined even when r is not an inte- ger. This generalized negative binomial distribution has been found to fit observed data quite well in a wide variety of applications.

  ExERcisEs Section 3.5 (68–78)

  68. Eighteen individuals are scheduled to take a driving test

  had been turned in, the instructor randomly ordered them

  at a particular DMV office on a certain day, eight of

  before grading. Consider the first 15 graded projects.

  whom will be taking the test for the first time. Suppose

  a. What is the probability that exactly 10 of these are

  that six of these individuals are randomly assigned to a

  from the second section?

  particular examiner, and let X be the number among the

  b. What is the probability that at least 10 of these are

  six who are taking the test for the first time.

  from the second section?

  a. What kind of a distribution does X have (name and

  c. What is the probability that at least 10 of these are

  values of all parameters)?

  from the same section?

  b. Compute P(X 5 2), P(X 2), and P(X 2).

  d. What are the mean value and standard deviation of

  c. Calculate the mean value and standard deviation of X.

  the number among these 15 that are from the second

  69. Each of 12 refrigerators of a certain type has been

  section?

  returned to a distributor because of an audible, high-

  e. What are the mean value and standard deviation of

  pitched, oscillating noise when the refrigerators are run-

  the number of projects not among these first 15 that

  ning. Suppose that 7 of these refrigerators have a defec-

  are from the second section?

  tive compressor and the other 5 have less serious prob-

  71. A geologist has collected 10 specimens of basaltic rock

  lems. If the refrigerators are examined in random order,

  and 10 specimens of granite. The geologist instructs a

  let X be the number among the first 6 examined that have

  laboratory assistant to randomly select 15 of the speci-

  a defective compressor.

  mens for analysis.

  a. Calculate P (X 5 4) and P(X 4)

  a. What is the pmf of the number of granite specimens

  b. Determine the probability that X exceeds its mean

  selected for analysis?

  value by more than 1 standard deviation.

  b. What is the probability that all specimens of one of

  c. Consider a large shipment of 400 refrigerators, of

  the two types of rock are selected for analysis?

  which 40 have defective compressors. If X is the

  c. What is the probability that the number of granite

  number among 15 randomly selected refrigerators

  specimens selected for analysis is within 1 standard

  that have defective compressors, describe a less

  deviation of its mean value?

  tedious way to calculate (at least approximately)

  72. A personnel director interviewing 11 senior engineers

  P (X 5) than to use the hypergeometric pmf.

  for four job openings has scheduled six interviews for the

  70. An instructor who taught two sections of engineering

  first day and five for the second day of interviewing.

  statistics last term, the first with 20 students and the second

  Assume that the candidates are interviewed in random

  with 30, decided to assign a term project. After all projects

  order.

  3.6 the poisson probability Distribution 131

  a. What is the probability that x of the top four candi-

  a. What is the probability that you purchase x boxes

  dates are interviewed on the first day?

  that do not have the desired prize?

  b. How many of the top four candidates can be

  b. What is the probability that you purchase four

  expected to be interviewed on the first day?

  boxes?

  73. Twenty pairs of individuals playing in a bridge tourna-

  c. What is the probability that you purchase at most

  ment have been seeded 1, … , 20. In the first part of the

  four boxes?

  tournament, the 20 are randomly divided into 10 east–

  d. How many boxes without the desired prize do you

  west pairs and 10 north–south pairs.

  expect to purchase? How many boxes do you expect

  a. What is the probability that x of the top 10 pairs end

  to purchase?

  up playing east–west?

  76. A family decides to have children until it has three chil-

  b. What is the probability that all of the top five pairs

  dren of the same gender. Assuming P(B) 5 P(G) 5 .5,

  end up playing the same direction?

  what is the pmf of X 5 the number of children in the

  c. If there are 2n pairs, what is the pmf of X 5 the number

  family?

  among the top n pairs who end up playing east–west?

  77. Three brothers and their wives decide to have children

  What are E(X) and V(X)?

  until each family has two female children. What is the pmf

  74. A second-stage smog alert has been called in a certain

  of X 5 the total number of male children born to the

  area of Los Angeles County in which there are 50 indus-

  brothers? What is E(X), and how does it compare to the

  trial firms. An inspector will visit 10 randomly selected

  expected number of male children born to each brother?

  firms to check for violations of regulations.

  78. According to the article “Characterizing the

  a. If 15 of the firms are actually violating at least one

  Severity and Risk of Drought in the Poudre River,

  regulation, what is the pmf of the number of firms

  Colorado” (J. of Water Res. Planning and Mgmnt.,

  visited by the inspector that are in violation of at

  2005: 383–393), the drought length Y is the number

  least one regulation?

  of consecutive time intervals in which the water sup-

  b. If there are 500 firms in the area, of which 150 are in

  ply remains below a critical value y 0 (a deficit), pre-

  violation, approximate the pmf of part (a) by a sim-

  ceded by and followed by periods in which the supply

  pler pmf.

  exceeds this critical value (a surplus). The cited paper

  c. For X5 the number among the 10 visited that are in

  proposes a geometric distribution with p 5 .409 for

  violation, compute E(X) and V(X) both for the exact

  this random variable.

  pmf and the approximating pmf in part (b).

  a. What is the probability that a drought lasts exactly

  75. The probability that a randomly selected box of a certain

  3 intervals? At most 3 intervals?

  type of cereal has a particular prize is .2. Suppose you

  b. What is the probability that the length of a drought

  purchase box after box until you have obtained two of

  exceeds its mean value by at least one standard

  these prizes.

  deviation?

Dokumen yang terkait

AN ALIS IS YU RID IS PUT USAN BE B AS DAL AM P E RKAR A TIND AK P IDA NA P E NY E RTA AN M E L AK U K A N P R AK T IK K E DO K T E RA N YA NG M E N G A K IB ATK AN M ATINYA P AS IE N ( PUT USA N N O MOR: 9 0/PID.B /2011/ PN.MD O)

0 82 16

Analisis Komparasi Internet Financial Local Government Reporting Pada Website Resmi Kabupaten dan Kota di Jawa Timur The Comparison Analysis of Internet Financial Local Government Reporting on Official Website of Regency and City in East Java

19 819 7

Anal isi s L e ve l Pe r tanyaan p ad a S oal Ce r ita d alam B u k u T e k s M at e m at ik a Pe n u n jang S MK Pr ogr a m Keahl ian T e k n ologi , Kese h at an , d an Pe r tani an Kelas X T e r b itan E r lan gga B e r d asarkan T ak s on om i S OL O

2 99 16

ANTARA IDEALISME DAN KENYATAAN: KEBIJAKAN PENDIDIKAN TIONGHOA PERANAKAN DI SURABAYA PADA MASA PENDUDUKAN JEPANG TAHUN 1942-1945 Between Idealism and Reality: Education Policy of Chinese in Surabaya in the Japanese Era at 1942-1945)

1 29 9

Improving the Eighth Year Students' Tense Achievement and Active Participation by Giving Positive Reinforcement at SMPN 1 Silo in the 2013/2014 Academic Year

7 202 3

Improving the VIII-B Students' listening comprehension ability through note taking and partial dictation techniques at SMPN 3 Jember in the 2006/2007 Academic Year -

0 63 87

The Correlation between students vocabulary master and reading comprehension

16 145 49

Improping student's reading comprehension of descriptive text through textual teaching and learning (CTL)

8 140 133

The correlation between listening skill and pronunciation accuracy : a case study in the firt year of smk vocation higt school pupita bangsa ciputat school year 2005-2006

9 128 37

Transmission of Greek and Arabic Veteri

0 1 22