Sesi 3. Population, Parameter, Statistics and the Probability Application on Statistics Distribution

Basi c Probabi l i ty
Prof. dr. Siswanto Agus Wilopo, M.Sc., Sc.D.
Department of Biostatistics, Epidemiology and
Population Health
Faculty of Medicine
Universitas Gadjah Mada

Biostatistics I: 2017-18

Chap 4-1

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Learni ng Obj ecti ves
In this lecture, you learn:








Basic probability concepts and
definitions
Conditional probability
To use Bayes’ Theorem to revise
probabilities
Various counting rules
Biostatistics I: 2017-18

Chap 4-2

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Important Terms









Probability – the chance that an
uncertain event will occur (always
between 0 and 1)
Event – Each possible outcome of a
variable
Simple Event – an event that can be
described by a single characteristic
Sample Space – the collection of all
possible events
Biostatistics I: 2017-18

Chap 4-3

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Assessi ng Probabi l i ty
There are three approaches to assessing the
probability of an uncertain event:

1. a priori classical probability
probability of occurrence 

X
number of ways the event can occur

T
total number of elementary outcomes

2. empirical classical probability
probability of occurrence 

number of favorable outcomes observed
total number of outcomes observed

3. subjective probability
an individual judgment or opinion about
the probability of occurrence
Biostatistics I: 2017-18


Chap 4-4

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Sample Space
The Sample Space is the collection of all
possible events
e.g. All 6 faces of a die:

e.g. All 52 cards of a bridge deck:

Biostatistics I: 2017-18

Chap 4-5

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Events



Simple event






Complement of an event A (denoted A’)





An outcome from a sample space with one
characteristic
e.g., A red card from a deck of cards
All outcomes that are not part of event A
e.g., All cards that are not diamonds

Joint event




Involves two or more characteristics simultaneously
e.g., An ace that is also red from a deck of cards
Biostatistics I: 2017-18

Chap 4-6

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Vi sual i zi ng Events


Contingency Tables
Ace



Sample

Space

Not Ace

Total

Black

2

24

26

Red

2

24


26

Total

4

48

52

Tree Diagrams

2

Sample
Space

24

Full Deck

of 52 Cards

2
Biostatistics I: 2017-18

24

Chap 4-7

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Vi sual i zi ng Events


Venn Diagrams


Let A = aces




Let B = red cards

A ∩ B = ace and red

A

A U B = ace or red
Biostatistics I: 2017-18

B

Chap 4-8

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Mutual l y Excl usi ve Events


Mutually exclusive events



Events that cannot occur together

example:
A = queen of diamonds; B = queen of clubs



Events A and B are mutually exclusive

Biostatistics I: 2017-18

Chap 4-9

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Col l ecti vel y Exhausti ve Events


Collectively exhaustive events



One of the events must occur
The set of events covers the entire sample
space

example:
A = aces; B = black cards;
C = diamonds; D = hearts




Events A, B, C and D are collectively
exhaustive (but not mutually exclusive – an
ace may also be a heart)
Events B, C and D are collectively exhaustive
and also mutually exclusive
Biostatistics I: 2017-18

Chap 4-10

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Probabi l i ty






Probability is the numerical
measure of the likelihood that an
event will occur
The probability of any event must
be between 0 and 1, inclusively
0 ≤ P(A) ≤ 1 For any event A

1

0.5

The sum of the probabilities of all
mutually exclusive and
collectively exhaustive events is 1
P(A)  P(B)  P(C)  1
If A, B, and C are mutually exclusive and
collectively exhaustive Biostatistics I: 2017-18

Certain

0

Impossible
Chap 4-11

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Computi ng Joi nt and
Margi nal Probabi l i ti es


The probability of a joint event, A and B:
number of outcomes satisfying A and B
P( A and B) 
total number of elementary outcomes



Computing a marginal (or simple)
probability:
P(A)  P(A and B1 )  P(A and B 2 )    P(A and Bk )


Where B1, B2, …, Bk are k mutually exclusive and collectively
exhaustive events
Biostatistics I: 2017-18

Chap 4-12

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Joi nt Probabi l i ty Exampl e
P(Red and Ace)
number of cards that are red and ace 2


total number of cards
52

Type

Color
Red

Black

Total

Ace

2

2

4

Non-Ace

24

24

48

Total

26

26

52

Biostatistics I: 2017-18

Chap 4-13

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Margi nal Probabi l i ty Exampl e
P(Ace)
2
2
4
 P( Ace and Re d)  P( Ace and Black ) 


52 52 52

Type

Color
Red

Black

Total

Ace

2

2

4

Non-Ace

24

24

48

Total

26

26

52

Biostatistics I: 2017-18

Chap 4-14

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Joi nt Probabi l i ti es Usi ng
Conti ngency Tabl e
Event
B1

Event

B2

Total

A1

P(A1 and B1) P(A1 and B2)

A2

P(A2 and B1) P(A2 and B2) P(A2)

Total

P(B1)

Joint Probabilities

P(B2)

P(A1)

1

Marginal (Simple) Probabilities
Biostatistics I: 2017-18

Chap 4-15

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

General Addi ti on Rul e
General Addition Rule:
P(A or B) = P(A) + P(B) - P(A and B)
If A and B are mutually exclusive, then
P(A and B) = 0, so the rule can be simplified:
P(A or B) = P(A) + P(B)
For mutually exclusive events A and B
Biostatistics I: 2017-18

Chap 4-16

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

General Addi ti on Rul e Exampl e
P(Red or Ace) = P(Red) +P(Ace) - P(Red and Ace)
= 26/52 + 4/52 - 2/52 = 28/52

Type

Color
Red

Black

Total

Ace

2

2

4

Non-Ace

24

24

48

Total

26

26

52

Biostatistics I: 2017-18

Don’t count
the two red
aces twice!

Chap 4-17

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Computi ng Condi ti onal
Probabi l i ti es


A conditional probability is the probability of one
event, given that another event has occurred:

P(A and B)
P(A | B) 
P(B)

The conditional
probability of A given
that B has occurred

P(A and B)
P(B | A) 
P(A)

The conditional
probability of B given
that A has occurred

Where P(A and B) = joint probability of A and B
P(A) = marginal probability of A
P(B) = marginal probability of B
Biostatistics I: 2017-18

Chap 4-18

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Condi ti onal Probabi l i ty Exampl e




Of the cars on a used car lot, 70% have air
conditioning (AC) and 40% have a CD player
(CD). 20% of the cars have both.

What is the probability that a car has a
CD player, given that it has AC ?
i.e., we want to find P(CD | AC)

Biostatistics I: 2017-18

Chap 4-19

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Condi ti onal Probabi l i ty Exampl e
(continued)


Of the cars on a used car lot, 70% have air conditioning
(AC) and 40% have a CD player (CD).
20% of the cars have both.
CD

No CD

Total

AC

0.2

0.5

0.7

No AC

0.2

0.1

0.3

Total

0.4

0.6

1.0

P(CD and AC) 0.2
P(CD | AC) 

 0.2857
P(AC)
0.7
Biostatistics I: 2017-18

Chap 4-20

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Condi ti onal Probabi l i ty Exampl e
(continued)


Given AC, we only consider the top row (70% of the cars). Of these,
20% have a CD player. 20% of 70% is about 28.57%.

CD

No CD

Total

AC

0.2

0.5

0.7

No AC

0.2

0.1

0.3

Total

0.4

0.6

1.0

P(CD and AC) 0.2
P(CD | AC) 

 0.2857
P(AC)
0.7
Biostatistics I: 2017-18

Chap 4-21

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Usi ng Deci si on Trees
.2
.7

Given AC or
no AC:

.5
.7
All
Cars

.2
.3

Biostatistics I: 2017-18

.1
.3

P(AC and CD) = 0.2

P(AC and CD’) = 0.5

P(AC’ and CD) = 0.2

P(AC’ and CD’) = 0.1
Chap 4-22

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Usi ng Deci si on Trees
.2
.4

Given CD or
no CD:

.2
.4
All
Cars

.5
.6

Biostatistics I: 2017-18

.1
.6

(continued)
P(CD and AC) = 0.2

P(CD and AC’) = 0.2

P(CD’ and AC) = 0.5

P(CD’ and AC’) = 0.1
Chap 4-23

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Stati sti cal Independence


Two events are independent if and
only if:

P(A | B)  P(A)


Events A and B are independent when the
probability of one event is not affected by the
other event

Biostatistics I: 2017-18

Chap 4-24

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Mul ti pl i cati on Rul es


Multiplication rule for two events A
and B:

P(A and B)  P(A | B) P(B)
Note: If A and B are independent, then P(A | B)  P(A)
and the multiplication rule simplifies to

P(A and B)  P(A) P(B)
Biostatistics I: 2017-18

Chap 4-25

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Bayes’ Theorem
Bayes’ theorem is used to revise previously calculated
probabilities after new information is obtained

P(A | B i )P(Bi )
P(B i | A) 
P(A | B 1)P(B1)  P(A | B 2 )P(B2 )  P(A | B k )P(Bk )


where:
Bi = ith event of k mutually exclusive and
collectively exhaustive events
A = new event that might impact P(Bi)

Biostatistics I: 2017-18

Chap 4-26

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Margi nal Probabi l i ty


Marginal probability for event A:

P(A)  P(A | B1 ) P(B1 )  P(A | B 2 ) P(B 2 )    P(A | Bk ) P(Bk )


Where B1, B2, …, Bk are k mutually exclusive and
collectively exhaustive events

Biostatistics I: 2017-18

Chap 4-27

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Practi ce probl em
If HIV has a prevalence of 3% in
Jayapura, and a particular HIV test
has a false positive rate of .001 and a
false negative rate of .01, what is the
probability that a random person
selected off the street will test
positive?

9/7/2017

Biostatistics Chap
I: 2017-18
4-28

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Answer
Conditional probability: the
probability of testing + given that
a person is +

Joint probability of being + and
testing +

Marginal probability of carrying
the virus.

P(test +)=.99
P(+)=.03

P (+, test +)=.0297

P(test - )= .01
P(+, test -)=.003
P(test +) = .001

P(-, test +)=.00097

P(-)=.97
P(test -) = .999

P(-, test -) = .96903
______________
1.0

P(test +)=.0297+.00097=.03067
P(+&test+)P(+)*P(test+)
.0297 .03*.03067 (=.00092)
Biostatistics I: 2017-18
 Dependent!

Marginal probability of testing
positive

9/7/2017

Chap 4-29

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Law of total probabi l i ty
P(test )  P(test  /HIV )P(HIV)  P(test  /HIV)P(HIV-)

One of these has to be true (mutually
exclusive, collectively exhaustive).
They sum to 1.0.

P(test )  .99(.03)  .001(.97)
9/7/2017

Biostatistics I: 2017-18

30

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Law of total probabi l i ty


Formal Rule: Marginal probability for event
A=

P(A)  P(A | B1 ) P(B1 )  P(A | B 2 ) P(B 2 )    P(A | Bk ) P(Bk )
k

B

i



Where:

 1.0 and P(Bi &B j )  0 (mutually exclusive)

i 1

B3

B1

A
B2

P(A)  (50%)(25%)  (0)(50%)    (50%)(25%)  25%
9/7/2017

Biostatistics I: 2017-18

31

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Exampl e 2




A 54-year old woman has an abnormal
mammogram; what is the chance that
she has breast cancer?
Mammogram sensitivity=0.90 and
specificity =0.89

9/7/2017

Biostatistics I: 2017-18
32

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Exampl e: Mammography
sensitivity
P(test +)=.90
P(BC+)=.003

P (+, test +)=.0027

P(test -) = .10
P(+, test -)=.0003
P(test +) = .11

P(-, test +)=.10967

P(BC-)=.997
P(test -) = .89

specificity

P(-, test -) = .88733
______________
1.0

Marginal probabilities of breast cancer….(prevalence
among all 54-year olds)
P(BC/test+)=.0027/(.0027+.10967)=2.4%
Biostatistics I: 2017-18

9/7/2017

Chap 4-33

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

BAYES’ RULE

Biostatistics I: 2017-18

34

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Bayes’ Rul e: deri vati on


Definition:
Let A and B be two events with P(B)  0.
The conditional probability of A given B
is:
P( A & B)
P( A / B) 
P( B)

The idea: if we are given that the event B occurred, the relevant
sample space is reduced to B {P(B)=1 because we know B is
true} and conditional probability becomes a probability measure
on B.
9/7/2017

Biostatistics I: 2017-18
35

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Bayes’ Rul e: deri vati on
P( A & B)
P( A / B) 
P( B)

can be re-arranged to:

P( A & B)  P( A / B) P( B)
and, since also:
P ( B / A) 

9/7/2017

P( A & B)
P ( A)

 P ( A & B )  P ( B / A) P ( A)

P ( A / B ) P ( B )  P ( A & B )  P ( B / A) P ( A)
P ( A / B ) P ( B )  P ( B / A) P ( A)
P ( B / A) P ( A)
 P( A / B) 
P ( B ) Biostatistics I: 2017-18

36

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Bayes’ Rul e:

P( B / A) P( A)
P( A / B) 
P( B)
OR

P( B / A) P( A)
P( A / B) 
P( B / A) P( A)  P( B / ~ A) P(~ A)

9/7/2017

From the “Law of
Total Probability”

Biostatistics Chap
I: 2017-18
4-37

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Bayes’ Rul e:




Why do we care??
Why is Bayes’ Rule useful??
It turns out that sometimes it is very
useful to be able to “flip” conditional
probabilities. That is, we may know
the probability of A given B, but the
probability of B given A may not be
obvious. An example will help…
Biostatistics I: 2017-18

38

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

In-Cl ass Exerci se


If HIV has a prevalence of 3% in Jayapura,
and a particular HIV test has a false positive
rate of .001 and a false negative rate of .01,
what is the probability that a random
person who tests positive is actually
infected (also known as “positive predictive
value”)?

Biostatistics I: 2017-18

39

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Answer: usi ng probabi l i ty tree
P(test +)=.99
P(+)=.03

P (+, test +)=.0297

P(test - = .01)
P(+, test -)=.003
P(test +) = .001

P(-, test +)=.00097

P(-)=.97
P(-, test -) = .96903
P(test -) = .999

______________
1.0

A positive test places one on either of the two “test +” branches.
But only the top branch also fulfills the event “true infection.”
Therefore, the probability of being infected is the probability of being on the top
branch given that you are on one of the two circled branches above.
P( / test ) 

.0297
P(test  &true)

 96.8%
.0297  .00097
P(test )
9/7/2017

Biostatistics Chap
I: 2017-18
4-40

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Answer: usi ng Bayes’ rul e

P (true  / test  ) 

P (test  / true  ) P (true  )

P (test  / true  ) P (true  )  P (test  / true  ) P (true  )

.99(.03)
 96.8%
.99(.03)  .001(.97)

9/7/2017

Biostatistics I: 2017-18
41

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

In-cl ass exerci se
An insurance company believes that drivers can be
divided into two classes—those that are of high risk
and those that are of low risk. Their statistics show
that a high-risk driver will have an accident at some
time within a year with probability .4, but this
probability is only .1 for low risk drivers.
a) Assuming that 20% of the drivers are high-risk, what is the
probability that a new policy holder will have an accident
within a year of purchasing a policy?
b) If a new policy holder has an accident within a year of
purchasing a policy, what is the probability that he is a highrisk type driver?

Biostatistics I: 2017-18

42

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Answer to (a)
Assuming that 20% of the drivers are of
high-risk, what is the probability that a new
policy holder will have an accident within a
year of purchasing a policy?
Use law of total probability:
P(accident)=
P(accident/high risk)*P(high risk) +
P(accident/low risk)*P(low risk) =
.40(.20) + .10(.80) = .08 + .08 = .16
9/7/2017

Biostatistics Chap
I: 2017-18
4-43

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Answer to (b)
If a new policy holder has an accident within a year of
purchasing a policy, what is the probability that he is a high-risk
type driver?
P(high-risk/accident)=
P(accident/high risk)*P(high risk)/P(accident)
=.40(.20)/.16 = 50%
P(accident, high risk)=.08

P(accident/HR)=.4

Or use tree:

P(high risk)=.20

P( no acc/HR)=.6
P(no accident, high risk)=.12)
P(accident/LR)=.1
P(accident, low risk)=.08

P(low risk)=.80
P( no
accident/LR)=.9

P(no accident, low risk)=.72

______________
1.0

P(high risk/accident)=.08/.16=50%
9/7/2017

Biostatistics Chap
I: 2017-18
4-44

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Condi ti onal Probabi l i ty for
Epi demi ol ogy:
The odds ratio and risk ratio as
conditional probability

Biostatistics I: 2017-18

45

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

The Ri sk Rati o and the Odds Rati o as
condi ti onal probabi l i ty
In epidemiology, the association between a
risk factor or protective factor (exposure)
and a disease may be evaluated by the “risk
ratio” (RR) or the “odds ratio” (OR).
Both are measures of “relative risk”—the
general concept of comparing disease risks
in exposed vs. unexposed individuals.

9/7/2017

Biostatistics I: 2017-18
46

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Odds and Ri sk (probabi l i ty)
Definitions:
Risk = P(A) = cumulative probability (you specify the time
period!)
For example, what’s the probability that a person with a high
sugar intake develops diabetes in 1 year, 5 years, or over a
lifetime?
Odds = P(A)/P(~A)
For example, “the odds are 3 to 1 against a horse” means that
the horse has a 25% probability of winning.
Note: An odds is always higher than its corresponding
probability, unless the probability is 100%.

Biostatistics I: 2017-18

Chap 4-47

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Odds vs. Ri sk=probabi l i ty
If the risk is…

Then the odds are…

½ (50%)

1:1

¾ (75%)

3:1

1/ 10 (10%)

1:9

1/ 100 (1%)

1:99

Note: An odds is always higher than its corresponding probability,
unless the probability is 100%.
9/7/2017

Biostatistics Chap
I: 2017-18
4-48

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Introducti on to the 2x2 Tabl e
Exposure (E)

No Exposure (~E)

Disease (D)

a

b

a+b = P(D)

No Disease (~D)

c

d

c+d = P(~D)

a+c = P(E)

b+d = P(~E)

Marginal probability
of exposure

Marginal probability of
disease
9/7/2017

Chap
4-49
Biostatistics
I: 2017-18

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Cohort Studi es
Disease
Exposed
Target
population

Disease-free
cohort

Disease-free
Disease

Not
Exposed
Disease-free
TIME
9/7/2017

Biostatistics Chap
I: 2017-18
4-50

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

The Risk Ratio, or Relative Risk (RR)
Exposure (E)

No Exposure (~E)

Disease (D)

a

b

No Disease (~D)

c

d

a+c

b+d

RR 

P(D / E )
P( D /~E )

risk to the exposed

a
/(
a

c
)

b /( bd )
risk to the unexposed
9/7/2017

Biostatistics Chap
I: 2017-18
4-51

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Hypotheti cal Data

Congestive
Heart Failure
No CHF

High Systolic BP

Normal BP

400

400

1100

2600

1500

3000

400
/
1500
 2 .0
RR 
400 / 3000
9/7/2017

Biostatistics Chap
I: 2017-18
4-52

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

The odds rati o…


This risk ratio seems like the perfect measure
of relative risk. Why not stop here? Why
introduce the more complicated odds ratio??



We cannot calculate a risk ratio from a casecontrol study. Case-control studies are a
popular study design in epidemiology,
because they are useful for studying rare
diseases.



In a case-control study, we sample
conditional on disease status, so we cannot
calculate risk of disease.
9/7/2017

Biostatistics Chap
I: 2017-18
4-53

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Case-Control Studi es

Exposed in
past
Disease
Target
population

(Cases)

Not exposed
Exposed

No Disease
(Controls)

9/7/2017

Not Exposed

Biostatistics I: 2017-18
54

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

The Odds Ratio (OR)
Cases: Liver
cancer
Controls

Hep C +

Hep C -

90

10

100

30

70

100

What are P(D/E) and P(D/~E) here?
We can’t tell, because, by design, we have fixed the
proportion of liver cancer cases in this sample at 50% simply
by selecting half controls and half cases.
All these data give us is: P(E/D) and P(E/~D).
9/7/2017

Biostatistics Chap
I: 2017-18
4-55

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

The Odds Ratio (OR)
Cases: Liver
cancer
Controls

Hep C +

Hep C -

90

10

100

30

70

100

by Bayes’ Rule…

Luckily, P(E/D) [=the quantity you have] 

P( E / D) 

P( D / E ) P( E )
P( D)

Unfortunately, our sampling scheme precludes calculation of the marginals: P(E) and P(D), but turns out we
don’t need these if we use an odds ratio because the marginals cancel out!

9/7/2017

Biostatistics Chap
I: 2017-18
4-56

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

P( E / D)
P(~ E / D)
P( E / ~ D)
P(~ E / ~ D)

Odds of exposure in the cases

Odds of exposure in the controls

Bayes’ Rule

P( D / E) P( E)
P( D)
P( D / ~ E) P(~ E)
P( D)
P(~ D / E) P( E)
P(~ D)
P(~ D / ~ E) P(~ E)
P(~ D)

P( D / E )
P(~ D / E )
P( D / ~ E )
P(~ D / ~ E)

=

Odds of disease in the exposed

What we want!
Odds of disease in the unexposed

9/7/2017

Biostatistics Chap
I: 2017-18
4-57

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

The Odds Ratio (OR)
Exposure (E)
Disease (D)

a

No Disease (~D)

c

OR 

a
b
c
d

No Exposure
(~E)
b
d

Odds of exposure
for the cases.

ad


bc
Odds of exposure
for the controls

a
c
b
d

Odds of disease
for the exposed

Odds of disease for

the unexposed
Biostatistics Chap
I: 2017-18
4-58

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

The rare di sease assumpti on

OR 

P(D / E )
P (~ D / E ) 1
P(D /~ E )
P (~ D / ~ E )



P(D / E )
P(D /~ E )

 RR

1
When a disease is rare:
P(~D) = 1 - P(D)  1

9/7/2017

Biostatistics Chap
I: 2017-18
4-59

sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health

Exampl e
Cases: Liver
cancer
Controls

Hep C +

Hep C -

90

10

30

70

OR = 90*70/10*30 =21.0
Note: This indicates that those with Hep C infection have a 21-fold
increase in their odds of developing liver cancer (not in their
risk!).
 The odds ratio will always be bigger than the corresponding risk
ratio if RR >1 and smaller if RR