Sesi 3. Population, Parameter, Statistics and the Probability Application on Statistics Distribution
Basi c Probabi l i ty
Prof. dr. Siswanto Agus Wilopo, M.Sc., Sc.D.
Department of Biostatistics, Epidemiology and
Population Health
Faculty of Medicine
Universitas Gadjah Mada
Biostatistics I: 2017-18
Chap 4-1
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Learni ng Obj ecti ves
In this lecture, you learn:
Basic probability concepts and
definitions
Conditional probability
To use Bayes’ Theorem to revise
probabilities
Various counting rules
Biostatistics I: 2017-18
Chap 4-2
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Important Terms
Probability – the chance that an
uncertain event will occur (always
between 0 and 1)
Event – Each possible outcome of a
variable
Simple Event – an event that can be
described by a single characteristic
Sample Space – the collection of all
possible events
Biostatistics I: 2017-18
Chap 4-3
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Assessi ng Probabi l i ty
There are three approaches to assessing the
probability of an uncertain event:
1. a priori classical probability
probability of occurrence
X
number of ways the event can occur
T
total number of elementary outcomes
2. empirical classical probability
probability of occurrence
number of favorable outcomes observed
total number of outcomes observed
3. subjective probability
an individual judgment or opinion about
the probability of occurrence
Biostatistics I: 2017-18
Chap 4-4
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Sample Space
The Sample Space is the collection of all
possible events
e.g. All 6 faces of a die:
e.g. All 52 cards of a bridge deck:
Biostatistics I: 2017-18
Chap 4-5
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Events
Simple event
Complement of an event A (denoted A’)
An outcome from a sample space with one
characteristic
e.g., A red card from a deck of cards
All outcomes that are not part of event A
e.g., All cards that are not diamonds
Joint event
Involves two or more characteristics simultaneously
e.g., An ace that is also red from a deck of cards
Biostatistics I: 2017-18
Chap 4-6
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Vi sual i zi ng Events
Contingency Tables
Ace
Sample
Space
Not Ace
Total
Black
2
24
26
Red
2
24
26
Total
4
48
52
Tree Diagrams
2
Sample
Space
24
Full Deck
of 52 Cards
2
Biostatistics I: 2017-18
24
Chap 4-7
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Vi sual i zi ng Events
Venn Diagrams
Let A = aces
Let B = red cards
A ∩ B = ace and red
A
A U B = ace or red
Biostatistics I: 2017-18
B
Chap 4-8
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Mutual l y Excl usi ve Events
Mutually exclusive events
Events that cannot occur together
example:
A = queen of diamonds; B = queen of clubs
Events A and B are mutually exclusive
Biostatistics I: 2017-18
Chap 4-9
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Col l ecti vel y Exhausti ve Events
Collectively exhaustive events
One of the events must occur
The set of events covers the entire sample
space
example:
A = aces; B = black cards;
C = diamonds; D = hearts
Events A, B, C and D are collectively
exhaustive (but not mutually exclusive – an
ace may also be a heart)
Events B, C and D are collectively exhaustive
and also mutually exclusive
Biostatistics I: 2017-18
Chap 4-10
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Probabi l i ty
Probability is the numerical
measure of the likelihood that an
event will occur
The probability of any event must
be between 0 and 1, inclusively
0 ≤ P(A) ≤ 1 For any event A
1
0.5
The sum of the probabilities of all
mutually exclusive and
collectively exhaustive events is 1
P(A) P(B) P(C) 1
If A, B, and C are mutually exclusive and
collectively exhaustive Biostatistics I: 2017-18
Certain
0
Impossible
Chap 4-11
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Computi ng Joi nt and
Margi nal Probabi l i ti es
The probability of a joint event, A and B:
number of outcomes satisfying A and B
P( A and B)
total number of elementary outcomes
Computing a marginal (or simple)
probability:
P(A) P(A and B1 ) P(A and B 2 ) P(A and Bk )
Where B1, B2, …, Bk are k mutually exclusive and collectively
exhaustive events
Biostatistics I: 2017-18
Chap 4-12
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Joi nt Probabi l i ty Exampl e
P(Red and Ace)
number of cards that are red and ace 2
total number of cards
52
Type
Color
Red
Black
Total
Ace
2
2
4
Non-Ace
24
24
48
Total
26
26
52
Biostatistics I: 2017-18
Chap 4-13
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Margi nal Probabi l i ty Exampl e
P(Ace)
2
2
4
P( Ace and Re d) P( Ace and Black )
52 52 52
Type
Color
Red
Black
Total
Ace
2
2
4
Non-Ace
24
24
48
Total
26
26
52
Biostatistics I: 2017-18
Chap 4-14
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Joi nt Probabi l i ti es Usi ng
Conti ngency Tabl e
Event
B1
Event
B2
Total
A1
P(A1 and B1) P(A1 and B2)
A2
P(A2 and B1) P(A2 and B2) P(A2)
Total
P(B1)
Joint Probabilities
P(B2)
P(A1)
1
Marginal (Simple) Probabilities
Biostatistics I: 2017-18
Chap 4-15
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
General Addi ti on Rul e
General Addition Rule:
P(A or B) = P(A) + P(B) - P(A and B)
If A and B are mutually exclusive, then
P(A and B) = 0, so the rule can be simplified:
P(A or B) = P(A) + P(B)
For mutually exclusive events A and B
Biostatistics I: 2017-18
Chap 4-16
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
General Addi ti on Rul e Exampl e
P(Red or Ace) = P(Red) +P(Ace) - P(Red and Ace)
= 26/52 + 4/52 - 2/52 = 28/52
Type
Color
Red
Black
Total
Ace
2
2
4
Non-Ace
24
24
48
Total
26
26
52
Biostatistics I: 2017-18
Don’t count
the two red
aces twice!
Chap 4-17
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Computi ng Condi ti onal
Probabi l i ti es
A conditional probability is the probability of one
event, given that another event has occurred:
P(A and B)
P(A | B)
P(B)
The conditional
probability of A given
that B has occurred
P(A and B)
P(B | A)
P(A)
The conditional
probability of B given
that A has occurred
Where P(A and B) = joint probability of A and B
P(A) = marginal probability of A
P(B) = marginal probability of B
Biostatistics I: 2017-18
Chap 4-18
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Condi ti onal Probabi l i ty Exampl e
Of the cars on a used car lot, 70% have air
conditioning (AC) and 40% have a CD player
(CD). 20% of the cars have both.
What is the probability that a car has a
CD player, given that it has AC ?
i.e., we want to find P(CD | AC)
Biostatistics I: 2017-18
Chap 4-19
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Condi ti onal Probabi l i ty Exampl e
(continued)
Of the cars on a used car lot, 70% have air conditioning
(AC) and 40% have a CD player (CD).
20% of the cars have both.
CD
No CD
Total
AC
0.2
0.5
0.7
No AC
0.2
0.1
0.3
Total
0.4
0.6
1.0
P(CD and AC) 0.2
P(CD | AC)
0.2857
P(AC)
0.7
Biostatistics I: 2017-18
Chap 4-20
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Condi ti onal Probabi l i ty Exampl e
(continued)
Given AC, we only consider the top row (70% of the cars). Of these,
20% have a CD player. 20% of 70% is about 28.57%.
CD
No CD
Total
AC
0.2
0.5
0.7
No AC
0.2
0.1
0.3
Total
0.4
0.6
1.0
P(CD and AC) 0.2
P(CD | AC)
0.2857
P(AC)
0.7
Biostatistics I: 2017-18
Chap 4-21
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Usi ng Deci si on Trees
.2
.7
Given AC or
no AC:
.5
.7
All
Cars
.2
.3
Biostatistics I: 2017-18
.1
.3
P(AC and CD) = 0.2
P(AC and CD’) = 0.5
P(AC’ and CD) = 0.2
P(AC’ and CD’) = 0.1
Chap 4-22
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Usi ng Deci si on Trees
.2
.4
Given CD or
no CD:
.2
.4
All
Cars
.5
.6
Biostatistics I: 2017-18
.1
.6
(continued)
P(CD and AC) = 0.2
P(CD and AC’) = 0.2
P(CD’ and AC) = 0.5
P(CD’ and AC’) = 0.1
Chap 4-23
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Stati sti cal Independence
Two events are independent if and
only if:
P(A | B) P(A)
Events A and B are independent when the
probability of one event is not affected by the
other event
Biostatistics I: 2017-18
Chap 4-24
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Mul ti pl i cati on Rul es
Multiplication rule for two events A
and B:
P(A and B) P(A | B) P(B)
Note: If A and B are independent, then P(A | B) P(A)
and the multiplication rule simplifies to
P(A and B) P(A) P(B)
Biostatistics I: 2017-18
Chap 4-25
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Bayes’ Theorem
Bayes’ theorem is used to revise previously calculated
probabilities after new information is obtained
P(A | B i )P(Bi )
P(B i | A)
P(A | B 1)P(B1) P(A | B 2 )P(B2 ) P(A | B k )P(Bk )
where:
Bi = ith event of k mutually exclusive and
collectively exhaustive events
A = new event that might impact P(Bi)
Biostatistics I: 2017-18
Chap 4-26
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Margi nal Probabi l i ty
Marginal probability for event A:
P(A) P(A | B1 ) P(B1 ) P(A | B 2 ) P(B 2 ) P(A | Bk ) P(Bk )
Where B1, B2, …, Bk are k mutually exclusive and
collectively exhaustive events
Biostatistics I: 2017-18
Chap 4-27
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Practi ce probl em
If HIV has a prevalence of 3% in
Jayapura, and a particular HIV test
has a false positive rate of .001 and a
false negative rate of .01, what is the
probability that a random person
selected off the street will test
positive?
9/7/2017
Biostatistics Chap
I: 2017-18
4-28
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Answer
Conditional probability: the
probability of testing + given that
a person is +
Joint probability of being + and
testing +
Marginal probability of carrying
the virus.
P(test +)=.99
P(+)=.03
P (+, test +)=.0297
P(test - )= .01
P(+, test -)=.003
P(test +) = .001
P(-, test +)=.00097
P(-)=.97
P(test -) = .999
P(-, test -) = .96903
______________
1.0
P(test +)=.0297+.00097=.03067
P(+&test+)P(+)*P(test+)
.0297 .03*.03067 (=.00092)
Biostatistics I: 2017-18
Dependent!
Marginal probability of testing
positive
9/7/2017
Chap 4-29
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Law of total probabi l i ty
P(test ) P(test /HIV )P(HIV) P(test /HIV)P(HIV-)
One of these has to be true (mutually
exclusive, collectively exhaustive).
They sum to 1.0.
P(test ) .99(.03) .001(.97)
9/7/2017
Biostatistics I: 2017-18
30
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Law of total probabi l i ty
Formal Rule: Marginal probability for event
A=
P(A) P(A | B1 ) P(B1 ) P(A | B 2 ) P(B 2 ) P(A | Bk ) P(Bk )
k
B
i
Where:
1.0 and P(Bi &B j ) 0 (mutually exclusive)
i 1
B3
B1
A
B2
P(A) (50%)(25%) (0)(50%) (50%)(25%) 25%
9/7/2017
Biostatistics I: 2017-18
31
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Exampl e 2
A 54-year old woman has an abnormal
mammogram; what is the chance that
she has breast cancer?
Mammogram sensitivity=0.90 and
specificity =0.89
9/7/2017
Biostatistics I: 2017-18
32
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Exampl e: Mammography
sensitivity
P(test +)=.90
P(BC+)=.003
P (+, test +)=.0027
P(test -) = .10
P(+, test -)=.0003
P(test +) = .11
P(-, test +)=.10967
P(BC-)=.997
P(test -) = .89
specificity
P(-, test -) = .88733
______________
1.0
Marginal probabilities of breast cancer….(prevalence
among all 54-year olds)
P(BC/test+)=.0027/(.0027+.10967)=2.4%
Biostatistics I: 2017-18
9/7/2017
Chap 4-33
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
BAYES’ RULE
Biostatistics I: 2017-18
34
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Bayes’ Rul e: deri vati on
Definition:
Let A and B be two events with P(B) 0.
The conditional probability of A given B
is:
P( A & B)
P( A / B)
P( B)
The idea: if we are given that the event B occurred, the relevant
sample space is reduced to B {P(B)=1 because we know B is
true} and conditional probability becomes a probability measure
on B.
9/7/2017
Biostatistics I: 2017-18
35
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Bayes’ Rul e: deri vati on
P( A & B)
P( A / B)
P( B)
can be re-arranged to:
P( A & B) P( A / B) P( B)
and, since also:
P ( B / A)
9/7/2017
P( A & B)
P ( A)
P ( A & B ) P ( B / A) P ( A)
P ( A / B ) P ( B ) P ( A & B ) P ( B / A) P ( A)
P ( A / B ) P ( B ) P ( B / A) P ( A)
P ( B / A) P ( A)
P( A / B)
P ( B ) Biostatistics I: 2017-18
36
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Bayes’ Rul e:
P( B / A) P( A)
P( A / B)
P( B)
OR
P( B / A) P( A)
P( A / B)
P( B / A) P( A) P( B / ~ A) P(~ A)
9/7/2017
From the “Law of
Total Probability”
Biostatistics Chap
I: 2017-18
4-37
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Bayes’ Rul e:
Why do we care??
Why is Bayes’ Rule useful??
It turns out that sometimes it is very
useful to be able to “flip” conditional
probabilities. That is, we may know
the probability of A given B, but the
probability of B given A may not be
obvious. An example will help…
Biostatistics I: 2017-18
38
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
In-Cl ass Exerci se
If HIV has a prevalence of 3% in Jayapura,
and a particular HIV test has a false positive
rate of .001 and a false negative rate of .01,
what is the probability that a random
person who tests positive is actually
infected (also known as “positive predictive
value”)?
Biostatistics I: 2017-18
39
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Answer: usi ng probabi l i ty tree
P(test +)=.99
P(+)=.03
P (+, test +)=.0297
P(test - = .01)
P(+, test -)=.003
P(test +) = .001
P(-, test +)=.00097
P(-)=.97
P(-, test -) = .96903
P(test -) = .999
______________
1.0
A positive test places one on either of the two “test +” branches.
But only the top branch also fulfills the event “true infection.”
Therefore, the probability of being infected is the probability of being on the top
branch given that you are on one of the two circled branches above.
P( / test )
.0297
P(test &true)
96.8%
.0297 .00097
P(test )
9/7/2017
Biostatistics Chap
I: 2017-18
4-40
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Answer: usi ng Bayes’ rul e
P (true / test )
P (test / true ) P (true )
P (test / true ) P (true ) P (test / true ) P (true )
.99(.03)
96.8%
.99(.03) .001(.97)
9/7/2017
Biostatistics I: 2017-18
41
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
In-cl ass exerci se
An insurance company believes that drivers can be
divided into two classes—those that are of high risk
and those that are of low risk. Their statistics show
that a high-risk driver will have an accident at some
time within a year with probability .4, but this
probability is only .1 for low risk drivers.
a) Assuming that 20% of the drivers are high-risk, what is the
probability that a new policy holder will have an accident
within a year of purchasing a policy?
b) If a new policy holder has an accident within a year of
purchasing a policy, what is the probability that he is a highrisk type driver?
Biostatistics I: 2017-18
42
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Answer to (a)
Assuming that 20% of the drivers are of
high-risk, what is the probability that a new
policy holder will have an accident within a
year of purchasing a policy?
Use law of total probability:
P(accident)=
P(accident/high risk)*P(high risk) +
P(accident/low risk)*P(low risk) =
.40(.20) + .10(.80) = .08 + .08 = .16
9/7/2017
Biostatistics Chap
I: 2017-18
4-43
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Answer to (b)
If a new policy holder has an accident within a year of
purchasing a policy, what is the probability that he is a high-risk
type driver?
P(high-risk/accident)=
P(accident/high risk)*P(high risk)/P(accident)
=.40(.20)/.16 = 50%
P(accident, high risk)=.08
P(accident/HR)=.4
Or use tree:
P(high risk)=.20
P( no acc/HR)=.6
P(no accident, high risk)=.12)
P(accident/LR)=.1
P(accident, low risk)=.08
P(low risk)=.80
P( no
accident/LR)=.9
P(no accident, low risk)=.72
______________
1.0
P(high risk/accident)=.08/.16=50%
9/7/2017
Biostatistics Chap
I: 2017-18
4-44
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Condi ti onal Probabi l i ty for
Epi demi ol ogy:
The odds ratio and risk ratio as
conditional probability
Biostatistics I: 2017-18
45
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
The Ri sk Rati o and the Odds Rati o as
condi ti onal probabi l i ty
In epidemiology, the association between a
risk factor or protective factor (exposure)
and a disease may be evaluated by the “risk
ratio” (RR) or the “odds ratio” (OR).
Both are measures of “relative risk”—the
general concept of comparing disease risks
in exposed vs. unexposed individuals.
9/7/2017
Biostatistics I: 2017-18
46
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Odds and Ri sk (probabi l i ty)
Definitions:
Risk = P(A) = cumulative probability (you specify the time
period!)
For example, what’s the probability that a person with a high
sugar intake develops diabetes in 1 year, 5 years, or over a
lifetime?
Odds = P(A)/P(~A)
For example, “the odds are 3 to 1 against a horse” means that
the horse has a 25% probability of winning.
Note: An odds is always higher than its corresponding
probability, unless the probability is 100%.
Biostatistics I: 2017-18
Chap 4-47
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Odds vs. Ri sk=probabi l i ty
If the risk is…
Then the odds are…
½ (50%)
1:1
¾ (75%)
3:1
1/ 10 (10%)
1:9
1/ 100 (1%)
1:99
Note: An odds is always higher than its corresponding probability,
unless the probability is 100%.
9/7/2017
Biostatistics Chap
I: 2017-18
4-48
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Introducti on to the 2x2 Tabl e
Exposure (E)
No Exposure (~E)
Disease (D)
a
b
a+b = P(D)
No Disease (~D)
c
d
c+d = P(~D)
a+c = P(E)
b+d = P(~E)
Marginal probability
of exposure
Marginal probability of
disease
9/7/2017
Chap
4-49
Biostatistics
I: 2017-18
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Cohort Studi es
Disease
Exposed
Target
population
Disease-free
cohort
Disease-free
Disease
Not
Exposed
Disease-free
TIME
9/7/2017
Biostatistics Chap
I: 2017-18
4-50
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
The Risk Ratio, or Relative Risk (RR)
Exposure (E)
No Exposure (~E)
Disease (D)
a
b
No Disease (~D)
c
d
a+c
b+d
RR
P(D / E )
P( D /~E )
risk to the exposed
a
/(
a
c
)
b /( bd )
risk to the unexposed
9/7/2017
Biostatistics Chap
I: 2017-18
4-51
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Hypotheti cal Data
Congestive
Heart Failure
No CHF
High Systolic BP
Normal BP
400
400
1100
2600
1500
3000
400
/
1500
2 .0
RR
400 / 3000
9/7/2017
Biostatistics Chap
I: 2017-18
4-52
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
The odds rati o…
This risk ratio seems like the perfect measure
of relative risk. Why not stop here? Why
introduce the more complicated odds ratio??
We cannot calculate a risk ratio from a casecontrol study. Case-control studies are a
popular study design in epidemiology,
because they are useful for studying rare
diseases.
In a case-control study, we sample
conditional on disease status, so we cannot
calculate risk of disease.
9/7/2017
Biostatistics Chap
I: 2017-18
4-53
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Case-Control Studi es
Exposed in
past
Disease
Target
population
(Cases)
Not exposed
Exposed
No Disease
(Controls)
9/7/2017
Not Exposed
Biostatistics I: 2017-18
54
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
The Odds Ratio (OR)
Cases: Liver
cancer
Controls
Hep C +
Hep C -
90
10
100
30
70
100
What are P(D/E) and P(D/~E) here?
We can’t tell, because, by design, we have fixed the
proportion of liver cancer cases in this sample at 50% simply
by selecting half controls and half cases.
All these data give us is: P(E/D) and P(E/~D).
9/7/2017
Biostatistics Chap
I: 2017-18
4-55
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
The Odds Ratio (OR)
Cases: Liver
cancer
Controls
Hep C +
Hep C -
90
10
100
30
70
100
by Bayes’ Rule…
Luckily, P(E/D) [=the quantity you have]
P( E / D)
P( D / E ) P( E )
P( D)
Unfortunately, our sampling scheme precludes calculation of the marginals: P(E) and P(D), but turns out we
don’t need these if we use an odds ratio because the marginals cancel out!
9/7/2017
Biostatistics Chap
I: 2017-18
4-56
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
P( E / D)
P(~ E / D)
P( E / ~ D)
P(~ E / ~ D)
Odds of exposure in the cases
Odds of exposure in the controls
Bayes’ Rule
P( D / E) P( E)
P( D)
P( D / ~ E) P(~ E)
P( D)
P(~ D / E) P( E)
P(~ D)
P(~ D / ~ E) P(~ E)
P(~ D)
P( D / E )
P(~ D / E )
P( D / ~ E )
P(~ D / ~ E)
=
Odds of disease in the exposed
What we want!
Odds of disease in the unexposed
9/7/2017
Biostatistics Chap
I: 2017-18
4-57
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
The Odds Ratio (OR)
Exposure (E)
Disease (D)
a
No Disease (~D)
c
OR
a
b
c
d
No Exposure
(~E)
b
d
Odds of exposure
for the cases.
ad
bc
Odds of exposure
for the controls
a
c
b
d
Odds of disease
for the exposed
Odds of disease for
the unexposed
Biostatistics Chap
I: 2017-18
4-58
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
The rare di sease assumpti on
OR
P(D / E )
P (~ D / E ) 1
P(D /~ E )
P (~ D / ~ E )
P(D / E )
P(D /~ E )
RR
1
When a disease is rare:
P(~D) = 1 - P(D) 1
9/7/2017
Biostatistics Chap
I: 2017-18
4-59
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Exampl e
Cases: Liver
cancer
Controls
Hep C +
Hep C -
90
10
30
70
OR = 90*70/10*30 =21.0
Note: This indicates that those with Hep C infection have a 21-fold
increase in their odds of developing liver cancer (not in their
risk!).
The odds ratio will always be bigger than the corresponding risk
ratio if RR >1 and smaller if RR
Prof. dr. Siswanto Agus Wilopo, M.Sc., Sc.D.
Department of Biostatistics, Epidemiology and
Population Health
Faculty of Medicine
Universitas Gadjah Mada
Biostatistics I: 2017-18
Chap 4-1
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Learni ng Obj ecti ves
In this lecture, you learn:
Basic probability concepts and
definitions
Conditional probability
To use Bayes’ Theorem to revise
probabilities
Various counting rules
Biostatistics I: 2017-18
Chap 4-2
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Important Terms
Probability – the chance that an
uncertain event will occur (always
between 0 and 1)
Event – Each possible outcome of a
variable
Simple Event – an event that can be
described by a single characteristic
Sample Space – the collection of all
possible events
Biostatistics I: 2017-18
Chap 4-3
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Assessi ng Probabi l i ty
There are three approaches to assessing the
probability of an uncertain event:
1. a priori classical probability
probability of occurrence
X
number of ways the event can occur
T
total number of elementary outcomes
2. empirical classical probability
probability of occurrence
number of favorable outcomes observed
total number of outcomes observed
3. subjective probability
an individual judgment or opinion about
the probability of occurrence
Biostatistics I: 2017-18
Chap 4-4
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Sample Space
The Sample Space is the collection of all
possible events
e.g. All 6 faces of a die:
e.g. All 52 cards of a bridge deck:
Biostatistics I: 2017-18
Chap 4-5
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Events
Simple event
Complement of an event A (denoted A’)
An outcome from a sample space with one
characteristic
e.g., A red card from a deck of cards
All outcomes that are not part of event A
e.g., All cards that are not diamonds
Joint event
Involves two or more characteristics simultaneously
e.g., An ace that is also red from a deck of cards
Biostatistics I: 2017-18
Chap 4-6
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Vi sual i zi ng Events
Contingency Tables
Ace
Sample
Space
Not Ace
Total
Black
2
24
26
Red
2
24
26
Total
4
48
52
Tree Diagrams
2
Sample
Space
24
Full Deck
of 52 Cards
2
Biostatistics I: 2017-18
24
Chap 4-7
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Vi sual i zi ng Events
Venn Diagrams
Let A = aces
Let B = red cards
A ∩ B = ace and red
A
A U B = ace or red
Biostatistics I: 2017-18
B
Chap 4-8
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Mutual l y Excl usi ve Events
Mutually exclusive events
Events that cannot occur together
example:
A = queen of diamonds; B = queen of clubs
Events A and B are mutually exclusive
Biostatistics I: 2017-18
Chap 4-9
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Col l ecti vel y Exhausti ve Events
Collectively exhaustive events
One of the events must occur
The set of events covers the entire sample
space
example:
A = aces; B = black cards;
C = diamonds; D = hearts
Events A, B, C and D are collectively
exhaustive (but not mutually exclusive – an
ace may also be a heart)
Events B, C and D are collectively exhaustive
and also mutually exclusive
Biostatistics I: 2017-18
Chap 4-10
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Probabi l i ty
Probability is the numerical
measure of the likelihood that an
event will occur
The probability of any event must
be between 0 and 1, inclusively
0 ≤ P(A) ≤ 1 For any event A
1
0.5
The sum of the probabilities of all
mutually exclusive and
collectively exhaustive events is 1
P(A) P(B) P(C) 1
If A, B, and C are mutually exclusive and
collectively exhaustive Biostatistics I: 2017-18
Certain
0
Impossible
Chap 4-11
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Computi ng Joi nt and
Margi nal Probabi l i ti es
The probability of a joint event, A and B:
number of outcomes satisfying A and B
P( A and B)
total number of elementary outcomes
Computing a marginal (or simple)
probability:
P(A) P(A and B1 ) P(A and B 2 ) P(A and Bk )
Where B1, B2, …, Bk are k mutually exclusive and collectively
exhaustive events
Biostatistics I: 2017-18
Chap 4-12
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Joi nt Probabi l i ty Exampl e
P(Red and Ace)
number of cards that are red and ace 2
total number of cards
52
Type
Color
Red
Black
Total
Ace
2
2
4
Non-Ace
24
24
48
Total
26
26
52
Biostatistics I: 2017-18
Chap 4-13
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Margi nal Probabi l i ty Exampl e
P(Ace)
2
2
4
P( Ace and Re d) P( Ace and Black )
52 52 52
Type
Color
Red
Black
Total
Ace
2
2
4
Non-Ace
24
24
48
Total
26
26
52
Biostatistics I: 2017-18
Chap 4-14
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Joi nt Probabi l i ti es Usi ng
Conti ngency Tabl e
Event
B1
Event
B2
Total
A1
P(A1 and B1) P(A1 and B2)
A2
P(A2 and B1) P(A2 and B2) P(A2)
Total
P(B1)
Joint Probabilities
P(B2)
P(A1)
1
Marginal (Simple) Probabilities
Biostatistics I: 2017-18
Chap 4-15
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
General Addi ti on Rul e
General Addition Rule:
P(A or B) = P(A) + P(B) - P(A and B)
If A and B are mutually exclusive, then
P(A and B) = 0, so the rule can be simplified:
P(A or B) = P(A) + P(B)
For mutually exclusive events A and B
Biostatistics I: 2017-18
Chap 4-16
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
General Addi ti on Rul e Exampl e
P(Red or Ace) = P(Red) +P(Ace) - P(Red and Ace)
= 26/52 + 4/52 - 2/52 = 28/52
Type
Color
Red
Black
Total
Ace
2
2
4
Non-Ace
24
24
48
Total
26
26
52
Biostatistics I: 2017-18
Don’t count
the two red
aces twice!
Chap 4-17
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Computi ng Condi ti onal
Probabi l i ti es
A conditional probability is the probability of one
event, given that another event has occurred:
P(A and B)
P(A | B)
P(B)
The conditional
probability of A given
that B has occurred
P(A and B)
P(B | A)
P(A)
The conditional
probability of B given
that A has occurred
Where P(A and B) = joint probability of A and B
P(A) = marginal probability of A
P(B) = marginal probability of B
Biostatistics I: 2017-18
Chap 4-18
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Condi ti onal Probabi l i ty Exampl e
Of the cars on a used car lot, 70% have air
conditioning (AC) and 40% have a CD player
(CD). 20% of the cars have both.
What is the probability that a car has a
CD player, given that it has AC ?
i.e., we want to find P(CD | AC)
Biostatistics I: 2017-18
Chap 4-19
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Condi ti onal Probabi l i ty Exampl e
(continued)
Of the cars on a used car lot, 70% have air conditioning
(AC) and 40% have a CD player (CD).
20% of the cars have both.
CD
No CD
Total
AC
0.2
0.5
0.7
No AC
0.2
0.1
0.3
Total
0.4
0.6
1.0
P(CD and AC) 0.2
P(CD | AC)
0.2857
P(AC)
0.7
Biostatistics I: 2017-18
Chap 4-20
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Condi ti onal Probabi l i ty Exampl e
(continued)
Given AC, we only consider the top row (70% of the cars). Of these,
20% have a CD player. 20% of 70% is about 28.57%.
CD
No CD
Total
AC
0.2
0.5
0.7
No AC
0.2
0.1
0.3
Total
0.4
0.6
1.0
P(CD and AC) 0.2
P(CD | AC)
0.2857
P(AC)
0.7
Biostatistics I: 2017-18
Chap 4-21
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Usi ng Deci si on Trees
.2
.7
Given AC or
no AC:
.5
.7
All
Cars
.2
.3
Biostatistics I: 2017-18
.1
.3
P(AC and CD) = 0.2
P(AC and CD’) = 0.5
P(AC’ and CD) = 0.2
P(AC’ and CD’) = 0.1
Chap 4-22
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Usi ng Deci si on Trees
.2
.4
Given CD or
no CD:
.2
.4
All
Cars
.5
.6
Biostatistics I: 2017-18
.1
.6
(continued)
P(CD and AC) = 0.2
P(CD and AC’) = 0.2
P(CD’ and AC) = 0.5
P(CD’ and AC’) = 0.1
Chap 4-23
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Stati sti cal Independence
Two events are independent if and
only if:
P(A | B) P(A)
Events A and B are independent when the
probability of one event is not affected by the
other event
Biostatistics I: 2017-18
Chap 4-24
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Mul ti pl i cati on Rul es
Multiplication rule for two events A
and B:
P(A and B) P(A | B) P(B)
Note: If A and B are independent, then P(A | B) P(A)
and the multiplication rule simplifies to
P(A and B) P(A) P(B)
Biostatistics I: 2017-18
Chap 4-25
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Bayes’ Theorem
Bayes’ theorem is used to revise previously calculated
probabilities after new information is obtained
P(A | B i )P(Bi )
P(B i | A)
P(A | B 1)P(B1) P(A | B 2 )P(B2 ) P(A | B k )P(Bk )
where:
Bi = ith event of k mutually exclusive and
collectively exhaustive events
A = new event that might impact P(Bi)
Biostatistics I: 2017-18
Chap 4-26
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Margi nal Probabi l i ty
Marginal probability for event A:
P(A) P(A | B1 ) P(B1 ) P(A | B 2 ) P(B 2 ) P(A | Bk ) P(Bk )
Where B1, B2, …, Bk are k mutually exclusive and
collectively exhaustive events
Biostatistics I: 2017-18
Chap 4-27
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Practi ce probl em
If HIV has a prevalence of 3% in
Jayapura, and a particular HIV test
has a false positive rate of .001 and a
false negative rate of .01, what is the
probability that a random person
selected off the street will test
positive?
9/7/2017
Biostatistics Chap
I: 2017-18
4-28
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Answer
Conditional probability: the
probability of testing + given that
a person is +
Joint probability of being + and
testing +
Marginal probability of carrying
the virus.
P(test +)=.99
P(+)=.03
P (+, test +)=.0297
P(test - )= .01
P(+, test -)=.003
P(test +) = .001
P(-, test +)=.00097
P(-)=.97
P(test -) = .999
P(-, test -) = .96903
______________
1.0
P(test +)=.0297+.00097=.03067
P(+&test+)P(+)*P(test+)
.0297 .03*.03067 (=.00092)
Biostatistics I: 2017-18
Dependent!
Marginal probability of testing
positive
9/7/2017
Chap 4-29
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Law of total probabi l i ty
P(test ) P(test /HIV )P(HIV) P(test /HIV)P(HIV-)
One of these has to be true (mutually
exclusive, collectively exhaustive).
They sum to 1.0.
P(test ) .99(.03) .001(.97)
9/7/2017
Biostatistics I: 2017-18
30
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Law of total probabi l i ty
Formal Rule: Marginal probability for event
A=
P(A) P(A | B1 ) P(B1 ) P(A | B 2 ) P(B 2 ) P(A | Bk ) P(Bk )
k
B
i
Where:
1.0 and P(Bi &B j ) 0 (mutually exclusive)
i 1
B3
B1
A
B2
P(A) (50%)(25%) (0)(50%) (50%)(25%) 25%
9/7/2017
Biostatistics I: 2017-18
31
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Exampl e 2
A 54-year old woman has an abnormal
mammogram; what is the chance that
she has breast cancer?
Mammogram sensitivity=0.90 and
specificity =0.89
9/7/2017
Biostatistics I: 2017-18
32
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Exampl e: Mammography
sensitivity
P(test +)=.90
P(BC+)=.003
P (+, test +)=.0027
P(test -) = .10
P(+, test -)=.0003
P(test +) = .11
P(-, test +)=.10967
P(BC-)=.997
P(test -) = .89
specificity
P(-, test -) = .88733
______________
1.0
Marginal probabilities of breast cancer….(prevalence
among all 54-year olds)
P(BC/test+)=.0027/(.0027+.10967)=2.4%
Biostatistics I: 2017-18
9/7/2017
Chap 4-33
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
BAYES’ RULE
Biostatistics I: 2017-18
34
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Bayes’ Rul e: deri vati on
Definition:
Let A and B be two events with P(B) 0.
The conditional probability of A given B
is:
P( A & B)
P( A / B)
P( B)
The idea: if we are given that the event B occurred, the relevant
sample space is reduced to B {P(B)=1 because we know B is
true} and conditional probability becomes a probability measure
on B.
9/7/2017
Biostatistics I: 2017-18
35
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Bayes’ Rul e: deri vati on
P( A & B)
P( A / B)
P( B)
can be re-arranged to:
P( A & B) P( A / B) P( B)
and, since also:
P ( B / A)
9/7/2017
P( A & B)
P ( A)
P ( A & B ) P ( B / A) P ( A)
P ( A / B ) P ( B ) P ( A & B ) P ( B / A) P ( A)
P ( A / B ) P ( B ) P ( B / A) P ( A)
P ( B / A) P ( A)
P( A / B)
P ( B ) Biostatistics I: 2017-18
36
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Bayes’ Rul e:
P( B / A) P( A)
P( A / B)
P( B)
OR
P( B / A) P( A)
P( A / B)
P( B / A) P( A) P( B / ~ A) P(~ A)
9/7/2017
From the “Law of
Total Probability”
Biostatistics Chap
I: 2017-18
4-37
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Bayes’ Rul e:
Why do we care??
Why is Bayes’ Rule useful??
It turns out that sometimes it is very
useful to be able to “flip” conditional
probabilities. That is, we may know
the probability of A given B, but the
probability of B given A may not be
obvious. An example will help…
Biostatistics I: 2017-18
38
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
In-Cl ass Exerci se
If HIV has a prevalence of 3% in Jayapura,
and a particular HIV test has a false positive
rate of .001 and a false negative rate of .01,
what is the probability that a random
person who tests positive is actually
infected (also known as “positive predictive
value”)?
Biostatistics I: 2017-18
39
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Answer: usi ng probabi l i ty tree
P(test +)=.99
P(+)=.03
P (+, test +)=.0297
P(test - = .01)
P(+, test -)=.003
P(test +) = .001
P(-, test +)=.00097
P(-)=.97
P(-, test -) = .96903
P(test -) = .999
______________
1.0
A positive test places one on either of the two “test +” branches.
But only the top branch also fulfills the event “true infection.”
Therefore, the probability of being infected is the probability of being on the top
branch given that you are on one of the two circled branches above.
P( / test )
.0297
P(test &true)
96.8%
.0297 .00097
P(test )
9/7/2017
Biostatistics Chap
I: 2017-18
4-40
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Answer: usi ng Bayes’ rul e
P (true / test )
P (test / true ) P (true )
P (test / true ) P (true ) P (test / true ) P (true )
.99(.03)
96.8%
.99(.03) .001(.97)
9/7/2017
Biostatistics I: 2017-18
41
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
In-cl ass exerci se
An insurance company believes that drivers can be
divided into two classes—those that are of high risk
and those that are of low risk. Their statistics show
that a high-risk driver will have an accident at some
time within a year with probability .4, but this
probability is only .1 for low risk drivers.
a) Assuming that 20% of the drivers are high-risk, what is the
probability that a new policy holder will have an accident
within a year of purchasing a policy?
b) If a new policy holder has an accident within a year of
purchasing a policy, what is the probability that he is a highrisk type driver?
Biostatistics I: 2017-18
42
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Answer to (a)
Assuming that 20% of the drivers are of
high-risk, what is the probability that a new
policy holder will have an accident within a
year of purchasing a policy?
Use law of total probability:
P(accident)=
P(accident/high risk)*P(high risk) +
P(accident/low risk)*P(low risk) =
.40(.20) + .10(.80) = .08 + .08 = .16
9/7/2017
Biostatistics Chap
I: 2017-18
4-43
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Answer to (b)
If a new policy holder has an accident within a year of
purchasing a policy, what is the probability that he is a high-risk
type driver?
P(high-risk/accident)=
P(accident/high risk)*P(high risk)/P(accident)
=.40(.20)/.16 = 50%
P(accident, high risk)=.08
P(accident/HR)=.4
Or use tree:
P(high risk)=.20
P( no acc/HR)=.6
P(no accident, high risk)=.12)
P(accident/LR)=.1
P(accident, low risk)=.08
P(low risk)=.80
P( no
accident/LR)=.9
P(no accident, low risk)=.72
______________
1.0
P(high risk/accident)=.08/.16=50%
9/7/2017
Biostatistics Chap
I: 2017-18
4-44
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Condi ti onal Probabi l i ty for
Epi demi ol ogy:
The odds ratio and risk ratio as
conditional probability
Biostatistics I: 2017-18
45
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
The Ri sk Rati o and the Odds Rati o as
condi ti onal probabi l i ty
In epidemiology, the association between a
risk factor or protective factor (exposure)
and a disease may be evaluated by the “risk
ratio” (RR) or the “odds ratio” (OR).
Both are measures of “relative risk”—the
general concept of comparing disease risks
in exposed vs. unexposed individuals.
9/7/2017
Biostatistics I: 2017-18
46
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Odds and Ri sk (probabi l i ty)
Definitions:
Risk = P(A) = cumulative probability (you specify the time
period!)
For example, what’s the probability that a person with a high
sugar intake develops diabetes in 1 year, 5 years, or over a
lifetime?
Odds = P(A)/P(~A)
For example, “the odds are 3 to 1 against a horse” means that
the horse has a 25% probability of winning.
Note: An odds is always higher than its corresponding
probability, unless the probability is 100%.
Biostatistics I: 2017-18
Chap 4-47
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Odds vs. Ri sk=probabi l i ty
If the risk is…
Then the odds are…
½ (50%)
1:1
¾ (75%)
3:1
1/ 10 (10%)
1:9
1/ 100 (1%)
1:99
Note: An odds is always higher than its corresponding probability,
unless the probability is 100%.
9/7/2017
Biostatistics Chap
I: 2017-18
4-48
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Introducti on to the 2x2 Tabl e
Exposure (E)
No Exposure (~E)
Disease (D)
a
b
a+b = P(D)
No Disease (~D)
c
d
c+d = P(~D)
a+c = P(E)
b+d = P(~E)
Marginal probability
of exposure
Marginal probability of
disease
9/7/2017
Chap
4-49
Biostatistics
I: 2017-18
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Cohort Studi es
Disease
Exposed
Target
population
Disease-free
cohort
Disease-free
Disease
Not
Exposed
Disease-free
TIME
9/7/2017
Biostatistics Chap
I: 2017-18
4-50
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
The Risk Ratio, or Relative Risk (RR)
Exposure (E)
No Exposure (~E)
Disease (D)
a
b
No Disease (~D)
c
d
a+c
b+d
RR
P(D / E )
P( D /~E )
risk to the exposed
a
/(
a
c
)
b /( bd )
risk to the unexposed
9/7/2017
Biostatistics Chap
I: 2017-18
4-51
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Hypotheti cal Data
Congestive
Heart Failure
No CHF
High Systolic BP
Normal BP
400
400
1100
2600
1500
3000
400
/
1500
2 .0
RR
400 / 3000
9/7/2017
Biostatistics Chap
I: 2017-18
4-52
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
The odds rati o…
This risk ratio seems like the perfect measure
of relative risk. Why not stop here? Why
introduce the more complicated odds ratio??
We cannot calculate a risk ratio from a casecontrol study. Case-control studies are a
popular study design in epidemiology,
because they are useful for studying rare
diseases.
In a case-control study, we sample
conditional on disease status, so we cannot
calculate risk of disease.
9/7/2017
Biostatistics Chap
I: 2017-18
4-53
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Case-Control Studi es
Exposed in
past
Disease
Target
population
(Cases)
Not exposed
Exposed
No Disease
(Controls)
9/7/2017
Not Exposed
Biostatistics I: 2017-18
54
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
The Odds Ratio (OR)
Cases: Liver
cancer
Controls
Hep C +
Hep C -
90
10
100
30
70
100
What are P(D/E) and P(D/~E) here?
We can’t tell, because, by design, we have fixed the
proportion of liver cancer cases in this sample at 50% simply
by selecting half controls and half cases.
All these data give us is: P(E/D) and P(E/~D).
9/7/2017
Biostatistics Chap
I: 2017-18
4-55
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
The Odds Ratio (OR)
Cases: Liver
cancer
Controls
Hep C +
Hep C -
90
10
100
30
70
100
by Bayes’ Rule…
Luckily, P(E/D) [=the quantity you have]
P( E / D)
P( D / E ) P( E )
P( D)
Unfortunately, our sampling scheme precludes calculation of the marginals: P(E) and P(D), but turns out we
don’t need these if we use an odds ratio because the marginals cancel out!
9/7/2017
Biostatistics Chap
I: 2017-18
4-56
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
P( E / D)
P(~ E / D)
P( E / ~ D)
P(~ E / ~ D)
Odds of exposure in the cases
Odds of exposure in the controls
Bayes’ Rule
P( D / E) P( E)
P( D)
P( D / ~ E) P(~ E)
P( D)
P(~ D / E) P( E)
P(~ D)
P(~ D / ~ E) P(~ E)
P(~ D)
P( D / E )
P(~ D / E )
P( D / ~ E )
P(~ D / ~ E)
=
Odds of disease in the exposed
What we want!
Odds of disease in the unexposed
9/7/2017
Biostatistics Chap
I: 2017-18
4-57
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
The Odds Ratio (OR)
Exposure (E)
Disease (D)
a
No Disease (~D)
c
OR
a
b
c
d
No Exposure
(~E)
b
d
Odds of exposure
for the cases.
ad
bc
Odds of exposure
for the controls
a
c
b
d
Odds of disease
for the exposed
Odds of disease for
the unexposed
Biostatistics Chap
I: 2017-18
4-58
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
The rare di sease assumpti on
OR
P(D / E )
P (~ D / E ) 1
P(D /~ E )
P (~ D / ~ E )
P(D / E )
P(D /~ E )
RR
1
When a disease is rare:
P(~D) = 1 - P(D) 1
9/7/2017
Biostatistics Chap
I: 2017-18
4-59
sawilopo@yahoo.com Universitas Gadjah Mada, Faculty of Medicine, Department of Biostatistics, Epidemiology and Population Health
Exampl e
Cases: Liver
cancer
Controls
Hep C +
Hep C -
90
10
30
70
OR = 90*70/10*30 =21.0
Note: This indicates that those with Hep C infection have a 21-fold
increase in their odds of developing liver cancer (not in their
risk!).
The odds ratio will always be bigger than the corresponding risk
ratio if RR >1 and smaller if RR