
Mathematical Biosciences 167 (2000) 31–50
www.elsevier.com/locate/mbs

Estimation of HIV infection and incubation via state space
models
Wai-Yuan Tan *, Zhengzheng Ye
Department of Mathematical Sciences, The University of Memphis, 335 Winfield Dunn, Memphis, TN 38152, USA
Received 1 February 1999; received in revised form 28 August 1999; accepted 3 September 1999

Abstract
By using the state space model (Kalman filter model) of the HIV epidemic, in this paper we have developed a general Bayesian procedure to estimate simultaneously the HIV infection distribution, the HIV
incubation distribution, and the numbers of susceptible people, infective people and AIDS cases. The basic
approach is to use the Gibbs sampling method combined with the weighted bootstrap method. We have
applied this method to the San Francisco AIDS incidence data from January 1981 to December 1992. The
results show clearly that both the probability density function of the HIV infection and the probability
density function of the HIV incubation are curves with two peaks. The results for the HIV infection distribution are clearly consistent with the finding by Tan et al. [W.Y. Tan, S.C. Tang, S.R. Lee, Estimation of
HIV seroconversion and effects of age in San Francisco homosexual populations, J. Appl. Stat. 25 (1998)
85]. The results for the HIV incubation distribution seem to confirm the staged model used by Satten
and Longini [G. Satten, I. Longini, Markov chain with measurement error: estimating the `true' course
of marker of the progression of human immunodeficiency virus disease, Appl. Stat. 45 (1996) 275].
© 2000 Elsevier Science Inc. All rights reserved.

Keywords: Backcalculation method; Chain binomial distribution; Gibbs sampler; HIV infection distribution; HIV
incubation distribution; Observation model; Prior distribution; Stochastic system model

1. Introduction
To estimate the numbers of susceptible people (S people), HIV-infected people (I people) and
AIDS cases, Tan and Xiang [3,4] have proposed some state space models in homosexual populations. In these models, the stochastic system models are the chain multinomial and binomial

*

Corresponding author. Tel.: +1-901 678 2492; fax: +1-901 678 2480.
E-mail address: [email protected] (W.-Y. Tan).

0025-5564/00/$ - see front matter © 2000 Elsevier Science Inc. All rights reserved.
PII: S0025-5564(00)00023-7

W.-Y. Tan, Z. Ye / Mathematical Biosciences 167 (2000) 31–50

distributions expressed in terms of stochastic equations, whereas the observation model is a statistical model based on AIDS incidence data. A major problem in the classical Kalman filter
method is that one needs to assume that the parameters are known in order to derive optimal estimates
or predictions of the state variables. Hence, in the HIV epidemic, to derive estimates of the
numbers of S people, I people and AIDS cases, one needs to assume that the probabilities of infection
of S people by HIV (denoted by $p_S(t)$) and the transition rates to AIDS of I people (denoted
by $\gamma(t)$) are known or can be estimated from other sources. Thus, in Tan and Xiang [3,4],
the $p_S(t)$ were estimated from studies by Tan et al. [1] based on the San Francisco City Clinic
Cohort (SFCCC) data set, whereas the estimates of $\gamma(t)$ were derived from studies by Satten and
Longini [2] based on the San Francisco Men's Health Study (SFMHS) data set. In this paper, we
will use the state space model to develop a general Bayesian procedure to estimate simultaneously
the HIV infection distribution, the HIV incubation distribution and the numbers of S people, I
people and AIDS cases over the time span. The advantages of the new method over the approach
in Tan and Xiang [3,4] are: (1) One does not need to assume $p_S(t)$ and $\gamma(t)$ as known, although if
some information on these parameters is available from previous studies, the information can
always be incorporated into the analysis through the Bayesian component of the method. Thus,
the method is applicable even when there is no prior information about $p_S(t)$ and $\gamma(t)$ and
no data sets from previous studies to estimate these parameters [5]. (2) The method permits us to
incorporate or combine information from three sources: (a) Information from the stochastic
system model. (b) Information from the data set through the observation model. (c) Information
on $p_S(t)$ and $\gamma(t)$ from previous studies through the prior distribution of these parameters. Notice
that if there is no prior information on the parameters due to lack of previous studies, to implement the method one may always assume a non-informative or uniform prior, which reflects the
situation that our prior information about the parameters is lacking or vague and imprecise.
In Sections 2 and 3, we will introduce the chain binomial model and illustrate how to develop a
state space model for a large population at risk for the HIV epidemic. By using this state space
model, in Sections 4 and 5 we will propose a general procedure for estimating simultaneously the
HIV infection distribution, the HIV incubation distribution and the numbers of S people, I people
and AIDS cases. In Section 6, we will apply the method to the San Francisco homosexual population to estimate these distributions as well as the numbers of S people, I people and AIDS
cases. Finally in Section 7, we will draw some conclusions and discuss some issues relevant to the
model and the method.

2. The chain binomial model of the HIV epidemic
To derive a stochastic model for the HIV epidemic, consider a large population at risk for
AIDS. Then, there are three types of people in the population: S people (susceptible people), I
people (infective people) and A people (AIDS patients). S people are healthy people but can
contract HIV to become I people through sexual contact and/or IV drug contact with I people or
AIDS people or through contact with HIV-contaminated blood. I people are people who have
contracted HIV and can pass the HIV to S people through sexual contact or IV drug contact with
S people. According to the 1993 AIDS case definition (see [6]) by the Centers for Disease Control
(CDC) at Atlanta, GA, an I person will be classified as a clinical AIDS patient (A person) when
this person develops AIDS symptoms and/or when his/her CD4 T-cell count falls below
$200/\mathrm{mm}^3$. In this section, we will illustrate how to develop a discrete time stochastic model for the
HIV epidemic with variable infection duration in this population. (With no loss of generality we will let a
month be the time unit unless otherwise stated.)
To begin with, let $S(t)$ denote the number of S people at time $t$, $Z(t)$ the number of new AIDS
cases during the month $[t-1, t)$ and $I(u,t)$ the number of I people who have contracted HIV at
time $t-u$ $(t \ge u)$. (We refer to $u$ as the infection duration of I people and denote by $I(u)$ infective
people with infection duration in $[u, u+1)$.) Suppose that at time $t_0 = 0$, a few HIV were introduced into the population to start the HIV epidemic so that with probability one, $I(u,t) = 0$ if
$u > t \ge 0$. When time is discrete, we are then entertaining a multi-dimensional discrete time
stochastic process $X(t) = \{S(t), I(u,t), u = 0, 1, \ldots, t\}$ and $Z(t)$. For this stochastic process, let
$p_S(t)$ be the probability that an S person will contract HIV to become an $I(0)$ person during
$[t, t+1)$ and $\gamma(u,t)$ the probability that an $I(u)$ person will develop AIDS symptoms to become a
clinical AIDS patient during $[t, t+1)$. Further, we make the following assumptions:
1. As shown in [7], the $p_S(t)$ are functions of the dynamics of the HIV epidemic and the state variables and hence are basically stochastic probabilities. However, through Monte Carlo studies,
Tan and Byers [7] and Tan et al. [8] have shown that one may practically ignore randomness in
$p_S(t)$. That is, one may derive $p_S(t)$ by replacing the random state variables by the corresponding expected numbers; see Remark 1. Thus in this paper, we will assume that $p_S(t)$ and $\gamma(u,t)$
are deterministic functions of time $t$. As in the literature, we further assume that $\gamma(u,t) = \gamma(u)$;
see [9,10].
2. Due to AIDS awareness, one may assume that there is no immigration or recruitment of
AIDS cases.
3. As the total population size changes very little over time aside from death from AIDS, for S
people and I people one may assume that the numbers entering through immigration and recruitment are almost equal to the numbers leaving through death and migration [11,12]. This is equivalent to assuming that the immigration and recruitment rate $\lambda$ equals the death and migration rate $\mu$ of people in the
population. Notice that for the San Francisco homosexual population, Hethcote and Van
Ark [11] have shown that the number of immigrants per year is almost identical to the number of out-migrants per year (about 5% annually), see also [13]; further, based on census data [14], they have
estimated the death rate for people between age 24 and 54 as 0.000532 per month. Thus, for this
population one may expect that this assumption would not significantly affect the estimates of
the HIV infection distribution and the HIV incubation distribution; see Remark 2.
4. As in the literature, we assume that there are no reverse transitions from I to S and from A to I;
see [10,11].
Remark 1. There are two ways to derive $p_S(t)$. By assuming a preferred mixing pattern [15], Tan
and Byers [7] have derived $p_S(t)$ explicitly as functions of the state variables (i.e., $S(t)$, $I(u,t)$).
Through Monte Carlo studies, they have shown that one may practically assume $p_S(t)$ to be deterministic functions of time $t$ by replacing the state variables by their respective expected numbers. Alternatively, by treating $p_S(t)$ as deterministic unknown parameters, statisticians have
attempted to estimate $p_S(t)$ by noting that $p_S(t)$ is the incidence function of the HIV infection. The
latter approach is the main issue in the so-called backcalculation method. (For details, see
Chapter 8 of the book by Brookmeyer and Gail [10].) Notice that the first approach uses the
dynamics of the HIV epidemic to construct $p_S(t)$ while the latter approach tries to estimate $p_S(t)$
without modelling the dynamics of the HIV epidemic. It is to be understood that regardless of the
approach, the $p_S(t)$ are functions of the dynamics, such as the mixing pattern of the epidemic,
except that in the latter approach the functional form is not explicitly given in terms of the dynamics. By assuming a preferred mixing pattern, one may assume $p_S(t)$ to be a mixture of two
functions, one relating to the restricted mixing pattern and the other to the proportional mixing
pattern. Then with the estimates of $p_S(t)$ available, one may derive estimates of the proportion of
the restricted mixing pattern as well as other parameters such as the per contact probability of
transmission. This is the approach used by Tan and Xiang [3] to estimate the per contact probability of HIV transmission from the I people to the S people.
Remark 2. Our Monte Carlo studies have indicated that in general this assumption has little
impact on the estimates of the HIV infection distribution and the HIV incubation distribution.
For the San Francisco homosexual population, the estimates of the HIV infection and the HIV
incubation distribution in Section 6 are almost identical to those given in Tan and Xiang [3,4] by
using other data sets. It is to be noted, however, that this assumption does have some impact on
the estimates of the numbers of S people and I people. In applying our theories to the San
Francisco data we have thus made some adjustment by incorporating a 1% annual increase (i.e.,
$\lambda - \mu = 0.01$ per year) in estimating the numbers of S people and I people. Notice that with 50,000 people
in January 1970 and with a 1% annual increase, the estimate of the population size of the San
Francisco homosexual population in 1985 is 58,048, which is very close to the survey result of
58,500 in 1985 by Lemp et al. [16].
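The population figure quoted in Remark 2 can be checked with a short calculation. The sketch below assumes that the 1% increase is compounded once per year over the 15 years from January 1970 to 1985; the function name `projected_population` is illustrative and does not come from the paper.

```python
def projected_population(initial: float, rate: float, periods: int) -> float:
    """Compound growth: initial * (1 + rate) ** periods."""
    return initial * (1.0 + rate) ** periods


# 50,000 people in January 1970, 1% growth per year, 15 years (1970 to 1985).
size_1985 = projected_population(50_000, 0.01, 15)
print(round(size_1985))  # 58048
```

Under this reading the figure of 58,048 is reproduced exactly; compounding 1% every month over the same period would give a much larger number.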
Given the above assumptions, one may readily derive some basic results for the process
$X(t) = \{S(t), I(u,t), u = 0, 1, \ldots, t\}$ and $Z(t)$. In Section 2.1, we will derive some stochastic equations
for $S(t)$ and $I(u,t)$, $u = 0, 1, \ldots, t$. In Section 2.2, we will derive the probability distributions
of $X(t)$.
2.1. Stochastic equations for $S(t)$, $I(u,t)$, $u = 0, 1, \ldots, t$ and $Z(t)$
Let $F_S(t)$ denote the number of S people who have contracted HIV to become $I(0)$ people
during $[t, t+1)$ and $F_I(u,t)$ the number of $I(u)$ people who have developed AIDS symptoms to
become clinical AIDS patients during $[t, t+1)$. Then the conditional distribution of $F_S(t)$ given
$S(t)$ is binomial with parameters $\{S(t), p_S(t)\}$ (i.e., $F_S(t) \mid S(t) \sim B\{S(t), p_S(t)\}$). Similarly,
$F_I(u,t) \mid I(u,t) \sim B\{I(u,t), \gamma(u)\}$. Further, under assumptions (1)–(4) given above, we have:

$$S(t+1) = S(t) - F_S(t), \tag{1}$$

$$I(0, t+1) = F_S(t), \tag{2}$$

$$I(u+1, t+1) = I(u,t) - F_I(u,t), \quad u = 0, \ldots, t, \tag{3}$$

$$Z(t+1) = \sum_{u=0}^{t} F_I(u,t). \tag{4}$$

Let
$$\epsilon(t+1) = [\epsilon_S(t+1), \epsilon_I(0, t+1), \epsilon_I(u+1, t+1), u = 0, 1, \ldots, t, \epsilon_Z(t+1)]^T$$
denote the vector of random noises for the deviation from the respective conditional mean
numbers. From the above distribution results, one may readily derive the conditional means of the
random variables in the above equations. Then, by subtracting these conditional means from the
respective random variables in the above equations and noting that the time unit of one month is
very small, we obtain

$$\epsilon_S(t+1) = -[F_S(t) - S(t)p_S(t)],$$

$$\epsilon_I(0, t+1) = F_S(t) - S(t)p_S(t),$$

$$\epsilon_I(u+1, t+1) = -[F_I(u,t) - I(u,t)\gamma(u)], \quad u = 0, 1, \ldots, t,$$

$$\epsilon_Z(t+1) = \sum_{u=0}^{t} [F_I(u,t) - I(u,t)\gamma(u)].$$

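Then, Eqs. (1)–(4) are equivalent to the following stochastic difference equations:

$$S(t+1) = S(t) - S(t)p_S(t) + \epsilon_S(t+1), \tag{5}$$

$$I(0, t+1) = S(t)p_S(t) + \epsilon_I(0, t+1), \tag{6}$$

$$I(u+1, t+1) = I(u,t) - I(u,t)\gamma(u) + \epsilon_I(u+1, t+1), \quad u = 0, 1, \ldots, t, \tag{7}$$

$$Z(t+1) = \sum_{u=0}^{t} I(u,t)\gamma(u) + \epsilon_Z(t+1). \tag{8}$$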

In Eqs. (5)–(8), given $X(t)$ the random noises $\epsilon(t+1)$ have conditional expectation 0. It follows that the unconditional expected value of these random noises is also 0. Using the basic formula
$\mathrm{Cov}(X, Y) = E\{\mathrm{Cov}(X, Y \mid Z)\} + \mathrm{Cov}\{E(X \mid Z), E(Y \mid Z)\}$, it is also obvious that elements of $\epsilon(t+1)$ are uncorrelated with elements of $X(t)$ as well as with elements of $\epsilon(s)$ for all $s \ne t+1$. Further, because the
random noises are basically linear combinations of binomial random variables, the variances and
covariances of elements of $\epsilon(t)$ are easily derived.
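The stochastic equations (1)–(4) translate directly into a simulation step. The following is a minimal sketch, not the authors' implementation: the helper names `binomial` and `chain_binomial_step` and all numerical values are hypothetical, and the binomial draw is done by summing Bernoulli trials purely to keep the example dependency-free.

```python
import random


def binomial(n: int, p: float, rng: random.Random) -> int:
    """Draw from Binomial(n, p) as a sum of n Bernoulli trials (simple, not fast)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def chain_binomial_step(S, I, p_s, gamma, rng):
    """Advance the state by one time unit following Eqs. (1)-(4).

    S     : number of susceptibles S(t)
    I     : list with I[u] = I(u, t) for u = 0..t
    p_s   : infection probability p_S(t)
    gamma : list with gamma[u] = probability of progressing to AIDS at duration u
    Returns (S(t+1), [I(0,t+1), ..., I(t+1,t+1)], Z(t+1)).
    """
    F_S = binomial(S, p_s, rng)                                  # new infections
    F_I = [binomial(I[u], gamma[u], rng) for u in range(len(I))]  # new AIDS cases
    S_next = S - F_S                                             # Eq. (1)
    I_next = [F_S] + [I[u] - F_I[u] for u in range(len(I))]      # Eqs. (2)-(3)
    Z_next = sum(F_I)                                            # Eq. (4)
    return S_next, I_next, Z_next


rng = random.Random(0)
S, I = 1000, [10]                      # hypothetical initial state
gamma = [0.0, 0.01, 0.02, 0.03, 0.04]  # hypothetical transition probabilities
for t in range(4):
    S, I, Z = chain_binomial_step(S, I, 0.005, gamma, rng)
    print(t + 1, S, I, Z)
```

Because assumptions (2)–(4) exclude migration and reverse transitions, each step conserves the total $S(t) + \sum_u I(u,t)$ minus the new AIDS cases, which gives a convenient sanity check on any implementation.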
2.2. The probability distributions of $X(t)$
Let $\mathbf{X} = \{X(1), \ldots, X(t_M)\}$, where $t_M$ is the last time point, and $\Theta = \{p_S(t), \gamma(t), t = 1, \ldots, t_M\}$.
Then $\mathbf{X}$ is the collection of all the state variables and $\Theta$ the collection of all the parameters. Using
results in Section 2.1, the conditional probability distribution $\Pr\{\mathbf{X} \mid X(0)\}$ of $\mathbf{X}$ given $X(0)$ is

$$\Pr\{\mathbf{X} \mid X(0)\} = \prod_{j=0}^{t_M - 1} \Pr\{X(j+1) \mid X(j), \Theta\}, \tag{9}$$

where

$$\Pr\{X(t+1) \mid X(t), \Theta\} = \binom{S(t)}{I(0,t+1)} [p_S(t)]^{I(0,t+1)} [1 - p_S(t)]^{S(t+1)} \prod_{u=0}^{t} \binom{I(u,t)}{I(u,t) - I(u+1,t+1)} [\gamma(u)]^{I(u,t) - I(u+1,t+1)} [1 - \gamma(u)]^{I(u+1,t+1)}.$$

Hence

$$\Pr\{\mathbf{X} \mid X(0)\} = C \prod_{t=0}^{t_M - 1} \left\{ [p_S(t)]^{I(0,t+1)} [1 - p_S(t)]^{S(t+1)} \right\} \prod_{u=0}^{t_M - 1} \left\{ [\gamma(u)]^{c_1(u)} [1 - \gamma(u)]^{c_2(u)} \right\}, \tag{10}$$

where

$$C = \prod_{t=0}^{t_M - 1} \left\{ \binom{S(t)}{I(0,t+1)} \prod_{u=0}^{t} \binom{I(u,t)}{I(u,t) - I(u+1,t+1)} \right\},$$

$$c_1(u) = \sum_{t=u}^{t_M - 1} \{I(u,t) - I(u+1,t+1)\},$$

$$c_2(u) = \sum_{t=u}^{t_M - 1} I(u+1,t+1).$$

Notice that Eq. (10) is a product of binomial distributions so that the above distribution is referred to as a chain binomial distribution.
2.3. The mean numbers of $X(t)$
Let $m_S(t) = E S(t)$, $m_I(u,t) = E I(u,t)$, $u = 0, 1, \ldots, t$ and $m_Z(t) = E Z(t)$ be the expected numbers
of $S(t)$, $I(u,t)$, $u = 0, 1, \ldots, t$ and $Z(t)$, respectively. Let $G_S(t) = \prod_{j=1}^{t} (1 - p_S(j))$ and $R_I(t) = \prod_{j=1}^{t} (1 - \gamma(j))$
so that $G_S(t)$ is the survival function of HIV infection and $R_I(t)$ the survival
function of HIV incubation. Then, from Eqs. (5)–(8), we have

$$m_S(t+1) = m_S(t)(1 - p_S(t)) = m_S(0) G_S(t),$$

$$m_I(0, t+1) = m_S(t) p_S(t) = m_S(0) G_S(t-1) p_S(t) = m_S(0) f_I(t),$$

$$m_I(u+1, t+1) = m_I(u,t)(1 - \gamma(u)) = m_I(0, t-u) R_I(u),$$

where $f_I(t) = G_S(t-1) p_S(t)$ is the probability density function of HIV infection.
It follows that

$$m_Z(t+1) = \sum_{u=0}^{t} m_I(u,t)\gamma(u) = \sum_{u=0}^{t} m_I(0, t-u) R_I(u-1)\gamma(u) = m_S(0) \sum_{u=0}^{t} f_I(t-u) g(u), \tag{11}$$

where $g(u) = R_I(u-1)\gamma(u)$ is the probability density function of HIV incubation. Notice that
Eq. (11) is a convolution of the HIV infection distribution and the HIV incubation distribution.
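The convolution in Eq. (11) is easy to evaluate numerically once $p_S(t)$ and $\gamma(u)$ are given. The sketch below is illustrative only: the function names are hypothetical, the hazards are stored in lists indexed from 1 (index 0 unused), and the convention $f_I(0) = g(0) = 0$ is an assumption made here to handle the boundary terms.

```python
def density_from_hazard(h):
    """Convert hazards h[1..T] into a density f[t] = h[t] * prod_{i=1}^{t-1} (1 - h[i]).

    Index 0 is unused and f[0] is set to 0 (an assumption of this sketch).
    """
    f = [0.0] * len(h)
    survival = 1.0
    for t in range(1, len(h)):
        f[t] = h[t] * survival
        survival *= 1.0 - h[t]
    return f


def expected_aids_incidence(m_s0, f_inf, g_inc):
    """Return a list whose entry t is m_Z(t+1) = m_S(0) * sum_{u=0}^{t} f_I(t-u) g(u)."""
    T = min(len(f_inf), len(g_inc))
    return [
        m_s0 * sum(f_inf[t - u] * g_inc[u] for u in range(t + 1))
        for t in range(T)
    ]


# Hypothetical hazards, chosen only so the arithmetic is easy to follow.
p_s = [0.0, 0.5, 0.5]
gamma = [0.0, 0.5, 0.5]
f_inf = density_from_hazard(p_s)    # [0.0, 0.5, 0.25]
g_inc = density_from_hazard(gamma)  # [0.0, 0.5, 0.25]
print(expected_aids_incidence(100.0, f_inf, g_inc))  # [0.0, 0.0, 25.0]
```

With these toy values the only non-zero contribution at $t = 2$ is $f_I(1) g(1) = 0.25$, so a cohort of 100 susceptibles yields 25 expected AIDS cases in that period.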

3. The state space model of the HIV epidemic
State space models (Kalman filter models) are stochastic models consisting of two sub-models:
one sub-model is referred to as the stochastic system model, which is the stochastic model of
the system; the other sub-model is referred to as the observation model, which is a statistical
model based on some data from the system. For the model of the HIV epidemic in Section 2, the
stochastic system model of the state space model is the set of stochastic difference equations given
in Eqs. (5)–(8) defined in Section 2; the observation model of the state space model is a statistical
model based on some available AIDS incidence data.
For the observation model, let $Y(j)$ be the observed number of new AIDS cases during the
month $[j, j+1)$. Then, the observation model is given by the equation

$$Y(j) = Z(j+1) + e(j) = E Z(j+1) + \epsilon(j) + e(j), \tag{12}$$

where $\epsilon(j) = Z(j+1) - E Z(j+1)$ and $e(j)$ is the measurement error (reporting error for reporting
AIDS incidence) for observing $Y(j)$. Because reporting delay has been corrected for CDC surveillance data, one may assume that the $e(j)$'s are independently normally distributed with mean
zero and conditional variance given $Z(j+1)$ as $\sigma_j^2 = Z(j+1)\sigma^2$; for justification for assuming
such a variance, see [4]. Thus, the conditional probability density of the observation
$\mathbf{Y} = \{Y(1), \ldots, Y(t_M)\}^T$ given $\{\mathbf{X}, \Theta\}$ is

$$\Pr\{\mathbf{Y} \mid \mathbf{X}, \Theta\} = \prod_{j=1}^{t_M} \Pr\{Y(j) \mid X(j), \Theta\}, \tag{13}$$

where

$$\Pr\{Y(j) \mid X(j), \Theta\} \propto (\sigma_j^2)^{-1/2} \exp\left\{ -\frac{1}{2\sigma_j^2} [Y(j) - Z(j+1)]^2 \right\}. \tag{14}$$

In Section 2.3, it has been shown that $E Z(t+1)$ is a convolution of the HIV infection and HIV
incubation distributions, which is the basic formula used by the backcalculation method (see [9,10]). It follows
that the backcalculation method as given in [9,10] is a special case of the above observation model.
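The observation density in Eq. (14) is what later supplies the resampling weights, so it is useful to see it as a function. The sketch below is a hypothetical helper, not code from the paper; it works on the log scale and includes the normalizing constant $1/\sqrt{2\pi}$, which Eq. (14) omits because it only states proportionality.

```python
import math


def log_obs_density(y: float, z: float, sigma2: float) -> float:
    """Log of the normal density of Y(j) given Z(j+1) = z, with variance z * sigma2."""
    var = z * sigma2
    if var <= 0:
        raise ValueError("the variance Z(j+1) * sigma^2 must be positive")
    return -0.5 * math.log(2.0 * math.pi * var) - (y - z) ** 2 / (2.0 * var)


# Hypothetical values: an observed count of 11 is more plausible under z = 10 than z = 20.
print(log_obs_density(11, 10, 1.0) > log_obs_density(11, 20, 1.0))  # True
```

Working with logarithms avoids numerical underflow when many monthly densities are multiplied together as in Eq. (13).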
To present the state space model in matrix form, let
$$X(t) = \{S(t), I(u,t), u = 0, 1, \ldots, t, Z(t)\}^T$$
and
$$\epsilon(t) = \{\epsilon_S(t), \epsilon_I(u,t), u = 0, 1, \ldots, t, \epsilon_Z(t)\}^T.$$
Then,
$$X(t+1) = F(t+1, t) X(t) + \epsilon(t+1),$$
where $F(t+1, t)$ is given by

$$
F(t+1, t) =
\begin{bmatrix}
1 - p_S(t) & 0 & \cdots & 0 & 0 \\
p_S(t) & 0 & \cdots & 0 & 0 \\
0 & 1 - \gamma(0) & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 - \gamma(t) & 0 \\
0 & \gamma(0) & \cdots & \gamma(t) & 0
\end{bmatrix}.
$$

4. A general procedure for simultaneously estimating the state variables and the unknown parameters
Consider a state space model with stochastic system model given by (15) and with observation
model given by (16).
(a) Stochastic system model:
$$X(j+1) = F(j+1, j) X(j) + \epsilon(j+1). \tag{15}$$
(b) Observation model:
$$Y(j+1) = H(j+1) X(j+1) + e(j+1), \tag{16}$$
where $F(j+1, j)$ and $H(j+1)$ are matrices of deterministic transition functions, $\epsilon(j+1)$ is the vector of
random noises and $e(j+1)$ is the random measurement error.
Let $\Theta$ denote the unknown parameters in $F(j+1, j)$, $H(j+1)$ and in the probability distributions of $\epsilon(j+1)$ and $e(j+1)$. Let $P(\mathbf{X} \mid \Theta)$ be the probability density function of $\mathbf{X}$ given $\Theta$
and $X(0)$ derived from the stochastic system model and $P(\mathbf{Y} \mid \mathbf{X}, \Theta)$ the probability density
function of $\mathbf{Y}$ given $\{\mathbf{X}, \Theta\}$ derived from the observation model. ($P(\mathbf{Y} \mid \mathbf{X}, \Theta)$ is usually referred
to as the likelihood function of the parameters.) Let $P(\Theta)$ be the prior distribution of $\Theta$ derived
from previous studies or from prior knowledge about $\Theta$. (If there is no prior information or the
prior information is vague and imprecise, one usually assumes a non-informative or uniform
prior; see [17].) Based on the type of probability distributions being used, the standard inference in
the literature may be classified as:
1. The Sampling Theory Inference: Given $\mathbf{X}$, inference about $\Theta$ is derived only from the likelihood
function $P(\mathbf{Y} \mid \mathbf{X}, \Theta)$. For example, the backcalculation method in the HIV epidemic derives
estimates of $\Theta = \{p_S(t), t = 1, \ldots, t_M\}$ by maximizing $P(\mathbf{Y} \mid \mathbf{X}, \Theta)$, see [9,10]; these are the maximum likelihood estimators (MLE) of $\Theta$.
2. The Bayesian Inference: Given $\mathbf{X}$, the Bayesian inference about $\Theta$ is derived from the posterior
distribution of $\Theta$, which is proportional to the product of $P(\Theta)$ and $P(\mathbf{Y} \mid \mathbf{X}, \Theta)$. For example,
one may use the posterior mean $E\{\Theta \mid \mathbf{X}, \mathbf{Y}\}$ of $\Theta = \{p_S(t), \gamma(t), t = 1, \ldots, t_M\}$ given $\{\mathbf{X}, \mathbf{Y}\}$ or
the posterior mode of $\Theta$ given $\{\mathbf{X}, \mathbf{Y}\}$ as an estimate of $\Theta$. These are the empirical Bayesian
estimates of $\Theta$; see [18,19].
3. The Classical Kalman Filter: The classical theories of the Kalman filter derive optimal estimators or
predictors of the state variables $\mathbf{X}$ by using $P(\mathbf{X} \mid \Theta) P(\mathbf{Y} \mid \mathbf{X}, \Theta)$ with $\Theta$ being assumed as
given or known. These are the procedures given in almost all of the texts on Kalman filters
published to date; see, for example, [20–22]. For example, by using estimates of $p_S(t)$ and $\gamma(t)$
from other sources, Tan and Xiang [3,4] have estimated the numbers of S people, I people and
AIDS cases in the homosexual population.
In the above, notice that in the sampling theory inference, the prior information about $\Theta$ and
the information about $\mathbf{X}$ from the stochastic system model are completely ignored; in the
Bayesian inference, the information from the stochastic system model has been ignored. In the
classical Kalman filter theories, the parameters $\Theta$ are assumed known. Thus, in each of these
cases, some information has been lost or ignored. In this section, we proceed to develop a general
procedure to estimate simultaneously the unknown parameters and the state variables by using
the multi-level Gibbs sampler method [23–25]. We will call this method a general Bayesian method
because it not only combines information from the likelihood and the prior distribution but also
incorporates information from the stochastic system model. To proceed, note first that the joint
probability density function of $(\Theta, \mathbf{X}, \mathbf{Y})$ is
$$P(\Theta, \mathbf{X}, \mathbf{Y}) = P(\Theta) P(\mathbf{X} \mid \Theta) P(\mathbf{Y} \mid \mathbf{X}, \Theta). \tag{17}$$
Thus the conditional distribution of $\mathbf{X}$ given $(\mathbf{Y}, \Theta)$ is
$$P(\mathbf{X} \mid \mathbf{Y}, \Theta) \propto P(\mathbf{X} \mid \Theta) P(\mathbf{Y} \mid \mathbf{X}, \Theta), \tag{18}$$
and the conditional distribution of $\Theta$ given $(\mathbf{Y}, \mathbf{X})$ is
$$P(\Theta \mid \mathbf{Y}, \mathbf{X}) \propto P(\Theta) P(\mathbf{X} \mid \Theta) P(\mathbf{Y} \mid \mathbf{X}, \Theta). \tag{19}$$
The multi-level Gibbs sampler method is a Monte Carlo method to estimate $P(\mathbf{X} \mid \mathbf{Y})$ (the
conditional distribution of $\mathbf{X}$ given $\mathbf{Y}$) and $P(\Theta \mid \mathbf{Y})$ (the posterior distribution of $\Theta$ given $\mathbf{Y}$)
through a sequential procedure. The algorithm of this method iterates through the following loop:
(1) Given $\Theta$ and $\mathbf{Y}$, generate $\mathbf{X}$ from $P(\mathbf{X} \mid \mathbf{Y}, \Theta)$.
(2) Generate $\Theta$ from $P(\Theta \mid \mathbf{Y}, \mathbf{X})$, where $\mathbf{X}$ is the value obtained in (1).
(3) Using the $\Theta$ obtained from (2) as initial values, go back to (1) and repeat the (1)–(2) loop until
convergence.
Since in practice it is often very difficult to derive $P(\mathbf{X} \mid \mathbf{Y}, \Theta)$ whereas it is easy to generate $\mathbf{X}$
from $P(\mathbf{X} \mid \Theta)$, we will apply the weighted bootstrap method due to Smith and Gelfand [26] to
generate $\mathbf{X}$ from $P(\mathbf{X} \mid \mathbf{Y}, \Theta)$. The algorithm of the weighted bootstrap method is given by the
following steps (for proof, see [26]):
(a) Given $\Theta$ and $X(j)$, generate a large random sample of size $N$ for $X(j+1)$ by using
$P\{X(j+1) \mid X(j)\}$; denote it by $\{X^{(1)}(j+1), \ldots, X^{(N)}(j+1)\}$.
(b) Compute $w_k$ and $q_k$ from $P\{Y(j+1) \mid X(s), s = 0, 1, \ldots, j+1, \Theta\}$, $k = 1, \ldots, N$, where
$w_k = P\{Y(j+1) \mid X^{(k)}(s), s = 0, 1, \ldots, j+1, \Theta\}$ and $q_k = w_k / \sum_{i=1}^{N} w_i$.
(c) Construct a population $\mathcal{P}$ with elements $\{E_1, \ldots, E_N\}$ and with $P(E_k) = q_k$. (Note that
$\sum_{i=1}^{N} q_i = 1$.) Draw an element randomly from $\mathcal{P}$. If the outcome is $E_k$, then $X^{(k)}(j+1)$ is an
element generated from the conditional distribution of $X$ given the observed data and given
the parameter values.
Starting with $j = 0$ and continuing until $j = t_M$, by combining the above two iterative procedures, one can readily generate a random sample for $\mathbf{X}$ from $P(\mathbf{X} \mid \mathbf{Y})$ and a random sample for $\Theta$
from $P(\Theta \mid \mathbf{Y})$. From these generated samples, one may use the sample means as estimates of $\mathbf{X}$
and $\Theta$.
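Steps (b) and (c) of the weighted bootstrap amount to resampling the simulated candidates with probabilities proportional to their likelihood weights. The sketch below illustrates only that resampling step, under stated assumptions: the function name `weighted_bootstrap`, the candidate values and the variance parameter are all hypothetical, and weights are passed on the log scale as a numerical convenience rather than as part of the method in [26].

```python
import math
import random


def weighted_bootstrap(candidates, log_weights, rng):
    """Return one candidate drawn with probability q_k = w_k / sum_i w_i."""
    top = max(log_weights)
    # Rescale before exponentiating so very negative log-weights do not all underflow.
    weights = [math.exp(lw - top) for lw in log_weights]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical example: N = 5 simulated values of Z(j+1) and one observed count y.
rng = random.Random(0)
simulated_z = [8, 10, 12, 15, 20]
y, sigma2 = 11, 1.0
# Log of the kernel in Eq. (14) with variance Z(j+1) * sigma^2.
log_w = [
    -0.5 * math.log(z * sigma2) - (y - z) ** 2 / (2 * z * sigma2)
    for z in simulated_z
]
print(weighted_bootstrap(simulated_z, log_w, rng))
```

Candidates whose simulated incidence lies close to the observed count receive the largest $q_k$ and are therefore selected most often, which is how the observation model informs the simulated state.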

5. A general procedure for simultaneously estimating the HIV infection, the HIV incubation and the
numbers of S people, I people and AIDS cases
In this section, we apply the general theory of Section 4 to develop a general procedure to estimate simultaneously the HIV infection distribution, the HIV incubation distribution as well as
the numbers of S people, I people and AIDS cases in the model given in Section 3. For this model,
$P(\mathbf{X} \mid X(0), \Theta)$ is given by Eq. (9) and $P(\mathbf{Y} \mid \mathbf{X}, \Theta)$ is given by Eq. (14). We will estimate the prior
distribution $P(\Theta)$ by using results from previous studies. (This is the empirical Bayesian approach.)
5.1. The prior distribution $P(\Theta)$
Since the HIV incubation is usually not affected by HIV infection [9,10], we assume that a priori
$\theta_1 = \{p_S(t), t = 1, \ldots, t_M\}$ is distributed independently of $\theta_2 = \{\gamma(t), t = 1, \ldots, t_M\}$. Thus,
$P(\Theta) = P(\theta_1) P(\theta_2)$. Since $P(\mathbf{X} \mid \Theta)$ is a product of binomial distributions, a natural conjugate prior for $\theta_1$
is (see [27])

$$P(\theta_1) \propto \prod_{t} \left\{ [p_S(t)]^{\alpha_1(t) - 1} [1 - p_S(t)]^{\alpha_2(t) - 1} \right\}, \tag{20}$$

where $\alpha_i(t) > 0$, $i = 1, 2$.
Similarly, a natural conjugate prior for $\theta_2$ is

$$P(\theta_2) \propto \prod_{t} \left\{ [\gamma(t)]^{\beta_1(t) - 1} [1 - \gamma(t)]^{\beta_2(t) - 1} \right\}, \tag{21}$$

where $\beta_i(t) > 0$, $i = 1, 2$.
The unknown parameters $\{\alpha_i(t), \beta_i(t), i = 1, 2; t = 1, \ldots, t_M\}$ in
the above prior distributions can be estimated from previous studies. For example, suppose that a prior
study with sample size $n$ has been conducted to estimate the HIV infection distribution $f_I(t)$. Let
$n(j)$ be the number of HIV infected individuals with HIV infection in $[j-1, j)$. Then, since
$f_I(t) = p_S(t) \prod_{i=1}^{t-1} (1 - p_S(i))$, we have

$$P\{p_S(j), j = 1, 2, \ldots\} \propto \prod_{j \ge 1} \left\{ [f_I(j)]^{n(j)} \right\}
= \prod_{j \ge 1} \left\{ [p_S(j)]^{n(j)} \left[ \prod_{u=1}^{j-1} (1 - p_S(u)) \right]^{n(j)} \right\}
= \prod_{j \ge 1} \left\{ [p_S(j)]^{n(j)} [1 - p_S(j)]^{N(j) - n(j)} \right\},$$

where $N(j) = \sum_{l \ge j} n(l)$.
From the above formulation, we may then identify $\alpha_1(j)$ with $n(j) + 1$ and identify $\alpha_2(j)$ with
$N(j) - n(j) + 1$. Let $\hat{f}_I(j)$ denote the estimate of $f_I(j)$. From the above equation, the prior parameters
$\alpha_1(j) - 1$ and $\alpha_2(j) - 1$ are then estimated, respectively, by $\hat{\alpha}_1(j) - 1 = n \hat{f}_I(j) = \hat{n}(j)$ and
$\hat{\alpha}_2(j) - 1 = \hat{N}(j) - \hat{n}(j)$, where $\hat{N}(j) = \sum_{l \ge j} \hat{n}(l)$ and $n$ is the sample size in the study. Similar
procedures can be used to estimate $\beta_i(t)$, $i = 1, 2$, $t = 1, \ldots$ from some previous studies. Notice
that if one assumes $\alpha_i(t) = \beta_i(t) = 1$, $i = 1, 2$, for all $t$, the prior distribution is equivalent to a
non-informative uniform prior. This is equivalent to no prior information; in this case the results
of the Bayesian approach are numerically equivalent to the results from the sampling theory approach, although the two approaches are very different conceptually.
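The mapping from an estimated density to the prior parameters can be written in a few lines. This is a minimal sketch under assumptions: the function name and the toy density are hypothetical, list index 0 stands for the first period, and the tail sum $N(j)$ is taken over the periods supplied.

```python
def beta_prior_parameters(density, n):
    """Turn an estimated density into beta prior parameters.

    density[j] is the estimated probability mass for period j + 1 and n is the
    sample size of the earlier study. Returns (alpha1, alpha2) with
    alpha1[j] = n_j + 1 and alpha2[j] = N_j - n_j + 1, where n_j = n * density[j]
    and N_j is the sum of n_l over l >= j.
    """
    counts = [n * d for d in density]
    alpha1, alpha2 = [], []
    tail = sum(counts)  # N_j for the current j
    for c in counts:
        alpha1.append(c + 1.0)
        alpha2.append(tail - c + 1.0)
        tail -= c
    return alpha1, alpha2


# Toy density over three periods with a study of size 8 (values are illustrative).
print(beta_prior_parameters([0.5, 0.25, 0.25], 8))
# ([5.0, 3.0, 3.0], [5.0, 3.0, 1.0])
```

Setting every entry of `density` to zero returns parameters equal to 1 throughout, which corresponds to the uniform prior mentioned above.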
5.2. Generating $\mathbf{X}$ from the conditional density $P(\mathbf{X} \mid \Theta, \mathbf{Y})$
To use the weighted bootstrap method as described in Section 4, we will need to generate $\mathbf{X}$
from $P(\mathbf{X} \mid \Theta)$. This can be achieved by using the stochastic Eqs. (1)–(4) given in Section 2. Thus,
given $X(t) = \{S(t), I(u,t), u = 0, 1, \ldots, t\}$ and given the parameter values, we use the binomial
generator to generate $F_S(t)$ and $F_I(u,t)$ through the conditional binomial distributions
$F_S(t) \mid S(t) \sim B\{S(t), p_S(t)\}$ and $F_I(u,t) \mid I(u,t) \sim B\{I(u,t), \gamma(u)\}$. These lead to
$S(t+1) = S(t) - F_S(t)$, $I(0,t+1) = F_S(t)$, $I(u+1,t+1) = I(u,t) - F_I(u,t)$, $u = 0, 1, \ldots, t$ and
$Z(t+1) = \sum_{u=0}^{t} F_I(u,t)$. The binomial generator is readily available from the IMSL subroutines [28] or other
software packages such as SAS. With the generation of $\mathbf{X}$ from $P(\mathbf{X} \mid \Theta)$, one may then apply the
weighted bootstrap method to generate $\mathbf{X}$ from $P(\mathbf{X} \mid \mathbf{Y}, \Theta)$.
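5.3. Generating $\Theta$ from the conditional density $P(\Theta \mid \mathbf{X}, \mathbf{Y})$
Using Eqs. (1)–(4) given in Section 2, and the prior distribution from Section 5.1, we obtain

$$P(\Theta \mid \mathbf{X}, \mathbf{Y}) \propto \prod_{t=0}^{t_M - 1} \left\{ [p_S(t)]^{I(0,t+1) + \alpha_1(t) - 1} [1 - p_S(t)]^{S(t+1) + \alpha_2(t) - 1} \right\} \prod_{u=0}^{t_M - 1} \left\{ [\gamma(u)]^{c_1(u) + \beta_1(u) - 1} [1 - \gamma(u)]^{c_2(u) + \beta_2(u) - 1} \right\}.$$

The above equation shows that the conditional distribution of $p_S(t)$ given $\mathbf{X}$ and given $\mathbf{Y}$ is a $\beta$-distribution
with parameters $\{I(0,t+1) + \alpha_1(t), S(t+1) + \alpha_2(t)\}$. Similarly, the conditional distribution of $\gamma(u)$ given $\mathbf{X}$ and given $\mathbf{Y}$ is a $\beta$-distribution with parameters $\{c_1(u) + \beta_1(u),
c_2(u) + \beta_2(u)\}$. Since the sample mean of a large sample generated from a $\beta$-distribution is
numerically equivalent to the mean of that $\beta$-distribution, the estimates of $p_S(t)$
and $\gamma(u)$ are then given by

$$\hat{p}_S(t) = \frac{I(0,t+1) + \alpha_1(t)}{I(0,t+1) + S(t+1) + \alpha_1(t) + \alpha_2(t)},$$

$$\hat{\gamma}(u) = \frac{c_1(u) + \beta_1(u)}{\sum_{i=1}^{2} [c_i(u) + \beta_i(u)]}.$$

We will use these estimates as the generated sample means.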
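Both estimates above are means of beta distributions whose parameters add the observed counts to the prior parameters. A minimal sketch, with a hypothetical function name and illustrative numbers:

```python
def beta_posterior_mean(successes, failures, a1, a2):
    """Mean of a Beta(successes + a1, failures + a2) distribution."""
    return (successes + a1) / (successes + failures + a1 + a2)


# For p_S(t): successes = I(0, t+1), failures = S(t+1), prior parameters alpha_1(t), alpha_2(t).
# Illustrative values: 3 new infections, 95 remaining susceptibles, uniform prior (1, 1).
print(beta_posterior_mean(3, 95, 1.0, 1.0))  # 0.04
```

The same function gives $\hat{\gamma}(u)$ when called with `successes = c_1(u)`, `failures = c_2(u)` and the prior parameters $\beta_1(u)$, $\beta_2(u)$.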
Using the above approach, we can readily estimate simultaneously the numbers of S people, I
people and AIDS cases as well as the parameters $\{p_S(t), \gamma(t)\}$. With the estimation of $\{p_S(t), \gamma(t)\}$,
one may readily estimate the HIV infection distribution $f_I(t)$ and the HIV incubation distribution
$g(t)$ through the formulas $f_I(t) = p_S(t) \prod_{i=1}^{t-1} (1 - p_S(i))$ and $g(t) = \gamma(t) \prod_{i=1}^{t-1} (1 - \gamma(i))$. In Section 6,
we will apply the above method to the homosexual population in the city of San
Francisco, CA, USA, to estimate simultaneously the HIV infection distribution, the HIV incubation distribution and the numbers of S people, I people and AIDS cases.

6. Simultaneous estimation of the HIV infection, the HIV incubation and the numbers of S people, I
people and AIDS cases in the San Francisco homosexual population
As an application of the method given in the previous section, in this section we proceed to
estimate simultaneously the HIV infection distribution, the HIV incubation distribution and the
numbers of S people, I people and AIDS cases in the San Francisco homosexual population. For
this population, the monthly AIDS incidence and the monthly deaths from AIDS
are available from January 1981 through December 1994 from the gopher server of the CDC at
Atlanta, GA. This data set is given in Table 1 and is used to construct the observation model of
the state space model. (To avoid the problem of reporting delay and the confusion caused by the
change to the new AIDS case definition effective in January 1993, we have used the data only up to
December 1992.)
6.1. The initial size
Since the average AIDS incubation period is around 10 years and since the first AIDS case was
reported in 1981, as in [3], we assume 1 January 1970 as $t_0 = 0$. It is also assumed that at time 0
there are no AIDS cases and no HIV infected people with infection duration $u > 0$ but, to start the
HIV epidemic, some HIV were introduced into the population at time 0.
For the initial population size at time 0 in the city of San Francisco, we follow [3] to assume that
$S(0) = 40\,000$ and $I(0,0) = 36$. Following [3], we also assume that there were 10 000 more S
people who would not contribute to AIDS so that there were 50 000 S people at time 0; for more
details, see [3].
6.2. The prior distributions of $p_S(t)$ and $\gamma(u)$
For the San Francisco homosexual population, Tan and Xiang [3,4,18] have estimated both the
HIV infection density $f_I(t)$ and the HIV incubation density $g(t)$ by using the SFCCC data
Table 1
San Francisco AIDS case report for 1981–1994 by month of primary diagnosis (a)

Year   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1981     1    3    2    1    1    3    3    3    5    3    3    8
1982     6    5    0    6    6   15   12   10   10   14   20   12
1983    24   19   31   25   19   21   27   35   28   31   26   31
1984    46   32   39   43   39   48   67   59   70   54   60   56
1985    77   62   72   77   73   80   94   89   76   89   70   89
1986   104   93  114  100  103  109  121  138  112  149   99  142
1987   136  137  142  130  149  149  150  148  161  140  123  131
1988   156  144  184  141  130  155  139  136  156  117  132  155
1989   159  141  179  197  163  201  169  160  138  155  138  142
1990   200  174  191  156  180  173  176  195  156  175  182  150
1991   220  195  195  190  208  193  229  243  220  303  227  241
1992   286  303  229  215  218  241  267  244  249  240  196  226

(a) Data from CDC gopher server.


available from CDC, and the SFMHS data, respectively. Let $\hat{f}_I(j)$ and $\hat{g}(j)$ denote the estimates of
$f_I(j)$ and $g(j)$, respectively. Since the sample size of the SFCCC data is $n = 1095$, we have
$n(j) = 1095 \hat{f}_I(j)$ and $N(j) = \sum_{l=j}^{t_M} n(l)$. (For practical purposes, one may take $t_M$ as $t_M = 360$
months.) Thus, $\alpha_1(j) = n(j) + 1$ and $\alpha_2(j) = N(j) - n(j) + 1$. These estimates are given in Table 2.
Similarly, since the minimum size of the SFMHS data is 711, we have $m(j) = 711 \hat{g}(j)$ and
$M(j) = \sum_{l=j}^{t_M} m(l)$ so that $\beta_1(j) = m(j) + 1$ and $\beta_2(j) = M(j) - m(j) + 1$. These estimates are given
in Table 3.
Given the above prior distribution and given $\{S(0), I(0,0) = I(0)\}$, by using the procedures
given in Section 5, one may readily derive simultaneously the estimates of the HIV infection
distribution, the HIV incubation distribution and the numbers of S people, I people and AIDS
cases over the time span. These results are plotted in Figs. 1–3. Given below we summarize our
basic findings:
Table 2
Prior information for infection distribution

Time      Jun 1977  Dec 1977  Jun 1978  Dec 1978  Jun 1979  Dec 1979  Jun 1980  Dec 1980
a_1(t)        4.03      5.92      7.42      8.39     11.15     15.61     15.50     14.02
a_2(t)     1060.42   1033.92   1001.62    957.60    903.19    829.79    733.92    650.26

Time      Jun 1981  Dec 1981  Jun 1982  Dec 1982  Jun 1983  Dec 1983  Jun 1984  Dec 1984
a_1(t)       13.53     10.25      6.67      4.95      3.40      2.45      2.03      1.85
a_2(t)      574.57    511.87    469.05    441.70    424.51    413.57    406.48    401.15

Time      Jun 1985  Dec 1985  Jun 1986  Dec 1986  Jun 1987  Dec 1987  Jun 1988  Dec 1988
a_1(t)        1.52      1.52      1.28      1.31      1.61      1.87      1.47      1.25
a_2(t)      397.31    394.28    391.98    390.37    387.22    383.12    377.77    375.83

Time      Jun 1989  Dec 1989  Jun 1990  Dec 1990  Jun 1991  Dec 1991  Jun 1992  Dec 1992
a_1(t)        1.41      1.18      1.22      1.69      5.09      9.33     10.25      9.04
a_2(t)      373.92    372.56    371.38    368.87    355.00    313.97    259.07    207.49

Table 3
Prior information for incubation distribution

Time      Jun 1977  Dec 1977  Jun 1978  Dec 1978  Jun 1979  Dec 1979  Jun 1980  Dec 1980
b_1(t)        5.34      5.46      5.51      5.52      5.48      5.40      5.29      5.15
b_2(t)      528.82    502.34    475.37    448.24    421.23    394.61    368.58    343.32

Time      Jun 1981  Dec 1981  Jun 1982  Dec 1982  Jun 1983  Dec 1983  Jun 1984  Dec 1984
b_1(t)        4.99      4.82      4.63      4.44      4.25      4.05      3.86      3.67
b_2(t)      318.96    295.62    273.36    252.24    232.27    213.48    195.84    179.34

Time      Jun 1985  Dec 1985  Jun 1986  Dec 1986  Jun 1987  Dec 1987  Jun 1988  Dec 1988
b_1(t)        3.49      3.31      3.15      2.99      2.84      2.69      2.56      2.44
b_2(t)      163.95    149.63    136.34    124.02    112.62    102.10     92.41     83.48

Time      Jun 1989  Dec 1989  Jun 1990  Dec 1990  Jun 1991  Dec 1991  Jun 1992  Dec 1992
b_1(t)        2.32      2.21      2.11      2.02      1.93      1.85      1.78      1.71
b_2(t)       75.28     67.74     60.83     54.50     48.70     43.39     38.54     34.11


Fig. 1. Plots of the estimated HIV infection distribution.

Fig. 2. Plots of the estimated HIV incubation distribution.

(a) From Fig. 1, the estimated density of the HIV infection clearly shows a mixture of distributions with two obvious peaks. The first peak occurs in May 1981, two months
earlier than the peak of sero-conversion (July 1981) estimated by Bacchetti [29] and by Tan et al.
[1]. The second peak occurs at June 1995 and is considerably lower than the first peak.
Comparing the estimated density of the HIV infection in Fig. 1 with the estimated density of the
HIV sero-conversion by Tan et al. [1], one may note that the two curves are quite similar to each
other.
(b) From Fig. 2, the estimated density of the HIV incubation distribution also appeared to be a
mixture of distributions with two peaks. The higher peak occurs at around 143 months after
infection and the lower peak occurs around 83 months after infection. This result seems to suggest
a staged model for HIV incubation as used by Satten and Longini [2].


Fig. 3. Plots of the observed AIDS incidence, the Gibbs sampler estimate and the estimated numbers of susceptible
and infected people: (a) AIDS incidence; (b) susceptible people; (c) infected people.

(c) From Fig. 3(a), we observe that the estimates of the AIDS incidence by the Gibbs sampler
are almost identical to the corresponding observed AIDS incidence. This result indicates that
the estimates by the Gibbs sampler trace the observed values very closely, suggesting the
usefulness of the method.
(d) To estimate the numbers of S people and I people, we figured in a 1% annual increase in population
size and assumed a population size of 50 000 at $t_0 = 0$. Then, as shown in Fig. 3(b), the
total number of S people was always above 50 000 before January 1978 and was between 31 000
and 32 000 between January 1983 and January 1992. The total number of people who do not have
AIDS was estimated at around 50 000 before January 1992.
(e) Results in Fig. 3(c) showed that the total number of infected people reached a peak around
the middle of 1985 and then decreased gradually to the lowest level around 1992. The results
before 1992 appeared to be consistent with those obtained by Bacchetti et al. [30] through the
backcalculation method.


Fig. 4. Plots of the estimated HIV infection distribution under different initial incubation distributions.

(f) To assess the influence of prior information on $\{p_S(t), \gamma(t)\}$, we plot in Figs. 1 and 2 the estimates of the HIV infection and the HIV incubation both with prior information and without it (i.e., under a non-informative uniform prior). The results show clearly that the prior information
has little effect, especially in the case of HIV infection.
(g) To start the procedure, one needs some initial parameter values for $p_S(t)$ and $\gamma(u)$. In this
paper, we first assume an initial incubation distribution with a mean of 10 years and derive estimates
of the infection distribution by using the standard backcalculation method (see [9,10]). This assumed incubation distribution and the associated estimate of the infection distribution are then
used to give initial values for the parameters $p_S(t)$ and $\gamma(u)$. To check the effects of the initial
incubation distribution, we have assumed different incubation distributions as the initial
distribution. These include the uniform distribution, the exponential distribution,
the gamma distribution, the Weibull distribution and the generalized gamma distribution, all with the same mean value
of 10 years. We were pleased to find that all initial distributions gave almost identical estimates. (As
an illustration, we plot in Fig. 4 the estimated HIV infection distributions under four different
initial incubation distributions.) This robustness property indicates that the procedure is quite
independent of the initial values of $\{p_S(t), \gamma(u)\}$.
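As an illustration of the robustness check in (g), the following Python sketch builds several monthly incubation distributions that share a mean of 10 years (120 months). It is a sketch under assumptions, not the authors' code: the function names, the Weibull shape of 2, the use of a shape-2 gamma (Erlang) distribution and the long discretization horizon are all choices made here for illustration. The paper itself takes $t_M = 360$ months, and truncating at that horizon would shift the discrete means somewhat.

```python
import math

MEAN_MONTHS = 120.0  # 10 years, the common mean stated in the text


def discretize(cdf, horizon):
    """Monthly probabilities P(j <= T < j + 1), renormalised over the horizon."""
    probs = [cdf(j + 1) - cdf(j) for j in range(horizon)]
    total = sum(probs)
    return [p / total for p in probs]


def uniform_cdf(t):
    # Uniform on [0, 2 * mean] has the required mean.
    return min(max(t / (2.0 * MEAN_MONTHS), 0.0), 1.0)


def exponential_cdf(t):
    return 1.0 - math.exp(-t / MEAN_MONTHS)


def erlang2_cdf(t):
    # Gamma distribution with shape 2 (assumed); scale = mean / 2.
    scale = MEAN_MONTHS / 2.0
    return 1.0 - math.exp(-t / scale) * (1.0 + t / scale)


def weibull_cdf(t, shape=2.0):
    # Scale chosen so that scale * Gamma(1 + 1/shape) equals the mean.
    scale = MEAN_MONTHS / math.gamma(1.0 + 1.0 / shape)
    return 1.0 - math.exp(-((t / scale) ** shape))


def discrete_mean(probs):
    """Mean of the discretised distribution, using month midpoints."""
    return sum((j + 0.5) * p for j, p in enumerate(probs))


# Long horizon (an assumption) so that truncation does not distort the mean.
HORIZON = 3000
initial_distributions = {
    "uniform": discretize(uniform_cdf, HORIZON),
    "exponential": discretize(exponential_cdf, HORIZON),
    "gamma (shape 2)": discretize(erlang2_cdf, HORIZON),
    "Weibull (shape 2)": discretize(weibull_cdf, HORIZON),
}
```

Each entry of `initial_distributions` could then serve as the assumed incubation distribution in a backcalculation step that produces starting values.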
7. Conclusions and discussion
In the classical Kalman filter method, one has to assume the parameters as known in order to
derive optimal estimates or predicted values of the state variables. For example, in the HIV
epidemic, one has to assume the probabilities $p_S(t)$ and the transition rate $\gamma(t)$ as known in
order to derive optimal estimates of the numbers of S people, I people and AIDS cases. In this
paper, we have developed a general procedure to estimate simultaneously the state variables and
the unknown parameters in the HIV epidemic via the state space models. By using the San Francisco
homosexual population as an example, we have illustrated how to use the methodology to estimate simultaneously the HIV infection distribution, the HIV incubation distribution as well as the
numbers of S people, I people and AIDS cases in this population. From this analysis we have
drawn the following conclusions:
(1) The estimates of the AIDS cases traced the observed AIDS cases extremely well.
(2) Our analysis predicted two waves of HIV infection. The first wave peaked around the
middle of 1985, a result which was consistent with findings by Bacchetti et al. [30]. However, our
analysis predicted a second wave of HIV infection which will peak some time around the year
2000. This important message indicates that there is a high proportion of restricted mixing (i.e.,
like-with-like mixing) among the San Francisco homosexual population, as has been suggested by
Tan et al. [31] and by Becker and Egerton [32].
(3) In studying the HIV epidemic, Satten and Longini [2] have used a Markov-staged model which
partitions the infective stage into five substages based on the CD4 T-cell count per
mm$^3$. It follows that the probability distribution of the HIV incubation is a mixture of several
exponential distributions. (For proof of these results, see [33,34].) Our estimates show that the
probability distribution of the HIV incubation is a mixture of distributions with two obvious
peaks, thus providing strong support for the staging of the infective stage by Satten and Longini [2].
(4) Results from Figs. 1 and 2 have shown that the estimated curves of the HIV infection
distribution and the HIV incubation distribution obtained by using conjugate priors do not differ significantly from those obtained by using a non-informative uniform prior. Since a non-informative uniform prior
corresponds to no prior information, these results suggest that our approach, with information
only from the data and from the stochastic system model, has provided almost all information on
the HIV epidemic.
To study the HIV epidemic, three different approaches have been used in the past: the deterministic
modeling approach, the stochastic modeling approach and the statistical modeling approach. A
major difference between these approaches is the derivation of the probability $p_S(t)$ for the infection by HIV of S people during $[t, t+1)$. In the deterministic and stochastic approaches, one
constructs $p_S(t)$ by taking into account the dynamics and the epidemiology of the HIV epidemic.
Thus, the $p_S(t)$ are derived as explicit functions of the state variables $\{S(t), I(u, t)\}$. On the other
hand, by assuming that the $p_S(t)$ are deterministic functions of time $t$, the statistical modeling
approach tries to estimate $p_S(t)$ from data without considering the dynamics and the epidemiology
of the HIV epidemic. Notice that in the stochastic approach, the $p_S(t)$ are stochastic variables
whereas in the deterministic and statistical approaches, the $p_S(t)$ are deterministic functions of time $t$.
In this paper, we have adopted the statistical approach by considering the $p_S(t)$ as deterministic
functions of $t$ and have attempted to estimate $p_S(t)$ along with $\gamma(t)$ and the state
variables. The statistical approach has the advantage that one can avoid many of the assumptions regarding mixing pattern and risk behaviors. If there is knowledge about the dynamics
and the epidemiology of the HIV epidemic, one can always incorporate this into $p_S(t)$. For example, if the mixing pattern is a preferred mixing pattern, then one may write $p_S(t)$ as a linear
combination $p_S(t) = \theta p_R(t) + (1 - \theta) p_P(t)$, where $\theta$ is the proportion of restricted mixing
and where $p_R(t)$ and $p_P(t)$ denote the probability components from the restricted mixing pattern
and the proportional mixing pattern, respectively. Then, as illustrated in [3], with the estimates of
$p_S(t)$ and the state variables, one can always estimate the relevant parameters in $p_S(t)$.
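To make the preferred-mixing decomposition concrete, the sketch below evaluates $p_S(t) = \theta p_R(t) + (1 - \theta) p_P(t)$ and recovers $\theta$ from given values of the three probabilities. The least-squares estimator is an assumption made here for illustration and is not claimed to be the procedure of [3]. The function names and the numerical values of `p_restricted` and `p_proportional` are hypothetical; in practice these components would be computed from the estimated state variables.

```python
def mix_probability(theta, p_restricted, p_proportional):
    """p_S(t) = theta * p_R(t) + (1 - theta) * p_P(t) for each time t."""
    return [theta * r + (1.0 - theta) * p
            for r, p in zip(p_restricted, p_proportional)]


def estimate_theta(p_s, p_restricted, p_proportional):
    """Least-squares estimate of theta (an assumed estimator).

    Rewrites the model as p_S - p_P = theta * (p_R - p_P) and regresses
    through the origin; the result is clipped to the interval [0, 1].
    """
    numerator = sum((s - p) * (r - p)
                    for s, r, p in zip(p_s, p_restricted, p_proportional))
    denominator = sum((r - p) ** 2
                      for r, p in zip(p_restricted, p_proportional))
    if denominator == 0.0:
        raise ValueError("p_R and p_P coincide; theta is not identifiable")
    return min(max(numerator / denominator, 0.0), 1.0)


# Hypothetical monthly components, for illustration only.
p_restricted = [0.02, 0.05, 0.03]
p_proportional = [0.01, 0.02, 0.04]
p_s = mix_probability(0.3, p_restricted, p_proportional)
theta_hat = estimate_theta(p_s, p_restricted, p_proportional)
```

With noise-free inputs generated from a known $\theta$, the estimator returns that value up to rounding; with estimated $p_S(t)$ it would give a fitted proportion of restricted mixing.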
In deriving the results, we have assumed that during a one-month period, the numbers of
immigrants and recruits among the S people and I people equal the numbers of deaths and out-migrations among these people, respectively. Our Monte Carlo studies seemed to indicate that this
assumption has little impact on the estimates of the HIV infection distribution and the HIV incubation distribution. As a further confirmation, we note that for the San Francisco homosexual
population, the estimates of the HIV infection distribution and the incubation distribution are
almost identical to those derived before by Tan and Xiang [3,4] by using other data sets. We note,
however, that this assumption does have some impact on the estimation of the numbers of S people
and I people. To see this, denote by $\{\lambda_S, \lambda_I\}$ the immigration rates of S people and I people,
respectively, and by $\{\mu_S, \mu_I\}$ the death rates of S people and I people, respectively. To account for
the effects of immigration and death, one then needs to add $S(t)(\lambda_S - \mu_S)$ to Eq. (1) and
$I(u, t)(\lambda_I - \mu_I)$ to Eq. (3), respectively. Thus, if assumption (2) in Section 2 fails, then one would
expect that the method would underestimate these numbers. To correct this, we have figured in a 1%
increase in estimating the numbers of S people and I people. Notice that with these adjustments,
the estimates of the numbers of S people and I people are almost identical to those given in [3,4].
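The text does not spell out how the 1% increase enters the computation. One possible reading, sketched below purely as an assumption, is to compound a 1% annual growth rate monthly and scale the unadjusted monthly estimates by the resulting factor. The names `ANNUAL_GROWTH`, `growth_factor` and `adjust_for_growth` are hypothetical, and the constant series of 50 000 is a placeholder rather than output of the model.

```python
ANNUAL_GROWTH = 0.01  # 1% increase in population size per year (from the text)


def growth_factor(month):
    """Compound growth factor after `month` months (assumed monthly compounding)."""
    return (1.0 + ANNUAL_GROWTH) ** (month / 12.0)


def adjust_for_growth(estimates):
    """Scale unadjusted monthly estimates, indexed from month 0, by the growth factor."""
    return [x * growth_factor(t) for t, x in enumerate(estimates)]


# Placeholder series: a constant 50 000 people over 13 months.
unadjusted = [50000.0] * 13
adjusted = adjust_for_growth(unadjusted)
```

Under this reading, the adjusted value after twelve months is 1% above the unadjusted one, and the correction grows with the length of the time span.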
In the studies of the HIV epidemic, Brookmeyer and Gail [9,10] have proposed a backcalculation method to estimate the HIV infection and to give short-term projections of future AIDS
cases. This method uses AIDS incidence data and is based on the formulation that the distribution
of the time to AIDS onset is a convolution of the distribution of the HIV infection and the
distribution of the HIV incubation. However, there are two major difficulties associated with this
method. First, the method is not identifiable if both the distribution of the HIV infection and the
distribution of the HIV incubation are unknown. Hence, one would need to assume the distribution of the HIV incubation as known if one wants to estimate the distribution of HIV infection
[9]. Similarly, one would need to assume the distribution of HIV infection as known if one wants
to estimate the distribution of the HIV incubation [35]. Second, the method is very sensitive to the
choice of the distribution of the HIV incubation or the distribution of the HIV infection
[10,30,36]. In this paper, we have solved these problems through the state space models. In
Sections 2 and 3, we have in fact shown that the backcalculation method is a special case of the
observation model. Thus, in addition to information from the data, the stochastic system model
has provided additional information from the system, thus helping solve the identifiability
problem confronting the backcalculation method.
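The convolution underlying backcalculation, and the identifiability problem it creates, can be illustrated with a minimal Python sketch. The function name `expected_aids_incidence` and the toy inputs are hypothetical; the sketch only encodes the statement that expected AIDS incidence is the convolution of new infections with the incubation distribution.

```python
def expected_aids_incidence(infections, incubation):
    """Convolve new infections with the incubation distribution.

    infections -- expected number of new infections in each month
    incubation -- incubation[k] is the probability that AIDS onset occurs
                  k months after infection
    Returns the expected number of new AIDS cases per month over the same span.
    """
    n_months = len(infections)
    expected = [0.0] * n_months
    for s, new_infections in enumerate(infections):
        for lag, prob in enumerate(incubation):
            t = s + lag
            if t >= n_months:
                break  # onset would fall outside the observation window
            expected[t] += new_infections * prob
    return expected


# Two different (infection, incubation) pairs, both purely illustrative.
incidence_a = expected_aids_incidence([10.0, 0.0, 0.0], [0.1, 0.2, 0.7])
incidence_b = expected_aids_incidence([1.0, 2.0, 7.0], [1.0, 0.0, 0.0])
```

The two toy pairs yield the same expected incidence series, which is the identifiability difficulty noted above: incidence data alone cannot separate the infection curve from the incubation distribution unless one of them is fixed or further information, such as the stochastic system model, is brought in.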
From the above demonstration, it appears that we have proposed a powerful procedure via the
application of the state space models to estimate simultaneously the unknown parameters and the
state variables in the HIV epidemic. To make this approach applicable to a wide range of problems,
however, many issues need to be examined. First, it is necessary to extend the model to include
immigration and death as well as other disturbing factors. Second, in many practical situations,
one would need to deal with the problem of missing data. Third, in some cases, it may be necessary to deal with reporting delay in the AIDS incidence data. We have just finished another
paper exten
