Assessing impacts of small perturbations

Ecological Modelling 156 (2002) 185-199
www.elsevier.com/locate/ecolmodel

Assessing impacts of small perturbations using a model-based
approach
Michael B. Dale a, Patricia E.R. Dale a,*, Cen Li b, Gautam Biswas c
a Griffith University, Australian School of Environmental Studies, Nathan 4111, Queensland, Australia
b Department of Electrical Engineering and Computer Science, Middle Tennessee State University, Murfreesboro, TN, USA
c Department of Computer Science, Vanderbilt University, TN, USA
Received 28 November 2001; received in revised form 26 April 2002; accepted 8 May 2002

Abstract
When examining the effects of a disturbance on a complex system like vegetation it is difficult to distinguish between
those changes that affect the processes underlying the functioning of the system and other changes which simply shift
the state of the system but have no effect on the processes. The former is obviously a more significant effect than the
latter. In this paper we examine a model-based clustering procedure which can make such a distinction. Given
observations on several sites on several occasions, we model the dynamics of the processes using a continuous hidden
Markov model. In this model the actual Markov process is hidden, but at any observation time we can observe
surrogate variables whose values will be conditional on the underlying state of the process. We further ask if there is
evidence for more than one such process, i.e. whether our data are heterogeneous. By estimating the number of clusters
using a Bayesian information criterion we can choose between these alternatives. An analogous assessment is made of
the number of states in the underlying hidden Markov models, as well as the transition matrices between states and
emission probabilities relating the underlying hidden state to the observed attributes. The methodology was applied to
the question of determining if a runnelling treatment of a salt marsh for mosquito management had changed the
underlying processes related to the vegetation. © 2002 Elsevier Science B.V. All rights reserved.
Keywords: Hidden Markov models; Model-based clustering; Salt marshes; Runnelling

Abbreviations: MML, minimum message length; HMM, hidden Markov model; BIC, Bayesian information criterion.
* Corresponding author. Tel.: +61-7-3875-7136; fax: +61-7-3875-7616
E-mail address: p.dale@mailbox.gu.edu.au (P.E.R. Dale).

1. Introduction
Assessing the impact of disturbance on vegetation is a critical activity, for vegetation reflects
general environmental conditions and should indicate the cumulative effects of change. In some
cases the impact is so large that there can be little
need for sophisticated assessment: clearing of a
forest can hardly be regarded as other than an
obvious and major impact. Yet a destructive fire in
a Eucalyptus regnans F. Muell. forest can cause
equally massive changes but is a necessary part of
the process of regeneration and as such represents
only a change of state and not a change of process.

0304-3800/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0304-3800(02)00158-8


When dealing with less intrusive activities, with
more delicate manipulations, the assessment of
impact itself becomes a delicate procedure. There
are experimental designs for assessing impact, such
as ‘Before After Control Intervention’ (Faith et al.,
1995), but these only inform us that, at some risk
level, there has (or has not) been a recognisable
change in the system. We may have simply
displaced our system to a state that was within
the normal range of its variation. But it is also
possible that we have had a greater impact and
have changed the fundamentals of the system, the
processes that have been controlling it.
The main focus of this paper is to critically
evaluate a method of analysis that enables the
distinction between shifted state and changed
processes to be identified with respect to the
impacts of modification on sites. The method of
analysis relies on model-based clustering of the
sequences, such that a single model is an adequate
fit to all the sequences within any single cluster.
The number of clusters and their components and
the models for each cluster are estimated from the
data.


2. The nature of the model
In a previous paper (Dale and Dale, 2002) we discussed and illustrated the use of model-based clustering, using a minimum message length (MML) method to visualise changes in a relatively simple salt marsh community. In that study we adopted a two-stage approach, initially clustering the (multivariate) data for sample sites at each time as if they were independent observations. This allowed us to differentiate vegetation states. The program used for this analysis, a revised version of SNOB (Boulton and Wallace, 1970), estimates the number of clusters (states). The temporal information was
then introduced by ordering the states through
time and displaying this graphically. Thus the
serial order was introduced after the clustering
was complete, and the dependency between observations through time was not incorporated into
the clustering procedure itself.
Edgoose and Allison (1999) have remarked that such a two-stage process can give results very different from those obtained when the temporal dependency between observations is incorporated into the clustering process. They proposed an approach for incorporating this dependency in the clustering procedure. First they establish whether temporal dependency exists, which involves a comparison with the ‘independent’ model. Second, if dependence is shown, they estimate its order. At no stage do they consider the possibility that clusters with different hidden processes are present in the data. We address this matter here, examining the effects of temporal dependence following the methods of Li and Biswas (1999, 2000), which use hidden Markov modelling. Thus Edgoose and Allison’s method is complementary to that of Li and Biswas rather than competitive.
2.1. Theoretical framework
We shall proceed with a model-based clustering procedure (Banfield and Raftery, 1993; Bensmail et al., 1995). Such methods were pioneered using Gaussian mixture models, but the procedure is more general. Assume we have data consisting of records of some properties of individual sample sites recorded at many times, in a treatment and control experimental design. We can ask if there is
evidence that models fitted to individual series are
essentially the same model representing a single
process, or alternatively whether it is preferable to
separate the sample units into clusters, with each
cluster having its own model. Whatever the outcome, there will be states identified within each
cluster whose interactions define the relevant
process for that cluster, and which provide information as to the nature of that particular
process. Moreover, the dynamics of change within
each model can be analysed using a transition
matrix, and the process can be ‘mapped’ to
illustrate its dominant characteristics. To put it very simply, the methodology identifies a number of clusters within which there are states. A cluster represents a process; a state is a condition within a process and there may be several states within a single process. If there is only one cluster, then this suggests that there is one process; if there is more than one cluster, then there is more than one process. Obviously there may be several processes present before we introduce any treatment. However, if the treatment applied disturbs the system sufficiently to introduce another process, then this should result in more clusters. Whether this is actually observed depends on the sensitivity of the analysis and the amount of data, especially the number of observations through time.
In the present case we have to model sequences of observations, and one possibility is to use hidden Markov models (HMMs) (see Rabiner, 1989). In such a model the state definitions do not correspond to attribute values (if they did, then a Markov chain model would be more appropriate; see Sebastiani et al., 1999). HMMs are more appropriate when the state definitions are not directly observable or when it is infeasible to define states by exhaustive enumeration of attribute values, for example when the values are continuous. In an HMM the complete set of states and the exact sequence of states a system traverses may not be observable, but they can be estimated from the observed behaviour of the system.
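
As a minimal sketch of how the likelihood of an observed sequence can be evaluated without ever observing the states, the forward algorithm below sums over all hidden state paths of an HMM with univariate Gaussian emissions. The two-state model, its parameter values, the example sequences and all function names are illustrative assumptions, not values or code from this study.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a univariate Gaussian emission."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    if top == float("-inf"):
        return top
    return top + math.log(sum(math.exp(v - top) for v in values))


def safe_log(p):
    """log(p), with log(0) treated as minus infinity."""
    return math.log(p) if p > 0.0 else float("-inf")


def sequence_loglik(obs, start, trans, means, sds):
    """Forward algorithm: log prob(obs | HMM), summed over all hidden paths."""
    n = len(start)
    # alpha[j] = log prob(observations so far, current hidden state is j)
    alpha = [safe_log(start[j]) + gaussian_logpdf(obs[0], means[j], sds[j])
             for j in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + gaussian_logpdf(x, means[j], sds[j])
            for j in range(n)
        ]
    return logsumexp(alpha)


# Hypothetical two-state model: state 0 = sparse cover, state 1 = dense cover.
start = [0.5, 0.5]
trans = [[0.9, 0.1],
         [0.2, 0.8]]
means = [5.0, 90.0]
sds = [3.0, 20.0]

persistent = [4.0, 6.0, 5.0, 85.0, 95.0, 100.0]  # one switch, then stays dense
flickering = [4.0, 95.0, 5.0, 90.0, 6.0, 100.0]  # switches at every step

ll_persistent = sequence_loglik(persistent, start, trans, means, sds)
ll_flickering = sequence_loglik(flickering, start, trans, means, sds)
```

Under these assumed parameters the persistent sequence receives the higher log-likelihood, because the model's large self-transition probabilities make frequent switching improbable. A sequence-to-model likelihood of this kind is what allows sequences to be compared with candidate models.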
The idea is simply to cluster the sequences that have been fitted by HMMs. If there are several clusters (i.e. several processes) we can then ask if these correspond to the treatment-control distinction (see Critchlow, 1980, for a suitable method). If they do, we have evidence for process shifts related to impact. If they do not, we can reject that interpretation, but should consider whether a single treatment is really appropriate for a heterogeneous target, or whether we might consider alternatives based on the revealed process typology. If there is only a single cluster, then we can accept that our treatment has at most resulted in a change in state.
Clustering methods for HMMs have been proposed by Dermatas and Kokkinakis (1996), Smyth (1997) and Oates et al. (1998), but these methods presume that the number of states in the Markov model is known. Li and Biswas (1999, 2000) have presented a procedure for clustering HMMs that has been used here. This procedure uses Schwarz’s Bayesian information criterion (BIC) to determine the number of clusters and the number of states within any single HMM (Schwarz, 1978). BIC provides one means of penalising a model for excessive complexity to prevent overfitting. Another, Akaike’s criterion (Akaike, 1978), is the asymptote for cross-validation estimates, while the MML principle noted above is a third such measure.
2.2. Clustering algorithm
The algorithm involves four nested loops, which carry out the following calculations:
1) Determining the number of clusters (representing processes) in a data set.
2) Distributing the objects (samples) into the clusters.
3) Computing the model size (number of states) for each individual cluster.
4) Estimating the HMM parameters for the within-cluster models, including the mean and standard deviation that describe the external (emission) probability-generating function, given the hidden state and assuming a Gaussian distribution.
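
The two size-selection loops (number of clusters, number of states) can both be read as a sequential search that grows the model until the BIC stops improving, as Section 2.2.3 describes for the number of states. The sketch below illustrates that control flow only. The `fit` callback, the made-up log-likelihoods, the parameter-count formula and the value of `n_objects` are all assumptions introduced for illustration; a real `fit` would run the inner loops (assignment and parameter estimation).

```python
import math


def bic(loglik, n_params, n_objects):
    """BIC in log-likelihood form: larger values are better."""
    return loglik - 0.5 * n_params * math.log(n_objects)


def sequential_search(fit, n_objects, max_size=10):
    """Grow the model one unit at a time and stop when the BIC stops improving.

    `fit(size)` is a hypothetical callback returning (loglik, n_params) for
    the best model of that size (number of states, or number of clusters).
    """
    best_size, best_score = None, float("-inf")
    for size in range(1, max_size + 1):
        loglik, n_params = fit(size)
        score = bic(loglik, n_params, n_objects)
        if score <= best_score:
            break  # the complexity penalty now outweighs the gain in fit
        best_size, best_score = size, score
    return best_size, best_score


# Made-up log-likelihoods for HMMs with 1..6 states (illustrative only).
toy_loglik = {1: -5200.0, 2: -4700.0, 3: -4450.0, 4: -4400.0, 5: -4390.0, 6: -4385.0}


def toy_fit(size):
    # Assumed parameter count for a `size`-state HMM with two Gaussian attributes:
    # (size - 1) initial + size * (size - 1) transition + 2 * 2 * size emission.
    n_params = (size - 1) + size * (size - 1) + 4 * size
    return toy_loglik[size], n_params


chosen, score = sequential_search(toy_fit, n_objects=500, max_size=6)
```

With these invented numbers the search stops at four states: the fifth state improves the log-likelihood by less than the extra penalty it incurs.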
2.2.1. The number of clusters in a partition
Within-cluster and between-cluster properties both need to be addressed. A distance measure serves well for the former, and Li and Biswas use a partition mutual information measure for the latter. This is based on the Bayesian posterior probability, approximated by the BIC:

$$\log p(M \mid X) \approx \log \mathrm{prob}(X \mid M, p) - \frac{d}{2}\,\log N,$$

where model M has d parameters, N is the number of data objects, and p is the parameter configuration.
Expressions derived from this basic form are used for selecting the number of clusters and the number of states (steps 1 and 3). For the number of clusters the expression finally takes the form

$$\log \mathrm{prob}(X \mid M) \approx \sum_{i=1}^{N} \log\left[\sum_{k=1}^{K} P_k\,\mathrm{prob}(X_i \mid \theta_k, \lambda_k)\right] - \frac{K + \sum_{k=1}^{K} d_k}{2}\,\log N,$$

where there are K clusters with models $\lambda_1$ to $\lambda_K$, $P_k$ is the likelihood of the data given model k, and $\theta_k$ and $d_k$ are the model parameter configuration and the number of significant model parameters for cluster k, respectively.
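
A minimal sketch of how the partition criterion could be evaluated once the sequence-to-model log-likelihoods are available. The function name, the treatment of $P_k$ as a weight attached to model k, and all numerical values are assumptions for illustration; they do not come from the program used in this study.

```python
import math


def partition_bic(seq_loglik, weights, n_params):
    """BIC of a K-cluster partition in the mixture form given in the text.

    seq_loglik[i][k] : log prob(X_i | model k) for sequence i
    weights[k]       : P_k, the weight attached to model k (assumed to sum to 1)
    n_params[k]      : d_k, the number of significant parameters of model k
    """
    n = len(seq_loglik)
    k_clusters = len(weights)
    total = 0.0
    for row in seq_loglik:
        # log of sum_k P_k * prob(X_i | model k), computed stably in log space
        terms = [math.log(weights[k]) + row[k] for k in range(k_clusters)]
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    penalty = 0.5 * (k_clusters + sum(n_params)) * math.log(n)
    return total - penalty


# Hypothetical log-likelihoods for six sequences under a one-cluster partition...
one_cluster = partition_bic(
    [[-100.0], [-102.0], [-101.0], [-140.0], [-138.0], [-139.0]],
    weights=[1.0],
    n_params=[11],
)

# ...and under a two-cluster partition in which each sequence fits one model well.
two_clusters = partition_bic(
    [[-100.0, -150.0], [-102.0, -149.0], [-101.0, -151.0],
     [-152.0, -105.0], [-150.0, -104.0], [-151.0, -106.0]],
    weights=[0.5, 0.5],
    n_params=[11, 11],
)
```

In this invented example the two-cluster partition scores higher despite its larger penalty, which is the situation the criterion is meant to detect: the improvement in fit from a second model outweighs the cost of its extra parameters.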

2.2.2. The structure of a given partition size
For a given number of clusters, the structure of the partition is found using a crisp k-means procedure that maximises a partition mutual information measure. The distance measure is a ‘sequence to model likelihood’ measure which is effectively minimising partition model posterior probability.

2.2.3. The hidden Markov structure for each cluster
A sequential search procedure is used to find the hidden Markov structure. The number of states is initialised at 1 and increased until the BIC indicates that an optimum has been reached. The BIC used balances the likelihood against a model complexity penalty term, the expression taking the form

$$\log \mathrm{prob}(X_k \mid \lambda_k) \approx \sum_{j=1}^{N(k)} \log \mathrm{prob}(X_{kj} \mid \lambda_k, \theta_k) - \frac{d_k}{2}\,\log N(k),$$

where N(k) is the number of objects in cluster k and $d_k$ is the number of parameters in $\lambda_k$ with parameters $\theta_k$.

2.2.4. The parameters of each hidden Markov structure
A Baum-Welch procedure (a variation of the EM algorithm) is used to obtain maximum likelihood estimates of the Markov parameters, including the transition probabilities and the emission parameters. These last may be used to characterise the hidden state, since the observed values are regarded as random drawings from a Gaussian distribution whose emission mean and variance depend on the state of the process.
An alternative procedure has been given by Oates et al. (1998), who use a dynamic time-warping dissimilarity measure between the several pairs of series to obtain an initial clustering, before estimating HMMs and their possibly associated clusters. Unfortunately, this procedure may be detrimental, and the authors state their intention of using a minimum description length approach instead.

2.3. Interpretation
The procedure supplies us with an estimate of the number of clusters and an HMM for each cluster. The number of states may differ between clusters. In addition, the following three pieces of information are obtained for each HMM:
2.3.1. The probability that a state ‘j’ initiates a
sequence
For an n-state process there are n such values.
Since we began our observations at an arbitrary
time, the initial state is itself somewhat arbitrary
and the main interest of these values lies in those
cases that do NOT initiate any sequence.
2.3.2. The transition probabilities for each Markov
process showing the probability with which state ‘j’
follows state ‘i’
For an n-state process there are $n^2$ such values.
Overall, a large number of states does not seem
likely, although such an occurrence might be an
indicator that a higher order Markov process is
present. These probabilities indicate likely paths of
change, including possible cyclic pathways, and
represent the processes that are operating in the
system.
2.3.3. The emission probability, which is the
probability that attribute value ‘X’ is generated
given that the underlying Markov process is in state
‘g’
For an n-state process and m descriptors there are n × m such values. Li and Biswas (1999) use the mean and standard deviation to characterise the emissions, assuming a Gaussian distribution within clusters. The emission probabilities permit us to predict the species responses and potentially to examine the relationships with environmental attributes. Environmental attributes may also provide a context for state changes, i.e. they may modify the probabilities of transitions. Unfortunately, the program does not, at present, provide emission probabilities for descriptors not used in the clustering process.

3. Data and analyses

The study site is on Coomera Island (S27°51′, E153°33′), to the north of the Gold Coast, Queensland, and close to areas of rapid population growth. It is mainly vegetated with marine couch (Sporobolus virginicus (L.) Kunth) and Sarcocornia quinqueflora (Bunge ex Ung.-Stern), with the grey mangrove (Avicennia marina (Forsk)) along the inlet which floods the marsh. It is also an area of major mosquito breeding. The problem species, Ochlerotatus vigilax (Skuse),1 is a vector of alphaviruses such as Ross River virus and Barmah Forest virus.
To control the mosquito, a small part of the marsh (0.5 ha) was runnelled in November 1985. Runnels up to 0.30 m deep and 0.90 m wide were constructed to link isolated pools, in which the mosquitoes breed, to the tidal source, allowing increased predator access. The method has worked to reduce mosquito populations, and previous assessment indicates relatively little, if any, impact (Dale et al., 1993, 1996). The data used here were collected from 30 sample sites at quarterly intervals from November 1985 to November 1999. The vegetation attributes measured included the species, size and density of vegetation in permanent small quadrats (10 × 10 cm);2 the environmental attributes included water table depth and salinity, substrate moisture, pH, salinity and distance from the tidal flooding front (tidal edge of the marsh). In all, there were 1680 observations (each site at each time).
In our analyses we have used only the vegetation data, which consist of records for four attributes (Sporobolus height and density; Sarcocornia height and density) in 30 sites over 56 time periods. We have two substantive questions and one subsidiary question to ask. The main questions are:

1) Is there evidence for clusters of sites indicating that different processes are active in different parts of the marsh?
2) If such clusters exist, do they correlate with the runnelling treatment, which would indicate that our treatment has modified the underlying processes of vegetation change in the marsh?

We are also interested, of course, in determining
effective procedures for using the HMM-based
clustering procedure. We therefore examined three
different analyses:
1) Using four attributes (Sporobolus and Sarcocornia density and size) and estimating the
number of states in each cluster.
2) Using two attributes only (Sporobolus and
Sarcocornia density only).
3) Using only two attributes, but one species
(Sporobolus height and density) with an estimated number of states in each cluster. The
rationale for excluding Sarcocornia is that it is
absent from much of the marsh and potentially would affect the process of estimation.
To assess whether the treatment and control sites were distributed significantly differently between clusters (when there was more than one cluster) we used a χ² analysis. To aid explanation, we also analysed the environmental data using t-tests when more than one cluster was identified.
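
The test of association between treatment and cluster membership can be sketched as a Pearson chi-square test on a 2 × 2 table of site counts. The counts below are hypothetical and are not the study's data; the function name is also an assumption, and no continuity correction is applied. With counts this small an exact test might be preferred in practice.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2 x 2 table of counts.

    Returns (statistic, p_value). With one degree of freedom the upper tail
    of the chi-square distribution equals erfc(sqrt(statistic / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical allocation of 30 sites.
# Rows: treated, control. Columns: Cluster 1, Cluster 2.
counts = [[7, 8],
          [6, 9]]
stat, p_value = chi_square_2x2(counts)
```

For these invented counts the statistic is small and the p-value is far above conventional significance levels, which is the pattern that would indicate no relationship between treatment and cluster.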

1 Until recently known as Aedes vigilax (Skuse).
2 Density: number in 10 × 10 cm permanent plots. Size: height of youngest node for Sporobolus; length of succulent shoots for Sarcocornia.

4. Results

The three analyses gave different results, but
none suggested a significant effect of modification.
Analysing the four-attribute data we obtained one
cluster only, whilst the two-attribute results both
identified two clusters (two processes).


Table 1
The four-attribute analysis: state descriptions

State   Sporobolus mean   Sporobolus mean     Sarcocornia mean   Sarcocornia mean
        density (SD)      height (mm) (SD)    density (SD)       size (mm) (SD)
0       0.052 (1.49)      0.005 (0.18)        0 (2.25)           0.001 (0.04)
1       97.55 (41.37)     0.044 (0.51)        0 (0)              127.948 (73.06)
2       33.572 (19.70)    6.583 (0.49)        0 (12.67)          0.626 (0.07)
3       0.12 (0.37)       92.078 (4.15)       16.76 (0.72)       0 (0)
4       2.00 (0)          2.578 (1.12)        66.111 (3.14)      28.318 (11.74)

4.1. Four-attribute data
The four-attribute analysis yielded one cluster,
suggesting that only one process is operating. The
states within the cluster are summarised in Table 1
and the transition matrix is in Table 2. Fig. 1
illustrates the states and the main transitions
between them.
Four of the states form a dynamic cycle of the form 1→2→4→3→1→2→4→3... with an occasional 3→0. Thus, as shown in Fig. 1, tall dense Sporobolus with virtually no Sarcocornia (State 1) changes to very sparse, short Sporobolus with sparse, very short Sarcocornia (State 2), which becomes very short, low density Sporobolus with sparse, large Sarcocornia (State 4), then very dense, moderate-sized Sarcocornia with virtually no Sporobolus (State 3), and then the pattern repeats. Only from the Sarcocornia state (3) may the change be to bare ground (State 0) and, if so, it is a one-way path, i.e. State 0 is an absorbing state. The cyclic pattern may also reflect a second-order process, a possibility which will be discussed later. As for the effects of treatment, since there is only one cluster, all sites are contained within it irrespective of treatment, and hence no significant effects of modification can be identified.
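
The cycle and the absorbing state described above can be checked directly from the transition matrix. The sketch below is illustrative: the matrix is entered from Table 2 reading rows as ‘from’ states and columns as ‘to’ states, and the function names are assumptions. The 56 steps correspond to the number of time periods reported in Section 3.

```python
# Transition matrix read from Table 2 (rows: from-state, columns: to-state).
P = [
    [1.00, 0.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 1.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.00, 1.00],
    [0.01, 0.99, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 1.00, 0.00],
]


def absorbing_states(matrix, tol=1e-9):
    """States that, once entered, are never left (self-transition of 1)."""
    return [i for i, row in enumerate(matrix) if abs(row[i] - 1.0) < tol]


def dominant_path(matrix, start, steps):
    """Follow the single most probable transition out of each state."""
    path = [start]
    for _ in range(steps):
        row = matrix[path[-1]]
        path.append(max(range(len(row)), key=row.__getitem__))
    return path


def prob_in_state(matrix, start, target, steps):
    """Probability of occupying `target` after `steps` transitions from `start`."""
    n = len(matrix)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(steps):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist[target]


absorbing = absorbing_states(P)        # bare ground only
cycle = dominant_path(P, start=1, steps=8)
p_bare = prob_in_state(P, start=1, target=0, steps=56)
```

Starting from State 1, the chain passes through State 3 once every four steps and has a 0.01 chance of leaving the cycle each time, so over 56 steps the probability of having reached bare ground is 1 − 0.99^14, a little over 0.13 under this matrix.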
4.2. Two-attribute analysis: Sporobolus and
Sarcocornia density
If only the density of the two species is examined we obtain two clusters or processes: one essentially Sarcocornia (Cluster 1) and the other Sporobolus (Cluster 2). The state descriptions and transition matrices are shown in Tables 3 and 4. The t-test results, where significant, are shown in Table 5. The system is shown diagrammatically in Fig. 2.
The two processes appear to relate to the habitat characteristics of the two species. Cluster 1 (Sarcocornia) is drier and saltier, with shorter Sporobolus and larger Sarcocornia, and it contains more mangrove pneumatophores than Cluster 2. Cluster 2 is wetter and less salty, has taller Sporobolus and very short Sarcocornia, and has more crab holes. One of the local grapsid crab species (Parasesarma erythrodactyla) appears to prefer the Sporobolus environment (Chapman et al., 1998). Each cluster or process differentiates the states within the species. Within each cluster there are five states which show similar variations to those of the four-attribute process model. The states defined for one cluster appear to have little in common with those of the other. Cluster 1 shows clear diagonal dominance in the transition matrix (Table 4), with self-transitions ranging from 0.810 to 0.944, which indicates that the vegetation is stable. Cluster 2 also shows this tendency, though less markedly, except for State 2. Its self-transitions range from 0.547 to 0.955.
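
One way to read diagonal dominance is through the expected time spent in a state. For a first-order Markov chain the run length in a state is geometric, so a self-transition probability p implies a mean stay of 1/(1 − p) observation periods. The sketch below applies this to the diagonals of Table 4; the variable and function names are assumptions, and the dwell times are a derived illustration rather than results reported by the study.

```python
# Self-transition probabilities (diagonals of Table 4), states 0..4.
self_transitions = {
    "Cluster 1": [0.832, 0.810, 0.902, 0.844, 0.944],
    "Cluster 2": [0.789, 0.955, 0.547, 0.752, 0.923],
}


def expected_dwell(p_stay):
    """Mean number of consecutive observation periods spent in a state.

    Run lengths in a first-order Markov chain are geometric,
    so the mean is 1 / (1 - p_stay).
    """
    return 1.0 / (1.0 - p_stay)


summary = {}
for name, diagonal in self_transitions.items():
    summary[name] = {
        "min": min(diagonal),
        "max": max(diagonal),
        # Observations were quarterly, so one period is one quarter.
        "dwell_quarters": [round(expected_dwell(p), 1) for p in diagonal],
    }
```

Under this reading, State 2 of Cluster 2 (self-transition 0.547) is expected to persist for only about two quarters, whereas the most stable states persist for well over four years.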

Table 2
Transition matrix for the four-attribute analysis

From state   To state
             0       1       2       3       4
0            1       0       0       0       0
1            0       0       1       0       0
2            0       0       0       0       1
3            0.01    0.99    0       0       0
4            0       0       0       1       0

Fig. 1. Four-attribute data: one cluster.

There was no significant relationship between treatment and cluster. This again indicates that runnelling has not modified the processes operating, even though the observed data are heterogeneous.

4.3. Two-attribute, one-species analysis: Sporobolus density and height

If we restrict the analysis to Sporobolus only,
since it is more widespread than Sarcocornia at the
site, we again get two clusters each of four states.
The state descriptions are in Table 6 and the
transition matrices in Table 7. Fig. 3 shows the
pattern of change graphically.
Neither cluster here shows the marked diagonal dominance of the transition matrices that appeared in the previous analysis. Cluster 2 is more dynamic than Cluster 1, with larger off-diagonal elements. In Cluster 1 all states may go to bare ground (State 1), and State 2 (tall dense Sporobolus) always does so. The bare ground and sparse states (1 and 3) are unlikely to change, and this is similar to the four-attribute analysis. The situation is consistent with Dale and Dale (2002) and makes sense in the field. Low densities and heights of Sporobolus seem to change towards bare ground. This occurs, but to a lesser extent, in Cluster 2 as well, where State 0 (very sparse, short Sporobolus), which has very similar emission probabilities to State 3 of Cluster 1, generally becomes shorter (or stays the same). The moderately dense, short Sporobolus states (State 0 in Cluster 1; State 2 in Cluster 2), although having similar emission probabilities, are each part of a different process. In Cluster 1, State 0 directly or indirectly ends up as bare ground, but this is not so in Cluster 2, where State 2 cycles through large variations in height and density but does not reach an empty state.
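
The contrast between the two processes can be illustrated by their long-run state occupancies. The sketch below enters the matrices as read from Table 7 (rows as ‘from’ states), rescales each row to sum to one because the published values are rounded, and finds the stationary distribution by repeated multiplication. The function names and the iteration count are assumptions, and the occupancies are a derived illustration, not results reported by the study.

```python
def normalise_rows(matrix):
    """Rescale each row to sum to 1 (the published values are rounded)."""
    return [[p / sum(row) for p in row] for row in matrix]


def stationary(matrix, iterations=5000):
    """Long-run state occupancy found by power iteration."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist


# Transition matrices read from Table 7 (rows: from-state, columns: to-state).
cluster1 = normalise_rows([
    [0.000, 0.214, 0.534, 0.251],
    [0.023, 0.950, 0.000, 0.027],
    [0.000, 1.000, 0.000, 0.000],
    [0.025, 0.261, 0.000, 0.714],
])
cluster2 = normalise_rows([
    [0.334, 0.666, 0.000, 0.000],
    [0.666, 0.000, 0.334, 0.000],
    [0.011, 0.000, 0.000, 0.989],
    [0.000, 1.000, 0.000, 0.000],
])

pi1 = stationary(cluster1)
pi2 = stationary(cluster2)
```

Under these matrices Cluster 1 spends the large majority of its time in State 1 (bare ground), whereas in Cluster 2 no single state accounts for as much as half of the long-run occupancy, in line with the more dynamic behaviour described above.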

Table 3
State descriptions for the two-attribute, two-cluster result

        Cluster 1                                Cluster 2
State   Sporobolus mean    Sarcocornia mean     Sporobolus mean     Sarcocornia mean
        density (SD)       density (SD)         density (SD)        density (SD)
0       1.38 (4.03)        84.7 (44.2)          69.511 (35.88)      32.191 (18.39)
1       35.5 (13.4)        46.1 (27.9)          112.059 (58.67)     0 (0.06)
2       87.5 (28.3)        24.4 (18.2)          90.637 (37.146)     7.483 (4.89)
3       13.0 (7.9)         46.7 (29.9)          15.114 (12.37)      17.767 (15.10)
4       1.10 (3.89)        37.59 (23.62)        9.673 (10.25)       0 (0.11)


Table 4
Transition matrices for the two-attribute (Sporobolus and Sarcocornia density) two-cluster result

From state   To state
             0       1       2       3       4
Cluster 1
0            0.832   0.002   0.000   0.034   0.13
1            0.000   0.810   0.027   0.163   0
2            0.009   0.090   0.902   0.000   0
3            0.037   0.040   0.005   0.844   0.07
4            0.042   0       0.003   0.011   0.944
Cluster 2
0            0.789   0.040   0.099   0.082   0
1            0.002   0.955   0.023   0.002   0.018
2            0.149   0.303   0.547   0.001   0
3            0       0       0       0.752   0.248
4            0       0.034   0.007   0.036   0.923

The environmental attributes with significant t-test results are shown in Table 8. Cluster 1, which
is prone to reach the bare ground state, is drier,
saltier and has more and larger Sarcocornia than
Cluster 2, indicating that this cluster is from a
lower marsh position closer to the mangroves
which fringe the marsh on the seaward edge
(Adam et al., 1988). In support of this there was
a significant relationship with distance from the
flooding inlet, with Cluster 1 being considerably
closer to it (Table 8). Again there was no
significant relationship between the clusters and
whether a site had been modified or not.

5. Discussion
5.1. Substantive results on the impact of runnelling
Overall, we find that there is no evidence that
our treatment has modified the processes operating to control the vegetation changes in the marsh.
From other evidence (Dale et al., 1993, 1996; Dale,
2001) we have identified areas of the marsh which
have changed after the treatment, generally becoming wetter, but they apparently remain within
the range of states of the system. As wetness has
increased, the density of Sporobolus has decreased,

since Sporobolus tends to be found in slightly higher areas (Adam et al., 1988). The increased wetness and reduced Sporobolus are also associated with increased crab activity (Chapman et al., 1998), but the causal mechanisms are not clear.
Other results are consistent with what is known about the plants. Such results include:
• the inverse relationship in the distribution of Sporobolus and Sarcocornia;
• the lower salinity environment of Sporobolus at greater distance from the tidal source; and
• the increased presence of mangrove pneumatophores in areas of higher salinity, whereas crab activity is greatest in the wetter places.
Changes identified in the field and in remote sensing research are consistent with the findings here. Thus Dale et al. (1996) identified a reduction in Sporobolus cover and biomass in the study area between 1983 and 1991, and the analysis in Dale and Dale (submitted for publication) clearly identified an absorbing state which is bare ground. In the present analyses the four-attribute single process result with five states identified this, and to a lesser extent it appeared in Cluster 1 of the two-attribute analysis of Sporobolus density and height. It was not apparent in the two-attribute Sporobolus and Sarcocornia density analysis. This perhaps illustrates that the selection of attributes can have a marked effect on the results and their interpretation. The cyclic nature of the changes of state may be related to season or may reflect local effects of tides, which have a lunar cycle.

Table 5
The two-attribute analysis (Sporobolus and Sarcocornia density): significant relationships with other attributes not used in the cluster analysis

Variable                          t         df     P        Cluster 1 mean (SE)   Cluster 2 mean (SE)
Substrate moisture (g/g)          9.7832    1674   <0.001   0.5753 (0.008)        0.6112 (0.002)
Substrate salinity (ppt)          3.3725    1674   0.0008   37.6122 (0.760)       34.2077 (0.664)
Sporobolus height (mm)            26.9746   1677   <0.001   39.9061 (1.522)       94.4614 (1.332)
Sarcocornia shoot length (mm)     31.3960   1675   <0.001   44.1358 (0.900)       6.5995 (0.787)
Water table salinity (ppt)        3.1474    1522   0.0017   30.1076 (0.444)       28.2553 (0.386)
Pneumatophores (10 × 10 cm)       5.6574    1678   <0.001   0.0618 (0.008)        0.0053 (0.007)
Crab holes (10 × 10 cm)           3.5071    1678   0.0005   0.0714 (0.016)        0.1450 (0.014)

Fig. 2. Two-attribute data: Sporobolus and Sarcocornia density, two clusters (key as for Fig. 1).

Table 6
State descriptions for the two-attribute, one species analysis of Sporobolus density and height

        Cluster 1                                Cluster 2
State   Sporobolus mean    Sporobolus mean      Sporobolus mean     Sporobolus mean
        density (SD)       height (SD)          density (SD)        height (SD)
0       98.046 (2.59)      16.688 (0.69)        3.204 (2.59)        16.954 (50.75)
1       0.001 (0.50)       0.014 (0.494)        9.979 (15.51)       0.680 (0.94)
2       129.341 (84.32)    149.613 (32.10)      91.275 (3.64)       16.765 (0.73)
3       6.474 (7.16)       12.118 (41.36)       135.975 (66.67)     98.871 (33.62)

Table 7
Transition matrices for the two-attribute, one species analysis of Sporobolus density and height

From state   To state
             0       1       2       3
Cluster 1
0            0       0.214   0.534   0.251
1            0.023   0.950   0       0.027
2            0       1.000   0       0
3            0.025   0.261   0       0.714
Cluster 2
0            0.334   0.666   0       0
1            0.666   0       0.334   0
2            0.011   0       0       0.989
3            0       1.000   0       0

5.2. Methodological
Although there are many other ways of evaluating the similarity between series of observations (for example Merriam and Sneath, 1966; Lance and Williams, 1967; Dale et al., 1970; Juang and Rabiner, 1985; Little and Ross, 1985; Dale et al., 1988; Agrawal et al., 1995; Bollobás et al., 1997; Allison, 1998; Oates et al., 1998; Edgoose and Allison, 1999; Perng et al., 2000), we shall concentrate on the HMM approach, for which there remain several issues to consider. These include:

• data stability, coarseness and variance;
• the propriety of using first-order Markov models, especially when nonlinearity, nonstationarity and system memory are likely (Lee, 1990);
• the use of crisp clusters and its implication for consistency of estimates and model uncertainty;
• the potential environmental interpretation of states.

Fig. 3. Two-attribute, one species data: Sporobolus density and height, two clusters (key as for Fig. 1).

Table 8
The two-attribute, one species analysis of Sporobolus density and height: other significantly related variables

Variable                           t        df     P        Cluster 1 mean (SE)   Cluster 2 mean (SE)
Substrate moisture (g/g)           4.6328   1674   <0.001   0.5858 (0.003)        0.6032 (0.003)
Sarcocornia density (10 × 10 cm)   9.8757   1678   <0.001   30.3819 (1.162)       15.1418 (1.016)
Sarcocornia shoot length (mm)      9.3918   1675   <0.001   30.6952 (1.106)       16.8996 (0.967)
Water table salinity (ppt)         2.4938   1552   0.0127   29.8873 (0.444)       28.4185 (0.387)
Distance from tidal source (m)     2.0543   28     0.0494   75.3846 (8.373)       98.2353 (7.322)

5.2.1. Data coarseness and variance
Much vegetation data are collected using coarse measures such as ordered categories. We should strictly introduce a trade-off between the coarseness of the specification of parameters and the loss of fit induced by crude estimates of parameters, so that our model reflects the crudeness of our observation (Zukerman et al., 2000). Wallace and Dowe (2000), using the minimum message length principle, obtained a trade-off between the costs of encoding very precise parameter estimates and the loss of fit to the data resulting from using coarse estimates. Paluš (1996a) has examined this question in time series where continuous attributes have been quantised.
5.2.2. The suitability of Markov models
Markov models are relatively simple, though powerful, models, but they are certainly not the only possibilities for modelling sequences. Raftery and Tavaré (1994), Liebovitch (1995), Stanley et al. (1995) and Le et al. (1996) have all proposed generalisations and alternatives. Ostendorf et al. (1996) discussed specific shortcomings of HMMs for continuous speech recognition, which is a problem akin to vegetation sequence studies.3 They specifically recognise the weak modelling of duration, the assumptions of conditional independence of observations given the state sequence, and the restrictions on feature extraction imposed by frame-based observations, which is really a question of scale.

3 Both sample multivariate data without any obvious demarcation into sampling units, and exhibit considerable local variation in the relationship between the observed features and the underlying conceptual meaning due to phenotypic responses for vegetation, and to accents and dialects for speech.

Orlóci et al. (1993) have, in contrast, argued that
a Markov model is suitable for studying vegetation
processes. However, theoretical considerations are
probably overruled by practical considerations. In
order to estimate parameters for more complex
models we need long data sequences, and these are
usually not available. The ‘memory’ of an ecological system might be quite long, and the collection
of data over periods of time, which might reach
centuries for forests, is simply infeasible.
5.2.3. Nonstationarity
We have assumed that the parameters of the
models fitted to a group of sequences do not
change with time; that is, the estimates do not
depend on where in the sequence we start. This
assumption is unlikely to be true in ecological
studies where responses to environmental changes
may well mean that the parameters themselves
vary. An example would be the effects of radiation
reported by Ball et al. (1991). Such nonstationarity
is also intimately related to, and may be indistinguishable from, nonlinearity.
In the four-attribute results, the processes appeared to be operating in a cyclic manner. In this
case the starting point in a sequence may not be
important. However, if the starting point is
critical, one approach would be to look for simple
nonstationary models (trends, periodicities) to
represent the time-varying aspects of any sequence
(see Cox, 1958; Paluš, 1996b; Le et al., 1996). We
can then try to remove them and examine the
residuals, instead of complicating our models directly.
Incorporating nonstationarity into the model is also
possible; for example, Sun et
al. (1994) discuss nonstationary models. We might
also look to semi-continuous or discrete Markov
processes, especially if it is sensible to group our
observations into events or episodes (see, e.g.,
Dietterich and Michalski, 1985; Mannila and
Toivonen, 1996).
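The residual approach described above can be sketched as follows. This is a minimal illustration, assuming a single attribute observed at equally spaced occasions, a linear trend and a known period. The function names, the synthetic series and the period of four occasions are hypothetical choices for illustration and are not the procedure used in this study.

```python
def linear_trend(y):
    """Least-squares intercept and slope of y against occasion index 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def remove_trend_and_cycle(y, period):
    """Return residuals after removing a linear trend and a periodic mean."""
    intercept, slope = linear_trend(y)
    detrended = [v - (intercept + slope * t) for t, v in enumerate(y)]
    # Mean of the detrended values at each phase of the cycle.
    phase_mean = []
    for p in range(period):
        values = detrended[p::period]
        phase_mean.append(sum(values) / len(values))
    return [v - phase_mean[t % period] for t, v in enumerate(detrended)]


# Synthetic example (invented data): trend of 0.5 per occasion plus a
# period-4 cycle, without noise.
cycle = [1.0, 0.0, -1.0, 0.0]
series = [10.0 + 0.5 * t + cycle[t % 4] for t in range(56)]
residuals = remove_trend_and_cycle(series, period=4)
```

Removing the trend and the cycle one after the other is only approximate, because the cycle slightly biases the fitted slope; a joint regression on both components would be a more careful choice.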
An appearance of nonstationarity can also
result from non-Gaussian error distributions or
variation in the variance of the error distribution
through time. These might be studied through the
use of entropy rates. Entropy measures the error
that we have in determining our location in a state
space. Entropy rate considers how this error
changes with time and can be measured using the
Kolmogorov–Sinai or metric entropy rate (Paluš,
1997). Entropy rate is the maximal diversity of
patterns in a data stream and can be related to
mutual information. Again, long sequences are
necessary to investigate such matters.
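As a rough illustration of an entropy rate, the sketch below estimates it for a sequence of discrete states as the difference between the Shannon entropies of blocks of length k+1 and k. This plug-in estimator is a common textbook approximation and is not the coarse-grained method of Paluš; the function names and the two test sequences are hypothetical.

```python
from collections import Counter
from math import log2


def block_entropy(seq, k):
    """Shannon entropy (bits) of the observed blocks of k consecutive symbols."""
    blocks = [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def entropy_rate(seq, k=1):
    """Plug-in estimate H(k+1) - H(k): uncertainty about the next symbol
    given the previous k symbols."""
    return block_entropy(seq, k + 1) - block_entropy(seq, k)


periodic = list("WXYZ" * 20)     # fully predictable cycle: rate close to 0 bits
mixed = list("00010111" * 50)    # all symbol pairs equally frequent: close to 1 bit
rate_periodic = entropy_rate(periodic)
rate_mixed = entropy_rate(mixed)
```

With short sequences the block counts are sparse and the estimate becomes unstable, which is one concrete form of the need for long sequences noted above.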
A somewhat similar problem occurs when a
single series of observations changes its generating
process one or more times, perhaps because of
environmental fluctuation. A single model for the
entire sequence is then inappropriate, although
between such changes a simple model may be
adequate. Change-point estimation has been the
object of much study, as have been other aspects of
within-sequence similarity and dissimilarity (see,
e.g., Lawrence et al., 1993).
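In the simplest case, a single change in the mean of one attribute can be located by choosing the split that minimises the total within-segment sum of squares. The sketch below assumes exactly one change point and uses invented data; it is illustrative only and is not one of the methods cited above.

```python
def sum_sq(values):
    """Sum of squared deviations from the segment mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def single_change_point(y, min_len=2):
    """Index k that best splits y into y[:k] and y[k:], by least squares.

    Returns (k, cost), where cost is the total within-segment sum of squares.
    """
    best_k, best_cost = None, float("inf")
    for k in range(min_len, len(y) - min_len + 1):
        cost = sum_sq(y[:k]) + sum_sq(y[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


# Synthetic series whose mean shifts from 2 to 5 after the tenth occasion.
series = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0, 2.0,
          5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0, 5.0]
k, cost = single_change_point(series)
```

Between the detected changes, each segment could then be modelled separately with a simple model, as the text suggests.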
5.2.4. Higher order processes
Due to the limited duration of observations, we
have chosen to use first-order Markov processes
only. This means that the next state of the system
is dependent only on the present state, i.e. the
system has no memory. Ecologically, this means
that a snapshot of the vegetation at one time
provides a reasonable basis for predicting its
future state, which has considerable practical
significance. There is some evidence that this
assumption is untrue in the ‘two-attribute one-species’ analysis. Here we have two clusters, but in
each there exists a state with emission probabilities
similar to a state in the other cluster. However, the
transition probabilities differ for the two states.
This means that observationally indistinguishable sites could have different futures. How, then,
is this future to be determined? Because the emission
probabilities are not exact descriptions of the
hidden state, we do not have proof of higher-order
processes, but this does suggest that memory is present
in the system. We will attempt an assessment of the
amount of memory in our system elsewhere. The
four-attribute result may also reflect a second-order process describing some periodic changes
using three states based on height alone: low,
moderate and tall. Sporobolus and Sarcocornia
seem to be negatively correlated and the model
simply needs greater complexity to distinguish
low → moderate from tall → moderate transitions.
To avoid the complexity of a second-order (or
higher) model, we can transform it into a first-order model in the following way. In a second-order model the transition probabilities depend on
the previous two states. We recode the pairs of
states to produce ‘new’ states. Given a sequence
1–2–4–3–1–2–4–3..., we might recode 1–2 to W,
2–4 to X, 4–3 to Y, 3–1 to Z, which results in a
(first-order) process WXYZWXYZ..., i.e. we code
every pair of symbols using a new symbol, using
overlapping pairs. By this means we obtain a first-order sequence, using W, X, Y and Z. For n initial
states we would finally have n^2 states, although not
all of these may actually be observed. For a third-order process we would need to consider triples
and hence up to n^3 possible states. This simply
trades the order of model against the number of
states.
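The recoding can be sketched in a few lines. The example below reproduces the 1–2–4–3 sequence from the text; the function names are hypothetical, and assigning the new symbols in order of first appearance is an assumption made for illustration.

```python
def recode_pairs(seq):
    """Recode a state sequence into overlapping pairs of successive states."""
    return [(a, b) for a, b in zip(seq, seq[1:])]


def label_pairs(pairs, labels):
    """Give each distinct pair a new symbol, in order of first appearance.

    labels must supply at least as many symbols as there are distinct pairs
    (at most n^2 for n original states).
    """
    mapping = {}
    for pair in pairs:
        if pair not in mapping:
            mapping[pair] = labels[len(mapping)]
    return [mapping[p] for p in pairs], mapping


states = [1, 2, 4, 3, 1, 2, 4, 3]
pairs = recode_pairs(states)
recoded, mapping = label_pairs(pairs, labels="WXYZ")
print("".join(recoded))  # WXYZWXY
```

Eight observed states give seven overlapping pairs, and only four of the sixteen possible pairs occur, which illustrates the remark that not all n^2 states need be observed.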
5.2.5. Partial assignment, model uncertainty and
consistency of estimates
In assigning each sequence crisply to a cluster
we are implicitly accepting that a single ‘true’
model is appropriate and therefore ignoring model
uncertainty. Wallace and Freeman (1987) have
shown that such crisp clustering (segmentation)
can lead to inconsistent estimation of cluster
parameters. By allowing things to be probabilistically (fuzzily) assigned to several clusters, this
inconsistency can be removed. Such a procedure
also incorporates model uncertainty, for everything is effectively associated with the model of
every cluster. Clusters of HMMs seem likely to
behave similarly.
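A minimal sketch of partial assignment follows, assuming that the HMM of each cluster can return a log-likelihood for a given sequence (for example from the forward algorithm). The log-likelihood values and the function name are invented for illustration and do not come from the salt marsh analysis.

```python
from math import exp, log


def membership_probabilities(log_likelihoods, priors=None):
    """Posterior probability that a sequence belongs to each cluster.

    log_likelihoods[j] is log P(sequence | model of cluster j).
    priors[j] is the cluster weight; equal weights are assumed if omitted.
    """
    k = len(log_likelihoods)
    if priors is None:
        priors = [1.0 / k] * k
    scores = [ll + log(p) for ll, p in zip(log_likelihoods, priors)]
    top = max(scores)                  # subtract the maximum for stability
    weights = [exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Invented log-likelihoods of one sequence under two cluster models.
probs = membership_probabilities([-120.0, -121.0])
crisp = probs.index(max(probs))        # a crisp assignment keeps only this index
```

Here the sequence would be assigned crisply to the first cluster, although roughly a quarter of its membership belongs to the second; weighting parameter estimates by these memberships is what carries the model uncertainty forward.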
5.3. Environmental interpretation
Although we have here relied on simple tests, it
is obvious that we could analyse the environmental
data in the same HMM manner as the plant data,
although perhaps the Gaussian assumption for
emissions is less acceptable and other distributions
would have to be introduced. We would then have
a second set of HMMs that would form a basis for
comparison of the two results. Juang and Rabiner
(1985) have examined similarity measures between
HMMs, so that inter-cluster relationships can be
examined, while Allison et al. (1990) used ‘pair
HMMs’ for a similar purpose.
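The simple tests referred to above can be checked from the summary statistics alone. The sketch below recomputes each t value in the table of significantly related variables as the difference between the cluster means divided by the combined standard error. It assumes that the two columns of standard errors belong to clusters 1 and 2 in the order printed, and that the tests used this unpooled form; both are assumptions rather than statements from the text.

```python
from math import sqrt

# (variable, cluster 1 mean, cluster 1 SE, cluster 2 mean, cluster 2 SE, reported t)
rows = [
    ("Substrate moisture (g/g)", 0.5858, 0.003, 0.6032, 0.003, 4.6328),
    ("Sarcocornia density", 30.3819, 1.162, 15.1418, 1.016, 9.8757),
    ("Sarcocornia shoot length (mm)", 30.6952, 1.106, 16.8996, 0.967, 9.3918),
    ("Water table salinity (ppt)", 29.8873, 0.444, 28.4185, 0.387, 2.4938),
    ("Distance from tidal source (m)", 75.3846, 8.373, 98.2353, 7.322, 2.0543),
]


def t_from_summary(mean1, se1, mean2, se2):
    """Absolute difference of means divided by its combined standard error."""
    return abs(mean1 - mean2) / sqrt(se1 ** 2 + se2 ** 2)


recomputed = {name: t_from_summary(m1, s1, m2, s2)
              for name, m1, s1, m2, s2, _ in rows}
for name, *_, reported in rows:
    print(f"{name}: recomputed {recomputed[name]:.3f}, reported {reported}")
```

Four of the five recomputed values agree with the reported t to about two decimal places. Substrate moisture gives roughly 4.10 instead of 4.63, presumably because its standard errors are printed to only one significant figure.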
In summary, in this paper we have shown how
the use of HMMs allows us to identify the nature
of the effect of a given treatment. In this particular case there is
no evidence that runnelling introduces changes in
the processes operating in the salt marsh, for a
single HMM is sufficient to capture the temporal
variation. We have also shown that restricting the
descriptors to Sporobolus properties only shows
that Sporobolus is acting in different ways in
different parts of the marsh, though this heterogeneity is not a reflection of our treatment.
Instead, it probably reflects the relative position
of a site on the marsh. Using the density of the two
species as the only property we also obtain a
different result, one which also identifies heterogeneity in the system. This too may reflect position
on the marsh. The brief analysis of the other
environmental attributes does support these ideas
in terms of wetness-related factors, which in turn
are related to relative position on the marsh.
The biggest problem with this methodology, and
with generalisations of it, is the need for the
observations to be made over adequate periods
of time. We have used 56 occasions representing 14
years of data collection, yet this is hardly sufficient
even for first-order Markov processes. Extensions
to nonstationary, or other more complex, models
would require much larger samples. What is clear
is that using changes derived from observing two
occasions cannot provide an acceptable assessment
of the effects of impacts on the processes operating
within a system and of its future course. Change in
vegetation may be slow, but it is inexorable, and
we accept predictions based on limited evidence at
our peril.

Acknowledgements
We thank many student assistants who have
helped to collect the field data over many years.
We thank the Gold Coast City Council for
providing boat transport to the study site for the
whole time-period. Financial support has come
from the Gold Coast City Council and the
Mosquito and Arbovirus Research Committee,
as well as from the Queensland State Health
Department.

References
Adam, P., Wilson, N.C., Huntley, B., 1988. The phytosociology
of coastal saltmarsh vegetation in New South Wales.
Wetlands (Australia) 7, 35–84.
Agrawal, R., Lin, K.-I., Sawhney, H.S., Shim, K., 1995. Fast
similarity search in the presence of noise, scaling and
translation in time series databases. Proceedings of the
21st Very Large Database Conference, Zürich, Switzerland.
Akaike, H., 1978. A Bayesian analysis of the minimum AIC
procedure. Ann. Inst. Stat. Math. 30, 9–14.
Allison, L., 1998. Information-theoretic sequence alignment.
Technical Report, 98/14, School of Computer Science,
Monash University, Clayton, Victoria.
Allison, L., Wallace, C.S., Yee, C.N., 1990. Inductive inference
over macro-molecules. Technical Report, 90/148, Department of Computer Science, Monash University, Clayton,
Victoria.
Ball, M.C., Hodges, V.S., Laughlin, G.P., 1991. Cold-induced
photoinhibition limits regeneration of snow gum at tree line.
Funct. Ecol. 5, 663–668.
Banfield, J.D., Raftery, A.E., 1993. Model-based Gaussian and
non-Gaussian clustering. Biometrics 49, 803–821.
Bensmail, H., Celeux, G., Raftery, A.E., Roberts, C.P., 1995.
Inference in model-based cluster analysis. Technical Report,
285, Department of Statistics, University of Washington,
Washington, DC.
Bollobás, B., Das, G., Gunopulos, D., Mannila, H., 1997.
Time-series similarity problems and well-separated geometric sets. Available from:
http://www.almaden.ibm.com/cs/quest/papers/cg97_expanded.ps
Boulton, D.M., Wallace, C.S., 1970. A program for numerical
classification. Comput. J. 13, 63–69.
Chapman, H., Dale, P.E.R., Kay, B.H., 1998. A method for
assessing the effects of runnelling on salt-marsh grapsid crab
populations. J. Am. Mosq. Control Assoc. 14, 61–68.
Cox, D.R., 1958. The regression analysis of binary sequences. J.
R. Stat. Soc. Ser. B 20, 215–232.
Critchlow, D.E., 1980. Metric methods for analyzing partially
ranked data. In: Lecture Notes in Statistics, vol. 34.
Springer-Verlag, Berlin.
Dale, P.E.R., 2001. Wetlands of conservation significance:
mosquito borne disease and its control. Arbovirus Res.
Aust. 8, 102–108.
Dale, P.E.R., Dale, M.B., 2002. Optimal classification to
describe environmental change: pictures from the exposition. Community Ecol. 3, 19–29.
Dale, M.B., Macnaughton-Smith, P.W.T., Lance, G.N., 1970.
Numerical classification of sequences. Aust. Comput. J. 2,
9–13.
Dale, M.B., Coutts, R., Dale, P.E.R., 1988. Landscape
classification by sequences: a study of Toohey Forest.
Vegetatio 29, 113–129.
Dale, P.E.R., Dale, P.T., Hulsman, K., Kay, B.H., 1993.
Runnelling to control saltmarsh mosquitoes: long-term
efficacy and environmental impacts. J. Am. Mosq. Control
Assoc. 9, 174–181.
Dale, P.E.R., Chandica, A.L., Evans, M., 1996. Using image
subtraction and classification to evaluate change in subtropical intertidal wetlands. Int. J. Remote Sensing 17, 703–719.
Dermatas, E., Kokkinakis, G., 1996. Algorithm for clustering
continuous density hidden Markov model by recognition
error. IEEE Trans. Speech Audio Process. 4, 231–234.
Dietterich, T.G., Michalski, R.S., 1985. Discovering patterns in
sequences of events. Artif. Intell. 25, 287–332.
Edgoose, T., Allison, L., 1999. MML Markov classification of
sequential data. Stat. Comput. 9, 269–278.
Faith, D.P., Dostine, P.L., Humphrey, C.L., 1995. Detection of
mining impacts on aquatic macroinvertebrate communities:
results of a disturbance experiment and the design of a
multivariate BACIP monitoring program at Coronation
Hill, NT. Aust. J. Ecol. 20, 167–180.
Juang, B.H., Rabiner, L.R., 1985. A probabilistic distance
measure for hidden Markov models. AT&T Tech. J. 64,
391–408.
Lance, G.N., Williams, W.T., 1967. Note on the classification
of multilevel data. Comput. J. 9, 381–382.
Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S.,
Neuwald, A.F., Wootton, J.C., 1993. Detecting subtle
sequence signals. Science 262, 208–214.
Le, N.D., Martin, R.D., Raftery, A.E., 1996. Modelling outliers, bursts and flat stretches in time series using mixture
transition distribution (MTD) model. J. Am. Stat. Assoc.
91, 1504–1515.
Lee, K.F., 1990. Context-dependent phonetic hidden Markov
models for speaker independent continuous speech recognition. IEEE Trans. Acoust. Speech Signal Process. 38, 599–609.
Li, C., Biswas, G., 1999. Temporal pattern generation using
hidden Markov model-based unsupervised classification. In:
Advances in Intelligent Data Analysis. Lecture Notes in
Computer Science, vol. 1642. Springer-Verlag, Berlin, pp.
245–256.
Li, C., Biswas, G., 2000. Bayesian temporal data clustering
using hidden Markov model representation. In: Langley, P.
(Ed.), Proceedings of the Seventeenth International Conference on Machine Learning. Morgan Kaufmann, San
Francisco, CA, pp. 543–550.
Liebovitch, L.S., 1995. Ion channel kinetics. In: Iannaccane,
P.M., Khokha, M.K. (Eds.), Fractal Geometry in Biological
Systems: An Analytical Approach. CRC Press, London, pp.
31–56.
Little, I.P., Ross, D.R., 1985. The Levenshtein metric: a new
means for soil classification tested by data from a sandpodzol chronosequence and evaluated by discriminant analysis.
Aust. J. Soil Res. 23, 115–130.
Mannila, H., Toivonen, H., 1996. Discovering generalized
episodes using minimal occurrences. Proceedings of the
Second International Conference on Knowledge Discovery
and Data Mining (KDD’96), AAAI Press, Portland, OR,
pp. 146–151.
Merriam, D.F., Sneath, P.H.A., 1966. Quantitative comparison
of contour maps. J. Geophys. Res. 71, 1105–1115.
Oates, T., Firoiu, L., Cohen, P.R., 1998. Clustering time series
with hidden Markov model and dynamic time warping.
Proceedings of the 4th International Conference on Knowledge Discovery Datamining, New York, pp. 294–298.
Orlóci, L., Anand, M., He, X.S., 1993. Markov chains: a
realistic model for temporal coenosere. Biom. Praxom. 33,
7–26.
Ostendorf, M., Digilakis, V., Kimball, O.A., 1996. From
HMMs to segment models: a unified view of stochastic
modelling for speech recognition. IEEE Trans. Speech
Audio Process. 4, 360–378.
Paluš, M., 1996a. Coarse grained entropy rates for characterization of complex time series. Physica D 93, 64–77.
Paluš, M., 1996b. Detecting nonlinearity in multivariate time
series. Phys. Lett. A 213, 1387.
Paluš, M., 1997. Kolmogorov entropy from time series using
information-theoretic functionals. Neural Netw. World 7,
269–292.
Perng, C.-S., Wang, H.-X., Zhang, S.R., Parker, D.S., 2000.
Landmarks: a new model for similarity-based pattern
querying in time series databases. Sixteenth International
Conference on Data Engineering, pp. 33–47.
Rabiner, L.R., 1989. A tutorial on hidden Markov models and
selected applications in speech recognition. Proc. IEEE 77,
257–286.
Raftery, A.E., Tavare, S., 1994. Estimation and modelling
repeated patterns in high-order Markov chains with the
mixture transition distribution (MTD) model. Appl. Stat. 4,
178–200.
Schwarz, G., 1978. Estimating dimension of a model. Ann. Stat.
6, 461–464.
Sebastiani, P., Ramoni, M., Cohen, P., Warwick, J., Davis, P.,
1999. Discovering dynamics using Bayesian clustering. In:
Proceedings of the 3rd International Symposium on Intelligent Data Analysis.
Smyth, P., 1997. Clustering sequences with hidden Markov
models. In: Mozer, M.C., Jordan, M.I., Petsche, T. (Eds.),
Advanced Neural Information Process, vol. 9. MIT Press,
Cambridge, MA.
Stanley, H.E., Buldyrev, S.V., Goldberger, A.L., Havlin, S.,
Mantegna, R.S., Peng, C.-K., Simons, M., 1995. Scale
invariant features of coding and noncoding DNA sequences. In: Iannaccane, P.M., Khokha, M.K. (Eds.),
Fractal Geometry in Biological Systems: An Analytical
Approach. CRC Press, London, pp. 15–30.
Sun, D., Deng, L., Wu, C., 1994. State-dependent time warping
in the trended hidden Markov model. Signal Process. 39,
263–275.
Wallace, C.S., Dowe, D.L., 2000. MML clustering of multistate, Poisson, von Mises circular and Gaussian distributions. Stat. Comput. 10, 73–83.
Wallace, C.S., Freeman, P.R., 1987. Estimation and inference
by compact coding. J. R. Stat. Soc. B 49, 240–252.
Zukerman, I., Albrecht, D.W., Nicholson, A.E., Doklo, K.,
2000. Trading off granularity against complexity in predictive models for complex domains. Sixth Pacific Rim
International Co